MongoDB

Document-oriented NoSQL database

Among non-relational databases, MongoDB is the most feature-rich and the one that most closely resembles a relational database

1. Criteria for choosing MongoDB

MongoDB is worth considering if two of the following criteria are met

○ If three or more are met, choose MongoDB

  • No transactions or complex joins are required – for strong-transaction scenarios, choose a relational database instead

  • The data model is not yet fixed and may change

  • A read/write throughput above 2,000 QPS is required

  • TB- or PB-scale data storage is required

  • You need to be able to scale horizontally quickly

    ○ Horizontal scaling: Increases the number of servers to improve system performance

    ○ Vertical scaling: improves the processing capability of a single machine

  • 99.999% availability is required

  • Heavy geolocation and full-text search requirements

2. Applicable scenarios

○ Suitable for what would be one-to-many scenarios in a relational database

○ Suitable for large-size, low-value BSON data storage

  • Game scenario – player equipment, points
  • Logistics scenario – logistics order status
  • Social scenario – nearby people and places
  • IoT scenario – smart device logs
  • Live-streaming scenario – streamer gift records

3. Terminology

  • NoSQL: Not Only SQL – structured storage is not required

    SQL is the Structured Query Language

  • BSON : Binary JSON

    • JSON-like binary storage format
    • Has some data types that JSON lacks
    • Advantage – high flexibility
    • Disadvantage – poor space utilization, e.g. redundant duplicate key names

    ○ JSON stores data as a string

    ○ BSON stores data in binary mode
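The key-name redundancy mentioned above can be seen directly in plain JavaScript (a Node.js sketch with made-up data, unrelated to MongoDB itself): serializing an array of documents repeats every key name as text in each document.

```javascript
// JSON serialization repeats key names in every document, which is the
// space-utilization drawback noted above (BSON shares this trait, since
// it also stores key names per document).
const docs = [
  { name: "a", score: 1 },
  { name: "b", score: 2 },
  { name: "c", score: 3 },
];
const text = JSON.stringify(docs);
// "name" appears once per document in the serialized string:
const nameCount = (text.match(/"name"/g) || []).length;
console.log(nameCount); // 3
```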

4. Concept comparison

RDBMS                                  | MongoDB
---------------------------------------|--------------------------------
database                               | database
table                                  | collection
row                                    | document (a BSON document)
column                                 | field
index                                  | index
join (primary/foreign key association) | embedded (nested) document
primary key (one or more columns)      | primary key (the _id field)

5. BSON types

[Insert data]

○ db.collection.insert(document) – reports a duplicate-key error if the primary key of the inserted data already exists

○ db.collection.save(document) – updates the document if the primary key exists, inserts it if not

  • String: string – { key: "CBA" }

  • Integer: integer – { key: 1 }

  • Boolean: boolean – { key: true }

  • Double: double – { key: 3.14 }

  • ObjectId: object id – { _id: new ObjectId() }

    Composed of a timestamp, machine code, process ID, and a random number

  • Array: array – { arr: ["a", "b"] }

  • Timestamp: timestamp – { ts: new Timestamp() }

  • Object: embedded document – { o: { foo: "bar" } }

  • Null: null value – { key: null }

  • Date, ISODate: GMT time – { birth: new Date() }

  • Code: JavaScript code – { x: function(){} }

  • File: stored in the fs.files and fs.chunks collections (GridFS)
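Most of the types above map directly onto JavaScript values. A plain-JavaScript sketch of such a document (ObjectId, Timestamp, and GridFS files exist only in the mongo shell and drivers, so they are omitted here):

```javascript
// A document literal using the JSON-compatible subset of the BSON types above.
const doc = {
  key: "CBA",           // String
  count: 1,             // Integer (JavaScript has a single number type)
  active: true,         // Boolean
  pi: 3.14,             // Double
  arr: ["a", "b"],      // Array
  o: { foo: "bar" },    // Object (embedded document)
  missing: null,        // Null
  birth: new Date(),    // Date
  code: function () {}, // Code
};
console.log(Array.isArray(doc.arr), doc.birth instanceof Date); // true true
```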

6. Conditional queries

db.collection.find({condition}).sort({field: order}).skip(number of rows to skip).limit(number of rows per page)

Operation             | Condition format                              | Example                                                             | RDBMS equivalent
----------------------|-----------------------------------------------|---------------------------------------------------------------------|------------------
equals                | { key: value }                                | db.col.find({ key: value }).pretty()                                | where key = value
greater than          | { key: { $gt: value } }                       | db.col.find({ key: { $gt: value } }).pretty()                       | where key > value
less than             | { key: { $lt: value } }                       | db.col.find({ key: { $lt: value } }).pretty()                       | where key < value
greater than or equal | { key: { $gte: value } }                      | db.col.find({ key: { $gte: value } }).pretty()                      | where key >= value
less than or equal    | { key: { $lte: value } }                      | db.col.find({ key: { $lte: value } }).pretty()                      | where key <= value
not equal             | { key: { $ne: value } }                       | db.col.find({ key: { $ne: value } }).pretty()                       | where key != value
and                   | { key1: value1, key2: value2 }                | db.col.find({ key1: value1, key2: value2 }).pretty()                | where key1 = value1 and key2 = value2
or                    | { $or: [{ key1: value1 }, { key2: value2 }] } | db.col.find({ $or: [{ key1: value1 }, { key2: value2 }] }).pretty() | where key1 = value1 or key2 = value2
not                   | { key: { $not: { <operator>: value } } }      | db.col.find({ key: { $not: { <operator>: value } } }).pretty()      | where not key <operator> value
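To make the operator semantics concrete, here is an illustrative plain-JavaScript equivalent of two of the queries above, run over an in-memory array (the sample data is made up; real queries are evaluated server-side by MongoDB):

```javascript
const people = [
  { name: "li", age: 30 },
  { name: "wang", age: 25 },
  { name: "zhao", age: 18 },
];

// db.col.find({ age: { $gt: 20 } })  -- "where age > 20"
const gt20 = people.filter(p => p.age > 20);

// db.col.find({ $or: [{ age: 18 }, { name: "li" }] })
const orMatch = people.filter(p => p.age === 18 || p.name === "li");

console.log(gt20.length, orMatch.length); // 2 2
```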

7. Data updates

db.collection.update(
  <query>,
  <update>,
  {
    upsert: <boolean>,
    multi: <boolean>,
    writeConcern: <document>
  }
)

1. Update operators

  • $set: sets the field value
  • $unset: deletes a field
  • $inc: adds to the field value – the increment can be positive or negative

2. Options

  • upsert: whether to insert if no record matches the query – [default] false
  • multi: whether to update all matched records; false updates only the first one found – [default] false
  • writeConcern: reliability options
    • w
      • [default] 1: acknowledgment from the primary only
      • 0: no acknowledgment required
      • majority: acknowledgment from a majority of replicas
    • j
      • true: an acknowledgment is returned only after the operation is written to disk (journal)
      • false: the acknowledgment is returned without waiting for the disk write
    • wtimeout: time limit for the operation
db.col.update({ name: "zhangsan" }, { $inc: { expectSalary: 3000 } }, { upsert: true })
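The upsert option can be pictured with a small in-memory sketch (the `upsert` helper and sample data are illustrative, not MongoDB API): update the matching record if one exists, otherwise insert a new one.

```javascript
// Illustrative upsert semantics: update on match, insert on miss.
function upsert(col, query, fields) {
  const doc = col.find(d =>
    Object.entries(query).every(([k, v]) => d[k] === v));
  if (doc) Object.assign(doc, fields);    // update path
  else col.push({ ...query, ...fields }); // insert path
}

const col = [{ name: "li", expectSalary: 10000 }];
upsert(col, { name: "li" }, { expectSalary: 13000 });  // updates in place
upsert(col, { name: "wang" }, { expectSalary: 9000 }); // inserts a new doc
console.log(col.length, col[0].expectSalary); // 2 13000
```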

8. Data deletion

db.collection.remove(
  <query>,
  {
    justOne: <boolean>,
    writeConcern: <document>
  }
)

1. Options

  • justOne: whether to delete only one document – [default] false
  • writeConcern: reliability options

db.col.remove({ name: "zhangsan" })

9. Aggregation operation

1. Single-purpose aggregation

  • count(): counts the number of documents
  • distinct(): removes duplicates and returns the distinct values of a field

db.collection.find({}).count()
db.collection.distinct("field")
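These two can be pictured with an in-memory sketch (made-up data; MongoDB evaluates both server-side): distinct() collects one field into a set, count() is the array length.

```javascript
const rows = [{ city: "Beijing" }, { city: "Shanghai" }, { city: "Beijing" }];

const distinctCities = [...new Set(rows.map(r => r.city))]; // ~ distinct("city")
const total = rows.length;                                  // ~ count()

console.log(distinctCities, total); // [ 'Beijing', 'Shanghai' ] 3
```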

2. Aggregation pipeline

1. Pipeline stages

  • $group: groups documents
    • Used to compute statistics over different groups of data
  • $project: modifies the output document structure
    • Rename fields
    • Add or remove fields
    • Create computed fields
    • Nest documents
  • $match: filters the data and outputs only documents that meet the conditions
  • $limit: limits the number of documents returned by the pipeline
  • $skip: skips the specified number of documents and returns the rest
  • $sort: outputs sorted documents
  • $geoNear: outputs documents ordered by proximity to a geographic point

2. Expressions

  • $sum: computes the sum
  • $avg: computes the average
  • $min: gets the minimum value across the documents in the group
  • $max: gets the maximum value across the documents in the group
  • $push: inserts a value from each document into a result array
  • $addToSet: inserts a value from each document into a result array, without duplicates
  • $first: gets the first document's value
  • $last: gets the last document's value
db.col.aggregate([
  { $group: { _id: "$city" /* group by the city field */,
              city_count /* name of the aggregated field */: { $sum /* aggregation mode */: 1 } } }
])

db.col.aggregate([
  { $group: { _id: "$city", avgSal: { $avg: "$expectSalary" } } },
  { $project: { city: "$city", salary: "$avgSal" } } /* rename avgSal to salary */
])

db.col.aggregate([
  { $group: { _id: "$city", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } } /* only output groups with a count greater than 1 */
])
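The $group / $match pipeline in the last example can be simulated with plain JavaScript to check one's intuition (illustrative in-memory data; MongoDB runs the real pipeline server-side):

```javascript
const colData = [{ city: "Beijing" }, { city: "Beijing" }, { city: "Shanghai" }];

// { $group: { _id: "$city", count: { $sum: 1 } } }
const counts = {};
for (const doc of colData) counts[doc.city] = (counts[doc.city] || 0) + 1;

// { $match: { count: { $gt: 1 } } }
const frequent = Object.entries(counts).filter(([, n]) => n > 1);
console.log(frequent); // [ [ 'Beijing', 2 ] ]
```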

3. MapReduce

○ Run aggregation logic on multiple Servers in parallel

○ If an aggregation operation consumes more than 20% of memory, it is aborted with an error message

db.col.mapReduce(
  function() { emit(key, value); },                 // the Map function
  function(key, values) { return reduceFunction; }, // the Reduce function
  {
    out: collection,
    query: document,
    sort: document,
    limit: number,
    finalize: <function>,
    verbose: <boolean>
  }
)

db.col.mapReduce(
  function() { emit(this.city,this.expectSalary); },
  function(key, value) {return Array.avg(value)},
  {
    query:{expectSalary:{$gt: 15000}},
    out:"cityAvgSal"
  }
)
  • Map function: a JavaScript function that converts one input document into zero or more (key, value) pairs passed to the Reduce function

  • Reduce function: a JavaScript function that merges the output of the Map function – values with the same key go to the same Reduce call

  • Out: indicates the collection for storing statistical results

  • Query: The Map function will be called only if the document meets the criteria

  • Sort: Sorts documents sent to the Map function

  • limit: limits the number of documents sent to the Map function

    Sort and limit must be used together to make sense

  • Finalize: Modify the output result of Reduce

  • verbose: whether to include timing information in the result – [default] false
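The whole map/reduce flow above (map emits key–value pairs, same-key values are grouped, reduce folds each group into one result) can be sketched in plain JavaScript; `mapReduceSim` and its sample data are illustrative, not MongoDB API:

```javascript
// A minimal in-memory simulation of MongoDB's map/reduce flow.
function mapReduceSim(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    // `this` is the current document; the callback plays the role of emit()
    map.call(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const docs = [
  { city: "Beijing", expectSalary: 20000 },
  { city: "Beijing", expectSalary: 18000 },
  { city: "Shanghai", expectSalary: 16000 },
];
const avgByCity = mapReduceSim(
  docs,
  function (emit) { emit(this.city, this.expectSalary); },           // Map
  (key, values) => values.reduce((a, b) => a + b, 0) / values.length // Reduce
);
console.log(avgByCity); // { Beijing: 19000, Shanghai: 16000 }
```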

10. Index types

○ [Purpose] Improve query efficiency

○ [Default] A unique index is created on _id

○ [Underlying implementation] B-tree

○ [Sort order] 1: ascending, -1: descending

  • Single-key index
    • An index on a single field
    • db.collection.createIndex({ "field": <order> })
  • TTL (expiring) index
    • Documents are automatically deleted after a certain amount of time
    • The indexed field must be of date type
    • db.collection.createIndex({ "dateField": <order> }, { expireAfterSeconds: <seconds> })
  • Compound index
    • An index on multiple fields
    • Pay attention to the field order and sort direction
    • db.collection.createIndex({ "field1": <order>, "field2": <order> })
  • Multikey index
    • Creates an index entry for each element of an array field
    • A compound index may contain at most one multikey field
    • Created the same way as an ordinary index
    • db.collection.createIndex({ "arrayField": <order> })
  • Geospatial index
    • An index on geospatial coordinates
    • 2dsphere: points on a sphere
    • 2d: points on a plane
    • db.collection.createIndex({ "field": "2dsphere" })
  • Full-text index
    • A collection can have at most one full-text index
    • Chinese word segmentation is not ideal – use Elasticsearch instead
    • db.collection.createIndex({ "field": "text" })
  • Hashed index
    • Supports only equality lookups, not range lookups
    • db.collection.createIndex({ "field": "hashed" })

11. Data model

  • embedded

    • A document nests other documents inside it

    • Applicable scenario

      • Documents have one-to-one and one-to-many relationships
      • Data that is often read together
      • Data with map-reduce aggregation requirements – map-reduce can operate on only a single collection
      {
          "_id": ObjectId("xxxxx"),
          "name": "zhangsan",
          "classes": [
              {
                  "class": "Math",
                  "credits": "5",
                  "room": "204"
              },
              {
                  "class": "English",
                  "credits": "5",
                  "room": "305"
              }
          ]
      }
  • reference

    • A document stores references to another document

    • Use when embedding would cause too much data duplication

    • The documents have a many-to-many relationship

      {
          "_id": ObjectId("xxxxx"),
          "name": "zhangsan",
          "classes": [
              {
                  "_id": ObjectId("xxx"),
                  "class": "Math"
              },
              {
                  "_id": ObjectId("xxx"),
                  "class": "English"
              }
          ]
      }

12. High availability

1. Roles

  • Primary (master) node

    • Responsible for read and write operations

    • Add, delete, and modify operations are written to the oplog – the oplog is idempotent

      Idempotent: The result of multiple runs is the same as that of only one run

  • Secondary (replica) node

    • Read operations

    • Synchronize data with Primary by copying oplog

  • Arbiter node

    • Votes in Primary elections
    • Cannot add, delete, or modify data
    • Cannot become Primary

2. Synchronization types

  • Initial sync
    • Fully synchronizes data from the Primary
    • Triggered when
      • A Secondary joins for the first time
      • A Secondary has fallen behind by more than the size of the oplog
  • Replication sync
    • Incrementally synchronizes data from the Primary

3. Heartbeat detection

  • Ping packets are sent to other nodes every two seconds. If no response is received within 10 seconds, the node is marked as inaccessible
  • Each node maintains a state mapping table, recording the roles, oplog timestamps, and other information of the other nodes
  • If the Primary finds it cannot communicate with a majority of nodes, it demotes itself to Secondary

4. Primary election

  • Triggers
    • A Secondary finds its priority is higher than the Primary's and initiates a replacement election
    • A Secondary finds no Primary in the cluster and initiates an election
    • The Primary demotes itself when it cannot reach a majority of the other nodes
  • Election process
    1. Check whether the node qualifies as a candidate; if so, become the initiator and perform a FreshnessCheck
    2. The initiator sends an Elect request to the surviving nodes
    3. Each arbiter runs a legitimacy check on the request; if it passes, the arbiter votes for the initiator
    4. If the initiator receives more than half of the votes, it becomes Primary
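The majority rule in step 4 is simple but worth pinning down (a one-line sketch): with an even number of members, exactly half is not enough, which is one reason replica sets usually have an odd number of voters.

```javascript
// A candidate wins only with strictly more than half of the members' votes.
const wins = (votes, members) => votes > members / 2;

console.log(wins(2, 3)); // true  – 2 of 3 is a majority
console.log(wins(2, 4)); // false – 2 of 4 is a tie, not a majority
```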

13. Sharding

1. Roles

  • Shard Server
    • Stores the data
    • Consists of one or more mongod processes, each holding the same data shard
    • Shard Servers holding different data shards combine into a sharded cluster
  • Router Server
    • The cluster entry point; forwards requests to the corresponding Shard Server
  • Config Server
    • Stores database routing and shard configuration

2. Concepts

  • Shard Key: the shard key
    • Determines which Chunk a piece of data is stored in
  • Chunk: a chunk of data
    • A portion of the data on a Shard Server
    • A Shard Server consists of multiple Chunks
    • Partitioned by left-closed, right-open intervals [min, max)
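The left-closed, right-open intervals can be sketched as a small routing function (the chunk bounds and `chunkFor` helper are made up for illustration):

```javascript
// Each chunk covers [min, max): min is included, max is excluded.
const chunks = [
  { min: 0, max: 100 },   // holds keys 0..99
  { min: 100, max: 200 }, // holds keys 100..199
];
const chunkFor = v => chunks.find(c => v >= c.min && v < c.max);

console.log(chunkFor(99) === chunks[0]);  // true
console.log(chunkFor(100) === chunks[1]); // true – 100 belongs to the next chunk
```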

3. Sharding policies

  • Range sharding

    • Data is split by the value of the Shard Key, and each Chunk is assigned a range
    • Suits range queries
    • Disadvantage: when the Shard Key increases or decreases monotonically (e.g. auto-generated IDs or timestamps), new data all lands in the same Chunk, putting heavy write pressure on it
  • Hash sharding

    • Distributes documents randomly among Chunks, scaling write capacity

    • Disadvantage: range queries are inefficient
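The trade-off above can be demonstrated with a toy example (the modulo "hash" and range bounds are illustrative only): a monotonically increasing key sends every new document to the last range chunk, while hashing spreads the same keys across shards.

```javascript
const newIds = [101, 102, 103, 104, 105, 106]; // monotonically increasing keys

// Range sharding: everything >= 100 falls into the last chunk.
const rangeShard = id => (id < 100 ? 0 : 2);
// Hash sharding (toy hash): documents scatter across shards.
const hashShard = id => id % 3;

const rangeTargets = new Set(newIds.map(rangeShard));
const hashTargets = new Set(newIds.map(hashShard));
console.log(rangeTargets.size, hashTargets.size); // 1 3
```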