MongoDB

Document-oriented NoSQL database

Among non-relational databases, MongoDB is the most feature-rich and the one that most closely resembles a relational database

1. Criteria for choosing MongoDB

MongoDB is worth considering if two of the following criteria are met

○ If three or more are met, choose MongoDB

  • No transactions or complex joins are required – for strong-transaction scenarios, choose a relational database instead

  • The data model is not yet fixed and may change

  • A read/write throughput above 2,000 QPS is required

  • TB- or PB-scale data storage is required

  • You need to be able to scale horizontally quickly

    ○ Horizontal scaling: Increases the number of servers to improve system performance

    ○ Vertical scaling: improves the processing capability of a single machine

  • 99.999% availability is required

  • Heavy geolocation and full-text search requirements

2. Applicable scenarios

○ Suitable for what would be one-to-many scenarios in a relational database

○ Suitable for large-size, low-value BSON data storage

  • Game scenario – player equipment, points
  • Logistics scenario – logistics order status
  • Social scenario – nearby people and places
  • IoT scenario – smart device logs
  • Live-streaming scenario – streamer gift records

3. Terminology

  • NoSQL: Not Only SQL – structured storage is not required

    SQL is the Structured Query Language

  • BSON : Binary JSON

    • JSON-like binary storage format
    • Has some data types that JSON lacks
    • Advantage – high flexibility
    • Disadvantage – poor space utilization, e.g. redundant duplicate key names

    ○ JSON stores data as a string

    ○ BSON stores data in binary mode
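The key-name redundancy mentioned above can be seen directly in plain JavaScript (a Node.js sketch with made-up data, unrelated to MongoDB itself): serializing an array of documents repeats every key name as text in each document.

```javascript
// JSON serialization repeats key names in every document, which is the
// space-utilization drawback noted above (BSON shares this trait, since
// it also stores key names per document).
const docs = [
  { name: "a", score: 1 },
  { name: "b", score: 2 },
  { name: "c", score: 3 },
];
const text = JSON.stringify(docs);
// "name" appears once per document in the serialized string:
const nameCount = (text.match(/"name"/g) || []).length;
console.log(nameCount); // 3
```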

4. Concept comparison

RDBMS                                  | MongoDB
---------------------------------------|--------------------------------
database                               | database
table                                  | collection
row                                    | document (a BSON document)
column                                 | field
index                                  | index
join (primary/foreign key association) | embedded (nested) document
primary key (one or more columns)      | primary key (the _id field)

5. BSON types

[Insert data]

○ db.collection.insert(document) – reports a duplicate-key error if the primary key of the inserted data already exists

○ db.collection.save(document) – updates the document if the primary key exists, inserts it if not

  • String: string – { key: "CBA" }

  • Integer: integer – { key: 1 }

  • Boolean: boolean – { key: true }

  • Double: double – { key: 3.14 }

  • ObjectId: object id – { _id: new ObjectId() }

    Composed of a timestamp, machine code, process ID, and a random number

  • Array: array – { arr: ["a", "b"] }

  • Timestamp: timestamp – { ts: new Timestamp() }

  • Object: embedded document – { o: { foo: "bar" } }

  • Null: null value – { key: null }

  • Date, ISODate: GMT time – { birth: new Date() }

  • Code: JavaScript code – { x: function(){} }

  • File: stored in the fs.files and fs.chunks collections (GridFS)
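Most of the types above map directly onto JavaScript values. A plain-JavaScript sketch of such a document (ObjectId, Timestamp, and GridFS files exist only in the mongo shell and drivers, so they are omitted here):

```javascript
// A document literal using the JSON-compatible subset of the BSON types above.
const doc = {
  key: "CBA",           // String
  count: 1,             // Integer (JavaScript has a single number type)
  active: true,         // Boolean
  pi: 3.14,             // Double
  arr: ["a", "b"],      // Array
  o: { foo: "bar" },    // Object (embedded document)
  missing: null,        // Null
  birth: new Date(),    // Date
  code: function () {}, // Code
};
console.log(Array.isArray(doc.arr), doc.birth instanceof Date); // true true
```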

6. Conditional queries

db.collection.find({condition}).sort({field: order}).skip(number of rows to skip).limit(number of rows per page)

Operation             | Condition format                              | Example                                                             | RDBMS equivalent
----------------------|-----------------------------------------------|---------------------------------------------------------------------|------------------
equals                | { key: value }                                | db.col.find({ key: value }).pretty()                                | where key = value
greater than          | { key: { $gt: value } }                       | db.col.find({ key: { $gt: value } }).pretty()                       | where key > value
less than             | { key: { $lt: value } }                       | db.col.find({ key: { $lt: value } }).pretty()                       | where key < value
greater than or equal | { key: { $gte: value } }                      | db.col.find({ key: { $gte: value } }).pretty()                      | where key >= value
less than or equal    | { key: { $lte: value } }                      | db.col.find({ key: { $lte: value } }).pretty()                      | where key <= value
not equal             | { key: { $ne: value } }                       | db.col.find({ key: { $ne: value } }).pretty()                       | where key != value
and                   | { key1: value1, key2: value2 }                | db.col.find({ key1: value1, key2: value2 }).pretty()                | where key1 = value1 and key2 = value2
or                    | { $or: [{ key1: value1 }, { key2: value2 }] } | db.col.find({ $or: [{ key1: value1 }, { key2: value2 }] }).pretty() | where key1 = value1 or key2 = value2
not                   | { key: { $not: { <operator>: value } } }      | db.col.find({ key: { $not: { <operator>: value } } }).pretty()      | where not key <operator> value
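To make the operator semantics concrete, here is an illustrative plain-JavaScript equivalent of two of the queries above, run over an in-memory array (the sample data is made up; real queries are evaluated server-side by MongoDB):

```javascript
const people = [
  { name: "li", age: 30 },
  { name: "wang", age: 25 },
  { name: "zhao", age: 18 },
];

// db.col.find({ age: { $gt: 20 } })  -- "where age > 20"
const gt20 = people.filter(p => p.age > 20);

// db.col.find({ $or: [{ age: 18 }, { name: "li" }] })
const orMatch = people.filter(p => p.age === 18 || p.name === "li");

console.log(gt20.length, orMatch.length); // 2 2
```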

7. Data updates

db.collection.update(
  <query>,
  <update>,
  {
    upsert: <boolean>,
    multi: <boolean>,
    writeConcern: <document>
  }
)

1. Update operators

  • $set: sets the field value
  • $unset: deletes a field
  • $inc: adds to the field value – the increment can be positive or negative

2. Options

  • upsert: whether to insert if no record matches the query – [default] false
  • multi: whether to update all matched records; false updates only the first one found – [default] false
  • writeConcern: reliability options
    • w
      • [default] 1: acknowledgment from the primary only
      • 0: no acknowledgment required
      • majority: acknowledgment from a majority of replicas
    • j
      • true: an acknowledgment is returned only after the operation is written to disk (journal)
      • false: the acknowledgment is returned without waiting for the disk write
    • wtimeout: time limit for the operation
db.col.update({ name: "zhangsan" }, { $inc: { expectSalary: 3000 } }, { upsert: true })
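The upsert option can be pictured with a small in-memory sketch (the `upsert` helper and sample data are illustrative, not MongoDB API): update the matching record if one exists, otherwise insert a new one.

```javascript
// Illustrative upsert semantics: update on match, insert on miss.
function upsert(col, query, fields) {
  const doc = col.find(d =>
    Object.entries(query).every(([k, v]) => d[k] === v));
  if (doc) Object.assign(doc, fields);    // update path
  else col.push({ ...query, ...fields }); // insert path
}

const col = [{ name: "li", expectSalary: 10000 }];
upsert(col, { name: "li" }, { expectSalary: 13000 });  // updates in place
upsert(col, { name: "wang" }, { expectSalary: 9000 }); // inserts a new doc
console.log(col.length, col[0].expectSalary); // 2 13000
```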

8. Data deletion

db.collection.remove(
  <query>,
  {
    justOne: <boolean>,
    writeConcern: <document>
  }
)

1. Options

  • justOne: whether to delete only one document – [default] false
  • writeConcern: reliability options

db.col.remove({ name: "zhangsan" })

9. Aggregation operation

1. Single-purpose aggregation

  • count(): counts the number of documents
  • distinct(): removes duplicates and returns the distinct values of a field

db.collection.find({}).count()
db.collection.distinct("field")
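These two can be pictured with an in-memory sketch (made-up data; MongoDB evaluates both server-side): distinct() collects one field into a set, count() is the array length.

```javascript
const rows = [{ city: "Beijing" }, { city: "Shanghai" }, { city: "Beijing" }];

const distinctCities = [...new Set(rows.map(r => r.city))]; // ~ distinct("city")
const total = rows.length;                                  // ~ count()

console.log(distinctCities, total); // [ 'Beijing', 'Shanghai' ] 3
```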

2. Aggregation pipeline

1. Pipeline stages

  • $group: groups documents
    • Used to compute statistics over different groups of data
  • $project: modifies the output document structure
    • Rename fields
    • Add or remove fields
    • Create computed fields
    • Nest documents
  • $match: filters the data and outputs only documents that meet the conditions
  • $limit: limits the number of documents returned by the pipeline
  • $skip: skips the specified number of documents and returns the rest
  • $sort: outputs sorted documents
  • $geoNear: outputs documents ordered by proximity to a geographic point

2. Expressions

  • $sum: computes the sum
  • $avg: computes the average
  • $min: gets the minimum value across the documents in the group
  • $max: gets the maximum value across the documents in the group
  • $push: inserts a value from each document into a result array
  • $addToSet: inserts a value from each document into a result array, without duplicates
  • $first: gets the first document's value
  • $last: gets the last document's value
db.col.aggregate([
  { $group: { _id: "$city" /* group by the city field */,
              city_count /* name of the aggregated field */: { $sum /* aggregation mode */: 1 } } }
])

db.col.aggregate([
  { $group: { _id: "$city", avgSal: { $avg: "$expectSalary" } } },
  { $project: { city: "$city", salary: "$avgSal" } } /* rename avgSal to salary */
])

db.col.aggregate([
  { $group: { _id: "$city", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } } /* only output groups with a count greater than 1 */
])
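The $group / $match pipeline in the last example can be simulated with plain JavaScript to check one's intuition (illustrative in-memory data; MongoDB runs the real pipeline server-side):

```javascript
const colData = [{ city: "Beijing" }, { city: "Beijing" }, { city: "Shanghai" }];

// { $group: { _id: "$city", count: { $sum: 1 } } }
const counts = {};
for (const doc of colData) counts[doc.city] = (counts[doc.city] || 0) + 1;

// { $match: { count: { $gt: 1 } } }
const frequent = Object.entries(counts).filter(([, n]) => n > 1);
console.log(frequent); // [ [ 'Beijing', 2 ] ]
```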

3. MapReduce

○ Run aggregation logic on multiple Servers in parallel

○ If an aggregation operation consumes more than 20% of memory, it is aborted with an error message

db.col.mapReduce(
  function() { emit(key, value); },                 // the Map function
  function(key, values) { return reduceFunction; }, // the Reduce function
  {
    out: collection,
    query: document,
    sort: document,
    limit: number,
    finalize: <function>,
    verbose: <boolean>
  }
)

db.col.mapReduce(
  function() { emit(this.city,this.expectSalary); },
  function(key, value) {return Array.avg(value)},
  {
    query:{expectSalary:{$gt: 15000}},
    out:"cityAvgSal"
  }
)
  • Map function: a JavaScript function that converts one input document into zero or more (key, value) pairs passed to the Reduce function

  • Reduce function: a JavaScript function that merges the output of the Map function – values with the same key go to the same Reduce call

  • Out: indicates the collection for storing statistical results

  • Query: The Map function will be called only if the document meets the criteria

  • Sort: Sorts documents sent to the Map function

  • limit: limits the number of documents sent to the Map function

    Sort and limit must be used together to make sense

  • Finalize: Modify the output result of Reduce

  • verbose: whether to include timing information in the result – [default] false
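The whole map/reduce flow above (map emits key–value pairs, same-key values are grouped, reduce folds each group into one result) can be sketched in plain JavaScript; `mapReduceSim` and its sample data are illustrative, not MongoDB API:

```javascript
// A minimal in-memory simulation of MongoDB's map/reduce flow.
function mapReduceSim(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    // `this` is the current document; the callback plays the role of emit()
    map.call(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const docs = [
  { city: "Beijing", expectSalary: 20000 },
  { city: "Beijing", expectSalary: 18000 },
  { city: "Shanghai", expectSalary: 16000 },
];
const avgByCity = mapReduceSim(
  docs,
  function (emit) { emit(this.city, this.expectSalary); },           // Map
  (key, values) => values.reduce((a, b) => a + b, 0) / values.length // Reduce
);
console.log(avgByCity); // { Beijing: 19000, Shanghai: 16000 }
```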

10. Index types

○ [Purpose] Improve query efficiency

○ [Default] A unique index is created on _id

○ [Underlying implementation] B-tree

○ [Sort order] 1: ascending, -1: descending

  • Single-key index
    • An index on a single field
    • db.collection.createIndex({ "field": <order> })
  • TTL (expiring) index
    • Documents are automatically deleted after a certain amount of time
    • The indexed field must be of date type
    • db.collection.createIndex({ "dateField": <order> }, { expireAfterSeconds: <seconds> })
  • Compound index
    • An index on multiple fields
    • Pay attention to the field order and sort direction
    • db.collection.createIndex({ "field1": <order>, "field2": <order> })
  • Multikey index
    • Creates an index entry for each element of an array field
    • A compound index may contain at most one multikey field
    • Created the same way as an ordinary index
    • db.collection.createIndex({ "arrayField": <order> })
  • Geospatial index
    • An index on geospatial coordinates
    • 2dsphere: points on a sphere
    • 2d: points on a plane
    • db.collection.createIndex({ "field": "2dsphere" })
  • Full-text index
    • A collection can have at most one full-text index
    • Chinese word segmentation is not ideal – use Elasticsearch instead
    • db.collection.createIndex({ "field": "text" })
  • Hashed index
    • Supports only equality lookups, not range lookups
    • db.collection.createIndex({ "field": "hashed" })

11. Data model

  • embedded

    • A document nests other documents inside it

    • Applicable scenario

      • Documents have one-to-one and one-to-many relationships
      • Data that is often read together
      • Data with map-reduce aggregation requirements – map-reduce can operate on only a single collection
      {
          "_id": ObjectId("xxxxx"),
          "name": "zhangsan",
          "classes": [
              {
                  "class": "Math",
                  "credits": "5",
                  "room": "204"
              },
              {
                  "class": "English",
                  "credits": "5",
                  "room": "305"
              }
          ]
      }
  • reference

    • A document stores references to another document

    • Use when embedding would cause too much data duplication

    • The documents have a many-to-many relationship

      {
          "_id": ObjectId("xxxxx"),
          "name": "zhangsan",
          "classes": [
              {
                  "_id": ObjectId("xxx"),
                  "class": "Math"
              },
              {
                  "_id": ObjectId("xxx"),
                  "class": "English"
              }
          ]
      }

12. High availability

1. Roles

  • Primary (master) node

    • Responsible for read and write operations

    • Add, delete, and modify operations are written to the oplog – the oplog is idempotent

      Idempotent: The result of multiple runs is the same as that of only one run

  • Secondary (replica) node

    • Read operations

    • Synchronize data with Primary by copying oplog

  • Arbiter node

    • Votes in Primary elections
    • Cannot add, delete, or modify data
    • Cannot become Primary

2. Synchronization types

  • Initial sync
    • Fully synchronizes data from the Primary
    • Triggered when
      • A Secondary joins for the first time
      • A Secondary has fallen behind by more than the size of the oplog
  • Replication sync
    • Incrementally synchronizes data from the Primary

3. Heartbeat detection

  • Ping packets are sent to other nodes every two seconds. If no response is received within 10 seconds, the node is marked as inaccessible
  • Each node maintains a state mapping table, recording the roles, oplog timestamps, and other information of the other nodes
  • If the Primary finds it cannot communicate with a majority of nodes, it demotes itself to Secondary

4. Primary election

  • Triggers
    • A Secondary finds its priority is higher than the Primary's and initiates a replacement election
    • A Secondary finds no Primary in the cluster and initiates an election
    • The Primary demotes itself when it cannot reach a majority of the other nodes
  • Election process
    1. Check whether the node qualifies as a candidate; if so, become the initiator and perform a FreshnessCheck
    2. The initiator sends an Elect request to the surviving nodes
    3. Each arbiter runs a legitimacy check on the request; if it passes, the arbiter votes for the initiator
    4. If the initiator receives more than half of the votes, it becomes Primary
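The majority rule in step 4 is simple but worth pinning down (a one-line sketch): with an even number of members, exactly half is not enough, which is one reason replica sets usually have an odd number of voters.

```javascript
// A candidate wins only with strictly more than half of the members' votes.
const wins = (votes, members) => votes > members / 2;

console.log(wins(2, 3)); // true  – 2 of 3 is a majority
console.log(wins(2, 4)); // false – 2 of 4 is a tie, not a majority
```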

13. Sharding

1. Roles

  • Shard Server
    • Stores the data
    • Consists of one or more mongod processes, each holding the same data shard
    • Shard Servers holding different data shards combine into a sharded cluster
  • Router Server
    • The cluster entry point; forwards requests to the corresponding Shard Server
  • Config Server
    • Stores database routing and shard configuration

2. Concepts

  • Shard Key: the shard key
    • Determines which Chunk a piece of data is stored in
  • Chunk: a chunk of data
    • A portion of the data on a Shard Server
    • A Shard Server consists of multiple Chunks
    • Partitioned by left-closed, right-open intervals [min, max)
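The left-closed, right-open intervals can be sketched as a small routing function (the chunk bounds and `chunkFor` helper are made up for illustration):

```javascript
// Each chunk covers [min, max): min is included, max is excluded.
const chunks = [
  { min: 0, max: 100 },   // holds keys 0..99
  { min: 100, max: 200 }, // holds keys 100..199
];
const chunkFor = v => chunks.find(c => v >= c.min && v < c.max);

console.log(chunkFor(99) === chunks[0]);  // true
console.log(chunkFor(100) === chunks[1]); // true – 100 belongs to the next chunk
```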

3. Sharding policies

  • Range sharding

    • Data is split by the value of the Shard Key, and each Chunk is assigned a range
    • Suits range queries
    • Disadvantage: when the Shard Key increases or decreases monotonically (e.g. auto-generated IDs or timestamps), new data all lands in the same Chunk, putting heavy write pressure on it
  • Hash sharding

    • Distributes documents randomly among Chunks, scaling write capacity

    • Disadvantage: range queries are inefficient
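The trade-off above can be demonstrated with a toy example (the modulo "hash" and range bounds are illustrative only): a monotonically increasing key sends every new document to the last range chunk, while hashing spreads the same keys across shards.

```javascript
const newIds = [101, 102, 103, 104, 105, 106]; // monotonically increasing keys

// Range sharding: everything >= 100 falls into the last chunk.
const rangeShard = id => (id < 100 ? 0 : 2);
// Hash sharding (toy hash): documents scatter across shards.
const hashShard = id => id % 3;

const rangeTargets = new Set(newIds.map(rangeShard));
const hashTargets = new Set(newIds.map(hashShard));
console.log(rangeTargets.size, hashTargets.size); // 1 3
```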