MongoDB interview questions

1. What is MongoDB?

MongoDB is an open-source database written in C++ and built on distributed file storage. Adding nodes lets it maintain server performance under high load. It aims to provide a scalable, high-performance data store for web applications. Data is stored as documents made up of key-value pairs. MongoDB documents resemble JSON objects, and field values can contain other documents, arrays, and arrays of documents.
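As a rough illustration of that document shape, here is a minimal Python sketch using only the standard library. The field names (name, address, tags, orders) are invented for the example and do not come from any real schema.

```python
import json

# A hypothetical user document: all field names are illustrative only.
user = {
    "name": "Alice",
    "address": {"city": "Hangzhou", "zip": "310000"},  # embedded document
    "tags": ["admin", "beta"],                          # array
    "orders": [                                         # array of documents
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# A document of this shape maps directly onto JSON text and back.
as_json = json.dumps(user, sort_keys=True)
round_tripped = json.loads(as_json)
```

Nested values are reached by walking the structure, for example round_tripped["orders"][1]["sku"].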

2. What are the features of MongoDB?

(1) MongoDB is a document-oriented database that is relatively simple and easy to operate.
(2) An index can be created on any field of a document.
(3) Data can be mirrored (replicated) locally or over the network, which makes MongoDB more scalable.
(4) If load increases (more storage space or processing power is needed), data can be distributed across other nodes in the network; this is called sharding.
(5) It supports rich query expressions. Queries are written as JSON-style documents and can easily reach objects and arrays embedded in a document.

3. What are the basic differences between MySQL and MongoDB?

MySQL and MongoDB are both free and open-source databases. They differ in many basic respects, including data representation, querying, relationships, transactions, schema design and definition, normalization, speed, and performance. Comparing MySQL with MongoDB is really comparing a relational database with a non-relational one: the underlying data storage structure is different.

4. What does sharding in MongoDB mean?

Sharding is the process of horizontally partitioning data across different physical nodes. As an application grows, so does its data volume, and a single machine may eventually be unable to store the data or to provide acceptable read and write throughput. Sharding lets you add machines to handle the growth in data and in read and write operations.

5. What do namespaces in MongoDB mean?

MongoDB stores BSON objects in collections. The database name and the collection name joined by a period form a namespace. A collection namespace has multiple extents, and its metadata records, among other things, the locations of the first and last extents. Each extent holds a number of documents and has a header that records its first and last documents along with some metadata about the extent. Extents are linked to one another, and documents to one another, by doubly linked lists. Indexes are stored as B-trees, and the index namespace stores a pointer to the root node of the B-tree. (This layout describes the legacy MMAPv1 storage engine.)

6. What are indexes in MongoDB?

Indexes let MongoDB execute queries efficiently. Without an index, MongoDB must scan every document in a collection to find matches, which is inefficient and means processing a large amount of data. An index is a special data structure that stores a small portion of the collection's data in a form that is easy to traverse. It stores the values of a specific field or set of fields, ordered by field value as specified by the index.
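The difference between a full scan and an index lookup can be sketched in plain Python. This is a toy model, not MongoDB's B-tree implementation: the "index" is just a sorted list of (value, _id) pairs searched with bisect, and the collection and its age field are invented for the example.

```python
import bisect

# Hypothetical collection: 1000 documents with an "age" field.
docs = [{"_id": i, "age": (i * 37) % 100} for i in range(1000)]

def scan(collection, age):
    """Collection scan: every document has to be examined."""
    examined = 0
    hits = []
    for d in collection:
        examined += 1
        if d["age"] == age:
            hits.append(d["_id"])
    return sorted(hits), examined

# Toy index: (field value, _id) pairs kept in sorted order.
age_index = sorted((d["age"], d["_id"]) for d in docs)

def index_lookup(index, age):
    """Binary-search the sorted entries, then read only the matching run."""
    lo = bisect.bisect_left(index, (age, -1))
    hi = bisect.bisect_left(index, (age + 1, -1))
    return [doc_id for _, doc_id in index[lo:hi]], hi - lo
```

Both functions return the same _id values, but the scan touches all 1000 documents while the lookup touches only the matching index entries.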

7. What makes MongoDB the best NoSQL database?

The following characteristics are why MongoDB is often considered the best NoSQL database: document-oriented storage, high performance, high availability, easy scalability, and a rich query language.

8. What is GridFS in MongoDB?

GridFS is used to store and retrieve large files such as images, video files, and audio files. By default it uses two collections, fs.files and fs.chunks, to store the file metadata and the file chunks respectively.

9. What is the role of the profiler in MongoDB?

MongoDB includes a database profiler that shows the performance characteristics of each operation against the database. With the profiler you can find queries (or writes) that are slower than expected, and use that information to decide, for example, whether an index is needed.

10. Do MongoDB updates fsync to disk immediately?

No. Disk writes are deferred by default. A write may reach disk only a few seconds later (data files are flushed every 60 seconds by default). For example, if the database receives a thousand increment operations on one object within a second, the change is flushed to disk only once. (Note that fsync options are nevertheless available on the command line and through getLastError_old.)

11. What are the election criteria for MongoDB replica sets?

An election is triggered when:
1. The replica set is initialized.
2. The primary node goes down.
3. The primary node is cut off from the replica set (possibly because of a network problem).
4. In every case, the number of nodes taking part in the election must be more than half of the total number of nodes in the replica set. If it is less than half, all nodes remain read-only.

12. Briefly describe the MongoDB election process

1. Electing a primary in a replica set follows the "majority" principle, where "majority" means more than half of the members of the replica set. A member can become primary only if a majority of members vote for it. For example, in a replica set of N members, floor(N/2)+1 members must vote for a node before it can become primary. Note that an unavailable member does not change the "majority", which is calculated from the replica set configuration.
2. An arbiter stores no data and cannot be elected primary, but it has a vote. Arbiters need minimal resources and should not be deployed on the same host as a data-bearing member.
3. A replica set should preferably have an odd number of members; if it has an even number, add an arbiter. With an even number of members, as shown in Figure 2, if the network between IDC1 and IDC2 is cut, the members in IDC1 and in IDC2 each try to elect a primary, but neither side can satisfy the majority principle, so no primary is elected. Once an arbiter is added, one side can reach a majority and elect a primary.
4. If the replica set already has an odd number of members, no arbiter is needed. Forcing one in anyway makes it possible for two members to receive the same number of votes, so elections take longer.
5. If a member receives a veto in a voting round, it cannot be elected primary in that round. For example, in a replica set of 10 members, if member A is vetoed by one member because its data lags too far behind, A still cannot become primary even though it received 9 votes in favor.
6. A member with priority 0 cannot become primary and cannot trigger an election, but it can vote and holds the same data set as the primary.
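The majority arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting rule; the function names are made up for the example, and real elections also involve priorities, terms, and data freshness.

```python
def votes_needed(configured_members: int) -> int:
    """Majority of the configured voting members: floor(N/2) + 1."""
    return configured_members // 2 + 1

def can_elect(reachable: int, configured_members: int) -> bool:
    """A network partition can elect a primary only if it holds a majority."""
    return reachable >= votes_needed(configured_members)

# Four data-bearing members split evenly across two data centers.
even_split = [can_elect(2, 4), can_elect(2, 4)]

# Add one arbiter on one side: five voters, split 3 / 2.
with_arbiter = [can_elect(3, 5), can_elect(2, 5)]
```

With four voters neither half reaches the required three votes; with the arbiter added, the side holding three of the five voters can elect a primary.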

13. What are MongoDB sharding clusters?

A sharded cluster is a deployment mode that scales horizontally and is especially useful for large data volumes; large production applications are generally built on this architecture. Sharding overcomes the hardware limits of a single server, such as disk space, memory, and CPU, by partitioning data horizontally and reducing the access pressure on any one node. Each shard is an independent database, and all the shards together form one logically complete database. Sharding therefore reduces the volume of operations and stored data that each shard handles, so that multiple servers together can cope with growing load and data.

14. Why is horizontal sharding needed in MongoDB?

1) It reduces the number of requests each machine handles, which raises the total load the cluster can handle. 2) It reduces the data each machine stores, which raises the total storage capacity.

15. What is the significance of sharding keys in MongoDB?

1. A good shard key is essential for sharding. The shard key must be indexed; sh.shardCollection creates the index automatically if the collection is empty. A monotonically increasing shard key is poor for write throughput and for even data distribution, because inserts always go to a single shard and data moves to other shards only after a threshold is reached. Queries by shard key, however, can be very efficient. Random shard keys distribute data evenly. 2. Avoid queries that span multiple shards where possible: when a query hits all shards, mongos has to merge and sort the results from each of them.
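The effect of the key choice on write distribution can be simulated in Python. This is a simplified model under stated assumptions: four shards, fixed range boundaries that were chosen before the new inserts arrive, no balancer, and MD5 standing in for the hash function. None of the names below are MongoDB APIs.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
HASH_SPACE = 2 ** 64

def shard_for_hashed(key: int) -> int:
    """Hashed key: hash the value, then place it by range over the hash space."""
    digest = hashlib.md5(str(key).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    return h * NUM_SHARDS // HASH_SPACE

def shard_for_ranged(key: int, boundaries: list) -> int:
    """Ranged key: chunk i holds keys below boundaries[i]; the last chunk is open-ended."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)

# Boundaries assumed to have been set while keys 0..999 were the only data.
boundaries = [250, 500, 750]

# New inserts with an ever-increasing key (for example an auto-increment id).
new_keys = range(1000, 11000)

ranged = Counter(shard_for_ranged(k, boundaries) for k in new_keys)
hashed = Counter(shard_for_hashed(k) for k in new_keys)
```

In this model every new write with the increasing key lands in the last chunk, while the hashed key spreads the same 10,000 writes across all four shards.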

16. When do you need MongoDB sharding?

1) A machine is running out of disk space; sharding solves the disk space problem. 2) A single mongod can no longer meet write performance requirements; sharding spreads write pressure across shards, each using its own server's resources. 3) You want to keep a large amount of data in memory to improve performance; as above, sharding draws on the resources of every shard server.

17. What roles are needed to build a sharded cluster, and what does each do?

1) Shard server: a mongod instance that stores the actual data chunks. It is an ordinary mongod process and can be a single server or a replica set. In production, each shard role should be filled by several machines forming a replica set to avoid a single point of failure.
2) Config server: a mongod instance that stores the metadata for the entire cluster, including chunk information, that is, which data each shard holds. Start the config servers first, with journaling enabled, as ordinary mongod processes with the --configsvr option. They need little space or resources: 1 KB on the config server corresponds to roughly 200 MB of real data, since all it keeps is a table of how the data is distributed.
3) Route server: a mongos instance that acts as the front-end router. Clients and applications connect through it, and it makes the whole cluster look like a single database. When starting mongos you must give it the config server addresses with the --configdb option.

18. How does MongoDB handle transactions and locking?

MongoDB was designed to be lightweight, fast, and predictably high-performing, so it does not use traditional locks or complex transactions with rollback. It can be likened to MySQL's MyISAM in autocommit mode. Keeping transaction support minimal improves performance, especially in a system that may span multiple servers. (This describes the original design; multi-document ACID transactions have been available since MongoDB 4.0.)

19. What are the roles in a replica set, and what does each do?

1. Primary: receives all write requests and replicates the changes to all secondaries. A replica set can have only one primary at a time; when it fails, the remaining secondaries and arbiters elect a new one. Read requests also go to the primary by default; to read from secondaries, change the client connection configuration.
2. Secondary: maintains the same data set as the primary and takes part in the election when the primary fails.
3. Arbiter: holds no data and cannot be elected primary; it only votes. Using an arbiter reduces the hardware needed for data storage, since an arbiter needs almost no hardware resources, but in production it should not be deployed on the same machine as a data-bearing node.

20. What are the types of non-relational databases?

1. Key-value stores, e.g. Amazon S3
2. Graph databases, e.g. Neo4j
3. Document stores, e.g. MongoDB
4. Column-based stores, e.g. Cassandra

21. When will data be spread across multiple shards?

MongoDB sharding is range based, so all the objects in a collection start out in a single chunk. Data can be spread across multiple shards only once there is more than one chunk. Right now the default chunk size is 64 MB, so a collection needs at least 64 MB of data before a migration can occur.

22. What if I launch a query when a shard is stopped or slow?

If a shard is stopped, the query returns an error unless the partial option is set. If a shard is merely slow to respond, MongoDB waits for its response.

23. How does the GridFS mechanism work, and why does MongoDB use GridFS to store files?

GridFS is a specification for storing large files in MongoDB. It splits a large file into smaller documents (chunks), which lets large files be stored efficiently and works around the 16 MB size limit on a single BSON document.
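The splitting step can be sketched in plain Python. This is a toy model rather than a GridFS driver: the dictionaries only imitate the general shape of fs.files and fs.chunks entries, the file id "demo-file" is made up, and 255 KB is used as the chunk size because that is the GridFS default.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes

def split_into_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped roughly like fs.files / fs.chunks entries."""
    chunks = [
        {"files_id": file_id, "n": n, "data": data[off:off + chunk_size]}
        for n, off in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunks

def reassemble(chunks):
    """Concatenate chunks in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

payload = bytes(range(256)) * 4000  # 1,024,000 bytes of sample data
files_doc, chunk_docs = split_into_chunks("demo-file", payload)
```

Because each chunk carries its sequence number, the chunks can be fetched in any order and still be put back together into the original bytes.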

24. Should I start a sharded or non-sharded MongoDB environment?

For ease of development, we recommend starting with an unsharded MongoDB environment unless a single server cannot hold your initial data set. Moving from an unsharded to a sharded deployment is seamless, so there is no need to consider sharding while your data set is still small.

25. What scenarios is MongoDB suitable for?

Judging by the current users of the Aliyun MongoDB cloud database, MongoDB is used across many fields, such as gaming, logistics, e-commerce, content management, social networking, the Internet of Things, and live video. Some practical use cases:

Gaming: MongoDB stores player information. Equipment, points, and so on are stored directly as embedded documents, which makes querying and updating convenient.
Logistics: MongoDB stores order information. An order's status is updated continually during delivery and stored as an embedded array, so all changes to an order can be read in a single query.
Social networking: MongoDB stores user information and the posts users publish to their friends feed, and geospatial indexes implement features such as nearby people and places.
Internet of Things: MongoDB stores information about all connected smart devices and the logs they report, and supports multidimensional analysis of that information.
Live video: MongoDB stores user information, gift information, and so on.

26. What are the components of an ObjectId?

An ObjectId consists of four parts: a timestamp, a machine identifier, a process ID, and a three-byte incrementing counter. The _id is a 12-byte value, usually shown as a 24-character hexadecimal string, that guarantees the uniqueness of each document. You can supply the _id yourself when inserting a document; if you do not, MongoDB generates a unique one. The first four bytes are the timestamp, the next three the machine identifier, the next two the ID of the process that generated the value, and the last three the counter. Newer MongoDB versions replace the machine and process bytes with a five-byte random value.
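The byte layout described above can be unpacked with standard-library Python. This sketch assumes the 4/3/2/3-byte layout from the text; the function name and the sample ObjectId string are illustrative only.

```python
from datetime import datetime, timezone

def decode_object_id(hex_id: str) -> dict:
    """Split a 24-hex-character ObjectId using the 4/3/2/3-byte layout."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "timestamp": seconds,  # seconds since the Unix epoch
        "created_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),
        "pid": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = decode_object_id("507f1f77bcf86cd799439011")
```

Since the timestamp occupies the leading bytes, ObjectIds generated later compare greater, which is why sorting by _id roughly follows insertion time.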

27. How do you query documents in a collection with AND or OR conditions?

In the find() method, if multiple keys are passed in, separated by commas, MongoDB treats them as an AND condition:
>db.mycol.find({key1:value1, key2:value2}).pretty()
For an OR condition, use the $or operator with an array of conditions:
>db.mycol.find( { $or: [ {key1: value1}, {key2:value2} ] } ).pretty()
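To make the AND and OR semantics concrete, here is a toy matcher in Python. It is not how MongoDB evaluates queries and it supports only equality and $or; the matches and find helpers and the sample data standing in for db.mycol are invented for the example.

```python
def matches(doc: dict, query: dict) -> bool:
    """Toy matcher: top-level keys are ANDed; "$or" takes a list of sub-queries."""
    for key, expected in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in expected):
                return False
        elif doc.get(key) != expected:
            return False
    return True

def find(collection, query):
    """Return the documents that satisfy the query."""
    return [d for d in collection if matches(d, query)]

# Hypothetical data standing in for db.mycol.
mycol = [
    {"_id": 1, "by": "alice", "likes": 10},
    {"_id": 2, "by": "alice", "likes": 50},
    {"_id": 3, "by": "bob", "likes": 50},
]

and_ids = [d["_id"] for d in find(mycol, {"by": "alice", "likes": 50})]
or_ids = [d["_id"] for d in find(mycol, {"$or": [{"by": "alice"}, {"likes": 50}]})]
```

The AND query keeps only the document that satisfies both conditions, while the OR query keeps every document that satisfies at least one.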

28. How do I sort things in MongoDB?

Sorting in MongoDB is done with the sort() method. sort() takes a document that names the fields to sort by and gives each a direction of 1 or -1, where 1 is ascending and -1 is descending.
>db.collectionName.find({key:value}).sort({columnName:1})
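The 1 and -1 convention can be imitated in Python for an in-memory list. This is only an analogy for how a sort specification reads, with a hypothetical sort_docs helper and made-up data; it says nothing about how the server performs the sort.

```python
def sort_docs(docs, spec):
    """Apply a sort spec such as {"likes": -1, "title": 1}: 1 ascending, -1 descending."""
    result = list(docs)
    # Sort by the least significant field first; Python's sort is stable,
    # so earlier fields in the spec end up taking precedence.
    for field, direction in reversed(list(spec.items())):
        result.sort(key=lambda d: d[field], reverse=(direction == -1))
    return result

# Hypothetical documents.
posts = [
    {"title": "b", "likes": 5},
    {"title": "a", "likes": 5},
    {"title": "c", "likes": 9},
]

ordered = sort_docs(posts, {"likes": -1, "title": 1})
titles = [p["title"] for p in ordered]
```

Here the documents are ordered by likes descending, and ties are broken by title ascending.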

29. What is aggregation?

Aggregation operations process data records and return computed results. They can group values from multiple documents, perform various operations on the grouped data, and return a single result. Aggregation is comparable to count(*) with GROUP BY in SQL. In MongoDB, aggregation is performed with the aggregate() method.
>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
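The count(*) with GROUP BY comparison can be sketched in Python. The helper below is a rough in-memory analogue of a $group stage with a $sum of 1; its name, the by_user field, and the sample documents are assumptions made for the example.

```python
from collections import defaultdict

def group_count(docs, field):
    """Rough analogue of {$group: {_id: "$<field>", count: {$sum: 1}}}."""
    counts = defaultdict(int)
    for d in docs:
        counts[d[field]] += 1
    # Sorted by group key only to make the output deterministic.
    return [{"_id": key, "count": n} for key, n in sorted(counts.items())]

# Hypothetical documents.
posts = [
    {"title": "intro", "by_user": "alice"},
    {"title": "indexes", "by_user": "alice"},
    {"title": "sharding", "by_user": "bob"},
]

per_user = group_count(posts, "by_user")
```

Each output document carries the group key in _id and the number of input documents in that group, mirroring SELECT by_user, count(*) ... GROUP BY by_user.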

30. What are the Raft election process and voting rules?

Election process: after the system starts and an initial election completes, it consists of one Leader and several Followers. If the Leader then fails, the time since a Follower last received an RPC from the Leader exceeds a given threshold. The Follower concludes that the Leader is down, starts a new election by changing its state from Follower to Candidate, and asks the other nodes to vote for it.

Voting rules: a Candidate becomes Leader when it receives votes from a majority of nodes for the same term number. Each node casts at most one vote per term, on a first-come, first-served basis. Once a Candidate wins the election it immediately becomes Leader and sends heartbeats to assert its authority and prevent new Leaders from emerging.
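The voting rules can be sketched as a small Python model. It is deliberately incomplete: it covers only the one-vote-per-term, first-come-first-served rule and the majority check, and omits election timeouts, heartbeats, and the log comparison that full Raft also requires. The class and function names are invented for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.current_term = 0
        self.voted_for = None

    def request_vote(self, candidate, term):
        """Grant at most one vote per term, first come first served."""
        if term < self.current_term:
            return False              # stale request from an old term
        if term > self.current_term:
            self.current_term = term  # a new term resets the vote
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True
        return False

def run_election(candidate, term, nodes):
    """Ask every node (including the candidate itself) and require a majority."""
    votes = sum(n.request_vote(candidate, term) for n in nodes)
    return votes > len(nodes) // 2

nodes = [Node(name) for name in "ABCDE"]
a_wins_term_1 = run_election("A", 1, nodes)
b_wins_term_1 = run_election("B", 1, nodes)  # everyone has already voted in term 1
b_wins_term_2 = run_election("B", 2, nodes)  # a new term frees the votes
```

The second election in the same term fails because every node has already spent its vote, which is the first-come, first-served rule in action.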