This article appears on GitHub Blog.

Several topics

This article discusses the topics listed below. The table of contents does not map one-to-one onto them, but by the end of the article you should have an answer to each.

  • Why and when should you use cursors?
  • Keep an eye on server memory: when does a cursor close?
  • Cursor timeouts and fault-tolerant handling need attention
  • Why shouldn't you adjust batchSize at will?
  • Note the differences between Mongoose and the native Node.js MongoDB driver
  • A cursor bug found while answering a question from a chat group member
  • Extension: why can for await...of iterate over a cursor object?

Why use cursors?

With **collection.find().toArray()**, the client driver loads all the returned documents into application memory at once, which is easy to reason about. In some data processing scenarios, however, the number of documents is unknown and may be very large. Holding all of them in memory is expensive on the application server: it wastes memory, and if usage grows too high the service may crash with an out-of-memory (OOM) error.

A cursor in MongoDB is similar to stream processing in Node.js: consuming data piece by piece is far more memory-friendly than reading an entire file into memory.

A very vivid illustration of the idea; source: www.cnblogs.com/vajoy/p/634…

Basic working principle of cursors

When we call **collection.find()** or **collection.aggregate()**, what comes back is a pointer to the result set, called a **cursor**; the data itself cannot be accessed directly. Documents are only read from the database collection when we iterate over the cursor.

In Node.js, iterating over the documents behind a cursor is easy as long as the runtime supports the for await...of syntax. It looks like an ordinary for...of loop over an array, except that the data source being traversed is asynchronous. When iteration starts, the driver fetches a batch of documents from the database and caches it. By default the Node.js MongoDB driver fetches 1000 documents per getMore() (note that the very first batch contains 101 documents). Once a batch has been consumed, the driver sends another getMore() to the MongoDB server, with the batch size governed by the batchSize setting, and this repeats until the cursor is exhausted.

The following shows two ways to iterate in Node.js. Personally, I recommend **for await...of**: some versions of the MongoDB Node.js driver have a bug that affects the while loop form, which is described below.

const userCursor = await collection.find();

// Use userCursor.count() or userCursor.hasNext() if you need
// special handling when no data is returned
if (!await userCursor.count()) {
  // TODO: finish early and do something else
  return;
}

// Method 1:
for await (const user of userCursor) {
}

// Method 2:
while (await userCursor.hasNext()) {
  // next() returns a Promise, so it must be awaited
  const doc = await userCursor.next();
}

For example, if the collection holds 10,000 documents and each batch fetches 1000, roughly 10 round trips to the server are needed. Running db.setProfilingLevel(0, { slowms: 0 }) makes the server log every operation. With the MongoDB server log open, run the application and you will see that every getMore refers to the same cursor ID, for example getMore: 5098682199385946244.
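The batching behaviour described above can be imitated without a database. The following is a minimal sketch, not driver code: makeFakeServer, iterateInBatches, and countRoundTrips are hypothetical names, and the fake server does nothing except hand out documents and count how many times a batch is requested.

```javascript
// Hypothetical in-memory "server" that hands out documents in batches and
// counts the requests it receives. It only imitates getMore round trips.
function makeFakeServer(totalDocs) {
  const stats = { roundTrips: 0 };
  async function fetchBatch(offset, batchSize) {
    stats.roundTrips += 1;
    const end = Math.min(offset + batchSize, totalDocs);
    const docs = [];
    for (let i = offset; i < end; i++) docs.push({ userID: i });
    // Report whether anything is left after this batch.
    return { docs, exhausted: end >= totalDocs };
  }
  return { fetchBatch, stats };
}

// Async generator that keeps only one batch in memory at a time.
async function* iterateInBatches(server, batchSize) {
  let offset = 0;
  let exhausted = false;
  while (!exhausted) {
    const batch = await server.fetchBatch(offset, batchSize);
    yield* batch.docs;
    offset += batch.docs.length;
    exhausted = batch.exhausted;
  }
}

// Walks the whole fake collection and reports what it cost.
async function countRoundTrips(totalDocs, batchSize) {
  const server = makeFakeServer(totalDocs);
  let seen = 0;
  for await (const doc of iterateInBatches(server, batchSize)) seen += 1;
  return { seen, roundTrips: server.stats.roundTrips };
}

countRoundTrips(10000, 1000).then((result) => console.log(result));
// prints { seen: 10000, roundTrips: 10 }
```

The consumer sees a flat stream of documents, while only one batch is held in memory at any moment.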

To change the batch size, either pass the batchSize property in the options or call the batchSize() method.

collection.find().batchSize(1100)
// Or pass it as an option
collection.find({}, {
  batchSize: 1100
})

Do not set batchSize to 1. With 10,000 documents, for example, the client would go back to the server for every single document, producing 10,000 network round trips. Monitoring with mongostat shows the number of getMore operations per second while the cursor is being read.
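The trade-off behind batchSize can be put into numbers. The sketch below is pure arithmetic under simplifying assumptions: every batch is full, the smaller first batch is ignored, and estimateBatchCost is a hypothetical helper, not a driver function.

```javascript
// Rough estimate of what a given batch size costs for a full traversal.
function estimateBatchCost(totalDocs, batchSize) {
  return {
    batchSize,
    // How many times the client has to go back to the server.
    roundTrips: Math.ceil(totalDocs / batchSize),
    // Upper bound on documents cached on the client at one time.
    maxDocsBuffered: Math.min(totalDocs, batchSize),
  };
}

for (const size of [1, 100, 1000, 10000]) {
  console.log(estimateBatchCost(10000, size));
}
```

A very small batch size multiplies network round trips, while a very large one moves the cost back into client memory, which is what the cursor was meant to avoid.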

Cursor timeout

By default, the server closes a cursor after it has been idle for 10 minutes. The timeout can be changed when starting the server, for example mongod --setParameter cursorTimeoutMillis=300000. See the cursorTimeoutMillis documentation; the default is 600000 (10 minutes).

For example, if the first getMore() returns a batch of 1000 documents and those 1000 documents are not processed within the default 10 minutes, the cursor is closed. The next request to the server then fails with an error such as: cursor id 4011961159809892672 not found

If you run into cursor timeouts, you can adjust cursorTimeoutMillis or reduce batchSize, choosing whatever suits your program. The defaults usually do not need to be changed. If, for example, you call an external interface while traversing the cursor and the cursor times out because that interface is slow, optimize the service first before touching the configuration.
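One possible way to tolerate a timed-out cursor, which this article does not prescribe, is to remember the last processed _id and reopen the query from there. The sketch below rests on several assumptions: documents are processed in _id order, the error raised for a missing cursor carries code 43 (CursorNotFound), and the collection object exposes find().sort() and cursor.close() as the native driver does. forEachResumable and maxRestarts are hypothetical names.

```javascript
// Assumption: server error code for CursorNotFound, e.g. after a cursor timeout.
const CURSOR_NOT_FOUND = 43;

// Processes every document in _id order and reopens the cursor after the
// last processed _id if the server reports that the cursor no longer exists.
async function forEachResumable(collection, handle, maxRestarts = 3) {
  let lastId = null;
  for (let attempt = 0; ; attempt++) {
    const filter = lastId === null ? {} : { _id: { $gt: lastId } };
    const cursor = collection.find(filter).sort({ _id: 1 });
    try {
      for await (const doc of cursor) {
        await handle(doc);
        lastId = doc._id; // remember progress only after successful handling
      }
      return; // cursor exhausted normally
    } catch (err) {
      // Anything other than a lost cursor, or too many restarts, is rethrown.
      if (err.code !== CURSOR_NOT_FOUND || attempt >= maxRestarts) throw err;
    } finally {
      await cursor.close();
    }
  }
}
```

Restarting only helps if the slow step is occasional. If every batch takes longer than the timeout, the processing itself still needs to be optimized, as described above.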

Calling addCursorFlag('noCursorTimeout', true) disables the cursor timeout. Such a cursor is only released when it is exhausted or when it is closed manually with cursor.close(). Disabling the timeout is not recommended: every open cursor consumes additional memory, and if you forget to close one manually, the resulting memory leak on the MongoDB server is not worth it.

Cursor status

Connect to MongoDB with the shell and run db.serverStatus().metrics.cursor to view cursor usage. If cursors do cause a memory leak on the MongoDB server, the following metrics will help operations staff troubleshoot the problem.

  • timedOut: the number of cursors that have timed out since the MongoDB server process started. It reflects cursors that timed out because the application was busy with time-consuming work, or because an error after opening left the cursor unclosed.
  • open.noTimeout: the number of open cursors with the DBQuery.Option.noTimeout option set, which MongoDB provides to prevent a cursor from ever timing out. If you forget to close such a cursor after processing, it stays in memory; the larger this number, the greater the risk of a memory leak.
  • open.pinned: the number of "pinned" open cursors.
  • open.total: the number of cursors the MongoDB server currently has open for clients. It decreases as cursors are exhausted.
{
	"timedOut" : NumberLong(4),
	"open" : {
		"noTimeout" : NumberLong(0),
		"pinned" : NumberLong(0),
		"total" : NumberLong(0)
	}
}

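As a rough illustration of how these metrics might be checked automatically, the sketch below turns a metrics.cursor document into a list of warnings. checkCursorMetrics and its maxNoTimeout threshold are hypothetical, the threshold is not a MongoDB recommendation, and the values are assumed to arrive as plain numbers.

```javascript
// Hypothetical helper: inspects a metrics.cursor document and returns warnings.
function checkCursorMetrics(cursorMetrics, maxNoTimeout = 0) {
  const warnings = [];
  const noTimeout = Number(cursorMetrics.open.noTimeout);
  const timedOut = Number(cursorMetrics.timedOut);
  if (noTimeout > maxNoTimeout) {
    warnings.push(`${noTimeout} open cursor(s) will never time out`);
  }
  if (timedOut > 0) {
    warnings.push(`${timedOut} cursor(s) timed out since the server started`);
  }
  return warnings;
}

// Values taken from the sample output in this section.
const sample = { timedOut: 4, open: { noTimeout: 0, pinned: 0, total: 0 } };
console.log(checkCursorMetrics(sample));
// prints [ '4 cursor(s) timed out since the server started' ]
```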
Cursors and asynchronous iterators

ES6 introduced the iterator protocol in JavaScript through Symbol.iterator, which defines a uniform interface. Any data type that implements it, such as Array, Map, and Set, can be traversed with for...of. Calling the Symbol.iterator method returns an iterator; each call to the iterator's next() returns an object containing value and done, and done being true means the iteration has finished. However, Symbol.iterator only supports synchronous data sources.
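As a small illustration of the synchronous protocol, the sketch below implements Symbol.iterator by hand. The range object is a made-up example and has nothing to do with the MongoDB driver.

```javascript
// A hypothetical range object that implements the iterator protocol by hand.
const range = {
  from: 1,
  to: 3,
  [Symbol.iterator]() {
    let current = this.from;
    const last = this.to;
    return {
      // Each call returns { value, done }; done: true ends the iteration.
      next: () =>
        current <= last
          ? { value: current++, done: false }
          : { value: undefined, done: true },
    };
  },
};

// for...of and the spread operator both consume the same protocol.
const collected = [...range];
console.log(collected); // prints [ 1, 2, 3 ]
```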

Retrieving data from a database collection involves network I/O, an asynchronous operation that Symbol.iterator cannot handle. The ECMAScript 2018 standard added a new property, Symbol.asyncIterator, for asynchronous iterators. Unlike with Symbol.iterator, the next() method of an iterator obtained through Symbol.asyncIterator returns a Promise that resolves to { value, done }. An object that defines this property is an async iterable, and we can iterate over it with a for await...of loop.

The MongoDB Node.js driver (v4.2.2 is shown here) also implements the Symbol.asyncIterator interface, which is why we can traverse a cursor with a for await...of loop.
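The asynchronous variant can be sketched the same way. In the example below, makeAsyncSource and collect are hypothetical names, and a short timer stands in for network I/O.

```javascript
// Stand-in for network latency.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Builds an async iterable whose next() returns a Promise of { value, done }.
function makeAsyncSource(items) {
  return {
    [Symbol.asyncIterator]() {
      let index = 0;
      return {
        next: async () => {
          await delay(1);
          return index < items.length
            ? { value: items[index++], done: false }
            : { value: undefined, done: true };
        },
      };
    },
  };
}

// for await...of awaits each next() call before running the loop body.
async function collect(source) {
  const out = [];
  for await (const item of source) out.push(item);
  return out;
}

collect(makeAsyncSource(['a', 'b', 'c'])).then((items) => console.log(items));
// prints [ 'a', 'b', 'c' ]
```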

// mongodb/lib/cursor/abstract_cursor.js
class AbstractCursor extends mongo_types_1.TypedEventEmitter {
  [Symbol.asyncIterator]() {
    return {
      next: () => this.next().then(value => value != null ? { value, done: false } : { value: undefined, done: true })
    };
  }
}

Fault-tolerant processing

If an error inside the for loop ends the iteration early, the cursor is not destroyed immediately. You can either close the cursor manually or wait for it to expire after the default cursor timeout.

If noCursorTimeout is set so that the cursor never times out, closing it manually becomes essential, which is one more reason not to enable that option.

const userCursor = await collection.find();
try {
  for await (const user of userCursor) {
    // throw new Error('124')
  }
} catch (e) {
  // Handle the error
} finally {
  // Always release the cursor, even when the loop ends early
  await userCursor.close();
}
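The effect of the finally block can be checked without a database. The sketch below uses a hypothetical fake cursor whose async iterator only offers next(), the same shape as the driver excerpt shown earlier; makeFakeCursor, iterateWithoutClose, and iterateWithClose are made-up names.

```javascript
// Hypothetical stand-in for a driver cursor; it only tracks whether close() ran.
function makeFakeCursor(docs) {
  let index = 0;
  const cursor = {
    closed: false,
    async next() {
      return index < docs.length ? docs[index++] : null;
    },
    async close() {
      cursor.closed = true;
    },
    // The iterator only offers next(), so an early exit triggers no cleanup.
    [Symbol.asyncIterator]() {
      return {
        next: () => cursor.next().then((value) =>
          value != null ? { value, done: false } : { value: undefined, done: true }),
      };
    },
  };
  return cursor;
}

// The error is caught, but nothing releases the cursor.
async function iterateWithoutClose(cursor) {
  try {
    for await (const doc of cursor) {
      throw new Error(`failed on ${JSON.stringify(doc)}`);
    }
  } catch (e) {
    // handled, cursor left open
  }
  return cursor.closed;
}

// Same loop, but finally guarantees that close() runs.
async function iterateWithClose(cursor) {
  try {
    for await (const doc of cursor) {
      throw new Error(`failed on ${JSON.stringify(doc)}`);
    }
  } catch (e) {
    // handled
  } finally {
    await cursor.close();
  }
  return cursor.closed;
}

iterateWithoutClose(makeFakeCursor([{ userID: 0 }])).then((closed) => console.log(closed)); // prints false
iterateWithClose(makeFakeCursor([{ userID: 0 }])).then((closed) => console.log(closed)); // prints true
```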

Things to watch out for with Mongoose

Mongoose's find() method does not return a cursor by default; you have to call cursor() explicitly after find(). The resulting cursor also has no cursor.count() or cursor.hasNext() method, which is inconvenient if you want to do some special handling when the cursor has no data.

const userCursor = await User.find({}).cursor();

for await (const user of userCursor) {
}
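One way to get "special handling when there is no data" without count() or hasNext() is to count documents while iterating. The sketch below works with any async iterable; forEachOrEmpty, handle, and onEmpty are hypothetical names, and an async generator stands in for the Mongoose cursor.

```javascript
// Runs handle for each item and onEmpty only if nothing was iterated.
// Returns the number of items seen.
async function forEachOrEmpty(asyncIterable, handle, onEmpty) {
  let seen = 0;
  for await (const item of asyncIterable) {
    seen += 1;
    await handle(item);
  }
  if (seen === 0) await onEmpty();
  return seen;
}

// Stand-in for a cursor that yields no documents.
async function* emptyCursor() {}

forEachOrEmpty(
  emptyCursor(),
  async (user) => console.log(user),
  async () => console.log('no users found')
);
// prints "no users found"
```

With Mongoose, the first argument would be the cursor returned by User.find({}).cursor().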

A cursor bug

A member of a Node.js chat group reported a problem they had hit while using cursors. I later searched around and verified it, and describe it here. It only occurs with specific versions and in a specific usage scenario; I include it in the hope that it saves other users from falling into the same pit.

Version 3.5.4 of the MongoDB Node.js driver has a bug when iterating over a cursor: if limit is used to cap the number of returned documents and hasNext() is used for iteration, then first, the count reported by the cursor is incorrect, and second, iteration ends prematurely with the error MongoError: Cursor is closed.

If you need to adjust how many documents each getMore() fetches, use the cursor together with batchSize. Why use limit with a cursor at all? That is worth thinking about as well.

const userCursor = await collection.find({}).limit(5);
console.log('cursor count: ', await userCursor.count());
try {
  while (await userCursor.hasNext()) {
    const doc = await userCursor.next();
    console.log(doc);
  }
} catch (err) {
  console.error(err.stack);
}
userCursor.close();

With mongodb@^3.5.4, the following output is displayed:

cursor count:  10000
{ _id: 61d6590b92058ddefbac6a14, userID: 0 }
{ _id: 61d6590b92058ddefbac6a15, userID: 1 }
null
MongoError: Cursor is closed
    at Function.create (/test/node_modules/mongodb/lib/core/error.js:43:12)
    at Cursor.hasNext (/test/node_modules/mongodb/lib/cursor.js:197:24)
    at file:///test/index.mjs:42:27
    at processTicksAndRejections (internal/process/task_queues.js:93:5)

The affected version of the mongodb npm package is 3.5.4; see the issue at jira.mongodb.org/browse/NODE-2483. The affected version of the mongoose npm package is 5.9.4; see the issue at github.com/Automattic/mongoose/issues/8664.