Welcome to Tencent cloud technology community, get more Tencent mass technology practice dry goods oh ~

With the popularity of service sharing in China, a variety of services such as shared bikes, shared umbrellas and shared charging banks have mushrooming. The LBS service positioning problem has become a challenge for back-end services. MongoDB is friendly to SUPPORT LBS query and is also the preferred database of various LBS service providers. Tencent Cloud MongoDB team found in the operation, the original MongoDB has a large performance bottleneck in the LBS service scenario, after Tencent cloud team professional positioning analysis and optimization, cloud MongoDB in the LBS service comprehensive performance, has more than 10 times. Tencent Cloud MongoDB provides excellent comprehensive performance, providing a strong guarantee for domestic LBS service providers, such as Mobike.

LBS business Features

Taking bike-sharing service as an example, LBS business has two characteristics, namely periodicity of time and uneven coordinate distribution.

one Periodicity of time

There is a significant difference between the QPS quantity in peak period and trough period, and the time point of peak period and trough period is relatively fixed.

Two. The coordinate distribution is not uniform

Commuters who take the subway, if they pay attention, may notice that during the morning rush hour, the subway is surrounded by shared bikes, while the number of shared bikes around the subway is very small at the end of the day. In the figure below, more than 99% of coordinates are concentrated around latitude and longitude (121,31.44). In addition, some special events can also cause uneven distribution of points, such as shenzhen Bay Park on special family holidays when a large number of customers, at the same time, this area will be a large number of bikes. Some areas have a high concentration of bikes, while others are sparse.

LBS service principle of MongoDB

MongoDB uses 2d_index or 2d_sphere_index to create geoIndex. There is little difference between the two. Let’s take 2d_index as an example.

one Creation and use of 2D indexes

db.coll.createIndex({"lag":"2d"}, {"bits":int})) to create a 2D index, the precision of the index is specified by bits, the larger the bits, the higher the precision of the index. Db. RunCommand ({geoNear: tableName, maxDistance: 0.0001567855942887398, distanceMultiplier: 6378137.0, num: 30, NEAR: [113.8679388183982, 22.58905429302385], Spherical:true|false})
Copy the code

Through the above command to query an index, of which the spherical: true | false indicates how should understand to create 2 d index, said false understanding index as a planar 2 d index, index of true said it will understand for spherical index of latitude and longitude. This is interesting because a 2D index can have two meanings, and the different meanings are understood at query time, not index creation time.

two The theory of 2D indexing

Mongo GeoHash technology to build 2 d index (see https://en.wikipedia.org/wiki/Geohash wiki GeoHash text chain). MongoDB generates GeoHashId by partition of flat quadtree. Each record has one GeoHashId, which is stored in Btree by index mapping of GeoHashId->RecordId.

PI
<<

Storage of 2D indexes in Mongodb

We talked about how Mongodb uses a flat quadtree to compute the Geohash. In fact, flat quadtree only exists in the process of operation, not in the actual storage.

Insert For a latitude and longitude coordinate [x,y], MongoDb calculates the grid number of the coordinate in 2d plane. The number is a 52bit INT64 type, which is used as the key of btree. So the actual data is inserted into the btree as {GeoHashId->RecordValue}.

Query geoNear and geoWithin are commonly used to query a geo2D index. GeoNear finds coordinates of N points closest to a point and returns them. This requirement can be said to form the basis of LBS services (Momo, Didi and Mobike), while geoWithin queries all points within a polygon and returns them. We highlight the most widely used geoNear query.

The query process of geoNear is as follows

db.runCommand(
{
geoNear: "places", // Table Name near: [-73.9667, 40.78], // Central Point Spherical:true, // treat the index as a spherical index
query: { category: "public" } // filters
maxDistance: 0.0001531 // distance in about one kilometer
}
Copy the code

GeoNear can be understood as a circular search process that starts at the starting point and spreads outward. As shown below:

Due to the nature of the circle itself, the distance between any point in the outer ring and the center of the circle must be greater than that between any point in the inner ring and the center of the circle. Therefore, the benefits of expanding iteration by circular ring are as follows:

1) Reduce the number of points for sorting and comparison; 2) The point that meets the condition can be found as soon as possible and returned to avoid unnecessary search.

In the implementation details of MongoDB, if the number of points searched by the inner ring is too small, the step length of each expansion of the ring will be doubled

Problems encountered in MongoDB LBS service

When some large customers use the geoNear function of MongoDB to search for nearby objects, slow queries often occur. The pressure in the morning peak is 10-20 times that in the off-peak period, and the coordinate is not uniform. Slow queries are serious and are on the edge of avalanches. Preliminary analysis found that these queries scanned too many point sets.

In the figure below, it took 500ms to find the 10 nearest records within 500 meters and scanned 24,000 + records. Similarly slow queries accounted for about 5% of peak queries

Test environment reappearance and positioning

Troubleshooting database performance problems, mainly from lock wait,I/O wait,CPU consumption three cover analysis. The screenshot above scans too many records and is an intuitive bottleneck for CPU or IO consumption. We in the test environment for the sake of precise repetition, found the log has no obvious timeAcquiringMicroseconds item ruled out the mongo lock contention problem on the level of execution, and choose larger memory resident machine makes data memory, found that the above case still needs more than 200 ms execution time. The 10-core CPU can only support 50QPS for the case shown in the screenshot.

Why is the scan set so large

As mentioned above, MongoDB’s search for the nearest point is a process of ring expansion. If there are not enough points in the inner ring that meet the conditions, the radius of expansion will be doubled each time. Therefore, it is easy to fall into BadCase when encountering a scene with few inner ring points and dense outer ring points. In the figure below, we want to find the three points closest to the center. Because the ring expands too fast, the outer ring does a lot of useless scanning and sorting.

Such cases are accord with the actual scene, the earlier peak traffic crowded around the subway, you set off from home a road subway, walking and looking, and Shared cycling on dynamic search software from your recent 10 cars, only three two nearby, then expand the search radius around the subway, the subway around all the thousands of car of scanning calculation again, return from the rest of your recent seven or eight units.

Problem solving

We already know the problem, and the way we optimize this is by controlling the amount of searches per circle, so we add two parameters to the geoNear command and pass them into NearStage. HintCorrectNum can control the lower limit of the quality of the result. The first N points returned must be the N points closest to the center point. HintScan is used to control the upper limit of the scan set size.

HintScan: If the size of the scanned point set is larger than hintScan, blur is performed. HintCorrectNum: If the number of returned results is greater than hintCorrectNum, fuzzy processing is performed.

Contrast with Redis3.2

Redis3.2 also added location queries, and we compared open source Redis to MongoDB. Redis usage: GEORADIUS AppName 120.993965 31.449034 500 m count 30 ASC. In the scenario of dense data set, Tencent Cloud MongoDB and open source Redis are used for performance comparison. The following is a test comparison of MongoDB singleton versus Redis singleton on a dense data set on a 24-core CPU machine. It is important to note that Redis itself is a single-threaded in-memory cache database. MongoDB is a multi-threaded database with high availability and persistence. The usage scenarios of the two are quite different.

conclusion

The native geoNear interface of MongoDB is the mainstream choice of domestic LBS applications. In the case of dense point set of native MongoDB, the efficiency of geoNear interface will decrease sharply, even less than 1000QPS in a single machine. The Tencent Cloud MongoDB team has continued to optimize this. Under the premise of not affecting the effect, geoNear’s efficiency has been improved by more than 10 times, which provides strong support for our customers such as Mobike and has a large performance advantage compared with Redis3.2.

Recommended reading

Tang Liang: Cloud architecture brings innovation to e-commerce Industry Huang Rongkui: How to develop small programs quickly and conveniently How to build businesses based on cloud? Tencent cloud + future summit said so

Has been authorized by the author tencent cloud community released, reproduced please indicate the article source The original link: https://cloud.tencent.com/community/article/723875