After reading about how to use data types to compute statistics over hundreds of millions of users, I learned how to use different data types (String, Hash, List, Set, Sorted Set, HyperLogLog, Bitmap) to solve statistics problems in different scenarios.

The product manager said he had an idea: give teenagers a chance to connect with each other,

so that boys and girls at this most beautiful age can meet that special someone at any hour of the day.

So I want to develop an app where users can log in, find people nearby, and connect with each other.

How do I find people nearby? I also hope to meet goddesses through this App…

In my memory, one night after work, she was moving lightly from the crowd, her tall and slim figure floating like an elegant note in the space. Her eyes were full of clear sunlight and life, and they were shining with the stars of the Milky Way.

The opening remarks

Practice your presentation skills, especially at work. Many people complain that “those who do the work lose out to those who make the PPT.” In fact, the boss is not stupid, so why would he value the people who make the PPT more?

Because they see things from the boss’s point of view, and what he needs is a “solution.” Think from a creator’s point of view, not a programmer’s point of view.

Think more about what value this thing provides to people, rather than “how am I going to implement this?” Of course, how you implement it matters, but it’s usually not the most important thing.

What is an LBS-oriented application

Longitude and latitude together form a coordinate system. Also known as the geographic coordinate system, it is a spherical coordinate system that uses a sphere in three-dimensional space to define positions on the Earth, and it can mark any position on the Earth (with 7 decimal places, the precision is about 1 cm).

Longitude ranges over [-180, 180] and latitude over [-90, 90]. The sign of latitude is determined by the equator: north is positive and south is negative. The sign of longitude is determined by the prime meridian (Greenwich Observatory): east is positive and west is negative.

The “nearby people” feature is what is usually called LBS (Location-Based Services): it centers on the user’s current geographic location and provides accurate matching services around it.

The core ideas behind “nearby people” are as follows:

  1. With “me” as the center, search for people nearby;
  2. Calculate the distance between others and me based on my current geographical position;
  3. Sort by the distance between “me” and others, and screen out the users nearest to me.
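As a minimal sketch of ideas 2 and 3, the distance between two coordinates can be computed with the haversine formula and the candidates sorted by it. The class name, the `User` type, the method names, and the mean Earth radius constant are all assumptions made for illustration; they are not part of the app described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: every name in this class is hypothetical. */
final class NearbySketch {
    // Assumption: the Earth is treated as a sphere with its mean radius.
    static final double EARTH_RADIUS_KM = 6371.0088;

    static final class User {
        final String id;
        final double lng;
        final double lat;

        User(String id, double lng, double lat) {
            this.id = id;
            this.lng = lng;
            this.lat = lat;
        }
    }

    /** Great-circle distance in kilometers between two points (haversine formula). */
    static double haversineKm(double lng1, double lat1, double lng2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns at most {@code limit} users, ordered from nearest to farthest from "me". */
    static List<User> nearest(List<User> users, double myLng, double myLat, int limit) {
        List<User> sorted = new ArrayList<>(users);
        sorted.sort(Comparator.comparingDouble(
                (User u) -> haversineKm(myLng, myLat, u.lng, u.lat)));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

This computes a distance for every user, which is exactly the cost the following sections try to avoid.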

MySQL implementation

How do we compute “nearby people”, that is, find the other records in the vicinity of a coordinate and sort them by distance?

With the user as the center and given a radius of 1000 meters to draw a circle, the users within the circle are the “nearby people” we want to meet.

Store the users’ latitude and longitude in MySQL:

CREATE TABLE `nearby_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `longitude` double DEFAULT NULL COMMENT 'longitude',
  `latitude` double DEFAULT NULL COMMENT 'latitude',
  `create_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'creation time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

However, traversing every “goddess” record, computing its distance from my own coordinates, and then sorting by distance is impractical: the amount of computation is far too large.

Instead, we can first narrow the candidates down to a rectangular region, and then run the full distance calculation and sort only on the data inside that rectangle, which significantly reduces the amount of computation.

How do you divide the rectangular area?

Draw a square that circumscribes the circle. Using the maximum and minimum longitude and latitude of that square (the user’s longitude and latitude plus or minus the distance) as filter conditions, it is easy to find the “goddesses” inside the square.
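A rough sketch of how the square’s bounds could be derived from the center and the radius is shown below. It assumes a spherical Earth and a center that is far from the poles and from the ±180° meridian; `boundingBox` and the layout of its return value are hypothetical.

```java
/** Illustrative only: names and return layout are hypothetical. */
final class BoundingBoxSketch {
    // Assumption: spherical Earth with its mean radius.
    static final double EARTH_RADIUS_KM = 6371.0088;

    /**
     * Returns {minLng, maxLng, minLat, maxLat} of the square that encloses the
     * circle of radius {@code radiusKm} around (lng, lat).
     * Not valid near the poles or across the +/-180 degree meridian.
     */
    static double[] boundingBox(double lng, double lat, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle, in radians
        double dLat = Math.toDegrees(angular);
        // A degree of longitude shrinks with cos(latitude), so the longitude span is wider.
        double dLng = Math.toDegrees(
                Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lng - dLng, lng + dLng, lat - dLat, lat + dLat};
    }
}
```

The four values map directly onto the two range conditions of a query on longitude and latitude.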

What about the extra areas (inside the square but outside the circle)?

Any user in the extra area is farther from the center than the radius of the circle. So we calculate the distance between the center point and every user in the square, and keep only the users whose distance is less than or equal to the radius. The users left inside the circle are the nearby people we want.

For the rectangular filter to perform well, the table needs a compound index on the coordinates, (longitude, latitude).

In practice

To get the maximum and minimum longitude and latitude of the outer rectangle based on longitude, latitude and distance, and to calculate the distance based on longitude and latitude, a third-party class library is used:

<dependency>
    <groupId>com.spatial4j</groupId>
    <artifactId>spatial4j</artifactId>
    <version>0.5</version>
</dependency>

After obtaining the bounding rectangle, we query the users inside it by its minimum and maximum longitude and latitude, and then eliminate the users beyond the specified distance; those who remain are the people nearby.

/**
 * Get the people within the given distance.
 *
 * @param distance search range, in km
 * @param userLng  current user's longitude
 * @param userLat  current user's latitude
 */
public String nearBySearch(double distance, double userLng, double userLat) {
    // 1. Get the bounding rectangle
    Rectangle rectangle = getRectangle(distance, userLng, userLat);
    // 2. Get all users located inside the rectangle
    List<User> users = userMapper.selectUser(rectangle.getMinX(), rectangle.getMaxX(),
            rectangle.getMinY(), rectangle.getMaxY());
    // 3. Remove the users that are farther away than the given distance
    users = users.stream()
            .filter(a -> getDistance(a.getLongitude(), a.getLatitude(), userLng, userLat) <= distance)
            .collect(Collectors.toList());
    return JSON.toJSONString(users);
}

private Rectangle getRectangle(double distance, double userLng, double userLat) {
    return spatialContext.getDistCalc()
            .calcBoxByDistFromPt(spatialContext.makePoint(userLng, userLat),
                    distance * DistanceUtils.KM_TO_DEG, spatialContext, null);
}

/**
 * Distance between two points on the sphere, in km.
 *
 * @param longitude longitude of point 1
 * @param latitude  latitude of point 1
 * @param userLng   longitude of point 2
 * @param userLat   latitude of point 2
 * @return distance in km
 */
private double getDistance(Double longitude, Double latitude, double userLng, double userLat) {
    return spatialContext.calcDistance(spatialContext.makePoint(userLng, userLat),
            spatialContext.makePoint(longitude, latitude)) * DistanceUtils.DEG_TO_KM;
}

Since the distance calculation and filtering are implemented in the business code, the SQL statement itself is very simple.

SELECT * FROM nearby_user
WHERE 1=1
AND (longitude BETWEEN #{minlng} AND #{maxlng})
AND (latitude BETWEEN #{minlat} AND #{maxlat})

However, database query performance is limited, and if the volume of “nearby people” queries is very large, this may not be a good solution under high concurrency.

Trying Redis Hash, without success

Let’s analyze the characteristics of LBS data:

  1. Each goddess is given an ID number, and each ID corresponds to latitude and longitude information.
  2. When an “otaku” logs in to the app to find the “girl I’m attracted to,” the app looks up the nearby “goddesses” based on the otaku’s latitude and longitude.
  3. After obtaining the list of “goddess” IDs that match the location, the information for each ID is fetched from the database and returned to the user.

The characteristic of this data is that each goddess (user) corresponds to one latitude/longitude pair, which reminds me of the Redis Hash structure, where a key corresponds to a value.

A Hash looks workable, but besides recording latitude and longitude, an LBS application also needs to run range queries over the data in the Hash and sort the results by distance computed from latitude and longitude.

The data in a Hash is unordered, so it obviously cannot do that.

Sorted Set starts to look promising

Is the Sorted Set the right type, given that it keeps its elements sorted?

A Sorted Set also maps a key to a value: the key is the element itself, and the value is that element’s weight score.

A Sorted Set orders its elements by their weight scores, so it looks like exactly what we need.

For example, in a Sorted Set the elements are “goddess IDs,” and the score of each element is its latitude and longitude information.

The problem is that the score of a Sorted Set element is a single floating-point number, while a position consists of two values, longitude and latitude. Can the two be converted into one floating-point number?

That is the right idea. To make longitude and latitude comparable, Redis adopts the GeoHash encoding widely used in the industry: it encodes longitude and latitude separately, and then combines the two codes into one final code.

This turns latitude and longitude into a single value, which is why the GEO type in Redis can use a Sorted Set as its underlying data structure.

Let’s look at how Geohash encodes latitude and longitude.

GeoHash encoding

For Geohash, refer to:
https://en.wikipedia.org/wiki…

The Geohash algorithm maps two-dimensional longitude and latitude data to one-dimensional integers, so that all elements are placed on a line, and two-dimensional coordinates that are close together are, in most cases, mapped to points that are close together on that line.

When we want to calculate “nearby people”, we first map the target position to this line, and then we get the nearby points on this one-dimensional line.

Geohash encodes a longitude value into an N-bit binary value by performing N bipartitioning operations on the longitude range [-180,180], where N is customizable.

During the first division, the longitude range [-180,180] is divided into two subintervals: [-180,0] and [0,180] (which I call the left and right partitions).

At this point, we check whether the longitude value we want to encode falls in the left or the right partition. If it is in the left partition, we record a 0; if it is in the right partition, we record a 1.

This way, each bipartitioning yields one bit of the code (either 0 or 1).

Then we split the partition that the longitude value fell into in two again, check whether the value lies in the left or the right half, and emit another bit by the same rule. After N rounds of bipartitioning, the longitude value is expressed as an N-bit number.
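The repeated halving can be sketched in a few lines. This is only an illustration of the procedure described above, not Redis’s actual implementation; the class and method names are hypothetical, and sending a value that sits exactly on a midpoint to the right partition is my own assumption.

```java
/** Illustrative only: names are hypothetical; this is not Redis's implementation. */
final class GeohashBits {
    /** Encodes {@code value} within [min, max] into {@code bits} binary digits by repeated halving. */
    static String encode(double value, double min, double max, int bits) {
        StringBuilder code = new StringBuilder(bits);
        for (int i = 0; i < bits; i++) {
            double mid = (min + max) / 2;
            if (value >= mid) {   // right partition (assumption: the midpoint itself goes right)
                code.append('1');
                min = mid;
            } else {              // left partition
                code.append('0');
                max = mid;
            }
        }
        return code.toString();
    }

    static String encodeLongitude(double lng, int bits) {
        return encode(lng, -180, 180, bits);
    }

    static String encodeLatitude(double lat, int bits) {
        return encode(lat, -90, 90, bits);
    }
}
```

Each loop iteration corresponds to one bipartitioning step, so the length of the returned string equals N.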

Every map element’s coordinates fall into exactly one grid cell. The smaller the cells, the more accurate the coordinates. The cells are then encoded as integers, so that the closer two cells are, the closer their integers tend to be.

After encoding, the coordinates of each map element become an integer, from which the coordinates can be restored. The longer the integer, the smaller the loss in the restored coordinates. For the “nearby people” feature, this loss of accuracy is negligible.
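To make the precision loss concrete, here is a small sketch that walks a bit string back to the interval it stands for. The names are hypothetical and the code only mirrors the halving procedure described above. Each extra bit halves the interval, so with 26 bits, for example, a longitude cell is 360 / 2^26 ≈ 0.0000054 degrees wide.

```java
/** Illustrative only: names are hypothetical. */
final class GeohashDecode {
    /** Returns {low, high}: the interval inside [min, max] that {@code code} represents. */
    static double[] decode(String code, double min, double max) {
        for (char c : code.toCharArray()) {
            double mid = (min + max) / 2;
            if (c == '1') {
                min = mid;  // keep the right half
            } else {
                max = mid;  // keep the left half
            }
        }
        return new double[] {min, max};
    }

    /** The restored coordinate: the midpoint of the decoded interval. */
    static double midpoint(String code, double min, double max) {
        double[] cell = decode(code, min, max);
        return (cell[0] + cell[1]) / 2;
    }
}
```

The restored value is only the center of the cell, so its error is at most half the cell width.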

For example, suppose the longitude value 169.99 is encoded in 4 bits (N = 4, partitioning 4 times). The longitude interval [-180,180] is first divided into the left partition [-180,0] and the right partition [0,180].

  1. 169.99 falls in the right partition [0, 180], so the first bit is 1;
  2. [0, 180] is then divided into [0, 90] and [90, 180]. 169.99 is still in the right partition, so the second bit is 1;
  3. [90, 180] is divided into [90, 135] and [135, 180]. 169.99 is again in the right partition, so the third bit is 1;
  4. [135, 180] is divided into [135, 157.5] and [157.5, 180]. 169.99 is in the right partition once more, so the fourth bit is 1.

So we end up with the four-bit code 1111.

Latitude is coded in the same way as longitude.

Merge longitude and latitude codes

Suppose the calculated longitude and latitude codes are 11011 and 00101 respectively. Bit 0 of the target code takes bit 0 of the longitude code (1), bit 1 of the target code takes bit 0 of the latitude code (0), bit 2 takes bit 1 of the longitude code, and so on, alternating between longitude and latitude bits.

In this way, the latitude/longitude pair is represented by the single value 1010011011, and this value can be used as the weight score of the Sorted Set element for sorting.
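The merge step is a plain bit interleave, which can be sketched as follows. The names are hypothetical, and turning the bit string into a `long` is just one simple way to obtain a sortable number for illustration.

```java
/** Illustrative only: names are hypothetical. */
final class GeohashInterleave {
    /** Alternates bits: even positions come from longitude, odd positions from latitude. */
    static String interleave(String lngBits, String latBits) {
        if (lngBits.length() != latBits.length()) {
            throw new IllegalArgumentException("both codes must have the same length");
        }
        StringBuilder merged = new StringBuilder(lngBits.length() * 2);
        for (int i = 0; i < lngBits.length(); i++) {
            merged.append(lngBits.charAt(i));
            merged.append(latBits.charAt(i));
        }
        return merged.toString();
    }

    /** Interprets the merged bit string as a number that can serve as a sortable score. */
    static long toScore(String bits) {
        return Long.parseLong(bits, 2);
    }
}
```

With the codes from the text, `interleave("11011", "00101")` yields `1010011011`.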

Redis GEO implementation

The GEO type uses the Geohash-encoded combination of latitude and longitude as the score of each element in the Sorted Set. What commands does Redis GEO provide?

We need to store the IDs of the girls who log in to the app, together with their latitude and longitude, in a Sorted Set.

For more GEO commands, refer to: https://redis.io/commands#geo

GEOADD

Redis provides the GEOADD key longitude latitude member command to record a longitude/latitude pair together with the corresponding “goddess ID” in a GEO collection. The following records the positions of multiple users (sora aoi, Bodo Yie Yie) at once:

GEOADD girl:locations 13.361389 38.115556 "sora aoi" 15.087269 37.502669 "Bodo Yie Yie"

GEORADIUS

I log in to the app and get my own latitude and longitude. How do I find other users within a certain range centered on this position?

The Redis GEO type provides the GEORADIUS command: it looks for other elements within a range centered on the given longitude and latitude. (Since Redis 6.2, GEORADIUS is deprecated in favor of GEOSEARCH.)

Assuming my own longitude and latitude are (15.087269, 37.502669), the following returns the “goddesses” within 10 km to the LBS application:

GEORADIUS girl:locations 15.087269 37.502669 10 km ASC COUNT 10

The ASC option sorts the returned “goddesses” by distance from my position, from nearest to farthest.

The COUNT option limits the number of goddesses returned, which keeps the result small when there are too many nearby and saves bandwidth.

If you feel you need more goddesses, then you can have no limit, but you need to pay attention to your health and eat more eggs to make up for it.

After a user goes offline, how do we delete that “goddess’s” latitude and longitude?

That’s a good question. Geo types are implemented based on Sorted sets, so you can use the ZREM command to do this.

For example, delete the location information of “sora aoi”:

ZREM girl:locations "sora aoi"

Summary

Instead of designing a new underlying data structure, GEO uses the Sorted Set type directly.

The GEO type uses Geohash encoding to convert latitude and longitude into the weight score of an element in the Sorted Set. The two key mechanisms involved are partitioning the two-dimensional map into intervals and encoding those intervals.

If a latitude/longitude pair falls into an interval, it is represented by that interval’s code, and the code serves as the weight score of the Sorted Set element.

In a mapping application, there might be millions of pieces of data about cars, restaurants, and people, and if you use Redis’s GEO data structure, they all fit into a zset.

In a Redis cluster environment, a set may be migrated from one node to another. If a single key holds too much data, cluster migration is badly affected. In a cluster environment, the data under a single key should not exceed 1 MB; otherwise the migration may stall and affect the normal operation of online services.

Therefore, it is recommended that GEO data be deployed using a separate Redis cluster instance.

If you have hundreds of millions of records or more, you need to split the GEO data: by country, by province, by city, and in megacities even by district.

This can significantly reduce the size of a single ZSet collection.
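One way to picture the split is a key per region, so that each Sorted Set holds only one city’s users. The `geo:` prefix and the key layout below are purely hypothetical naming choices for illustration, not a Redis convention.

```java
import java.util.Locale;

/** Illustrative only: the key scheme is a hypothetical naming convention. */
final class GeoKeySketch {
    /** Builds a per-city key such as "geo:cn:guangdong:shenzhen". */
    static String geoKey(String country, String province, String city) {
        return String.join(":", "geo", country, province, city).toLowerCase(Locale.ROOT);
    }
}
```

A search for a user close to a regional boundary would then have to look in the neighboring regions’ keys as well, which is the price of keeping each collection small.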

On the shoulders of giants

  1. https://segmentfault.com/a/11…
  2. https://juejin.cn/book/684473…
  3. https://cloud.tencent.com/dev…
  4. REDIS core technology and actual combat