1 introduction

Mobile Internet has been integrated into every aspect of our life.

We can search for businesses, houses and cars through various apps. As a writer at 👨💻, I habitually wonder how these functions are implemented.

For example, the most intuitive idea is to look up the table in the database to screen out the vehicles less than 3 kilometers away from the user and send the data back to the client.

One serious problem with this method is that it takes too long to calculate the relative distances for every term in the entire table. Since there’s a lot of data in the table can we narrow down the scan? Then we will think of whether we can narrow the scanning scope according to the business characteristics, for example, we can only scan the vehicles in the city where the user is currently located. According to this idea, we can expand and find that the data volume is still very large and cannot solve the problem when the user is at the boundary of two cities.

How to quickly index data is the key to solve this problem. When browsing the Redis API, I found that it can directly realize the nearby XXX function. In the following, I will introduce how to realize such function by Redis and deeply analyze the realization principle behind it.

2 Redis GEO API

2.1 Adding geographical location Information

geo add key longitude latitude member [longitude latitude member ...]
Copy the code

eg:

Add location information for vehicle 1 and vehicle 2 to CARS: Locations.

127.0.0.1:6379> geoadd cars:locations 120.346111 31.556381 1 120.375821 31.560368 2 
Copy the code

2.2 Obtaining geographical location information

eg:

Get the location information of the vehicle numbered 1

Geopos cars: Locations 1) 1)"120.34611314535140991"
   2) "31.55637987511895659"
Copy the code

2.3 Obtain the distance between two geographical locations

eg:

Gets the distance between vehicle number 1 and vehicle number 2

127.0.0.1:6379> Geodist cars: Locations 2 km"2.8504"
Copy the code

2.4 Obtain the geographic information location set of a specified location range

GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count]
Copy the code

Returns all of the location elements contained by the key that are within a given maximum distance from the center, centered at a given latitude and longitude.

eg:

Look for vehicles within 5 km with longitude 120.375821 latitude 31.556381 as the center

Locations: 1) 1) 1) 1) 1) 1) 1) 2) 1) 2) 1) 2) 1) 2) 1)"2"
   2) "0.4433"(3)integer), 4054421167795118 (4) 1)"120.37582129240036011"
      2) "31.5603669915025975"
2) 1) "1"
   2) "2.8157"(3)integer), 4054421060663027 (4) 1)"120.34611314535140991"
      2) "31.55637987511895659"
Copy the code

Returns all of the location elements contained by the key that are within a given maximum distance from the center, centered at a given latitude and longitude.

A range can use one of the following units:

  • M is in meters.
  • Km is expressed in kilometers.
  • Mi means miles.
  • Ft is in feet.

The command returns additional information when given the following options:

  • WITHDIST: The distance between the location element and the center is also returned. The units of distance are the same as the units of range given by the user.

  • WITHCOORD: Returns the longitude and dimension of the positional element as well.

  • WITHHASH: Returns the original Geohashed ordered set score of the positional element as a 52-bit signed integer. This option is mainly used for low-level applications or debugging, and is not useful in practice. The command returns unsorted positional elements by default. The user can specify the order of the elements of the returned position with the following two parameters:

  • ASC: Returns the position element from near to far based on the position of the center. DESC: Returns the location element from far to near based on the location of the center.

  • By default, the GEORADIUS command returns all matching positional elements. Although the user can use the COUNT option to retrieve the first N matched elements, because the command may need to process all matched elements internally, even using the COUNT option to retrieve only a few elements in a very large area of a search can be very slow. On the other hand, using the COUNT option to reduce the number of elements that need to be returned is still very useful for reducing bandwidth.

2.5 Get the location set of geographic information for the specified element range

GEORADIUSBYMEMBER key member radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [ASC|DESC] [COUNT count]
Copy the code

eg:

Search for GEORA within 5 km based on vehicle number 1

Locations: 1) 1) 1) 1) 1) 1) 2) 1) 2) 1) 2) 1)"2"
   2) "0.0000"(3)integer), 4054421167795118 (4) 1)"120.37582129240036011"
      2) "31.5603669915025975"
2) 1) "1"
   2) "2.8157"(3)integer), 4054421060663027 (4) 1)"120.34611314535140991"
      2) "31.55637987511895659"
Copy the code

The related optional parameters are the same as those in 2.4.

3 Redis GEO implements nearby XXX functions

After studying the Redis GEO API, you can find that as long as the Redis client is called

2.4 Obtain the geographic information location set of a specified location range

The API can fulfill the requirements. So easy!!

4 Principles behind Redis GEO

4.1 Storage Structure

Redis has corresponding encoding methods when storing data of different data types. What kind of encoding does Redis GEO use for storage?

When browsing the Redis GEO API, I found that it did not delete the instruction, because its underlying is implemented using Zset. We can use ZREM to delete data.

Try to use zset query command to query the GEO information added above

127.0.0.1:6379> ZRANGE cars: 0 -1 WITHSCORES 1)"1"
2) "4054421060663027"
3) "2"
4) "4054421167795118"
Copy the code

The location information of vehicle no. 1 is 4054421060663027. The location information of vehicle 2 is 4054421167795118. Review the zset add members directive again

ZADD key score member [[score member] [score member] ...]
Copy the code

So far, it can be inferred that the process of Redis GEO adding longitude and latitude location information is as follows

ZADD cars:locations 4054421060663027 1
Copy the code

4054421060663027 is the encoded value of latitude and longitude. Using 4054421060663027 as score can quickly realize the index of latitude and longitude.

A review of the documentation shows that Redis uses geohash to encode latitude and longitude information.

4.2 GeoHash Analysis

This article provides a good analysis of the core principles of GeoHash

So that sums it up

  • How do you uniquely represent a piece of space on earth?
  • How do you slice the earth into roughly sized chunks that support different granularity representations?

To solve these two problems, we need three steps.

  1. Step one, turn a three-dimensional earth into a two-dimensional one;
  2. The second step is to convert two dimensions into one;
  3. Finally, one dimension is represented as a binary code store.

4.2.1 How to change 3d into 2D?

The latitude range is [-90,90] and the longitude range is [-180,180]. Let’s expand it out and imagine a rectangle that’s long

4.2.2 How to change two dimensions into one dimension?

By doing this, we can transform the surface of the earth into a plane in two dimensions. So what we’re going to do is convert two dimensions into one. If you cut a two-dimensional space, you can cut a lot of squares. How do I represent this square? The easiest way to do this is to walk through the plane. Each time a point is traversed, a value is assigned to it, such as 00, 01, 10, 11, which corresponds to traversing different positions on the surface as the binary number increases.

When the space is divided into four blocks, the coding sequence is 00 in the lower left corner, 01 in the upper left corner, 10 in the lower right corner, and 11 in the upper right corner, which is a z-like curve.

How do I represent the different granularity?

When we recursively decompose each block into smaller sub-blocks, a smaller space range can be identified (as shown in Figure 2). For example, the sequence of coding from 0000 to 1111 is self-similar (fractal), and each sub-fast also forms a Z-curve. This type of curve is called Peano space filling curve.

4.2.3 How to Express one-dimensional as binary code for Storage

Geohash also has several encodings, with two common ones, Base 32 and Base 36. Encodes a string of binary data that falls on the grid

The tail

After analyzing the realization principle of Redis GEO, it is found that the core behind it is GeoHash. Geohash is used to encode two-dimensional longitude and latitude data into one-dimensional data, and then b-tree index is used to quickly find the required data.

The above Redis GEO is used to realize the functions of nearby people, nearby vehicles, and nearby businesses, which can only be searched by radius.

Q: What if the demand is that I want to find coffee in all the shops within 5 kilometers?

A: Of course we can filter all the data queried by Reids at the application layer.

Q: When Redis returns a large amount of data and user requests, it consumes very much memory resources. Is there a better solution? Is there a better solution already in the database implementation industry?

A: With such A question, I searched relevant materials and found that GeoHash is actually A kind of realization of spatial index. MySql and MongoDB, which we often use, have the realization of spatial index.

  1. MongoDB

The GeoJSON object in Mongo has dots, lines, polygons, multiple line segments, multiple points, and multiple polygons. Support include, intersection, adjacent query, and support multi-condition query. (Feeling very powerful is really a solution that may have qualitative benefits)

  1. MySql

MySql InnoDB only added support for spatial indexing in 5.7.4 LABS, both of which implement spatial indexing through R trees.

The cost of upgrading MySql is very high. After understanding the geoHash principle, we can add the GeoHash field to the MySql table to quickly locate data through the binary search method of B numbers.

The next blog post will summarize the practice of manually calculating the geoHash + MySql B-tree index implementation and compare the storage space and query efficiency of MySql’s own spatial index.