Today I would like to introduce the implementation principle and usage method of Redis GEO data type.

I’m sure you’ve used the Meituan APP, often used “nearby restaurants”, and I’m sure you’ve taken a taxi from Didi, all location-based messaging services. The data that LBS applications access is a set of latitude and longitude information associated with a person or object. The GEO data type is well suited to the LBS service scenario, so let’s take a look at its underlying structure.

Underlying structure of GEO:

In general, when designing the underlying structure of a data type, we first need to know what access characteristics of the data we are dealing with. I take ride-hailing service as an example to analyze the access characteristics of latitude and longitude in LBS applications.

  1. Each ride-hailing vehicle has a number (such as 10), and the ride-hailing vehicle needs to send its latitude and longitude information (110,35) to the ride-hailing app in real time.
  2. When a user calls for a car, the ride-hailing app will search for nearby vehicles based on the user’s latitude and longitude information (such as longitude 100, latitude 38) and make a match.
  3. After matching a vehicle with a user in a similar location, the ride-hailing app will obtain the vehicle information based on the vehicle number and return it to the user.

As you can see, a car (or user) corresponds to a set of longitude and latitude information, and the corresponding longitude and latitude changes as the car (or user) moves. The characteristic of this type of data is that a key (such as car number) corresponds to a value (latitude and longitude information). When there is a lot of vehicle information to save, there needs to be a set to hold a series of keys and values. The Hash collection type in Redis can quickly access a series of keys and values, which can be used to record a series of vehicle IDS and the corresponding longitude and latitude information. Therefore, we store the ids of different vehicles and their corresponding latitude and longitude values in the Hash set. As shown in the figure below:

Hash set type. You can use the HSet command to set the corresponding key to the corresponding value. Therefore, we can use this command to quickly update the changing longitude and latitude information of the vehicle. From this point of view, the Hash type seems like a good choice, but there is a problem. In addition to recording the latitude and longitude information, a LBS application needs to perform a range query in the Hash collection of vehicles based on the user’s latitude and longitude information. Once a range query is involved, we require that the data in the collection be ordered. But the Hash element is unordered, so it won’t meet our needs. Now let’s see if we can get the Sorted Set to do what we want. The Sorted Set type also supports the form key-value, where key is an element in the Sorted Set and value is the element’s weight score. The Sorted Set can be Sorted by weight and supports range queries. Therefore, the need to find adjacent locations in LBS services can be met. Since the Sorted Set has a floating point weight, it can’t store a Set of latitude and longitude values. What do you do? This is where you use the GeoHash encoding in the GEO type.

How the GeoHash is encoded

The basic principle of GeoHash encoding is “binary interval, interval encoding”. When we want to GeoHash a set of longitude and latitude, we first encode the longitude and latitude separately, and then combine the respective codes for the longitude and latitude into a final code. Let’s look at the separate coding process for latitude and longitude.

For a geographic location information, its longitude range is [-180,180]. The GeoHash encoding encodes a longitude value as an N-bit binary number (the longitude range is dipartitioned N times). When we do the first dichotomy, the longitude range [-180,180] will be divided into two sub-intervals [-180,0] and [0,180]. Then we will see which range the longitude falls into. If it falls into [-180,0], we will use 0, if it falls into (0,180), we will use 1. You get the 1-bit code value. You do it N times and you get the n-bit code value.

For example, if the longitude value we want to encode is 110, we encode it with a 5-bit code value.

Let’s do the first partition, divide the longitude [-180,180] into the left partition [-180,0] and the right partition [0,180]. At this time, 110 is in the right partition, so let’s use 1 to represent the encoding value after the first partition. At this time, the longitude value falls within the interval of [90,180], so the code value after the second partition is still 1. Next, we partition [90,180] into two partitions, and the value 110 falls between [90,135]. [90,112.5], [90,112.5], [90,112.5], [90,112.5], [90,112.5], [90,112.5], [90,112.5] The value 110 falls between [101.25,112.5], so the code after the fifth partition is 1.

Next, we also encode latitude 35 with a 5-bit coding value.

When a set of latitude and longitude values is encoded, we put them together. The rule of combination is that the even-numbered bits of the final encoding value are followed by the encoding value of longitude, and the odd bits are followed by the encoding value of latitude. Where the even digit starts at 0 and the odd digit starts at 1.

With the GeoHash encoding, instead of representing a Set of longitude and latitude (110,35) with a single weight score, you can represent it with the value 1110010110 and save it as the Sorted Set’s weight score. With GeoHash encoding, we are essentially dividing the entire geographic space into small squares, each of which corresponds to a partition in the GeoHash.

For example, if we dipartition the longitude interval [-180,180] once and the dimension interval [-90,90] once, we get four partitions. Let’s take a look at their latitude and longitude ranges and the corresponding GeoHash combination encoding.

  • Partition 1: [-180,0) and [-90,0), code 00.
  • Partition 2: [-180,0) and [0,90], code 01.
  • Partition 3: [0,180] and [-90,0), code 10.
  • Partition 4: [0,180] and [0,90], code 11.

The four zones correspond to four squares, each of which covers a certain range of latitude and longitude values. The more zones there are, the smaller the geographic space covered by each square is, and the more accurate it is. When we map the encoding values of all squares to one-dimensional space, the GeoHash encoding values of adjacent squares are basically close, as shown in the figure below:

So, we use the Sorted Set range query to get a similar code value, which is also geographically adjacent. Therefore, the function of taxi hailing based on LBS application can be realized.

Well, so far, we know that the GEO type codes the longitude and latitude ranges as the weight scores of the elements in the Sorted Set, and stores the vehicle IDS associated with the longitude and latitude as the values of the elements themselves in the Sorted Set. The query for adjacent latitude and longitude can then be implemented by querying the size range of the encoded values. Let’s look at how to manipulate the GEO type.

GEO operation commands

The GEO type often uses two commands, GEOADD and GEORADIUS.

  • GEOADD command: used to record a set of latitude and longitude information and a corresponding ID to a collection of GEO types.
  • The GEORADIUS command: looks for other elements within a certain range centered on the entered latitude and longitude. Of course, we can define the scope ourselves.
GEOADD locations 110 35 10
Copy the code

Store the current latitude and longitude (110,35) of the vehicle with ID no. 10 into the GEO collection with key = locations.

GEORADIUS locations 110 35 5 km ASC COUNT 10
Copy the code

Redis will look for the vehicle information within 5 km centered on the user’s longitude and latitude information (110,35) according to the input. Using the ASC option, the returned vehicle information is sorted in order of nearest to farthest from the central location to facilitate the selection of the nearest vehicle. Use the COUNT option to specify the number of vehicle information to return.

So far, we’ve learned all about GEO data types. More hardcore knowledge, please pay attention to the public number “programmer senior”.