The original requirements

Recently I received a demand: the operation needs to send text messages to promote some H5 pages, but one message will limit the number of words, so we need to find a way to shorten the URL of the H5 link. The length of the SHORT message is limited to 70 characters, that is, the LENGTH of the URL and the operation message must be limited to 70 characters. If the URL is too long, the operation message space will be crowded, easy to express unclear.

Demand analysis

At a glance, this is to do a short chain service. What is a short link? When accessing a short link, it automatically jumps to a configured address. There are many such short link tools on the Internet, such as bitly, a short link tool, the website is bitly.com/, you can experience.

About short link, there are many articles on the Internet, some of which also involve more complex short link algorithm. But our scenario here is relatively simple and doesn’t require too much technology. We need to analyze the needs carefully and not copy others’ plans mechanically. Here our requirements have two characteristics:

1. Short links are generated infrequently

Because it is the operation that manually generates the short links that need to be promoted, even if the whole operation of the company generates the short links together, the frequency is very low, so the performance requirement of short chain generation is very low.

2. The amount of short chain is very small

Since all the short chains are generated manually by people, the total amount of generated short chains must be very small, so our data volume is also relatively small.

3. Short link parsing needs to be as high-performance as possible

Because the operation sends a wave of text messages, when it just sends, it must be a peak time to open, and soon the amount of open will drop, and the amount of the whole process is concentrated in the first few minutes. So short link restores should be as responsive as possible and should not consume too many resources.

Short link system design

Through the above demand analysis, Frank made the following design:

1. Storage design

Mysql is only used for data storage, and a linked table is established. The table contains only two fields, one is the self-increasing ID, and the other is the long link URL. Add a unique index to the long link URL to ensure that the same long link will not generate two short links.

2. How do long links get shorter

Based on the storage design above, each long link record will have a unique ID that can be used to uniquely locate the long link. The only problem is that this ID may be a little long, for example, the operation has promoted 10000 URLS, so the ID behind will occupy at least 5 characters, how to optimize? The answer is to convert a base 10 number to base 62. Why base 62? 26 uppercase letters + 26 lowercase letters + 10 numbers = 62. Why not choose another self character? All the special characters in the URL are recoded, not these 62 characters. How much shorter can I make this choice? Take a look at the table below:

Hexadecimal number 62 It corresponds to a base 10 number
1 – digit 62-base maximum number Z 61
The largest 2-bit base 62 ZZ 3843
The maximum number ZZZ in 3-digit base 62 238327
The maximum number in 4-digit 62-base ZZZZ 14776335

As can be seen from the above table, for the 230,000 ids within the range are limited to the range of 3 characters, which is basically enough for operation. Even if more, the operation is expected to need long links will never exceed 14 million, that is to say, the most short chain ID is enough with 4 characters, relative to 10 million 8 characters, only 4 characters is enough, fully compressed twice.

3. Base conversion

How do we do base conversion? Firstly, the corresponding relationship is shown in the following table:

Decimal number Hexadecimal number 62
0 0
. .
9 9
a 10
. .
z 35
A 36
. .
Z 61

Build two maps based on the table above: one map converts base 10 to base 62, called map_to_62, and the other map converts base 62 to base 10, called map_to_10. The Python code for building two maps looks like this:

import string
chars = string.digits + string.ascii_lowercase + string.ascii_uppercase
map_to_62 = dict(zip(range(62), chars))
map_to_10 = dict(zip(chars, range(62)))
Copy the code

Base 10 to base 62 Python implementation code is as follows:

def int_to_62chars(n) :
    chars = []
    while True:
        c = map_to_62[n % 62] // Generated above10Base number conversion62Base charactermap
        chars.append(c)
        n = n // 62
        if n == 0:
            break
    chars.reverse()
    return ' '.join(chars)
Copy the code

Base 62 to base 10 python implementation code is as follows:

def chars_to_10int(chars) :
    rv = 0
    for c in chars:
        rv = rv * 62+ map_to_10[c62Base character transfer10Base numericmap
    return rv
Copy the code

4. Short chain generation

Short chain generation interface call scenario: the operation fills in the H5 to be converted in the management background and calls the short chain generation interface. If this long link does not become too short, a record is inserted into the database and the short chain is constructed by its ID. If it has already been generated, build the short chain with the generated ID. The pseudocode is as follows:

def generate_short_url(url) :
    try:
        id= insert(url) // Try to writeexcept DuplicateKeyError:
        id= find_by_URL (url) // This error indicates a unique index conflict, that is, it has been insertedidShort_id = int_to_62chars(id) // The above implementation10Base number conversion62A function of base charactersreturn short_domain + '/' + short_id
Copy the code

5. short chain reduction

After the user clicks the short link, the server converts the short link ID into the database ID, and then checks the long link from mysql, and redirects to the long link. The pseudocode is as follows:

def parse_short_url(short_id) :
    id= chars_to_10int(short_id) // Previously implemented62Base string transfer10Base number url = find(id// Query the database for the long link redirect(url) //302redirectCopy the code

Summary analysis

advantages

  1. The implementation is very simple, with very few dependencies, only one mysql, and the database library tables are very simple
  2. Short chain restore only needs to perform one mysql query, considering the actual business scenario, many people will look up a short chain at the same time, considering the memory cache of mysql, mysql performance should be very good. A few thousand QPS queries are expected to be completely stress-free.

disadvantages

  1. Too many queries to have enough mysql (which is rarely the case)
  2. Since it is a short chain constructed by primary key ID, it is possible to guess all urls. However, whether this is important depends on the scene, such as the one Frank encountered.

Overall rating

This implementation is undoubtedly a very low cost of the short chain system, but for most companies is also fully enough, we must combine the actual business scene in the actual work to design the system, can not blindly invest resources to do a very awesome but not practical system.

Old article recommended:

  1. Wechat small program and double 叒 to change the important interface!
  2. Wechat standard version of the transaction component usage
  3. How to make money from small program advertising?

PS: Welcome to my mall project, Github, Gitee