Today's article is about font-based anti-crawling.

Websites currently known to use font-based anti-crawling include Maoyan, Autohome, Tianyancha, Qidian.com, and others.

I had read articles on this topic before, but today, while discussing it with a fellow developer, I finally worked through it hands-on and understood how font-based anti-crawling works. Let me walk you through it.


Third-party libraries used in this article

fontTools

1. Target website

url = "https://su.58.com/qztech/"

2. Anti-crawler mechanism

This is what the page shows in the browser:

And this is the same content in the page source:

As you can see above, the character "sheng" is garbled in the source. Pay special attention to the code indicated by the arrow.

3. Solution

1. Determine the anti-crawl method

After reading other people's analyses, I concluded that the site uses font-based anti-crawling: it defines its own font file and substitutes characters through it, so in the front end the text looks no different from normal text. You can also confirm this by inspecting the element:

Like Dianping's anti-crawling, it is all done through CSS.

2. Look for font files

Search the page source for the "customFont" name shown in the screenshot above, and you will find:

The font is embedded as base64. Decoding it directly seems to produce garbage, because the decoded content is binary font data. The fix is simple: save the decoded content as a .ttf file.
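A minimal sketch of this step in Python, assuming the page embeds the font as a base64 data URI inside the @font-face rule, something like `src:url('data:application/font-ttf;charset=utf-8;base64,...')`. The regex and the helper names `extract_font_bytes` and `save_font` are illustrative assumptions, not taken from the site; check the real markup before relying on the pattern.

```python
import base64
import re

# Assumption: the font is embedded as a data URI whose MIME type contains "font",
# e.g. data:application/font-ttf;charset=utf-8;base64,AAEAAA...
# Adjust this pattern if the real page formats it differently.
FONT_DATA_RE = re.compile(r"font[^;]*;(?:charset=[^;]*;)?base64,([A-Za-z0-9+/=]+)")


def extract_font_bytes(page_html: str) -> bytes:
    """Find the embedded base64 font and decode it to raw TrueType bytes."""
    match = FONT_DATA_RE.search(page_html)
    if match is None:
        raise ValueError("no base64-encoded font found in the page")
    # The decoded bytes are binary font data, which is why printing them looks garbled.
    return base64.b64decode(match.group(1))


def save_font(page_html: str, path: str) -> str:
    """Write the decoded font to `path` (e.g. '58_font_1.ttf') and return the path."""
    with open(path, "wb") as f:
        f.write(extract_font_bytes(page_html))
    return path
```

Given the page HTML (for example from `requests.get(url).text`), `save_font(html, '58_font_1.ttf')` writes a file that FontCreator or fontTools can open. Since the font changes on every visit, this has to run for each request.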

About TTF files: TTF (TrueType Font) is a font file format developed by Apple and later adopted by Microsoft. With the popularity of Windows, it has become the most commonly used font format. @font-face is a CSS3 rule that lets a web page embed custom web fonts.

To study the font, we need to open it. I used FontCreator; after opening the file, it looks like this (the font actually contains many characters, so I cropped the screenshot for clarity):

For each character you can clearly see both its glyph and its encoding.

Check whether the code at this arrow matches the one at the arrow in the page source. It does: that is how the mapping works.
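If you don't have FontCreator, fontTools can show the same code-point-to-glyph table. A small sketch, assuming `font` is a `fontTools.ttLib.TTFont`; `list_cmap` is a hypothetical helper name, while `getBestCmap()` is the fontTools method that returns the font's Unicode cmap as a `{code point: glyph name}` dict.

```python
def list_cmap(font):
    """Return (code point, glyph name) pairs from the font's Unicode cmap, sorted.

    `font` is expected to be a fontTools TTFont; getBestCmap() returns a dict
    mapping integer code points to glyph names.
    """
    return [(f"U+{cp:04X}", name) for cp, name in sorted(font.getBestCmap().items())]
```

For example, `list_cmap(TTFont('58_font_1.ttf'))` lists every code point next to its glyph name; the code point at the arrow in the page source should appear in this list.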

So the obvious approach seems to be: find the code at the arrow in the page source, look up the matching glyph in the font, and replace it with the real character.

If that's what you're thinking, congratulations: you've fallen into the trap.

On every visit the glyph shapes stay the same, but the character encodings change. Therefore, we need to parse the font file dynamically on every request.

Visit 1:

Visit 2:

So hard-coding the mapping is not going to work.

At this point we are going to take a deeper look at font files.

3. Study font files

We can't read the contents of a .ttf file directly, so we convert the font file to XML and inspect that.

Here's how:

from fontTools.ttLib import TTFont

# Load the font saved from the page and dump all of its tables to XML
font_1 = TTFont('58_font_1.ttf')
font_1.saveXML('font_1.xml')

The resulting XML looks like this:

The file is long, so this is only an excerpt.
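The `<pt>` elements in this dump are what the comparison described next relies on. A sketch of pulling them out with Python's standard `xml.etree.ElementTree`, assuming the usual TTX layout produced by `saveXML` (`<glyf>` containing `<TTGlyph name=...>`, which contains `<contour>` elements holding `<pt x=... y=... on=...>`); `load_outlines` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def load_outlines(source):
    """Map each glyph name in a TTX dump to a tuple of (x, y, on) points.

    `source` is a file path or file object. Points from all contours are
    concatenated in order; glyphs without contours (e.g. a space) get ().
    """
    root = ET.parse(source).getroot()
    outlines = {}
    for glyph in root.iter("TTGlyph"):
        outlines[glyph.get("name")] = tuple(
            (int(pt.get("x")), int(pt.get("y")), int(pt.get("on")))
            for pt in glyph.iter("pt")
        )
    return outlines
```

For example, `load_outlines('font_1.xml')` reads the file written by `saveXML` and returns a dictionary keyed by glyph name.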

If you look closely, you'll see that the x, y, and on values of the points under these two glyphs are exactly the same. So the idea is to take one known font file as the base, and then compare the x, y, and on values of each glyph in a new font file against it. If they are the same, the glyph in the new font corresponds to that character in the base font. That's a bit convoluted, so here is a small example.

Suppose the glyph for the character "I" in the base font is named uni1, with x=1, y=1, on=1. If a glyph named uni2 in the new font file has the same x, y, and on values, we can conclude that uni2 represents the character "I".
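The idea in this example fits in a few lines of Python. This sketch assumes outlines are given as `{glyph name: tuple of (x, y, on) points}` (for instance, read from the XML dumps), plus a hand-made dictionary `base_glyph_to_char` that labels the base font's glyphs; both names are illustrative. Because tuples are hashable, an exact match is just a dictionary lookup.

```python
def map_by_exact_outline(base_outlines, base_glyph_to_char, new_outlines):
    """Map glyph names in a new font to characters via identical outlines.

    base_outlines / new_outlines: {glyph name: tuple of (x, y, on) points}
    base_glyph_to_char: {base glyph name: real character}, labelled by hand
    """
    # Index the labelled base glyphs by their outline; skip empty outlines,
    # which would otherwise all collide on the same key.
    outline_to_char = {
        base_outlines[name]: char
        for name, char in base_glyph_to_char.items()
        if base_outlines.get(name)
    }
    return {
        name: outline_to_char[outline]
        for name, outline in new_outlines.items()
        if outline in outline_to_char
    }
```

In the example above, uni2 in the new font has the same points as uni1 in the base font, so the result would contain `{"uni2": "I"}`.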

While checking the data, I found that in some special cases the x and y values of corresponding glyphs in the two fonts are not exactly equal, but the differences stay within a certain threshold. These are handled the same way as above, except that instead of testing for equality, we check whether the difference falls within the threshold.
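A sketch of the threshold comparison, using `{glyph name: tuple of (x, y, on)}` outlines and a hand-labelled `base_glyph_to_char` dict (both illustrative). It requires the same point count and on-curve flags, then picks the base glyph with the smallest maximum coordinate difference, provided that difference is within `tolerance` font units. The default of 10 is a placeholder assumption, not a value from this article; tune it against real fonts.

```python
def max_deviation(a, b):
    """Largest |dx| or |dy| between corresponding points, or None if the
    outlines have different point counts or on-curve flags."""
    if len(a) != len(b) or any(p[2] != q[2] for p, q in zip(a, b)):
        return None
    return max((max(abs(p[0] - q[0]), abs(p[1] - q[1])) for p, q in zip(a, b)), default=0)


def map_by_close_outline(base_outlines, base_glyph_to_char, new_outlines, tolerance=10):
    """Map each new glyph to the character of the closest base glyph within `tolerance`."""
    result = {}
    for new_name, new_outline in new_outlines.items():
        if not new_outline:
            continue  # empty glyphs carry no shape information
        best_char, best_dev = None, None
        for base_name, char in base_glyph_to_char.items():
            dev = max_deviation(new_outline, base_outlines.get(base_name, ()))
            if dev is not None and dev <= tolerance and (best_dev is None or dev < best_dev):
                best_char, best_dev = char, dev
        if best_char is not None:
            result[new_name] = best_char
    return result
```

Choosing the closest candidate rather than the first one within the threshold guards against two similar-looking base glyphs both falling inside the tolerance.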

In fact, if you plot the x and y values above with a drawing tool and connect the dots, you'll see that they form the Chinese character.

So, to summarize:

1. Save the font file from one request locally as the base font.
2. Open it in font software and manually record which character each glyph encodes (make sure the order is correct, or things will go wrong).
3. On every later visit to the page, save the newly served font file.
4. Use the fontTools library to process the base font and the new font, and find the mapping between the new font and the base font.
5. Replace the garbled characters.
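Step 5 (replacement) can be sketched as follows. It assumes the garbled characters reach you either as literal characters or as HTML numeric entities such as `&#x9fa4;` (an assumption about the markup), that `cmap` is the new font's `{code point: glyph name}` table (for example from fontTools' `TTFont.getBestCmap()`), and that `glyph_to_char` is the glyph-to-character mapping found in step 4. `decode_text` is a hypothetical name.

```python
import html


def decode_text(raw, cmap, glyph_to_char):
    """Replace obfuscated characters with the real characters they render as.

    raw: text or HTML fragment from the page
    cmap: {code point: glyph name} for the font served with this page
    glyph_to_char: {glyph name: real character} from comparing against the base font
    """
    text = html.unescape(raw)  # turn entities like &#x9fa4; into characters
    out = []
    for ch in text:
        glyph = cmap.get(ord(ch))
        # Characters the custom font doesn't cover are kept unchanged.
        out.append(glyph_to_char.get(glyph, ch) if glyph is not None else ch)
    return "".join(out)
```

Because `cmap` comes from the font served with the same response, the code points can change between visits without breaking the lookup; only the glyph outlines need to stay stable.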

4. The code

Code looks really ugly in WeChat articles.

So I won't paste it here: send the keyword "font encryption" to the WeChat official account to get the GitHub address.

Here are the results:

Conclusion

The biggest problem with this approach is that the base-font dictionary we entered by hand may change, which means it will have to be updated manually later.

Now that you've read this article, why not try it on a few other sites?

If you have any questions, please let us know.