Recently I overheard several developers sitting nearby talking about buying a car, so I went to browse the Autohome forum. Out of programmer habit I hit F12, and what did I see? Isn't this the font-based anti-crawling trick? And so the story began.

The first step

First, we need to get the overall article data

  • Send the request
  • Receive data
  • Parse the data

    import requests
    from lxml import etree

    url = "https://club.autohome.com.cn/bbs/thread/665330b6c7146767/80787515-1.html"
    # Build the request headers
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"}
    # Send the request
    res = requests.get(url=url, headers=headers)
    # Receive the data
    res_html = res.text
    # Parse the data
    html = etree.HTML(res_html)
    content_list = html.xpath('//div[@class="tz-paragraph"]//text()')
    print(content_list)

The printed result is

['from the new big G released ', '\uedff', ' instant, then give me the first ', '\uedff', ' the feeling of time, I have been deeply in love, I feel ', '\ued83', ' big G is every ', '\uedff', ' a man dream, and I will not be the same exception, so in terms of car selection, I also had some contradictions, ', '\uedff', '\uec5d', ' can choose too ', '\uec25', " such as Bentley, and Tianyue, can choose. When I was young, I really drove enough cars, from the BMW 7 series, to the present GLS, when you drive on the SUV, you don't want to drive a car, then want to ", '\uedff', ' want to compare has been 30', ...., '\uec25', " also can't always be the water army, I want to go to the Huo Huo big wave little sister"]

From the output, we can see that some characters come back as escape codes such as \uedff instead of readable text. Let's set that aside for now.

The second step

Let’s see how to concatenate the elements of this list into a text string using str.join(iterable)

    content_str = "".join(content_list)
    print(content_str)

    ----------------------------------------

'released from the new big G \uedff moments, at that time give me the first \uedff time feeling, I have been deeply in love, I feel \ued83 big G is every \uedff man dream, and I will not be the same exception, so in terms of car selection, I also have \uedff some contradictions, after all, in 200W left \uec5d can choose too \uec25, such as bentley, and add more, can choose, in the young is really enough to drive cars, from BMW 7 series, to the present GLS, when you drive on the SUV, do not want to drive a car, then want to \uedff want to compare has been 30\uec25 years old, should also be their own \uedff a dream, childhood classic car, square, I really like this matchbox so much that I finally decide \ued22 hands. Then decided \ued22 hand big G period ran over several 4S shop, no car! Don't have a car! Don't have a car! Or let me wait, I this person's personality is what \uec89, like must \ued83, I do not consider the later price ah, I personally feel \ued83 early buy is early enjoy, car is not to lose money, who let us like \uec89. Car buying Experience: As far as I know, the first batch of \uedff cars are all foreign-owned cars with their own procedures. There is no such procedure as qualification certificate. After all, \ued3c is urgent. If \ued3c is in a hurry, it can also wait for the 3C procedure. In that way, it can directly go to the local DMV to apply for a license, saving the trouble of transfer of ownership like mine. Because I mentioned \uedff GLS in Tianjin port before, I also bought \ued8c successfully at that time. Sales is also in the GLS forum at that time: Tianjin Dazhuang, this car or choose in his mention, the two lift car are \ued8c satisfaction, let a person should earn, mainly on the line, but also save the risk of being cheated. At present, my car has driven 2000\uec25km, I drive \ued3c or particularly comfortable, wide vision, as for fuel consumption, about 16 left \uec5d on the highway, if the inner region of the \uec2520, there is 550 sound wave is not as big as 63, also more comfortable, after all, I am not young. There is a sense of science and technology of the large wide screen, there is the new G back row space than the old comfortable \uec25, seat than the old comfortable \ued8c \uec25, I personally feel \ued83 old G back row sit \ued3c special hard, the new model does not have this phenomenon, at present all aspects are particularly satisfied, to say a point of weakness, I just feel \ued83 high-speed wind noise is a little rough work. \uec25 said, I think \ued83 should be a big movie, weekend with my own SLR to take \uedff some photos, \uecd0 \uecd0, just make do. This is the day of the car took a few photos of the bonded area, and tianjin landmark buildings, mobile phone photography, I personally feel \ued83 pretty beautiful \ued22 is a blockbuster coming, HD no code coming! Then \ued22 is the interior, do \uecd0 preparation. Recently the most popular online \uedff words, cheat philandering man open big G, slag female Mercedes E, \uecd0, write how \uec25, thanks to the home of car of the car to me help \ued8c \uec25, also can't always when the water army, I want to make big waves small sister'

The third step

Now we come to the key problem: how to get past the font-based anti-crawling. Font anti-crawling means that the text on the page is rendered with a custom TTF font file. The characters in the page source are no longer the real text, but code points that the custom font maps to the intended glyphs.
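To make this concrete, here is a minimal sketch that lists the suspicious code points in a string. It relies on the fact that codes such as \uedff fall in the Unicode Private Use Area (U+E000 to U+F8FF), which has no standard meaning and is therefore a natural home for custom glyphs. The function name private_use_chars and the sample string are made up for illustration.

```python
def private_use_chars(text):
    """Return the distinct Private Use Area characters in text, in order of first appearance."""
    seen = []
    for ch in text:
        # U+E000..U+F8FF is the Private Use Area of the Basic Multilingual Plane
        if 0xE000 <= ord(ch) <= 0xF8FF and ch not in seen:
            seen.append(ch)
    return seen


# Hypothetical sample modelled on the scraped output above
sample = "big G is every \uedff man dream, can choose too \uec25, every \uedff"
print([f"U+{ord(ch):04X}" for ch in private_use_chars(sample)])
```

Running this over the full content_str gives the set of codes that still need a replacement character.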

So what do we do? First, we need to find the site's custom TTF file:

  • In the browser's developer tools, inspect the text and check the font used by its style: myFont

  • View the page source and search for myFont

  • Open the font link in a new window to download the TTF file

  • Open the file in the Baidu font editor: each code corresponds one-to-one to a glyph.
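The search for the font link can also be scripted. The sketch below assumes the page source references the font through a CSS url(...) ending in .ttf; the helper name find_font_urls, the regular expression, and the example.com address are all assumptions for illustration, not taken from the real page.

```python
import re


def find_font_urls(page_source):
    """Return the .ttf URLs referenced by url(...) in the page source."""
    pattern = re.compile(r"""url\(\s*['"]?([^'")\s]+\.ttf)['"]?\s*\)""", re.IGNORECASE)
    urls = []
    for match in pattern.finditer(page_source):
        url = match.group(1)
        # Protocol-relative links ("//host/path") need a scheme before downloading
        if url.startswith("//"):
            url = "https:" + url
        urls.append(url)
    return urls


# Hypothetical @font-face rule, shaped like what a search for myFont might turn up
sample_source = """
<style>
@font-face {
  font-family: 'myFont';
  src: url('//fonts.example.com/custom-font.ttf') format('truetype');
}
</style>
"""
print(find_font_urls(sample_source))
```

Each URL found this way can then be fetched with requests.get(url).content and written to a local .ttf file.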

The fourth step

Now for the code: we need to replace the font codes with the corresponding Chinese characters.

  • Build the list of codes
  • Build the list of corresponding Chinese characters
  • Replace each code with its character, one by one

    from fontTools.ttLib import TTFont

    font = TTFont("wkghfvsuz1eah_vraabj9ps-ubk57..ttf")
    font.saveXML("fonts.xml")
    # Get the list of glyph names in the font
    uniList = font.getGlyphOrder()
    # Glyph names look like "uniEDFF"; turn each into its character '\uedff':
    #   1. drop the "uni" prefix
    #   2. prepend \u to the remaining hex digits
    #   3. eval the resulting string literal
    uni_list = []
    for i in uniList[1:]:  # skip the first entry (normally the .notdef glyph)
        i = eval(r"'\u" + i[3:] + "'")
        uni_list.append(i)
    # Characters in the same order as the glyphs, read off the font editor
    word_list = ['is', '5', 'more', 'well', 'big', '10', 'a', 'a', 'the', 'short', 'no', 'less', '9', '3', '8', 'a', 'right', 'bad', 'nearly', 'a', '?', 'left', 'is', 'long', '6', 'on', 'short', '7', 'high', '2', 'to', 'good', 'the', 'and', '4', 'to', 'small',
                 # ...
                 ]
    for i in range(len(uni_list)):
        # Replace the code at each position with the character at the same position
        content_str = content_str.replace(uni_list[i], word_list[i])
    print(content_str)
