Make writing a habit together! This is the fifth day of my participation in the "Juejin Daily New Plan · April Writing Challenge".

All tutorials, source code, and software in this article are for technical research only. They do not involve deleting, modifying, adding to, or interfering with the functions of any computer information system, and will not affect the normal operation of any computer information system. Do not use the code for illegal purposes. If anything here infringes your rights, it will be removed immediately!

Scraping Dongchedi's comprehensive word-of-mouth data with Python

Requirements

Collect the comprehensive word-of-mouth data (advantages and disadvantages) for every car model on Dongchedi.

Runtime environment

  • Windows 10
  • Google Nexus 5X (rooted)
  • Python 3.9
  • Charles

Requirements analysis

First, try the web version to see whether the required data interface can be found there. Open the word-of-mouth page of any car model, press F12, and check the Network tab. Searching by keywords taken from the page turned up no obvious data interface. requests or Selenium could be used to parse the data directly from the page, but that is not the preferred solution. Let's analyze the app as well and then decide which approach to use.

PS: Configuring the phone and the packet capture environment is not covered here. If you are interested, see the previous article, "Configure the APP packet capture environment".

Download the Dongchedi app and install it on the phone. Start Postern on the phone and Charles on the PC.

At this point the packet capture setup is complete. Open the Dongchedi app, pick any vehicle, and enter its "understand car" page. Then, as before, run a keyword search over the captured traffic. Of the results, the last two are clearly not needed, and the first four are returned by the same interface, so that should be the data we need. Double-click an entry to see the detailed data.

The data structure and the specific values match the data shown on the page. The Charles window is too small for comfortable reading, so I copied the data into a web page for analysis; a commonly used online JSON parsing site works well for this. After carefully comparing the result with the data on the page, we can confirm that this is the comprehensive word-of-mouth interface we need:

https://*******/get_detail/?series_id=4182&car_id=0&only_owner=0&year_id=all&iid=2467735824764398&device_id=40011211486215&ac=wifi&channel=DCDuseful-and-yd-11-74&aid=36&app_name=automobile&version_code=693&version_name=6.9.3&device_platform=android&os=android&ab_client=a1%2Cc2%2Ce1%2Cf2%2Cg2%2Cf7&ab_group=3167590%2C3577236%2C3333988&ssmix=a&device_type=Nexus+5X&device_brand=google&language=zh&os_api=27&os_version=8.1.0&manifest_version_code=693&resolution=1080*1794&dpi=420&update_version_code=6931&_rticket=1648907286543&cdid=f3163204-7faf-45d7-89c4-e82215c3216c&city_name=%E8%81%8A%E5%9F%8E&gps_city_name=%E8%81%8A%E5%9F%8E&selected_city_name&rom_version=27&longi_lati_type=1&longi_lati_time=1648907102913&content_sort_mode=0&total_memory=1.77&cpu_name=Qualcomm+Technologies%2C+Inc+MSM8992&overall_score=4.8872&cpu_score=4.873&host_abi=

That's right, you read it correctly: it really is this long. To verify the data interface, request the URL directly in a browser. Installing a JSON viewer browser plug-in is recommended here; I did not install one and parsed the JSON online instead. The result is the same as the data Charles captured. Analysis shows that series_id is the vehicle ID, so changing this parameter returns the data for a different vehicle.
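Swapping series_id by hand in a URL this long is error-prone, so here is a minimal sketch that does it with the standard library. The helper name with_series_id and the shortened example URL (with a placeholder host) are assumptions for illustration only; the real URL carries many more parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_series_id(url: str, series_id: int) -> str:
    """Return `url` with its series_id query parameter set to `series_id`."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves parameters that have no value.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(k, str(series_id)) if k == "series_id" else (k, v) for k, v in query]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened, made-up URL; the host is a placeholder, not the real endpoint.
example = "https://example.invalid/get_detail/?series_id=4182&car_id=0&only_owner=0&year_id=all"
print(with_series_id(example, 100))
```

Note that parse_qsl decodes the query and urlencode re-encodes it, so percent-encoded values may come back in a slightly different but equivalent form.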

Get all vehicle IDs

Getting the vehicle IDs is straightforward: first get the brand IDs, then request the vehicle information for each brand ID. Note that this is a POST interface.

import json

import requests


def get_series(self, brand_id):
    """Return the list of vehicle series for the given brand ID."""
    # `url` is the POST endpoint for this lookup and must be defined elsewhere.
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
    }
    param = {
        'offset': 0,
        'limit': 1000,
        'is_refresh': 1,
        'city_name': 'Beijing',
        'brand': brand_id,
    }
    response = requests.post(url=url, data=param, headers=headers)
    rep_json = json.loads(response.text)
    # print(response.text)
    if rep_json['status'] == 'success':
        return rep_json['data']['series']
    else:
        raise Exception("get car series has exception!")
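The two-step lookup described above (brand IDs first, then the series under each brand) can be sketched as a small loop. This is only an illustration under assumptions: collect_series_ids, fake_fetch, and the "id" field name are hypothetical, and the real field name has to be checked against the actual response. The fetcher is passed in as a function so the sketch runs without network access; in practice it would be get_series.

```python
from typing import Callable, Dict, Iterable, List


def collect_series_ids(
    brand_ids: Iterable[int],
    fetch_series: Callable[[int], List[Dict]],
    id_key: str = "id",  # hypothetical field name; verify against the real response
) -> List:
    """Fetch the series list of every brand and return unique series IDs in first-seen order."""
    seen = {}
    for brand_id in brand_ids:
        for series in fetch_series(brand_id):
            # dict keys keep insertion order, which also removes duplicates.
            seen.setdefault(series[id_key], None)
    return list(seen)


def fake_fetch(brand_id: int) -> List[Dict]:
    """Stand-in for get_series, returning made-up data."""
    data = {1: [{"id": 10}, {"id": 11}], 2: [{"id": 11}, {"id": 12}]}
    return data.get(brand_id, [])


print(collect_series_ids([1, 2], fake_fetch))
```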

Get the comprehensive word-of-mouth ratings for a vehicle

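def get_score(self, series_id):
    """Return [merits, defects] tag lists for the given vehicle ID."""
    # `url` is the get_detail URL with series_id substituted in;
    # `url` and `headers` must be defined before this call.
    response = requests.get(url=url, headers=headers).json()
    tag_list = response.get('data').get('tab_info').get('tag_list')
    data = list()
    # Advantages: tags with sentiment == 1
    merits = [i.get('tag_name') + "(" + str(i.get('count')) + ")"
              for i in tag_list if i.get('sentiment') == 1]
    data.append(merits)
    # Disadvantages: tags with sentiment == -1
    defects = [i.get('tag_name') + "(" + str(i.get('count')) + ")"
               for i in tag_list if i.get('sentiment') == -1]
    data.append(defects)
    return data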
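To make the shape of the returned data concrete, here is a small offline sketch of the same filtering step. The function name split_tags and the sample tags are made up for illustration; only the tag_name, count, and sentiment keys come from the code above.

```python
def split_tags(tag_list):
    """Split tags into [merits, defects], each formatted as "name(count)"."""
    def label(tag):
        return f"{tag.get('tag_name')}({tag.get('count')})"

    merits = [label(t) for t in tag_list if t.get('sentiment') == 1]
    defects = [label(t) for t in tag_list if t.get('sentiment') == -1]
    return [merits, defects]


# Made-up sample data; real tag names and counts come from the interface.
sample = [
    {"tag_name": "spacious", "count": 120, "sentiment": 1},
    {"tag_name": "noisy", "count": 35, "sentiment": -1},
    {"tag_name": "other", "count": 8, "sentiment": 0},
]
print(split_tags(sample))
```

Tags whose sentiment is neither 1 nor -1 are simply left out of both lists, as in the method above.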

Results

Download resources

Download.csdn.net/download/qq…


This article is for learning and exchange only. If it infringes any rights, it will be removed!