Reading this article takes about 7 minutes.



01 The Crawl Target

This time our goal is to crawl the top 10 coldest cities in the country.

First, open the target website, China Weather network:

http://www.weather.com.cn/textFC/hb.shtml

We are going to crawl the lowest temperature of every city in the country, take the 10 cities with the lowest temperatures, and draw a bar chart from them.


02 Preparations

Because the amount of data to crawl is small, BeautifulSoup is a good fit for this task.

In addition, since we need to generate a chart from the data at the end, we also need to install pyecharts and its compatibility libraries; otherwise the console will report errors.

# Install BeautifulSoup
pip3 install bs4

# html5lib improves parsing compatibility
pip3 install html5lib

# Install the charting library
pip3 install pyecharts

# Install the chart snapshot compatibility library
pip3 install pyecharts-snapshot


Note: crawlers generally use lxml to parse HTML, but many tags in the source code of the "Hong Kong, Macao and Taiwan" page of China Weather network are not closed correctly, so html5lib is used to parse the data instead.
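As a quick illustration of why the parser choice matters, here is a minimal sketch (the broken snippet below is made up for demonstration, not the actual page source) showing html5lib repairing unclosed tags the way a browser would:

```python
from bs4 import BeautifulSoup

# A made-up snippet with unclosed <td> tags, similar to the problem
# on the Hong Kong/Macao/Taiwan page
broken = "<table><tr><td>Hong Kong<td>8</tr></table>"

# html5lib rebuilds a well-formed tree, auto-closing the stray tags
soup = BeautifulSoup(broken, "html5lib")
cells = [td.get_text() for td in soup.find_all("td")]
print(cells)  # ['Hong Kong', '8']
```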



03 Crawling Approach

First of all, we can see that the site displays the country's weather data in 8 regions: North China, Northeast China, East China, Central China, South China, Northwest China, Southwest China, and Hong Kong, Macao and Taiwan.

# The 8 regions: North China, Northeast China, East China, Central China,
# South China, Northwest China, Southwest China, Hong Kong/Macao/Taiwan

# North China
url_hb = 'http://www.weather.com.cn/textFC/hb.shtml'

# Northeast China
url_db = 'http://www.weather.com.cn/textFC/db.shtml'

# East China
url_hd = 'http://www.weather.com.cn/textFC/hd.shtml'

# Central China
url_hz = 'http://www.weather.com.cn/textFC/hz.shtml'

# South China
url_hn = 'http://www.weather.com.cn/textFC/hn.shtml'

# Northwest China
url_xb = 'http://www.weather.com.cn/textFC/xb.shtml'

# Southwest China
url_xn = 'http://www.weather.com.cn/textFC/xn.shtml'

# Hong Kong, Macao and Taiwan
url_gat = 'http://www.weather.com.cn/textFC/gat.shtml'

url_areas = [url_hb, url_db, url_hd, url_hz, url_hn, url_xb, url_xn, url_gat]

We first need to obtain the weather data of all cities in each region, then sort the data and write it to the chart file.
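Downloading a region page can be sketched as follows. Note this fetch step is my own assumption: the function name `fetch_page`, the use of the requests library, and the User-Agent value are illustrative and not part of the original code.

```python
import requests

def fetch_page(url):
    # Pretend to be a normal browser; some sites block the default UA
    # (the exact header value here is an illustrative assumption)
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    # Decode the raw bytes explicitly as UTF-8
    return response.content.decode('utf-8')
```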


04 Code Implementation

First, let's crawl the weather data for the cities in "North China".

The following rules can be derived from the page source:

There are 6 divs with "class=conMidtab", which store the weather data for all cities in North China for today and the coming week.

Each div with "class=conMidtab2" represents the weather information for one province.

The weather data for the cities of a province sits under a table tag; the first two tr tags are headers, and each tr tag from the third onward holds the weather data of one city.
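Schematically, the structure described above looks roughly like this (a simplified sketch, not the verbatim markup of the page):

```html
<div class="conMidtab">       <!-- one of 6: today and the following days -->
  <div class="conMidtab2">    <!-- one province -->
    <table>
      <tr>...</tr>            <!-- header row 1 -->
      <tr>...</tr>            <!-- header row 2 -->
      <tr><td>Beijing</td>...<td>-8</td><td>...</td></tr>  <!-- one city -->
    </table>
  </div>
</div>
```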

# 'text' is the HTML of the region page (e.g. downloaded with requests)
soup = BeautifulSoup(text, 'html5lib')

div_conMidtab = soup.find('div', class_='conMidtab')

# 3. Get all the table sub-tags
tables = div_conMidtab.find_all('table')

# 4. Traverse the provinces in the region
for table in tables:
    # Filter out the two header tr rows
    trs = table.find_all('tr')[2:]

    # 5. Traverse the cities in the province
    for index, tr in enumerate(trs):
        tds = tr.find_all('td')

        # 5.1 City name [the 1st td tag]
        # Note: for the first city of a province the name sits in the
        # 2nd td tag; for the other cities it is in the 1st td tag
        city_td = tds[1] if index == 0 else tds[0]

        city = list(city_td.stripped_strings)[0]

        # 5.2 Lowest temperature [the 2nd-to-last td tag]
        temp_low_td = tds[-2]

        temp_low = list(temp_low_td.stripped_strings)[0]

        ALL_DATA.append({"city": city, "temp_low": int(temp_low)})
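The `stripped_strings` generator used above yields a tag's text fragments with surrounding whitespace removed. A tiny standalone example (using the built-in parser for brevity; the cell markup is made up):

```python
from bs4 import BeautifulSoup

# A city cell typically wraps the name in a link plus stray whitespace
td = BeautifulSoup('<td>\n  <a href="#">Beijing</a>\n</td>', 'html.parser').td
print(list(td.stripped_strings))  # ['Beijing']
```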

Next, loop through the list of the eight regions to get the city names and temperature data for the whole country.

for index, url in enumerate(url_areas):
    print('Start crawling region {}'.format(index + 1))
    parse_page(url)
    time.sleep(1)

Once you have the list of all cities and temperatures, you can sort the data in ascending order by temperature and take the first 10 records.

def analysis_data():
    """Sort the crawled data and return the 10 coldest cities."""
    # 1. sort() defaults to ascending order [sort by lowest temperature]
    ALL_DATA.sort(key=lambda data: data['temp_low'])

    # 2. Take the first 10 records
    top_10 = ALL_DATA[:10]

    return top_10
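A quick sanity check of the sort-and-slice logic on a hand-made sample (the city names and values below are illustrative, not crawled data):

```python
# Illustrative sample records in the same shape as ALL_DATA
sample = [
    {"city": "Sanya", "temp_low": 22},
    {"city": "Mohe", "temp_low": -35},
    {"city": "Harbin", "temp_low": -24},
]

sample.sort(key=lambda data: data['temp_low'])  # ascending by lowest temp
coldest_two = sample[:2]
print([d['city'] for d in coldest_two])  # ['Mohe', 'Harbin']
```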

Finally, the data is written to a bar chart.

def show_with_chart(top_10):
    """Generate a bar chart from the 10 cities and their temperatures."""
    # Apply an anonymous function to each item of top_10 to build new lists
    # 1. The list of cities
    citys = list(map(lambda item: item['city'], top_10))

    # 2. The list of lowest temperatures
    temp_lows = list(map(lambda item: item['temp_low'], top_10))

    # 3. Generate a bar chart and write it to an HTML file
    bar = Bar("Lowest temperature list")

    bar.add("Minimum temperature", citys, temp_lows)

    # Render the chart to a local file
    bar.render('temperature.html')



05 Crawl Results

Finally, open the generated bar chart to see, at a glance, the 10 cities with the lowest temperatures today.


This article was first published on the public account "AirPython". You can get the full code by replying "Cold" in the background.