September is here with the fragrance of osmanthus. In this crisp, sunny harvest season, we have sent the children (summer allowances exhausted, in tears) back to school, and we are about to welcome the motherland's annual birthday party (no work, and I can't wait to celebrate the motherland's birthday).


Which begs the question: where to go? I typed "National Day" into Baidu, and the first autocomplete result was actually "where to travel with fewer people"… Emmmmmm, a question worth pondering indeed.


So I came up with the (rather dangerous) idea of judging the recent popularity of tourist spots by their ticket sales.


So the goal this time is to crawl the scenic-spot pages and extract each spot's information; let's think about how many steps that needs.




This article is recommended for readers with some foundation in Python and front-end development (HTML, JS). Ahem, I can't keep writing beginner-only posts forever; it makes me look unprofessional (hee).


Baidu Map API and Echarts


Because my first few crawlers all scraped text and made word clouds and the like, I thought: bo! ring! This time I happen to be crawling numbers, so I decided to let data's good gay friend, the chart, present what I scraped; that is, use the scenic spots' sales volumes and locations to generate some visualizations.


Take a look at the Baidu Map API and Echarts. The former is a dedicated map API tool (I hear many apps use it), and the latter is data's good partner at home and on the road: after using it, it is good and I am good (something feels off here).


What is an API? An API is an application programming interface. Think of a plug and a socket: our program needs electricity (what program is this?), the socket provides it, and we just write a plug in our program that matches the socket. Then we can use the electricity to do what we want, without needing to know how it was generated.



Baidu heat map after the data is loaded



Developers and service providers connected by API

Determine the output file

Some people might say: I already know what an API means, but how do I use one? Let me tell you responsibly: neither do I.


But! Baidu Map provides plenty of API usage examples. With some HTML you can roughly read them; with some JS you can try tweaking the functions (no JS? Then quietly copy the source code, like me). After carefully reading the source, we can see that the heat map's data live in the variable points.


Data of the form [{x:x,x:x},{x:x,x:x}] is JSON. Because it is self-describing, it is easy to read: each point here holds three values, the first two being longitude and latitude and the last presumably the weight (my guess).


That is to say, if I want to turn scenic spots' popularity into a heat map, I need each spot's longitude and latitude plus a weight. Sales volume can serve as the weight, and the data should be presented in JSON format.
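To make that concrete, here is a minimal sketch of the conversion (the lng/lat/count keys mirror the points variable in Baidu's example; the coordinates are made up, and dividing sales by 100 for the weight is just one plausible scaling):

```python
import json

def to_points(spots):
    """Turn (lng, lat, sales) tuples into the heat map's points format."""
    return [{"lng": lng, "lat": lat, "count": sales // 100}  # sales as weight
            for lng, lat, sales in spots]

# hypothetical coordinates and sales for two scenic spots
spots = [(121.506, 31.245, 1200), (116.403, 39.915, 8600)]
points_json = json.dumps(to_points(spots))
```

The resulting string can be written straight into the points.json file the map page will read.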


Echarts does the same (*^__^*).

Crawl data

The crawler part this time is relatively easy: analyze the URL (Qunar's scenic-spot list) → crawl the information on each page (scenic-spot details, sales volume) → convert it to a JSON file.


Analyzing the URLs of Qunar's scenic-spot pages, we can conclude the structure: http://piao.qunar.com/ticket/list.htm?keyword=<search term>&region=&from=mpl_search_suggest&page=<page number>
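Since the keyword ends up in a query string, it is worth percent-encoding it; a small sketch (Python 3's urllib assumed, parameter names copied from the URL structure above):

```python
from urllib.parse import quote

BASE = 'http://piao.qunar.com/ticket/list.htm'

def build_url(keyword, page):
    # quote() percent-encodes the keyword so Chinese search terms
    # survive the query string intact
    return '{}?keyword={}&region=&from=mpl_search_suggest&page={}'.format(
        BASE, quote(keyword), page)

url = build_url('北京', 1)
```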


Instead of matching the content with regular expressions, I used XPath, which is very handy.



def getList():
    # requires: from lxml import etree; import re, time; getPage() defined elsewhere
    place = raw_input('Please enter a search region or type (e.g. Beijing, popular): ')
    url = 'http://piao.qunar.com/ticket/list.htm?keyword=' + str(place) + '&region=&from=mpl_search_suggest&page={}'
    i = 1
    sightlist = []
    while i:
        page = getPage(url.format(i))
        selector = etree.HTML(page)
        print 'Crawling page ' + str(i)
        i += 1
        informations = selector.xpath('//div[@class="result_list"]/div')
        for inf in informations:
            sight_name = inf.xpath('./div/div/h3/a/text()')[0]
            sight_level = inf.xpath('.//span[@class="level"]/text()')
            if len(sight_level):
                sight_level = sight_level[0].replace('景区', '')  # strip the "scenic area" suffix
            else:
                sight_level = 0
            sight_area = inf.xpath('.//span[@class="area"]/a/text()')[0]
            sight_hot = inf.xpath('.//span[@class="product_star_level"]//span/text()')[0].replace('热度 ', '')
            sight_add = inf.xpath('.//p[@class="address color999"]/span/text()')[0]
            sight_add = re.sub(u'地址：|（.*?）|\(.*?\)|,.*?$|\/.*?$', '', str(sight_add))
            sight_slogen = inf.xpath('.//div[@class="intro color999"]/text()')[0]
            sight_price = inf.xpath('.//span[@class="sight_item_price"]/em/text()')
            if len(sight_price):
                sight_price = sight_price[0]
            else:
                i = 0
                break
            sight_soldnum = inf.xpath('.//span[@class="hot_num"]/text()')[0]
            sight_url = inf.xpath('.//h3/a[@class="name"]/@href')[0]
            sightlist.append([sight_name, sight_level, sight_area, float(sight_price), int(sight_soldnum), float(sight_hot), sight_add.replace(u'地址：', ''), sight_slogen, sight_url])
        time.sleep(3)
    return sightlist, place


  • All the information for each attraction is crawled here (mostly just to practice using XPath…).
  • A while loop drives the paging; when no sales figure is found, the for loop sets i to 0 and breaks, so the while loop ends with it.
  • The address matching uses re.sub() to strip out n kinds of messy information, as explained later.
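To give a taste of what that re.sub() cleanup does, here is a simplified stand-alone version (the pattern and the sample addresses are illustrative, not the exact ones from the crawl):

```python
import re

# Strip a leading "地址：" (address:) label, anything in full- or half-width
# parentheses, and everything after a comma or a slash.
PATTERN = u'地址：|（.*?）|\\(.*?\\)|,.*?$|/.*?$'

def clean_address(addr):
    return re.sub(PATTERN, '', addr)

print(clean_address(u'地址：上海市浦东新区世纪大道1号（东方明珠对面）'))
```

The trailing "$" alternatives are what drop the turn-left-turn-right style directions appended after commas and slashes.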


Output to a local file


To keep the crawled data safe from any later code errors (and keep my peace of mind), the scraped information is saved to an Excel file for future reference.

def listToExcel(list, name):
    # requires: import pandas as pd
    df = pd.DataFrame(list, columns=['name', 'level', 'area', 'starting price', 'sales', 'popularity', 'address', 'slogan', 'detail url'])
    df.to_excel(name + '.xlsx')


Baidu latitude and longitude API


Very sadly, I could not find longitude and latitude anywhere on Qunar's scenic-spot pages, and I thought my grand plan would die right there. (If anyone knows where the coordinates hide, please tell me.)


But hahahaha, how could I give up? I found Baidu's latitude-and-longitude API.


Take http://api.map.baidu.com/geocoder/v2/?address=<address>&output=json&ak=<Baidu key>, fill in the "address" and "Baidu key" parts, open it in a browser, and you can see the latitude/longitude JSON.

# Latitude and longitude of the Oriental Pearl Tower, Shanghai
{"status":0,"result":{"location":{"lng":121.5064701060957,"lat":31.245341811634675},"precise":1,"confidence":70,"level":"UNKNOWN"}}
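Pulling the coordinates out of such a response is just a two-key lookup; a sketch using the sample above (the response is hard-coded here instead of fetched with requests):

```python
import json

# the sample geocoder response for the Oriental Pearl Tower
sample = ('{"status":0,"result":{"location":{"lng":121.5064701060957,'
          '"lat":31.245341811634675},"precise":1,"confidence":70,'
          '"level":"UNKNOWN"}}')

data = json.loads(sample)
if data['status'] == 0:                      # 0 means success in Baidu's API
    loc = data['result']['location']
    lng, lat = loc['lng'], loc['lat']
```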


Baidu key application methods: http://jingyan.baidu.com/article/363872eccda8286e4aa16f4e.html


So, for each scenic-spot address I crawled, I can look up the corresponding latitude and longitude. Hot! The Python code to fetch the coordinate JSON looks like this:

def getBaiduGeo(sightlist, name):
    ak = 'your key'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    }
    address = address   # the scenic-spot address goes here
    url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
    json_data = requests.get(url = url).json()
    json_geo = json_data['result']['location']


Looking at the JSON we get back, the data inside location is almost exactly the JSON format the Baidu map API needs; we only have to add each spot's sales volume into it. This is a good place to learn about shallow and deep copies of dicts. Finally, the assembled JSON is written out to local files.
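Why the copy question matters: a dict parsed from JSON is mutable, and a shallow copy still shares its nested dicts. A quick illustration with Python's copy module (toy data, not the crawler's):

```python
import copy

geo = {'location': {'lng': 121.5, 'lat': 31.2}}

shallow = copy.copy(geo)        # new outer dict, but the inner dict is shared
deep = copy.deepcopy(geo)       # fully independent copy

shallow['location']['lng'] = 0  # this also mutates geo['location']
```

So if you want to reuse a location dict while also stuffing extra keys into it, a deep copy (or a fresh dict) is the safe route.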

def getBaiduGeo(sightlist, name):
    ak = 'your key'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    }
    list = sightlist
    bjsonlist = []
    ejsonlist1 = []
    ejsonlist2 = []
    num = 1
    for l in list:
        try:
            try:
                try:
                    address = l[6]
                    url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                    json_data = requests.get(url = url).json()
                    json_geo = json_data['result']['location']
                except KeyError, e:
                    address = l[0]
                    url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                    json_data = requests.get(url = url).json()
                    json_geo = json_data['result']['location']
            except KeyError, e:
                address = l[2]
                url = 'http://api.map.baidu.com/geocoder/v2/?address=' + address + '&output=json&ak=' + ak
                json_data = requests.get(url = url).json()
                json_geo = json_data['result']['location']
        except KeyError, e:
            continue
        json_geo['count'] = l[4]/100
        bjsonlist.append(json_geo)
        ejson1 = {l[0]: [json_geo['lng'], json_geo['lat']]}
        ejsonlist1 = dict(ejsonlist1, **ejson1)
        ejson2 = {'name': l[0], 'value': l[4]/100}
        ejsonlist2.append(ejson2)
        print 'Fetched coordinates for spot ' + str(num)
        num += 1
    bjsonlist = json.dumps(bjsonlist)
    ejsonlist1 = json.dumps(ejsonlist1, ensure_ascii=False)
    ejsonlist2 = json.dumps(ejsonlist2, ensure_ascii=False)
    with open('./points.json', "w") as f:
        f.write(bjsonlist)
    with open('./geoCoordMap.json', "w") as f:
        f.write(ejsonlist1)
    with open('./data.json', "w") as f:
        f.write(ejsonlist2)


When geocoding, I matched on the scenic spot's address for better precision. But addresses come in all magical varieties: parentheses explaining it is "opposite XX", piles of turn-left-then-right directions, even English…


Hence the messy-information removal back in chapter 3 (I finally circled back to it!).


Even after removing the messy information, some spots still fail to geocode, so I nested the try blocks: if the address doesn't match, try the spot's name; if the name doesn't match, try its area; and if that still fails, then I… then I can… then I'll just skip over the rest of them…
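That nested try can also be flattened into a loop over fallback fields; a sketch (the geocode argument is a hypothetical stand-in for the Baidu request, injected so the idea can be tried offline; the indices follow the sightlist layout: 6 = address, 0 = name, 2 = area):

```python
def lookup_location(record, geocode):
    """Try the address (index 6), then the name (0), then the area (2).

    `geocode` maps a query string to a location dict or raises KeyError,
    mirroring the json_data['result']['location'] access in the crawler.
    Returns None when every fallback fails, i.e. the spot is skipped.
    """
    for idx in (6, 0, 2):
        try:
            return geocode(record[idx])
        except KeyError:
            continue
    return None
```

The behavior is the same as the nested version, but adding a fourth fallback is a one-character change.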


How can you, a destination, be so hard to find! Fine, I don't want you!


Three JSON files are generated here: one for the Baidu Map API to load, the other two for Echarts.


The web page reads the JSON file


Copy the source code of the Baidu Map API example from chapter 2 into an editor, add your key, save it as an HTML file, and open it to see the same effect as on the official site.


For echarts, click EN in the upper right corner of the page to switch to the English version, and then click Download Demo to download the full source code.


Modify the page source code to import the JSON file (see reference 9 on importing JSON into HTML).

# Parts to modify in the Baidu Map API sample code
<head>
    <script src="http://libs.baidu.com/jquery/2.0.0/jquery.js"></script>
</head>
<script type="text/javascript">
    $.getJSON("points.json", function(data){
        var points = data;
        // the functions already in the script go here
    });
</script>


Because jQuery's $.getJSON is an Ajax request and browsers block Ajax against local files, simply opening the HTML file in a browser displays nothing; the page has to be served from a local server.



To create one, cd to the HTML file's folder in a terminal and run python -m SimpleHTTPServer (on Python 3 the equivalent is python -m http.server), then open http://127.0.0.1:8000/ in the browser. Remember to name the HTML file index.html.



Afterword


Because I'm registered but not a certified developer, I only get 6K geocoding calls per day (a fine excuse to be lazy), so I crawled only the top 400 pages of popular scenic spots (15 per page).


As expected, more data meant more bugs to debug. The final dataset holds about 4,500 scenic spots (crawled on September 10, 2017 with the keyword "popular scenic spots"; the sales figures only reflect that moment).


Hot spots heat map




Schematic diagram of hot Spots


These are the hottest places on the map, and I imagine that on National Day they look like this



Like this




And something like this








Pulling out the Top 20 best-selling spots on the map, most are familiar names: the imperial capital's Forbidden City takes first place, greater Sichuan claims three of the top five, and Sichuan province holds six of the Top 20.


If not for the earthquake, I suspect even more Sichuan spots would have made the list~ So if you're heading to Sichuan this National Day, you can picture the scene: people people people people people people people people people…



Top20 sales of popular scenic spots


So I tallied the number of hot scenic spots per region. Unexpectedly, among the 4,000-plus hot spots, Zhejiang has the most, 1.5 times the runner-up. And Beijing, as the capital, also… let's just say it ranks first in scenic spots per unit of area.





Number of popular attractions in major cities


How many hot attractions do these cities have, and at what level? As the figure below shows, the count at every level in each city is positively correlated with the city's total, with 4A-level spots contributing the most.



Level of popular attractions in major cities


Now that we know where to go and what's there, let's look at the places that cost the most money.


The figure below draws, for each province, a fan spanning the maximum and minimum starting ticket prices of its scenic spots; Hubei tops the chart with a single spot starting at 600.


But you can also see that the average starting price of Hubei's spots is not high (the dark blue line inside the red fan). And if you spend the holiday in Hong Kong, be prepared to lose weight, mentally and physically (•̀ω•́)✧.




Provincial tourist attractions sales starting price


ヾ(*φωφ) Wishing everyone a happy holiday in advance!

PS: I built a web page showing the Baidu Map heat-map effect and the Echarts scenic-spot charts, for anyone to browse.


Thermal force effect: easyinfo.online

Gayhub source: github.com/otakurice/n…


After finishing this article I discovered that Echarts has a Python module, so next I plan to learn some web frameworks like Django and Flask. Expect more pure stream-of-consciousness articles in the near future.


References:

1. Baidu Map API: developer.baidu.com/map/referen…
2. Echarts: echarts.baidu.com/
3. API usage examples: developer.baidu.com/map/jsdemo….
4. JSON: www.runoob.com/json/json-t…
5. XPath: www.runoob.com/xpath/xpath…
6. pandas: python.jobbole.com/84416/
7. Baidu latitude and longitude API: lbsyun.baidu.com/index.php?t…
8. Shallow and deep copy: python.jobbole.com/82294/
9. Importing JSON into HTML: www.jb51.net/article/366…



Via by Great Luck Millet sauce