Xiamen is really more than Gulangyu

Today’s article is about Xiamen, where I spent many years. From graduation to work, I spent a beautiful part of my youth there. As an outsider, I really think Xiamen is a garden city with a beautiful environment and an artistic atmosphere. Former U.S. President Nixon once called it the “Oriental Hawaii”, and in my opinion it truly deserves the name 😃

Xiamen has always been a popular 🔥 tourist city in China. Many people go for Gulangyu; after all, it is famous. As far as I remember, I have been to Gulangyu at least 7 times. Whenever family members or friends came, I would take them there or give them a simple guide. But Xiamen is really more than Gulangyu.

Gulangyu Island

Let’s start with a picture of Gulangyu. Once, I took a friend to the top of Sunlight Rock (the highest peak on Gulangyu). The top is crowded with tourists, and from there you can see Gulangyu Island, the Shimao Twin Towers (a Xiamen landmark), and Xiamen Bay. It is indeed very beautiful 😃

Data source

The data used in this article was crawled from a website; the process is explained in detail below.

Crawled fields

A total of 6 fields were crawled:

  1. Chinese name: cn_title
  2. English name: en_title
  3. Number of guides: strategy
  4. Number of comments: comment
  5. Ranking: ranking
  6. Brief introduction of the scenic spot: abstract

Page URL pattern

1. Open the website: travel.qunar.com/p-cs299782-… There are up to 10 attractions per page and 126 pages in total.

The page URLs can be constructed as follows:

for i in range(1, 127):  # pages 1 to 126
  url = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}".format(i)  # URL pattern of each page

2. Next, find where the six fields sit in the page source. Right-click and choose “Inspect”; in the Elements panel you can find the 10 attractions on each page, where each <li></li> pair represents one attraction.

    3. Look at the position of each field

    We can locate each field through the following three pictures. Only when the field is located can we parse it out.

    Let’s parse each field out

    Import corresponding libraries

    To crawl the fields and analyze them, we need to import several libraries. Their main roles are:

    • requests: sends network requests
    • re: parses data with regular expressions
    • json: handles data in Python dictionary form
    • csv: saves the crawled data
    • pandas and numpy: process the crawled data
    • plotly_express and pyecharts: draw the charts
    import pandas as pd
    import numpy as np
    import re
    import csv
    import json
    import requests
    import random
    
    # word segmentation
    import jieba
    
    # drawing
    # import plotly.express as px
    import plotly_express as px
    import plotly.graph_objects as go
    
    from wordcloud import WordCloud  # word cloud
    import matplotlib.pyplot as plt
    from plotly.graph_objects import Scatter,Bar

    Requesting the first page

    Let’s start by requesting the first page and looking at its source code.

    Part of the page source looks like this:

    Matching the fields

    Each field is matched with a regular expression, using findall() from the re module:

    1. Names of scenic spots in Chinese

    It is worth checking that each page yields 10 matches, so we print the length of the result, which is exactly 10.
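As a minimal sketch of this kind of matching, the snippet below runs re.findall() on a small HTML string. The tag structure and the class names (cn_tit, en_tit) are assumptions made up for illustration; the real page markup may differ, so the patterns would need to be adapted after inspecting the source.

```python
import re

# Hypothetical markup: the real page's tags and class names may differ.
sample_html = """
<ul>
<li class="item"><span class="cn_tit">鼓浪屿</span><span class="en_tit">Gulangyu</span></li>
<li class="item"><span class="cn_tit">厦门大学</span><span class="en_tit">Xiamen University</span></li>
</ul>
"""

# (.*?) is a non-greedy capture group; re.S lets "." also match newlines.
cn_title = re.findall(r'<span class="cn_tit">(.*?)</span>', sample_html, re.S)
en_title = re.findall(r'<span class="en_tit">(.*?)</span>', sample_html, re.S)

# Printing the length is the sanity check described in the text:
# the article expects 10 matches on a full page.
print(len(cn_title))
print(cn_title, en_title)
```

Each call returns a plain list of the captured strings, which is why comparing the list lengths across fields is a quick way to spot a pattern that missed some attractions.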

    2. English names of scenic spots

    3. Number of guides

    4. Number of comments

    5. Scenic spots ranking

    The ranking field needs special care. While crawling, I found that some scenic spots have no ranking; for example, many of the scenic spots on page 16 are unranked and need special handling.

    In the code, if an attraction has no ranking, 0 is used instead:
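One possible way to apply this rule is sketched below. The markup (a ranking_sum span that is simply absent for unranked attractions) is a hypothetical stand-in, not the site's confirmed structure. The idea is to split the page into one block per attraction first, so a missing rank cannot shift later values onto the wrong attraction.

```python
import re

# Hypothetical markup: one <li> block per attraction, rank shown only for some.
sample_html = """
<li><span class="cn_tit">鼓浪屿</span><span class="ranking_sum">Top <b>1</b></span></li>
<li><span class="cn_tit">某公园</span></li>
"""

ranking = []
# Work block by block so every attraction gets exactly one ranking value.
for block in re.findall(r"<li>(.*?)</li>", sample_html, re.S):
    match = re.search(r'<span class="ranking_sum">.*?<b>(\d+)</b>', block, re.S)
    ranking.append(int(match.group(1)) if match else 0)  # 0 = not ranked

print(ranking)
```

Using 0 as the placeholder keeps the column numeric, and unranked rows can later be dropped with a simple `!= 0` filter.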

    6. Introduction to scenic spots

    Crawling the whole site

    Here is the source code for crawling the entire site. It covers:

    • Building the page URLs
    • Sending a request to get the page source
    • Parsing the fields, including special cases
    • Saving the file
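The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's original script: parse_page() relies on made-up markup in which each field sits in a span whose class equals the field name, and crawl() takes a hypothetical fetch(url) callable so the network layer can be swapped out.

```python
import csv
import re

URL_TEMPLATE = "https://travel.qunar.com/p-cs299782-xiamen-jingdian-1-{}"
FIELDS = ["cn_title", "en_title", "strategy", "comment", "ranking", "abstract"]
NUMERIC_FIELDS = ("strategy", "comment", "ranking")


def parse_page(html):
    """Hypothetical parser: return one row (a list) per attraction on a page.

    The patterns assume made-up markup; adapt them to the real page source.
    """
    rows = []
    for block in re.findall(r"<li>(.*?)</li>", html, re.S):
        row = []
        for field in FIELDS:
            m = re.search(r'<span class="{}">(.*?)</span>'.format(field), block, re.S)
            if field in NUMERIC_FIELDS:
                row.append(int(m.group(1)) if m else 0)  # missing number -> 0
            else:
                row.append(m.group(1) if m else "")      # missing text -> ""
        rows.append(row)
    return rows


def crawl(fetch, first_page, last_page, out_file):
    """Fetch each page with fetch(url) -> html and write all rows as CSV."""
    writer = csv.writer(out_file)
    writer.writerow(FIELDS)  # header row
    total = 0
    for page in range(first_page, last_page + 1):
        html = fetch(URL_TEMPLATE.format(page))
        rows = parse_page(html)
        writer.writerows(rows)
        total += len(rows)
    return total
```

With the requests library, fetch could be something like `lambda url: requests.get(url, timeout=10).text`, and out_file a file opened with `open(path, "w", newline="", encoding="utf-8")`. Passing fetch in as a parameter also makes it easy to try the parser on saved HTML before hitting the site.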

    Data processing

    Use pandas to read the saved file:

    df = pd.read_csv("Scenic spots in Xiamen.csv")
    df.head()  # Show the first 5 rows

    Check the size of the data and the field types, and see whether there are missing values:

    • Three fields are strings (dtype object)
    • The other three fields are of type int64

    df.isnull().sum()  # Check for missing values
    
    # The result shows that 369 English names are missing and 1121 scenic spots have no introduction
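To make these checks concrete, here is a small sketch on a toy DataFrame. The rows are made up for illustration and only the column names follow the article; shape, dtypes and isnull().sum() are standard pandas calls.

```python
import pandas as pd

# Toy rows (made up for illustration) using the article's column names.
toy = pd.DataFrame({
    "cn_title": ["A", "B", "C"],
    "en_title": ["Spot A", None, None],      # two missing English names
    "strategy": [12, 0, 3],
    "comment": [40, 5, 0],
    "ranking": [1, 0, 2],
    "abstract": ["intro", None, "intro"],    # one missing introduction
})

print(toy.shape)     # (number of rows, number of columns)
print(toy.dtypes)    # dtype of every column

missing = toy.isnull().sum()  # number of missing values per column
print(missing)
```

isnull() marks every missing cell as True, and sum() counts the True values column by column, which is how the per-field totals quoted in the article are obtained.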

    Chinese name cn_title

    First, let’s look at the names of the scenic spots. Xiamen is a garden city with many parks on the island. Let’s count how many parks Xiamen has:

    • str.contains(): tests whether each string contains a given substring
    • reset_index(): resets the index of the filtered DataFrame
    # 1- How many parks
    
    park = df[df["cn_title"].str.contains("Park")].reset_index(drop=True)
    park

    Conclusion: Data show that there are 107 parks in and around Xiamen Island

    Let’s take a look at the parks by ranking field and see which ones are popular:

    new_park = park[park["ranking"] != 0].sort_values(by=["ranking"]).reset_index(drop=True)
    new_park[:20]  # Take out the top 20 parks
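The filtering and sorting steps used for the parks can be tried on a few rows first. The sketch below uses invented names and rankings purely to show what each call does; it is not data from the crawl.

```python
import pandas as pd

# Invented rows for illustration; ranking 0 means "not ranked".
toy = pd.DataFrame({
    "cn_title": ["Harbor Park", "Old Street", "Hill Park", "Lake Park"],
    "ranking": [3, 1, 0, 2],
})

# Keep rows whose name contains "Park"; drop=True discards the old index.
park = toy[toy["cn_title"].str.contains("Park")].reset_index(drop=True)

# Remove unranked rows, then sort so that the best rank comes first.
ranked = park[park["ranking"] != 0].sort_values(by="ranking").reset_index(drop=True)
print(ranked)
```

Without reset_index(drop=True), the filtered rows would keep their original index labels (0, 2, 3 here), which makes positional slices such as `[:20]` harder to read.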

    According to the data, the top three are Railway Culture Park, Zhongshan Park, and Wuyuanwan Wetland Park.

    1. Railway Culture Park: I have been there once. Near Jinbang Park you can find the extension line of the Yingxia (Yingtan, Jiangxi to Xiamen) railway and its old tracks.

    2. Zhongshan Park: many cities have a Zhongshan Park to commemorate Dr. Sun Yat-sen, and a lot of activities are held here.

    3. Wuyuanwan Wetland Park: Wuyuanwan (Wuyuan Bay) makes people think of Xiamen's tuhao (the newly rich); it can be called the tuhao district of Xiamen.

    Bailuzhou Park, Zhonglun Park, and Tianzhushan Forest Park outside the island are all worth visiting.

    Next, let’s take a look at some famous streets in Xiamen:

    # 2- Famous Street
    street = df[df["cn_title"].str.contains("Street")].reset_index(drop=True)
    street.head(10)

    There are 37 records; the first 10 are shown.

    Zhongshan Road Pedestrian Street is really popular: all kinds of local Southern Fujian snacks, milk tea, Taiwanese specialties, and Xiamen's landmark arcade buildings. Every holiday it is jammed and packed with people.

    Ting O Tsai Cat Street is also popular. It is not far from the south gate of Xiamen University, and I have been there several times. There is a shop selling cats inside, which is super popular 🔥

    Finally, let’s look at the attractions related to universities:

    As you can see, the 17 university-related scenic spots are basically shared among 3 universities:

    • Xiamen University: the most beautiful university in China
    • Jimei University
    • Huaqiao University

    Xiamen University used to be freely open to tourists. In recent years the number of visitors has been limited and you need a reservation to enter, so if you want to visit Xiamen University, please book in advance.

    But if you have relatives or friends studying there, I hear they can take you in; I'm telling you this in secret 😃 Here is a picture of Shangxian Field at Xiamen University, taken by the author.

    Scenic spot ranking

    Let’s go straight to the ranking to see which scenic spots are popular:

    # Remove rows with ranking=0, then sort by ranking in ascending order
    # Take out the top 20 attractions
    ranking = df[df["ranking"] != 0].sort_values(by=["ranking"], ascending=True)[:20].reset_index(drop=True)
    
    px.bar(ranking,  # input DataFrame
           x="cn_title",  # x-axis field
           y="ranking",  # y-axis field
           color="ranking"  # field used for color
          )

    Gulangyu tops the list 😭. Shell 🐚 Dream World, Xiamen Undersea World, Xiamen Dadeji Bathing Beach, Sunlight Rock and others are all scenic spots on Gulangyu Island, so Gulangyu really is very popular.

    Xiamen University and the adjacent Nanputuo Temple are also popular with tourists. A few years ago Xiamen got a new landmark, the Twin Towers, and a lot of tourists go there too.

    Number of guides: strategy

    Many tourists like to write travel guides for others to refer to after visiting a scenic spot. Let’s look at the number of guides for each spot:

    px.scatter(df,  # data to plot
               x="cn_title",  # x-axis field
               y="strategy",  # y-axis field
               color="strategy"  # field used for color
              )

    Conclusion: the data shows that Xiamen University is the scenic spot tourists write the most guides about, followed by Nanputuo Temple and Zhongshan Road Pedestrian Street.

    Number of comments: comment

    Now let’s look at the number of tourist comments. Sort in descending order, take the top 20 scenic spots, and display the first 10 rows:

    comment = df[df["comment"] != 0].sort_values(by=["comment"], ascending=False)[:20].reset_index(drop=True)
    comment.head(10)

    px.scatter(comment,  # DataFrame to plot
               x="cn_title",  # x-axis field
               y="comment",  # y-axis field
               color="comment"  # field used for color
              )

    Next, we combine the number of comments and the ranking in a single figure with marginal plots:

    fig = px.scatter(comment,  # DataFrame to plot
                     x="ranking",  # x-axis field
                     y="comment",  # y-axis field
                     color="ranking",  # field used for color
                     marginal_y="violin",  # marginal plot along the y-axis
                     marginal_x="box",  # marginal plot along the x-axis
                     trendline="ols",  # ordinary least squares trend line
                     template="simple_white")  # plot template
    fig.show()

    Gulangyu is again in first place 😭, followed by Xiamen University, Nanputuo Temple, and Zhongshan Road Pedestrian Street.

    Brief introduction: abstract

    Finally, we analyze the brief introductions of the scenic spots on the website, using WordCloud to draw a word cloud. Start by filling the missing values in the introductions.

    abstract = df.fillna(value="")  # Fill missing values with empty strings
    
    abstract_list = abstract["abstract"].tolist()
    abstract_list[:10]  # Show the first 10 introductions

    Next we segment the text with jieba and add every token to one big list:

    jieba_list = []
    
    for i in range(len(abstract_list)):
        # jieba word segmentation (precise mode)
        seg_list = jieba.cut(str(abstract_list[i]).strip(), cut_all=False)
        for each in list(seg_list):
            jieba_list.append(each)
    
    jieba_list[:10]

    First attempt: drawing directly with WordCloud

    from wordcloud import WordCloud
    import matplotlib.pyplot as plt
    
    text = " ".join(i for i in jieba_list)  # Join tokens with spaces so WordCloud can split them again
    
    # Download the SimHei.ttf font and place it in a directory of your own
    font = r'/Users/peter/Desktop/spider/SimHei.ttf'
    
    wc = WordCloud(collocations=False,
                   font_path=font,  # font path (needed to render Chinese)
                   max_words=2000, width=4000,
                   height=4000, margin=2).generate(text.lower())
    
    plt.imshow(wc)
    plt.axis("off")
    plt.show()
    
    wc.to_file('xiamen.png')  # Save the word cloud

    In the word cloud, Xiamen and Gulangyu stand out in the introductions. There are also a lot of uninformative words such as "located" and "here". Next we filter them out with a stop word list collected online:

    Second attempt: with a self-collected stop word list
    
    # Create a list of stop words
    def StopWords(filepath):
        with open(filepath, 'r', encoding='utf-8') as f:
            stopwords = [line.strip() for line in f]
        return stopwords
    
    # Pass the path to the stop word list
    stopwords = StopWords('/Users/peter/Desktop/Publish/nlp_stopwords.txt')
    
    stopword_list = []
    for word in jieba_list:
        if word not in stopwords:
            if word != "\t" and word != "":
                stopword_list.append(word)
    
    stopword_list[:10]
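The filtering step can also be written more compactly. The sketch below is a variation, not the author's code: it stores the stop words in a set, which makes each membership test constant-time instead of a scan through a list, and it uses str.strip() to drop blank tokens, which is slightly broader than checking only for a tab and the empty string. The tokens are made up to stand in for jieba_list.

```python
from collections import Counter

# Made-up tokens standing in for jieba_list, plus a tiny stop word set.
tokens = ["厦门", "的", "鼓浪屿", "，", "位于", "厦门", "\t", "", "海边"]
stopwords = {"的", "位于", "，"}  # a set gives constant-time membership tests

# Keep a token only if it is not a stop word and not blank after stripping.
kept = [w for w in tokens if w not in stopwords and w.strip()]

# Count how often each remaining word occurs; the most frequent words are
# the ones a word cloud draws largest.
freq = Counter(kept)
print(freq.most_common(3))
```

Looking at the most common words before drawing is a cheap way to check whether the stop word list still lets uninformative words through.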

    After applying the stop word list, commas and other punctuation are gone, and many uninformative words have been removed as well. Next, we use a picture of a beautiful woman as the background mask to draw the word cloud:

    from os import path
    from PIL import Image
    import numpy as np
    import matplotlib.pyplot as plt
    
    from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
    
    d = path.dirname('.')   # Use this line in an IDE or notebook
    # d = path.dirname(__file__)
    
    # Pass in the new word list, joined with spaces so WordCloud can split the tokens
    text = " ".join(i for i in stopword_list)
    
    # https://www.deviantart.com/jirkavinse/art/Real-Life-Alice-282261010
    alice_coloring = np.array(Image.open(path.join(d, "wordcloud.jpg")))
    
    # Set the stop words
    stopwords = set(STOPWORDS)
    stopwords.add("said")
    
    # font path
    font = r'/Users/peter/Desktop/spider/SimHei.ttf'
    
    wc = WordCloud(background_color="white", font_path=font,
                   max_words=2000, mask=alice_coloring,
                   height=6000, width=6000,
                   stopwords=stopwords, max_font_size=40, random_state=42)
    
    wc.generate(text)
    
    image_colors = ImageColorGenerator(alice_coloring)
    
    plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
    plt.axis("off")
    plt.show()
    wc.to_file('xiamen.png')  # Save the word cloud

    The final drawing is shown as follows:

    Conclusion

    Based on data crawled from the Internet, this article analyzed Xiamen's scenic spots to see where people like to go in Xiamen:

    • Gulangyu is so popular that it is almost a must for tourists
    • Scenic spots around Xiamen University: the Siming Campus of Xiamen University (Furong Lake, Furong Tunnel, Songen Building, etc.), Nanputuo Temple next to it, and the Five Old Peaks above the temple
    • If you like the sea and cycling, go to: Xiamen University Baicheng Beach, Yanwu Bridge, Huandao Road, Hulishan Fortress, Coconut Wind Village
    • If you are a food lover, go to: Zhongshan Road Pedestrian Street, Zengcuo'an, Taiwan Snack Street
    • If you are an artsy young person, you cannot miss Shapowei Art West

    Last sentence: Xiamen welcomes you! 😃

    Everything that seems to have passed away has never left. The love and warmth you have given me make me guard this place persistently.

    You Cottage, a sweet cottage. The owner writes code with one hand to make a living and holds a cooking spoon with the other to enjoy life. You are welcome to drop by 😃
