Today we continue our series on crawling and analyzing data. This time, we will collect and analyze product reviews from NetEase Yanxuan.

Warning: this tutorial is for learning and exchange only. Do not use it for commercial gain; if you do, you bear the consequences! If this article infringes on the privacy or interests of any organization, group, or company, please contact Radish to have it deleted! Disclaimer: this is a super serious technical article. Super! Serious! Please read it in the spirit of learning and exchange, thank you!

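Crawling NetEase Yanxuan product reviews

Analyzing the web page

Analyzing the reviews

Go to the NetEase Yanxuan website, search for “bra”, and click into any product. In the Network panel of the browser’s developer tools, you can find the request that loads the comments: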

You.163.com/xhr/comment…

It’s not hard to see that all the comment data is stored in commentList, so we just need to save that data.

Next we need the itemId, which is the product ID. Let’s go back to the NetEase Yanxuan home page and continue the analysis.

Getting the product ID
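To make the structure concrete, here is a minimal sketch of pulling the comments out of a parsed response. The sample payload is made up and only mirrors the data -> commentList nesting used by the crawler code later in this article; extract_comments is a hypothetical helper name.

```python
def extract_comments(payload):
    """Return the list stored under data -> commentList, or [] if absent."""
    data = payload.get("data") or {}
    return data.get("commentList") or []


# Hypothetical payload that only mirrors the nesting of the real response.
sample = {
    "data": {
        "commentList": [
            {"content": "Good quality, comfortable to wear.", "star": 5},
        ]
    }
}

comments = extract_comments(sample)
print(len(comments), comments[0]["star"])
```

Returning an empty list when a key is missing keeps the caller's loop simple: no comments and no data look the same.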

When we type a keyword into the search box and search, many requests again show up in the Network panel. By inspecting each one and going by the name of the request file (this takes some experience, and assumes the programmers named things sensibly), we can identify the request that returns the search results.

With that, the preliminary analysis is basically done, and we can start writing the code.

Writing the code

Getting the product ID
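The search request boils down to an endpoint plus query parameters. As a rough sketch, this is how such a URL is assembled, using the endpoint and the keyword / page parameters that appear in the crawler code of this article; build_search_url is a hypothetical helper, not something the site or the author provides.

```python
from urllib.parse import urlencode

# Endpoint taken from the search code in this article.
SEARCH_URI = "https://you.163.com/xhr/search/search.json"


def build_search_url(keyword, page=1):
    """Assemble the search URL with its query string."""
    return SEARCH_URI + "?" + urlencode({"keyword": keyword, "page": page})


print(build_search_url("bra"))
```

Comparing a URL built this way with the one shown in the Network panel is a quick check that the right request has been identified.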

import requests


def search_keyword(keyword):
    """Return the product IDs on page 1 of the search results for keyword."""
    uri = 'https://you.163.com/xhr/search/search.json'
    query = {
        "keyword": keyword,
        "page": 1
    }
    res = requests.get(uri, params=query).json()
    result = res['data']['directly']['searcherResult']['result']
    # Each search hit carries its product ID under the 'id' key.
    return [r['id'] for r in result]

This gives us the product IDs on page 1 of the search results. Next, we use each product ID to fetch that product’s comments.

From the earlier analysis, we know that each comment has the following form. Data in this form is easy to store in MongoDB, where we can then analyze it at our leisure.

{
    "skuInfo": [
        "Color: Skin color",
        "Cup size: 75B"
    ],
    "frontUserName": "1****8",
    "frontUserAvatar": "https://yanxuan.nosdn.127.net/f8f20a77db47b8c66c531c14c8b38ee7.jpg",
    "content": "Good quality, comfortable to wear.",
    "createTime": 1555546727635,
    "picList": [
        "https://yanxuan.nosdn.127.net/742f28186d805571e4b3f28faa412941.jpg"
    ],
    "commentReplyVO": null,
    "memberLevel": 4,
    "appendCommentVO": null,
    "star": 5,
    "itemId": 1680205
}

For MongoDB, we can either set up our own instance or use a free online service. Here I recommend a free MongoDB hosting site: mLab. It is very simple to use, so I won’t go into the details.

Now that we have a database, let’s save the comments to it.

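import time

import requests


def details(product_id):
    """Crawl every comment page of one product and save the comments to MongoDB."""
    url = 'https://you.163.com/xhr/comment/listByItemByTag.json'
    C_list = []
    for i in range(1, 100):
        query = {
            "itemId": product_id,
            "page": i,
        }
        res = requests.get(url, params=query).json()
        commentList = res['data']['commentList']
        # An empty commentList means we have gone past the last page.
        if not commentList:
            break
        print("Crawl page %s comments" % i)
        C_list.append(commentList)
        # Save to MongoDB (mongo_collection is the module-level pymongo collection).
        # A failed insert is reported but does not abort the crawl.
        try:
            mongo_collection.insert_many(commentList)
        except Exception as e:
            print("Insert failed on page %s: %s" % (i, e))
        # Pause between requests to go easy on the server.
        time.sleep(1)
    return C_list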

When the crawl finished, I had a little over 7,000 comments in total. From here, you can run whatever analysis suits your needs.

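The mongo_collection used in details comes from a connection like this:

from pymongo import MongoClient

# <host> stands for the mLab host name of your own database.
conn = MongoClient("mongodb://%s:%s@<host>:49974/you163" % ('you163', 'you163'))
db = conn.you163
mongo_collection = db.you163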
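Putting the pieces together, the search and comment functions can be chained roughly as follows. This is only a sketch: crawl_keyword and its search / fetch_comments parameters are hypothetical names, and the two stand-in functions return made-up data so the flow can be tried without touching the network.

```python
def crawl_keyword(keyword, search, fetch_comments):
    """Collect the comments of every product returned by a keyword search.

    search and fetch_comments are passed in so the flow can be tried
    offline; in this article they would be search_keyword and details.
    """
    all_comments = []
    for product_id in search(keyword):
        # details() returns one list per page, so flatten page by page.
        for page in fetch_comments(product_id):
            all_comments.extend(page)
    return all_comments


# Offline stand-ins (hypothetical data) for the two network functions.
def fake_search(keyword):
    return [1680205, 1680206]


def fake_details(product_id):
    return [
        [{"itemId": product_id, "star": 5}],
        [{"itemId": product_id, "star": 4}],
    ]


comments = crawl_keyword("bra", fake_search, fake_details)
print(len(comments))
```

With the real functions, the call would be crawl_keyword("bra", search_keyword, details); keep the one-second pause inside details so the site is not hammered.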

Product review data analysis

Now comes the exciting moment: let’s see what the girls prefer!

Color preferences

Let’s look at the color preferences first.

Then we use a pie chart to look at the share of each color.
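The numbers behind such a chart can be computed with a simple counter. The sketch below is not the author's analysis code: it assumes each record keeps its color in skuInfo as a "Color: ..." string, as in the sample record earlier, and the records themselves are made up. color_shares is a hypothetical helper name.

```python
from collections import Counter


def color_shares(comments, key="Color"):
    """Count the color entry of each comment's skuInfo.

    Returns (counts, shares), where shares maps each color to its
    fraction of all counted comments.
    """
    counts = Counter()
    for comment in comments:
        for entry in comment.get("skuInfo", []):
            # Split on the first colon only: "Color: Black" -> "Color", "Black".
            name, _, value = entry.partition(":")
            if name.strip() == key:
                counts[value.strip()] += 1
    total = sum(counts.values())
    shares = {color: n / total for color, n in counts.items()} if total else {}
    return counts, shares


# Hypothetical records, not real crawl results.
sample = [
    {"skuInfo": ["Color: Skin color", "Cup size: 75B"]},
    {"skuInfo": ["Color: Black", "Cup size: 80B"]},
    {"skuInfo": ["Color: Skin color", "Cup size: 75A"]},
    {"skuInfo": ["Color: Black", "Cup size: 75B"]},
]
counts, shares = color_shares(sample)
print(counts.most_common())
print(shares)
```

The shares dictionary holds exactly the fractions a pie chart displays; if your records label the color differently, pass that label as key.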

Size distribution

If you haven’t studied cup sizes, don’t worry: I’ve prepared a comparison table for you. Take it with you.
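For the size distribution itself, the labels usually need a little normalizing before they are counted. This is a sketch under assumptions: the size is taken to sit in skuInfo under a "Cup size" label and to look like a band number followed by a cup letter (for example "75B"); both the label and the pattern may differ in your data, and size_distribution is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed label format: band size followed by a cup letter, e.g. "75B" or "75 b".
SIZE_PATTERN = re.compile(r"(\d{2,3})\s*([A-Za-z]+)")


def size_distribution(comments, key="Cup size"):
    """Count normalized sizes such as "75B" across all comments."""
    counts = Counter()
    for comment in comments:
        for entry in comment.get("skuInfo", []):
            name, _, value = entry.partition(":")
            if name.strip() != key:
                continue
            match = SIZE_PATTERN.search(value)
            if match:
                band, cup = match.groups()
                # "75 b" and "75B" are counted as the same size.
                counts[band + cup.upper()] += 1
    return counts


# Hypothetical records, not real crawl results.
sample = [
    {"skuInfo": ["Color: Skin color", "Cup size: 75B"]},
    {"skuInfo": ["Color: Black", "Cup size: 75 b"]},
    {"skuInfo": ["Color: Black", "Cup size: 80C"]},
]
sizes = size_distribution(sample)
print(sizes.most_common())
```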

Product comments

Finally, let’s see how the women rate the product.

And here are the words women use most often in the comments.
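A word ranking like this comes down to counting tokens in the content field. The sketch below uses made-up English comments and a tiny, hypothetical stop-word list; real Chinese comments would first need a word-segmentation step (for example with a library such as jieba), which is not shown here and is an assumption about how such a chart would be produced.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "to", "and", "is", "it", "very"}


def top_words(comments, n=3):
    """Return the n most frequent non-stop-words in the comment texts."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.get("content", "").lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical comments, not real crawl results.
sample = [
    {"content": "Good quality, comfortable to wear."},
    {"content": "Very comfortable and the quality is good."},
    {"content": "Comfortable, fits well."},
]
print(top_words(sample))
```

The resulting (word, count) pairs are the kind of input a word-cloud or bar chart would take.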

It’s like walking into a “praise group”. It seems the first thing girls care about is whether it is comfortable; after all, it is worn next to the skin, so quality matters most!

Well, after reading this analysis, do those of you who are single feel a stronger urge to find a partner? And if you already have a sweet girl by your side, isn’t it time to do something nice for her?

The complete code

Github.com/zhouwei713/…