Hello, I’m Genius.

Having watched the Fast & Furious series many times, we were looking forward to the latest installment, only to be overwhelmed by the illogical, exaggerated plot of what turned out to be a “science fiction” and “superhero” movie.

Open Douban and you will find that, two days after release, 37,700 people had rated it 5.6, the lowest score in the Fast & Furious series.

With word of mouth this poor, what are audiences actually saying? Let’s take a look at the short comments on Douban!

Before we start crawling, let’s take a sneak peek at some of the exaggerated scenes from the trailer.

Source: Hot movie

A car with a rocket launcher and a space suit is fun

On a collapsing wooden bridge, as long as you drive fast enough you can reach the other side safely

He stepped on the gas pedal and hung on the rope of the bridge, swinging safely to the opposite side of the cliff

All right, isn’t that exciting!

1. Crawler explanation

Douban only shows more short comments to logged-in users, so before crawling you need to log in to your account and copy the cookies for later use.

Because crawling Douban short comments is relatively simple and there are many examples online, we will not explain it in great detail here. Instead, we briefly introduce three parts: requesting web pages, parsing data, and storing data.

In addition, we will upload the complete code to the case library of our official account; reply “955” in the account’s backend to get the runnable code.

1.1. Importing the tool library

This involves requests for requesting data, re for parsing, and pandas (plus os for file checks) for storing data.

import requests
import re
import pandas as pd
import os

1.2. Request web pages

Following the basic crawler workflow (F12 -> turn the page -> observe what changes), we worked out how the request URL for short comments changes from page to page and what the relevant parameters mean. With that, we can build the function that requests the page data as follows:

# Request web page data
def get_html(tid, page, headers, _type):
    """
    tid: ID of the title on Douban; for example, Fast & Furious 9 is 25728006
    page: short-comment page number, 0-24
    headers: request headers, which need browser (User-Agent) and cookie information
    _type: comment type (good: h, medium: m, bad: l)
    """
    url = f'https://movie.douban.com/subject/{tid}/comments'

    params = {
        'percent_type': _type,
        'start': page * 20,
        'limit': 20,
        'status': 'P',
        'sort': 'new_score',
        'comments_only': 1,
        'ck': 't9O9',
    }

    r = requests.get(url, params=params, headers=headers)
    # The response is JSON; the comment markup is under the 'html' key
    data = r.json()

    html = data['html']
    # We parse with regular expressions, so strip all whitespace first
    html = re.sub(r'\s', '', html)

    return html
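
To make the paging rule concrete, here is a minimal sketch that only builds the request URLs and sends nothing. The helper name build_comment_url and the page_size argument are made up for illustration, and the 'ck' parameter is left out for brevity; the point is simply that start advances by 20 for each page.

```python
from urllib.parse import urlencode


def build_comment_url(tid, page, _type, page_size=20):
    """Return the short-comment URL for one page (hypothetical helper)."""
    base = f'https://movie.douban.com/subject/{tid}/comments'
    params = {
        'percent_type': _type,      # h = good, m = medium, l = bad
        'start': page * page_size,  # offset of the first comment on this page
        'limit': page_size,
        'status': 'P',
        'sort': 'new_score',
        'comments_only': 1,
    }
    return f'{base}?{urlencode(params)}'


# Print the URLs of the first three pages of bad reviews
for page in range(3):
    print(build_comment_url(25728006, page, 'l'))
```

Printing the URLs like this is a cheap way to compare them with what the browser's F12 network panel shows before any real request is made.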

1.3. Parse data

Since we use regular expressions to parse the data, we first locate the node where the required data sits and then write the matching pattern for it.

For example, to extract the comment text:

comment = re.findall('"short">(.*?)</span>', html)
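
As a quick check of this approach, the sketch below runs the patterns against a hand-written fragment. The fragment is hypothetical and only imitates the class names used in this article; real Douban markup may differ. It assumes, as the patterns here suggest, that all whitespace is removed before matching.

```python
import re

# Hypothetical fragment imitating one comment; not copied from Douban
sample = '''
<span class="votes vote-count">128</span>
<p class="comment-content">
    <span class="short">Cars in space, really?</span>
</p>
'''

# Drop all whitespace so the patterns need not handle indentation or line breaks
compact = re.sub(r'\s', '', sample)

comments = re.findall(r'"short">(.*?)</span>', compact)
votes = re.findall(r'"votesvote-count">(\d+)</span>', compact)

print(comments)
print(votes)
```

One side effect worth knowing: stripping whitespace also removes the spaces inside the comment text. That is harmless for Chinese comments but runs English words together.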

The complete parsing of authors, dates, comments, useful votes, and stars is as follows:

# Parse data
def get_data(html):
    df = pd.DataFrame(columns=['author', 'date', 'comment', 'vote_count', 'star'])
    # NOTE: the author pattern was lost from the source text; the one below is
    # an assumption about Douban's markup and should be checked against the page
    df['author'] = re.findall(r'class="">(.*?)</a>', html)
    df['date'] = re.findall(r'"comment-time"title=".*?">(.*?)</span>', html)
    df['comment'] = re.findall(r'"short">(.*?)</span>', html)
    df['vote_count'] = re.findall(r'"votesvote-count">(\d+)</span>', html)
    # df['star'] = re.findall(r'<spanclass="allstar(\d+)rating"', html)

    return df
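
The star line is commented out in the function above. The source does not say why; one plausible reason is that comments without a rating would make the column lists differ in length. The sketch below shows one possible workaround, not the author's method: split the page into per-comment blocks and search each block separately, so a missing field becomes None. The 'comment-item' marker and the function name parse_comments are assumptions made for illustration.

```python
import re


def parse_comments(html):
    """Parse whitespace-stripped HTML one comment at a time (illustrative sketch)."""
    rows = []
    # Assumption: every comment starts with a div whose class begins with 'comment-item'
    for block in html.split('<divclass="comment-item')[1:]:
        comment = re.search(r'"short">(.*?)</span>', block)
        votes = re.search(r'"votesvote-count">(\d+)</span>', block)
        star = re.search(r'<spanclass="allstar(\d+)rating"', block)
        rows.append({
            'comment': comment.group(1) if comment else None,
            'vote_count': int(votes.group(1)) if votes else None,
            # None when the user left no star rating
            'star': int(star.group(1)) if star else None,
        })
    return rows


# Hypothetical, already whitespace-stripped input with one unrated comment
sample = (
    '<divclass="comment-item"><spanclass="votesvote-count">12</span>'
    '<spanclass="allstar20rating"></span><spanclass="short">Toomuch</span></div>'
    '<divclass="comment-item"><spanclass="votesvote-count">3</span>'
    '<spanclass="short">Norating</span></div>'
)
rows = parse_comments(sample)
print(rows)
```

A list of dicts like this can be passed straight to pd.DataFrame(rows) if the rest of the pipeline expects a DataFrame.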

1.4. Data store

The data is stored as a CSV file, mainly because CSV is easy to append to.

Before storing, we check whether the file already exists. If it does, we append to it without repeating the header; otherwise we write a new file with the header.

In addition, set encoding='utf_8_sig'. Otherwise, Chinese characters may appear garbled when the file is opened directly.

# Store data
def save_df(df):
    if os.path.exists('data.csv'):
        # File exists: append rows without writing the header again
        df.to_csv('data.csv', index=False, mode='a', header=False, encoding='utf_8_sig')
    else:
        df.to_csv('data.csv', index=False, encoding='utf_8_sig')
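
The same append-or-create logic can be sketched with only the standard library, which makes the header handling explicit. This swaps pandas for the csv module purely for illustration; the function name append_rows and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only for a new file."""
    is_new = not os.path.exists(path)
    with open(path, 'a', newline='', encoding='utf_8_sig') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Usage: two separate calls, as if two pages had been crawled
demo_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
fields = ['author', 'comment']
append_rows(demo_path, fields, [{'author': 'a', 'comment': 'x'}])
append_rows(demo_path, fields, [{'author': 'b', 'comment': 'y'}])

with open(demo_path, newline='', encoding='utf_8_sig') as f:
    saved = list(csv.reader(f))
print(saved)
```

Checking for the file before opening it matters here, because opening in append mode creates the file and the existence test would then always succeed.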

As for data storage, we plan a dedicated write-up later, mainly on how to append to a single sheet or to multiple sheets when storing in Excel.

1.5. Final supplement

Parameter settings, the entry point, and so on are shown in the following code:

if __name__ == '__main__':
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36",
        "Cookie": "paste the cookie copied from your logged-in session here",
    }
    # Title ID; for example, Fast & Furious 9 is 25728006
    tid = 25728006
    # Review type (good: h, medium: m, bad: l)
    _type = 'l'
    for page in range(30):
        html = get_html(tid, page, headers, _type)
        df = get_data(html)
        save_df(df)
        print(f'Page {page + 1} of reviews has been collected.')

Result Preview:

2. Review word clouds

Looking at the Douban short comments, only 24% are positive!

2.1. Positive review cloud

The positive reviews also mention the weak plot, but what matters more to these viewers is the visual thrill of the car races and the flight into outer space, Han’s return (even though the plot logic behind it does not really hold up), and the outpouring of feeling for Paul. As a popcorn movie, it is still fine.

Among the positive reviews, one of the most upvoted comes from a viewer named “Travel”. He looks at the series as a whole: the story may be nonsense, but the imagination is top-notch and the ending is full of feeling. This is probably the shared sentiment of most viewers who rated the film well:

No matter how many installments they make, Fast & Furious will always be my favorite series // The plot is nonsense and the story is old-fashioned and far-fetched, but every installment sets a benchmark for the film industry and the ceiling for action movies. Isn’t the whole point of imagination to chase a second life we could never have? // When that familiar blue appeared at the end, I couldn’t hold back my tears. In fact he has always been there, at least in the hearts of fans.

2.2. Neutral review cloud

Viewers who left neutral reviews mostly talk about how outrageous the plot is: the story logic that does not hold up, the over-the-top trip into outer space, Han’s resurrection, and so on.

In the neutral reviews, the most upvoted comments are in fact mostly sarcastic jabs, such as:

Good grief, the next one might as well be Fast & Furious 10: Star Wars. - a micron

Any hero movie should end up in space

If there was a teacher card to call, this point should be able to go up some. One hour of chirping would be more exciting. — The traveler of Sodom

2.3. Negative comment cloud

Negative reviews make up more than 42% of the total, and most of them are about the plot, with the absurdity of going into outer space drawing the most complaints. The sentimental scenes in the middle of the film even put a lot of viewers to sleep…

As a matter of fact, I think an F9 without Dwayne “The Rock” Johnson deserves a bad review too; just look at how far the plot strays, all the way to outer space.

Musatoshi Yafu, from Chaoyang District, received a lot of thumbs up from other viewers. His comment is as follows:

Bang bang bang BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOMBOOM BOOMBOOM blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah Blablabla whew, BOOM! Wearefamily! The end.

The negative review from Ning, who took time off to watch the movie, mainly makes fun of the plot setup, and it also received many likes:

Cars don’t fly? It’s really good to be rich. Without money, you couldn’t burn it like this. It’s definitely new material for roast bloggers. It’s not just that the lid of Newton’s coffin can’t be held down; even schoolchildren would jump up to object.

As for the nostalgia mentioned earlier, F9 seems to lean on it much more heavily, but it comes across as sentiment for sentiment’s sake, and plenty of viewers mocked it:

Although I was moved by the seat saved for Paul at the end, I still feel they can’t keep dining out on this forever.