Weibo data is valuable: it can serve as a data source for all kinds of projects. For example, a while ago we published a Python article on detecting whether an individual shows suicidal tendencies; in it we used data from the Weibo "tree hole" of despair and built a simple suicide-risk classifier with an SVM model.

Of course, there is much more you can do with Weibo data. If you are careful and creative, there are plenty of opportunities to use it for research. However, technology is a double-edged sword: please do not use this crawler for anything against morality or the law. Applying it to good causes is the original intention behind the technology.

This article focuses on crawling the posts of specific users. If you want to be able to customize the crawler for your own needs, this tutorial will show you how.

1. Prepare

In fact, the principle behind login-free access is very simple: we bypass the login check by using the mobile version of Weibo. If you open a profile such as m.weibo.cn/u/207568677… in a mobile browser, you will find that most posts are visible even when you are not logged in.

Whatever is visible can be crawled. So all we need to do is call the JSON interface that serves the Weibo data. But rather than writing code from scratch, it pays to take advantage of the Python open source community: look for a ready-made wheel online instead of building a half-baked one yourself.

After a bit of searching, I found a cookie-free Weibo crawler developed by dataabc: github.com/dataabc/wei…

The code works just as I expected: it simply calls the JSON data interface to fetch the data.
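To give a sense of what that interface looks like, here is a minimal sketch that fetches one page of a user's posts from the mobile JSON endpoint. The endpoint URL, the containerid prefix 107603 and the example user ID are my assumptions about how the mobile site behaves and may change over time; the open source project handles all of these details for you.

import requests

# Assumed mobile JSON endpoint; the open source project wraps this logic.
UID = "2075686772"  # example user ID used later in this article
url = "https://m.weibo.cn/api/container/getIndex"
params = {
    "containerid": f"107603{UID}",  # assumed prefix for a user's post list
    "page": 1,
}

resp = requests.get(url, params=params, timeout=10)
data = resp.json()

# Cards with card_type == 9 are assumed to contain one post each ("mblog").
for card in data.get("data", {}).get("cards", []):
    if card.get("card_type") == 9:
        print(card["mblog"]["text"])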

You can download the open source project directly from that page, or clone it with git:

git clone https://github.com/dataabc/weibo-crawler.git

If you are not comfortable with either method, just download the project as a ZIP archive from the GitHub page.

2. Set collection parameters

Before you start collecting data, make sure you have Python installed on your computer. If not, see this article to install it: Super Detailed Python Installation Guide.

After installing Python, open a command prompt: on Windows, open Cmd (Start, Run, cmd); on a Mac, open Terminal (Command + Space, then type Terminal).

Go to the folder we just downloaded and enter the following command to install the required modules:

pip install -r requirements.txt

If you see a series of "Successfully installed xxx" messages, the installation succeeded.

2.1 Find the user ID you want to crawl

Open the home page of the user you want to crawl and look at the URL: you will find a string of digits in the link. This is the user ID we need; just copy it.

If the URL does not contain a string of digits, open any of the user's Weibo comment pages, where the link usually does contain one. If it still does not, look for it on the mobile version of the Weibo page; this may take a bit of patience and a few tries.
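If you prefer to do this step programmatically, a small sketch like the one below pulls the digits out of a profile URL. The example URLs are hypothetical and only illustrate the pattern.

import re

# Hypothetical profile URLs; the digits after /u/ (or the first long run of
# digits) are assumed to be the user ID.
urls = [
    "https://m.weibo.cn/u/2075686772",
    "https://weibo.com/u/2075686772?tabtype=weibo",
]

for url in urls:
    match = re.search(r"/u/(\d+)", url) or re.search(r"(\d{6,})", url)
    if match:
        print(match.group(1))  # -> 2075686772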

2.2 Modify config.json

After obtaining the user ID, write it into the user_id_list array in config.json (see the example configuration after the parameter descriptions below).

The other parameters are as follows. filter controls which posts are crawled: if the value is 1, only original posts are crawled; if the value is 0, all posts (original + retweets) are crawled. since_date: only posts published on or after this date are crawled.

The next parameters control whether to download pictures from original posts, pictures from retweeted posts, videos from original posts, and videos from retweeted posts: 1 means yes, 0 means no. Further down, if you need to write to a database, you can also configure connection parameters for MySQL or MongoDB.
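As a rough illustration, a minimal config.json might look like the sketch below. The field names reflect my reading of the project and may differ between versions, so check the config.json that ships with the repository for the authoritative keys.

{
    "user_id_list": ["2075686772"],
    "filter": 1,
    "since_date": "2020-01-01",
    "write_mode": ["csv"],
    "original_pic_download": 1,
    "retweet_pic_download": 0,
    "original_video_download": 1,
    "retweet_video_download": 0
}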

3. Start collection

Once everything is configured, collection is easy. In Cmd or Terminal, change into the project folder and type:

python weibo.py

and the data will be collected. When collection finishes, if you chose CSV output, a CSV file named after the Weibo user is generated in the weibo folder under the current directory, for example:

weibo\Arsenal Football Club\2075686772.csv

The data you want is in this file.
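To check the results, you can open the CSV in Excel, or load it with pandas as in the short sketch below. The path is the example from above, and the exact column names depend on the project version, so treat this purely as an illustration.

import pandas as pd

# Example path from above; adjust it to the file the crawler actually produced.
path = r"weibo\Arsenal Football Club\2075686772.csv"

df = pd.read_csv(path)
print(df.shape)   # how many posts and columns were collected
print(df.head())  # preview the first few rows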

The open source module is quite feature-rich; the project's documentation describes everything it can collect.

It really is a fantastic tool. Many thanks to the open source author; if you like it, remember to go to his repository and give it a star!

That's it for this article. If you'd like to see more of our Python tutorials, stay tuned, and give us a like or share if this one helped. If you have any questions, leave them in the comments below and we'll answer them patiently!

