QQ crawler - crawl QQ space

Background:

I came across some material on crawlers in a personal blog, found it interesting, and spent a little time researching it. The goal is to judge how close friends are by combining several factors: interaction in QQ space (mutual visits, likes, comments, and so on), chat activity between friends, the daily number of likes, and the relationships among friends. The space crawl (which requires that friends have opened access to their space) uses one layer of recursion to fetch the most recent few dozen posts from each friend's space for analysis (reportedly more layers of recursion are also possible).

The first step is an automatic login, which produces cookie_dict.txt. The crawl then produces comment.txt and like.txt, which respectively represent the weight of space interaction and the weight of the friend relationship. Finally, a third file, relationship.txt, is generated from these two files. This file is placed on a local server and the PHP file is opened in the browser (with a local server set up, the browser can show parsed PHP files, and nesting the PHP output in HTML makes it more visual).

A TXT file alone is not satisfying, since it is inconvenient to read and not intuitive, so Baidu's open source ECharts is used here: the support file echarts.js is downloaded and placed in the project folder, as described later. That is the general background. I am a beginner, so there are bound to be things I have misunderstood; corrections are welcome.

Let’s start with a picture of the result

(To protect personal privacy the image is blurred; the general outline looks like this)

It can be seen that the graph is mainly divided into three parts: the first is high school and junior high school classmates, the second is college classmates, and the third is everyone else.

Let’s look at the big picture

Directory:

1. Use Selenium to log in with a QQ account and password

2. Use multithreading to crawl QQ space

3. Visualize data using Echarts

The preparation phase starts with configuring the environment

This is the configuration for a Windows 10 environment

Configure Python + Selenium in Windows 10

For convenience, here is an article from CSDN

The original link: https://blog.csdn.net/efly2333/article/details/80346426

Once the basic environment is configured, the real work begins

1. Use Selenium to log in and get the friends list

Save the code as cookie.py. Change the chromedriver path in cookie.py to your own (so no environment variable needs to be set), then run it and enter the account and password as prompted. The browser opens automatically and the cookie_dict.txt file is generated in the directory.
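The original cookie.py is not reproduced in this article, so the following is only a minimal sketch of the cookie-saving step under stated assumptions. The function names, the JSON layout of cookie_dict.txt, the login_url parameter, and the fixed waiting period for completing the login are all hypothetical; only webdriver.Chrome, Service, and get_cookies are real Selenium 4 calls.

```python
import json
import time


def cookies_to_dict(cookie_list):
    """Flatten Selenium's list of cookie records into a {name: value} dict."""
    return {cookie["name"]: cookie["value"] for cookie in cookie_list}


def save_cookies(cookie_dict, path="cookie_dict.txt"):
    """Write the cookies as JSON (the file layout is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookie_dict, f)


def login_and_save(chromedriver_path, login_url, wait_seconds=60):
    """Open the login page, wait for the login to finish, then save cookies."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(chromedriver_path))
    try:
        driver.get(login_url)
        # Assumption: the login is completed within this window, either by
        # hand or by code that fills in the account and password fields.
        time.sleep(wait_seconds)
        save_cookies(cookies_to_dict(driver.get_cookies()))
    finally:
        driver.quit()
```

Calling login_and_save with your own chromedriver path and the QQ space login address would leave a cookie_dict.txt in the working directory, which later requests can load with json.load.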

2. Crawl space and more

The main code

Run spider.py to crawl the data and do a preliminary analysis; it generates the TXT files and saves the results locally.
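Since spider.py itself is not shown, here is a rough sketch of how a multithreaded crawl could be organised with the standard library. The function fetch_interactions is a hypothetical placeholder for whatever code requests one friend's space and extracts the comment and like pairs; the "$|$" separator is taken from the file format described later in the article.

```python
from concurrent.futures import ThreadPoolExecutor

SEP = "$|$"


def crawl_friends(friends, fetch_interactions, max_workers=8):
    """Run fetch_interactions(friend) for every friend in a thread pool.

    fetch_interactions is a hypothetical callable that returns
    (comment_pairs, like_pairs), each a list of (a, b) name tuples.
    """
    comments, likes = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order; merging happens in the main thread.
        for comment_pairs, like_pairs in pool.map(fetch_interactions, friends):
            comments.extend(comment_pairs)
            likes.extend(like_pairs)
    return comments, likes


def write_pairs(pairs, path):
    """Write one 'A$|$B' line per pair."""
    with open(path, "w", encoding="utf-8") as f:
        for a, b in pairs:
            f.write(f"{a}{SEP}{b}\n")
```

With a real fetch function in place, the two result lists would be passed to write_pairs with the paths comment.txt and like.txt. Collecting results in the main thread avoids having several threads write to the same file at once.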

Running Process Screenshot

3. Now that you have your list of friends, go to each friend's space and crawl the likes and comments

This gives two TXT files

It goes something like this:

Small rain heroine in$|$the heroine in the rain

On the way$|$on the road

Choking choking choking$|$choke choke choke

Al with British Columbia (euro & my eyes only you ~ 】【)$|$al with British Columbia (& my eyes only you ~ 】【 euro)

The details are given at the end of the article; this is just the general outline.

4. Then there is the data processing part

Analyze the data

You get the two files from step 2: comment.txt and like.txt

Each line of each file has the format: A$|$B

The friend network we want is a graph in the data-structure sense, and the graph here is stored as triples: node A, node B, weight

Therefore, we generate relationship.txt from the two files we have obtained

Its stored format is A$|$B$|$value, where value is the relationship value between the two

In memory, use a list: for example [[a, b, value1], [c, d, value2]]…

This variable is named relationship. We build this data first, and finally loop over the variable and write it to relations.txt.

The idea behind the data is as follows:

Read a line A$|$B from comment.txt or like.txt, then traverse relationship. If A and B do not yet appear together in the same sublist, create a new sublist to record the relationship value of the two. If they already appear together in a sublist, increase the relationship value in that sublist: +3 if the line was read from comment.txt, +1 if it was read from like.txt.
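The accumulation rule just described can be sketched as follows. This is not the author's analysis.py, which is not shown. Two points are assumptions: a new sublist starts at the weight of the line that created it (the article does not state the initial value), and A$|$B is treated the same as B$|$A. The function names are illustrative.

```python
SEP = "$|$"
COMMENT_WEIGHT = 3  # a line read from comment.txt adds 3
LIKE_WEIGHT = 1     # a line read from like.txt adds 1


def parse_pair(line):
    """Split one 'A$|$B' line into (A, B); return None for malformed lines."""
    parts = line.rstrip("\n").split(SEP)
    if len(parts) != 2:
        return None
    return parts[0].strip(), parts[1].strip()


def add_interaction(relationship, a, b, weight):
    """Add weight to the [a, b, value] sublist, creating it if missing."""
    for entry in relationship:
        # Assumption: the pair is unordered, so (A, B) matches (B, A).
        if {entry[0], entry[1]} == {a, b}:
            entry[2] += weight
            return
    # Assumption: a new pair starts at the weight of its first interaction.
    relationship.append([a, b, weight])


def build_relationship(comment_lines, like_lines):
    """Build [[a, b, value], ...] from the lines of the two files."""
    relationship = []
    for lines, weight in ((comment_lines, COMMENT_WEIGHT), (like_lines, LIKE_WEIGHT)):
        for line in lines:
            pair = parse_pair(line)
            if pair is not None:
                add_interaction(relationship, pair[0], pair[1], weight)
    return relationship


def format_relationship(relationship):
    """Render each triple as an 'A$|$B$|$value' line."""
    return [f"{a}{SEP}{b}{SEP}{value}" for a, b, value in relationship]
```

The linear scan mirrors the traversal described in the text. For a large friends list, a dictionary keyed by the pair would avoid rescanning the list for every line.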

5. Run analysis.py under show_relation to obtain the friend relationship value file

In show_relation, change the paths of the TXT files obtained in the third step as prompted, then place the whole folder under the local web server. Download ECharts from

http://echarts.baidu.com/download.html

place it in that folder, and open relation.php in your browser
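The article does this conversion in relation.php, which is not shown. As a rough illustration of the data an ECharts graph series expects, the sketch below turns A$|$B$|$value lines into a node list and a link list. It is written in Python rather than PHP, the function names are hypothetical, and it assumes the values are integers. The "name", "source", "target", and "value" keys follow the ECharts graph series options.

```python
import json

SEP = "$|$"


def load_triples(lines):
    """Parse 'A$|$B$|$value' lines into (a, b, value) tuples, skipping others."""
    triples = []
    for line in lines:
        parts = line.strip().split(SEP)
        if len(parts) == 3:
            triples.append((parts[0], parts[1], int(parts[2])))
    return triples


def to_echarts_graph(triples):
    """Build the 'data' (nodes) and 'links' (edges) lists for a graph series."""
    names = []
    for a, b, _ in triples:
        for name in (a, b):
            if name not in names:
                names.append(name)
    return {
        "data": [{"name": name} for name in names],
        "links": [{"source": a, "target": b, "value": v} for a, b, v in triples],
    }


def to_json(graph):
    """Serialize the graph, keeping non-ASCII nicknames readable."""
    return json.dumps(graph, ensure_ascii=False)
```

The resulting JSON could be embedded in the page and passed as the data and links of a series with type 'graph'.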

6. Open the PHP file in a browser on the web server you set up

Use XAMPP to set up and start a local server, copy the project folder into XAMPP's htdocs folder, then enter in the browser: http://localhost:80/ followed by the project folder name

And now you have the picture from the beginning of the article.
