Hello everyone, I'm A Chen. Today I'll show you how to analyze Bilibili live-stream bullet comments (danmaku) in real time.

Approach: collect the live bullet comments, then run sentiment analysis on them, count the comments at different points in time, and compute the high-frequency words.

I. Collecting live bullet comments

First, open any live room on Bilibili:

https://live.bilibili.com/22080761?hotRank=0&session_id=acbf4a0396f4c22_E68865D5-1DD6-4677-859D-4A81FC5B86C6&visit_id=64 j0r30ef8c0

Room number: 22080761
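The room number is the first path segment of the live-room URL. Here is a minimal sketch of extracting it programmatically, assuming the URL has the shape shown above. The helper name `room_id_from_url` is made up for illustration.

```python
from urllib.parse import urlparse

def room_id_from_url(url):
    """Return the first path segment of a live-room URL (hypothetical helper)."""
    path = urlparse(url).path  # the part between the host and the "?"
    return path.strip("/").split("/")[0]

print(room_id_from_url("https://live.bilibili.com/22080761?hotRank=0"))
```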

1. Find the bullet-comment endpoint

Open the browser developer tools (F12), switch to the Network tab, and find the request below:

https://api.live.bilibili.com/xlive/web-room/v1/dM/gethistory

It is a POST request, and testing shows that the room number is the only parameter needed to fetch the bullet comments.

2. Construct the POST request

import requests

roomid = "22080761"
url = "https://api.live.bilibili.com/xlive/web-room/v1/dM/gethistory"
headers = {
    "Host": "api.live.bilibili.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
}
# The room number is the only form field required
data = {
    "roomid": roomid,
}
# Send the POST request and parse the JSON response
html = requests.post(url=url, headers=headers, data=data).json()

The endpoint returns JSON data directly.
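To make the structure of the response concrete, the sketch below pulls the two fields used later in this article out of a response-shaped dictionary. The sample payload is made up for illustration and contains only the `data` → `room` list with the `text` and `timeline` keys that the collection code reads. A real response carries more fields, and `extract_messages` is a hypothetical helper name.

```python
def extract_messages(payload):
    """Return (timeline, text) pairs from a gethistory-style response dict."""
    return [(item["timeline"], item["text"]) for item in payload["data"]["room"]]

# Made-up sample shaped like the response; not real data
sample = {
    "data": {
        "room": [
            {"timeline": "2021-03-30 21:14:05", "text": "hello"},
            {"timeline": "2021-03-30 21:14:09", "text": "nice stream"},
        ]
    }
}

for timeline, text in extract_messages(sample):
    print(timeline, text)
```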

II. Real-time collection

To collect in real time, wrap the collection code in a function and call that function every few seconds (every 5 seconds here).

1. Timing code

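import time

def sleeptime(hour, min, sec):
    # Convert hours, minutes, and seconds into a total number of seconds
    return hour * 3600 + min * 60 + sec

second = sleeptime(0, 0, 5)
while True:
    time.sleep(second)
    print("do action")
    get_msg()  # the collection function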

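Here get_msg is the collection function. It fetches the latest bullet comments and appends any that are not already saved to a txt file:

import os

def get_msg():
    # Uses requests, roomid, url, headers, and data from the POST request above
    html = requests.post(url=url, headers=headers, data=data).json()
    filename = roomid + ".txt"
    # Read the lines already saved so duplicates can be skipped
    msg_data = []
    if os.path.exists(filename):
        with open(filename, "r", encoding="gb18030") as f:
            msg_data = f.readlines()
    # Write to the txt file
    with open(filename, "a+", encoding="gb18030") as f:
        for content in html["data"]["room"]:
            text = content["text"]          # comment text
            timeline = content["timeline"]  # time the comment was sent
            msg = timeline + " *" + " *" + text  # record the comment
            # Deduplicate: saved lines end with "\n"
            if msg + "\n" not in msg_data:
                f.write(msg + "\n")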

The text file is named after the room number (here, 22080761.txt).
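The duplicate check matters because consecutive calls are likely to return overlapping comments. Here is a minimal sketch of that filtering step in isolation, assuming the records are already formatted as lines. The helper `new_lines` is hypothetical, and it uses a set for the lookup rather than the list used above.

```python
def new_lines(candidates, seen):
    """Return the candidates not yet in `seen`, in order, and add them to `seen`."""
    fresh = []
    for line in candidates:
        if line not in seen:
            seen.add(line)
            fresh.append(line)
    return fresh

# Made-up records standing in for two consecutive polls
seen = set()
first = new_lines(["21:14:05 * * hello", "21:14:09 * * hi"], seen)
second = new_lines(["21:14:09 * * hi", "21:14:12 * * wow"], seen)
print(first)
print(second)
```

Only the lines returned by `new_lines` would be written to the file, so a comment seen in an earlier poll is never stored twice.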

III. Sentiment analysis

The sentiment score is the probability that a text is positive: the closer it is to 1, the more positive the text, and the closer to 0, the more negative.

Let’s take a quick example

text1 = "That movie was great, with not a single dull moment."

The SnowNLP library performs the text sentiment analysis:

from snownlp import SnowNLP

roomid = "22080761"
filename = roomid + ".txt"
# Read the current txt file
with open(filename, "r", encoding="gb18030") as f:
    msg_data = f.readlines()
sentiment_list = []
count1 = 0  # positive comments
count2 = 0  # negative comments
for i in msg_data:
    text = i.split("*")[2]
    t = SnowNLP(text)
    if t.sentiments > 0.5:
        count1 = count1 + 1
    else:
        count2 = count2 + 1
sentiment_list.append(count1)
sentiment_list.append(count2)
print(sentiment_list)

This reads each bullet comment from the text file, runs sentiment analysis on it, and finally counts the positive and negative comments.

There were 13 positive ones and 10 negative ones.
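The counting rule above (a score greater than 0.5 counts as positive, everything else as negative) can be sketched separately from the scoring itself. The helper `tally_sentiment` and the stand-in scores below are made up for illustration so that the sketch runs without SnowNLP installed.

```python
def tally_sentiment(texts, score, threshold=0.5):
    """Return [positive, negative] counts; a text is positive if score(text) > threshold."""
    positive = sum(1 for t in texts if score(t) > threshold)
    return [positive, len(texts) - positive]

# Stand-in scorer with made-up scores (not SnowNLP output)
fake_scores = {"great": 0.9, "boring": 0.2, "ok": 0.5}
print(tally_sentiment(["great", "boring", "ok"], fake_scores.get))
```

With SnowNLP available, passing `lambda t: SnowNLP(t).sentiments` as `score` should give the same kind of tally as the loop above. Note that a score of exactly 0.5 lands on the negative side.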

IV. Comment counts at different points in time

Again, read the bullet-comment text file, then count the comments per minute.

Get the current hour

import time

h = time.strftime("%Y-%m-%d %H", time.localtime())

Then count the comments:

time_list = []
for i in msg_data:
    # Take the minute: the part between the first and second ":" of the timestamp
    time_list.append(i.split("*")[0].split(":")[1])
data_time = list(set(time_list))
data_time.sort()
# Keep only the last 7 minutes if there are more
if len(data_time) > 7:
    data_time = data_time[-7:]
name = [h + ":" + i for i in data_time]
value = [time_list.count(i) for i in data_time]
print(name)
print(value)
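As an alternative sketch of the same per-minute count, `collections.Counter` can do the tallying in one pass. This version assumes each saved line starts with a timestamp of the form "YYYY-MM-DD HH:MM:SS" followed by "*". It keys on the full "date hour:minute" prefix, so the same minute number in two different hours is not merged. The helper name and sample lines are made up.

```python
from collections import Counter

def per_minute_counts(lines, keep=7):
    """Count comments per minute and return the last `keep` minutes with their counts."""
    # First 16 characters of the timestamp: "YYYY-MM-DD HH:MM"
    minutes = Counter(line.split("*")[0].strip()[:16] for line in lines)
    recent = sorted(minutes)[-keep:]
    return recent, [minutes[m] for m in recent]

# Made-up sample lines in the saved-file format
sample = [
    "2021-03-30 21:14:05 * * hello\n",
    "2021-03-30 21:14:40 * * hi\n",
    "2021-03-30 21:15:02 * * wow\n",
]
names, values = per_minute_counts(sample)
print(names)
print(values)
```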

V. Statistics of high-frequency words

SnowNLP segments the bullet-comment text into words, and the resulting words are then counted and sorted by frequency.

First, merge the bullet comments into a single text:

text_list = []
for i in msg_data:
    # Collect the comment text of every saved line
    text_list.append(i.split("*")[2])
text = " ".join(text_list)
text = text.replace("\n", "")
print(text)
s = SnowNLP(text)

Then count the words and take the top 5:

import operator

list_all = s.words
# Count how often each word occurs
dict_x = {}
for item in list_all:
    dict_x[item] = list_all.count(item)
# Sort by frequency, highest first
sorted_x = sorted(dict_x.items(), key=operator.itemgetter(1), reverse=True)
re_word_list = []
re_word_list_name = []
re_word_list_data = []
count = 0
for k, v in sorted_x:
    if count == 5:
        break
    # Skip single-character words
    if len(k) > 1:
        re_word_list_name.append(k)
        re_word_list_data.append(v)
        count = count + 1
re_word_list.append(re_word_list_name)
re_word_list.append(re_word_list_data)
print(re_word_list)
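The same top-5 selection can also be sketched with `collections.Counter`, which counts every word in a single pass instead of calling `list.count` once per word. The helper `top_words` and the token list below are made up for illustration. In the real pipeline the tokens would come from `s.words`.

```python
from collections import Counter

def top_words(words, n=5, min_len=2):
    """Return [names, counts] for the n most frequent words of at least min_len characters."""
    counts = Counter(w for w in words if len(w) >= min_len)
    top = counts.most_common(n)
    return [[w for w, _ in top], [c for _, c in top]]

# Made-up token list standing in for s.words
tokens = ["主播", "加油", "主播", "的", "666", "加油", "主播"]
print(top_words(tokens))
```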

VI. Summary

This article explained how to collect live bullet comments in real time and then analyze the collected data with the SnowNLP library.

