1

Preface

The drama “Zuonxu”, broadcast exclusively on iQiyi, has been especially popular recently, and I have been following it myself. With the tools at hand, I wanted to scrape the show's bullet comments (danmu) and see what viewers are actually saying.

So that beginners can fully learn how to scrape iQiyi bullet comments with Python, this article walks through the scraping in detail and leaves the data analysis for afterwards.

2

Analyzing data packets

1. Find the data packets

Press F12 in your browser to open the developer tools.

Look through the network requests for URLs like this one:

https://cmts.iqiyi.com/bullet/54/00/7973227714515400_60_2_5f3b2e24.br

2. Analyze the bullet-comment links

In that URL, the part that matters is /54/00/7973227714515400.

iQiyi's bullet-comment address has the following form:

cmts.iqiyi.com/bullet/ parameter 1 _…

Parameter 1 is /54/00/7973227714515400

Parameter 2 is the segment number: 1, 2, 3, …

iQiyi loads a new batch of bullet comments every 5 minutes. Each episode runs 46 minutes, and 46 divided by 5, rounded up, is 10, so parameter 2 runs from 1 to 10.

Hence the bullet-comment links:

https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_1.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_2.z
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_3.z
......
https://cmts.iqiyi.com/bullet/54/00/7973227714515400_300_10.z
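
The arithmetic and the link pattern above can be sketched in a few lines. This is only an illustration: `build_bullet_urls` is a hypothetical helper name, and the `300` is copied as-is from the links (it plausibly stands for 5 minutes in seconds, but that is an assumption).

```python
import math

BASE = "https://cmts.iqiyi.com/bullet"
PARAM_1 = "/54/00/7973227714515400"  # parameter 1, taken from the captured URL


def build_bullet_urls(episode_minutes, segment_minutes=5):
    """Return one bullet-comment URL per 5-minute segment of an episode."""
    # 46 / 5 = 9.2, rounded up to 10 segments
    segments = math.ceil(episode_minutes / segment_minutes)
    # "_300_" is copied verbatim from the links; its meaning is assumed, not confirmed
    return [f"{BASE}{PARAM_1}_300_{n}.z" for n in range(1, segments + 1)]


urls = build_bullet_urls(46)
print(len(urls))
print(urls[0])
print(urls[-1])
```

For an episode of a different length, only the `episode_minutes` argument would need to change.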

3. Decode binary data packets

The bullet-comment package downloaded from each link is a compressed file with a .z suffix, so it has to be decoded first:

import zlib


def zipdecode(bulletold):
    """Decode zlib-compressed binary content into text."""
    # wbits = 15 + 32 lets zlib auto-detect a zlib or gzip header
    decode = zlib.decompress(bytearray(bulletold), 15 + 32).decode('utf-8')
    return decode
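
To see what the `15 + 32` argument does without touching the network, here is a small round-trip sketch. The sample payload is made up and compressed locally to stand in for a downloaded .z file; it says nothing about what iQiyi actually returns.

```python
import zlib


def zipdecode(bulletold):
    """Decompress zlib- or gzip-wrapped bytes and decode them as UTF-8 text."""
    # wbits = 15 + 32 tells zlib to auto-detect a zlib or gzip header
    return zlib.decompress(bytearray(bulletold), 15 + 32).decode("utf-8")


# Hypothetical sample payload, compressed locally
sample = "<danmu><content>好看</content></danmu>"
packed_zlib = zlib.compress(sample.encode("utf-8"))

# wbits = 15 + 16 makes the compressor emit a gzip container instead
gzip_obj = zlib.compressobj(wbits=15 + 16)
packed_gzip = gzip_obj.compress(sample.encode("utf-8")) + gzip_obj.flush()

print(zipdecode(packed_zlib) == sample)
print(zipdecode(packed_gzip) == sample)
```

Both checks succeed with the same call, which is why the function does not need to know in advance which of the two containers it was handed.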

After decoding, the data is saved in XML format:

# Write the decoded content into XML files (similar to TXT files) so the data is easy to fetch later
with open('./lyc/zx' + str(x) + '.xml', 'a+', encoding='utf-8') as f:
    f.write(xml)
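
Putting the download, decode, and save steps together might look like the sketch below. `fetch_bytes`, `save_segments`, and `fake_fetch` are hypothetical names, and whether the real server needs extra request headers has not been verified here. Note one deliberate difference from the snippet above: files are opened with "w" rather than 'a+', so re-running the script does not append a second copy.

```python
import os
import tempfile
import zlib
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the raw bytes behind a URL (needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def save_segments(urls, out_dir, prefix="zx", fetch=fetch_bytes):
    """Download each segment, decompress it, and write <prefix><n>.xml files."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for x, url in enumerate(urls, start=1):
        xml = zlib.decompress(fetch(url), 15 + 32).decode("utf-8")
        path = os.path.join(out_dir, f"{prefix}{x}.xml")
        # "w" overwrites on re-run, so each file holds exactly one segment
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml)
        paths.append(path)
    return paths


# Offline demonstration with a fake fetcher, so nothing is actually downloaded
def fake_fetch(url):
    return zlib.compress(f"<danmu><url>{url}</url></danmu>".encode("utf-8"))


demo_dir = tempfile.mkdtemp()
saved = save_segments(["segment-1", "segment-2"], demo_dir, fetch=fake_fetch)
print([os.path.basename(p) for p in saved])
```

Passing the fetcher in as a parameter keeps the file-writing logic testable without hitting the site; for a real run, the list of segment links would be passed with the default `fetch`.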

3

Parsing the XML

1. Extract data

Looking at the XML file, there are three fields we need to extract: 1. the user ID (uid), 2. the comment text (content), and 3. the number of likes (likeCount).

# Read data from the XML file
import xml.dom.minidom


def xml_parse(file_name):
    DOMTree = xml.dom.minidom.parse(file_name)
    collection = DOMTree.documentElement
    # Get all entry elements in the collection
    entrys = collection.getElementsByTagName("entry")
    print(entrys)
    result = []
    for entry in entrys:
        # Each entry holds one bullet comment
        uid = entry.getElementsByTagName('uid')[0]
        content = entry.getElementsByTagName('content')[0]
        likeCount = entry.getElementsByTagName('likeCount')[0]
        print(uid.childNodes[0].data)
        print(content.childNodes[0].data)
        print(likeCount.childNodes[0].data)
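
The same extraction can be tried on an in-memory string, which makes it easy to check the logic before running it on real files. The sample XML below is invented; only the tag names entry, uid, content, and likeCount come from the article. The helper names `text_of` and `parse_entries` are hypothetical. The sketch also guards against empty elements, where `childNodes[0]` would otherwise raise an IndexError.

```python
from xml.dom.minidom import parseString

# Hypothetical sample; the real files may contain more fields
SAMPLE_XML = """<danmu>
  <entry><uid>1001</uid><content>Great episode</content><likeCount>12</likeCount></entry>
  <entry><uid>1002</uid><content></content><likeCount>0</likeCount></entry>
</danmu>"""


def text_of(entry, tag):
    """Return the text inside the first <tag> child, or '' if empty or missing."""
    nodes = entry.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


def parse_entries(xml_text):
    """Return a list of (uid, content, likeCount) tuples, one per entry."""
    root = parseString(xml_text).documentElement
    return [
        (text_of(e, "uid"), text_of(e, "content"), text_of(e, "likeCount"))
        for e in root.getElementsByTagName("entry")
    ]


rows = parse_entries(SAMPLE_XML)
print(rows)
# → [('1001', 'Great episode', '12'), ('1002', '', '0')]
```

Returning a list of tuples instead of printing makes the rows easy to hand to whatever writes the spreadsheet.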

4

Saving the data

1. Preparation before saving

import xlwt

# Create a workbook and set its encoding
workbook = xlwt.Workbook(encoding='utf-8')
# Create a worksheet
worksheet = workbook.add_sheet('sheet1')

# Write the header row to Excel
# The arguments are row, column, and value
worksheet.write(0, 0, label='uid')
worksheet.write(0, 1, label='content')
worksheet.write(0, 2, label='likeCount')

This imports the xlwt library (which writes Excel .xls files) and defines the header row (uid, content, likeCount).

2. Write data

# count is the row index, initialized elsewhere (start it at 1 so row 0 keeps the header)
for entry in entrys:
    uid = entry.getElementsByTagName('uid')[0]
    content = entry.getElementsByTagName('content')[0]
    likeCount = entry.getElementsByTagName('likeCount')[0]
    print(uid.childNodes[0].data)
    print(content.childNodes[0].data)
    print(likeCount.childNodes[0].data)
    # Write one row per bullet comment
    worksheet.write(count, 0, label=str(uid.childNodes[0].data))
    worksheet.write(count, 1, label=str(content.childNodes[0].data))
    worksheet.write(count, 2, label=str(likeCount.childNodes[0].data))
    count = count + 1

Finally, everything is saved to the file "Barrage data set - Li Yunchen.xls":

# Parse the 10 XML files saved earlier (zx1.xml to zx10.xml)
for x in range(1, 11):
    l = xml_parse("./lyc/zx" + str(x) + ".xml")


# save
workbook.save('Barrage data set - Li Yunchen.xls')
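
If xlwt is not installed, the same three columns can be written with Python's standard csv module instead. This is a swapped-in alternative, not the method used in this article, and `rows_to_csv` is a hypothetical helper name. The sample row is invented.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write a header plus (uid, content, likeCount) rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["uid", "content", "likeCount"])
    writer.writerows(rows)


# In-memory demonstration; for a real file, open it with newline="" and encoding="utf-8"
buffer = io.StringIO()
rows_to_csv([("1001", "Great episode", "12")], buffer)
print(buffer.getvalue())
```

A CSV file opens in Excel as well, though it carries no sheet names or cell formatting.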

5

Conclusion

1. Using a real case, the drama “Long”, we scraped iQiyi bullet comments with Python step by step.

2. Parsed the XML data with Python.

3. Wrote the data to Excel.
