Preface

While reading up on the latest anti-crawler techniques, I came across WebSocket handshake verification. I had never run into it before, so I found a website to try it on. The article on the latest anti-crawler techniques: blog.csdn.net/qq_26079939…


What is WebSocket?

WebSocket is a protocol, introduced alongside HTML5, that provides full-duplex communication over a single TCP connection.

WebSocket makes it easier to exchange data between the client and the server, and it lets the server push data to the client on its own initiative. With the WebSocket API, the browser and the server need to complete only one handshake to establish a persistent connection; after that, data flows directly in both directions over that channel.
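The handshake itself is an ordinary HTTP/1.1 request carrying an Upgrade header. The sketch below builds the request text a client sends, following RFC 6455; build_handshake_request, example.com and /chat are illustrative names, not part of the target site. If the server accepts, it answers with status 101 and both sides then exchange WebSocket frames over the same TCP connection.

```python
import base64
import os
from typing import Optional


def build_handshake_request(host: str, path: str, key: Optional[str] = None) -> str:
    """Build the HTTP/1.1 Upgrade request a client sends to open a WebSocket."""
    if key is None:
        # RFC 6455: the key is 16 random bytes, base64-encoded
        key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # The header block ends with an empty line (CRLF CRLF)
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_handshake_request("example.com", "/chat"))
```

Libraries such as websocket-client generate and send this request for you; building it by hand only makes the handshake visible.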

WebSocket handshake verification as an anti-crawler measure

1. Target site

Le Yu Sports: live.611.com/zq


2. Website analysis


1. Find the address used to establish the socket connection. Its last segment, 9394adf88ece4ff08f9ac6e82949f3a1, is a variable value: it is the token obtained in step 2.
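Because only the last path segment changes between sessions, it helps to treat the captured address as a template. A minimal sketch, assuming the captured address has the form wss://push.611.com:6119/<token> that the scraping code in this post connects to; extract_token and with_token are hypothetical helper names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed captured address: fixed host and port, variable last path segment
CAPTURED = "wss://push.611.com:6119/9394adf88ece4ff08f9ac6e82949f3a1"


def extract_token(ws_url: str) -> str:
    """Return the variable last path segment of the socket address."""
    return urlsplit(ws_url).path.strip("/")


def with_token(ws_url: str, token: str) -> str:
    """Swap a fresh token into an otherwise identical socket address."""
    parts = urlsplit(ws_url)
    return urlunsplit((parts.scheme, parts.netloc, "/" + token, parts.query, parts.fragment))


if __name__ == "__main__":
    print(extract_token(CAPTURED))
    print(with_token(CAPTURED, "new-token"))
```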



2. Obtain the token as follows:

import requests


def getToken():
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()  # the token is in the "Data" field of the JSON body
        return data["Data"]
    print("Request error")
    return None


3. In the figure below, the green arrows mark data sent by the client to the server, and the red arrows mark data the server sends back.
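The two directions also differ on the wire: RFC 6455 requires every client-to-server frame to be masked with a 4-byte key, while server-to-client frames are sent unmasked. The sketch below shows that difference for a single, final text frame; encode_client_text_frame and decode_server_text_frame are hypothetical names, and fragmentation and control frames are deliberately not handled. Libraries such as websocket-client perform this framing automatically.

```python
import os
import struct
from typing import Optional


def encode_client_text_frame(text: str, mask_key: Optional[bytes] = None) -> bytes:
    """Encode one final text frame the way a client must send it (masked)."""
    payload = text.encode("utf-8")
    if mask_key is None:
        mask_key = os.urandom(4)
    if len(mask_key) != 4:
        raise ValueError("mask key must be exactly 4 bytes")
    header = bytearray([0x81])  # FIN bit + opcode 0x1 (text)
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)  # 0x80 is the MASK bit
    elif n < 1 << 16:
        header.append(0x80 | 126)
        header += struct.pack("!H", n)  # 16-bit extended length
    else:
        header.append(0x80 | 127)
        header += struct.pack("!Q", n)  # 64-bit extended length
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


def decode_server_text_frame(frame: bytes) -> str:
    """Decode a short, unmasked, single-frame text message from a server."""
    if frame[0] != 0x81:
        raise ValueError("expected a final text frame")
    if frame[1] & 0x80:
        raise ValueError("server frames must not be masked")
    n = frame[1] & 0x7F
    if n >= 126:
        raise ValueError("extended payload lengths are not handled in this sketch")
    return frame[2:2 + n].decode("utf-8")
```

With the mask key 37 fa 21 3d, "Hello" encodes to the same bytes as the masked example in RFC 6455, section 5.7.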


3. Obtain data

There are many Python libraries for working with WebSockets; easy-to-use, stable ones include websocket-client (synchronous), websockets (asynchronous) and aiowebsocket (asynchronous). The examples below use websocket-client and aiowebsocket.

websocket-client method:

import json
import time

import requests
import websocket  # pip install websocket-client


def getToken():  # Get the token parameter
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
        return data["Data"]
    print("Request error")
    return None


def get_message():  # Build the three messages to send
    timestamp = int(time.time()) * 1000
    info = {'chrome': 'true', 'version': '80.0.3987.122', 'webkit': 'true'}
    message1 = {
        "command": "RegisterInfo", "action": "Web", "ids": [], "UserInfo": {
            "Version": str([timestamp]) + json.dumps(info),
            "Url": "https://live.611.com/zq"
        }
    }
    message2 = {
        "command": "JoinGroup", "action": "SoccerLiveOdd", "ids": []
    }
    message3 = {
        "command": "JoinGroup", "action": "SoccerLive", "ids": []
    }
    return json.dumps(message1), json.dumps(message2), json.dumps(message3)


def Download(token, message1, message2, message3):
    uri = "wss://push.611.com:6119/{}".format(token)
    ws = websocket.create_connection(uri, timeout=10)
    try:
        # Register the client, then join the odds and live-score groups
        ws.send(message1)
        ws.send(message2)
        ws.send(message3)
        while True:
            result = ws.recv()  # blocks until the server pushes a message
            print(result)
    finally:
        ws.close()


if __name__ == '__main__':
    token = getToken()  # Get token string
    if token:
        message1, message2, message3 = get_message()  # Construct request information
        Download(token, message1, message2, message3)  # Fetch data


The results

aiowebsocket method:

import asyncio
import json
import logging
import time

import requests
from aiowebsocket.converses import AioWebSocket  # pip install aiowebsocket


def getToken():
    url = "https://live.611.com/Live/GetToken"
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
        return data["Data"]
    print("Request error")
    return None


def get_message():  # Data to send
    timestamp = int(time.time()) * 1000
    info = {'chrome': 'true', 'version': '80.0.3987.122', 'webkit': 'true'}
    message1 = {
        "command": "RegisterInfo", "action": "Web", "ids": [], "UserInfo": {
            "Version": str([timestamp]) + json.dumps(info),
            "Url": "https://live.611.com/zq"
        }
    }
    message2 = {
        "command": "JoinGroup", "action": "SoccerLiveOdd", "ids": []
    }
    message3 = {
        "command": "JoinGroup", "action": "SoccerLive", "ids": []
    }
    return message1, message2, message3


async def startup():
    token = getToken()  # Get token string (a blocking call, acceptable once at startup)
    if not token:
        return
    uri = "wss://push.611.com:6119/{}".format(token)
    message1, message2, message3 = get_message()  # Construct request information
    async with AioWebSocket(uri) as aws:
        converse = aws.manipulator
        await converse.send(json.dumps(message1))
        await converse.send(json.dumps(message2))
        await converse.send(json.dumps(message3))
        while True:
            mes = await converse.receive()  # raw bytes pushed by the server
            if mes:
                msg = json.loads(mes.decode("utf-8"))
                print(msg)


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    try:
        asyncio.run(startup())
    except KeyboardInterrupt:
        logging.info('Quit.')

The results


Summary

On the web, polling and WebSocket are two common ways to deliver "real-time" data updates. With polling, the client requests a server endpoint at a fixed interval (e.g., every second). The data appears to update in real time, but it actually refreshes only once per interval, so it is not truly real-time. Polling uses a pull model: the client actively pulls data from the server.
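The pull model boils down to a loop that asks, then sleeps. A minimal sketch; poll is a hypothetical helper, and fetch stands in for a real request such as requests.get(url).json().

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def poll(fetch: Callable[[], T], interval: float, rounds: int) -> List[T]:
    """Pull model: the client asks the server for data every `interval` seconds."""
    results: List[T] = []
    for i in range(rounds):
        results.append(fetch())  # one request per round
        if i < rounds - 1:
            # Anything that changes during this sleep is only seen on the next round
            time.sleep(interval)
    return results
```

The worst-case delay between a change on the server and the client noticing it is roughly one full interval, which is why polling only looks real-time.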

WebSocket uses a push model: the server actively pushes data to the client as soon as it is available, which gives genuinely real-time updates.
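For contrast, the push model can be sketched in-process with an asyncio.Queue standing in for the open connection: the client never asks, it simply awaits whatever arrives. The server, client and main names and the score strings are made up for illustration; no network is involved.

```python
import asyncio
from typing import List, Optional


async def server(queue: "asyncio.Queue[Optional[str]]", updates: List[str]) -> None:
    """Push model: the server sends each update the moment it exists."""
    for update in updates:
        await asyncio.sleep(0.01)  # stand-in for waiting on a real event, e.g. a goal
        await queue.put(update)
    await queue.put(None)  # sentinel: no more updates


async def client(queue: "asyncio.Queue[Optional[str]]") -> List[str]:
    """The client sends no requests; it waits for whatever the server pushes."""
    received: List[str] = []
    while True:
        update = await queue.get()
        if update is None:
            break
        received.append(update)
    return received


async def main() -> List[str]:
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    _, received = await asyncio.gather(server(queue, ["1-0", "1-1", "2-1"]), client(queue))
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['1-0', '1-1', '2-1']
```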

On the server side, after creating the socket service, the server listens for client connections and reads the messages each client sends in a while True loop.
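The listen-and-read loop can be sketched with Python's standard asyncio streams. This is a simplification, not a WebSocket server: messages are newline-delimited over plain TCP, and handle_client and demo are hypothetical names. Each connection gets its own while True loop, which ends when the client disconnects.

```python
import asyncio
from typing import List


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Read one client's messages in a while True loop until it disconnects."""
    while True:
        line = await reader.readline()
        if not line:  # empty bytes: the client closed the connection
            break
        writer.write(b"ack: " + line)  # reply to each message
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo() -> List[bytes]:
    # Port 0 lets the OS pick a free port for this local demonstration
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    replies = []
    for msg in (b"hello\n", b"world\n"):
        writer.write(msg)
        await writer.drain()
        replies.append(await reader.readline())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```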

Before that loop starts, the server verifies the handshake request sent by the client: if verification succeeds, it returns a response with status code 101 (Switching Protocols); otherwise, it returns a response with status code 403 (Forbidden).
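The verification step can be sketched as a pure function from the request path and headers to a status code plus response headers. Only the Sec-WebSocket-Accept computation (SHA-1 of the client key concatenated with a fixed GUID, then base64) is defined by RFC 6455; the token check against VALID_TOKENS is an assumption about how a site like this might validate the path, and verify_handshake, accept_key and example-token-123 are hypothetical.

```python
import base64
import hashlib
from typing import Dict, Set, Tuple

# Fixed GUID defined by RFC 6455, section 1.3
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Hypothetical store of tokens previously handed out by a token endpoint
VALID_TOKENS: Set[str] = {"example-token-123"}


def accept_key(client_key: str) -> str:
    """Compute Sec-WebSocket-Accept: base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_handshake(path: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers): 101 if the handshake passes, else 403."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    token = path.strip("/")
    if (
        token not in VALID_TOKENS
        or h.get("upgrade", "").lower() != "websocket"
        or "sec-websocket-key" not in h
    ):
        return 403, {}
    return 101, {
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        "Sec-WebSocket-Accept": accept_key(h["sec-websocket-key"]),
    }
```

This is why a crawler that opens the socket address without first fetching a fresh token is rejected before a single data frame is exchanged.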