Preface

Before starting, you need a basic grasp of Python syntax, an HTTP crawler library, HTML querying, and data processing and storage.

Python:

www.runoob.com/python/pyth…

  • Basic syntax and features

Requests:

www.cnblogs.com/lanyinhao/p…

  • An Apache2-licensed HTTP library, used in place of urllib

It supports HTTP keep-alive and connection pooling, cookie-based sessions, file uploads, automatic decoding of response content, and automatic encoding of internationalized URLs and POST data.

Beautiful Soup:

cuiqingcai.com/1319.html

  • Provides simple, Pythonic functions for navigating, searching, and modifying the parse tree.

Crawlers and JS reverse engineering

Baidu Translate

A Baidu Translate crawler that queries through the Baidu Translate interface.

Analysis

  • Open Baidu Translate

url : fanyi.baidu.com/#en/zh/

Analysis process

  1. Switch the browser's developer tools to mobile mode, which tends to reduce the number of parameters in the request.

  2. View the request interface.

  3. Send two requests and check whether the request parameters change (Figure 1 and Figure 2 show the two requests).

Comparing the two requests shows that two parameters change. They are probably generated by JS, so we need to analyze the interface and work out how these values are produced.
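The comparison of two captured requests can also be done mechanically. The sketch below diffs two form payloads copied by hand from the developer tools; the query strings and the second sign value are placeholders made up for illustration, and the helper name changed_params is not from the original article.

```python
def changed_params(first, second):
    """Return the sorted names of parameters whose values differ."""
    keys = set(first) | set(second)
    return sorted(k for k in keys if first.get(k) != second.get(k))


# Placeholder payloads: only the shape matters here.
request_1 = {
    "from": "en",
    "to": "zh",
    "query": "first word",
    "token": "900aa0a84929561d52bbee8c9222c0aa",
    "sign": "54706.276099",
}
request_2 = {
    "from": "en",
    "to": "zh",
    "query": "second word",
    "token": "900aa0a84929561d52bbee8c9222c0aa",
    "sign": "000000.000000",  # made-up value standing in for a second capture
}

print(changed_params(request_1, request_2))
```

Anything that shows up in the result other than the user's own input is a candidate for a JS-generated value.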

Analysis results

Analysis target      Analysis result
Request URL          fanyi.baidu.com/v2transapi
Request method       POST
Request parameters   See "Request parameter analysis"
Request headers      See request header analysis

Request parameter analysis

Parameter   Analysis result
query       The text to translate (changes)
from        en (fixed)
to          zh (fixed)
token       900aa0a84929561d52bbee8c9222c0aa (requests and testing show this is a fixed value)
sign        54706.276099 (changes)
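To make the parameter list concrete, here is a minimal sketch of how these fields could be assembled into a form-encoded POST body using only the standard library. The function name build_form_body is invented for this example, the language pair follows the page URL (#en/zh/), and the pairing of the sample query with the sample sign is illustrative only: a real sign has to be computed for each query.

```python
from urllib.parse import urlencode


def build_form_body(query, sign, token, from_lang="en", to_lang="zh"):
    """Encode the request parameters as an application/x-www-form-urlencoded body."""
    params = {
        "from": from_lang,
        "to": to_lang,
        "query": query,
        "token": token,
        "sign": sign,
    }
    return urlencode(params)


body = build_form_body(
    query="hello world",
    sign="54706.276099",  # sample value; not the real sign for this query
    token="900aa0a84929561d52bbee8c9222c0aa",
)
print(body)
```

The requests library performs the same encoding when a dict is passed as data, so this is mainly useful for seeing what actually goes over the wire.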

JS reverse-engineering workflow

Note: you do not need to be proficient in JS to reverse-engineer it. A little JS is enough; what matters is reverse thinking, that is, the way you approach the problem.

Chrome debugging tips

  1. Open the Search panel.
  2. Use it to find every place in the code where a keyword appears.
  3. Click a result to jump to the code, then format it.
  4. Set breakpoints in the formatted code.
  5. Hover the mouse over the code to see the current value of a variable, where a function's original code is located, and so on.
JS reverse-engineering steps
  1. Extract a keyword from the request URL, search for it, and go to the line of code that sends the request.
  2. Set a breakpoint on that code and trigger the request to confirm that the right code was found.
  3. Read upwards and downwards from there to find the target parameters and the logic that generates them.
  4. Use js2py to run the generation logic and obtain the values you need.

Follow the reverse process to find the JS code we need

  1. Search for the keyword.

  2. Follow the code and analyze the AJAX request.

  3. Look for the values we need.

  • Locate exactly the code we need

  • Copy that JS code

  • The code is as follows:

function e(r) {
        // o lists the surrogate pairs in r (null if there are none)
        var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
        if (null === o) {
            var t = r.length;
            // Long input: keep the first 10, middle 10 and last 10 characters
            t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
        } else {
            // a() is a helper from the page source that is not reproduced here;
            // it is only reached when the input contains surrogate pairs
            for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
                "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
                C !== h - 1 && f.push(o[C]);
            var g = f.length;
            g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
        }
        var u = void 0
          , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
        u = null !== i ? i : (i = window[l] || "") || "";
        // Encode r as UTF-8 bytes into S
        for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
            var A = r.charCodeAt(v);
            128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
            S[c++] = A >> 18 | 240,
            S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
            S[c++] = A >> 6 & 63 | 128),
            S[c++] = 63 & A | 128)
        }
        // Mix every byte into p using the helper n
        for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
            p += S[b],
            p = n(p, F);
        return p = n(p, D),
        p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
        p %= 1e6,
        p.toString() + "." + (p ^ m)
    }
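The first half of the function only shortens long input: anything over 30 characters is reduced to its first 10, middle 10 and last 10 characters before signing. A small Python sketch of that step follows (the name shorten_query is invented for this example). Python strings are indexed by code point, which corresponds to the JS branch that treats each surrogate pair as a single element.

```python
def shorten_query(text):
    """Keep the first 10, middle 10 and last 10 characters of long input."""
    n = len(text)
    if n <= 30:
        return text
    mid = n // 2
    return text[:10] + text[mid - 5:mid + 5] + text[-10:]


# 40 characters: a..z followed by a..n
sample = "".join(chr(ord("a") + i % 26) for i in range(40))
print(shorten_query(sample))   # abcdefghijpqrstuvwxyefghijklmn
print(shorten_query("hello"))  # hello
```

One consequence is that two long texts sharing the same head, middle and tail produce the same sign.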

Writing the crawler code

  • Before writing the code, we need to get familiar with the js2py module
  • Pypi.org/project/Js2…
#!/usr/bin/python3
# -*- coding: utf-8 -*-

import js2py
import requests

js_ctx = js2py.EvalJs()

# 0: English to Chinese, 1: Chinese to English
t_mode = 0


class Translation(object):

    def __init__(self, query):
        # Request URL template, query text and request headers
        self.url = "https://fanyi.baidu.com/v2transapi?from={0}&to={1}"
        self.query = query
        self.headers = {
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1",
            "Referer": "https://fanyi.baidu.com/",
            # Cookie captured from a browser session; copy a fresh one from your own browser
            "Cookie": "BAIDUID=714BFAAF02DA927F583935C7A354949A:FG=1; BIDUPSID=714BFAAF02DA927F583935C7A354949A; PSTM=1553390486; delPer=0; PSINO=5; H_PS_PSSID=28742_1463_21125_18559_28723_28557_28697_28585_28640_28604_28626_22160; locale=zh; from_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D; to_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D; REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; Hm_lvt_afd111fa62852d1f37001d1f980b6800=1553658863155766, 321155769, 980155770, 442; Hm_lpvt_afd111fa62852d1f37001d1f980b6800=1553770442; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1553766258155766, 321155769, 980155770, 442; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1553770442"
        }

    # Compute the sign value by running the reverse-engineered JS
    def make_sign(self):
        with open("translation.js", "r", encoding="utf-8") as f:
            js_ctx.execute(f.read())

        # Generate sign by calling the function e defined in the JS file
        sign = js_ctx.e(self.query)
        return sign

    def make_data(self, sign):
        # Form fields sent in the POST body
        data = {
            "query": self.query,
            "token": "6f5c83b84d69ad3633abdf18abcb030d",
            "sign": sign
        }
        return data

    def get_content(self, data):
        if t_mode == 0:
            from_str = "en"
            to_str = "zh"
        else:
            from_str = "zh"
            to_str = "en"

        response = requests.post(
            url=self.url.format(from_str, to_str),
            headers=self.headers,
            data=data
        )
        return response.json()['trans_result']['data'][0]['dst']

    def run(self):
        # Get the value of sign
        sign = self.make_sign()
        # Build parameters
        data = self.make_data(sign)
        # Get the translated content
        content = self.get_content(data)
        print(content)


if __name__ == '__main__':
    t_mode = int(input("Please enter the translation mode (0: English-Chinese, 1: Chinese-English): "))
    query = input("Please enter what you want to translate: ")
    translation = Translation(query)
    translation.run()
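The crawler indexes straight into the response with ['trans_result']['data'][0]['dst'], which raises an exception whenever the service answers with something else (for example when the token, sign or cookie is rejected). A more defensive variant might look like the sketch below; the function name extract_translation and both sample payloads are hypothetical and only mirror the key path used in the crawler.

```python
def extract_translation(payload):
    """Return the translated text, or None if the payload has a different shape."""
    try:
        return payload["trans_result"]["data"][0]["dst"]
    except (KeyError, IndexError, TypeError):
        return None


# Hypothetical payloads shaped after the key path used in the crawler
sample_ok = {"trans_result": {"data": [{"src": "hello", "dst": "你好"}]}}
sample_error = {"error": "hypothetical failure payload"}

print(extract_translation(sample_ok))
print(extract_translation(sample_error))
```

Returning None lets the caller print the raw response for debugging instead of stopping on a KeyError.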

Note

  • Running the code at this point raises an error: a value the signing function needs (reported as r) is missing.

Solution: back in the browser, find where that value is generated and add the generating function to the JS file we created earlier, above the signing function.

  • The function is shown below:
function n(r, o) {
        for (var t = 0; t < o.length - 2; t += 3) {
            var a = o.charAt(t + 2);
            a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
            a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
            r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
        }
        return r
    }
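The second argument of n is a tiny instruction string read three characters at a time: the first character picks how the shifted value is combined with r ("+" adds, anything else XORs), the second picks the shift direction ("+" is an unsigned right shift, anything else a left shift), and the third is the shift amount as a base-36-style digit. The Python sketch below only decodes such a string into readable tuples; the labels add, xor, shr and shl are names chosen for this example. The two inputs are the strings that F and D in the signing function spell out through String.fromCharCode.

```python
def decode_ops(spec):
    """Translate an instruction string such as '+-a^+6' into readable steps."""
    ops = []
    for t in range(0, len(spec) - 2, 3):
        combine = "add" if spec[t] == "+" else "xor"
        shift = "shr" if spec[t + 1] == "+" else "shl"
        digit = spec[t + 2]
        # 'a' maps to 10, 'b' to 11, ...; plain digits map to themselves
        amount = ord(digit) - 87 if digit >= "a" else int(digit)
        ops.append((combine, shift, amount))
    return ops


print(decode_ops("+-a^+6"))     # [('add', 'shl', 10), ('xor', 'shr', 6)]
print(decode_ops("+-3^+b+-f"))  # [('add', 'shl', 3), ('xor', 'shr', 11), ('add', 'shl', 15)]
```

Read this way, each byte of the query is mixed in with "r += r << 10; r ^= r >>> 6", followed by one final round using the second string.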
  • Rerunning the code produces another error, this time saying that i is missing. The fix is the same: keep tracing the JS backwards to find the value of i.

  • Reverse analysis
  1. Go back to function a(), find where the value of i is used, and set a breakpoint.

  2. With the breakpoint set, refresh the page and hover over i. It holds a string that looks like a floating-point number. To check whether it changes, translate a different word, refresh the page and inspect i again: the value is the same, so we can define i as a fixed value directly in our code.
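The signing function itself hints at where i comes from: when i is null it falls back to window[l], and l is assembled from three character codes. They can be decoded quickly, here in Python (the helper name from_char_codes is made up for this example):

```python
def from_char_codes(*codes):
    """Python counterpart of chaining String.fromCharCode(...) calls."""
    return "".join(chr(code) for code in codes)


global_name = from_char_codes(103, 116, 107)
print(global_name)  # gtk
```

So the value being read is window.gtk, which is another place to look for it in the page if the breakpoint approach is inconvenient.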

Add the following as the first line of our custom JS file:

var i = "320305.131321201"

Finally

Run the code again. It succeeds, which completes a simple JS reverse-engineering exercise against Baidu Translate.

QQ Music download

What does the QQ Music download process look like?

Analysis process

  • Open QQ Music

url : y.qq.com/

  1. Switch the browser's developer tools to mobile mode, which tends to reduce the number of parameters in the request.

  2. View the request interface.

  3. Keep looking for usable links: the link for a single song shows the parameter that must be passed, song_mid.

Follow the reverse process to find the JS code we need

  1. Inspect the required sign parameter and the data payload. Concatenating data and comparing suggests that sign is computed from data.

  2. Debug the JS to find the method and the code that produce the sign value.

  3. Copy getSecuritySign and the internal JS methods it uses to process data into sign.js, as in the Baidu example above.
  4. Join the links returned by this request to form the music source URL.

Finally, with the music source URL assembled, the download can be completed.
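The splicing step amounts to joining a base address with the path returned by the interface. A minimal sketch using the standard library is shown below; the host, path and query string are placeholders, since the article does not list the actual values, and build_source_url is a name made up for this example.

```python
from urllib.parse import urljoin


def build_source_url(base, relative_path):
    """Join a base address and a returned path into one downloadable URL."""
    return urljoin(base, relative_path)


# Placeholder values for illustration only
source_url = build_source_url("https://media.example.com/", "track/0001.m4a?key=abc")
print(source_url)  # https://media.example.com/track/0001.m4a?key=abc
```

The resulting URL can then be fetched like any other file, for example with requests.get(url, stream=True) and the response written to disk in chunks.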

References

Github.com/Kr1s77/awes…