
Preface

A few days ago in the piggy group, some members asked whether anyone knew how to build an automatic reply feature for Douban. For anyone experienced this isn't hard, but for someone who doesn't know where to start there is a real barrier, never mind digging through material on Selenium.

Trying out Douban's reply function

My first idea was also to use Selenium, but I decided to walk through Douban's reply process by hand first. Open the post you want to reply to, log in to Douban, and return to the post. The reply box is at the bottom of the page; clicking send posts a comment.

Press F12, select Network, and click the red button on the left twice to clear all the previous data.

Then enter the content, click the "send" button, and check the Network panel. The request that sends the comment is easy to spot, and its name confirms what it does:

https://www.douban.com/group/topic/121989778/add_comment

The request carries four parameters:

ck=TXEg
rv_comment=(the reply text)
start=0
submit_btn=send

In that case, try it with Postman:

For the request headers, use the usual Cookie, User-Agent, Referer and Host.

As for the body, although the captured request carries four parameters, testing shows only two are actually required (ck and rv_comment). The content sent here is "jbtest".

Click Send in Postman, refresh the post's page, and the reply you just sent appears.

So Douban's reply function only requires calling this endpoint directly; Selenium isn't needed.

In that case, the code is easy to write:

import requests

# Reply endpoint of the post: post link + /add_comment
db_url = "https://www.douban.com/group/topic/121989778/add_comment"
headers = {
    "Host": "www.douban.com",
    "Referer": "https://www.douban.com/group/topic/121989778/?start=0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    # Your logged-in Douban cookie (placeholder)
    "Cookie": "your cookie",
}
# Only these two body fields are required
params = {
    "ck": "TXEg",
    "rv_comment": "jbtest11111",
}
requests.post(db_url, headers=headers, data=params)

The code defines the request headers and body, then sends a request to Douban's comment endpoint. Run the script and refresh the page:

It works: the reply function really is that simple.
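Because the endpoint is always the post link plus /add_comment, a small helper can build it and avoid the stray double slash you get when the link already ends in "/". This is a minimal sketch: build_comment_url is a hypothetical name, and it assumes the URL pattern observed in the packet capture holds for every post.

```python
from urllib.parse import urlsplit, urlunsplit


def build_comment_url(post_url: str) -> str:
    """Return the add_comment endpoint for a Douban post link.

    Hypothetical helper: assumes the endpoint is always
    <post link>/add_comment, as seen in the captured request.
    Query strings and fragments are dropped, and trailing slashes
    are stripped so the result never contains '//add_comment'.
    """
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/") + "/add_comment"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(build_comment_url("https://www.douban.com/group/topic/121989778/?start=0"))
```

The result can be used directly as db_url in the script above.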

Making it automatic

The reply function works. Does Python have a timer facility, so we can simply run the POST request on a schedule?

Yes: APScheduler.

APScheduler overview

APScheduler is a job-scheduling framework for Python. It supports scheduling by a fixed date, by a fixed interval, and with crontab-style expressions like those on Linux. Besides adding and removing jobs, it can store jobs in a database so they persist, which makes it very convenient to use.

Official documentation: apscheduler.readthedocs.io/en/3.3.1

Installation

1) Install with pip (recommended):

pip install apscheduler 

2) Install from source: pypi.python.org/pypi/APSche…

python setup.py install 

The four components

APScheduler has four components:

1) Triggers: a trigger contains the scheduling logic. Each job has its own trigger, which determines when the job runs next. Apart from their initial configuration, triggers are completely stateless.

2) Job stores: hold the scheduled jobs. The default job store simply keeps jobs in memory; other job stores save them in databases, with support for MongoDB, Redis and SQLAlchemy. When a job is persisted, its data is serialized, and it is deserialized when the job is loaded back.

3) Executors: run the scheduled jobs, typically by submitting the job's callable to a thread or process pool. When a job finishes, the executor notifies the scheduler. The default ThreadPoolExecutor is fine for most cases; for special cases such as CPU-intensive jobs you can use ProcessPoolExecutor, and you can configure both if needed.

4) Schedulers: the scheduler ties the other components together. An application usually has only one. Application developers don't work with triggers, job stores or executors directly; instead, the scheduler provides the interface for configuring job stores and executors and for adding, modifying and removing jobs.

APScheduler provides a variety of schedulers. The commonly used schedulers are:

Scheduler: when to use it
BlockingScheduler: the scheduler is the only thing running in the process (start() blocks)
BackgroundScheduler: the scheduler should run in the background inside your application
AsyncIOScheduler: your application uses asyncio
GeventScheduler: your application uses gevent
TornadoScheduler: your application uses Tornado
TwistedScheduler: your application uses Twisted
QtScheduler: your application uses Qt

Simple example

from apscheduler.schedulers.blocking import BlockingScheduler
import time

# Instantiate a scheduler
scheduler = BlockingScheduler()

def job1():
    print("%s: executing job1" % time.asctime())

# Add a job with a fixed interval: run every 3 seconds
scheduler.add_job(job1, 'interval', seconds=3)
scheduler.start()

Output when run:

Usage is simple: create the scheduler, call add_job, then start. Next, a closer look at what each component does.

Triggers

Triggers offer three ways to schedule a job:

  • date: run once at a specific point in time. Usage: run_date (datetime | str)

    from datetime import date, datetime
    scheduler.add_job(my_job, 'date', run_date=date(2017, 9, 8), args=[])
    scheduler.add_job(my_job, 'date', run_date=datetime(2017, 9, 8, 21, 30, 5), args=[])
    scheduler.add_job(my_job, 'date', run_date='2017-9-08 21:30:05', args=[])
    # With no trigger and no run_date, the job runs once, immediately
    scheduler.add_job(my_job, args=[])

  • interval: run repeatedly at a fixed interval. Usage: weeks=0 | days=0 | hours=0 | minutes=0 | seconds=0, start_date=None, end_date=None, timezone=None

    scheduler.add_job(my_job, 'interval', hours=2)
    scheduler.add_job(my_job, 'interval', hours=2, start_date='2017-9-8 21:30:00', end_date='2018-06-15 21:30:00')

    @scheduler.scheduled_job('interval', id='my_job_id', hours=2)
    def my_job():
        print("Hello World")

  • cron: crontab-style scheduling, as on Linux. Usage: (year=None, month=None, day=None, week=None, day_of_week=None, hour=None, minute=None, second=None, start_date=None, end_date=None, timezone=None)

    scheduler.add_job(my_job, 'cron', hour=3, minute=30)
    scheduler.add_job(my_job, 'cron', day_of_week='mon-fri', hour=5, minute=30, end_date='2017-10-30')

    @scheduler.scheduled_job('cron', id='my_job_id', day='last sun')
    def some_decorated_task():
        print("I am printed at 00:00:00 on the last Sunday of every month!")

In practice, interval is the most commonly used trigger, so it's worth focusing on.
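To make the interval trigger concrete, the sketch below computes upcoming run times for the hours=2, start_date='2017-9-8 21:30:00' example: runs fall on start, start + interval, start + 2 × interval, and so on. It is a simplified model written for illustration (the function name is hypothetical, and end_date, timezones and jitter are ignored), not APScheduler's own code.

```python
from datetime import datetime, timedelta


def upcoming_interval_runs(start, interval, now, count=3):
    """Return the next `count` run times of a fixed-interval schedule.

    Simplified model of an 'interval' trigger: runs fall on
    start, start + interval, start + 2 * interval, ...
    """
    if now <= start:
        first = start
    else:
        # Ceiling division on timedeltas: whole intervals until the next run
        steps = -((start - now) // interval)
        first = start + steps * interval
    return [first + k * interval for k in range(count)]


if __name__ == "__main__":
    for run in upcoming_interval_runs(datetime(2017, 9, 8, 21, 30),
                                      timedelta(hours=2),
                                      datetime(2017, 9, 8, 22, 0)):
        print(run)
```

Starting the job at 22:00 therefore does not shift the schedule: the next run is still aligned to 21:30 plus whole intervals.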

Scheduling in practice

1) How can APScheduler run a job within a time range? For example, at random times between 10:00 and 11:00:

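import random
import time
from apscheduler.schedulers.blocking import BlockingScheduler

scheduler = BlockingScheduler()

def get_time():
    # Current time as a string (helper assumed; its body is not shown in the original)
    return time.strftime("%Y-%m-%d %H:%M:%S ")

def my_job():
    print(get_time() + "jbtest")

# Random interval of 1~10 seconds, chosen once when the job is added
t = random.randint(1, 10)
# Runs every t seconds, but only between 10:00 and 11:00 on 2018-09-05
scheduler.add_job(my_job, 'interval', seconds=t, start_date='2018-09-05 10:00:00', end_date='2018-09-05 11:00:00')
scheduler.start()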

2) If you want a range rather than a specific time:

@scheduler.scheduled_job('cron', day_of_week='mon-fri', hour='0-9', minute='30-59', second='*/3')
def my_job():
    print(get_time() + "jbtest")

This runs Monday to Friday, every 3 seconds, during minutes 30 to 59 of each hour from 0:00 to 9:59.
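Note that the first example picks a random interval once and then repeats it; it does not pick a single random moment. If the goal really is one run at a random point between 10:00 and 11:00, one option is to choose the time yourself and hand it to a 'date' trigger. A minimal sketch; random_time_in_window is a hypothetical helper, not part of APScheduler.

```python
import random
from datetime import datetime, timedelta


def random_time_in_window(start, end, rng=random):
    """Return a uniformly random datetime in [start, end], at 1-second resolution."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span_seconds))


if __name__ == "__main__":
    window_start = datetime(2018, 9, 5, 10, 0, 0)
    window_end = datetime(2018, 9, 5, 11, 0, 0)
    print(random_time_in_window(window_start, window_end))
```

The result can then be passed as scheduler.add_job(my_job, 'date', run_date=random_time_in_window(window_start, window_end)). For a daily job, pick a new time each day.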

Job operations

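Adding jobs (add_job): add_job returns an apscheduler.job.Job instance, which you can later use to modify or remove the job. The scheduled_job decorator returns the original function rather than a Job, so it is mainly meant for jobs that don't change while the application runs.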

Getting jobs (get_jobs): get_jobs() returns the current job list, get_job(job_id) returns a single job by its id, and print_jobs() prints a formatted job list.

scheduler.add_job(my_job, 'interval', seconds=5, id='my_job_id', name='test_job')
print(scheduler.get_job('my_job_id'))
print(scheduler.get_jobs())

Modifying jobs (modify_job): use apscheduler.job.Job.modify() or scheduler.modify_job(); any job attribute except the id can be changed.

job = scheduler.add_job(my_job, 'interval', seconds=5, id='my_job', name='test_job')
job.modify(max_instances=5, name='my_job')

Removing jobs (remove_job): remove a job from the scheduler either by id with scheduler.remove_job(), or through the Job instance with remove(). To use remove(), you must keep the Job instance returned when the job was added.

job = scheduler.add_job(myfunc, 'interval', minutes=2)
job.remove()  # remove via the Job instance

scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')
scheduler.remove_job('my_job_id')  # remove by id

Pausing (pause_job) and resuming (resume_job): both can be done through the Job instance or through the scheduler. When a job is paused, its next run time is cleared, and no further run times are calculated until it is resumed.

job = scheduler.add_job(myfunc, 'interval', minutes=2)
job.pause()
job.resume()

scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')
scheduler.pause_job('my_job_id')   # pause
scheduler.resume_job('my_job_id')  # resume

Modifying (modify) and rescheduling (reschedule_job): to change a job's trigger, use reschedule_job():

job.modify(max_instances=6, name='Alternate name')
scheduler.reschedule_job('my_job_id', trigger='cron', minute='*/5')

Scheduler operations

start() starts the scheduler. With BlockingScheduler, finish all initialization (such as adding jobs) before calling start(), because the call blocks. With the other schedulers, start() returns immediately, so you can continue initializing afterwards.

from apscheduler.schedulers.blocking import BlockingScheduler

def my_job():
    print("Hello world!")

scheduler = BlockingScheduler()
scheduler.add_job(my_job, 'interval', seconds=5)
scheduler.start()

Shutting down (shutdown(wait=True|False)): close the scheduler with:

scheduler.shutdown() 

By default, the scheduler shuts down its job stores and executors and waits for all running jobs to finish. If you don't want to wait, use:

scheduler.shutdown(wait=False)

The scheduler as a whole can also be paused with scheduler.pause() and resumed with scheduler.resume().

Douban automatic reply

With the APScheduler basics and examples covered, combining them with the Douban request from the first part gives the following code:

import requests
from apscheduler.schedulers.blocking import BlockingScheduler

# Reply endpoint of a specific Douban post: post link + /add_comment
db_url = "https://www.douban.com/group/topic/121989778/add_comment"
scheduler = BlockingScheduler()
headers = {
    "Host": "www.douban.com",
    "Referer": "https://www.douban.com/group/topic/121989778/?start=0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    # Your logged-in Douban cookie (placeholder)
    "Cookie": "your cookie",
}
params = {
    "ck": "TXEg",
    "rv_comment": "jbtest11111",
}

def my_job():
    requests.post(db_url, headers=headers, data=params)

# Reply every 10 seconds
scheduler.add_job(my_job, "interval", seconds=10, id="db")
scheduler.start()

The effect is as follows:

Yes, it replies every 10 seconds. Good.

Done? For a single post replied to occasionally, yes, but two scenarios still need work:

1) The same post needs replies at short intervals. How short is hard to pin down, perhaps a few minutes; replying once every few hours, like a normal user, is probably fine.
2) N posts need replies at relatively short intervals, which is similar to case 1.

What's the problem in both cases? Douban's anti-crawler mechanism kicks in:

Testing so far shows that a captcha appears after three comments, even when they are one minute apart.

Getting the captcha ID

Open the developer tools (F12), enter the captcha and submit; the request looks like this:

The body now carries an extra field, captcha-solution, holding the text you typed.

Captcha information:

Postman request content:

Click Send, and the original page will refresh and look like this:

Right: no comment was added. Simply sending the captcha text doesn't work; it isn't that easy.

How captchas work:

Suppose users A and B both visit page T. T returns captcha X to A and captcha Y to B, and both captchas are valid.

How does the server tell A and B apart? With cookies.

For example, some websites log you in automatically after the first login, but stop doing so once the cookie is cleared, and your cookie differs from everyone else's. With that in mind, the way a server generates a captcha is easy to understand:

Generate a random string, bind it to the visitor's cookie, draw it into an image, and return the image.
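The mechanism just described can be sketched in a few lines. This is a toy, hypothetical model of a server-side captcha registry, not Douban's implementation: the answer is stored under a random id (the role Douban's captcha-id turns out to play later in this article), image rendering is omitted, and keying by the session cookie instead would work the same way.

```python
import secrets
import string


class CaptchaStore:
    """Toy server-side captcha registry (hypothetical; image rendering omitted)."""

    def __init__(self, length=6):
        self.length = length
        self._answers = {}  # captcha id -> expected answer

    def issue(self):
        """Create a captcha; return (captcha_id, answer to draw into the image)."""
        captcha_id = secrets.token_urlsafe(16)
        answer = "".join(secrets.choice(string.ascii_lowercase) for _ in range(self.length))
        self._answers[captcha_id] = answer
        return captcha_id, answer

    def verify(self, captcha_id, solution):
        """Check a submitted solution; each captcha can be used only once."""
        expected = self._answers.pop(captcha_id, None)
        return expected is not None and solution.strip().lower() == expected
```

Because the answer is looked up by id, a correct solution submitted with the wrong id (or none) fails, which is the behaviour the Postman experiment runs into.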

For more information on captcha generation, read this article;

You may wonder: Postman sends the same cookie as the browser, so why does it fail?

Because the cookie is only the simplest form of binding, and Douban evidently checks something else. So look again: when the browser clicks send, what does it submit besides the captcha?

------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="ck"

TXEg
------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="rv_comment"

repeated
------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="img"; filename=""
Content-Type: application/octet-stream


------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="captcha-solution"

produce
------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="captcha-id"

woMrwYOVwn67NNfl9lv9vhRz:en
------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="start"

0
------WebKitFormBoundaryDjMAMsD95W3eYF1i
Content-Disposition: form-data; name="submit_btn"

send
------WebKitFormBoundaryDjMAMsD95W3eYF1i--

This is the full request body sent when clicking send. A quick look shows that the earlier analysis missed the captcha-id parameter, so my guess is that it's the key.

So what is captcha-id, and where does it come from?

Press F12 and inspect the captcha element:

Look at the markup around the captcha input box. Isn't that the captcha-id we were looking for?

To be sure, let's test it with Postman.

Ha, it works!

Now the logic should be modified to look like this:

1) Open the post page and check whether a captcha is required. If it is, get the captcha-id and the captcha, then post the reply together with captcha-id and the captcha solution.
2) If no captcha is required, simply post the reply.
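The two branches differ only in the form fields sent. A minimal sketch of building the body, using the field names from the captured request (ck, rv_comment, captcha-id, captcha-solution); the function name is hypothetical, and start and submit_btn are left out because the earlier Postman test showed only ck and rv_comment are required.

```python
def build_comment_form(ck, text, captcha_id=None, captcha_solution=None):
    """Build the POST body for add_comment, adding captcha fields only when needed."""
    form = {"ck": ck, "rv_comment": text}
    if captcha_id is not None:
        if not captcha_solution:
            raise ValueError("captcha_solution is required when captcha_id is given")
        form["captcha-id"] = captcha_id
        form["captcha-solution"] = captcha_solution
    return form


if __name__ == "__main__":
    print(build_comment_form("TXEg", "jbtest"))
    print(build_comment_form("TXEg", "jbtest", "woMrwYOVwn67NNfl9lv9vhRz:en", "produce"))
```

The resulting dict is passed as data= to requests.post against the add_comment URL in either branch.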

So compare the structure of a page that requires a captcha with one that doesn't.

Captcha required:

No captcha required:

By comparison, when a captcha is required the page contains an extra div. Expanding it shows the download link of the captcha image as well as the captcha-id, so xpath on captcha_image can tell whether a captcha is required.
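If you'd rather not depend on lxml, the same check (an img with id captcha_image and an input named captcha-id) can be sketched with the standard library's html.parser. The class and function names are hypothetical, and the sketch assumes Douban's markup matches what the developer tools showed.

```python
from html.parser import HTMLParser


class CaptchaFinder(HTMLParser):
    """Collects the captcha image URL and captcha-id from a page, if present."""

    def __init__(self):
        super().__init__()
        self.image_src = None
        self.captcha_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("id") == "captcha_image":
            self.image_src = attributes.get("src")
        elif tag == "input" and attributes.get("name") == "captcha-id":
            self.captcha_id = attributes.get("value")


def find_captcha(page_html):
    """Return (image_src, captcha_id); both are None when no captcha is shown."""
    finder = CaptchaFinder()
    finder.feed(page_html)
    finder.close()
    return finder.image_src, finder.captcha_id
```

Pass it the decoded page text; a (None, None) result means the reply can be posted without a captcha.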

A quick experiment to see whether we can fetch the captcha:

import requests
from lxml import html

response = requests.get(db_url).content
selector = html.fromstring(response)
captcha = selector.xpath('//img[@id="captcha_image"]/@src')
print(captcha)

At first the result was empty, and JB assumed the data was generated by JavaScript, so he started studying how to get JS-generated page data, which would mean bringing in Selenium.

As it turned out, it didn't need to be that complicated. Take the same code as above:

import requests
from lxml import html
response = requests.get(db_url,verify=False).content
print(response)

Printing the response shows the page as raw bytes, with the Chinese text escaped. Decoding the content fixes this:

response = requests.get(db_url,verify=False).content.decode()

Now the Chinese text is readable:

This fix took a long time to stumble upon, but it's a useful lesson: decode the content returned by requests first, before reaching for Selenium.
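The difference is easy to reproduce without a network call. In requests, response.content is raw bytes, so printing it shows escape sequences; decoding (UTF-8 by default for bytes.decode) gives a normal string. response.text also returns a string, decoded with the encoding requests guesses. The sample string below stands in for a page body and is an assumption for illustration.

```python
# Simulated response body: requests' response.content is raw bytes like this
raw = "请输入验证码".encode("utf-8")

print(raw)            # bytes: the Chinese shows up as \x.. escape sequences
text = raw.decode("utf-8")
print(text)           # a normal str: the Chinese is readable
```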

Getting the captcha image URL

The code looks like this:

response = requests.post(db_url,headers=headers, data=params,verify=False).content.decode()
selector = html.fromstring(response)
captcha = selector.xpath("//img[@id=\"captcha_image\"]/@src")
print(captcha)

Reply to the post three times first so the captcha appears, then run this script; otherwise no captcha is shown and the result is []:

If there is a captcha, get the image link and the captcha-id; if there isn't, just post the reply.

import requests
from apscheduler.schedulers.blocking import BlockingScheduler
from lxml import html

# Link of the specific Douban post
db_url = "https://www.douban.com/note/657346123/"
# Reply endpoint of that post: post link + /add_comment
db_url_commet = "https://www.douban.com/note/657346123/add_comment"
scheduler = BlockingScheduler()
headers = {
    "Host": "www.douban.com",
    "Referer": "https://www.douban.com/group/topic/121989778/?start=0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    # Your logged-in Douban cookie (placeholder)
    "Cookie": "your cookie",
}
params = {
    "ck": "TXEg",
    "rv_comment": "jbtest11111",
}

def my_job():
    # Request the post page and check whether a captcha is shown
    response = requests.post(db_url, headers=headers, data=params, verify=False).content.decode()
    selector = html.fromstring(response)
    captcha_image = selector.xpath("//img[@id=\"captcha_image\"]/@src")
    if captcha_image:
        # Captcha required: print the image URL and its captcha-id
        print(captcha_image)
        captcha_id = selector.xpath("//input[@name=\"captcha-id\"]/@value")
        print(captcha_id)
    else:
        # No captcha: post the reply directly
        requests.post(db_url_commet, headers=headers, data=params, verify=False)

# Check and reply every 2 seconds
scheduler.add_job(my_job, "interval", seconds=2, id="db")
scheduler.start()

Effect:

So we can get the captcha image and its corresponding ID. What's next?

Recognizing the captcha

Since we can get the captcha image, and the request must carry the solution, we need to download the image first, recognize it, and then submit the result with the request.

Downloading and naming the image:

import re
import requests

i = "https://www.douban.com/misc/captcha?id=9iGoXeJXeos3E1JukgkltEVp:en&size=s"
# Take the part of the id before ":" to use as the file name
captcha_name = re.findall("id=(.*?):", i)
filename = "%s.jpg" % captcha_name[0]
print("File name: " + filename)
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Referer": i,
}
# Download the captcha image and write it to disk
with open(filename, "wb") as f:
    f.write(requests.get(i, headers=header).content)
print("%s download complete" % filename)
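The regex works for this URL, but parsing the query string with urllib.parse does not depend on the parameter order and still works if the id has no ":" suffix. A minimal sketch; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def captcha_name_from_url(url):
    """Return the captcha id from a captcha image URL, without the ':en'-style suffix."""
    captcha_id = parse_qs(urlparse(url).query)["id"][0]  # e.g. "9iGoXeJXeos3E1JukgkltEVp:en"
    return captcha_id.split(":", 1)[0]


if __name__ == "__main__":
    url = "https://www.douban.com/misc/captcha?id=9iGoXeJXeos3E1JukgkltEVp:en&size=s"
    print(captcha_name_from_url(url) + ".jpg")
```

Keep the full id (with its suffix) for the captcha-id form field; only the file name uses the shortened part.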

The image downloads successfully; the one above looks like this:

tesserocr

Next, recognize it. First, try tesserocr's recognition rate (for a detailed introduction to tesserocr, see the linked article; it isn't repeated here):

import tesserocr
from PIL import Image

# Open the captcha image
image = Image.open("5.jpg")
# Convert to grayscale
image = image.convert('L')
# Binarization threshold
threshold = 4
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)
# Map through the table to a binary image: 1 is white, 0 is black
image = image.point(table, "1")
image.show()
# Recognize the text
result = tesserocr.image_to_text(image)
print(result)

After some tuning, a binarization threshold of 4 gave the best result. The binarized captcha looks like this:
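What the lookup table does is easier to see on plain numbers. The sketch below rebuilds the same table as a list comprehension and applies it to a few gray values (0 is black, 255 is white); with a threshold of 4, only gray levels 0 to 3 turn black. The helper names are hypothetical, and the real code hands the table to Pillow's Image.point instead.

```python
def threshold_table(threshold):
    """Lookup table: gray levels below threshold map to 0 (black), the rest to 1 (white)."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(pixels, threshold):
    """Apply the same table to a plain list of 0-255 gray values."""
    table = threshold_table(threshold)
    return [table[p] for p in pixels]


if __name__ == "__main__":
    print(binarize([0, 3, 4, 15, 255], 4))
```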

Disappointing: the recognition result is poor.

Baidu OCR

Since tesserocr doesn't work well, let's try Baidu OCR (for an article about Baidu OCR, see the link):

from aip import AipOcr

# Your APPID, API Key and Secret Key
config = {
    "appId": "",
    "apiKey": "",
    "secretKey": "",
}
client = AipOcr(**config)

def get_file_content(filePath):
    # Read the image file as bytes
    with open(filePath, 'rb') as fp:
        return fp.read()

def get_image_str(image_path):
    image = get_file_content(image_path)
    # High-accuracy general text recognition
    result = client.basicAccurate(image)
    # Join all recognized words into one string
    if 'words_result' in result:
        return ''.join([w['words'] for w in result['words_result']])

if __name__ == "__main__":
    print(get_image_str("5.jpg"))

Result when run directly:

Still poor, even from a BAT-level company.

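Would adding the tesserocr binarization step here help? The code:

import os
from PIL import Image

def get_file_content(filePath):
    # Binarize the image before sending it to Baidu OCR
    image = Image.open(filePath)
    image = image.convert('L')
    threshold = 15
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
    # Map through the table to a binary image: 1 is white, 0 is black
    image = image.point(table, "1")
    # Save the binarized image under the same file name in the current directory
    # (so filePath should point to a file in the current directory)
    image.save(os.path.join(os.getcwd(), os.path.basename(filePath)))
    with open(filePath, 'rb') as fp:
        return fp.read()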

The recognition rate is still poor. Earlier tests had also shown a fair number of captcha failures:

Since the free Baidu option isn't good enough either, let's switch to a paid captcha-solving platform; Chaojiying (Super Eagle) is one of the better-known ones.

Chaojiying

Official site: www.chaojiying.com/. The site offers a free test, but clicking it requires logging in, so register first. It turns out the free test still consumes points; following their WeChat official account and binding your account gives 1000 points for free testing. I didn't bother and simply paid.

Chaojiying charges by volume, and larger top-ups are cheaper. The standard rate is 1 yuan = 1000 points; different captcha types cost different numbers of points, listed here: www.chaojiying.com/price.html#

After topping up, return to the free test page to try it.

After a short wait, the result pops up:

The uploaded captcha:

Compared with the image, the result is completely correct. The paid service really is impressive; even Baidu couldn't do this.

Now try a Weibo captcha:

No trouble at all.

Chaojiying also provides an API. The API documentation is linked at the bottom of the home page, with demos for various languages; find Python and download the demo.

The demo is written for Python 2.x and easy to follow, but someone online has reorganized it into a much more concise version, reproduced here (author: coder-pig):

from hashlib import md5
import requests

# Chaojiying account parameters
cjy_params = {
    'user': '448975523',
    'pass2': md5('your password'.encode('utf8')).hexdigest(),
    'softid': '96001',
}
# Request headers
cjy_headers = {
    'Connection': 'Keep-Alive',
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
}

def cjy_fetch_code(im, codetype):
    # Upload the image bytes and return the recognized text (None on error)
    cjy_params.update({'codetype': codetype})
    files = {'userfile': ('ccc.jpg', im)}
    resp = requests.post('http://upload.chaojiying.net/Upload/Processing.php',
                         data=cjy_params, files=files, headers=cjy_headers).json()
    # err_no == 0 means recognition succeeded
    if resp.get('err_no', 0) == 0:
        return resp.get('pic_str')

# Example call
if __name__ == '__main__':
    with open('captcha.jpg', 'rb') as f:
        im = f.read()
    print(cjy_fetch_code(im, 1902))

Running it returns the correct captcha; each request takes about 3 seconds.

That's it for captcha cracking. Whether you use Chaojiying, Baidu, tesserocr, or write your own machine-learning model is up to you.

Douban auto-reply: the final version

For reliability, Chaojiying is used to solve the captchas here. First, the log:

Douban's comment records confirm it: the captchas were solved.

The code is simply the pieces above combined.

A small optimization

Is that the end? Not quite. Because we're paying, Chaojiying guarantees the captcha accuracy, but with a free option the recognition rate is poor. From the user's point of view, you don't care why a captcha failed; you just want to be notified. For that you can hook up ServerChan, which sends a WeChat notification whenever captcha validation fails. The result:

If you're interested, see the ServerChan website: sc.ftqq.com/3.version
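A minimal sketch of building the notification request, assuming the legacy ServerChan endpoint format https://sc.ftqq.com/<SCKEY>.send with text (title) and desp (message body) parameters. Check the ServerChan site for the current API; the function name and the SCKEY value are placeholders.

```python
from urllib.parse import urlencode


def serverchan_url(sckey, title, detail=""):
    """Build a ServerChan notification URL (assumed legacy format: https://sc.ftqq.com/<SCKEY>.send)."""
    query = urlencode({"text": title, "desp": detail})
    return "https://sc.ftqq.com/%s.send?%s" % (sckey, query)


if __name__ == "__main__":
    # "YOUR_SCKEY" is a placeholder for the key shown on the ServerChan site
    print(serverchan_url("YOUR_SCKEY", "Captcha failed", "Douban auto-reply needs attention"))
```

In the reply job, a call such as requests.get(serverchan_url(key, "Captcha failed")) in the failure branch would send the alert.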

18.9.13 update

Today the script ran without errors but had no effect: no new comment appeared on the post, and no captcha either. Confused, JB tried Postman, which also failed. Stepping through the browser and Postman side by side finally revealed the root cause: the ck field in the request body had changed. It used to be:

"ck": "TXEg"

When does it change? Testing shows it changes when the user logs out. The script never logs out, so for the script it changes when the cookie expires.

Cookie expiry has a standard fix: use Selenium to log in and fetch fresh cookies each time, so they never go stale.

Here is a brief description of how to get the ck value.

Post a comment in the browser to capture the ck value, copy it, and search the page's HTML for it; that shows where the value comes from.

It's the last four characters of the logout button's URL. So the plan is to get that URL with xpath and take its last four characters.

So I wrote this xpath:

selector.xpath("//div/ul/li/div/table/tbody/tr/td/a/@href")

It returned nothing no matter what. Checking the output of response.content showed that the logout button's markup wasn't there at all.

That means this part of the HTML is generated by JavaScript, which only Selenium can retrieve. The catch is that injecting a custom cookie into Selenium is very cumbersome.

With Selenium ruled out, is there another way? Searching for the key ck turned up the same value in the cookie:

Whenever the ck value in the cookie changes, the ck in the logout button's URL changes with it, whether or not the cookie has expired. So the simplest approach is to read the last ck value directly from the cookie.

import re

# Get the last ck value from the cookie
def get_ck():
    # Reading it from the logout link doesn't work (that HTML is generated by JS), and Selenium is ruled out
    # ck_value = selector.xpath("//div/ul/li/div/table/tbody/tr/td/a/@href")
    text = re.findall("ck=(.*?);", headers["Cookie"])[-1]
    return text
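The regex above needs a semicolon after the value, so it finds nothing when ck is the last cookie in the header, and it would also match cookie names that merely end in "ck". A sketch using the standard library's http.cookies.SimpleCookie avoids both; the function name is hypothetical, and it assumes the header follows normal Cookie syntax (SimpleCookie may stop parsing at malformed entries).

```python
from http.cookies import SimpleCookie


def get_ck_from_cookie(cookie_header):
    """Return the ck value from a Cookie header string, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("ck")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print(get_ck_from_cookie('bid=abc123; ck=TXEg'))
```

With the headers dict from the script, params["ck"] = get_ck_from_cookie(headers["Cookie"]) keeps the body in sync with the cookie.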

The long-missing automatic bump replies are back.

Summary

Phew. I kept at it until 4 a.m. and finally finished. The auto-reply itself is simple: I started on Monday afternoon and had it working by the evening. Then came the captcha problem, including trying to tune the free options instead of paying, which took several late nights without results. Then came the response HTML problem: at first I thought the page was loaded by JS, but it turned out decode() was all that was needed and JS had nothing to do with it. I spent a whole night reviewing Selenium because I assumed JS was involved.

Back to the article: it covers a lot, roughly in three parts: 1) how to analyze Douban's comment endpoint; 2) scheduled jobs with APScheduler; 3) captcha cracking (tesserocr, Baidu OCR, Chaojiying).

There's nothing especially deep here; it's all simple, just tedious. Most of the time went into chasing tangents. That's all, thanks for reading.