It started by accident: a colleague sent us a link to a Baidu Zhidao answer and asked us to like it. At first I didn't think much of it, but an account that has already liked an answer can't like it again, so the next day he had to ask us to like it once more. I wondered whether there was a more convenient way to do this. My first thought was to collect a few Baidu accounts, log in with each one, and like the answer.

Then a colleague mentioned that Baidu Zhidao lets you like an answer without logging in. I tried it and found that this was true, so I guessed the limit was per IP, and sure enough it is. It turns out Baidu Zhidao's rule is that each IP can like an answer once per day. So the first thing that came to mind was to change the IP and like it again.

I had another idea at the time: find the like API and call it directly. But I quickly realized that, given Baidu's engineering, the API would not be that easy to abuse, so I stuck with Chrome + WebDriver.

My implementation has two parts: finding proxy IPs, and switching IPs to perform the like.

1. Find proxy IPs and build an IP pool

This site:

http://www.xicidaili.com/nn/1

lists a lot of proxy IPs. Some work and some don't, but for a free source they are good enough. Most of the HTTPS proxies are valid, so the script crawls their IPs and port numbers.

The specific code is as follows:

```python
#encoding=utf8
import urllib2
import BeautifulSoup

user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
header = {'User-Agent': user_agent}

url = 'http://www.xicidaili.com/nn/1'
req = urllib2.Request(url, headers=header)
res = urllib2.urlopen(req).read()
soup = BeautifulSoup.BeautifulSoup(res)
ips = soup.findAll('tr')

# The first <tr> is the table header, so start from index 1
f = open("test.txt", "w")
for x in range(1, len(ips)):
    ip = ips[x]
    tds = ip.findAll("td")
    ip_temp = tds[1].contents[0] + "\t" + tds[2].contents[0] + "\n"
    f.write(ip_temp)
f.close()
```

It grabs the td tags directly and writes the crawled IPs and port numbers into the test.txt file.
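Since some of the free proxies don't work, it can help to test each one before trusting it. Here is a rough sketch of that idea (not part of the original post): route one request through the proxy and keep only the ones that respond. It uses Python 3's urllib.request, while the post's code is Python 2, and the names `proxy_alive` and `filter_pool` are made up for illustration.

```python
#encoding=utf8
import urllib.request


def proxy_alive(host, port, test_url='http://www.baidu.com', timeout=5):
    """Return True if an HTTP request routed through host:port succeeds."""
    handler = urllib.request.ProxyHandler({'http': '%s:%s' % (host, port)})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(test_url, timeout=timeout)
        return True
    except Exception:
        return False


def filter_pool(infile='test.txt', outfile='alive.txt'):
    """Rewrite the pool file, keeping only the proxies that answered.

    Returns the number of proxies kept."""
    kept = 0
    with open(infile) as f_in, open(outfile, 'w') as f_out:
        for line in f_in:
            parts = line.split()
            if len(parts) == 2 and proxy_alive(parts[0], parts[1]):
                f_out.write(line)
                kept += 1
    return kept
```

Running this after the crawl step shrinks the pool, but each proxy the second script then tries is much more likely to actually reach the page.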

For example, the scraped data looks like this (screenshot omitted).
2. Change the IP and simulate the like operation

Here is the code:

```python
#encoding=utf8
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Read the IPs and ports scraped earlier to form the proxy pool
host = []
port = []
with open('test.txt', 'r') as f0:
    for i in f0:
        tmp = i.split()
        host.append(tmp[0])
        port.append(tmp[1])

for h, p in zip(host, port):
    PROXY = h + ":" + p
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server={0}'.format(PROXY))
    chrome = webdriver.Chrome(executable_path='/Users/mac/Desktop/chromedriver',
                              chrome_options=chrome_options)
    try:
        # Open the Baidu Zhidao question page
        chrome.get('https://zhidao.baidu.com/question/621673501705715932.html')
        # The ID of the like button is fixed; wait until it is visible
        WebDriverWait(chrome, 20).until(
            EC.visibility_of_element_located((By.ID, 'evaluate-2796284381')))
        chrome.find_element_by_id("evaluate-2796284381").click()
    except Exception, e:
        print PROXY
        print e
        continue
    finally:
        # Quit the browser so each proxy gets a fresh instance
        chrome.quit()
```

Anyone who has done web automation testing or crawling should be familiar with Selenium. I added WebDriverWait because, in practice, the page needs some time to load before the button appears. It's not pretty, but at least it works.
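Since free proxies fail constantly, the loop spends most of its time hitting exceptions. One generic pattern that can help (a sketch of my own, not part of the original script; `retry` is a made-up helper) is to retry a flaky action a couple of times before giving up and moving on to the next proxy:

```python
import time


def retry(action, attempts=3, delay=1.0):
    """Call action(); on an exception, sleep and try again, up to
    `attempts` times in total. Returns action()'s result, or re-raises
    the last exception once all attempts are exhausted."""
    last_err = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as e:
            last_err = e
            time.sleep(delay)
    raise last_err
```

In the script above, the page-load-and-click sequence could be wrapped in something like `retry(lambda: do_like(chrome), attempts=2)` (where `do_like` is a hypothetical function holding the `get`/wait/click steps) so that one slow page load doesn't waste a working proxy.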

Find the ID of the like button in the browser's developer tools (screenshot omitted).

The script took two evenings after work to write and debug, but it was not written in vain: after I handed it to my colleague, I earned my first cup of Starbucks.