Many people use Selenium + ChromeDriver when writing crawlers, believing that driving a real browser will keep them from being detected by websites' anti-crawler mechanisms.

Leaving aside sophisticated defenses such as Taobao's behavior-based anti-crawler strategy, even a simple little website can use a single line of JavaScript to tell whether you are driving the browser with Selenium + ChromeDriver.

Let’s look at an example.

Use this code to launch a Chrome window:

from selenium.webdriver import Chrome

driver = Chrome()

Now open Developer Tools in this window and switch to the Console tab, as shown in the figure below.

Enter the following JS code in the Console and press Enter:

window.navigator.webdriver

As you can see, the console returns true. See the figure below.

However, if you open a normal Chrome window and execute the same command, you’ll see that this line of code returns undefined, as shown below.

So a website only has to read this property with JS: undefined means an ordinary browser, while true means Selenium is driving it. It catches you every time. Here is an example of JS code that detects Selenium:

const webdriver = window.navigator.webdriver;
if (webdriver) {
	console.log('You idiot, did you think you could fool me by simulating a browser with Selenium?');
} else {
	console.log('Normal browser');
}

By running this JS code while the page loads, a site can tell whether the visitor is a Selenium-driven browser and, if so, block access or trigger other anti-crawler mechanisms.

So how do you keep this property from telling the site that your browser is automated during crawler development?

Readers familiar with JavaScript may think they can hide by simply overwriting this property, but in fact a plain assignment such as window.navigator.webdriver = undefined does not change its value.

Those more proficient in JS might redefine the property with this code instead:

Object.defineProperties(navigator, {webdriver: {get: () => undefined}});

The result is shown in the figure below.
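Why does a plain assignment fail while Object.defineProperties succeeds? In Chrome, webdriver is a getter-only accessor defined on Navigator.prototype rather than an own property of the navigator object. The sketch below reproduces that shape with a hypothetical FakeNavigator class so it can run outside a browser (for example in Node.js); it is a simplified model of the behavior, not browser code.

```javascript
// Hypothetical stand-in for the browser's Navigator interface. As in Chrome,
// `webdriver` is a getter-only accessor on the prototype, not on the instance.
class FakeNavigator {
  get webdriver() {
    return true; // what a Selenium-driven Chrome reports
  }
}

const nav = new FakeNavigator();
const seen = [nav.webdriver]; // true

// 1. Plain assignment: with no setter, the write is silently ignored in
//    sloppy mode and throws a TypeError in strict mode.
try {
  nav.webdriver = undefined;
} catch (err) {
  // Strict mode ends up here; the value is unchanged either way.
}
seen.push(nav.webdriver); // still true

// 2. defineProperties adds an own accessor on the instance, which shadows
//    the prototype's getter.
Object.defineProperties(nav, { webdriver: { get: () => undefined } });
seen.push(nav.webdriver); // undefined

console.log(seen); // [ true, true, undefined ]
```

Note that the override lives only on that one navigator object; a new page gets a new navigator with the original getter.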

It works. But is that enough? No. If you click a link, type a URL into the address bar to go to another page, or open a new window, you will find that window.navigator.webdriver is true again. See the figure below.

So could we run the JS code above through WebDriver after every page opens, setting window.navigator.webdriver to undefined on each page? That doesn't work either.

When you call driver.get(url), the browser opens the site, loads the page, and runs the site's own JS code. So by the time you change window.navigator.webdriver, the site already knows your browser is automated.
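The timing problem can be modeled in a few lines. In this sketch, the hypothetical loadPage() stands in for driver.get(url): each navigation creates a fresh navigator object and runs the site's detection script before returning, and patch() stands in for running the Object.defineProperties line through WebDriver afterwards. All function names here are illustrative assumptions, not Selenium or browser APIs.

```javascript
// Hypothetical model of a browser tab; not a real Selenium or browser API.
// Every navigation builds a new navigator whose webdriver getter reports true.
function freshNavigator() {
  return Object.create({ get webdriver() { return true; } });
}

// Stand-in for the site's own detection script.
function siteScript(nav, log) {
  log.push(nav.webdriver ? 'blocked' : 'allowed');
}

// Stand-in for driver.get(url): the page loads and the site's JS runs
// before control returns to the crawler.
function loadPage(log) {
  const nav = freshNavigator();
  siteScript(nav, log);
  return nav;
}

// Stand-in for executing the defineProperties override via WebDriver.
function patch(nav) {
  Object.defineProperties(nav, { webdriver: { get: () => undefined } });
}

const log = [];
const page1 = loadPage(log);        // site has already checked: 'blocked'
patch(page1);                       // too late for this page
const afterPatch = page1.webdriver; // undefined, but only on page1
const page2 = loadPage(log);        // new page, new navigator: 'blocked' again

console.log(log, afterPatch, page2.webdriver); // [ 'blocked', 'blocked' ] undefined true
```

The model makes the ordering explicit: to help, the override would have to run after the new page's navigator exists but before any of the site's scripts.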

Another suggestion is to write a Chrome extension whose JS code runs before any of the JS code that comes with the website.

That works, but there is an easier fix: simply set ChromeDriver's launch options.

Before starting ChromeDriver, set Chrome's experimental option excludeSwitches to ['enable-automation']. The complete code is as follows:

from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
# Stop ChromeDriver from passing the --enable-automation switch to Chrome
option.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = Chrome(options=option)

The Chrome window launched this way will show a pop-up message in the upper-right corner. Ignore it and do not click its disable button.

Query window.navigator.webdriver again in the Console tab of the developer tools, and you will find that it is now undefined. Whether you open a new page, open a new window, or click a link to another page, it never turns back to true. The result is shown in the figure below.
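Why does this survive navigation when the in-page override did not? According to the ChromeDriver documentation, excludeSwitches lists command-line switches that ChromeDriver would otherwise pass to Chrome by default, written without the leading "--". The sketch below models that as list filtering. It assumes, as this article's observations suggest for the Chrome version of the time, that navigator.webdriver is true exactly when Chrome was started with --enable-automation; the default switch list is an illustrative subset, not ChromeDriver's actual list.

```javascript
// Illustrative subset of the switches ChromeDriver adds when launching Chrome
// (assumption: not the complete or exact default list).
const DEFAULT_SWITCHES = ['--enable-automation', '--no-first-run'];

// Model of ChromeOptions' excludeSwitches: names are given without "--".
function launchSwitches(excludeSwitches = []) {
  return DEFAULT_SWITCHES.filter(
    (sw) => !excludeSwitches.includes(sw.replace(/^--/, ''))
  );
}

// Assumed rule (as observed with this article's Chrome version):
// navigator.webdriver is true only when --enable-automation is present.
function navigatorWebdriver(switches) {
  return switches.includes('--enable-automation') ? true : undefined;
}

const plain = launchSwitches();
const patched = launchSwitches(['enable-automation']);

console.log(navigatorWebdriver(plain));   // true
console.log(navigatorWebdriver(patched)); // undefined
```

Because the switch is dropped before Chrome starts, every page and window in that browser sees the same launch state, so there is no moment when a site's script could read true.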

As of 20:46 on February 12, 2019, the method described in this article works for logging in to Zhihu. If you log in to Zhihu with plain Selenium, a verification code pops up; with the method in this article, you successfully pass as a real browser and no verification code appears.

In fact, navigator.webdriver is not the only feature by which Selenium + WebDriver can be recognized. For details on how to hide the others, please follow my WeChat official account.