The crawler still gets blocked even with a domestic proxy IP. What’s going on? Is the domestic proxy IP at fault, or is the problem in the crawler itself?

Many people use high-quality, stable proxy IPs, throttle their request speed and frequency, set the UserAgent and Referer, and apply a series of other measures, yet the crawler still runs into all kinds of trouble. The work never goes smoothly, large volumes of data can’t be crawled efficiently, and the daily tasks don’t get finished. What’s the problem, and what’s a good solution?

Every site’s anti-crawling strategy is different, so each problem needs its own analysis. Still, some basic steps always need to be done. IPIDEA global agent reminds you of several points:

First, use high-quality proxy IPs;

Second, set the header information: not only the UserAgent and Referer values, but also the many other header values a real browser sends. Open developer mode in your browser (press F12) and browse to the URL to view them;

Third, handle Cookies: save the Cookies the site returns, then send them along with the next request;

Fourth, if you can’t crawl the data with headers and cookies alone, consider collecting it with a simulated browser; a common tool for this is PhantomJS.
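
The first three points can be combined in one place. Below is a minimal sketch using only the Python standard library. The proxy address, the header values, and the function names make_crawler_opener and fetch are all assumptions for illustration; copy the real header values from your own browser's developer tools.

```python
import http.cookiejar
import urllib.request

# Hypothetical values for illustration only; replace with your own.
PROXY_URL = "http://user:pass@proxy.example.com:8000"
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36"),
    "Referer": "https://www.example.com/",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def make_crawler_opener(proxy_url, headers, cookie_jar=None):
    """Build an opener that uses a proxy, sends browser-like headers,
    and stores cookies so they are sent back on later requests."""
    # An empty CookieJar is falsy, so compare against None explicitly.
    jar = cookie_jar if cookie_jar is not None else http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        # Point one: route both HTTP and HTTPS traffic through the proxy.
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        # Point three: save cookies from responses and resend them.
        urllib.request.HTTPCookieProcessor(jar),
    )
    # Point two: replace the default "Python-urllib" header set.
    opener.addheaders = list(headers.items())
    return opener, jar


def fetch(opener, url, timeout=10):
    """Fetch a page through the opener and return its decoded text."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Reusing the same opener for every request to a site is what carries the Cookies from one request to the next; building a fresh opener each time would discard them.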

If your crawler can’t get through even with a domestic proxy IP, working through the four steps above will, in most cases, let you crawl the data.
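
The escalation from plain requests to a simulated browser can be sketched as follows. This is only an outline under stated assumptions: fetch_with_fallback, fetch_plain, fetch_rendered, and looks_complete are hypothetical names, and fetch_rendered stands in for whatever simulated browser you use (PhantomJS or another headless browser), which is not implemented here.

```python
def fetch_with_fallback(url, fetch_plain, fetch_rendered, looks_complete):
    """Try the cheap HTTP request first; use the simulated browser only
    when the plain response fails or lacks the data we expect.

    fetch_plain(url)     -> str, a plain HTTP fetch (headers + cookies)
    fetch_rendered(url)  -> str, a fetch through a simulated browser
    looks_complete(html) -> bool, True if the wanted data is present
    """
    try:
        html = fetch_plain(url)
    except OSError:
        # Network-level failures (urllib's URLError is an OSError).
        html = ""
    if html and looks_complete(html):
        return html, "plain"
    # The page is probably filled in by JavaScript, so render it instead.
    return fetch_rendered(url), "rendered"
```

A simulated browser is much slower and heavier than a plain request, so it makes sense to reserve it for the pages that actually need it.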