
Why didn't your tunnel proxy change IP?

Some users of the tunnel proxy dynamic edition (which changes the IP on every request) have reported that the IP did not change on the client side, and suspected that our tunnel proxy server was at fault. In fact, it is not. This article explains why a tunnel proxy can appear not to change the IP.

Persistent connections

The first thing to understand is that HTTP/1.1 (and various enhanced versions of HTTP/1.0) allows a TCP connection to stay open after a request has been processed instead of being closed immediately, so that the established connection can be reused for future HTTP requests. A TCP connection that stays open after request processing completes is called a persistent connection. Non-persistent connections are closed at the end of each request. Persistent connections stay open between requests until either the client or the server decides to close them. By reusing an idle persistent connection that is already open to the target server, you avoid the slow connection setup phase. An already open connection also avoids the slow-start congestion adaptation phase, so data can be transferred more quickly.
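The effect of a persistent connection can be observed locally without any proxy. The following is a minimal sketch using only the Python standard library; the handler class EchoPortHandler and the helper fetch_ports are names invented for this illustration. The local server replies with the client's source port, which identifies the TCP connection: requests sent over one reused connection report the same port, while requests sent over fresh connections typically report different ones.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets the server keep connections open by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port; it identifies the TCP connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_ports(port, reuse, count=3):
    """Return the client source port the server saw for each request."""
    seen = []
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(count):
        if not reuse:
            # Drop the old connection and open a new one for every request.
            conn.close()
            conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", "/")
        seen.append(conn.getresponse().read().decode())
    conn.close()
    return seen


# Port 0 asks the operating system for any free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

persistent_ports = fetch_ports(port, reuse=True)
fresh_ports = fetch_ports(port, reuse=False)
print("reused connection:", persistent_ports)
print("new connections:  ", fresh_ports)

server.shutdown()
server.server_close()
```

The port numbers themselves vary from run to run; what matters is that the first list contains a single repeated value.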

HTTP client example

From the above, we can see that enabling Keep-Alive speeds up data transfer. The most common scenario is a browser loading a web page: instead of opening a TCP connection for every resource, it opens a small number of TCP connections in parallel and uses the keep-alive mechanism to send most HTTP requests over them. This speeds up access by removing the overhead of establishing and closing connections.

HTTP clients and crawler frameworks in many programming languages do the same: besides sending requests asynchronously, they reuse TCP connections to process requests faster.

Python-Requests

When you send requests through the Session provided by Requests, cookies are saved automatically, and the connection pooling provided by urllib3 is used as well.

For example, configure a Requests Session with the tunnel proxy dynamic edition, send three requests in a row, and print the proxy IP used by each:

import time

import requests

# Tunnel proxy credentials and host (placeholders)
username = "txxxxxxxxxxxxx"
password = "password"
tunnel = "tpsXXX.kdlapi.com"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": tunnel}
}

# A Session keeps connections in a pool and reuses them across requests
s = requests.Session()
for i in range(3):
    res = s.get('https://dev.kdlapi.com/testproxy', proxies=proxies)
    print(res.text)
    time.sleep(1)

Output:

sucess! client ip: 175.7.196.238
sucess! client ip: 175.7.196.238
sucess! client ip: 175.7.196.238

All three requests used the same IP. Capturing the packets with Wireshark shows clearly that the three requests went over a single TCP connection.

Python-Scrapy

Scrapy is built on the Twisted asynchronous networking framework, and the implementation of connection reuse can be found in the Twisted source code:

twisted/web/_newclient.py

class HTTP11ClientProtocol

_finishResponse_WAITING()

• When the response from the target site has been received, the _finishResponse_WAITING function checks whether the Connection response header is close.
• If it is, self._giveUp() is called to close the connection directly; otherwise self.transport.resumeProducing() is called and the TCP connection continues to be reused for reading responses and sending HTTP requests.

Why

The tunnel proxy dynamic edition is implemented so that requests are forwarded to a different proxy server only when a new connection is established. If the HTTP client using the proxy does not close the TCP connection after receiving a response, subsequent HTTP requests may be sent over the same TCP connection, so multiple HTTP requests end up using the same proxy IP.
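This behavior can be pictured with a toy model. The sketch below is purely illustrative and is not the actual tunnel implementation: ToyTunnel and ToyConnection are hypothetical classes, and the addresses come from a documentation-only range. It assumes only what is stated above, namely that an exit IP is chosen when a connection is established and stays fixed for the lifetime of that connection.

```python
import itertools


class ToyConnection:
    """Hypothetical TCP connection through the tunnel, bound to one exit IP."""

    def __init__(self, exit_ip):
        self.exit_ip = exit_ip

    def request(self, url):
        # Every request on this connection leaves through the same exit IP.
        return self.exit_ip


class ToyTunnel:
    """Hypothetical stand-in for a dynamic tunnel: one exit IP per connection."""

    def __init__(self, exit_ips):
        self._exit_ips = itertools.cycle(exit_ips)

    def connect(self):
        # Establishing a connection is the only moment an exit IP is chosen.
        return ToyConnection(next(self._exit_ips))


tunnel = ToyTunnel(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# Keep-alive client: one connection, three requests.
conn = tunnel.connect()
reused = [conn.request("/testproxy") for _ in range(3)]

# Client that closes after each response: three connections, three requests.
fresh = [tunnel.connect().request("/testproxy") for _ in range(3)]

print(reused)  # ['192.0.2.1', '192.0.2.1', '192.0.2.1']
print(fresh)   # ['192.0.2.2', '192.0.2.3', '192.0.2.1']
```

A real tunnel presumably picks exit IPs from a much larger pool rather than cycling through three, but the relationship between connections and IPs is the point here.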

The solution

So how do you actively close the connection once a request has finished? In HTTP/1.0, keep-alive is not enabled by default; the client must send a Connection: Keep-Alive request header to activate it. In HTTP/1.1, keep-alive is enabled by default: every connection is kept open unless the request or response headers include Connection: close, which indicates that the connection will be closed once the response ends.

In general, you only need to add Connection: close to the request headers. When the target server sees it, it adds Connection: close to its response headers as well and closes the connection after sending the response. So if you are not sure whether your HTTP client closes the connection when a request finishes, you can add Connection: close to the request headers yourself.
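The rules above can be condensed into a small decision function. This is a simplified sketch, not code taken from any HTTP library; the name connection_persists is invented for this illustration, and details such as proxies or malformed headers are ignored.

```python
def connection_persists(http_version, request_headers, response_headers):
    """Return True if the connection may be reused after this exchange."""

    def connection_tokens(headers):
        # Header names are case-insensitive; the value is a comma-separated list.
        for name, value in headers.items():
            if name.lower() == "connection":
                return {token.strip().lower() for token in value.split(",")}
        return set()

    request_tokens = connection_tokens(request_headers)
    response_tokens = connection_tokens(response_headers)

    # "close" from either side always ends the connection after the response.
    if "close" in request_tokens or "close" in response_tokens:
        return False
    # HTTP/1.1 keeps the connection open unless told otherwise.
    if http_version == "HTTP/1.1":
        return True
    # HTTP/1.0 keeps it open only if both sides opted in with keep-alive.
    return "keep-alive" in request_tokens and "keep-alive" in response_tokens


print(connection_persists("HTTP/1.1", {}, {}))                       # True
print(connection_persists("HTTP/1.1", {"Connection": "close"}, {}))  # False
print(connection_persists("HTTP/1.0", {}, {}))                       # False
print(connection_persists("HTTP/1.0",
                          {"Connection": "Keep-Alive"},
                          {"Connection": "keep-alive"}))             # True
```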

The same Python-Requests code, with the header added:

# proxies is the same dict as in the first example
# Ask the server to close the connection after each response
headers = {"Connection": "close"}
s = requests.Session()
for i in range(3):
    res = s.get('https://dev.kdlapi.com/testproxy', proxies=proxies, headers=headers)
    print(res.text)
    time.sleep(1)

Output:

sucess! client ip: 121.205.214.213
sucess! client ip: 27.148.203.221
sucess! client ip: 114.99.131.98

This time the IP changed with every request. Looking at the capture in Wireshark again, each of the three HTTP requests established a new TCP connection.
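For Scrapy, the same header could be attached to every request in a downloader middleware. The following is only a sketch under assumptions: ConnectionCloseMiddleware is a hypothetical name, it has not been verified against the tunnel, and whether the connection is actually closed still depends on the target server answering with Connection: close, as described in the Twisted logic above. The last lines exercise the class with a stand-in request object so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


class ConnectionCloseMiddleware:
    """Hypothetical Scrapy downloader middleware that requests a non-persistent connection."""

    def process_request(self, request, spider):
        # Scrapy request headers are dict-like, so the header can be set directly.
        request.headers["Connection"] = "close"
        # Returning None lets the request continue through the remaining middlewares.
        return None


# Stand-in for a Scrapy request, used only to demonstrate the behavior here.
fake_request = SimpleNamespace(headers={})
ConnectionCloseMiddleware().process_request(fake_request, spider=None)
print(fake_request.headers)  # {'Connection': 'close'}
```

In a real project the class would be enabled through the DOWNLOADER_MIDDLEWARES setting, using whatever module path it lives under in that project.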

Conclusion

If the IP does not change while you are using the tunnel proxy dynamic edition, the most likely cause is that the HTTP client is reusing a previously established TCP connection to speed up network requests. The point of buying the dynamic edition is to change the IP on every request, and reusing an earlier TCP connection defeats that. The fix is to add Connection: close to the request headers, which states explicitly that the connection should be closed as soon as the transfer completes. Of course, if you do not need to switch IP on every request, Keep-Alive can speed up your requests; which to choose depends on your business needs.
