Requests is a simple, easy-to-use HTTP client library, and it was one of the top 10 most popular early Python projects on GitHub. Its tagline is "an elegant and simple HTTP library for Python, built for human beings", which half-jokingly implies that anyone who counts as a human can use Requests :). Let’s read the source code and learn how it is implemented. This article is divided into the following parts:

  • The project structure
  • The API module
  • Sessions module
  • Models module
  • Adapters module
  • Tips

The project structure

Clone the project from GitHub, use git log to browse the history, find the commit tagged 2.24.0, and check it out:

git checkout 0797c61fd541f92f66e409dbf9515ca287af28d2

The following command gives a quick measure of how much code there is, so that finishing the read feels like a real accomplishment:

find requests -name "*.py" | xargs cat | grep -v ^$ | wc -l  # 4000
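
For readers without a Unix shell, the same count can be approximated in Python. This is only a sketch: the function name count_nonblank_lines is made up for illustration, and it mirrors the pipeline above by dropping completely empty lines only.

```python
from pathlib import Path


def count_nonblank_lines(root, pattern="*.py"):
    """Count the non-empty lines of every file under root that matches pattern."""
    total = 0
    for path in Path(root).rglob(pattern):
        with path.open(encoding="utf-8", errors="replace") as handle:
            # Like `grep -v ^$`, only completely empty lines are dropped,
            # so a line that holds nothing but spaces still counts.
            total += sum(1 for line in handle if line.rstrip("\r\n") != "")
    return total
```

Calling count_nonblank_lines("requests") from the repository root should give a number close to the one printed by the shell pipeline.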

With a glance at the project structure and code, we can see what each module does:

Name          Description
adapters      HTTP connection handling, mainly an adaptation layer over the urllib3 library
api           API interface
auth          HTTP authentication
certs         HTTPS certificate handling
compat        Python version compatibility layer
cookies       Cookie handling
help          Help information
hooks         Hook system
models        Data models
packages      Compatibility shims for packages
sessions      Session handling
status_codes  HTTP status codes
structures    Data structures
utils         Utilities

With more than 4,000 lines of code spread over more than ten modules, combing through everything would be a lot of work. In this article we again stick to the main path and do not dig into side branches and minor details.

The API module

Let’s start with an example of requests:

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'private_gists': 419, 'total_private_repos': 77, ...}

These calls are provided by the api module:

# api.py

def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

def get(url, params=None, **kwargs):
    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
    
    ...

The way get wraps request is similar to what we saw in the Redis source code we read before: it gives users a safer and more convenient entry point. request obtains a session through a context manager and sends the request with session.request.

The HTTP OPTIONS, HEAD, POST, PUT, PATCH, and DELETE methods are wrapped in the api module in the same way.
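
To see the wrapper pattern in isolation, here is a minimal sketch that runs without any network access. ToySession is a hypothetical stand-in for sessions.Session that simply echoes its arguments; only the shape of request, get and post follows the excerpt above.

```python
class ToySession:
    """Hypothetical stand-in for sessions.Session; it performs no network I/O."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.closed = True

    def request(self, method, url, **kwargs):
        # Echo the call instead of sending anything.
        return {"method": method.upper(), "url": url, "kwargs": kwargs}


def request(method, url, **kwargs):
    # A fresh session per call, released as soon as the call returns.
    with ToySession() as session:
        return session.request(method=method, url=url, **kwargs)


def get(url, params=None, **kwargs):
    kwargs.setdefault("allow_redirects", True)
    return request("get", url, params=params, **kwargs)


def post(url, data=None, json=None, **kwargs):
    return request("post", url, data=data, json=json, **kwargs)
```

Each verb function only fixes the method name and a few defaults, so all the real work stays in one place, the session.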

Sessions module

Creating a Session object and using it as a context manager, in sessions.py:

# sessions.py

class Session(SessionRedirectMixin):
    
    def __init__(self):
        self.headers = default_headers()
        self.cookies = cookiejar_from_dict({})

        # Default connection adapters.
        self.adapters = OrderedDict()
        ...
        self.mount('https://', HTTPAdapter())
    
    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        
    def __enter__(self):
        return self

    def __exit__(self, *args):
        for v in self.adapters.values():
            v.close()

During session initialization, the default HTTP headers, the cookie jar, and the HTTPAdapter objects are created. __enter__ and __exit__ implement the context manager protocol, which ensures that the adapters are closed when the with block exits.
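
The interplay of mount, adapter lookup and the context manager can be tried out with a small sketch. ToyAdapter and ToySession are hypothetical classes written for this article, and the prefix matching in get_adapter is a simplified assumption about how a mounted adapter is selected for a URL.

```python
from collections import OrderedDict


class ToyAdapter:
    """Hypothetical adapter that only records whether close() was called."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class ToySession:
    def __init__(self):
        self.adapters = OrderedDict()
        self.mount("https://", ToyAdapter("https"))
        self.mount("http://", ToyAdapter("http"))

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter

    def get_adapter(self, url):
        # Return the first mounted adapter whose prefix matches the URL.
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError("No connection adapter found for %r" % url)

    def __enter__(self):
        return self

    def __exit__(self, *args):
        for adapter in self.adapters.values():
            adapter.close()


with ToySession() as session:
    adapter = session.get_adapter("HTTPS://example.org/")
    print(adapter.name, adapter.closed)  # https False
print(adapter.closed)  # True
```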

Use the request method to send a request:

def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    ...
    # Condensed from Session.prepare_request and Session.send
    prep = PreparedRequest()
    prep.prepare(
        method=req.method.upper(),
        url=req.url,
        files=req.files,
        data=req.data,
        json=req.json,
        headers=merge_setting(req.headers, self.headers, dict_class=CaseInsensitiveDict),
        params=merge_setting(req.params, self.params),
        auth=merge_setting(auth, self.auth),
        cookies=merged_cookies,
        hooks=merge_hooks(req.hooks, self.hooks),
    )
    ...
    adapter = self.get_adapter(url=req.url)
    ...
    resp = adapter.send(prep, **send_kwargs)
    return resp

The request method works in four steps:

  1. Wrap the call's parameters in a Request object
  2. Create a PreparedRequest object, which preprocesses the Request
  3. Look up the adapter for the HTTP or HTTPS scheme and send the request with its send method
  4. Return the resulting Response object
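
Step 2 relies on merge_setting to combine per-request values with session defaults. The sketch below is a simplified approximation under our own assumptions, not the library's code: it ignores the dict_class argument seen above, and the name merge_setting_sketch is made up.

```python
def merge_setting_sketch(request_setting, session_setting):
    """Simplified merge: request values win, and None removes a key."""
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    # Non-dict settings (for example an auth tuple) are not merged;
    # the per-request value simply replaces the session value.
    if not (isinstance(request_setting, dict) and isinstance(session_setting, dict)):
        return request_setting
    merged = dict(session_setting)
    merged.update(request_setting)
    # A value of None means "drop this default".
    return {key: value for key, value in merged.items() if value is not None}


session_headers = {"Accept": "*/*", "User-Agent": "toy-agent"}
request_headers = {"Accept": "application/json", "User-Agent": None}
print(merge_setting_sketch(request_headers, session_headers))
# {'Accept': 'application/json'}
```

The design lets a session carry defaults while any single call can override or remove one of them.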

Models module

While a request is being made, Request and PreparedRequest objects are created, and a Response object is returned by the adapter. All three classes are implemented in the models.py module.

class Request(RequestHooksMixin):
    
    def __init__(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
        
        ...
        self.hooks = default_hooks()
        for (k, v) in list(hooks.items()):
            self.register_hook(event=k, hook=v)

        self.method = method
        self.url = url
        self.headers = headers
        self.files = files
        self.data = data
        self.json = json
        self.params = params
        self.auth = auth
        self.cookies = cookies
        ...

Creating a Request object is fairly simple: it assigns some attributes and then checks the hooks passed in from outside, making sure each one is a callable or a collection of callables.

def register_hook(self, event, hook):
    """Properly register a hook."""

    if event not in self.hooks:
        raise ValueError('Unsupported event specified, with event name "%s"' % (event))

    if isinstance(hook, Callable):
        # hook is a single function
        self.hooks[event].append(hook)
    elif hasattr(hook, '__iter__'):
        # hook is a collection of functions
        self.hooks[event].extend(h for h in hook if isinstance(h, Callable))
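
The registration logic is easy to try in isolation. In this sketch HookRegistry is a hypothetical holder class, and the built-in callable() replaces the isinstance(hook, Callable) check.

```python
class HookRegistry:
    """Hypothetical holder that mirrors the register_hook logic above."""

    def __init__(self, events=("response",)):
        self.hooks = {event: [] for event in events}

    def register_hook(self, event, hook):
        if event not in self.hooks:
            raise ValueError('Unsupported event specified, with event name "%s"' % event)
        if callable(hook):
            self.hooks[event].append(hook)
        elif hasattr(hook, "__iter__"):
            # Keep only the callable members of an iterable of hooks.
            self.hooks[event].extend(h for h in hook if callable(h))


registry = HookRegistry()
registry.register_hook("response", print)
registry.register_hook("response", [len, "not callable", str])
print(len(registry.hooks["response"]))  # 3
```

Non-callable entries in a list are silently skipped, while an unknown event name fails loudly.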

The PreparedRequest object does more validation and preparation for external parameters:

class PreparedRequest(RequestEncodingMixin, RequestHooksMixin):
    
    ...
    
    def prepare(self,
        method=None, url=None, headers=None, files=None, data=None,
        params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request with the given parameters."""

        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_cookies(cookies)
        self.prepare_body(data, files, json)
        self.prepare_auth(auth, url)

        ...
        hooks = hooks or []
        for event in hooks:
            self.register_hook(event, hooks[event])

As you can see, the PreparedRequest object goes through these steps:

  • Prepare the HTTP method
  • Prepare the URL
  • Prepare the headers
  • Prepare the cookies
  • Prepare the HTTP body
  • Prepare the authentication
  • Register the hooks carried over from the Request object

We will look at hooks in more detail at the end. Here we use prepare_headers as an example to see what happens during validation:

def prepare_headers(self, headers):
    """Prepares the given HTTP headers."""

    self.headers = CaseInsensitiveDict()
    for header in headers.items():
        # Raise exception on invalid header value.
        check_header_validity(header)
        name, value = header
        self.headers[to_native_string(name)] = value
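
What might a check like check_header_validity look for? The sketch below is only an assumption on our part: the pattern, the exception type and the function names are illustrative. The idea is that a header value must not start with whitespace or contain a line break, since that could smuggle an extra header line into the request.

```python
import re

# Assumed pattern: the value must not start with whitespace and must not
# contain a carriage return or line feed; the empty string is allowed.
CLEAN_HEADER_VALUE = re.compile(r"^\S[^\r\n]*$|^$")


def check_header_value(name, value):
    """Raise ValueError when a header value looks unsafe."""
    if not isinstance(value, str):
        raise ValueError("Header %r must be a str, got %s" % (name, type(value).__name__))
    if not CLEAN_HEADER_VALUE.match(value):
        raise ValueError("Invalid leading whitespace or line break in header %r" % name)


def prepare_headers_sketch(headers):
    prepared = {}
    for name, value in (headers or {}).items():
        check_header_value(name, value)
        prepared[name] = value
    return prepared
```

Passing {"X-Test": "a\r\nInjected: 1"} to prepare_headers_sketch raises ValueError, while ordinary values pass through unchanged.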

The Response object mostly mimics file operations: raw holds the binary data stream, the content property reads all of the binary data, the text property decodes the binary data into text, and the json method deserializes the text.

CONTENT_CHUNK_SIZE = 10 * 1024  # 10k

class Response(object):

    def __init__(self):
        #: File-like object representation of response (for advanced usage).
        #: Use of ``raw`` requires that ``stream=True`` be set on the request.
        #: This requirement does not apply for use internally to Requests.
        self.raw = None

    @property
    def content(self):
        """Content of the response, in bytes."""
        ...
        self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
        ...
        return self._content

    @property
    def text(self):
        content = str(self.content, encoding, errors='replace')
        return content

    def json(self, **kwargs):
        ...
        return complexjson.loads(self.text, **kwargs)

Requests prefers simplejson for JSON handling when it is installed and falls back to the standard json module otherwise.

The iter_content function uses a generator to read data from the stream chunk by chunk. Where the stream comes from is covered in the adapter implementation later.

def iter_content(self, chunk_size=1, decode_unicode=False):
    def generate():
        # Special case for urllib3.
        if hasattr(self.raw, 'stream'):
            try:
                for chunk in self.raw.stream(chunk_size, decode_content=True):
                    yield chunk
            except ProtocolError as e:
                raise ChunkedEncodingError(e)
            ...
    stream_chunks = generate()
    return stream_chunks
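
The layering of raw, iter_content, content, text and json can be reproduced with nothing but the standard library. ToyResponse below is a hypothetical class for illustration: it reads from any file-like object with read() instead of a urllib3 stream, and it takes the encoding as a constructor argument.

```python
import io
import json

CONTENT_CHUNK_SIZE = 10 * 1024


class ToyResponse:
    """Hypothetical response whose raw attribute is any file-like object."""

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw
        self.encoding = encoding
        self._content = None

    def iter_content(self, chunk_size=1):
        # Lazily yield up to chunk_size bytes at a time until the stream is drained.
        def generate():
            while True:
                chunk = self.raw.read(chunk_size)
                if not chunk:
                    break
                yield chunk
        return generate()

    @property
    def content(self):
        # Read the stream once and cache the bytes for later accesses.
        if self._content is None:
            self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE))
        return self._content

    @property
    def text(self):
        return str(self.content, self.encoding, errors="replace")

    def json(self, **kwargs):
        return json.loads(self.text, **kwargs)


response = ToyResponse(io.BytesIO(b'{"type": "User"}'))
print(response.json())  # {'type': 'User'}
```

Because content caches the bytes, text and json can be called repeatedly even though the underlying stream can only be read once.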

Adapters module

How exactly is the HTTP request sent? That happens mainly in HTTPAdapter:

class HTTPAdapter(BaseAdapter):

    def __init__(self, pool_connections=DEFAULT_POOLSIZE,
                 pool_maxsize=DEFAULT_POOLSIZE, max_retries=DEFAULT_RETRIES,
                 pool_block=DEFAULT_POOLBLOCK):
        ...
        self.poolmanager = PoolManager(num_pools=connections, maxsize=maxsize,
                                       block=block, strict=True, **pool_kwargs)

    def send(self, request, stream=False, timeout=None, verify=True,
             cert=None, proxies=None):
        # Obtain a connection from the pool
        conn = self.poolmanager.connection_from_url(request.url)
        url = self.request_url(request, proxies)
        self.add_headers(request, stream=stream, timeout=timeout,
                         verify=verify, cert=cert, proxies=proxies)
        resp = conn.urlopen(
            method=request.method,
            url=url,
            body=request.body,
            headers=request.headers,
            redirect=False,
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            retries=self.max_retries,
            timeout=timeout
        )
        return self.build_response(request, resp)

    def close(self):
        self.poolmanager.clear()  # close the connection pool

We will not go into the implementation of the urllib3 library in this article. Instead, we focus on how the Response object is built:

def build_response(self, req, resp):
    response = Response()

    # Fallback to None if there's no status_code, for whatever reason.
    response.status_code = getattr(resp, 'status', None)

    # Make headers case-insensitive.
    response.headers = CaseInsensitiveDict(getattr(resp, 'headers', {}))

    # Set encoding.
    response.encoding = get_encoding_from_headers(response.headers)
    response.raw = resp  # binary stream
    response.reason = response.raw.reason

    if isinstance(req.url, bytes):
        response.url = req.url.decode('utf-8')
    else:
        response.url = req.url

    # Add new cookies from the server.
    extract_cookies_to_jar(response.cookies, req, resp)

    # Give the Response some context.
    response.request = req
    response.connection = self

    return response

  • resp is urllib3's HTTPResponse object
  • The cookies are extracted from both the request and the response
  • The Response also keeps a reference to the PreparedRequest object, which makes the Response easier to work with

That covers how Requests makes an HTTP request, which centers on the four modules above; we now have a working understanding of the core flow. HTTPS builds on HTTP and adds extra work such as certificate verification. A quick review of how a request is executed:

  1. The api module wraps everything in an easy-to-use API
  2. The Session drives the processing flow
  3. Request and PreparedRequest preprocess the request
  4. Response wraps the response and provides easy-to-use methods (json) and data (ok)

Tips

The Requests library also contains some code that makes it pleasant to use and that is worth learning from.

JSON indented output

When dumping JSON, the indent parameter indents the output and sort_keys sorts the keys.

# help.py

"""Pretty-print the bug information as JSON."""
print(json.dumps(info(), sort_keys=True, indent=2))

Here is a small demonstration:

import json

a = {"name": "game404", "age": 2}
print(json.dumps(a))
print(json.dumps(a, sort_keys=True, indent=2))

# Output
{"name": "game404", "age": 2}
{
  "age": 2,
  "name": "game404"
}

structures

Two data structures are defined in the structures module. Ordinary Python dictionaries do not support attribute-style access to their values (a.name); for that you need to define a class:

# structures.py
a = {"name": "game404"}
# print(a.name)  # AttributeError
print(a["name"])

# Define a data structure class
class Person(object):

    def __init__(self, name):
        self.name = name

LookupDict allows attribute-style access to values without defining the attributes on a class first, which is handy for configuration-like objects:

class LookupDict(dict):
    """Dictionary lookup object."""

    def __init__(self, name=None):
        self.name = name
        super(LookupDict, self).__init__()

    def __repr__(self):
        return '<lookup \'%s\'>' % (self.name)

    def __getitem__(self, key):
        # We allow fall-through here, so values default to None
        return self.__dict__.get(key, None)

    def get(self, key, default=None):
        return self.__dict__.get(key, default)

a = LookupDict(name="game404")
a["motto"] = "Life is short, you need Python"
a.age = 2
print(a["motto"], a.age, a["age"])  # None 2 2

CaseInsensitiveDict defines a case-insensitive dictionary, which is used for handling HTTP headers:

class CaseInsensitiveDict(MutableMapping):

    def __init__(self, data=None, **kwargs):
        self._store = OrderedDict()  # use an extra _store to hold the data
        if data is None:
            data = {}
        self.update(data, **kwargs)

    def __setitem__(self, key, value):
        # Use the lowercased key for lookups, but store the actual
        # key alongside the value.
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

cid = CaseInsensitiveDict()
cid['Accept'] = 'application/json'
print(cid['aCCEPT'] == 'application/json')  # True

You can see that the __dict__ of a CaseInsensitiveDict object contains a single entry, _store, which wraps the actual data:

print(cid.__dict__)  # {'_store': OrderedDict([('accept', ('Accept', 'application/json'))])} 
print(cid._store)  # OrderedDict([('accept', ('Accept', 'application/json'))])

status_codes

status_codes defines semantic names for the HTTP status codes. For example, ok is the semantic name for 200, so even people who are not familiar with HTTP can understand the OK status.

print(requests.codes["ok"], requests.codes.OK, requests.codes.ok, requests.codes.OKAY)  # 200 200 200 200
print(requests.codes.CREATED)  # 201
print(requests.codes.found)  # 302

It is implemented roughly as follows:

# status_codes.py
codes = LookupDict(name='status_codes')

for code, titles in _codes.items():
    for title in titles:
        setattr(codes, title, code)  # default key
        if not title.startswith(('\\', '/')):
            setattr(codes, title.upper(), code)  # uppercase key
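
To run this loop on its own, the sketch below pairs a trimmed-down LookupDict with a three-entry excerpt of the code table. The excerpt is our own selection for illustration; the real table is much longer.

```python
class LookupDict(dict):
    """Dictionary whose item lookup reads instance attributes."""

    def __init__(self, name=None):
        self.name = name
        super().__init__()

    def __getitem__(self, key):
        # Unknown names fall through to None instead of raising KeyError.
        return self.__dict__.get(key, None)


# A small excerpt chosen for illustration.
_codes = {
    200: ("ok", "okay", "\\o/"),
    201: ("created",),
    302: ("found",),
}

codes = LookupDict(name="status_codes")
for code, titles in _codes.items():
    for title in titles:
        setattr(codes, title, code)
        # Symbolic aliases such as \o/ get no uppercase variant.
        if not title.startswith(("\\", "/")):
            setattr(codes, title.upper(), code)

print(codes.ok, codes.OK, codes["okay"], codes.CREATED)  # 200 200 200 201
print(codes["no_such_status"])  # None
```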

hooks

The hooks module provides a simple hook system. You can register several handlers for a named event (with register_hook, shown earlier) and then trigger them at the right moment so that each handler processes the data in turn, much like the pipe symbol | on Linux:

HOOKS = ['response']

def default_hooks():
    return {event: [] for event in HOOKS}

def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Dispatches a hook dictionary on a given piece of data."""
    hooks = hooks or {}
    hooks = hooks.get(key)
    if hooks:
        # A single function is wrapped in a list
        if hasattr(hooks, '__call__'):
            hooks = [hooks]
        for hook in hooks:
            _hook_data = hook(hook_data, **kwargs)
            # A hook that returns a value replaces the data
            if _hook_data is not None:
                hook_data = _hook_data
    return hook_data

It is used in the session:

class Session(SessionRedirectMixin):

    def send(self, request, **kwargs):
        ...
        r = adapter.send(request, **kwargs)
        # Response manipulation hooks
        r = dispatch_hook('response', hooks, r, **kwargs)
    

When the session receives the response from the adapter, it triggers the registered hooks for further processing of the response.
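
The pipe-like behaviour is easiest to see with plain strings in place of a response object. In this sketch dispatch_hook is a compact rewrite of the function described above, and the three hook functions are made up for illustration.

```python
def dispatch_hook(key, hooks, hook_data, **kwargs):
    """Run each hook registered under key, threading the data through."""
    hooks = (hooks or {}).get(key)
    if hooks:
        if callable(hooks):
            hooks = [hooks]
        for hook in hooks:
            result = hook(hook_data, **kwargs)
            # A hook that returns None leaves the data unchanged.
            if result is not None:
                hook_data = result
    return hook_data


seen = []


def strip_whitespace(data, **kwargs):
    return data.strip()


def log_only(data, **kwargs):
    # Hypothetical observer hook: it inspects the data and returns nothing.
    seen.append(data)


def shout(data, **kwargs):
    return data.upper()


hooks = {"response": [strip_whitespace, log_only, shout]}
result = dispatch_hook("response", hooks, "  ok  ")
print(result)  # OK
print(seen)  # ['ok']
```

Each hook receives the output of the previous one, and an observer that returns None passes the data along untouched.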

Original article: game404.github.io/post/python…
