The DNS resolution process, as seen from the Chrome source code

DNS resolution turns a domain name into its corresponding IP address, because routers on a wide-area network forward packets by IP address, not by name. DNS stands for Domain Name System, and the protocol is described in RFC 1035. The specific process is shown in the figure below:

This process seems simple, but there are several problems:

(1) How does the browser know which DNS server to use, such as 8.8.8.8 in the figure above?

(2) Can a domain name resolve to multiple IP addresses? With only one IP address, might that server be overwhelmed under heavy concurrent traffic?

(3) If a domain name is bound in the hosts file, is the hosts entry used directly instead of sending a query?

(4) How long is a resolution result valid, that is, how long before the same domain name has to be resolved again?

(5) What are the A, AAAA, and CNAME records in domain name resolution?

Let us start with the first question: how the browser knows which DNS server to use.

The DNS server addresses currently in use can be seen in the network settings of the local computer. Mine look like this:

These two DNS servers are provided by the broadband ISP that serves my home:

Google also provides two free public DNS addresses, 8.8.8.8 and 8.8.4.4, chosen to be easy to remember. If your DNS service is not working properly, try switching to these two addresses.

How do devices that join the network get these addresses? When a device connects to a router, the router assigns it an IP address through DHCP and tells it which DNS servers to use. The DHCP settings of the router are as follows:

You can observe this exchange by capturing packets with Wireshark:

When my computer connects to Wi-Fi, it sends a DHCP Request broadcast. On receiving this broadcast, the router assigns an IP address to my computer and tells it the DNS server addresses.

Chrome obtains these addresses by calling the system function res_ninit, which reads /etc/resolv.conf. On my Mac the file looks like this:

#
# Mac OS X Notice
#
# This file is not used by the host name and address resolution
# or the DNS query routing mechanisms used by most processes on
# this Mac OS X system.
#
# This file is automatically generated.
#
search DHCP HOST
nameserver 59.108.61.61
nameserver 219.232.48.61

The search option lists suffixes to append to a name that cannot be resolved as given. For example, with the file above, `ping hello` will also try hello.DHCP and hello.HOST.
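The suffix expansion can be sketched in a few lines of C++. This is only an illustration of the idea: `SearchCandidates` is a hypothetical helper, not a system or Chrome API, and a real resolver also decides (for example via the `ndots` option) when and in what order the bare name itself is tried.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the fully qualified candidates that a
// "search" line produces for a short name. Ordering relative to the
// bare name is deliberately left out of this sketch.
std::vector<std::string> SearchCandidates(
    const std::string& name, const std::vector<std::string>& suffixes) {
  std::vector<std::string> candidates;
  for (const std::string& suffix : suffixes) {
    candidates.push_back(name + "." + suffix);
  }
  return candidates;
}
```

With the suffixes `DHCP` and `HOST`, the name `hello` expands to `hello.DHCP` and `hello.HOST`.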

At startup, Chrome reads the DNS server configuration from the operating system and stores it in the nameservers field of DnsConfig:

  // List of name server addresses.
  std::vector<IPEndPoint> nameservers;

Chrome also listens for network changes and synchronizes configuration changes.

The nameservers list is then used to initialize a socket pool for sending requests. When a domain name needs to be resolved, a socket is taken from the pool and a server_index is passed in. The index starts at 0, meaning the first DNS server is used. If requests to that server fail twice, server_index is incremented and the next DNS server is used.

    unsigned server_index =
        (first_server_index_ + attempt_number) % config.nameservers.size();
    // Skip over known failed servers.
    // The maximum number of attempts is 2, set during construction of DnsConfig
    server_index = session_->NextGoodServerIndex(server_index);

If every nameserver has failed, it picks the one whose failure happened longest ago.
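The selection rule described above can be sketched as follows. This is a simplified illustration, not Chrome's implementation: the `ServerStats` struct, its field names, and the logical `last_failure_tick` clock are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-server bookkeeping (not Chrome's actual type).
struct ServerStats {
  int consecutive_failures;  // failures since the last success
  long last_failure_tick;    // logical time of the latest failure
};

// Returns the first server at or after `start` that has failed fewer than
// `max_failures` times. If every server has reached the limit, returns the
// one whose last failure is the oldest.
std::size_t NextGoodServerIndex(const std::vector<ServerStats>& servers,
                                std::size_t start, int max_failures) {
  assert(!servers.empty() && start < servers.size());
  std::size_t oldest = start;
  for (std::size_t i = 0; i < servers.size(); ++i) {
    std::size_t index = (start + i) % servers.size();
    if (servers[index].consecutive_failures < max_failures) return index;
    if (servers[index].last_failure_tick < servers[oldest].last_failure_tick)
      oldest = index;
  }
  return oldest;
}
```

Wrapping around with the modulo keeps the rotation fair, and falling back to the oldest failure gives the server that has had the longest time to recover another chance.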

Besides reading the DNS servers at startup, Chrome also reads the hosts file and parses it into the hosts field of DnsConfig, which is a hash map:

// Parsed results of a Hosts file.
//
// Although Hosts files map IP address to a list of domain names, for name
// resolution the desired mapping direction is: domain name to IP address.
// When parsing Hosts, we apply the "first hit" rule as Windows and glibc do.
// With a Hosts file of:
//   300.300.300.300 localhost # bad IP
//   127.0.0.1 localhost
//   10.0.0.1 localhost
// The expected resolution of localhost is 127.0.0.1.
using DnsHosts = std::unordered_map<DnsHostsKey, IPAddress, DnsHostsKeyHash>;

On Linux, the hosts file is in /etc/hosts:

const base::FilePath::CharType kFilePathHosts[] =
    FILE_PATH_LITERAL("/etc/hosts");

Parsing this file takes some care: it is processed line by line, and invalid entries, such as the bad IP in the comment above, have to be handled.
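A minimal sketch of such a parser, applying the "first hit" rule from the comment above, might look like this. It is not Chrome's parser: `IsValidIPv4` and `ParseHosts` are hypothetical names, only IPv4 is handled, and the map is keyed by plain host name rather than by Chrome's DnsHostsKey.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Simplified dotted-quad check (assumption: IPv6 is ignored in this sketch).
bool IsValidIPv4(const std::string& text) {
  if (text.empty() || text.back() == '.') return false;
  std::istringstream in(text);
  std::string part;
  int parts = 0;
  while (std::getline(in, part, '.')) {
    if (part.empty() || part.size() > 3) return false;
    for (char c : part)
      if (c < '0' || c > '9') return false;
    if (std::stoi(part) > 255) return false;
    ++parts;
  }
  return parts == 4;
}

// Parses hosts-file text into a name -> IP map, keeping the first valid hit.
std::unordered_map<std::string, std::string> ParseHosts(
    const std::string& contents) {
  std::unordered_map<std::string, std::string> hosts;
  std::istringstream lines(contents);
  std::string line;
  while (std::getline(lines, line)) {
    line = line.substr(0, line.find('#'));  // strip trailing comment
    std::istringstream fields(line);
    std::string ip, name;
    if (!(fields >> ip) || !IsValidIPv4(ip)) continue;  // skip bad lines
    while (fields >> name) hosts.emplace(name, ip);  // emplace keeps first
  }
  return hosts;
}
```

Because `emplace` does not overwrite an existing key, a later `10.0.0.1 localhost` line cannot replace an earlier `127.0.0.1 localhost`.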

DnsConfig therefore carries two pieces of configuration: hosts and nameservers. A DnsConfig is held by a DnsSession, as shown in the following figure:

The session layer manages server_index and the socket pool, for example allocating sockets. The session is initialized with a config, which holds the locally bound hosts and the nameservers. Each of these layers has its own responsibilities.

An important feature of the resolver is that it owns jobs and keeps them in a job queue. The resolver also owns a HostCache, which stores resolution results; on a cache hit no query is needed. Externally, callers go through the HostResolverImpl::Resolve interface that the resolver provides. This interface first checks whether the request can be served locally:

  int net_error = ERR_UNEXPECTED;
  if (ServeFromCache(*key, info, &net_error, addresses, allow_stale,
                     stale_info)) {
    source_net_log.AddEvent(NetLogEventType::HOST_RESOLVER_IMPL_CACHE_HIT,
                            addresses->CreateNetLogCallback());
    // |ServeFromCache()| will set |*stale_info| as needed.
    return net_error;
  }

  // TODO(szym): Do not do this if nsswitch.conf instructs not to.
  // http://crbug.com/117655
  if (ServeFromHosts(*key, info, addresses)) {
    source_net_log.AddEvent(NetLogEventType::HOST_RESOLVER_IMPL_HOSTS_HIT,
                            addresses->CreateNetLogCallback());
    MakeNotStale(stale_info);
    return OK;
  }

  return ERR_DNS_CACHE_MISS;

The caller then checks the return value. If it is not ERR_DNS_CACHE_MISS, the request was handled locally and the result is returned right away:

  if (rv != ERR_DNS_CACHE_MISS) {
    LogFinishRequest(source_net_log, info, rv);
    RecordTotalTime(info.is_speculative(), true, base::TimeDelta());
    return rv;
  }

Otherwise, it creates a job and checks whether the job can run immediately. If too many jobs are already running, the job is added to the queue instead. A callback is passed along to be invoked when the job succeeds.
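The "run now or queue" decision can be illustrated with a small dispatcher. This is a sketch under simplifying assumptions, not Chrome's dispatcher: `JobDispatcher`, `Add`, and `OnJobFinished` are hypothetical names, and priorities are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical dispatcher: starts a job immediately while fewer than
// `max_running` jobs are active, otherwise queues it in FIFO order.
class JobDispatcher {
 public:
  explicit JobDispatcher(std::size_t max_running)
      : max_running_(max_running), running_(0) {}

  // `start` kicks off the job. Returns true if it was started right away.
  bool Add(std::function<void()> start) {
    if (running_ < max_running_) {
      ++running_;
      start();
      return true;
    }
    queue_.push_back(std::move(start));
    return false;
  }

  // Called when a running job completes; starts the next queued job, if any.
  void OnJobFinished() {
    assert(running_ > 0);
    --running_;
    if (queue_.empty()) return;
    std::function<void()> next = std::move(queue_.front());
    queue_.pop_front();
    ++running_;
    next();
  }

  std::size_t queued() const { return queue_.size(); }

 private:
  std::size_t max_running_;
  std::size_t running_;
  std::deque<std::function<void()>> queue_;
};
```

Capping the number of running jobs bounds the number of outstanding DNS requests, while the queue ensures that no request is dropped.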

This matches what we would expect: check the cache first, then the hosts file, and only then send a query. If a cache entry is stale, the cache lookup returns null. The staleness criterion is as follows:

    bool is_stale() const {
      return network_changes > 0 || expired_by >= base::TimeDelta();
    }

That is, an entry is stale if the network has changed or if expired_by is greater than or equal to zero. expired_by is the current time minus the entry's expiration time:

stale.expired_by = now - expires_;
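The staleness rule can be reproduced in a self-contained form with std::chrono. This is a sketch, not Chrome's HostCache: `CacheEntry`, `Staleness`, and `GetStaleness` are names chosen for the example, and counting network changes with a plain integer is an assumption.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical cache entry: when it expires, and how many network changes
// had been observed at the time it was stored.
struct CacheEntry {
  Clock::time_point expires;
  int network_changes_at_store;
};

struct Staleness {
  Clock::duration expired_by;  // negative while the entry is still fresh
  int network_changes;         // network changes since the entry was stored

  bool is_stale() const {
    return network_changes > 0 || expired_by >= Clock::duration::zero();
  }
};

Staleness GetStaleness(const CacheEntry& entry, Clock::time_point now,
                       int current_network_changes) {
  Staleness stale;
  stale.expired_by = now - entry.expires;
  stale.network_changes =
      current_network_changes - entry.network_changes_at_store;
  return stale;
}
```

Note that an entry counts as stale at the exact instant it expires, because the comparison uses `>=` rather than `>`.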

The expiration time is initialized to now + TTL, where TTL is the value returned by the last resolution of the name:

 uint32_t ttl_sec = std::numeric_limits<uint32_t>::max();
 ttl_sec = std::min(ttl_sec, record.ttl);
 *ttl = base::TimeDelta::FromSeconds(ttl_sec);

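Starting from the largest uint32_t value and taking the minimum with record.ttl keeps ttl_sec within range; applied to each record in turn, it yields the smallest TTL in the response. The TTL can be seen directly in the DNS response in Wireshark: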

The TTL of this domain name is 600 seconds, that is, 10 minutes. It can be configured at the domain's DNS provider:

You can also see that the record type is A. What A means is shown below:

When adding a record, you can see that A resolves to an IPv4 address, AAAA resolves to an IPv6 address, and CNAME resolves to another domain name. The advantage of a CNAME is that when many domain names point to the same CNAME target, changing the IP address only requires updating the target's record, and all the others follow. The cost is a second resolution.
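The extra lookup that a CNAME requires can be sketched with an in-memory record table. This is an illustration only: `Record`, `Zone`, and `ResolveToIPv4` are hypothetical names, and the hop limit is an assumption added to guard against CNAME loops.

```cpp
#include <map>
#include <string>

enum class RecordType { kA, kCname };

struct Record {
  RecordType type;
  std::string value;  // an IPv4 address for kA, another name for kCname
};

using Zone = std::map<std::string, Record>;

// Follows CNAME records until an A record is found. Returns false if the
// name is unknown or the chain is longer than `max_hops` (e.g. a loop).
bool ResolveToIPv4(const Zone& zone, std::string name, std::string* address,
                   int max_hops = 8) {
  for (int hop = 0; hop <= max_hops; ++hop) {
    Zone::const_iterator it = zone.find(name);
    if (it == zone.end()) return false;
    if (it->second.type == RecordType::kA) {
      *address = it->second.value;
      return true;
    }
    name = it->second.value;  // CNAME: resolve the target name next
  }
  return false;
}
```

If several names are CNAMEs for the same target, updating the target's single A record changes the result for all of them.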

If the domain name cannot be resolved locally, Chrome sends a request. The operating system provides a function called getaddrinfo for domain name resolution, but the code path examined here does not use it. Instead, Chrome implements its own DNS client, which builds DNS request messages and parses DNS response messages. This is probably for flexibility: for example, Chrome can decide which nameservers to use, in what order, and how many failed attempts to allow.
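To give a feel for what "building a DNS request message" involves, here is a minimal sketch that encodes a single-question query in the RFC 1035 wire format. It is not Chrome's DnsQuery class: `BuildDnsQuery` is a hypothetical function, and it does not validate label lengths or reject empty labels.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes a DNS query with one question. qtype 1 is an A record.
std::vector<uint8_t> BuildDnsQuery(uint16_t id, const std::string& hostname,
                                   uint16_t qtype) {
  std::vector<uint8_t> packet;
  auto put16 = [&packet](uint16_t value) {  // big-endian 16-bit field
    packet.push_back(static_cast<uint8_t>(value >> 8));
    packet.push_back(static_cast<uint8_t>(value & 0xff));
  };
  put16(id);      // ID, echoed back in the response
  put16(0x0100);  // flags: standard query, recursion desired
  put16(1);       // QDCOUNT: one question
  put16(0);       // ANCOUNT
  put16(0);       // NSCOUNT
  put16(0);       // ARCOUNT
  // QNAME: each label is prefixed by its length, ending with a zero byte.
  std::size_t start = 0;
  while (start < hostname.size()) {
    std::size_t dot = hostname.find('.', start);
    if (dot == std::string::npos) dot = hostname.size();
    packet.push_back(static_cast<uint8_t>(dot - start));
    packet.insert(packet.end(), hostname.begin() + start,
                  hostname.begin() + dot);
    start = dot + 1;
  }
  packet.push_back(0);
  put16(qtype);  // QTYPE
  put16(1);      // QCLASS: IN (Internet)
  return packet;
}
```

A query for `example.com` is 29 bytes long: a 12-byte header, 13 bytes of encoded name, and 4 bytes for the type and class.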

Resolution starts in the resolver's startJob. It obtains the next query ID, builds a query, creates a DnsUDPAttempt, and calls its Start method. UDP is used because DNS clients send their queries over UDP (zone transfers from a primary to a secondary DNS server use TCP):

uint16_t id = session_->NextQueryId();
std::unique_ptr<DnsQuery> query;
query.reset(new DnsQuery(id, qnames_.front(), qtype_, opt_rdata_));

DnsUDPAttempt* attempt =
    new DnsUDPAttempt(server_index, std::move(lease), std::move(query));
int rv = attempt->Start(
    base::Bind(&DnsTransactionImpl::OnUdpAttemptComplete,
               base::Unretained(this), attempt_number,
               base::TimeTicks::Now()));

The query is broken down into several steps. The code is organized as a state machine, in which the current state determines what runs next:

int rv = result;
do {
  // The initial state is STATE_SEND_QUERY
  State state = next_state_;
  next_state_ = STATE_NONE;
  switch (state) {
    case STATE_SEND_QUERY:
      rv = DoSendQuery();
      break;
    case STATE_SEND_QUERY_COMPLETE:
      rv = DoSendQueryComplete(rv);
      break;
    case STATE_READ_RESPONSE:
      rv = DoReadResponse();
      break;
    case STATE_READ_RESPONSE_COMPLETE:
      rv = DoReadResponseComplete(rv);
      break;
    default:
      NOTREACHED();
      break;
  }
} while (rv != ERR_IO_PENDING && next_state_ != STATE_NONE);

Each handler sets next_state_ to the state of the following case: the first case moves to the second, the second to the third, and so on. The loop ends when no next state has been set, or when a handler returns ERR_IO_PENDING, in which case the transaction waits for the asynchronous I/O to finish. It is an interesting way to organize the code.
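The pattern can be reduced to a small runnable example. The state names mirror the snippet above, but everything else is an assumption made for illustration: `QueryAttempt` is a hypothetical class, each step completes synchronously, and the handlers do no real I/O.

```cpp
#include <vector>

enum State {
  STATE_SEND_QUERY,
  STATE_SEND_QUERY_COMPLETE,
  STATE_READ_RESPONSE,
  STATE_READ_RESPONSE_COMPLETE,
  STATE_NONE,
};

const int OK = 0;
const int ERR_IO_PENDING = -1;

// Hypothetical attempt whose steps all finish immediately.
class QueryAttempt {
 public:
  int DoLoop(int result) {
    int rv = result;
    do {
      State state = next_state_;
      next_state_ = STATE_NONE;  // a handler must opt in to continue
      visited_.push_back(state);
      switch (state) {
        case STATE_SEND_QUERY:
          next_state_ = STATE_SEND_QUERY_COMPLETE;
          rv = OK;
          break;
        case STATE_SEND_QUERY_COMPLETE:
          next_state_ = STATE_READ_RESPONSE;
          break;
        case STATE_READ_RESPONSE:
          next_state_ = STATE_READ_RESPONSE_COMPLETE;
          rv = OK;
          break;
        case STATE_READ_RESPONSE_COMPLETE:
          break;  // final step: next_state_ stays STATE_NONE
        case STATE_NONE:
          break;
      }
    } while (rv != ERR_IO_PENDING && next_state_ != STATE_NONE);
    return rv;
  }

  const std::vector<State>& visited() const { return visited_; }

 private:
  State next_state_ = STATE_SEND_QUERY;
  std::vector<State> visited_;
};
```

Resetting `next_state_` to `STATE_NONE` at the top of each iteration means a handler that hits an error simply does not set a next state, and the loop stops on its own.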

Once resolution succeeds, the result is stored in the cache:

    if (did_complete) {
      resolver_->CacheResult(key_, entry, ttl);
      RecordJobHistograms(entry.error());
    }

An address list is then generated and passed to the corresponding callback, since a DNS resolution may return multiple results, such as the following:

Here we do not print the results from Chrome; instead, we look at them directly in the Wireshark output.

This article has briefly introduced the DNS resolution process and some related DNS concepts. By now you should be able to answer the questions raised at the beginning. In short, the client sends a query to the DNS server, and the DNS server returns a response. When a device joins the network, the router tells it the DNS nameservers through DHCP. Chrome queries the DNS servers in the order given by the nameservers list and caches the results, which stay valid for the duration of the TTL. There are several types of DNS records. The most common are the A record and the CNAME record: an A record means the result is an IPv4 address, and a CNAME means the result is another domain name.

This article does not go into every detail, but it covers the core concepts and the main flow of the logic.
