Preface

When it comes to learning, everyone's motivation differs: some have a thirst for knowledge, others an ultimate pursuit of technology. Whatever yours is, we certainly have ours. Learning means acquiring knowledge and summarizing what you have learned, so that it comes easily to hand in future work. The idea of writing this book grew out of a passion to help more people like us learn together, and it was that shared thirst and passion for knowledge that brought the authors together to write it.

The authors of this book are experts from the "Online School" and "Basic Services Middle Platform" teams. Before writing, we organized group readings and debugging sessions of the NGINX source code; after several months of study and practice, the accumulated notes became this book. The analysis of the RTMP module in this book has also been applied in production through many live-streaming peaks and verified with millions of concurrent live viewers.

Preliminary knowledge

Before diving into the NGINX source code, it helps to have some background. Mastering the following topics will make the material much easier to learn and understand.

  • C/C++ basics: Master the C/C++ language first. Being able to read the semantics, syntax, and related business logic makes the NGINX source far easier to follow.
  • GDB debugging: Some of the debugging snippets in this book use GDB. Knowing the GDB debugger makes it easy to step through Nginx processes and their logic yourself.
  • Basic Nginx usage: This book is based on Nginx 1.16. Having used Nginx in practice will help as you read.
  • Basic knowledge of the HTTP protocol and network programming.

Co-authors

  • Team: Nie Songsong

Responsible for the development and architecture of the online school's core live-streaming system. Graduated from Northeastern University with a major in computer science and technology; more than 9 years of experience in audio, video, and streaming media; proficient in the NGINX and FFmpeg technology stacks.

  • Basic service center: Zhao Yu

Senior back-end developer at Good Future; previously a startup founder. Currently responsible for Kubernetes component development on the Good Future cloud container platform, within the container IaaS team. Familiar with the source code of PHP, Nginx, Redis, MySQL, and more; enjoys studying technology.

  • Team: Shi Hongbao

Back-end development expert at Good Future; master's degree from Southeast University; deep understanding of Redis, Nginx, MySQL, and other open-source software; familiar with C/C++ and Golang development; loves studying technology; co-author of "Redis 5 Design and Source Code Analysis".

  • Network school team: Jing Luo

Open-source enthusiast and senior technical expert; formerly a senior big-data R&D engineer at Sohu Group and an R&D engineer at Sina Weibo; 7 years of back-end architecture experience; familiar with the source code of PHP, Nginx, Redis, MySQL, etc.; good at high-concurrency processing and large-scale website architecture; a practitioner of "building a learning team".

  • Online School team: Yellow Peach

Senior technical expert; 8 years of back-end experience; good at high-performance web service architecture and stability engineering; author of "PHP 7 Low-Level Design and Source Code Implementation" and other books.

  • Online school team: Li Le

PHP development expert with a master's degree from Xidian University; interested in studying technology and source code, with a deep understanding of Redis and Nginx; co-author of "Redis 5 Design and Source Code Analysis".

  • Basic service center: Zhang Bao

Works on the access-layer gateway at Good Future Group; deep understanding of Nginx, Tengine, OpenResty, and other high-performance web servers; proficient in the design and implementation of large-site architecture and traffic scheduling systems.

  • Online school team: Yan Chang

Back-end development expert at Good Future; many years of deep experience in information security; thorough understanding of server development on Linux; good at implementing high-concurrency business.

  • Online school team: Tian Feng

Head of the R&D department of Xueersi Online School. More than 13 years of experience in the Internet industry, having worked in R&D and technical team management at Sogou, Baidu, 360, and Good Future, with rich experience in high-performance service architecture design and complex business system development.

Learning guide

When learning, most people care about the return on their effort. On one hand there are the hard-skill benefits of the technology itself; on the other, the higher-order gains in architectural thinking.

There are learning paths worth laying out in advance. The following diagram shows what it takes to learn the Nginx source code, and how to approach it, as shown in Figure 1-1.



Figure 1-1 Nginx learning outline

From the diagram you can clearly see what you need to learn in order to study the NGINX source code.

At the beginning, you can first look at obtaining, compiling, and installing the NGINX source code, then at NGINX's architectural foundations and design ideas, understanding NGINX from the perspective of its advantages, source tree layout, process model, and so on. Next, learn Nginx memory management, covering memory pools and shared memory. Then move on to NGINX's data structures: strings, arrays, linked lists, queues, hash tables, red-black trees, and radix trees, together with their algorithms and uses.

After the data structures, you can study NGINX configuration parsing: work through the Main, Events, and HTTP configuration blocks, and then the entire NGINX configuration parsing process. Next comes the process mechanism: through the process modes, the master process, the worker processes, and the inter-process communication mechanism, you can fully understand NGINX process management. Then, in the HTTP module, master the processing flow through module initialization, request parsing, the 11 phases of HTTP processing, and the HTTP request response.

Upstream: upstream initialization, upstream/downstream connection setup, long (keep-alive) connections, and the FastCGI module.

Then you can study further modules, such as the NGINX event module: the handling of file events and timer events, the process pool, the connection pool, and the rest of the event processing flow. After that come the NGINX load-balancing, rate-limiting, logging, and other modules.

If you want to use Nginx across platforms, you can study its cross-platform implementation: the configure build scripts and the cross-platform atomic operations and locks.

If you are interested in live streaming, you can also study NGINX's RTMP live-streaming module. Through the RTMP protocol and the module's processing flow, you can gain a deeper understanding of the RTMP module implementation.

Infrastructure and design concepts

Since its birth, NGINX has been famous for high performance, high reliability, and easy extensibility, which it owes to many excellent design decisions. This chapter takes a macro view to appreciate the beauty of the NGINX architecture.

Nginx process model

Nowadays most systems must handle massive user traffic, and people pay ever more attention to high availability, high throughput, low latency, and low resource consumption. Small and efficient, NGINX came into view against this backdrop and was quickly favored. Nginx's process model and event-driven design make it inherently good at handling C10K or even C100K high-concurrency scenarios.

Nginx adopts a design with one Master (management) process and several Worker processes, as shown in Figure 2-1.



Figure 2-1 Master-Worker process model

The Master process is responsible for managing the Workers, controlling their behavior through signals and pipes. When a Worker exits abnormally, the Master starts a new Worker to replace it. The Workers are the processes that actually handle user requests. All Worker processes are equal, and they balance load among themselves through inter-process mechanisms such as shared memory and atomic operations. This symmetric multi-processing (SMP) model exploits the concurrency of multicore architectures and ensures service robustness.

Compared with other servers built on the same multi-process model, Nginx achieves its strong performance and very high stability for several reasons.

  • Asynchronous non-blocking:

A Nginx Worker process works asynchronously and without blocking throughout: from establishing the TCP connection, to reading the request data from the kernel buffer, to processing the request in each HTTP module, to forwarding the request to an upstream server when reverse proxying, and finally to sending the response data to the user, the Worker almost never blocks. When a system call would block (for example an I/O operation whose data the operating system has not yet prepared), the Worker immediately moves on to the next request; once the condition is met, the operating system notifies the Worker to complete the operation. A single request may thus take several rounds to finish, but as a whole every Worker stays busy, so Nginx needs only a few Worker processes to handle a large number of concurrent requests. All of this is thanks to Nginx's fully asynchronous, non-blocking, event-driven framework; in particular, after Linux 2.5.45 introduced the new epoll I/O multiplexing interface, Nginx gained a new engine that pushed it to the top of the performance charts.

  • CPU bound

In production, the number of NGINX Workers is usually configured to equal the number of CPU cores, and each Worker is bound to a fixed core through worker_cpu_affinity, so that each Worker has a CPU core to itself. This effectively avoids frequent CPU context switches and can significantly increase the CPU cache hit rate.
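For illustration, a hypothetical nginx.conf fragment for a 4-core machine might look like this (the directives are real; the core count is an assumption for the example):

```nginx
worker_processes 4;
# one CPU bitmask per worker: worker N runs on core N
worker_cpu_affinity 0001 0010 0100 1000;

# since Nginx 1.9.10 both can also be set automatically:
# worker_processes auto;
# worker_cpu_affinity auto;
```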

  • Load balancing

When clients try to establish connections with the Nginx server, the operating system kernel returns the socket FD of each connection to Nginx. If every Worker competed to accept every connection, we would hit the famous "thundering herd" problem: only one Worker ultimately accepts the connection, while the other Workers are woken up by the system in vain, which inevitably reduces overall system performance. In addition, if some unlucky Workers kept failing at accept while other, already busy Workers kept succeeding, load would become unbalanced among the Workers, and the processing capacity and throughput of the NGINX server would drop. Nginx solves both problems with a global accept_mutex lock and a simple load-balancing algorithm. First, before listening, each Worker tries to obtain the accept_mutex lock through ngx_trylock_accept_mutex without blocking; only the Worker that grabs the lock actually listens on the port and accepts new connections, while the Workers that fail continue processing events on connections they have already accepted. Second, NGINX maintains a per-Worker global variable, ngx_accept_disabled, initialized as follows:

ngx_accept_disabled = ngx_cycle->connection_n / 8 - ngx_cycle->free_connection_n

Here connection_n is the total number of connections each Worker can hold at the same time, and free_connection_n is the number of free connections; when a Worker process starts, the number of free connections equals the number of connections it can hold, so ngx_accept_disabled starts at -7/8 * connection_n. When ngx_accept_disabled turns positive, the number of idle connections has fallen below 1/8 of the total, meaning the Worker is very busy; it therefore gives up competing for the accept_mutex lock in this event loop and focuses on its existing connections, decrementing ngx_accept_disabled by one and re-checking in the next event loop whether to rejoin the competition. The following code snippet shows the algorithm logic:

if (ngx_use_accept_mutex) {
    if (ngx_accept_disabled > 0) {
        ngx_accept_disabled--;
    } else {
        if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
            return;
        }
        /* ...... */
    }
}

Overall this design is a little rough, but it is simple and practical. To a degree it maintains load balance among the Worker processes, avoids denial of service caused by a single Worker exhausting its resources, and improves the performance and robustness of the NGINX server.

In addition, Nginx also supports a single-process mode, but this mode cannot exploit the CPU's multicore processing power and is usually only suitable for local debugging.

Nginx modular design

The Nginx main framework contains only a small amount of core code; most of its powerful functionality is implemented in modules. The module design fully follows the principle of high cohesion and low coupling: each module handles only the configuration items within its own responsibility and focuses on one specific function, and all module types implement a unified interface specification, which greatly enhances NGINX's flexibility and scalability.

Module classification

Nginx officially classifies many modules into five categories according to their functions, as shown in Figure 2-2.



Figure 2-2 Nginx module classification diagram

1) Core modules: the most important class of modules in Nginx, including ngx_core_module, ngx_http_module, ngx_events_module, ngx_mail_module, ngx_openssl_module, and ngx_errlog_module. Each core module in turn manages a class of modules of its own type.

2) HTTP modules: the modules closely related to HTTP request processing. There are far more HTTP modules than modules of any other type, and the bulk of Nginx's rich functionality is implemented through them.

3) Event modules: a series of event-driven modules that can run on different operating systems and kernel versions. Nginx's event framework supports the event-driven models provided by the various operating systems, including epoll, poll, select, kqueue, eventport, and so on.

4) Mail modules: the modules related to mail services. They enable Nginx to proxy IMAP, POP3, SMTP, and other protocols.

5) Configuration module: this class has only one member, ngx_conf_module, but it is the foundation of all the others, because every other module takes effect only after the configuration module has processed the configuration directives and completed the preparation for them. The configuration module steers the behavior of all modules according to the configuration file; it is the basis of Nginx's configurability, customizability, and extensibility.

Module interface

Despite the number and complexity of Nginx modules, developers are not overwhelmed, because all modules follow the same ngx_module_t interface specification, defined as follows:

struct ngx_module_s {
    ngx_uint_t            ctx_index;
    ngx_uint_t            index;
    char                   *name;
    ngx_uint_t            spare0;
    ngx_uint_t            spare1;
    ngx_uint_t            version;
    const char           *signature;
    void                   *ctx;
    ngx_command_t        *commands;
    ngx_uint_t            type;

    ngx_int_t           (*init_master)(ngx_log_t *log);
    ngx_int_t           (*init_module)(ngx_cycle_t *cycle);
    ngx_int_t           (*init_process)(ngx_cycle_t *cycle);
    ngx_int_t           (*init_thread)(ngx_cycle_t *cycle);
    void                (*exit_thread)(ngx_cycle_t *cycle);
    void                (*exit_process)(ngx_cycle_t *cycle);
    void                (*exit_master)(ngx_cycle_t *cycle);

    uintptr_t             spare_hook0;
    uintptr_t             spare_hook1;
    uintptr_t             spare_hook2;
    uintptr_t             spare_hook3;
    uintptr_t             spare_hook4;
    uintptr_t             spare_hook5;
    uintptr_t             spare_hook6;
    uintptr_t             spare_hook7;
};

This is one of the most important structures in the NGINX source code. It contains a module's basic information: its name, type, directives, order, and so on. Note the seven hook functions, such as init_master, init_module, and init_process, which let each module embed its own logic at stages such as Master process start and exit, module initialization, and Worker process start and exit; this greatly improves the flexibility of module implementations.

As mentioned earlier, NGINX classifies its modules, and each class has its own characteristics and implements its own specific methods. How can all of them be tied to a single structure, ngx_module_t? The attentive reader may have noticed that ngx_module_t has a ctx member of type void *, which points at the public interface of the module class; it is the link between ngx_module_t and the various module types. What is a "public interface"? In a nutshell, each module class has its own family of protocol specifications, abstracted behind the void * ctx pointer, and modules of the same type only need to follow that set of specifications. Take the core and HTTP modules as examples. For a core module, ctx points to a structure named ngx_core_module_t. The structure is very simple: besides a name member, it has only two methods, create_conf and init_conf. If one day NGINX gains a new core module, it too will be built on the common interface ngx_core_module_t.

typedef struct {
    ngx_str_t          name;
    void               *(*create_conf)(ngx_cycle_t *cycle);
    char               *(*init_conf)(ngx_cycle_t *cycle, void *conf);
} ngx_core_module_t;

For HTTP modules, ctx points to a structure named ngx_http_module_t, which defines eight common methods that HTTP modules call before and after parsing the configuration file, and when creating and merging the http, server, and location configuration blocks, as shown in the following code:

typedef struct {
    ngx_int_t   (*preconfiguration)(ngx_conf_t *cf);
    ngx_int_t   (*postconfiguration)(ngx_conf_t *cf);

    void       *(*create_main_conf)(ngx_conf_t *cf);
    char       *(*init_main_conf)(ngx_conf_t *cf, void *conf);

    void       *(*create_srv_conf)(ngx_conf_t *cf);
    char       *(*merge_srv_conf)(ngx_conf_t *cf, void *prev, void *conf);

    void       *(*create_loc_conf)(ngx_conf_t *cf);
    char       *(*merge_loc_conf)(ngx_conf_t *cf, void *prev, void *conf);
} ngx_http_module_t;

At startup, NGINX calls the methods specified by ctx in each HTTP module in turn, according to the current execution context. More importantly, a developer only needs to implement the desired logic according to the ngx_http_module_t interface specification, which both reduces development cost and increases the extensibility and maintainability of NGINX modules. Viewed globally, the Nginx module interface design balances unification with differentiation, realizing module polymorphism in the simplest, most practical way.

The module division of labor

Now that Nginx has classified its modules and each module implements a specific function, how do these modules fit together effectively? What does each module do to prepare during Nginx startup? How do the modules cooperate while processing a request? This section gives a general picture; the details are explained in later chapters.

In fact, the Nginx main framework is only directly concerned with the six core modules, each of which "speaks for" one class of modules. For example, the HTTP modules are managed uniformly by ngx_http_module: when to create each HTTP module's configuration structures and when to run each module's initialization is controlled entirely by the core module ngx_http_module. It is like the management team of a large company: each senior manager runs a large department in which each employee focuses on his or her own mission; top leadership deals only with the department managers, and the department managers only with their subordinates. This layered thinking gives the Nginx source code its high cohesion and low coupling.

Nginx startup must complete configuration file parsing, work done entirely by the Nginx configuration module and the parsing engine. Each configuration directive must not only be read and recognized accurately; more important is how it is stored and interpreted. First, NGINX finds the modules interested in the directive and calls each module's predefined handler, which in most cases saves the parameters into the module's configuration structure and initializes them. During startup, each core module not only creates the container that holds the configuration structures of its whole "family", but also organizes those structures, so that the configuration of many modules is managed by the "eldest brother" of their family, and Nginx can quickly retrieve a module's configuration from these global containers by its index. In addition, the most important work of the event module during startup is to select an event-driven model according to the user configuration and the operating system; on Linux, Nginx selects the epoll model by default. When the Worker processes are forked and enter their initialization phase, the event module creates each Worker's epoll object and adds the listening port FDs to it via the epoll_ctl system call.

To make request processing more flexible and keep coupling between modules low, NGINX deliberately divides HTTP request processing into 11 phases, and in principle each phase allows multiple modules to execute their own logic. After the configuration file is parsed at startup, each HTTP module mounts its handler function as a hook into one of the phases. Driven by events, NGINX executes the handler methods of each phase in turn and uses their return values to decide whether to continue down the chain or finish the current request. This pipeline-style processing completely decouples the HTTP modules from one another, which brings great convenience to NGINX module design: after implementing a module's core logic, the developer only needs to decide in which phase to register the handler.

Since Nginx went open source, a large number of excellent third-party modules have emerged in the community, greatly extending the core functionality of native Nginx. All of this is owed to Nginx's excellent modular design.

Nginx event-driven design

Nginx's fully asynchronous event-driven framework is an important cornerstone of its high performance. Event-driven design is not an Nginx invention; the concept appeared early in computing. It refers to a strategy for making decisions during transaction management: act according to the current point in time, the available resources, and the pending tasks, solving problems continuously so that work never piles up. In general, the core of an event-driven architecture consists of three parts: the event collector, the event distributor, and the event handler. As the name implies, the event collector is responsible for gathering all events. As a web server, Nginx mainly handles events from the network and the disk, including TCP connection establishment and teardown, receiving and sending network packets, disk file I/O, and so on; each source corresponds to one read event and one write event. The event distributor is responsible for delivering the collected events to their target objects; Nginx manages and distributes read and write events through its event modules. The event handler, as the consumer, receives and processes the distributed events; in NGINX almost every module can be an event consumer. After a module finishes its business logic, it immediately returns control to the event module for the scheduling and distribution of the next event. Since the main consumers of events are the HTTP modules, and event handling completes within a single process, as long as the HTTP modules never let the process sleep, the whole request is processed very quickly; this is the key to how NGINX sustains extremely high network throughput. Of course, this design raises the programming difficulty: developers must avoid blocking through techniques such as asynchronous callbacks.
Different operating systems provide different event-driven models: Linux 2.6 supports the epoll, poll, and select models, FreeBSD supports kqueue, and Solaris 10 supports eventport. To remain cross-platform, Nginx's event framework supports the event-driven models of the various operating systems, with one event module per model: ngx_epoll_module, ngx_poll_module, ngx_select_module, ngx_kqueue_module, and so on. During module initialization, the event framework selects one of them as the event-driven module of the NGINX process. On most production Linux web servers, NGINX defaults to the most capable model, epoll, which we cover in detail in Chapter 7.

Conclusion

For more material, you can purchase "Nginx Low-Level Design and Source Code Analysis" and continue learning.



If you are interested in the underlying NGINX source code, you can purchase the paper edition of this book to read, or join the author teams to learn and encourage one another. The author teams are the "Online School team" and the "Basic Services Middle Platform team".

On the "Online School team" you can study the internals hand in hand with experienced engineers; on the "Basic Services Middle Platform team" you can also study NGINX internals and Kubernetes internals with the team's mentors.