At the end of the TinyWebServer project, the author proposed some possible interview questions; readers can try to answer them after studying the project to see whether they are really familiar with it. The questions cover the project introduction, the thread pool, the concurrency model, HTTP packet parsing, timers, logging, stress testing, overall ability, and so on. I have given brief answers below; please feel free to point out any mistakes.

The original address: zhuanlan.zhihu.com/p/364044293

Project introduction

Why do such a project?

— My lab projects lean toward machine vision, and I felt my knowledge of backend development was a little weak, so I wanted to learn server-side backend development;

Tell me about your project?

– A lightweight Linux C++ web server that helps beginners quickly practice network programming and build their own server.

Concurrency model using a thread pool + non-blocking sockets + epoll (ET and LT) + two event-handling patterns (Reactor and simulated Proactor);

A state machine is used to parse HTTP request packets, including GET and POST requests;

Accesses a server-side database to implement web user registration and login, and can serve image and video files;

Implements a synchronous/asynchronous logging system to record the server's running status;

Webbench stress testing shows the server can sustain tens of thousands of concurrent connections exchanging data;

Thread pool related

Handwritten thread pool

You must be familiar with the thread pool code; see the source code or the commented version for details:

```cpp
#ifndef THREADPOOL_H
#define THREADPOOL_H

#include <list>
#include <cstdio>
#include <exception>
#include <pthread.h>
#include "../lock/locker.h"
#include "../CGImysql/sql_connection_pool.h"

template <typename T>
class threadpool
{
public:
    /* thread_number is the number of threads in the pool,
       max_requests is the maximum number of requests allowed to wait in the queue */
    threadpool(int actor_model, connection_pool *connPool, int thread_number = 8, int max_requests = 10000);
    ~threadpool();
    bool append(T *request, int state); // enqueue in Reactor mode
    bool append_p(T *request);          // enqueue in Proactor mode

private:
    /* The worker must be static: a non-static member function receives the
       implicit this pointer, so its type would not match the void *(*)(void *)
       signature that pthread_create expects, and the code would not compile.
       A static member function has no this pointer and avoids the problem. */
    static void *worker(void *arg);
    void run();

private:
    int m_thread_number;         // number of threads in the pool
    int m_max_requests;          // maximum number of pending requests
    pthread_t *m_threads;        // array of thread ids
    std::list<T *> m_workqueue;  // request queue
    locker m_queuelocker;        // mutex protecting the request queue
    sem m_queuestat;             // semaphore signalling pending tasks
    connection_pool *m_connPool; // database connection pool
    int m_actor_model;           // model switch (Reactor / Proactor)
};

template <typename T>
threadpool<T>::threadpool(int actor_model, connection_pool *connPool, int thread_number, int max_requests)
    : m_actor_model(actor_model), m_thread_number(thread_number),
      m_max_requests(max_requests), m_threads(NULL), m_connPool(connPool)
{
    if (thread_number <= 0 || max_requests <= 0)
        throw std::exception();
    m_threads = new pthread_t[m_thread_number];
    if (!m_threads)
        throw std::exception();
    for (int i = 0; i < thread_number; ++i)
    {
        // The third argument of pthread_create is the thread routine; this is
        // passed as its argument so the static worker can reach the pool.
        if (pthread_create(m_threads + i, NULL, worker, this) != 0)
        {
            delete[] m_threads;
            throw std::exception();
        }
        if (pthread_detach(m_threads[i]))
        {
            delete[] m_threads;
            throw std::exception();
        }
    }
}

template <typename T>
threadpool<T>::~threadpool()
{
    delete[] m_threads;
}

// Enqueue a request in Reactor mode, recording whether it is a read (0) or write (1) event
template <typename T>
bool threadpool<T>::append(T *request, int state)
{
    m_queuelocker.lock();
    if (m_workqueue.size() >= m_max_requests)
    {
        m_queuelocker.unlock();
        return false;
    }
    request->m_state = state;
    m_workqueue.push_back(request);
    m_queuelocker.unlock();
    m_queuestat.post();
    return true;
}

// Enqueue a request in Proactor mode (the main thread has already done the I/O)
template <typename T>
bool threadpool<T>::append_p(T *request)
{
    m_queuelocker.lock();
    if (m_workqueue.size() >= m_max_requests)
    {
        m_queuelocker.unlock();
        return false;
    }
    m_workqueue.push_back(request);
    m_queuelocker.unlock();
    m_queuestat.post();
    return true;
}

template <typename T>
void *threadpool<T>::worker(void *arg)
{
    // arg is the this pointer passed to pthread_create
    threadpool *pool = (threadpool *)arg;
    pool->run(); // each thread loops inside run() for its whole lifetime
    return pool;
}

template <typename T>
void threadpool<T>::run()
{
    while (true)
    {
        m_queuestat.wait(); // sleep until a task is posted
        m_queuelocker.lock();
        if (m_workqueue.empty())
        {
            m_queuelocker.unlock();
            continue;
        }
        T *request = m_workqueue.front();
        m_workqueue.pop_front();
        m_queuelocker.unlock();
        if (!request)
            continue;
        // Reactor: the worker thread performs the I/O itself
        if (1 == m_actor_model)
        {
            if (0 == request->m_state) // read event
            {
                if (request->read_once())
                {
                    request->improv = 1;
                    connectionRAII mysqlcon(&request->mysql, m_connPool);
                    request->process();
                }
                else
                {
                    request->improv = 1;
                    request->timer_flag = 1;
                }
            }
            else // write event
            {
                if (request->write())
                {
                    request->improv = 1;
                }
                else
                {
                    request->improv = 1;
                    request->timer_flag = 1;
                }
            }
        }
        // default: Proactor, the worker only runs the business logic
        else
        {
            connectionRAII mysqlcon(&request->mysql, m_connPool);
            request->process();
        }
    }
}
#endif
```

What are the synchronization mechanisms for threads?

— Semaphores, condition variables, mutexes, etc.

Is the worker thread in the thread pool always waiting?

— Yes, they wait for a new task to wake them up;

What is the state of your thread pool worker thread after processing a task?

— If the request queue is empty, the thread goes back to waiting in the pool; if it is not empty, it competes with the other threads for the next task.

If 1000 clients make access requests at the same time and there are only a few threads, how can each one be answered in time?

— This project uses an I/O-multiplexing concurrency model. Note that there is not one thread per client connection! If that were the case, servers for events like Taobao's Double 12 sale would have collapsed long ago. When an event occurs on a client connection, epoll reports it, the corresponding task is added to the request queue, and the worker threads compete to execute it. If the server is still too slow, you can increase the thread pool capacity or consider clustering.

If a client request occupies a thread for a long time, does it affect subsequent client requests? What is a good strategy?

— Yes, because the number of threads is fixed: if one client request occupies thread resources for a long time, it inevitably slows the server's overall response. One strategy is to set a time threshold for each task; when a client request takes too long, move it to the back of the queue, or disconnect the client.

Concurrency model related

A brief description of the concurrency model used by servers?

— The project uses the semi-synchronous/semi-reactor concurrency model.

In simulated Proactor mode, the main thread acts as the asynchronous thread, listening for events on all sockets:

If a new request comes in, the main thread accepts it to obtain a new connected socket and registers read/write events on that socket in the epoll kernel event table;

If a read/write event occurs on a connected socket, the main thread performs the I/O on that socket and inserts the data into the request queue as a request object;

All worker threads sleep on the request queue; when a task arrives, they compete (for example, via a mutex) to take it over.

What are the differences between the Reactor, Proactor, and main-sub Reactor models?

Reactor model: the main thread (I/O processing unit) is only responsible for listening for events (readable/writable) on file descriptors; when one occurs, it immediately puts the socket's readable/writable event into the request queue to notify the worker threads. The worker threads then read and write the data, accept new connections, and handle the client requests. (Read and write events must be distinguished.)

Proactor model: the main thread and the kernel handle the I/O operations, such as reading/writing data and accepting new connections, while the worker threads only handle the business logic (producing the corresponding response), such as processing client requests.

Main-sub Reactor model: the main Reactor thread only dispatches accepted connections, while the sub-Reactors handle the I/O events on the connected sockets. The number of sub-Reactors can be set flexibly according to the number of CPU cores. It works as follows:

The main Reactor thread always watches for connection-establishment events. When a connection is established, it obtains the connected socket through accept(), selects a sub-Reactor thread according to some algorithm, and adds the connected socket to that sub-Reactor. In other words, the main Reactor thread's only job is to call accept() to get connected sockets and hand them to sub-Reactor threads.

So you use epoll. Tell me why, and whether there are other multiplexing mechanisms. What are the differences?

— select/poll/epoll are the options; the reason epoll is used in this project comes down to the classic question, "Why is epoll faster than select?":

  • With select and poll, the file descriptor set is assembled in user space and the whole set must be copied into the kernel on every call, while epoll keeps the set in the kernel and needs only one system call per descriptor added. System calls are expensive, so when there are many short-lived active connections, epoll may actually be slower than select and poll because of this per-descriptor system-call overhead.
  • select describes the file descriptor set with a linear table (a fixed-size bitmap), so it has an upper limit on descriptors; poll uses an array of pollfd structures, which removes the hard limit; epoll is backed by a red-black tree and maintains a ready list to which ready events are added from the event table, so epoll_wait only has to look at that list.
  • The greatest overhead of select and poll comes from how the kernel decides whether a file descriptor is ready: on every select or poll call, the kernel traverses the whole set to check each descriptor for activity. epoll avoids this scan: when activity occurs, a callback is triggered automatically and the kernel puts the ready descriptor on the ready list mentioned above, to be processed after epoll_wait returns.
  • select and poll work only in the relatively inefficient LT mode, while epoll supports both LT and the high-efficiency ET mode.
  • In summary, select and poll are fine when the number of monitored fds is small and most fds are active. When a large number of fds are monitored and only some are active at any moment, epoll can significantly improve performance.

HTTP packet parsing related

What is a state machine? Why use one?

— A finite state machine is an abstract theoretical model that can present a state-change process described by a finite number of variables in a constructible and verifiable way, for example as a closed directed graph. Finite state machines can be implemented with if-else, switch-case, or function pointers; from a software-engineering perspective they are mainly a way to encapsulate logic. A finite state machine is an efficient programming method within a logic unit: in server programming, it lets the server process logic according to different states or message types, keeping the program logic clear and easy to understand.

How do the state machine transitions work?

– See the diagram below:

Why is HTTPS secure?

— The connection-establishment phase is protected by SSL security authentication, and the data-transmission phase is encrypted; you can look up the details for further understanding;

HTTPS SSL connection process?

— One image is enough:

What’s the difference between GET and POST?

— Highly recommended! Other answers online don't get to the point: www.cnblogs.com/logsharing/…

Database login and registration related

Talk about the login process?

— The login process has been described in detail in previous articles.

Specifically, it involves loading the database table, extracting the user name and password, the registration and login flow, and page jumps.

  • Loading the database table: with the code, load the user data from the database into the server;

  • Extracting the user name and password: with the code, parse the packet and extract the user name and password;

  • Registration and login flow: with the code, describe the server's registration and login verification process;

  • Page jumps: with the code, explain the page-jump mechanism.

Is this login state saved? If you wanted to save it, how would you do it?

— State can be saved using a session or a cookie.

A cookie is a string (say, "123456789happy") the server issues to a client; the client attaches the string to every HTTP message it sends, so the server knows who it is;

A session is state stored on the server; when a client sends an HTTP packet, the server looks the client up in its user data, much like checking a roster.

If you load the user names and passwords locally and match them with a map, and there are a billion records, even a hash lookup after loading takes a lot of time. How would you optimize?

— The key to this question is how to authenticate user logins over a very large data set. Loading all user information into memory is time-consuming and thankless. An efficient way to search big data is to use hashing to build a multi-level index that speeds up authentication. Concretely:

First, hash the 1 billion user records with a hash function that reduces them by roughly a factor of 1000, yielding 1 million hash values, each representing a block of user information (level 1).

Then hash those 1 million values again, leaving, say, 1000 hash values (level 2).

This way the server only needs to keep the 1000 level-2 hash values resident. When a user requests a login, the server hashes the user information to find the corresponding level-2 bucket, reads the corresponding level-1 block, and finally locates the user's data.

Did you use MySQL or Redis?

— Yes, see the Redis article.

Timer related

Why use a timer?

— To handle scheduled tasks and tear down inactive connections, saving system resources;

How does a timer work?

— The server assigns a timer to each event. This project uses the SIGALRM signal to implement timing. Every timed event sits on an ascending-order linked list, and the alarm() function periodically raises SIGALRM; the signal handler notifies the main loop through a pipe, and on receiving the notification the main loop sweeps the timers on the list: any connection that has exchanged no data within the timeout is closed.

What is the time complexity of adding and deleting timers? Can it be optimized?

— Adding is generally O(n), since the sorted list must be traversed to find the insertion point; deleting is O(1). Squeezing more out of the doubly linked list is not practical; consider a min-heap or a skip list instead (see the skip-list article).

Min-heap optimization? Talk about the time complexity and how it works.

— The min-heap is ordered by each timer's expiration time, so the timer that expires soonest sits at the top of the heap. When the SIGALRM signal triggers the tick() function, the expired timers are removed.

Insertion: O(log n);

Deletion: O(log n);

Logging related

Tell me how your logging system works?

— When the server initializes, the log system is initialized as a singleton, and the configuration file determines whether writes are synchronous or asynchronous.

Why asynchronous? What's the difference between synchronous and asynchronous logging?

– Writing logs synchronously generates a large number of system calls; if a log message is large, writing it blocks the caller and becomes a system bottleneck. The asynchronous mode uses the producer-consumer model and sustains much higher concurrency.

Now suppose you want to monitor the status of a server and output monitoring logs. How would you distribute the logs to different machines?

— Monitoring logs make it easy to troubleshoot and to analyze server status to decide whether maintenance is needed; message queues such as MQTT or RabbitMQ can be used to distribute the messages.

Stress testing related

Has the server's concurrency been tested? How was it tested?

— Yes, tested with Webbench; it sustains at least ten thousand plus concurrent connections.

What is Webbench? Explain how it works.

– Webbench is a lightweight website stress-testing tool that can simulate up to 30,000 concurrent clients. How it works: the parent process forks as many child processes as the user requests (or a default number); for the duration of the test, each child sends real access requests to the target web page in a loop. Parent and children communicate through a pipe: when the time is up, each child writes its access statistics to the pipe's write end and exits, while the parent reads each child's results from the read end; after all children exit, the parent aggregates the totals, shows the final test result to the user, and exits.

Did you have any problems during the test?

— No…

Comprehensive ability

What problems does your project address that other projects of this kind do not?

– It is about building the wheel yourself, to learn how it works;

Describe how the front end sends a request and how the server processes it. Which protocols are involved?

— The HTTP, TCP, and IP protocols, plus general computer-networking knowledge.