One Preface

So far I have published three articles on cnblogs about network downloading, which implement a socket-based HTTP multi-threaded download utility with resumable (breakpoint) transfers. On that basis I intend to develop a distributed file management utility, and so far the upload and download logic of the server and the client has been implemented. The topics involved include thread pools, Linux epoll concurrency, uploading, and downloading. The logic of the JDFS download function is similar to that of my earlier JWebFileTrans(JDownload) articles, so if you are not familiar with socket-based downloading, or are only interested in the download feature, please refer to those three posts. This article will not cover downloading in detail; it focuses on the thread pool, epoll, and uploading. The addresses of the three posts are:

  • JWebFileTrans: a small program that can download files from the Internet
  • JWebFileTrans(JDownload): a small program that can download files from the Internet (part two)
  • JWebFileTrans(JDownload): a small program that can download files from the Internet (part three), multi-threaded resumable (breakpoint) downloading; for the link address please click me

For the GitHub address of JDFS, please click me.

 

PS: This post is the original work of the cnblogs user “CS primary school”. When reposting, please credit the original author and link. Thank you.

As usual, the next section will first demonstrate the upload and download capabilities of JDFS.

Two JDFS upload and download function demonstration

My tests were conducted on two virtual Ubuntu systems installed with VMware Player. Before the experiment, we need to make sure the two virtual Ubuntu machines can reach each other over the network, which can be checked with the ping command. The screenshots of this check are as follows:

As shown in the figure above, after logging in to the Ubuntu system, run the ifconfig command in the shell to check the IP address of the host. The blue line in the figure marks the host's IP address.

As shown in the figure above, the two virtual Ubuntu machines ping each other successfully, which shows that the network is working; this is the prerequisite for the experiment.

As shown in the figure above, these are the directories of the server and the client before the experiment. The directory on the left belongs to the server and contains an English copy of Advanced Programming in the UNIX Environment, which the client will request to download. On the right is the client's directory, which contains an English copy of Introduction to Algorithms; during the experiment the client will upload this e-book to the server.

The figure above is a screenshot of the client requesting apue-en.pdf from the server. The left side is the server. You can see that the client directory, which originally contained only Introduction to Algorithms, now also contains Advanced Programming in the UNIX Environment.

The screenshot above shows the client uploading the Introduction to Algorithms e-book to the server. The right side shows the information printed by the client during the upload, from which you can see that the file is uploaded piece by piece, while the left side shows the information printed in the server's shell as it receives the data.

The picture above is a screenshot of the uploaded Introduction to Algorithms opened on the server's Ubuntu system. As you can see, the upload is correct and the e-book opens and displays normally.

Three Basic Ideas

This article is based on the three blog posts listed in the preface. The logic of the JDFS client-side download function is almost identical to that of JWebFileTrans(JDownload) and will not be covered in detail below; the focus is on the upload functionality, epoll, and the thread pool. So if you have not read the posts about JWebFileTrans(JDownload) before, please go back to the preface and read them first, otherwise you may hit some obstacles while reading this article.

The core of this article is uploading and downloading. The logic has to be implemented on both the server and the client, and the two sides follow the same protocol; in other words, the server agrees on a set of rules with the client, the client sends requests according to those rules, and the server responds according to them. For example, we define a data structure that contains the request type, the starting position within the file, the file name, and so on. After the server receives data from the client, it first parses the header according to this pre-agreed structure. If the request type is upload, it receives the data described by the header and writes it to a file on disk. If the request type is download, the server looks for the file by name in its directory and, if found, sends the requested section of the file contents to the client. Sending and receiving are done with the send() and recv() functions.

So the server needs to listen on a specific [IP, port] all the time. How should this be implemented? Most people first think of a while loop that keeps polling, and quite a few blogs on the web describe it this way. The disadvantage is that the server's CPU usage is high and may even approach 100%, which means the CPU is wasted most of the time, even when no client is connected.

Linux provides an efficient solution for server-side concurrency: epoll. After switching to epoll, when there is no client connection request the server program does not even show up in the top command, because it is suspended by the kernel. Once a client download request arrives, I found in my experiments that CPU usage is about 5%. You can see that the CPU is used well and not wasted: if there is a request it is served, and if there is none the process is suspended by the kernel, so no system resources are wasted. Epoll can work in level-triggered or edge-triggered mode. Edge triggering requires the server to use non-blocking I/O and the logic is slightly more complicated; level triggering works either way. For the concepts of level triggering and edge triggering, please consult the relevant technical blogs on the web; this article focuses on the business logic of JDFS instead.

If epoll is used, how should each client request be handled? Should the server handle it directly? If so, when 100 client requests arrive at the same time the server satisfies them one by one, the processing becomes serial, and the efficiency is low, even though epoll is built for large-scale server-side concurrency. So, should we use pthreads and create one thread per incoming request, achieving parallelism on the server? The problem is, suppose there are a million requests: do you create a million threads? Regardless of whether there is an upper limit on the number of threads that can be created, creating and destroying that many threads would consume a great deal of system resources. So what else is there? Yes: a thread pool.

A thread pool can be visualized as a pool with a number of pre-created threads and a queue for storing client request data. The threads constantly fetch tasks from the queue and execute them.

In conclusion, the server-side model of this article uses epoll to monitor client requests. Once a request arrives, the server does not execute it itself; it adds the request to the thread pool's job queue and goes back to monitoring for the next request, without caring how the thread pool satisfies the client and without being blocked by it. In this model the server's main loop only listens, and the thread pool does the executing.

Four JDFS components in detail

1 Thread pool

When we described the thread pool, we used two key terms: the thread pool itself and the job queue. So let us first define the data structures. I clearly remember a famous saying from my undergraduate textbook: data structure + algorithm = program, which shows how important data structures are. The job queue is a linked list; each node stores one client request, which we call a job here. Only the server knows what the job is supposed to do: when the server receives a client request, it creates a job node, records in it the work to be done, and inserts the node into the thread pool's job queue. The work is recorded as a function pointer, and a thread in the pool later calls that function pointer for the job node. Each node therefore records the callback function, the argument to the callback function, and a pointer to the next job node:

 

typedef struct job
{
    void * (*call_back_func)(void *arg);
    void *arg;
    int job_kind;
    struct job *next;
}job;

 

The corresponding job queue data structure is defined as follows:

 typedef struct task_queue
{
    int is_queue_alive;
    int max_num_of_jobs_of_this_task_queue;
    int current_num_of_jobs_of_this_task_queue;
    job *task_queue_head;
    job *task_queue_tail;

    pthread_cond_t task_queue_empty;
    pthread_cond_t task_queue_not_empty;
    pthread_cond_t task_queue_not_full;

}task_queue;

There are three condition variables. task_queue_empty: the function that destroys the thread pool waits on this variable, and the pool can only be destroyed once the job queue is empty. task_queue_not_empty: the worker threads wait on this variable when the job queue is empty, and they are woken up once a new job is added. task_queue_not_full: when the server receives a client request it wraps it into a job and adds it to the queue, but only if the queue is not full; if it is full, the server blocks on this variable until it is woken up once the queue has room again.

The thread pool data structure is defined as follows:

typedef struct threadpool
{
    int thread_num;
    pthread_t *thread_id_array;
    int is_threadpool_alive;

    pthread_mutex_t mutex_of_threadpool;
    task_queue tq;

}threadpool;

The mutex variable protects the entire thread pool: whether fetching a job from the pool or adding one to it, you must first lock the pool's mutex and then perform the operation. In addition, the job queue has a variable indicating whether it is currently alive, is_queue_alive, and the thread pool has a variable indicating whether it is still alive, is_threadpool_alive. Here is what they do:

  • Adding a job to the job queue of the thread pool: jobs can be added only if the thread pool is alive.
  • Fetching a job from the thread pool: jobs can be fetched only if the job queue is still alive.
  • Destroying the thread pool: the pool can be destroyed only when neither the thread pool nor the job queue is alive.

 

Note: when the thread pool is set to not alive, there may still be jobs left in the job queue. Therefore, the job queue should be set to not alive only after all the remaining jobs have been executed; only then can the thread pool be destroyed.
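To illustrate that ordering, here is a rough sketch of what the destruction sequence might look like, using the structures defined above and assuming <pthread.h> and <stdlib.h>. It is only a simplified picture of the idea, not the actual destory_threadpool() from the repository:

int destory_threadpool(threadpool *pool)
{
    pthread_mutex_lock(&pool->mutex_of_threadpool);
    pool->is_threadpool_alive=0;                              /* 1. no new jobs may be added */

    /* 2. wait on task_queue_empty until the worker threads have drained the job queue */
    while(pool->tq.current_num_of_jobs_of_this_task_queue>0){
        pthread_cond_wait(&pool->tq.task_queue_empty, &pool->mutex_of_threadpool);
    }

    pool->tq.is_queue_alive=0;                                /* 3. now the job queue itself can die */
    pthread_cond_broadcast(&pool->tq.task_queue_not_empty);   /* wake sleeping workers so they can exit */
    pthread_mutex_unlock(&pool->mutex_of_threadpool);

    for(int i=0; i<pool->thread_num; i++){                    /* 4. wait for all worker threads to finish */
        pthread_join(pool->thread_id_array[i], NULL);
    }
    free(pool->thread_id_array);
    return 0;
}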

The pthread_cond_wait() and pthread_cond_broadcast() functions are heavily used in thread pools. One thing to note when using these two functions is that they should be used like this:

pthread_mutex_lock(&mutex);

while(condition is not ok){
    ...........
    pthread_cond_wait(&cond, &mutex);
    ...........
}

Yes, it must be a while loop. Why? I have searched the Internet for answers to this question and most of them did not feel very clear, but there is at least one scenario that justifies the while. Suppose the condition a thread is waiting for is that the queue is not full, so that it can add a job to it. pthread_cond_wait() unlocks the mutex and then puts the thread to sleep; once the mutex is unlocked, other threads can also lock it and wait on the same condition. Suppose there are N such threads waiting on the condition variable, and suppose a worker thread fetches a job from the job queue, creating a vacancy. That thread calls pthread_cond_broadcast() to wake up all the threads waiting on the condition variable. The woken threads re-acquire the lock one at a time inside pthread_cond_wait(); the first one to get the lock adds a job, the queue becomes full again, and when it releases the mutex one of the other N-1 threads gets the lock and returns from pthread_cond_wait(). With an if statement, that thread would simply assume the queue is not full and add another job even though it is already full again; with a while loop, it re-checks the condition, finds the queue full, and calls pthread_cond_wait() again to go back to sleep.
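To make the pattern concrete, here is a minimal sketch of what adding a job to the queue might look like, using the structures defined above and assuming <pthread.h> and <stdlib.h>. It is a simplified illustration (error handling and the is-alive checks are omitted), not the exact code from the repository:

int threadpool_add_jobs_to_taskqueue(threadpool *pool, void *(*call_back_func)(void *arg), void *arg, int job_kind)
{
    pthread_mutex_lock(&pool->mutex_of_threadpool);

    /* wait while the queue is full; woken via task_queue_not_full when a job is fetched */
    while(pool->tq.current_num_of_jobs_of_this_task_queue == pool->tq.max_num_of_jobs_of_this_task_queue){
        pthread_cond_wait(&pool->tq.task_queue_not_full, &pool->mutex_of_threadpool);
    }

    job *new_job=(job *)malloc(sizeof(job));
    new_job->call_back_func=call_back_func;
    new_job->arg=arg;
    new_job->job_kind=job_kind;
    new_job->next=NULL;

    /* append the job to the tail of the linked list */
    if(pool->tq.task_queue_tail==NULL){
        pool->tq.task_queue_head=pool->tq.task_queue_tail=new_job;
    }else{
        pool->tq.task_queue_tail->next=new_job;
        pool->tq.task_queue_tail=new_job;
    }
    pool->tq.current_num_of_jobs_of_this_task_queue++;

    /* wake up worker threads sleeping on task_queue_not_empty */
    pthread_cond_broadcast(&pool->tq.task_queue_not_empty);
    pthread_mutex_unlock(&pool->mutex_of_threadpool);
    return 0;
}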

So that’s the thread pool background. A thread pool defines a set of function interfaces as follows:

int threadpool_create(threadpool *pool, int num_of_thread, int max_num_of_jobs_of_one_task_queue);
int threadpool_add_jobs_to_taskqueue(threadpool *pool, void * (*call_back_func)(void *arg), void *arg, int job_kind);
int destory_threadpool(threadpool *pool);  

void *thread_func(void *arg);
int threadpool_fetch_jobs_from_taskqueue(threadpool *pool, job **job_fetched);

The first three are external interfaces for callers: create a thread pool, add jobs to its job queue, and destroy it when the work is done. The last two are internal interfaces used only inside the thread pool. When the pool is created, N threads are started and each one executes thread_func(); inside this function, the thread repeatedly calls threadpool_fetch_jobs_from_taskqueue() to fetch a job and execute it.
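For intuition, the worker loop inside thread_func() can be pictured roughly as follows. This is a simplified sketch rather than the exact repository code; it assumes each created thread receives a pointer to the thread pool as its argument and that threadpool_fetch_jobs_from_taskqueue() returns non-zero once the job queue has been shut down:

void *thread_func(void *arg)
{
    threadpool *pool=(threadpool *)arg;
    while(1){
        job *one_job=NULL;
        /* blocks on task_queue_not_empty until a job is available or the queue dies */
        if(threadpool_fetch_jobs_from_taskqueue(pool, &one_job)!=0 || one_job==NULL){
            break;                                  /* queue is no longer alive, thread exits */
        }
        one_job->call_back_func(one_job->arg);      /* serve the client request */
        free(one_job);
    }
    return NULL;
}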

This is the thread pool logic. For details, please refer to my Github code.

2 Download Function

2.1 Client Download Function

In JDFS, the client-side download function is almost the same as in my earlier JWebFileTrans articles; the only real difference lies in the protocol. The earlier programs followed the standard HTTP download procedure: the program sends a GET request to the server, and the server parses the request according to the HTTP protocol and acts on the result; for example, if the request asks for bytes n through m of file A, the server sends that section of file A back to the client. In JDFS, both the client and the server are written by the author, so to simplify programming we do not use the HTTP protocol; instead we define our own set of rules and both sides follow them. In short, the rules are expressed as a data structure:

 

typedef struct http_request_buffer
{
    int request_kind;
    long num1;
    long num2;
    char file_name[100];
}http_request_buffer;

 

request_kind indicates the request type. 0 means a query for the file length: after receiving this request, the server writes the file size into num1 and calls send() to return it to the client. 1 means upload: the client uploads bytes num1 through num2 of the file file_name, and the server calls recv() to receive the data and writes it to the local disk. 2 means download: the client requests bytes num1 through num2 of the file file_name.
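As a concrete illustration of these rules, a client could assemble a download request for a byte range of apue-en.pdf roughly as follows. This is only a sketch (socket_fd, range_begin and range_end are placeholders, and the usual <string.h> and <sys/socket.h> headers are assumed), not the exact JDFS client code:

http_request_buffer hrb;
memset(&hrb, 0, sizeof(hrb));
hrb.request_kind=2;                                  /* 2 means download                */
hrb.num1=range_begin;                                /* first byte of the wanted range  */
hrb.num2=range_end;                                  /* last byte of the wanted range   */
strncpy(hrb.file_name, "apue-en.pdf", sizeof(hrb.file_name)-1);

if(send(socket_fd, &hrb, sizeof(hrb), 0)==-1){       /* ship the request header to the server */
    perror("send download request");
}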

That is the only difference between the client download logic and JWebFileTrans: the rules being followed have changed, but the business logic of the download function is basically the same. You can refer to the blog posts listed in the preface; it will not be repeated here.

2.2 Server Download Function

The following is the logic of the server's download function. It is a callback function: when the server receives a download request, it adds the request to the job queue, and a thread in the thread pool calls this callback to fulfil the client's request. The server uses epoll, which is covered in a later section.

void *Http_server_callback_download(void *arg){

    callback_arg_download *cb_arg_download=(callback_arg_download *)arg;
    char *file_name=cb_arg_download->file_name;
    int  client_socket_fd=cb_arg_download->socket_fd;
    long range_begin=cb_arg_download->range_begin;
    long range_end=cb_arg_download->range_end;
    unsigned char *server_buffer=cb_arg_download->server_buffer;

    FILE *fp=fopen(file_name, "r");
    if(fp==NULL){
        perror("Http_server_callback_download,fopen");
        close(client_socket_fd);
        return (void *)1;
    }

    /* fill the reply header with the byte range that is being sent back */
    http_request_buffer *hrb=(http_request_buffer *)server_buffer;
    hrb->num1=range_begin;
    hrb->num2=range_end;

    fseek(fp, range_begin, SEEK_SET);
    long http_request_buffer_len=sizeof(http_request_buffer);
    memcpy(server_buffer+http_request_buffer_len,"JDFS",4);

    /* read the requested section of the file into the buffer, right after the header and the "JDFS" tag */
    fread(server_buffer+http_request_buffer_len+4, range_end-range_begin+1, 1, fp);
    int send_num=0;
    int ret=send(client_socket_fd,server_buffer+send_num,http_request_buffer_len+4+range_end-range_begin+1-send_num,0);
    if(ret==-1){
        perror("Http_server_body,send");
        close(client_socket_fd);
    }

    /* the callback is called many times; close the file each time, otherwise
       the process soon hits the "too many open files" limit */
    if(fclose(fp)!=0){
        perror("Http_server_callback_download,fclose");
    }

    return (void *)0;
}

The above is the callback function of the server's download function. Because the function and variable names strictly describe what they do, the code is intuitive and easy to understand. Note the fclose(fp) at the end: this is important because the callback is invoked many times, and if you do not close the file at the end you will soon get an error saying that too many files are open.

3 Upload Function

3.1 Client Upload Function

The following is part of the logic of the client upload function:

for(int i=0; i<num_of_pieces; i++){     /* upload the file piece by piece */
    hrb->num1=offset;
    hrb->num2=offset+upload_one_piece_size-1;
    int ret=fread(upload_buffer+sizeof(http_request_buffer)+4, upload_one_piece_size, 1, fp);
    if(ret!=1){
        printf("JDFS_http_upload,fread failed\n");
        exit(0);
    }
    while(1){
        int ret=send(socket_fd, upload_buffer, upload_buffer_len+4, 0);
        if(ret==(upload_buffer_len+4)){
            break;
        }else{
            /* the piece was not sent completely: close the socket, reconnect and retry */
            int ret=close(socket_fd);
            if(ret==0){
                Http_connect_to_server(server_ip, server_port, &socket_fd);
                continue;
            }else{
                perror("JDFS_http_upload, close");
                exit(0);
            }
        }
    }
    offset+=upload_one_piece_size;      /* advance to the next piece */
}

3.2 Server Upload Function

The following is the code of the server upload function. The difference from the download function is that this code is not a thread pool callback; it runs in the server main program itself. Why not a callback? This is due to a flaw in the server design, and the reasons are discussed later.

else if(hrb->request_kind==1){
    printf("accept %s from client, range(byte): %ld---%ld\n",hrb->file_name,hrb->num1,hrb->num2);
    FILE *fp=NULL;
    char *file_name=hrb->file_name;
    if(hrb->num1==0){
        fp=fopen(file_name, "w+");        /* first piece: create/truncate the file */
    }else{
        fp=fopen(file_name,"r+");         /* later pieces: open the existing file for update */
    }
    if(fp==NULL){
        perror("Http_server_body,fopen");
        close(client_socket_fd);
    }else{
        long offset=hrb->num1;
        fseek(fp, offset, SEEK_SET);
        int ret=fwrite(server_buffer+sizeof(http_request_buffer)+4, hrb->num2-hrb->num1+1, 1, fp);
        if(ret!=1){
            close(client_socket_fd);
        }
        fclose(fp);
    }
}else if(hrb->request_kind==2){

3.3 Defects of the upload function

 

The current upload function of JDFS is flawed, and the flaw is mainly on the server side. When I was debugging the upload function it kept failing; as I gradually reduced the amount of data the client sends with each send() call, the upload started to succeed once the pieces were small enough. The reason is this: when the client sends a large amount of data in one send(), for example a header followed by the sentence “the computer major of Huazhong University of Science and Technology is one of the top ten in China”, the server's recv() call is not guaranteed to receive it all at once because of the network. The server might receive “header + the computer major of Huazhong University of Science and Technology” in one recv() and “is one of the top ten in China” in the next. This causes errors: on every recv() the server must check whether this piece of data has really been received completely. If it does not, it may treat “is one of the top ten in China” as a complete message sent by another send() call from the client and parse its first few bytes as a header, when in fact that text is only the tail of the data from the client's previous send().
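A common remedy for this problem, planned for a later post rather than implemented in the current JDFS, is to keep calling recv() in a loop until exactly the number of bytes announced in the header has arrived. A minimal sketch of such a helper, with hypothetical names and assuming <sys/socket.h>, might look like this:

/* Keep receiving until exactly expected_len bytes have arrived,
   or the peer closes the connection (returns 0) or an error occurs (returns -1). */
ssize_t recv_exactly(int socket_fd, unsigned char *buffer, size_t expected_len)
{
    size_t received=0;
    while(received<expected_len){
        ssize_t ret=recv(socket_fd, buffer+received, expected_len-received, 0);
        if(ret<=0){
            return ret;
        }
        received+=ret;
    }
    return (ssize_t)received;
}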

In addition, the server design prevents uploads from being processed in parallel using thread pools, as described in the next section.

4 Server framework based on Epoll

As mentioned earlier, the most intuitive way to design a server is to use a while loop that continuously checks whether a client has connected, which is inefficient; Linux provides epoll to support large-scale server concurrency. What is epoll? There are plenty of detailed articles on the web. For our purposes it is enough to understand this: with epoll, the server is only woken up to do work when there is a data request; the rest of the time it sleeps and does not occupy system resources. Of course, epoll is much more than that; what is described here is only the aspect used in JDFS, and the author also learned epoll as it was being used.

Epoll has three main function interfaces. epoll_create() creates an epoll instance and returns the corresponding epoll descriptor. epoll_ctl() registers the events of interest, for example readable data on the listening socket descriptor. epoll_wait() blocks until one of the registered events occurs on a file descriptor of interest (for example data becomes readable on the listening socket); it returns the number of such events, and you can iterate over that many entries of its second parameter to obtain each specific request (a new client connection, data arriving from a client, and so on). For the detailed usage of these three functions, please consult the relevant documentation yourself.

The epoll framework on the JDFS server is abbreviated as follows:

 

int epoll_fd=epoll_create(20);
if(epoll_fd==-1){
    perror("Http_server_body,epoll_create");
    exit(0);
}

int ret=epoll_ctl(epoll_fd,EPOLL_CTL_ADD,*server_listen_fd,&event_for_epoll_ctl);
if(ret==-1){
    perror("Http_server_body,epoll_ctl");
    exit(0);
}

int num_of_events_to_happen=0;
while(1){
    num_of_events_to_happen=epoll_wait(epoll_fd,event_for_epoll_wait,event_for_epoll_wait_num,-1);
    if(num_of_events_to_happen==-1){
        perror("Http_server_body,epoll_wait");
        exit(0);
    }

    for(int i=0; i<num_of_events_to_happen; i++){
        /* ... accept new connections, or recv() the data and parse the header hrb ... */
        if(hrb->request_kind==0){
        }else if(hrb->request_kind==1){
        }else if(hrb->request_kind==2){
        }else{
        }
    }
}

 

As in the code above, after registering the events of interest with epoll, a while loop repeatedly checks for new client requests. epoll_wait() returns the number of pending events, and a for loop then iterates over each one. The outermost if-else inside the for loop determines whether the event is a new client connection: if so, accept() is called, and the socket fd of the connected client is added to epoll with an interest in data input on it. In the else branch, data is ready to be read, so recv() is called to receive it and the header is parsed according to hrb->request_kind. If hrb->request_kind is 0 or 2, the request is added to the thread pool. If hrb->request_kind is 1, the client is uploading data, and the request cannot be added to the thread pool for parallel processing, because the recv() here has already consumed the client's upload data in the server main program. It is this aspect of the server framework design, mentioned above, that prevents uploaded data from being processed in parallel.
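To make the hand-off from epoll to the thread pool concrete, dispatching a parsed download request might look roughly like the following sketch. The heap-allocated argument block, the field layout of callback_arg_download and the thread_pool variable are assumptions for illustration only; the actual code is in the repository:

else if(hrb->request_kind==2){
    /* wrap the parsed request into an argument block for the download callback */
    callback_arg_download *cb_arg=(callback_arg_download *)malloc(sizeof(callback_arg_download));
    cb_arg->socket_fd=client_socket_fd;
    cb_arg->range_begin=hrb->num1;
    cb_arg->range_end=hrb->num2;
    cb_arg->server_buffer=server_buffer;
    strncpy(cb_arg->file_name, hrb->file_name, sizeof(cb_arg->file_name)-1);

    /* the epoll loop returns to listening immediately; a pool thread runs the callback */
    threadpool_add_jobs_to_taskqueue(&thread_pool, Http_server_callback_download, (void *)cb_arg, 2);
}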

In the next blog post, we will improve JDFS, both to make the upload part parallel and to modify the server upload logic so that the server receives upload data correctly regardless of how much data a single client send() carries.

 

Five Conclusion

This is the end of this article. To sum up, this article implements an upload and download utility based on sockets. JDFS will continue to be refined, with the hope of eventually developing into a distributed file management utility.

Contact: https://github.com/junhuster/