Preface

Data export is a near-universal requirement: most management platforms and reporting systems need it.

Many systems restrict it: users can export only a few thousand or tens of thousands of rows from the system itself. Anything larger requires a formal request that goes through layer upon layer of approval before the DB team handles it.

In practice, whether a request is needed depends largely on the company's rules and regulations. In most companies these are not particularly rigorous, so the export is done in the system and anyone with permission can export all the data.

To be honest, Huang has never understood why some people always want to export hundreds of thousands or even millions of rows just to look through them and filter, filter, filter…

Still, where there is a need, it eventually has to be met. CSV files of several hundred MB, such as the following, are very common.

Common problems

Exporting large files commonly runs into two problems: bandwidth and memory.

Bandwidth

If the file to be downloaded is stored on our own server, the download consumes our outbound bandwidth.

When bandwidth is limited, it fills up quickly.

The best way to solve the bandwidth problem is to keep downloads off the business system's bandwidth altogether, which means introducing third-party cloud storage such as Alibaba Cloud OSS or Tencent Cloud COS. The download link then points to the cloud storage, which is isolated from the business system.

Memory

When generating a file, loading all the data into memory at once without regard for memory usage can easily consume a large amount of server memory.

If no limit is set on how much memory a site can use, it can easily affect other sites on the same server.

If a limit is set, exceeding it causes the site to restart, which disrupts normal access.

To avoid memory problems, do not load all the data into memory at once; process it in batches instead.
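As a rough illustration of batch processing, the sketch below writes a CSV page by page so that only one batch of rows is in memory at a time. The names `iter_batches`, `export_csv`, and `fake_fetch_page` are made up for this example, and the in-memory `FAKE_TABLE` merely stands in for a paged database query.

```python
import csv
import io
from typing import Callable, Iterator, List, Sequence

Row = Sequence[object]


def iter_batches(fetch_page: Callable[[int, int], List[Row]],
                 batch_size: int) -> Iterator[List[Row]]:
    """Yield rows page by page so only one batch is held in memory at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        offset += len(page)


def export_csv(out, header: Row, fetch_page, batch_size: int = 1000) -> int:
    """Write the header, then append each batch to `out`; return the row count."""
    writer = csv.writer(out)
    writer.writerow(header)
    total = 0
    for page in iter_batches(fetch_page, batch_size):
        writer.writerows(page)
        total += len(page)
    return total


# Hypothetical data source standing in for a paged database query.
FAKE_TABLE = [(i, f"user-{i}") for i in range(2500)]


def fake_fetch_page(offset: int, limit: int) -> List[Row]:
    return FAKE_TABLE[offset:offset + limit]


if __name__ == "__main__":
    buffer = io.StringIO()  # a real export would write to a file on disk
    print(export_csv(buffer, ("id", "name"), fake_fetch_page), "rows written")
```

Offset-based paging is used here only because it is short; on very large tables, paging by the last seen primary key is usually cheaper.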

Let’s look at a specific data export scheme.

The specific plan

This solution involves five roles: the user, the backend system, the middleware, the export system, and cloud storage.

The general picture is as follows:

Here Huang breaks it down roughly into 10 steps.

1. Submit an export application

When users want to export something, they submit an application in the backend system.

2. Generate export batches

After receiving the user's application, the backend system generates a batch number for it and records what is to be exported along with the query conditions.

The content can be stored as the name of the query method, and the query conditions as a JSON string of the method's parameters.

This allows the export step to invoke the query through reflection.

Of course, basic information such as the time, the applicant, and the status is recorded as well.

Storing this information in the database is enough.
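One possible shape for such a batch record, together with the reflection lookup, is sketched below. The `ExportBatch` fields, the `OrderQueries` service, and its `orders_by_month` method are all invented for illustration; the real table layout and query methods will differ.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ExportBatch:
    """One export application; all field names are illustrative."""
    method: str        # name of the query method to call later
    params_json: str   # method parameters serialized as JSON
    created_by: str
    batch_no: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class OrderQueries:
    """Hypothetical query service whose methods are looked up by name."""

    def orders_by_month(self, year: int, month: int) -> List[dict]:
        return [{"order_id": 1, "period": f"{year}-{month:02d}"}]


def run_query(service: object, batch: ExportBatch) -> List[dict]:
    """Resolve the stored method name via reflection and call it."""
    if batch.method.startswith("_"):
        raise ValueError(f"refusing to call private method {batch.method!r}")
    method = getattr(service, batch.method, None)
    if not callable(method):
        raise ValueError(f"unknown export method {batch.method!r}")
    return method(**json.loads(batch.params_json))


batch = ExportBatch(
    method="orders_by_month",
    params_json=json.dumps({"year": 2024, "month": 3}),
    created_by="alice",
)

if __name__ == "__main__":
    print(batch.batch_no, run_query(OrderQueries(), batch))
```

Because the method name comes from stored data, the sketch rejects private and unknown names before calling anything; an explicit allow-list of exportable methods would be a stricter alternative.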

3. Send export batches to the middleware

This step involves choosing the middleware; an MQ or Redis is generally recommended.

The simplest message is just the batch number, although it is also fine to send the rest of the batch information along with it.

4. The application is submitted successfully

Once the batch information has been sent to the middleware successfully, the system can be considered to have accepted the application, and the user can be told that the submission succeeded.
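Steps 2 to 4 can be sketched together as follows. This is only an outline: a dict stands in for the batch table and Python's `queue.Queue` stands in for the MQ or Redis, and `submit_export` and the response fields are hypothetical names.

```python
import queue
import uuid
from typing import Dict

# In-memory stand-ins: a dict for the batch table, a Queue for MQ/Redis.
batch_table: Dict[str, dict] = {}
export_queue: "queue.Queue[str]" = queue.Queue()


def submit_export(user: str, method: str, params: dict) -> dict:
    """Record the batch, publish its number, and only then report success."""
    batch_no = uuid.uuid4().hex
    batch_table[batch_no] = {
        "user": user,
        "method": method,
        "params": params,
        "status": "pending",
    }
    try:
        # The message is just the batch number. This unbounded stand-in
        # never raises queue.Full; a real client has its own failure modes.
        export_queue.put_nowait(batch_no)
    except queue.Full:
        batch_table[batch_no]["status"] = "failed"
        return {"ok": False, "message": "export queue is busy, please retry"}
    batch_table[batch_no]["status"] = "queued"
    return {"ok": True, "batch_no": batch_no, "message": "application submitted"}


if __name__ == "__main__":
    print(submit_export("alice", "orders_by_month", {"year": 2024, "month": 3}))
```

The user is told the application succeeded only after the publish call returns, which matches the ordering described above.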

5. Read the export batch information

The export system has many design points to consider. It is advisable to deploy it on an independent server so that it cannot have a knock-on impact on the application servers.

The export system listens for batch information in the middleware and starts working when a batch arrives. It can work in one of two modes:

In the first, the export system is a central hub responsible only for scheduling: it assigns specific worker nodes to carry out the subsequent operations, for example by creating a K8S task.

In the second, the export system is itself the worker node and carries out the subsequent steps directly.

If export tasks are not very frequent, export system === worker node is fine; do not over-design.
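A minimal consumer loop for the export system might look like the sketch below, again using `queue.Queue` as a stand-in for the middleware. `run_worker` and the `None` stop sentinel are assumptions made for this example only.

```python
import queue
from typing import Callable


def run_worker(jobs: "queue.Queue", handle: Callable[[str], None]) -> int:
    """Consume batch numbers until a None sentinel arrives.

    Returns the number of batches handled successfully.
    """
    handled = 0
    while True:
        batch_no = jobs.get()      # blocks until a message is available
        if batch_no is None:       # sentinel used here to stop the loop
            return handled
        try:
            handle(batch_no)
            handled += 1
        except Exception as exc:   # one bad batch must not kill the worker
            print(f"batch {batch_no} failed: {exc}")


if __name__ == "__main__":
    jobs: "queue.Queue" = queue.Queue()
    for item in ["b1", "b2", None]:
        jobs.put(item)
    print(run_worker(jobs, lambda batch_no: print("exporting", batch_no)))
```

The two modes differ only in what `handle` does: as a worker node it runs the export itself, and as a scheduling hub it would hand the batch number to another node instead.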

6. Query, generate, and encrypt files

This step is the actual export operation.

The batch information tells you what the user wants to export. Run the corresponding query, then write the results to a file in the required format.

Because all files are uploaded to cloud storage, they should be password-protected for security, so that not just anyone who downloads a file can open it.

The output may not be a single file; it may be split by day or by month, so it is best to put the files in a compressed archive.

This way, what ends up in cloud storage is a password-protected archive.
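The splitting and packaging part of this step could be sketched as follows. The column names and the `orders-YYYY-MM.csv` file naming are invented for the example. Note that Python's standard `zipfile` module can read encrypted archives but cannot create them, so the password step is deliberately left out here and would need a third-party library or an external archiver.

```python
import csv
import io
import zipfile
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

OrderRow = Tuple[date, str, float]


def build_archive(rows: List[OrderRow]) -> bytes:
    """Group rows by month, write one CSV per month, return the ZIP as bytes."""
    by_month: Dict[str, List[OrderRow]] = defaultdict(list)
    for row in rows:
        by_month[row[0].strftime("%Y-%m")].append(row)

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for month in sorted(by_month):
            text = io.StringIO()
            writer = csv.writer(text)
            writer.writerow(["day", "customer", "amount"])
            writer.writerows(
                (day.isoformat(), customer, amount)
                for day, customer, amount in by_month[month]
            )
            zf.writestr(f"orders-{month}.csv", text.getvalue())
    # No password is applied here: the stdlib cannot write encrypted ZIPs.
    return archive.getvalue()


if __name__ == "__main__":
    sample = [(date(2024, 1, 5), "a", 1.5), (date(2024, 2, 1), "b", 2.0)]
    print(len(build_archive(sample)), "bytes")
```

For exports too large to hold in memory, the same grouping would be written to temporary files on disk rather than to `io.BytesIO`.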

7. Upload the file to the cloud storage system

Once the files are generated, they need to be uploaded to cloud storage. If conditions permit, upload through the intranet address; otherwise the bandwidth of the NAT gateway or server is easily saturated.

8. Assemble the download address

After the upload succeeds, assemble the file's download address from the relevant parameters.

The download address also needs a validity period: it carries its own expiration time, after which it can no longer be accessed. Cloud storage systems provide methods for this, so the step is relatively easy.
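To show the idea behind an expiring address, here is a generic HMAC-signed URL sketch. It is not the OSS or COS signing scheme; in practice the provider's SDK generates the signed URL. The host, secret key, and query parameter names below are all made up.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"                 # hypothetical signing key
BASE = "https://storage.example.com"    # hypothetical storage host


def _signature(object_key: str, expires: int) -> str:
    payload = f"{object_key}\n{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(object_key: str, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Return a URL that embeds its expiry time and a signature over it."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires,
                       "signature": _signature(object_key, expires)})
    return f"{BASE}/{object_key}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Check the signature and that the expiry time has not passed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parts.path.lstrip("/"), expires)
    current = time.time() if now is None else now
    return hmac.compare_digest(signature, expected) and current < expires


if __name__ == "__main__":
    print(sign_url("exports/b1.zip", ttl_seconds=3 * 24 * 3600))
```

Because the expiry time is part of the signed payload, changing it in the URL invalidates the signature.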

Generally speaking, a file is kept for about 3 to 7 days, or even a month. Some deep-pocketed companies may even keep files permanently.

There are two main considerations when setting an expiration date:

On the business side, the same content will inevitably be exported more than once within a short period. Within the validity period the existing file can be reused, which avoids generating it again.

On the resource side, cloud storage is not particularly expensive, but it is still worth saving money by regularly cleaning up historical files that are no longer needed.
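One way to recognise a repeated export is to derive a key from the method name and its parameters and look it up before generating anything. The sketch below assumes a simple in-memory index; `export_key`, `find_reusable`, and the index layout are hypothetical.

```python
import hashlib
import json
from typing import Dict, Optional


def export_key(method: str, params: dict) -> str:
    """Build a stable key: same method and params always hash the same."""
    canonical = json.dumps(
        {"method": method, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def find_reusable(index: Dict[str, dict], key: str,
                  now: float) -> Optional[str]:
    """Return a still-valid download URL for this key, or None."""
    entry = index.get(key)
    if entry is None:
        return None
    if entry["expires_at"] <= now:
        del index[key]   # expired: forget it so a fresh file is generated
        return None
    return entry["url"]


if __name__ == "__main__":
    key = export_key("orders_by_month", {"year": 2024, "month": 3})
    index = {key: {"url": "https://storage.example.com/exports/b1.zip",
                   "expires_at": 2000.0}}
    print(find_reusable(index, key, now=1000.0))
```

Reuse like this is only appropriate when slightly stale data is acceptable, since the underlying rows may have changed after the file was generated.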

9. Backfill information

This step writes the download address, completion time, batch status, and other information back to the batch record.

10. Query the export application and download the file

At this point the user receives an in-site message saying that the download is ready. Clicking download redirects the user to the cloud storage address, and all that remains is to wait for the download to complete.

Final words

Data export may seem a relatively minor feature, but delivering a good experience still takes some thought.

The scheme described here is in actual use, but only its broad outline is given. Some details are not expanded on, such as how to handle the same user repeatedly exporting the same content.