• Preface

Alibaba Cloud Object Storage Service (OSS) has a large user base, and many of those users must back up the data they keep in OSS, online or offline, for business or compliance reasons. They can write their own backup tooling against the OSS open APIs, or rely on an existing OSS feature such as cross-region replication. The first approach is hard to use and inefficient. The second keeps two or more live copies of the data, so a fault in the original data can be carried over to the copies as well. The OSS backup solution introduced in this article, built on cloud storage gateway and hybrid cloud backup, backs up multiple versions of OSS data according to a policy, and is also simple to configure, fast, efficient, and low in cost.

  • OSS backup based on cloud storage gateway and hybrid cloud backup

The following architecture diagram shows the OSS backup solution built on hybrid cloud products. First, a cloud storage gateway reverse-synchronizes the files in the OSS bucket to the gateway cache. Next, the gateway's shared directory is mounted on an ECS instance, and the hybrid cloud backup agent is installed on that instance. Finally, hybrid cloud backup policies and tasks are configured to back up multiple versions of the files on the storage gateway to the backup library in the cloud.



OSS backup architecture diagram based on hybrid cloud products

– Implementation and configuration

First, log in to the Alibaba Cloud console and activate the cloud storage gateway service. Note that, for backup performance and efficiency, the cloud storage gateway and the hybrid cloud backup service should be in the same region as the source OSS bucket.

In this section, we create a performance-type cloud storage gateway to guarantee bandwidth and performance both when OSS data is synchronized to the gateway and when it is later backed up from the gateway to the backup library. Users can, of course, choose a different gateway type based on the total size and growth rate of the data in OSS, the number of files, and the size of individual files.






Create a cloud storage gateway

After the gateway is created, open its management page and complete three simple configuration steps: cache configuration, cloud resource configuration, and mount directory configuration.



Cloud storage gateway management page

The cache setting assigns a cache disk to the gateway so that caching takes effect. Click Create and select an available cache disk to complete the step.



Cache Settings

The cloud resource setting connects the OSS bucket to the cloud storage gateway. You only need to select the name of the bucket to synchronize. The gateway supports SSL-encrypted connections to OSS to secure data in transit.



Cloud Resource Settings

The directory setting provides a mount directory to clients. The cloud storage gateway supports two common NAS protocols, NFS and CIFS; here we configure an NFS mount directory. One item deserves attention: reverse synchronization. Because the files in the OSS bucket must be synchronized to the cloud storage gateway, select Yes for reverse synchronization. Leave ‘mode’ at its default, ‘cache mode’, in which the gateway keeps hot data in full but only the metadata of cold data. In synchronous mode, by contrast, the cache disk must be as large as the total size of the files in OSS.

The advanced settings also contain a ‘reverse synchronization interval’, which tells the gateway how often to pull the latest files from OSS and synchronize them to the gateway. For this solution the interval does not need to be very short, because hybrid cloud backup schedules are defined in hours or days. Setting the gateway to reverse-synchronize OSS data once an hour is therefore a reasonable choice.
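The reasoning about the synchronization interval can be made concrete with a small calculation. The sketch below uses a deliberately simplified model, which is an assumption and not a description of how the gateway or the backup service actually schedule work: a new object waits at most one full synchronization interval to reach the gateway, then at most one full backup period for the next scheduled backup, and transfer times are ignored. The function name is hypothetical.

```python
from datetime import timedelta


def worst_case_delay(sync_interval: timedelta, backup_period: timedelta) -> timedelta:
    """Upper bound on the time between an object landing in OSS and its first backup.

    Simplified model (an assumption): the object waits at most one reverse
    synchronization interval, then at most one backup period. Transfer and
    scan times are ignored.
    """
    return sync_interval + backup_period


daily_backup = timedelta(days=1)
for interval in (timedelta(minutes=1), timedelta(hours=1)):
    # Compare a very short interval with an hourly one under a daily backup.
    print(interval, "->", worst_case_delay(interval, daily_backup))
```

Under this model, shortening the interval from one hour to one minute only moves the worst case from about 25 hours to about 24 hours when backups run daily, which is why an hourly interval is usually sufficient.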



NFS Directory Configuration

At this point, the cloud storage gateway is fully configured. Next, create an ECS instance to mount the NFS directory that the gateway provides; choose one with high intranet bandwidth to match the bandwidth of the high-performance gateway. You can then set up hybrid cloud backup. After activating the service and selecting a region, only three simple steps remain: create a backup library, download and install the backup agent, and configure backup policies and tasks.
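The mount step mentioned above can be sketched as follows. This is a minimal illustration that assumes a standard Linux NFS client on the ECS instance; the gateway IP address, export path, and mount point are hypothetical placeholders, and the helper names are not part of any Alibaba Cloud tool. The first helper only builds the command (running it requires root privileges), and the second checks text in `/proc/mounts` format to confirm that the directory is really an NFS mount before a backup is started.

```python
import shlex


def nfs_mount_command(gateway_ip: str, export_path: str, mount_point: str) -> list:
    """Build a standard Linux NFS mount command as an argument list."""
    return ["mount", "-t", "nfs", f"{gateway_ip}:{export_path}", mount_point]


def is_nfs_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mounts_text (in /proc/mounts format) lists mount_point as NFS."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[1] == mount_point and fields[2].startswith("nfs"):
            return True
    return False


# Hypothetical values, for illustration only.
cmd = nfs_mount_command("192.168.0.10", "/oss-share", "/mnt/oss-backup")
print(shlex.join(cmd))
```

On the instance itself, the check would typically be fed the contents of `/proc/mounts`, for example `is_nfs_mounted("/mnt/oss-backup", open("/proc/mounts").read())`.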

Creating a backup library is straightforward: name the library, choose the backup type, and download the client and certificate.



Creating a Backup Library



Selecting a Backup Type



Adding a Client



Download the client and certificate

Upload the downloaded client software to the Linux ECS instance you created, decompress it, and install it.





Upload and install backup agents

After the installation is complete, open http://<ECS public IP address>:8011 in a browser to display the backup agent registration page. Enter the certificate downloaded earlier (the key used to register the backup source and connect it to the backup library), your AccessKey (AK) credentials, and a client login password. Because the entire backup link stays inside the Alibaba Cloud network, set the network type to Private network (VPC).



Backup client registration page

After successful registration, the client backup page appears. This page is the entry point for creating backups and restoring data. Before backing up, remember to mount the NFS directory of the cloud storage gateway on this ECS instance. The page offers two kinds of backup: an instant backup runs only once, while a scheduled backup runs periodically according to a policy. A scheduled backup requires a backup policy, so create the policy first. Here we define a policy that starts at 5:30 p.m. every day and keeps backup data in the backup library for 2 years.
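To make the policy concrete, the sketch below computes when such a policy would next fire and when the resulting backup would expire. It only illustrates the schedule arithmetic and is not how the backup service is implemented; the function names are hypothetical, "2 years" is approximated as 730 days, and the sample date is arbitrary.

```python
from datetime import datetime, time, timedelta

BACKUP_TIME = time(17, 30)           # daily start time from the policy above
RETENTION = timedelta(days=2 * 365)  # "2 years", approximated as 730 days (assumption)


def next_run(now: datetime) -> datetime:
    """Return the next daily 17:30 trigger strictly after `now`."""
    candidate = datetime.combine(now.date(), BACKUP_TIME)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def expires_at(backup_started: datetime) -> datetime:
    """Return the moment a backup started at `backup_started` leaves retention."""
    return backup_started + RETENTION


now = datetime(2024, 3, 1, 18, 0)  # arbitrary sample moment, after today's slot
run = next_run(now)
print("next run:", run, "expires:", expires_at(run))
```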



Creating a Backup Policy

The next step is to create the backup. Enter the directory mounted on the ECS instance as the source address, select Scheduled Backup and the backup policy created earlier, and submit.



Creating a Scheduled Backup

After you submit, the backup job appears. Its details show the status of the last backup and the time of the next one.



Scheduled Backup Details

When the scheduled time arrives, the backup is triggered automatically. You can see the total number of files and the amount of data scanned by the backup job, and the backup speed is displayed in real time.



Running a scheduled backup task

Here are some statistics from the backup process. Note that the progress percentage can fluctuate and is sometimes lower than a value shown earlier. This is because progress is calculated as the ratio of data already backed up to the total data found so far, while the cloud storage gateway keeps reverse-synchronizing files from OSS. During file scanning, the amount backed up and the amount scanned therefore take turns pulling ahead, so the progress does not increase steadily. Progress can even reach 100% and then drop back to just over 90% when the gateway synchronizes a batch of new files that the backup service has yet to process.
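The effect is easy to reproduce with a few numbers. The sketch below assumes progress is simply the number of files backed up divided by the number of files discovered so far; the snapshot values are hypothetical and chosen only to show the pattern.

```python
def progress_percent(backed_up: int, discovered: int) -> float:
    """Progress as the share of discovered files that are already backed up."""
    if discovered == 0:
        return 0.0
    return 100.0 * backed_up / discovered


# Hypothetical snapshots over time: (files backed up, files discovered so far).
snapshots = [
    (500, 1000),    # halfway through the files known so far
    (1000, 1000),   # everything known so far is backed up
    (1000, 1100),   # the gateway has just synchronized 100 new files
    (1080, 1100),   # the backup catches up again
]
history = [round(progress_percent(done, found), 1) for done, found in snapshots]
print(history)
```

The third snapshot shows the drop: nothing was lost, but the denominator grew when new files arrived on the gateway.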



Scheduled backup job after scanning



Number of files and storage space statistics on OSS

During the backup, the monitoring page of the cloud storage gateway shows cache and network throughput. As the figure below shows, from 5 p.m. onward the gateway continuously reads files from OSS, which are then read by the backup ECS instance.



Cloud storage gateway performance diagram

After the backup is complete, the total number of files backed up matches the number of files shown on the OSS console above. The figure users care about most is backup performance: 44.88 MB/s. The average file size is only about 6.8 MB, which is fairly small, and many directories are deeply nested, so this is a strong result. Because reverse synchronization writes to the cache disk while the backup service reads from it, the 200 GB cache disk is also close to its throughput limit. If you need to back up a large amount of data made up of larger files, you can increase the cache disk capacity to improve backup performance; in that scenario, backup throughput of hundreds of MB/s is achievable.
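As a rough worked example, the two measured figures translate into a file rate and a duration estimate. The sketch below assumes constant throughput and 1 GB = 1024 MB; the 1 TB data set in the usage line is hypothetical and is not a figure from this test.

```python
THROUGHPUT_MB_S = 44.88  # measured backup throughput from the run above
AVG_FILE_MB = 6.8        # approximate average file size from the run above

# Files handled per second at the measured throughput.
files_per_second = THROUGHPUT_MB_S / AVG_FILE_MB


def hours_to_back_up(total_gb: float, throughput_mb_s: float = THROUGHPUT_MB_S) -> float:
    """Estimated duration in hours, assuming constant throughput and 1 GB = 1024 MB."""
    return total_gb * 1024 / throughput_mb_s / 3600


print(f"{files_per_second:.1f} files/s")
print(f"{hours_to_back_up(1024):.1f} h for a hypothetical 1 TB data set")
```

At these figures the job handles between six and seven files per second, so per-file overhead such as directory traversal matters as much as raw bandwidth.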



Backup complete

You can also open the hybrid cloud backup page of the console to view the backup library information. It shows that one backup was made and that it succeeded. The key figures are the original data size and the actual storage usage on the right. The actual usage after backup is 1.31 TB, smaller than the original data, because hybrid cloud backup deduplicates and compresses data as it is written to the backup library.



Backup Library Information

  • Conclusion

The OSS backup solution that combines the cloud storage gateway with the hybrid cloud backup service is a cloud-native solution that can be set up entirely in the Alibaba Cloud console. It meets OSS users' requirements for backup performance and multiple versions, and it is also cost-effective. The cloud storage gateway is currently in open beta and is free of charge. The hybrid cloud backup service charges based on the number of backup clients and the capacity of the uncompressed backup library. You can pay on demand or purchase resource packages, which keeps billing flexible.

Users also care about the consistency and security of backup data. Along the backup link, consistency is verified when the cloud storage gateway reverse-synchronizes OSS data to the cache disk, and again when the backup service reads data from the cache disk and writes it to the backup library. The entire backup link is encrypted, and the three-copy technology of the backup library keeps user data safe once it is written.

Finally, when data must be restored, the hybrid cloud backup service can restore individual files to a specified directory. Combined with the cross-VPC access feature of the cloud storage gateway, this makes data recovery and distribution flexible.
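Users who want an independent spot check after a restore can compare a restored file with the copy on the gateway mount. The sketch below shows one common way to do this, comparing SHA-256 digests. It is an assumption for illustration and says nothing about how the gateway or the backup service verify consistency internally; the helper names are hypothetical, and the demo uses temporary files so that it runs anywhere.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Return True if both files have identical content (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)


# Self-contained demo: two temporary files stand in for the original and restored copy.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    restored = os.path.join(tmp, "restored.bin")
    for path in (original, restored):
        with open(path, "wb") as f:
            f.write(b"example payload")
    demo_match = same_content(original, restored)

print(demo_match)
```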