I. Introduction to CHDFS

CHDFS (Cloud HDFS) is Tencent Cloud's high-performance distributed file system. It provides the standard HDFS access protocol and a hierarchical namespace. It is designed for storing and analyzing massive datasets in big data scenarios, and it enables the separation of compute and storage.

CHDFS itself focuses on distributed metadata services and relies on the object storage service COS for data storage. As a cloud storage service, COS gives CHDFS a solid data foundation: it supports massive storage capacity and very high bandwidth, and its multi-AZ mode uses erasure coding (EC) by default, which lowers cost. Intelligent tiering of hot and cold data reduces storage cost further.

II. Lifecycle management

Over time, data accumulates on CHDFS, but users have recently accessed only a small part of it. Most of it is historical data, such as log files and data backups, which is accessed less and less often.

If users do not manage this data, storage costs keep growing, which weighs on their business. Managing it by hand, however, takes staff and time. CHDFS therefore builds on the data tiering capability of COS and introduces a lifecycle feature that makes hot and cold data easier to manage.

By configuring CHDFS lifecycle rules, users can have files periodically transitioned from standard storage to archive storage, or deleted outright. The lifecycle feature carries out transition and deletion automatically, in a timely and accurate manner, at no additional cost. It also supports a restore operation, which re-enables access to a file that has been transitioned to archive storage.

III. Lifecycle rules

A lifecycle rule, also called a lifecycle policy, requires the user to specify the following parameters:

  • Path: The target path of the lifecycle rule.

    • Directory: The rule applies to all files in the directory, including files in subdirectories at any depth.
    • File: The rule applies only to the specified file.
  • Type: The type of lifecycle rule.

    • Transition: Periodically move files from standard storage to archive storage to save cost.
    • Delete: Periodically delete files.
  • Days: The number of days after a file was last accessed at which the rule triggers its action.

Note: Unlike COS objects, CHDFS files follow file system semantics. Besides mtime (the last modification of the file) and ctime (the last modification of its metadata), CHDFS supports atime (the last access to the file) as the tiering condition. This matches how users actually use their data more closely.
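The way a rule's path, type, and day count combine with atime can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how CHDFS is implemented: the `Rule` class, the function names, and the choice to apply the matching rule with the largest satisfied threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Rule:
    path: str    # directory or file path the rule targets
    action: str  # "transition" (to archive storage) or "delete"
    days: int    # days since last access before the action triggers


def rule_matches(rule: Rule, file_path: str) -> bool:
    """A rule covers the exact path, or any file beneath it if it is a directory."""
    prefix = rule.path.rstrip("/")
    return file_path == prefix or file_path.startswith(prefix + "/")


def due_action(rules: List[Rule], file_path: str,
               atime: datetime, now: datetime) -> Optional[str]:
    """Return the action due for a file, judged by its last access time."""
    idle = now - atime
    due = [r for r in rules
           if rule_matches(r, file_path) and idle >= timedelta(days=r.days)]
    if not due:
        return None
    # Assumption: when several rules are due, the one with the largest
    # threshold wins (a file idle long enough to delete is deleted).
    return max(due, key=lambda r: r.days).action
```

With a 90-day transition rule and a 180-day delete rule on `/logs`, a file under `/logs` that was last read 100 days ago would be due for transition, and one last read 200 days ago would be due for deletion.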

IV. Restore tasks

A restore task re-enables access to a file that has been transitioned to archive storage. It creates a copy of the file in standard storage for users to read, and the copy is deleted automatically when it expires. The archived file itself remains in place throughout.

  • FilePath: The path of the file to restore.
  • Type: The restore type. Restores fall into three modes according to how long they take.

    • Expedited mode: The restore task completes within 1-5 minutes.
    • Standard mode: The restore task completes within 3-5 hours.
    • Bulk mode: The restore task completes within 5-12 hours.
  • Days: The number of days the copy in standard storage is retained after the restore completes.
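To make the timing concrete, the following Python sketch estimates when a restored copy becomes readable and when it expires. The completion windows come from the list above; the mode keys and the function name are illustrative labels, not API values.

```python
from datetime import datetime, timedelta

# Completion windows (earliest, latest) for the three modes listed above.
# The dictionary keys are illustrative labels, not API values.
RESTORE_WINDOWS = {
    "fastest": (timedelta(minutes=1), timedelta(minutes=5)),   # 1-5 minutes
    "standard": (timedelta(hours=3), timedelta(hours=5)),      # 3-5 hours
    "slowest": (timedelta(hours=5), timedelta(hours=12)),      # 5-12 hours
}


def restore_timeline(submitted: datetime, mode: str, days: int):
    """Return (earliest ready time, latest ready time, latest copy expiry)."""
    earliest, latest = RESTORE_WINDOWS[mode]
    ready_from = submitted + earliest
    ready_by = submitted + latest
    # The retention period is counted from the moment the restore completes.
    copy_expires_by = ready_by + timedelta(days=days)
    return ready_from, ready_by, copy_expires_by
```

For a standard-mode task submitted at midnight with a retention of 7 days, the copy would be readable between 03:00 and 05:00 the same day and would expire no later than 05:00 seven days after that.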

Note: CHDFS relies on the standard interfaces provided by COS for transition, deletion, and restore, so its lifecycle feature is used in much the same way as that of COS.

V. Usage

Lifecycle rules can be configured through the console or the cloud API. Restore tasks can be created only through the cloud API.

1. Console

In the CHDFS console, select the file system, open the lifecycle configuration page, and add rules to complete the lifecycle configuration, as shown in the figure below:

Note: If a rule specifies both transition and deletion, the target file is first transitioned and then deleted, so the deletion time must be later than the transition time.
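The ordering constraint in this note is easy to check before a rule is submitted. The helper below is a hypothetical client-side check, not part of any CHDFS SDK; it assumes "later" means strictly greater, as the note states.

```python
from typing import List, Optional


def validate_rule(transition_days: Optional[int],
                  delete_days: Optional[int]) -> List[str]:
    """Return a list of problems; an empty list means the pair is acceptable.

    Pass None for an action the rule does not use.
    """
    problems = []
    if transition_days is not None and delete_days is not None:
        # Files are transitioned first and deleted afterwards, so the
        # deletion threshold must be strictly greater.
        if delete_days <= transition_days:
            problems.append(
                f"delete after {delete_days} days must be later than "
                f"transition after {transition_days} days"
            )
    return problems
```

For example, `validate_rule(90, 180)` returns an empty list, while `validate_rule(180, 90)` reports the conflict.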

2. Cloud API

Example of creating a lifecycle rule through the cloud API:

https://chdfs.tencentcloudapi.com/?Action=CreateLifeCycleRules
&FileSystemId=f4mhaqkciq0
&LifeCycleRules.0.LifeCycleRuleName=test
&LifeCycleRules.0.Path=/test
&LifeCycleRules.0.Transitions.0.Days=90
&LifeCycleRules.0.Transitions.0.Type=1
&LifeCycleRules.0.Transitions.1.Days=180
&LifeCycleRules.0.Transitions.1.Type=2
&LifeCycleRules.0.Status=1
&<common request parameters>
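The indexed parameter names in this request (`LifeCycleRules.0.Transitions.1.Days` and so on) follow a regular pattern: nested lists and objects are flattened into dotted keys. The Python sketch below rebuilds the query string from a nested structure using only the standard library. The `flatten` helper is hypothetical, and the common request parameters (including the signature) are deliberately left out, so the resulting URL is not a complete, sendable request.

```python
from urllib.parse import urlencode


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into (dotted key, value) pairs."""
    pairs = []
    if isinstance(value, dict):
        for key, item in value.items():
            pairs.extend(flatten(item, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            pairs.extend(flatten(item, f"{prefix}.{index}"))
    else:
        pairs.append((prefix, value))
    return pairs


params = {
    "Action": "CreateLifeCycleRules",
    "FileSystemId": "f4mhaqkciq0",
    "LifeCycleRules": [{
        "LifeCycleRuleName": "test",
        "Path": "/test",
        # Type values are copied from the example; judging by the day
        # counts, 1 appears to mean transition and 2 delete (an inference).
        "Transitions": [{"Days": 90, "Type": 1}, {"Days": 180, "Type": 2}],
        "Status": 1,
    }],
}

# safe="/" keeps the slash in the path readable, as in the example.
query = urlencode(flatten(params), safe="/")
url = "https://chdfs.tencentcloudapi.com/?" + query
```

Keeping the rule as a nested structure and flattening it at the last moment avoids hand-typing indexed keys, which is where mistakes in letter case or numbering tend to creep in.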

Example of creating restore tasks:

https://chdfs.tencentcloudapi.com/?Action=CreateRestoreTasks
&FileSystemId=f4mhaqkciq0
&RestoreTasks.0.FilePath=/test/file0
&RestoreTasks.0.Type=1
&RestoreTasks.0.Days=7
&RestoreTasks.1.FilePath=/test/file1
&RestoreTasks.1.Type=2
&RestoreTasks.1.Days=7
&<common request parameters>

Note: Restore tasks can be created in batches. Each restore task must specify the path of a specific file.
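A batch request is a single `CreateRestoreTasks` call with one indexed `RestoreTasks.N` group per file. The sketch below builds such a query for a list of paths. The function name is hypothetical, the check that rejects paths ending in `/` is only a rough client-side guess at "not a specific file", and the common request parameters are again omitted.

```python
from urllib.parse import urlencode


def restore_query(file_system_id, file_paths, restore_type, days):
    """Build the query string for a batch of restore tasks."""
    pairs = [("Action", "CreateRestoreTasks"),
             ("FileSystemId", file_system_id)]
    for index, path in enumerate(file_paths):
        # Assumption: a trailing slash indicates a directory, which a
        # restore task cannot target.
        if not path or path.endswith("/"):
            raise ValueError(f"restore task needs a specific file path: {path!r}")
        pairs.append((f"RestoreTasks.{index}.FilePath", path))
        pairs.append((f"RestoreTasks.{index}.Type", restore_type))
        pairs.append((f"RestoreTasks.{index}.Days", days))
    return urlencode(pairs, safe="/")


query = restore_query("f4mhaqkciq0", ["/test/file0", "/test/file1"], 1, 7)
```

This version applies one type and one retention period to every file in the batch; the example request shows that each task can also carry its own values.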

VI. Billing

Currently, CHDFS charges only for standard storage and bandwidth. Archive storage and restore requests are not charged at this time.

VII. Conclusion

CHDFS combines the virtually unlimited capacity of COS object storage with in-depth work on file system metadata management, and its scale can be expanded to the tens of billions. With user-defined lifecycle policies, CHDFS helps users keep storage cost as low as possible while still meeting their access needs.

About us

The "Tencent cloud storage team" home page on the Cloud+ Community covers the team's latest news, team information, product matrix, technical documents, and video tutorials. You are welcome to follow us or leave a message with your suggestions.