Abstract: In traditional big data clusters, user data is stored in HDFS in plain text. Cluster maintenance personnel or malicious attackers can bypass the HDFS permission control mechanism at the OS level, or access user data directly by stealing disks.

This article is shared from the Huawei Cloud community post "FusionInsight MRS Transparent Encryption Solution". Author: A walnut.

Overview

In traditional big data clusters, user data is stored in HDFS in plain text. Cluster maintenance personnel or malicious attackers can bypass the HDFS permission control mechanism at the OS level, or access user data directly by stealing disks.

FusionInsight MRS introduces and enhances the Hadoop KMS service, which connects to a third-party KMS, to encrypt data transparently and keep user data secure.

  • HDFS supports transparent encryption. The data of upper-layer components that store it in HDFS, such as Hive and HBase, is encrypted and protected by HDFS. Encryption keys are obtained from the third-party KMS through Hadoop KMS.

  • For components such as Kafka and Redis, which persist service data directly to local disks, a LUKS-based partition encryption mechanism protects user data.

HDFS transparent encryption

  • HDFS transparent encryption supports the AES and SM4 encryption algorithms in CTR mode with no padding (AES/CTR/NoPadding and SM4/CTR/NoPadding). Hive and HBase rely on HDFS transparent encryption to encrypt their data. The SM4 encryption algorithm is provided by A-Lab based on OpenSSL.

  • The encryption keys are obtained from the KMS service in the cluster. The KMS service supports interconnection with a third-party KMS through the Hadoop KMS REST APIs.

  • One KMS service is deployed on FusionInsight Manager. The KMS service uses public key authentication to access the third-party KMS. Each KMS service has a corresponding CLK in the third-party KMS.

  • Multiple EZKs can be requested under the CLK. Each EZK is used to encrypt the data encryption keys (DEKs) of the corresponding encryption zone in HDFS. The EZKs are stored persistently in the third-party KMS.

  • The DEK is generated by the third-party KMS and encrypted with the EZK. The encrypted DEK is stored persistently in the NameNode and is decrypted with the EZK when it is needed.

  • Both the CLK and the EZKs can be rotated. The CLK is the root key of each cluster. It is not visible on the cluster side, and its rotation is controlled and managed by the third-party KMS. EZKs can be managed and rotated by FI KMS. In addition, a third-party KMS administrator can manage keys in the KMS and rotate EZKs there.
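The EZK and DEK relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyThirdPartyKms`, `toy_ctr_xor`, and the key name `zone1_key` are hypothetical names, not FusionInsight or Hadoop APIs, and the cipher is an insecure stand-in built from SHA-256 so that the sketch needs only the standard library. A real deployment would use AES/CTR/NoPadding or SM4/CTR/NoPadding. The CLK layer is left out because the article does not describe how it protects the EZKs.

```python
import hashlib
import secrets


def toy_ctr_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy CTR-style cipher (NOT secure): XOR data with SHA-256(key|iv|counter).

    Stands in for AES/CTR/NoPadding or SM4/CTR/NoPadding. Because it is a
    plain XOR with a keystream, the same call encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class ToyThirdPartyKms:
    """Hypothetical stand-in for the third-party KMS that holds the EZKs."""

    def __init__(self):
        self._ezk_versions = {}  # EZK name -> list of key versions

    def create_ezk(self, name):
        self._ezk_versions[name] = [secrets.token_bytes(16)]

    def rotate_ezk(self, name):
        # Append a new version; old versions stay so existing EDEKs still decrypt.
        self._ezk_versions[name].append(secrets.token_bytes(16))

    def generate_edek(self, name):
        """Create a DEK and return it only in encrypted form (an EDEK record)."""
        version = len(self._ezk_versions[name]) - 1
        iv = secrets.token_bytes(16)
        dek = secrets.token_bytes(16)
        edek = toy_ctr_xor(self._ezk_versions[name][version], iv, dek)
        return {"ezk": name, "version": version, "iv": iv, "edek": edek}

    def decrypt_edek(self, record):
        """Recover the DEK using the EZK version that encrypted it."""
        ezk = self._ezk_versions[record["ezk"]][record["version"]]
        return toy_ctr_xor(ezk, record["iv"], record["edek"])


kms = ToyThirdPartyKms()
kms.create_ezk("zone1_key")

# Write path: only the EDEK record would be persisted with the file metadata.
file_meta = kms.generate_edek("zone1_key")
file_iv = secrets.token_bytes(16)
dek = kms.decrypt_edek(file_meta)
ciphertext = toy_ctr_xor(dek, file_iv, b"user data stored in HDFS")

# EZK rotation: new files use the new version, old files remain readable.
kms.rotate_ezk("zone1_key")
new_meta = kms.generate_edek("zone1_key")

# Read path: recover the DEK from the stored EDEK, then decrypt the file.
plaintext = toy_ctr_xor(kms.decrypt_edek(file_meta), file_iv, ciphertext)
print(plaintext.decode())
print(file_meta["version"], new_meta["version"])
```

The point of the design is that the plaintext DEK never needs to be stored: the file metadata carries only the encrypted DEK plus the EZK version that produced it, which is why rotating the EZK does not force existing data to be re-encrypted in this sketch.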

LUKS partition encryption

FusionInsight clusters support LUKS-based partition encryption to protect sensitive information for components that persist service data to local disks, such as Kafka and Redis.

The FusionInsight installation script tool uses the Linux Unified Key Setup (LUKS) partition encryption scheme. During partition encryption, each node in the cluster either generates an access key or obtains one from the third-party KMS, and uses it to encrypt the data key so that the data key itself is protected. Once disk partitions are encrypted, the system obtains the key automatically: it mounts the encrypted partition after the operating system restarts, and creates a new encrypted partition after a disk is replaced.
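As a rough illustration of what such a script tool might run, the sketch below only builds the standard `cryptsetup`, `mkfs.ext4`, and `mount` command lines and prints them. It does not execute anything, since the real commands need root and `luksFormat` destroys existing data. The function names, the device `/dev/sdX1`, the mapping name `kafka_data`, the mount point, and the key file path are all hypothetical, and the actual FusionInsight tool may work differently. In LUKS terms, the key file here plays the role of the access key that unlocks the volume's data key.

```python
import shlex
from typing import List


def luks_setup_commands(device: str, mapper_name: str, mount_point: str,
                        key_file: str) -> List[List[str]]:
    """Commands to encrypt a new partition, open it, format it, and mount it."""
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        # Write the LUKS header; the key file unlocks the volume's data key.
        ["cryptsetup", "luksFormat", "--batch-mode", "--key-file", key_file, device],
        # Map the decrypted view of the partition under /dev/mapper.
        ["cryptsetup", "open", "--type", "luks", "--key-file", key_file,
         device, mapper_name],
        ["mkfs.ext4", mapped],
        ["mount", mapped, mount_point],
    ]


def luks_remount_commands(device: str, mapper_name: str, mount_point: str,
                          key_file: str) -> List[List[str]]:
    """Commands to reopen and mount an existing encrypted partition after a reboot."""
    return [
        ["cryptsetup", "open", "--type", "luks", "--key-file", key_file,
         device, mapper_name],
        ["mount", f"/dev/mapper/{mapper_name}", mount_point],
    ]


# Hypothetical values, for illustration only.
args = ("/dev/sdX1", "kafka_data", "/srv/kafka", "/run/keys/access.key")

print("# new or replaced disk")
for cmd in luks_setup_commands(*args):
    print(shlex.join(cmd))

print("# after an operating system restart")
for cmd in luks_remount_commands(*args):
    print(shlex.join(cmd))
```

Keeping the command construction separate from execution makes the two cases in the text easy to compare: a restart only needs the open and mount steps, while a replaced disk goes through the full format path.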
