[TOC]

In my last job I was mainly responsible for building the big data platform, and along the way I accumulated some notes on setting up and using the components of the Hadoop ecosystem. Due to time constraints I have not cleaned up every typo or formatting issue in these notes, so I am publishing them largely as originally written.

Prerequisites

There are three servers in my cluster, with hostnames master, slave1 and slave2 respectively. The Hadoop services are installed on the hosts as follows

| host machine | components |
| :--- | ---: |
| master | namenode, datanode, journalnode, resourcemanager, nodemanager, jobhistoryserver |
| slave1 | namenode, datanode, journalnode, resourcemanager, nodemanager |
| slave2 | datanode, journalnode, nodemanager |

Kerberos related

First of all, we need to install Kerberos. For the Kerberos installation and setup, refer to https://www.cnblogs.com/nices…

Create Kerberos accounts for Hadoop components

After entering kadmin.local, execute the following commands in turn

# principals for the components' web interfaces (SPNEGO)
addprinc -randkey HTTP/[email protected]
addprinc -randkey HTTP/[email protected]
addprinc -randkey HTTP/[email protected]

# namenode principals
addprinc -randkey nn/[email protected]
addprinc -randkey nn/[email protected]

# datanode principals
addprinc -randkey dn/[email protected]
addprinc -randkey dn/[email protected]
addprinc -randkey dn/[email protected]

# journalnode principals
addprinc -randkey jn/[email protected]
addprinc -randkey jn/[email protected]
addprinc -randkey jn/[email protected]

# resourcemanager principals
addprinc -randkey rm/[email protected]
addprinc -randkey rm/[email protected]

# nodemanager principals
addprinc -randkey nm/[email protected]
addprinc -randkey nm/[email protected]
addprinc -randkey nm/[email protected]

# job history server principal
addprinc -randkey jhs/[email protected]

Generate keytabs for these accounts

Still in kadmin.local, export the authentication information of the accounts above into keytab files

ktadd -k /opt/keytab_store/http.service.keytab HTTP/[email protected]
ktadd -k /opt/keytab_store/http.service.keytab HTTP/[email protected]
ktadd -k /opt/keytab_store/http.service.keytab HTTP/[email protected]

ktadd -k /opt/keytab_store/nn.service.keytab nn/[email protected]
ktadd -k /opt/keytab_store/nn.service.keytab nn/[email protected]

ktadd -k /opt/keytab_store/dn.service.keytab dn/[email protected]    
ktadd -k /opt/keytab_store/dn.service.keytab dn/[email protected]
ktadd -k /opt/keytab_store/dn.service.keytab dn/[email protected]

ktadd -k /opt/keytab_store/jn.service.keytab jn/[email protected]    
ktadd -k /opt/keytab_store/jn.service.keytab jn/[email protected]
ktadd -k /opt/keytab_store/jn.service.keytab jn/[email protected]

ktadd -k /opt/keytab_store/rm.service.keytab rm/[email protected]
ktadd -k /opt/keytab_store/rm.service.keytab rm/[email protected]

ktadd -k /opt/keytab_store/nm.service.keytab nm/[email protected]    
ktadd -k /opt/keytab_store/nm.service.keytab nm/[email protected]
ktadd -k /opt/keytab_store/nm.service.keytab nm/[email protected]

ktadd -k /opt/keytab_store/jhs.service.keytab jhs/[email protected]

Multiple accounts can be put into a single keytab. The commands above produce several files, placing each component role's principals into its own keytab file. In fact, on an internal network, all Hadoop-related principals could be grouped into one large keytab file, reducing configuration complexity.
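As a hedged sketch of that merged-keytab approach, the following kadmin.local commands would append several principals into one shared file (the file name hadoop.keytab is an assumption, not part of the setup above):

# append several principals into a single shared keytab (illustrative only)
ktadd -k /opt/keytab_store/hadoop.keytab nn/[email protected] dn/[email protected] jn/[email protected]
ktadd -k /opt/keytab_store/hadoop.keytab rm/[email protected] nm/[email protected] jhs/[email protected]

Each component's keytab setting in the configuration files would then point at this single file.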

Distribute the keytab files above to all the machines in the cluster
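A minimal sketch of distributing and sanity-checking the keytabs, assuming passwordless SSH between the hosts and the /opt/keytab_store/ path used above:

# copy the keytabs from master to the other nodes
scp /opt/keytab_store/*.keytab slave1:/opt/keytab_store/
scp /opt/keytab_store/*.keytab slave2:/opt/keytab_store/

# list the principals contained in a keytab
klist -kt /opt/keytab_store/nn.service.keytab

# verify that the keytab can actually obtain a ticket
kinit -kt /opt/keytab_store/nn.service.keytab nn/[email protected]
klist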

core-site.xml

The key configuration

        <property>
                <name>hadoop.security.authentication</name>
                <value>kerberos</value>
        </property>
        <property>
                <name>hadoop.security.authorization</name>
                <value>true</value>
        </property>
        <property>
                <name>hadoop.security.auth_to_local</name>
                <value>
                        RULE:[2:$1/$2@$0]([ndj]n/.*@TEST.COM)s/.*/hdfs/
                        RULE:[2:$1/$2@$0]([rn]m/.*@TEST.COM)s/.*/yarn/
                        RULE:[2:$1/$2@$0](jhs/.*@TEST.COM)s/.*/mapred/
                        DEFAULT
                </value>
        </property>

The configuration above enables Kerberos for security authentication and authorization across the whole cluster. hadoop.security.auth_to_local configures how, when components access each other, the actual local user is extracted from the visiting principal. Taking the first rule as an example, it maps the principals of the NameNode, DataNode and JournalNode to the user hdfs. The final DEFAULT entry is the fallback rule used when none of the rules above match; it simply takes the part of the principal before the first slash as the user. For example, test/[email protected] is mapped to the user test.
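To check how a particular principal is mapped, recent Hadoop versions provide a small helper command; whether the kerbname subcommand is available depends on your Hadoop version.

# print the local user a principal maps to under the auth_to_local rules
hadoop kerbname nn/[email protected]    # expected: hdfs
hadoop kerbname test/[email protected]    # expected: test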

HDFS

<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/[email protected]</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/opt/keytab_store/nn.service.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.principal</name>
  <value>jn/[email protected]</value>
</property>
<property>
  <name>dfs.journalnode.keytab.file</name>
  <value>/opt/keytab_store/jn.service.keytab</value>
</property>
<property>
  <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
  <value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/[email protected]</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/opt/keytab_store/dn.service.keytab</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/[email protected]</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/opt/keytab_store/http.service.keytab</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>

The general idea is to configure which principal and keytab each component uses. _HOST is syntactic sugar: Hadoop substitutes the local hostname for it, so the same configuration file can be used on every machine. For example, on the host master, nn/[email protected] resolves to nn/[email protected].

Security configuration of the DataNode

The DataNode's data transfer protocol does not go through Hadoop RPC, so the DataNode cannot be authenticated with Kerberos in the same way as the other services. To solve this problem, there are two ways to configure the DataNode to secure its data transfer

  • JSVC
  • TLS/SSL

The basic idea of the JSVC approach is to use the jsvc tool so that the DataNode starts on privileged ports (ports below 1024). The security of this configuration rests on the assumption that an attacker cannot obtain root access and therefore cannot impersonate a DataNode. Prior to Hadoop 2.6.0 this was the only option, and its configuration is fairly involved, so it is not covered here. Since Hadoop 2.6.0, SASL mode is available, which secures data transfer together with TLS/SSL. This mode is described below

Certificate generation and installation

For the principles behind TLS/SSL, see the relevant documentation (address not included here).

First, make sure OpenSSL is installed on each machine. The detailed configuration follows. The idea is to create a private CA and sign all other certificates with this private CA. By installing the private CA certificate into the trust store of every machine, TLS/SSL communication between the machines becomes possible.

Then pick any machine in the cluster on which to generate the CA certificate; here it is done on the master machine

 openssl req -new -x509 -keyout ca_private.key -out ca_cert -days 9999 -subj '/C=CN/ST=chengdu/L=chengdu/O=bigdata/OU=bigdata/CN=master'
 

Copy the CA private key and certificate generated above to every machine. Alternatively, if exactly the same passwords are used when generating the certificates, everything can be done on a single machine and the resulting keystore and truststore distributed to all machines afterwards.

# generate the machine's own key pair and self-signed certificate in the keystore
keytool -keystore keystore -alias localhost -validity 999 -genkey -keyalg RSA -keysize 2048 -dname "CN=slave2, OU=bigdata, O=bigdata, L=chengdu, ST=chengdu, C=CN"

# import the CA certificate into the local truststore
keytool -keystore truststore -alias CARoot -import -file ca_cert

# import the CA certificate into the local keystore as well
keytool -keystore keystore -alias CARoot -import -file ca_cert

# export a certificate signing request for the local certificate
keytool -certreq -alias localhost -keystore keystore -file local_cert

# sign the local certificate with the private CA
openssl x509 -req -CA ca_cert -CAkey ca_private.key -in local_cert -out local_cert_signed -days 9999 -CAcreateserial

# import the CA-signed certificate back into the local keystore
keytool -keystore keystore -alias localhost -import -file local_cert_signed
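To sanity-check the result, the contents of both stores can be listed; the store names are the ones used above.

# the keystore should now contain both the CARoot entry and the CA-signed localhost entry
keytool -list -v -keystore keystore
keytool -list -v -keystore truststore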

The key configuration of hdfs-site.xml

Set dfs.http.policy to HTTPS_ONLY, and set dfs.data.transfer.protection to one of authentication, integrity or privacy. For an internal cluster, authentication is usually sufficient

  • authentication: only performs authentication of the two ends
  • integrity: in addition to authentication, verifies that the data has not been tampered with
  • privacy: in addition to the authentication and integrity checks above, the data is transmitted encrypted

ssl-client.xml and ssl-server.xml configuration

Hadoop ships two template files, ssl-client.xml.example and ssl-server.xml.example, in the same directory as core-site.xml. We can simply drop the .example suffix and use them directly as configuration files. They configure where a component finds its own certificate keystore when acting as a server, and where it finds the truststore of trusted certificates when acting as a client.
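For example, a minimal sketch, assuming $HADOOP_HOME/etc/hadoop is the configuration directory:

# turn the shipped templates into the actual configuration files
cd $HADOOP_HOME/etc/hadoop
cp ssl-client.xml.example ssl-client.xml
cp ssl-server.xml.example ssl-server.xml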

The configuration of ssl-client.xml is as follows

<configuration>                                                                                                                 
                                                                                                                                
<property>                                                                                                                      
  <name>ssl.client.truststore.location</name>                                                                                   
  <value>/opt/ssl_store/truststore</value>                                                                                      
  <description>Truststore to be used by clients like distcp. Must be                                                            
  specified.                                                                                                                    
  </description>                                                                                                                
</property>                                                                                                                     
                                                                                                                                
<property>                                                                                                                      
  <name>ssl.client.truststore.password</name>                                                                                   
  <value>123456</value>                                                                                                         
  <description>Optional. Default value is "".                                                                                   
  </description>                                                                                                                
</property>                                                                                                                     
                                                                                                                                
<property>                                                                                                                      
  <name>ssl.client.truststore.type</name>                                                                                       
  <value>jks</value>                                                                                                            
  <description>Optional. The keystore file format, default value is "jks".                                                      
  </description>                                                                                                                
</property>                                                                                                                     
                                                                                                                                
<property>                                                                                                                      
  <name>ssl.client.truststore.reload.interval</name>                                                                            
  <value>10000</value>                                                                                                          
  <description>Truststore reload check interval, in milliseconds.                                                               
  Default value is 10000 (10 seconds).                                                                                          
  </description>                                                                                                                
</property>                                                                                                                     
                                                                                                                                
<property>                                                                                                                      
  <name>ssl.client.keystore.location</name>                                                                                     
  <value>/opt/ssl_store/keystore</value>                                                                                        
  <description>Keystore to be used by clients like distcp. Must be                                                              
  specified.                                                                                                                    
  </description>                                                                                                                
</property>      
<property>
  <name>ssl.client.keystore.password</name>
  <value>123456</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.keypassword</name>
  <value>123456</value>
  <description>Optional. Default value is "".
  </description>
</property>

<property>
  <name>ssl.client.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

</configuration>                                                                                                               

The configuration of ssl-server.xml is similar; its key items are as follows

<property>
  <name>ssl.server.keystore.password</name>
  <value>123456</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.keypassword</name>
  <value>123456</value>
  <description>Must be specified.
  </description>
</property>

<property>
  <name>ssl.server.keystore.type</name>
  <value>jks</value>
  <description>Optional. The keystore file format, default value is "jks".
  </description>
</property>

<property>
  <name>ssl.server.exclude.cipher.list</name>
  <value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
  SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
  SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
  SSL_RSA_WITH_RC4_128_MD5</value>
  <description>Optional. The weak security cipher suites that you want excluded
  from SSL communication.</description>
</property>

</configuration>                                              

The password 123456 configured above is the password we used when making the certificate

yarn

The overall configuration

<property>
  <name>yarn.resourcemanager.principal</name>
  <value>rm/[email protected]</value>
</property>
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/opt/keytab_store/rm.service.keytab</value>
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>nm/[email protected]</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/opt/keytab_store/nm.service.keytab</value>
</property>
<property>
  <!-- a secure cluster must use the following LinuxContainerExecutor -->
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.path</name>
  <value>/opt/hadoop-3.1.3/bin/container-executor</value>
</property>

container-executor

Build container-executor

yarn.nodemanager.linux-container-executor.path specifies the path of the container-executor binary used by LinuxContainerExecutor. The Hadoop distribution already ships this file in its bin directory. Running it requires a configuration file, container-executor.cfg, which by default is loaded from $HADOOP_HOME/etc/hadoop/container-executor.cfg.

However, that directory also contains Hadoop's other configuration files, while container-executor requires that container-executor.cfg and every directory on its path be owned by root. Keeping it there therefore causes all sorts of strange startup problems for the other components.

So we need to put container-executor.cfg in a different location. The problem is that the configuration file path is compiled into the container-executor binary, so changing it means rebuilding container-executor. The build steps are as follows (a consolidated sketch is given after the list)

  • First, download the source code of the same Hadoop version
  • Go into the path hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager inside the source package
  • Build it with the command mvn package -DskipTests=true -Dcontainer-executor.conf.dir=/etc/hadoop/; the container-executor.conf.dir parameter specifies the new path of the container-executor.cfg file
  • After the build completes, the newly built container-executor can be found under target/native/target/usr/local/bin in the build directory; copy it to $HADOOP_HOME/bin, replacing the original binary
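A consolidated sketch of the steps above; the source directory name hadoop-3.1.3-src is an assumption, adjust it to wherever you unpacked the source.

# build container-executor with /etc/hadoop/ compiled in as its configuration directory
cd hadoop-3.1.3-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
mvn package -DskipTests=true -Dcontainer-executor.conf.dir=/etc/hadoop/

# replace the shipped binary with the freshly built one
cp target/native/target/usr/local/bin/container-executor /opt/hadoop-3.1.3/bin/container-executor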

Configure container-executor.cfg

In /etc/hadoop/, create container-executor.cfg with the following contents

yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
allowed.system.users=
feature.tc.enabled=false

Note that the lines of this configuration must not contain extra spaces. The value of yarn.nodemanager.linux-container-executor.group must match the one configured in yarn-site.xml.

To summarize, the following permission settings are required.

File permission changes

chown root:hadoop /opt/hadoop-3.1.3/bin/container-executor
chmod 6050 /opt/hadoop-3.1.3/bin/container-executor
chown root:hadoop /etc/hadoop/container-executor.cfg
chmod 400 /etc/hadoop/container-executor.cfg
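After setting these permissions, container-executor's built-in self-check can be used to verify the setup; the flag below exists in recent Hadoop versions, though the exact output differs between releases.

# prints errors if the binary ownership, setuid bits or container-executor.cfg are wrong
/opt/hadoop-3.1.3/bin/container-executor --checksetup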

Assuming that in yarn-site.xml, yarn.nodemanager.local-dirs is configured as /home/var/data/hadoop/nodemanager/data and yarn.nodemanager.log-dirs as /home/var/data/hadoop/nodemanager/log, the following permission settings are needed

chown yarn:hadoop /home/var/data/hadoop/nodemanager/data 
chown yarn:hadoop /home/var/data/hadoop/nodemanager/log  
chmod 755 /home/var/data/hadoop/nodemanager/data
chmod 755 /home/var/data/hadoop/nodemanager/log

mapreduce

<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/opt/keytab_store/jhs.service.keytab</value>
</property>
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>jhs/[email protected]</value>
</property>

Startup

After the configuration is done, the cluster can be started in the original way. The only difference is that, because HDFS now has SSL/TLS enabled, the web UI that used to be on port 9870 is now on port 9871 and must be accessed via HTTPS, for example https://master:9871
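A minimal sketch of a post-startup sanity check, assuming the usual start scripts and the NameNode keytab created earlier are available on master:

# start HDFS and YARN as before
start-dfs.sh
start-yarn.sh

# obtain a ticket with the NameNode keytab and list the HDFS root directory
kinit -kt /opt/keytab_store/nn.service.keytab nn/[email protected]
hdfs dfs -ls /

# after destroying the ticket, the same command should fail with an authentication error
kdestroy
hdfs dfs -ls /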


Welcome to follow my personal public account, “North by Northwest Up”, where I record my coding life, industry thoughts and technology reviews.