System environment

  • Operating system: CentOS 6 or CentOS 7
  • JDK version: 1.8.0_151
  • Ambari version: 2.6.1
  • HDP version: 2.6.4.0

Related links

  • Kerberos principles: the classic dialogue
  • Installing and configuring Kerberos with Ambari
  • Installing and configuring the Kerberos client locally on Windows
  • Using Kerberos
  • Disabling Kerberos with Ambari
  • Using Java to connect to and operate Hive in a Kerberos environment
  • If you still don’t understand how Kerberos works, I lose!

1. Overview of Kerberos

Strong authentication and establishing user identity are the foundation for secure access to Hadoop. Users need to be able to reliably “identify” themselves and then propagate that identity across the Hadoop cluster. When this is done, these users can access resources (such as files or directories) or interact with the cluster (such as running MapReduce jobs). In addition to users, Hadoop cluster resources themselves (such as hosts and services) need to authenticate with each other to prevent potentially malicious systems or daemons from “impersonating” trusted cluster components to gain access to data.

Hadoop uses Kerberos as the basis for strong authentication and identity propagation for both users and services. Kerberos is a network authentication protocol that lets one entity prove its identity to another securely over an insecure network. It is a third-party authentication mechanism: users and services rely on a third party, the Kerberos server, to authenticate each other. The Kerberos server itself is called the Key Distribution Center, or KDC. At a high level, it has three parts:

  • A database of the users and services (called principals) that it knows about, together with their respective Kerberos passwords.
  • An Authentication Server (AS), which verifies the identity of a client (making sure that you are the person on your ID card). Once the client is authenticated, the AS grants it a Ticket Granting Ticket (TGT).
  • A Ticket Granting Server (TGS), which accepts the TGT that the AS issued to the client and issues a Service Ticket (ST) for accessing a particular server. A service ticket is also called a TGS ticket.

[Figure: Kerberos authentication illustrated with a train journey]

A user principal requests authentication from the AS. The AS returns a TGT encrypted with the user principal’s Kerberos password, which is known only to the user principal and the AS. The user principal decrypts the TGT locally with its Kerberos password and, from then until the ticket expires, can use the TGT to obtain service tickets from the TGS. A service ticket allows the client to access the service.

In short, Kerberos is a trusted-third-party protocol for secure authentication. It uses traditional shared-key (symmetric) cryptography so that clients and servers can communicate in an insecure network environment. It targets the client/server model and was developed and implemented at MIT.

Kerberos is a single sign-on system: you authenticate yourself to the service only once per session, and all subsequent transactions during that session are secured automatically.

Because cluster resources (hosts or services) cannot enter a password each time they need to decrypt a TGT, they use a special file, called a keytab, that contains the authentication credentials of the resource principal.

The set of hosts, users, and services controlled by the Kerberos server is called a realm.

2. Kerberos authentication process

Kerberos authentication has two phases: an initial authentication, which makes all subsequent authentications possible, and the subsequent authentications themselves.

1. Initial authentication: the ticket-granting ticket

The initial authentication proceeds as follows:

  • A client starts a Kerberos session by requesting a ticket-granting ticket (TGT) from the Key Distribution Center (KDC). This request is usually made automatically at login.

    A TGT is required to obtain other tickets for particular services. A TGT is similar to a passport. Like a passport, a TGT identifies you and allows you to obtain multiple “visas,” where the visas (tickets) are not for foreign countries but for remote computers or network services. Like passports and visas, the TGT and the various other tickets have limited lifetimes. The difference is that Kerberos-based commands notice that you have a passport and obtain the visas for you. You do not have to perform the transactions yourself.

    Another analogy for the TGT is a three-day ski pass that is valid at four different ski resorts. As long as the pass has not expired, you can show it at whichever resort you decide to visit and receive a lift ticket for that resort. With the lift ticket, you can ski at that resort as much as you like. If you go to another resort the next day, you show your pass again and get another lift ticket for the new resort. The difference is that Kerberos-based commands notice that you have the ski pass and obtain the lift ticket for you, so you do not have to perform the transactions yourself.

  • The KDC creates a TGT and sends it back to the client in encrypted form. The client uses its password to decrypt the TGT.

  • With a valid TGT, the client can request tickets for all kinds of network operations, such as rlogin or telnet, for as long as the TGT is valid. The TGT is normally valid for one day. Each time the client performs a unique network operation, it requests a ticket for that operation from the KDC.

2. Subsequent Kerberos authentications

After the client has received the initial authentication, each subsequent authentication follows the pattern below.

  • The client requests a ticket for a particular service (for example, to log in remotely to another computer) from the KDC, sending its TGT to the KDC as proof of identity.

  • The KDC sends the ticket for that service to the client.

    For example, suppose that user Joe wants to access an NFS file system that has been shared with krb5 authentication required. Because Joe is already authenticated (that is, he already has a ticket-granting ticket), the NFS client system automatically and transparently obtains a ticket for the NFS service from the KDC when Joe tries to access the files.

    As another example, suppose that Joe uses rlogin on the server boston. Because he is already authenticated (that is, he already has a ticket-granting ticket), he automatically and transparently obtains a ticket as part of the rlogin command. With this ticket he can log in remotely to boston as often as he likes until the ticket expires. If Joe wants to log in remotely to the computer denver, he obtains another ticket by repeating the first step above.

  • The client sends the ticket to the server.

    When the NFS service is used, the NFS client automatically and transparently sends the ticket for the NFS service to the NFS server.

  • The server allows the client access.

From these steps, it might appear that the server never communicates with the KDC. In fact, the server does register itself with the KDC, just as the first client does. That part has been omitted for simplicity.
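The two phases above can be walked through with a toy, in-memory model in Python. This is only a sketch under loud assumptions: ToyKDC, seal, unseal and service_accept are hypothetical names, the "encryption" is a throwaway construction built from hashlib and hmac (not the cryptography real Kerberos uses), and authenticators and the service session key are left out. It also makes one detail explicit that the prose simplifies: the TGT is sealed with a key only the KDC knows, while the accompanying reply is sealed with the client's password-derived key.

```python
import hashlib
import hmac
import json
import os
import time


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, payload: dict) -> bytes:
    """Toy encrypt-and-authenticate; NOT the cryptography real Kerberos uses."""
    plain = json.dumps(payload).encode()
    nonce = os.urandom(16)
    cipher = bytes(a ^ b for a, b in zip(plain, _keystream(key, nonce, len(plain))))
    tag = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    return nonce + tag + cipher


def unseal(key: bytes, blob: bytes) -> dict:
    """Reverse seal(); raises ValueError if the key is wrong or the blob was altered."""
    nonce, tag, cipher = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return json.loads(plain)


class ToyKDC:
    """Hypothetical in-memory KDC: an AS and a TGS sharing one principal database."""

    def __init__(self):
        self.keys = {}                 # principal name -> long-term key
        self.tgs_key = os.urandom(32)  # key known only to the KDC itself

    def as_request(self, client: str, lifetime: int = 86400):
        """AS step: return (TGT, reply); only the client's key opens the reply."""
        body = {"client": client, "session_key": os.urandom(32).hex(),
                "expires": time.time() + lifetime}
        return seal(self.tgs_key, body), seal(self.keys[client], body)

    def tgs_request(self, tgt: bytes, service: str) -> bytes:
        """TGS step: exchange a valid TGT for a ticket sealed with the service's key."""
        info = unseal(self.tgs_key, tgt)
        if info["expires"] < time.time():
            raise ValueError("TGT expired")
        return seal(self.keys[service], {"client": info["client"], "service": service,
                                         "expires": info["expires"]})


def service_accept(service_key: bytes, ticket: bytes) -> str:
    """Service step: open the ticket with the service key and return the client name."""
    info = unseal(service_key, ticket)
    if info["expires"] < time.time():
        raise ValueError("ticket expired")
    return info["client"]


kdc = ToyKDC()
joe_key = hashlib.pbkdf2_hmac("sha256", b"joes-password", b"joe@EXAMPLE.COM", 10_000)
nfs_key = os.urandom(32)  # a real service would read its key from a keytab
kdc.keys["joe@EXAMPLE.COM"] = joe_key
kdc.keys["nfs/boston.example.com@EXAMPLE.COM"] = nfs_key

tgt, reply = kdc.as_request("joe@EXAMPLE.COM")       # initial authentication
session_key = unseal(joe_key, reply)["session_key"]  # needs the password-derived key
ticket = kdc.tgs_request(tgt, "nfs/boston.example.com@EXAMPLE.COM")
print(service_accept(nfs_key, ticket))               # joe@EXAMPLE.COM
```

Note that service_accept never contacts the KDC: the service trusts the ticket because only the KDC and the service hold the key that seals it.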

For more information on Kerberos, please refer to the following links to help you understand Kerberos:

  • www.zhihu.com/question/22…
  • www.anquanke.com/post/id/171…

3. Basic concepts of Kerberos

1. Key Distribution Center, or KDC

The trusted source for authentication in a Kerberos-enabled environment.

2. Kerberos KDC Server

A computer or server that acts as a key distribution center (KDC).

3. Kerberos Client

Any machine in the cluster that authenticates against the KDC.

4. KDC Admin Account

An administrative account that Ambari uses to create principals and generate keytabs in the KDC.

5. Principal

A Kerberos principal is a unique identity in the Kerberos system. A principal can be a user (such as joe) or a service (such as NameNode or Hive).

By convention, a principal name is divided into three parts: the primary, the instance, and the realm. A typical Kerberos principal is joe/admin@EXAMPLE.COM. In this example:

  • joe is the primary. The primary can be a user name, as shown here, or a service such as namenode.

  • admin is the instance. For user principals the instance is optional; for service principals it is required. For example, if user Joe sometimes acts as a system administrator, he can use joe/admin to distinguish that role from his normal user identity. Similarly, if Joe has accounts on two different hosts, he can use two principal names with different instances, such as joe/node1.example.com and joe/node2.example.com. Note that Kerberos treats joe and joe/admin as two completely different principals.

    For a service principal, the instance is the fully qualified hostname, for example node1.example.com.

  • EXAMPLE.COM is the Kerberos realm. Realms are described later in this section.
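The three-part naming convention can be made concrete with a small Python sketch that splits a principal name into primary, instance, and realm. The Principal type and parse_principal function are hypothetical helpers written for this illustration, and the parser ignores edge cases such as escaped separators.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]  # None when the principal has no instance
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its three parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


print(parse_principal("joe@EXAMPLE.COM"))
print(parse_principal("joe/admin@EXAMPLE.COM"))
print(parse_principal("nn/node1.example.com@EXAMPLE.COM"))
```

Because joe and joe/admin parse to different tuples, the sketch also mirrors the point that they are two distinct principals.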

Each service and subservice in Hadoop must have its own principal. A principal name in a given realm consists of a primary name and an instance name, which in this case is the FQDN of the host running the service. Because a service does not log in with a password to obtain its tickets, its principal’s authentication credentials are stored in a keytab file, which is extracted from the Kerberos database and stored locally, together with the service principal, in a secured directory on the service component host. For example, after Kerberos is enabled on node1.example.com, the nn.service.keytab file is generated automatically for the NameNode component and stored in the /etc/security/keytabs directory. The file is owned by hdfs:hadoop and its permission is 400.
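As a small illustration of the permission requirement, the following Python sketch checks whether a file's mode bits are exactly 400. The helper name keytab_mode_ok is hypothetical, the check assumes a POSIX file system, and the demonstration uses a temporary file rather than a real keytab.

```python
import os
import stat
import tempfile


def keytab_mode_ok(path: str, expected_mode: int = 0o400) -> bool:
    """Return True if the file's permission bits equal expected_mode (default 0400)."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode


# Demonstration on a throwaway file standing in for a keytab.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o400)            # owner read-only, as described above
locked_down = keytab_mode_ok(demo_path)

os.chmod(demo_path, 0o644)            # readable by everyone: too permissive
too_open = keytab_mode_ok(demo_path)

os.remove(demo_path)
print(locked_down, too_open)
```

On a cluster host the same helper could be pointed at a path such as /etc/security/keytabs/nn.service.keytab.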

The principals of Ambari and of the Hadoop services are stored in the Kerberos KDC.

Principal and Keytab naming conventions

Item       | Convention                                     | Example
Principals | $service_component_name/$FQDN@EXAMPLE.COM      | nn/<FQDN>@EXAMPLE.COM
Keytabs    | $service_component_abbreviation.service.keytab | /etc/security/keytabs/nn.service.keytab

Notice the primary name of each service principal in the previous example. Primary names such as nn or hive represent the NameNode or Hive service, respectively. Each primary name is followed by the instance name, which is the FQDN of the host the service runs on. This convention gives a unique principal name to services that run on multiple hosts, such as DataNodes and NodeManagers. Adding the host name makes it possible to distinguish, for example, requests from DataNode A from requests from DataNode B. This is important for several reasons:

  • Compromised Kerberos credentials for one DataNode do not automatically lead to compromised credentials for all DataNodes.
  • If multiple DataNodes had exactly the same principal and connected to the NameNode at the same time, and the Kerberos authenticators they sent happened to carry the same timestamp, the authentication would be rejected as a replay request.
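The replay argument in the second point can be illustrated with a minimal Python sketch. ReplayCache is a hypothetical class, not part of any Kerberos library; a real replay cache tracks more fields and expires old entries, whereas this one only remembers (principal, timestamp) pairs. The timestamp value is made up.

```python
class ReplayCache:
    """Minimal replay cache: remembers (principal, timestamp) pairs already seen."""

    def __init__(self):
        self._seen = set()

    def accept(self, principal: str, timestamp_us: int) -> bool:
        """Return True for a first-time authenticator, False for a repeat."""
        entry = (principal, timestamp_us)
        if entry in self._seen:
            return False
        self._seen.add(entry)
        return True


ts = 1_700_000_000_000_000  # the same microsecond timestamp on both hosts

# Two DataNodes sharing one principal: the second request looks like a replay.
shared = ReplayCache()
shared_results = [shared.accept("dn@EXAMPLE.COM", ts),
                  shared.accept("dn@EXAMPLE.COM", ts)]

# Per-host principals: both requests are accepted.
per_host = ReplayCache()
per_host_results = [per_host.accept("dn/a.example.com@EXAMPLE.COM", ts),
                    per_host.accept("dn/b.example.com@EXAMPLE.COM", ts)]

print(shared_results)    # [True, False]
print(per_host_results)  # [True, True]
```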

Ambari Principals

In addition to Hadoop service principals, Ambari itself requires a set of Ambari Principals to perform service “smoke” checks, perform alert health checks, and retrieve metrics from cluster components. Ambari Principals’ Keytab files reside on each cluster host just like service Principals’ Keytab files.

Ambari principal                 | Description
Smoke and headless service users | Used by Ambari to perform service “smoke” checks and run alert health checks.
Ambari Server user               | When a cluster is enabled for Kerberos, component REST endpoints (such as the YARN ATS component) require SPNEGO authentication. Ambari Server needs to access these APIs and requires a Kerberos principal to authenticate to them through SPNEGO.

6. Realm

A Kerberos network that contains a KDC and a number of clients. A realm is similar to a domain and is often informally called one.

7. Keytab

A keytab is a file that contains principals and their encrypted keys.

A keytab file is unique to each host, because the keys include the hostname. Keytab files let a principal on a host authenticate to Kerberos without human interaction and without storing a password in plain text.

Because anyone who can read a keytab file can authenticate to Kerberos as the principals it contains, keytab files must be stored securely and be readable by only a small number of users.

8. Ticket

A ticket is an information packet used to pass a user’s identity securely to a server or service. A ticket is valid for only one client and one particular service on a particular server. A ticket contains the following information:

  • The principal name of the service
  • The principal name of the user
  • The IP address of the user’s host
  • A timestamp
  • A value that defines the lifetime of the ticket
  • A copy of the session key

All of this data is encrypted with the server’s service key. Once a ticket has been issued, it can be reused until it expires.

9. Credential

A credential is an information packet that contains a ticket and a matching session key. Credentials are encrypted with the key of the requesting principal. Typically, the KDC generates a credential in response to a ticket request from a client.

10. Authenticator

An authenticator is information that the server uses to verify the client user principal. It contains the user’s principal name, a timestamp, and other data. Unlike a ticket, an authenticator can be used only once, usually when access to a service is requested. An authenticator is encrypted with the session key shared by the client and the server. Typically, the client creates the authenticator and sends it, together with the ticket, to the server or service for authentication.

4. Ticket lifetime

Whenever a principal obtains a ticket, including a ticket-granting ticket (TGT), the lifetime of the ticket is governed by two values: the lifetime requested with the -l option of kinit (when kinit is used to obtain the ticket; by default, kinit requests the maximum lifetime) and the maximum lifetime (max_life) specified in the kdc.conf file.

The renewable lifetime is governed in the same way: by the value requested with the -r option of kinit (when kinit is used to obtain or renew the ticket) and by the maximum renewable lifetime (max_renewable_life) specified in the kdc.conf file.
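The interaction between a requested lifetime and the configured maximum can be sketched in a few lines of Python. This is a simplification under stated assumptions: the function name granted_lifetime is made up for illustration, lifetimes are plain seconds, and a real KDC also takes other limits into account (per-principal maximums, for example).

```python
from typing import Optional


def granted_lifetime(max_life_s: int, requested_s: Optional[int] = None) -> int:
    """Lifetime in seconds: the request, capped at the configured maximum.

    requested_s=None models kinit without a lifetime option, which asks
    for the maximum.
    """
    if requested_s is None:
        return max_life_s
    return min(requested_s, max_life_s)


ONE_DAY = 24 * 3600  # assumed max_life for this example

print(granted_lifetime(ONE_DAY))             # 86400: no request, maximum applies
print(granted_lifetime(ONE_DAY, 10 * 3600))  # 36000: shorter request is honoured
print(granted_lifetime(ONE_DAY, 48 * 3600))  # 86400: longer request is capped
```

The same capping logic applies to the renewable lifetime with max_renewable_life in place of max_life.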

5. Kerberos principal names

Each ticket is identified by a principal name. A principal name can identify a user or a service. Here are some examples of principal names:

Principal name                 | Description
username@EXAMPLE.COM           | A user principal.
username/admin@EXAMPLE.COM     | An admin principal, which can be used to administer the KDC database.
K/M@EXAMPLE.COM                | The master key name principal. One master key name principal is associated with each master KDC.
krbtgt/EXAMPLE.COM@EXAMPLE.COM | The principal used when generating a ticket-granting ticket.
kadmin/<KDC host>@EXAMPLE.COM  | The principal for the master KDC server that allows access to the KDC using kadmind.
<smoke user>@EXAMPLE.COM       | Used by Ambari to perform service “smoke” checks and run alert health checks.
HTTP/<FQDN>@EXAMPLE.COM        | The principal used when accessing the Hadoop web UIs.

6. Notes

1. Synchronize clocks

All hosts participating in the Kerberos authentication system must keep their internal clocks synchronized within a specified maximum amount of time, known as the clock skew. This requirement provides another Kerberos security check: if the clock difference between any two participating hosts exceeds the clock skew, client requests are rejected.

The default maximum clock skew is 300 seconds (5 minutes). For security reasons, do not increase the clock skew beyond 300 seconds.
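The clock check described above amounts to a single comparison, sketched here in Python. The function name within_clock_skew and the example timestamps are illustrative assumptions; only the 300-second default comes from the text.

```python
from datetime import datetime, timedelta, timezone

MAX_CLOCK_SKEW = timedelta(seconds=300)  # default limit from the text above


def within_clock_skew(client_time: datetime, server_time: datetime,
                      max_skew: timedelta = MAX_CLOCK_SKEW) -> bool:
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - server_time) <= max_skew


server_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

print(within_clock_skew(server_now + timedelta(minutes=4), server_now))  # True
print(within_clock_skew(server_now - timedelta(minutes=6), server_now))  # False
```

The check is symmetric: a client clock that runs fast is rejected just like one that runs slow.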


7. Advantages and disadvantages of Kerberos

1. Advantages

High performance

Although Kerberos has been described throughout as an authentication process involving three parties (client, server, and KDC), once a client has obtained a ticket for a server, the server can authenticate the client from that ticket alone, without the KDC taking part again. This is a performance improvement over the traditional NTLM authentication of Windows NT 4.0, which relies entirely on the trusted third party.

Mutual authentication

Traditional NTLM authentication rests on the premise that the remote services a client accesses are trusted and do not need to be authenticated, so NTLM does not provide two-way authentication. That premise is obviously somewhat idealistic. Kerberos fills the gap: before accessing a server’s resources, the client can require the server to prove its identity.

Support for delegation

Impersonation and delegation are two important features in a distributed environment. Impersonation lets a server perform operations locally under the logged-on user’s account. Delegation requires the server to carry the logged-on user’s account into another context and impersonate the user there. NTLM supports only impersonation, whereas Kerberos supports delegation through a two-way, transitive trust model.

Interoperability

Kerberos was originally invented at MIT and has become a widely accepted standard, so it interoperates broadly across different platforms.

2. Disadvantages

  • Kerberos authentication uses symmetric encryption, with the same key for encryption and decryption, so it is difficult to guarantee the security of key exchange.

  • The key shared between the Kerberos server and a user is derived from the user’s password. When responding, the server does not verify that the user is genuine; it simply assumes that only the legitimate user knows the password. An attacker who intercepts the response message can therefore mount a password-guessing attack.

  • The AS (Authentication Service) and the TGS are centrally managed, which can easily create a bottleneck. The performance and security of the whole system depend heavily on the performance and security of the AS and the TGS, so access control should be placed in front of them to strengthen their security.

  • As the number of users grows, key management becomes complicated. Kerberos stores a hash of each user’s password, and the AS and the TGS are responsible for distributing the keys used for communication between users. If n users all want to communicate with each other at the same time, n × (n-1)/2 keys need to be maintained.
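To give a feel for how quickly the n × (n-1)/2 figure in the last point grows, here is a tiny Python sketch. The function name pairwise_keys is a label chosen for this example; the formula is simply the number of distinct pairs among n users.

```python
def pairwise_keys(n: int) -> int:
    """Number of distinct user pairs among n users: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for users in (10, 100, 1000):
    print(users, pairwise_keys(users))
# 10 45
# 100 4950
# 1000 499500
```

Multiplying the number of users by ten multiplies the number of keys by roughly a hundred.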

8. Summary

This article introduced Kerberos through an overview, a description of the authentication process, an explanation of the basic concepts, some practical notes, and a look at its advantages and disadvantages. The next article will be a tutorial on using Hadoop services in a Kerberos environment.
