Since HDP 2.5, Hortonworks ships its HDP Sandbox as a Docker container inside a virtual machine image. For developers working with multiple HDP Sandbox VMs this is not ideal, as every connection has to be tunneled through the VM host into the Docker container. That’s why we are building our own custom Sandbox. However, when building a kerberized Hadoop installation, it is a bit tricky to configure the hostname such that Kerberos principals resolve the `_HOST` variable properly.

## Ambari’s autogenerated Kerberos principals

With Ambari and Ambari Blueprints, automated Hadoop cluster installations are quite convenient; one can simply describe all components and configurations in a Blueprint JSON file. When it comes to Kerberos, Ambari automatically takes care of creating all principals and keytabs. However, I ran into a strange Kerberos authentication error: Ambari resolved the `_HOST` variable to `localhost` in all principals, despite the hostname `sandbox.chaosmail.at` being set in `/etc/hosts` and `/etc/hostname`. Hence, the Kerberos principals were not valid.

It turned out that the error occurred because the entry `127.0.0.1 sandbox.chaosmail.at` was placed in the last line of `/etc/hosts` instead of the first. Here is how I debugged the error.

## Diving into Ambari source code

If we dive into the Ambari source code on GitHub and search for `_HOST`, we quickly find the following code snippet.

```java
String hostname = record.get(KerberosIdentityDataFileReader.HOSTNAME);

if (KerberosHelper.AMBARI_SERVER_HOST_NAME.equals(hostname)) {
    // Replace KerberosHelper.AMBARI_SERVER_HOST_NAME with the actual hostname where the Ambari
    // server is... this host
    hostname = StageUtils.getHostName();
}

// Evaluate the principal "pattern" found in the record to generate the "evaluated principal"
// by replacing the _HOST and _REALM variables.
String evaluatedPrincipal = principal.replace("_HOST", hostname).replace("_REALM", defaultRealm);
```


Bingo, that’s the place where the `_HOST` variable gets resolved. In our case, with host and server running on the same machine, the variable is replaced by the return value of the `StageUtils.getHostName()` function.
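The substitution itself is plain string replacement. The sketch below reproduces the `replace` chain from the snippet in isolation; the class name, the principal pattern `nn/_HOST@_REALM` and the realm `EXAMPLE.COM` are hypothetical examples, not values taken from Ambari.

```java
// Hypothetical stand-alone sketch of the _HOST/_REALM substitution shown in the Ambari snippet.
class PrincipalResolver {

    // Same replace chain as in Ambari: every "_HOST" becomes the hostname,
    // then every "_REALM" becomes the realm.
    static String evaluatePrincipal(String pattern, String hostname, String realm) {
        return pattern.replace("_HOST", hostname).replace("_REALM", realm);
    }

    public static void main(String[] args) {
        String realm = "EXAMPLE.COM"; // hypothetical default realm
        // Correctly resolved hostname: a host-specific service principal.
        System.out.println(evaluatePrincipal("nn/_HOST@_REALM", "sandbox.chaosmail.at", realm));
        // Canonical hostname resolved to "localhost": the principal no longer names the sandbox.
        System.out.println(evaluatePrincipal("nn/_HOST@_REALM", "localhost", realm));
    }
}
```

This prints `nn/sandbox.chaosmail.at@EXAMPLE.COM` followed by `nn/localhost@EXAMPLE.COM`; the second line corresponds to the broken principals described above.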

Let’s find this function in the source code and look at the relevant line.

```java
server_hostname = InetAddress.getLocalHost().getCanonicalHostName().toLowerCase();
```


Now we know that, when Ambari autogenerates principals, the `_HOST` variable in a Kerberos principal is replaced with the lowercased output of `getCanonicalHostName()`, a method of the standard library class `java.net.InetAddress`.

## Testing the hostname

Let’s put the pieces together and write a small Java program that prints the hostname using the `getCanonicalHostName()` function.

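```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class PrintHostname {

    public static void main(String[] args) {
        String server_hostname;
        try {
            // Same expression Ambari's StageUtils.getHostName() uses to resolve _HOST
            server_hostname = InetAddress.getLocalHost().getCanonicalHostName().toLowerCase();
        } catch (UnknownHostException e) {
            System.out.println("Could not find canonical hostname");
            server_hostname = "localhost";
        }

        System.out.println(server_hostname);
    }
}
```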


We can compile and run the program using the following commands:

```shell
javac PrintHostname.java
java PrintHostname
```
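Since `/etc/hostname` already contained `sandbox.chaosmail.at`, it can help to print the plain hostname next to the canonical one. The following variant is a sketch, not Ambari code; the class name `CompareHostnames` is hypothetical, and the output depends entirely on the machine's resolver configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: shows the plain and the canonical local hostname side by side.
class CompareHostnames {

    static String report() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            // getHostName(): the system hostname (typically configured via /etc/hostname).
            // getCanonicalHostName(): result of a reverse lookup of the local address,
            // i.e. the value Ambari substitutes for _HOST (after lowercasing).
            return "getHostName():          " + local.getHostName() + "\n"
                 + "getCanonicalHostName(): " + local.getCanonicalHostName().toLowerCase() + "\n"
                 + "getHostAddress():       " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "Could not resolve the local hostname: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

In the failing setup described here, one would expect the first line to show `sandbox.chaosmail.at` while the second shows `localhost`.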


Finally, we can test the two versions of `/etc/hosts`.

```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1   sandbox.chaosmail.at
```


Using the above hosts file, the `PrintHostname` program outputs `localhost`.

```
127.0.0.1   sandbox.chaosmail.at
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
```


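Using the above hosts file, the `PrintHostname` program outputs `sandbox.chaosmail.at`. The order matters because `getCanonicalHostName()` performs a reverse lookup of the local address `127.0.0.1`, and when that lookup is answered from `/etc/hosts`, the first line listing the address wins.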
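To illustrate why the order of the lines matters, here is a simplified model of a reverse lookup answered from `/etc/hosts`. It assumes the resolver scans the file top to bottom and returns the first name on the first line matching the address; `HostsReverseLookup` is a hypothetical helper, not code from Java or glibc.

```java
import java.util.List;

// Simplified model (an assumption, not the real resolver) of a files-based reverse lookup.
class HostsReverseLookup {

    // Returns the canonical (first) name of the first line whose address matches, or null.
    static String firstNameFor(String address, List<String> hostsLines) {
        for (String line : hostsLines) {
            // Strip comments and surrounding whitespace.
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] fields = content.split("\\s+");
            // fields[0] is the IP address, fields[1] the canonical name on that line.
            if (fields.length >= 2 && fields[0].equals(address)) {
                return fields[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> localhostFirst = List.of(
            "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4",
            "::1         localhost localhost.localdomain localhost6 localhost6.localdomain6",
            "127.0.0.1   sandbox.chaosmail.at");
        List<String> sandboxFirst = List.of(
            "127.0.0.1   sandbox.chaosmail.at",
            "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4",
            "::1         localhost localhost.localdomain localhost6 localhost6.localdomain6");

        System.out.println(firstNameFor("127.0.0.1", localhostFirst)); // localhost
        System.out.println(firstNameFor("127.0.0.1", sandboxFirst));   // sandbox.chaosmail.at
    }
}
```

The model reproduces both observations: with the `localhost` line first, `127.0.0.1` maps back to `localhost`; with the sandbox line first, it maps back to `sandbox.chaosmail.at`.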

## Resolving _HOST in Kerberos principals

Finally, we can be sure that `getCanonicalHostName()` returns `sandbox.chaosmail.at`, and hence the `_HOST` variable will resolve to `sandbox.chaosmail.at`. This means that all principals generated by Ambari will contain the proper hostname and will therefore be valid.
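Because a `localhost` principal only surfaces later as an authentication error, one might run a quick pre-flight check on the sandbox before kerberizing it. The sketch below is hypothetical: the class name and the heuristic (reject `localhost*` names, bare IP addresses and names without a domain part) are assumptions, not checks Ambari performs itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical pre-flight check; the heuristic below is an assumption, not an Ambari rule.
class KerberosHostnameCheck {

    static boolean looksUsableForHostPrincipals(String hostname) {
        String h = hostname.toLowerCase();
        return !h.startsWith("localhost")   // localhost, localhost.localdomain, ...
            && !h.matches("[0-9.]+")        // bare IPv4 address (reverse lookup failed)
            && !h.contains(":")             // bare IPv6 address
            && h.contains(".");             // expect a fully qualified name
    }

    public static void main(String[] args) throws UnknownHostException {
        // Same expression as the StageUtils.getHostName() line quoted earlier.
        String canonical = InetAddress.getLocalHost().getCanonicalHostName().toLowerCase();
        if (looksUsableForHostPrincipals(canonical)) {
            System.out.println("OK: _HOST will resolve to " + canonical);
        } else {
            System.out.println("WARNING: _HOST would resolve to " + canonical
                + "; check the order of entries in /etc/hosts");
        }
    }
}
```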