Accessing Kerberized Sources From Spark2 In Cluster Mode on Yarn


Introduction

Many of phData’s customers need to connect to a source secured via Kerberos from a Spark application. A source can be a JDBC connection such as Impala, or a web URL that uses Kerberos for authentication. While a simple workaround is to run the application on YARN with deploy-mode client, phData recommends running all Spark applications in cluster mode. This post covers how to connect to a secured source in cluster mode, using the example of connecting to secured Kafka from a Spark Streaming app.

What’s the Problem?

When Spark runs on YARN on a secured cluster, the user needs to kinit before submitting a job. After the kinit, when a job is submitted, delegation tokens are sent to the Application Master (AM) and the executors. Those delegation tokens cover only HDFS, HBase, and YARN. When a developer wants to connect to a different Kerberized source and run the application in cluster mode, authentication fails, because no delegation token exists for that source. Accessing Kudu through the Impala JDBC driver is a common use case, as is accessing secured Kafka.

The Solution

The solution in this article uses a JAAS (Java Authentication and Authorization Service) configuration file, jaas.conf.

Please note that in order to run in cluster mode, all references to keytabs need to use relative paths, because files shipped with --files land in the working directory of each YARN container.

Example jaas.conf (frank_jaas.conf):

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="frank.keytab"
  principal="frank@PHDATA.IO";
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  doNotPrompt=true
  useKeyTab=true
  keyTab="frank.keytab"
  principal="frank@PHDATA.IO"
  storeKey=true
  useTicketCache=false;
};

We ship the jaas.conf along with the keytab to the Application Master and the executors by specifying the --files option of spark-submit. This is different from using the --principal and --keytab options of spark-submit. Please note the extraJavaOptions configuration for both the driver and the executors.
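The jaas.conf handles the Kerberos login itself; the Kafka consumer inside the streaming app must also be told to use SASL. A minimal sketch of the relevant consumer properties follows. These values are assumptions, not taken from the original post: kafka is the common default service principal name, and whether to use SASL_PLAINTEXT or SASL_SSL depends on how the brokers are configured.

```
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
```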

In the example below, we set -Dsun.security.krb5.debug=false; set it to true to get debug output from Kerberos for troubleshooting.

TLS/SSL

If a custom truststore is required, the same approach can be followed: ship the truststore file with --files and set the system properties javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword via extraJavaOptions.

Spark submit example command:

FILES=frank_jaas.conf,frank.keytab,my_truststore.truststore

spark2-submit \
  --name my_streaming_app \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --files $FILES \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=frank_jaas.conf -Dsun.security.krb5.debug=false -Djavax.net.ssl.trustStore=my_truststore.truststore -Djavax.net.ssl.trustStorePassword=changeit" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=frank_jaas.conf -Dsun.security.krb5.debug=false -Djavax.net.ssl.trustStore=my_truststore.truststore -Djavax.net.ssl.trustStorePassword=changeit" \
  --class io.phdata.spark.streaming.StreamingDriver \
  /home/frank/streaming-driver-2.0.7-jar-with-dependencies.jar
