Many of phData’s customers need to connect to a Kerberos-secured source from a Spark application. A source can be a JDBC connection such as Impala, or a web URL that uses Kerberos for authentication. A simple workaround is to run the application on YARN in client deploy mode, but phData recommends running all Spark applications in cluster mode. This post shows how to connect to a secured source in cluster mode, using the example of a Spark streaming app that connects to secured Kafka.
What’s the Problem?
When Spark runs on YARN on a secured cluster, the user first needs to run kinit. When a job is then submitted, delegation tokens are sent to the Application Master (AM) and the executors. Those delegation tokens cover HDFS, HBase, and YARN. When a developer wants to connect to a different kerberized source and run the application in cluster mode, the connection fails, because no delegation token exists for that source and the user's Kerberos ticket is not available on the cluster nodes. Common examples are accessing Kudu through Impala JDBC drivers and accessing secured Kafka.
This article solves the problem with a JAAS configuration file (jaas.conf).
Please note that to run in cluster mode, all references to keytabs must use relative paths, because files shipped with the job are placed in each container's working directory rather than at a fixed absolute location.
Example jaas.conf (frank_jaas.conf):
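A minimal sketch of such a file for a Kafka client, written here through a shell heredoc. The principal frank@EXAMPLE.COM, the realm, and the keytab file name frank.keytab are assumptions for illustration, not values from a real cluster. The keytab is referenced by a relative path, as cluster mode requires.

```shell
# Write a JAAS configuration for the Kafka client login context.
# Hypothetical values: principal "frank@EXAMPLE.COM", keytab "frank.keytab".
cat > frank_jaas.conf <<'EOF'
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="frank.keytab"
  principal="frank@EXAMPLE.COM"
  serviceName="kafka";
};
EOF
```

The section name KafkaClient is the login context that Kafka clients look up. Other kerberized sources typically expect a different section name, so check the client's documentation before reusing this layout.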
We ship the jaas.conf along with a keytab to the application master and the executors by specifying the --files option in spark-submit. This is different from using the --principal and --keytab options in spark-submit. Note that both the driver and the executors need java.security.auth.login.config set, through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.
In the example below, we set -Dsun.security.krb5.debug=false, but you can set it to true if you want Kerberos debug output for troubleshooting.
If a custom truststore is required, the same approach applies: ship the truststore file with --files and set the system properties javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword (…extraJavaOptions…).
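The spark-submit command refers to a $FILES variable. One way it might be assembled is sketched here, assuming the JAAS file, a keytab named frank.keytab (a hypothetical name), and the truststore all sit in the directory from which spark-submit is run. The --files option takes a comma-separated list.

```shell
# Hypothetical file names; adjust to your own principal and truststore.
JAAS_CONF="frank_jaas.conf"
KEYTAB="frank.keytab"
TRUSTSTORE="my_truststore.truststore"

# --files expects a comma-separated list with no spaces.
FILES="${JAAS_CONF},${KEYTAB},${TRUSTSTORE}"
echo "$FILES"
```

Each listed file is copied into the working directory of every YARN container, which is why the JAAS file and the Java options can refer to them by bare file name.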
Spark submit example command:
spark-submit \
--name my_streaming_app \
--master yarn --deploy-mode cluster \
--num-executors 1 \
--files $FILES \
--conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=frank_jaas.conf -Dsun.security.krb5.debug=false -Djavax.net.ssl.trustStore=my_truststore.truststore -Djavax.net.ssl.trustStorePassword=changeit" \
--conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=frank_jaas.conf -Dsun.security.krb5.debug=false -Djavax.net.ssl.trustStore=my_truststore.truststore -Djavax.net.ssl.trustStorePassword=changeit" \
--class io.phdata.spark.streaming.StreamingDriver \