May 4, 2020

Apache Kudu Integration Testing in Scala/SBT Applications

By Brian McDevitt

Introduction to Kudu Integration Testing

Beginning with the 1.9.0 release, Apache Kudu published new testing utilities that include Java libraries for starting and stopping a pre-compiled Kudu cluster. These utilities let JVM developers test against a locally running Kudu cluster without any knowledge of Kudu's internal components or its different processes. Cloudera published a great introductory blog post covering the general usage of the utilities. Unfortunately, SBT (a popular JVM build tool) lacked a way to select the operating-system-specific kudu-binary artifact that the utilities depend on. To solve this issue, I created a new SBT plugin, called sbt-os-detector, and made it publicly available. This post describes how to configure a Scala application built with SBT to use the Kudu integration testing utilities.

Configure SBT

In the build definition, add dependencies for the kudu-test-utils and kudu-binary libraries. In this example, the integration test sources are kept separate from the unit test sources in SBT's IntegrationTest ("it") configuration. See the SBT documentation on how and why to create separate test configurations.

File: build.sbt

lazy val root = (project in file("."))
  .configs(IntegrationTest)                 // adds the "it" configuration
  .enablePlugins(OsDetectorPlugin)          //<1>
  .settings(
    Defaults.itSettings,                    // sources in src/it/scala, resources in src/it/resources
    name := "sbt-int-test-example",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.8" % "it,test",
    libraryDependencies += "org.apache.kudu" % "kudu-client" % "1.11.1",
    libraryDependencies += "org.apache.kudu" % "kudu-test-utils" % "1.11.1" % "it",                                     //<2>
    libraryDependencies += "org.apache.kudu" % "kudu-binary" % "1.11.1" % "it" classifier osDetectorClassifier.value, //<3>
    libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.2.3"
  )

<1> Enable the SBT OS Detector plugin

<2> Provides utilities for working with the Kudu test cluster

<3> The kudu-binary dependency that matches the current operating system. Note: Linux and macOS are the only compile targets for Kudu. Windows is not a supported OS for Apache Kudu, so resolving the kudu-binary dependency will fail on Windows hosts.

The kudu-binary JAR is published separately for each supported platform, so SBT must select the one that matches the host. That requires a new SBT plugin called sbt-os-detector, which detects the host operating system and CPU architecture and exposes the result as osDetectorClassifier (used in build.sbt above). To use this plugin, include the following lines in the plugins.sbt file:

File: project/plugins.sbt

resolvers += "phData Releases" at "" //<1>
classpathTypes += "maven-plugin"                        //<2>
addSbtPlugin("io.phdata" % "sbt-os-detector" % "0.2.0") //<3>

<1> Include an additional repository for plugin resolution

<2> Resolve dependencies of type ‘maven-plugin’, which is required to resolve a transitive dependency (the OS Maven Plugin)

<3> The SBT OS Detector plugin
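The plugin's internals are outside the scope of this post, but it can help to see what a classifier such as osDetectorClassifier contains. The following is a hypothetical sketch that derives a value like linux-x86_64 or osx-x86_64 from the JVM's os.name and os.arch system properties, loosely following the naming used by the OS Maven Plugin. The OsClassifier object and its normalization rules are assumptions for illustration only, not sbt-os-detector's actual implementation, and they cover just a few common values.

```scala
// Hypothetical sketch: NOT the sbt-os-detector implementation.
// Derives an OS/architecture classifier from JVM system properties.
object OsClassifier {

  // Map the JVM's os.name value to a short OS family name (assumed rules).
  def normalizeOs(osName: String): Option[String] = {
    val name = osName.toLowerCase
    if (name.startsWith("linux")) Some("linux")
    else if (name.startsWith("mac") || name.startsWith("osx")) Some("osx")
    else None // e.g. Windows, for which no kudu-binary artifact exists
  }

  // Map the JVM's os.arch value to a normalized CPU architecture name.
  def normalizeArch(osArch: String): Option[String] = osArch.toLowerCase match {
    case "amd64" | "x86_64" | "x86-64" => Some("x86_64")
    case "aarch64" | "arm64"           => Some("aarch_64")
    case _                             => None
  }

  // Combine both parts into a classifier such as "linux-x86_64".
  def classifier(osName: String, osArch: String): Option[String] =
    for {
      os   <- normalizeOs(osName)
      arch <- normalizeArch(osArch)
    } yield s"$os-$arch"

  // Classifier for the JVM that is currently running.
  def current: Option[String] =
    classifier(System.getProperty("os.name"), System.getProperty("os.arch"))
}
```

For example, OsClassifier.classifier("Mac OS X", "x86_64") yields Some("osx-x86_64"), while a Windows host yields None. Returning None makes the unsupported case explicit in this sketch; with the real plugin, the same situation surfaces as the kudu-binary resolution failure described in note <3> of build.sbt.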

Implement the Integration Test

Now that SBT resolves the necessary dependencies, the next step is to implement an integration test. This example uses the ScalaTest testing library. The example project has a single class, KuduExample, whose constructor takes one parameter (a KuduClient) and which has one public method (KuduExample#createMovieTable). Because this is an example project, the code is very simple. The class under test is instantiated with the KuduClient from the test harness, which provides fully configured client instances without the developer needing to configure host:port values for the Kudu master server(s).
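The post does not show KuduExample itself. Below is a minimal, hypothetical sketch of what such a class could look like, assuming createMovieTable returns a scala.util.Try[KuduTable], which matches how the test pattern-matches on Success and Failure. The column names, the hash partitioning on movie_id, and the single-replica setting are illustrative assumptions, not the code from the example project. It would live under src/main/scala in the same package as the test (org.apache.kudu.scala.examples) and needs only the kudu-client dependency.

```scala
import java.util.Arrays

import org.apache.kudu.{ColumnSchema, Schema, Type}
import org.apache.kudu.client.{CreateTableOptions, KuduClient, KuduTable}

import scala.util.Try

// Hypothetical sketch of the class under test (assumed schema and options).
class KuduExample(client: KuduClient) {

  // Create the table; any exception thrown by the client becomes a Failure.
  def createMovieTable(tableName: String): Try[KuduTable] =
    Try(client.createTable(tableName, KuduExample.movieSchema, KuduExample.tableOptions))
}

object KuduExample {

  // One INT32 primary-key column (listed first) plus two STRING columns.
  val movieSchema: Schema = new Schema(Arrays.asList(
    new ColumnSchema.ColumnSchemaBuilder("movie_id", Type.INT32).key(true).build(),
    new ColumnSchema.ColumnSchemaBuilder("title", Type.STRING).build(),
    new ColumnSchema.ColumnSchemaBuilder("genre", Type.STRING).nullable(true).build()
  ))

  // Kudu requires a partitioning scheme, so hash-partition on the key.
  // A single replica keeps the table valid on a small test cluster.
  def tableOptions: CreateTableOptions =
    new CreateTableOptions()
      .addHashPartitions(Arrays.asList("movie_id"), 2)
      .setNumReplicas(1)
}
```

Wrapping the blocking createTable call in Try keeps the test's Success/Failure match simple, and defining the schema in the companion object means it can be inspected without a running cluster.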

File: src/it/scala/org/apache/kudu/scala/examples/KuduExampleITest.scala

package org.apache.kudu.scala.examples

import org.apache.kudu.test.KuduTestHarness
import org.scalatest.{BeforeAndAfter, FunSuite}

import scala.util.{Failure, Success}

class KuduExampleITest extends FunSuite with BeforeAndAfter {

  private val harness = new KuduTestHarness() //<1>

  before {
    harness.before() //<3>
  }

  after {
    harness.after()  //<3>
  }

  test("create table example") {
    val kuduExample = new KuduExample(harness.getClient) //<2>
    val tableName = "testMovies"
    val testMovieTable = kuduExample.createMovieTable(tableName)
    testMovieTable match {
      case Failure(exception) => fail(exception)
      case Success(table) => assertResult(tableName)(table.getName)
    }
  }
}

<1> Create a new instance of the KuduTestHarness class to use the default settings of the MiniKuduCluster

<2> Kudu clients (both synchronous and asynchronous) are provided automatically by the test harness

<3> Setup and teardown of the MiniKuduCluster

Run the Integration Test

SBT is now configured to resolve dependencies and to compile and execute the integration test. Simply run the test task scoped to the it configuration:

$ sbt it:test

SBT will show a lot of output from the MiniKuduCluster during test execution. You can control logging output with any of the supported SLF4J logging frameworks. This example uses Logback to configure logging, specifically the file src/it/resources/logback.xml. The end of the test execution will be marked by a summary of the test results.

[info] KuduExampleITest:
[info] - create table example
[info] Run completed in 10 seconds, 617 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 11 s, completed Mar 23, 2020 2:12:47 PM


If you’re interested in learning more, you can find the complete example, and many more, in the examples directory of the Kudu source code repository. You can be confident in the quality of these test libraries because the Kudu project itself uses the KuduTestHarness for all of its own integration tests. As a reminder, every application is different, and yours may require a slightly different SBT configuration. Please reach out to phData for additional information or assistance in getting Kudu integration testing implemented.
