October 4, 2022

Reducing the Time to Value of your dbt Deployment with Slim CI

By Dakota Kelley

So you’ve been using dbt for a bit now…

You have all of your transformations in dbt, your deployments are executing flawlessly, and you've noticed your development velocity has greatly increased. However, as your dbt repo has grown, your deployments have started taking longer and longer.

You’ve spent a lot of time tagging your code to optimize your data refreshes, and while your refreshes run quickly, your deployments aren’t. So how do we solve this? 

Thankfully, dbt has a solution to this: it's called Slim CI.

What is Slim CI?

Slim CI is a wonderful feature built into dbt that allows us to run only certain portions of our code tree based on the activity of our last job run. It does this through the creation of a manifest that is used to track various things about the last run.

This manifest can be used to help identify code that has changed, models that errored, models that were skipped, models that succeeded, tests that failed, tests that passed, and so on. As you can see, this can be very useful for trimming our dbt deployments down so that we are only running what changed.

So what does this look like? 

You have a few selectors you can use within your dbt commands. The three are: state, result, and source_status. The state selector lets us find models whose code has changed and run just those; the result selector runs models or tests based on their status after the last run; and the source_status selector executes only models whose sources have become fresher since the last run.

Below are all of the Slim CI selectors:

  • state:modified
  • result:fail
  • result:error
  • result:warn
  • result:success
  • result:skipped
  • result:pass
  • source_status:fresher
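
As a concrete example, the result selectors make it easy to retry only what broke. Here's a minimal sketch (the artifact path is an assumption; point it at wherever your previous run's artifacts live):

  • dbt build --select result:error+ result:fail+ --state path/to/artifacts

This reruns the models that errored and the tests that failed on the last run, plus everything downstream of them.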

These selectors allow us to pass in a state file that dbt uses to identify these sorts of changes and model statuses since the last job ran (or, if you're using dbt Cloud, just specify the manifest to compare against within your job and you're done).

If you’re using dbt Core, these manifest files are stored in the target folders, you can grab the files and keep them in your git repository to help track changes as time runs on.

How to Optimize dbt Deployments

Now, let’s optimize your dbt deployment using Slim CI. Once you’ve configured your dbt to track the manifest (whether through dbt Core by keeping manifest files, or by specifying the manifest to use in dbt Cloud), optimizing your deployments is very easy. You just need to update your dbt deployment job to execute the following command:

  • dbt build --select state:modified+
    • If you're using dbt Core, make sure to append the following flags to your run:
    • --defer --state path/to/artifacts

This command executes dbt build, but only for the models that have changed and any downstream models that depend on them. (The --defer flag tells dbt Core to resolve references to unchanged, unselected models against the artifacts you pass in, rather than requiring you to rebuild them.) This can save you time on your deployments and allow you to push them out more quickly, getting changes to business users more efficiently.

Optimize dbt Model Refreshes

What about optimizing your refreshes to only run models whose underlying data is actually more recent than the last run? You can do this using the source_status selector as well.
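
Under the hood, this selector compares against source freshness checks, which you configure on your sources in YAML. A minimal sketch (the source, table, and column names are assumptions for illustration):

  version: 2

  sources:
    - name: raw
      loaded_at_field: _loaded_at
      freshness:
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
      tables:
        - name: orders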

For this to work, however, you need source freshness checks in your dbt project for it to compare against, and your previous run's artifacts must include a source freshness snapshot. Then you can update your dbt model refresh runs to look something like this:

  • dbt source freshness
  • dbt build --select source_status:fresher+
    • If you're using dbt Core, make sure to append the following to your run:
    • --state path/to/artifacts

What this does is begin by executing your source freshness checks, then build any models (and their downstream models) whose sources are fresher than they were the last time source freshness was captured in your artifacts.

This makes it easy to ensure we are only running models whose underlying data has actually changed since the last time we refreshed. If you combine this with incremental models, you can make your runs as efficient as possible and deliver data to the business in near real time.
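
As an illustration, a hypothetical incremental model might look like the sketch below (the source, column, and key names are assumptions, matching the freshness example earlier); on incremental runs it only processes rows that arrived since the last build:

  -- models/stg_orders.sql (hypothetical)
  {{ config(materialized='incremental', unique_key='order_id') }}

  select *
  from {{ source('raw', 'orders') }}

  {% if is_incremental() %}
    -- only pick up rows loaded since this model was last built
    where _loaded_at > (select max(_loaded_at) from {{ this }})
  {% endif %}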

Closing

With the combination of dbt, the Snowflake Data Cloud, automation, source control, and Slim CI, we are bringing our transformation pipelines into the future. They are highly reproducible, and we are able to optimize the deployments and refreshes of our data thanks to the power of dbt and everything that comes with it.

Need help getting started with Snowflake or dbt? Our team of data experts is happy to assist. Reach out today!
