Reducing the Time to Value of your dbt Deployment with Slim CI

So you’ve been using dbt for a bit now…

You have all of your transformations in dbt and your deployments are executing flawlessly, plus you’ve noticed your development velocity has greatly increased. However, as your dbt repo has grown, you’ve begun to see your deployments take longer and longer.

You’ve spent a lot of time tagging your code to optimize your data refreshes, and while your refreshes run quickly, your deployments don’t. So how do we solve this?

Thankfully, dbt has a solution to this, and it is called Slim CI.

What is Slim CI?

Slim CI is a wonderful feature built into dbt that allows us to run only certain portions of our code tree based on the activity of our last job run. It does this through the creation of a manifest that it uses to track various things about the last run.

This manifest, together with the other artifacts dbt writes alongside it (run_results.json for run and test results, sources.json for source freshness), can be used to identify code that has changed, models that errored, models that were skipped, models that succeeded, tests that failed, tests that passed, etc. As you can see, this is very useful for trimming our dbt deployments so that we are only running changes.

So what does this look like?

dbt gives you three selector methods to use in your commands: state, result, and source_status. The state method selects models that have changed since the last run. The result method selects models or tests by the status they finished with in the last run (for example, errored or failed). The source_status method selects only models whose sources have fresher data than they did at the last run.

Below are the commonly used Slim CI selectors:

state:modified
result:fail
result:error
result:warn
result:success
result:skipped
result:pass
source_status:fresher

These selectors allow us to pass in a state file (or if you’re using dbt Cloud, just specify the manifest within your job and you’re done) that we can use to identify these sorts of changes and model states since the last job was run.
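
As a hedged illustration of the result selector, the sketch below reruns whatever errored or failed in the previous run, plus everything downstream. The path path/to/artifacts is this article’s placeholder, and the dry-run fallback (printing the command when dbt, a project, or the saved artifacts are missing) is an assumption added so the snippet is safe to paste anywhere.

```shell
#!/bin/sh
# Sketch: rerun models that errored and tests that failed in the previous run.
# STATE_DIR is a placeholder; it must hold the previous run's artifacts.
STATE_DIR="${STATE_DIR:-path/to/artifacts}"

# The trailing + also selects everything downstream of the matched nodes.
set -- dbt build --select "result:error+" "result:fail+" --state "$STATE_DIR"
retry_cmd="$*"

if command -v dbt >/dev/null 2>&1 \
  && [ -f dbt_project.yml ] \
  && [ -f "$STATE_DIR/run_results.json" ]; then
  "$@"
else
  # Dry run: something is missing, so only print the command.
  echo "Would run: $retry_cmd"
fi
```

The result selectors read the saved run_results.json, so the state folder needs that file in addition to the manifest.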

If you’re using dbt Core, these manifest files are stored in the target folder. You can grab the files and keep them in your git repository to help track changes over time.
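
One possible way to keep those files, sketched under the assumption that dbt writes to its default target folder and that path/to/artifacts is where you want the saved state to live. Both locations are placeholders you would adjust for your project.

```shell
#!/bin/sh
# Sketch: copy the artifacts from the latest run into a separate state folder.
TARGET_DIR="${TARGET_DIR:-target}"           # dbt's default output folder
STATE_DIR="${STATE_DIR:-path/to/artifacts}"  # placeholder location for saved state

mkdir -p "$STATE_DIR"

saved_count=0
for artifact in manifest.json run_results.json sources.json; do
  if [ -f "$TARGET_DIR/$artifact" ]; then
    cp "$TARGET_DIR/$artifact" "$STATE_DIR/$artifact"
    saved_count=$((saved_count + 1))
    echo "saved $artifact"
  else
    # sources.json, for example, only exists after a source freshness run.
    echo "skipped $artifact (not found in $TARGET_DIR)"
  fi
done
```

Keeping the saved state in a folder other than target matters, because the next dbt run overwrites the files in target.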

How to Optimize dbt Deployments

Now, let’s optimize your dbt deployment using Slim CI. Once you’ve configured dbt to track the manifest (whether through dbt Core by keeping manifest files, or by specifying the manifest to use in dbt Cloud), optimizing your deployments is very easy. You just need to update your dbt deployment job to execute the following command:

dbt build --select state:modified+
- If you’re using dbt Core, make sure to append the following to your run:
- --defer --state path/to/artifacts

This command runs dbt build, but only for models that have changed and the downstream models that depend on them. With dbt Core, the defer flag tells dbt to resolve references to unselected upstream models using the saved state instead of rebuilding them. This shortens your deployments, so business users get changes to their models sooner.
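
For dbt Core, the deployment step above could be wrapped roughly as follows. This is a minimal sketch, not a prescribed setup: the fallback to a full build when no saved manifest exists is an assumption about how a first run might be handled, and path/to/artifacts remains a placeholder.

```shell
#!/bin/sh
# Sketch: Slim CI deployment step for dbt Core.
STATE_DIR="${STATE_DIR:-path/to/artifacts}"  # placeholder location of saved state

if [ -f "$STATE_DIR/manifest.json" ]; then
  # A previous manifest exists: build only modified models and their children.
  set -- dbt build --select state:modified+ --defer --state "$STATE_DIR"
  mode="slim"
else
  # No baseline to compare against (assumed first run): build everything.
  set -- dbt build
  mode="full"
fi
deploy_cmd="$*"

if command -v dbt >/dev/null 2>&1 && [ -f dbt_project.yml ]; then
  "$@"
else
  # Dry run outside a dbt project: only print what would be executed.
  echo "Would run ($mode): $deploy_cmd"
fi
```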

Optimize dbt Model Refreshes

What about optimizing your build to only run models that actually have data that is more recent than the last run? You can also do this using the source_status selector.

For this to work, your dbt project needs source freshness checks defined, and the artifacts from your previous run must include the results of a source freshness run to compare against. Then you can update your dbt model refresh runs to look something like this:

dbt source freshness
dbt build --select source_status:fresher+
- If you’re using dbt Core, make sure to append the following to your run:
- --state path/to/artifacts

What this does is begin by executing your source freshness tests, then build any models and their downstream models that have a source_status that is fresher or more recent than the last time the source freshness was run in the artifacts.

This allows us to easily make sure we are only running models that have recently had their underlying data changed since the last time we refreshed our models. If you combine this with incremental models, you can optimize your runs to be as efficient as possible and drive value and data to the business at or near real-time.
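
Put together for dbt Core, the refresh flow might look like the sketch below. It assumes the previous run’s sources.json was saved to the placeholder folder path/to/artifacts, and the final copy step (promoting the new freshness results for the next comparison) is an illustrative choice rather than something dbt does for you.

```shell
#!/bin/sh
# Sketch: refresh only the models whose sources have fresher data.
TARGET_DIR="${TARGET_DIR:-target}"           # dbt's default output folder
STATE_DIR="${STATE_DIR:-path/to/artifacts}"  # placeholder location of saved state

if command -v dbt >/dev/null 2>&1 \
  && [ -f dbt_project.yml ] \
  && [ -f "$STATE_DIR/sources.json" ]; then
  # 1. Record current freshness (written to $TARGET_DIR/sources.json).
  # 2. Build models downstream of sources fresher than the saved results.
  # 3. Save the new freshness results as the baseline for the next run.
  if dbt source freshness \
    && dbt build --select source_status:fresher+ --state "$STATE_DIR" \
    && cp "$TARGET_DIR/sources.json" "$STATE_DIR/sources.json"; then
    status="ran"
  else
    status="failed"
  fi
else
  echo "Skipping: need dbt, a dbt project, and $STATE_DIR/sources.json from a previous run"
  status="skipped"
fi
```

The saved state has to live outside the target folder, otherwise the fresh sources.json would overwrite the one it is meant to be compared against.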

Closing

With the combination of dbt, the Snowflake Data Cloud, automation, source control, and Slim CI, we are bringing our transformation pipelines into the future. They are highly reproducible, and we are able to optimize the deployments and refreshes of our data thanks to the power of dbt and everything that comes with it.

Need help getting started with Snowflake or dbt? Our team of data experts is happy to assist. Reach out today!
