So you’ve heard all the talk around dbt, but now you need to decide between dbt Core and dbt Cloud, and you want to know what advantages dbt Cloud has over the free dbt Core offering.
After a quick trial of dbt Cloud, the first things you might notice are the IDE and the ease of managing deployments. However, dbt Cloud offers you much more than that.
In this blog post, we’ll dive into six of the most powerful dbt Cloud features that you probably don’t know about.
The Cloud IDE
One of the first things you’re greeted with when you log in to dbt Cloud is the dedicated Integrated Development Environment (IDE). This IDE is fairly straightforward, allowing you to set a light or dark mode. It comes with lightweight autocompletion and syntax highlighting for both SQL and Jinja.
The real power is the ability to run your models and view the outputs, or to compile your SQL and verify that your Jinja renders into the model you expect. This can be very useful when troubleshooting your models and is far easier than digging through the compiled models in the target folder.
But most importantly, the IDE validates your work and configurations, allowing you to catch and correct errors before you ever attempt to execute dbt.
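For example, consider a hypothetical model like the one below (model and column names are made up for illustration). Compiling it in the IDE shows you exactly how the Jinja loop expands into plain SQL before you ever run it:

```sql
-- models/payments_pivoted.sql (hypothetical model)
-- The Jinja for-loop compiles into one aggregated CASE expression
-- per payment method, which the IDE lets you preview instantly.
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{{ "," if not loop.last }}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```

Spotting a bad column alias or a missing comma in the compiled preview is much faster than waiting for the warehouse to reject the query.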
Hosted Doc Site for Documentation
One of the most powerful features of dbt is the documentation you can generate. This documentation gives users insight into where data came from, what the profile of the data is, what the SQL looks like, and, via the DAG, where the data is being used. While dbt Core provides the ability to generate these documentation sites, you still have to find a place to host them.
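With dbt Core, that workflow typically looks like generating the static site and then serving or hosting it yourself:

```shell
# Generate the static documentation site into the target/ folder
dbt docs generate

# Serve it locally for review; hosting it for the wider business
# (object storage, an internal web server, etc.) is up to you
dbt docs serve --port 8080
```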
With dbt Cloud, however, this documentation is hosted in your dbt Cloud instance. In fact, you can configure a project’s default documentation to be the documentation generated by a production deployment. Your read-only accounts can then log in and view the current production documentation, while your dev team can generate and review dev documentation of their own before pushing it out to your general business users.
Exposures and Their Quality
On the note of documentation, dbt provides a construct known as an exposure, which lets you declare which final models are used by a particular data product or dashboard. The great thing is that once you tag your final models, the DAG brings all of their parent models into that exposure. This makes it much easier for analysts and business users to investigate their portion of the models in your project.
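A minimal exposure definition looks like the following (the names, URL, and owner details are illustrative):

```yaml
# models/exposures.yml -- illustrative names throughout
version: 2

exposures:
  - name: weekly_revenue_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/revenue   # hypothetical URL
    description: Weekly revenue reporting for the finance team
    owner:
      name: Finance Analytics
      email: analytics@example.com
    depends_on:
      - ref('fct_orders')       # tag only the final models;
      - ref('dim_customers')    # the DAG pulls in all parents
```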
But maybe your business users want to know whether the data they’re consuming is fresh and up to their standards for data quality. Those questions become exceptionally easy to answer with dbt Cloud, which provides a simple Metadata API you can use to generate an iFrame for your dashboard showing both the freshness of your data and whether your tests have passed.
Even if there is a failure, or if your data is stale, dbt Cloud provides a link to see the details of the tests and the sources that might be stale.
Data Health with the Metadata API
As mentioned earlier, dbt Cloud provides an easy-to-query Metadata API that contains all the metadata about a dbt project and its various runs. This GraphQL API allows you to easily query information about your dbt Cloud runs and projects, and helps you evaluate not just your current data health, but your data health at any point in time.
While there are some limitations on the retention of the artifacts, this API does support the last three months of runs and you can generate various dashboards to help monitor and alert yourself (and your team) to issues.
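As a sketch, a query along these lines returns the status and timing of each model in a run (the job ID is a placeholder, and the exact field names should be verified against the published Metadata API schema):

```graphql
# Query the dbt Cloud Metadata API (GraphQL) for model-level run info.
# jobId is a placeholder value.
query {
  models(jobId: 12345) {
    uniqueId
    status
    executionTime
    runGeneratedAt
  }
}
```

Feeding results like these into a BI tool gives you a run-over-run view of model durations and failures with almost no plumbing.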
Without the Metadata API, you would have to collect and parse the various run artifacts by hand every time you initiate a deployment or data refresh. That is a lot of work and rather difficult to do; the Metadata API provided with dbt Cloud makes this sort of analysis and data visualization straightforward.
Monitor and Initiate dbt Jobs with the Cloud API
Aside from the Metadata API, dbt Cloud also provides a REST API. This API allows you to monitor, create, run, cancel, or even override your various dbt jobs and pipelines, which is something you don’t have access to with dbt Core.
This makes it very easy for you to configure certain actions to execute based on other activities, such as a merge in your git repository. You can even extract the artifacts from various job executions in case you see any issues or want to monitor what activity is going on. You can see this sort of information in action when you watch how a job is executed in dbt Cloud.
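As a sketch, triggering a job over the v2 REST API can be done with nothing but the Python standard library. The account ID, job ID, and token below are placeholders you would replace with your own dbt Cloud values:

```python
import json
import urllib.request

API_BASE = "https://cloud.getdbt.com/api/v2"


def trigger_job_url(account_id: int, job_id: int) -> str:
    """Build the dbt Cloud v2 endpoint that triggers a run of a job."""
    return f"{API_BASE}/accounts/{account_id}/jobs/{job_id}/run/"


def trigger_job(account_id: int, job_id: int, token: str,
                cause: str = "Triggered via API") -> dict:
    """POST to the job-run endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        trigger_job_url(account_id, job_id),
        data=json.dumps({"cause": cause}).encode(),
        headers={
            "Authorization": f"Token {token}",   # service account token
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Wiring a call like this into a webhook from your git provider is how a merge to main can kick off a production run automatically.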
Slim CI Simplification
Slim CI is one of the best ways to optimize your pipelines and reduce the time to value of your dbt deployments. While state-based selection is available within dbt Core, it can be difficult to decide how you are going to store and manage the manifest that Slim CI needs to optimize your runs.
With dbt Cloud, this sort of optimization becomes trivial: you can even have one job defer to the manifest of another job. All of this makes it easy to optimize your deployments and lets your analytics engineers focus on creating value-additive models instead of monitoring long-running pipelines and deployments.
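For comparison, in dbt Core the same state-based selection requires you to save a previous manifest and point at it yourself (the artifact path here is illustrative):

```shell
# Build only the models that changed relative to a saved production
# manifest, plus their downstream dependents -- the heart of Slim CI
dbt build --select state:modified+ --state ./prod-artifacts
```

In dbt Cloud, a CI job can simply defer to the artifacts of your production job, so nobody has to shuttle manifest files around.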
As you can see, there are a plethora of features that really separate dbt Cloud from dbt Core. Taking advantage of these features can make the development and monitoring of dbt a breeze.
dbt Cloud also gives your end users certainty that the data they’re using to make decisions is clean and current. Pair this up with the Snowflake Data Cloud and you’ll have both an unmatched cloud data warehouse and a game changing transformation workload.
Need help getting started with Snowflake or dbt? Our team of data experts is happy to assist. Reach out today!