October 30, 2023

Why Upgrade to dbt Cloud over dbt Core?

By Dakota Kelley

So you’ve heard all the talk around dbt, and now you’re trying to decide whether to go with dbt Core or dbt Cloud. What advantages does dbt Cloud have over the free dbt Core offering?

Upon a quick trial of dbt Cloud, the first things you might notice are the IDE and the ease of managing deployments. However, dbt Cloud offers much more than that, especially with the new features announced at dbt Coalesce 2023, including cross-project lineage, data mesh, and dbt Explorer.

As dbt Labs’ 2023 Partner of the Year, we’ve helped many customers be successful with dbt, and in this blog, we’ll dive deep into the biggest advantages our customers see when upgrading to dbt Cloud.

Top Reasons to Upgrade to dbt Cloud

Before we dive into the details, let’s start by highlighting a few of the features our customers love about dbt Cloud:

  • Dedicated IDE
  • Simplified git workflow
  • Hosted Documentation
  • dbt Explorer
  • Unified Metrics and Headless BI with Semantic Layer
  • Slim CI to optimize pipelines and reduce cloud spend
  • Data Transparency with model exposures

Dedicated IDE

One of the first things you’re greeted with when you log in to dbt Cloud is the dedicated Integrated Development Environment (IDE). The IDE is fairly straightforward, offering a light or dark mode, lightweight autocomplete, and syntax highlighting for both SQL and Jinja.

Additionally, the IDE simplifies the git process for newer developers and analytics engineers, removing much of the intimidation factor that comes with learning git.

The real power is the ability to run your models and view the outputs, or even compile your SQL to verify that your Jinja renders into the model you expect. This is very useful when troubleshooting models and far easier than digging through the compiled output in the target folder.

Additionally, linting and parsing are built into the IDE, allowing organizations to define coding standards and enforce them as developers work.

But most importantly, the IDE validates your work and configurations, allowing you to catch and correct errors before you ever attempt to execute dbt.

For power users, or those who already have an IDE they enjoy, the new dbt Cloud CLI lets you develop locally in your own editor while keeping everything synchronized with your dbt Cloud account.

Hosted Doc Site and dbt Explorer

One of the most powerful features of dbt is the documentation it generates. This documentation gives users insight into where data came from, what the profile of the data is, what the SQL looks like, and where in the DAG the data is being used.

While dbt Core gives you the ability to generate this documentation site, you still have to find a place to host it.
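
For example, a common pattern with dbt Core is to generate the docs and then copy the static site files somewhere you manage yourself. The sketch below assumes a hypothetical S3 bucket configured for static website hosting; the bucket name and paths are purely illustrative.

    # Illustrative only: publishing dbt Core's generated docs to a
    # self-managed S3 bucket. "my-dbt-docs" is a hypothetical bucket
    # configured for static website hosting. Run `dbt docs generate` first.
    import boto3

    s3 = boto3.client("s3")
    for file_name, content_type in [
        ("index.html", "text/html"),
        ("manifest.json", "application/json"),
        ("catalog.json", "application/json"),
    ]:
        s3.upload_file(
            Filename=f"target/{file_name}",  # dbt writes docs artifacts to target/
            Bucket="my-dbt-docs",            # hypothetical bucket name
            Key=file_name,
            ExtraArgs={"ContentType": content_type},
        )

You would also be responsible for authentication and access control on that self-hosted site.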

With dbt Cloud, however, the documentation is hosted in your dbt Cloud instance. In fact, you can configure a project’s default documentation to be the documentation generated by a production deployment. Read-only accounts can then log in and view the current production documentation, while your dev team generates and reviews dev documentation before pushing it out to your general business users.

On top of this, there is also dbt Explorer, which allows an organization to analyze all of its dbt projects holistically. dbt Explorer looks across every project, builds cross-project lineage documentation, and helps monitor your dbt projects as a whole, including performance and data tests.

To accomplish this with dbt Core, you would need to build that single, cross-project view yourself. Managing multi-project repositories is also difficult, especially if you want to treat dbt models as APIs between projects.

Exposures and Their Quality

On the note of documentation, dbt provides a piece of documentation known as an exposure, which lets you tag the final models being used by a particular data product or dashboard. Thanks to the DAG, every parent of a tagged model is brought into that exposure as well, making it easier for analysts and business users to investigate the portion of the project that matters to them.

But maybe your business users want to know whether the data they’re consuming is fresh and up to their standards for data quality. Those questions become exceptionally easy to answer with dbt Cloud, which provides a simple Metadata API you can use to generate an iFrame for your dashboard showing both the freshness of your data and whether your tests have passed.

Even if a test fails or your data is stale, dbt Cloud provides a link to the details of the failing tests and the affected sources.

[Screenshot: two green status checks in dbt Cloud reading “Data Freshness passed” and “Data quality passed.”]

dbt Semantic Layer

As the use of data modeling grows within an organization, it can be difficult to unify the business’s understanding of its metrics. Over time, this results in disparate metrics all trying to calculate the same thing, which can cause unnecessary friction within the organization.

To address this, dbt created its Semantic Layer, which allows teams to document and define metrics and how they are calculated. This helps unify and centralize metrics while embracing a headless BI approach.

dbt provides not just the Semantic Layer for defining metrics, but also an API that queries those metrics on behalf of the organization’s BI tools. The Semantic Layer handles the join traversal and returns the answers from the warehouse directly to the BI tool, optimizing the metric calculation process.
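
As a rough illustration, here is a sketch of querying a metric through the Semantic Layer’s Python SDK (the dbt-sl-sdk package). The environment ID, token, and metric name are placeholders, and the exact client interface may differ slightly from what is shown here.

    # Hedged sketch: querying the dbt Semantic Layer from Python with the
    # dbt-sl-sdk package. environment_id, auth_token, and the "revenue"
    # metric are placeholders; treat the interface as illustrative.
    from dbtsl import SemanticLayerClient

    client = SemanticLayerClient(
        environment_id=12345,                     # your dbt Cloud environment ID
        auth_token="dbtc_xxx",                    # a token with Semantic Layer access
        host="semantic-layer.cloud.getdbt.com",
    )

    with client.session():
        # The join traversal and SQL generation happen in the Semantic Layer,
        # not in the BI tool or in this script.
        table = client.query(
            metrics=["revenue"],
            group_by=["metric_time"],
            limit=10,
        )
        print(table)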

Data Health with the Discovery API

As mentioned earlier, dbt Cloud provides an easy-to-query Discovery API (the successor to the Metadata API) that contains all the metadata about a dbt project and its runs. This GraphQL API lets you query information about your dbt Cloud runs and projects and helps you evaluate not just your current data health, but your data health at any point in time.

The Discovery API is what drives dbt Explorer, and it allows you to query that same information or even combine it with other data to supplement whatever analysis the organization needs of its dbt projects.

Without the Discovery API, you would have to collect and parse the various artifacts by hand every time you initiate a deployment or data refresh. That is a lot of work and rather difficult to do; the Discovery API provided with dbt Cloud makes this sort of analysis and data visualization straightforward.
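
To make that concrete, here is a hedged sketch of pulling model run metadata from the Discovery API with a plain GraphQL request. The job ID and token are placeholders, and the exact fields available depend on your dbt Cloud plan and the current API schema.

    # Hedged sketch: querying the dbt Cloud Discovery API (GraphQL) for the
    # status and timing of models from a given job. The job ID and token are
    # placeholders, and field names may vary with the API schema version.
    import requests

    DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"
    QUERY = """
    {
      models(jobId: 12345) {
        uniqueId
        name
        status
        executionTime
      }
    }
    """

    response = requests.post(
        DISCOVERY_URL,
        json={"query": QUERY},
        headers={"Authorization": "Bearer dbtc_xxx"},  # placeholder service token
        timeout=30,
    )
    response.raise_for_status()
    for model in response.json()["data"]["models"]:
        print(model["name"], model["status"], model["executionTime"])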

Monitor and Initiate dbt Jobs with the Cloud API

Aside from the Discovery API, dbt Cloud also provides an Administrative REST API. This API allows you to monitor, create, run, cancel, or even override your various dbt jobs and pipelines, which is something you don’t have access to with dbt Core.

This makes it very easy to trigger actions based on other events, such as a merge in your git repository. You can also extract the artifacts from job executions if you spot an issue or want to monitor what activity is going on, the same information you see when watching a job execute in dbt Cloud.
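
As a brief sketch, triggering a job through that API looks roughly like the following; the account ID, job ID, and token are placeholders.

    # Sketch: triggering a dbt Cloud job run via the Administrative REST API.
    # The account ID, job ID, and token are placeholders.
    import requests

    ACCOUNT_ID = 1000       # placeholder dbt Cloud account ID
    JOB_ID = 2000           # placeholder job ID
    API_TOKEN = "dbtc_xxx"  # placeholder service token

    response = requests.post(
        f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
        headers={"Authorization": f"Token {API_TOKEN}"},
        json={"cause": "Triggered by a merge to main"},  # human-readable reason for the run
        timeout=30,
    )
    response.raise_for_status()
    run = response.json()["data"]
    print(f"Started run {run['id']} with status code {run['status']}")

From there, you can poll the run endpoint for its status or download its artifacts once it finishes.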

Slim CI Simplification

Slim CI is one of the best ways to optimize your pipelines and reduce the time to value of your dbt deployments. While this capability is available within dbt Core, it can be difficult to decide how you are going to manage the manifest that Slim CI needs to optimize your runs.

With dbt Cloud, this sort of optimization becomes trivial; you can even have one job defer to the manifest produced by another job. All of this makes it very easy to optimize your deployments and lets your analytics engineers focus on creating value-additive models instead of monitoring long-running pipelines and deployments.
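
For contrast, here is a rough sketch of the manifest plumbing a dbt Core deployment typically needs for state-based selection: persist the production manifest after each run, pull it down before the CI run, and point dbt at it. The bucket and paths below are hypothetical.

    # Hedged sketch of the manifest management dbt Cloud handles for you:
    # with dbt Core you typically store the production manifest.json yourself
    # (here, a hypothetical S3 bucket) and point Slim CI runs at that state.
    import os
    import subprocess

    import boto3

    os.makedirs("prod-state", exist_ok=True)
    boto3.client("s3").download_file(
        Bucket="my-dbt-artifacts",         # hypothetical bucket holding prod artifacts
        Key="prod/manifest.json",
        Filename="prod-state/manifest.json",
    )

    # Build only models that changed relative to production, deferring
    # unchanged upstream references to the production environment.
    subprocess.run(
        ["dbt", "build", "--select", "state:modified+",
         "--defer", "--state", "prod-state"],
        check=True,
    )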

Closing

As you can see, there is a plethora of features that really separate dbt Cloud from dbt Core. Taking advantage of these features can make developing and monitoring dbt a breeze, and dbt Cloud gives your end users certainty that the data they’re using to make decisions is clean and current.

Enjoyed what you read? Be sure to check out our Ultimate Modern Data Stack Migration Guide that features dbt and how it pairs perfectly with other modern technologies to maximize your data’s impact. 
