Welcome to the lowdown on the Coalesce Conference, hosted by dbt Labs, where the data community's pulse thumps at its strongest. Coalesce is where the brightest minds in data, from individual practitioners to teams at the largest companies, converge to share, inspire, and reshape the landscape with dbt, the gold-standard tool for data transformation.
The conference was filled with various sessions, each offering a deep dive into dbt's role in data strategy and innovation. Attendees don't just watch; they immerse themselves in hackathons, tackle certification exams, and unwind at happy hours. There is a palpable sense of anticipation for the unveiling of dbt's new features, and a celebratory cheer for the victors at the awards ceremonies.
In this blog, we’ll spotlight the transformative announcements that emerged from the Coalesce Conference. Join us as we navigate the key takeaways defining the future of data transformation.
dbt Mesh
Enterprises today face the challenge of managing massive, intricate data projects that can slow down innovation. In mid-2023, many companies were wrangling with more than 5,000 dbt models.
Picture the hustle it takes to keep that many models in line—ensuring they’re reliable, the dependencies make sense, and the data is solid. It’s a jigsaw puzzle on a massive scale, and every piece has to be just right. That’s no small feat for any data team!
dbt Mesh arrives to solve these problems, providing a structure that segments data projects into clear, manageable smaller portions, where each team takes charge of its own data domain while sharing it across the company. And you don't have to be a large enterprise to use dbt Mesh. No matter how big you are, you can leverage dbt Mesh to get:
Empowerment through Decentralization: Teams get domain-specific autonomy, allowing them to build and own their business logic without cross-team dependencies that often lead to bottlenecks.
Streamlined Processes: Domain-level teams operate independently, governed by standards that ensure both quality and compliance while enabling faster development and delivery of data products.
Enhanced Collaboration: dbt Mesh fosters a collaborative environment by using cross-project references, making it easy for teams to share, reference, and build upon each other’s work, eliminating the risk of data silos.
Trust and Quality at the Forefront: Domain experts are empowered to create business logic that they understand best, ensuring the delivery of quality data products that stakeholders can trust.
But dbt Mesh isn't a standalone product; rather, it's a pattern powered by a suite of interconnected dbt features:
Cross-project references: This is what enables multi-project deployments. The {{ ref() }} macro can now be used across dbt Cloud projects on Enterprise plans. You just pass two arguments instead of one, with the first being the project's name:
with my_cte as (
    select * from {{ ref('another_project', 'downstream_model') }}
)

select * from my_cte
Governance: With dbt's new governance capabilities, you get precise control over who can access and interact with your dbt models, both within a single project and across multiple ones. For that, in your model's .yml configuration file, use the group key (categorizes models into specific groups within a project) and the access key (determines who can reference models).
# First, define the group and owner
groups:
  - name: my_group
    owner:
      name: My Group's Name
      email: my_group_email@jaffle.shop

# Then, add 'group' + 'access' modifiers to specific models
models:
  - name: my_dimension_model
    group: my_group
    access: public # Public models can be referenced by any group, team, or project; they should be stable & mature
  - name: my_intermediate_model
    group: my_group
    access: private # Private models can only be referenced by models in the same group
  - name: my_staging_model
    group: my_group
    access: protected # Protected models can only be referenced by models in the same project (the default)
Model Versions: You can now treat data models like stable APIs when coordinating across projects and teams, and model versioning is the structured way to manage the lifecycle of models as they grow and change.
version: 2

models:
  - name: model_name
    versions:
      - v: <version> # required
        defined_in: <file_name> # optional -- default is <name>_v<v>
        columns:
          # specify all columns, or include/exclude columns from the top-level model YAML definition
          - include: <include_value>
            exclude: <exclude_list>
          # specify additional columns
          - name: <column_name> # required
      - v: ...
    # optional
    latest_version: <version>
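To make the spec concrete, here is a minimal sketch using hypothetical names: a dim_customers model with two versions, where version 2 drops a deprecated column and is pinned as the latest.

version: 2

models:
  - name: dim_customers
    latest_version: 2
    versions:
      # v2 drops a column that v1 consumers may still rely on
      - v: 2
        columns:
          - include: all
            exclude: [old_customer_segment] # hypothetical deprecated column
      # v1 lives in dim_customers_legacy.sql for consumers who haven't migrated yet
      - v: 1
        defined_in: dim_customers_legacy

Downstream models can then pin a version explicitly with {{ ref('dim_customers', v=1) }}, or omit the argument to automatically resolve to the latest version.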
Model Contracts: Data contracts are the reliability piece of the mesh. They establish clear expectations for data structure, safeguarding against disruptions caused by upstream changes in dbt or within a project's logic, and ensuring downstream processes remain unaffected. For that, you can now define the data_type and constraints (such as not_null and unique) keys in the model's .yml configuration file.
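As a minimal sketch (the model and column names here are illustrative), enforcing a contract looks like this:

models:
  - name: my_dimension_model
    config:
      contract:
        enforced: true # the build fails if the model's output doesn't match this shape
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string

With the contract enforced, an upstream change that alters column names, types, or nullability is caught at build time instead of surfacing downstream.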
If you want to know more about data mesh, check out these links:
dbt Labs’ Perspective on Data Mesh
dbt Mesh Structures/Objects
Multi-project Deployments
Model Groups & Access
Model Versions
Model Contracts
dbt Explorer
Managing thousands of models is about more than just keeping them reliable; it's also a real headache when you're trying to figure out how they all connect. dbt Mesh comes to the rescue by slimming down those complex connections, making your lineage graph much more manageable, but at the cost of seeing less of the overall picture.
dbt Explorer eliminates that trade-off. You now have the flexibility to view your lineage in full detail or drill down to specific sections, all through a straightforward and intuitive interface. And it's about more than just looking at one project: dbt Explorer lets you see the lineage across different projects, ensuring you can track your data's journey end-to-end without losing track of the details.
dbt Explorer is a feature-rich tool for users with multi-tenant or AWS single-tenant dbt Cloud accounts on the Team or Enterprise plan, providing comprehensive lineage and metadata analysis capabilities. Here’s a summary of its core features:
Prerequisites: Requires a dbt Cloud account and successful job runs in a production environment for exploration.
Metadata Generation: Utilizes metadata from the Discovery API to display the current project state, updating automatically after each production job run.
Lineage Graph Visualization: Offers interactive lineage graphs of project DAGs, with color-coded nodes and iconography for different resource types, which can be zoomed, panned, and explored in detail.
Resource Interaction: Allows for detailed inspection of resources, including their metadata, by hovering, selecting, and interacting directly with the lineage graph.
Search Functionality: Supports keyword and advanced selector method searches to navigate resources efficiently, including graph and set operators for refined queries.
Sidebar Navigation: Provides a catalog sidebar for browsing resources by type, package, file tree, or database schema, reflecting the structure of both dbt projects and the data platform.
Version Tracking: Displays version information for models, indicating whether they are prerelease, latest, or outdated.
Detail Inspection: Enables access to the definition and latest run results of any resource, with a detailed view that includes a status bar, various informational tabs, and a model’s lineage graph.
Cross-Project Lineage: Shows project-level lineage, detailing cross-project resource usage and dependencies, with different iconography for upstream and downstream project relationships.
Project-Level Exploration: Allows for the examination of all projects in an account, providing search capabilities and the ability to view the details of public models and project dependencies.
dbt Explorer stands out as a versatile platform for users to manage and understand complex dbt Cloud projects, providing insights into the lineage and metadata that drive data transformation projects.
Read more about the dbt Explorer: Explore your dbt projects
dbt Semantic Layer: Relaunch
The dbt Semantic Layer is an innovative approach to solving common data consistency and trust challenges. It leverages dbt, a popular transformation solution utilized by thousands of organizations, to create the well-defined, clean data foundation essential for an effective semantic layer.
The dbt Semantic Layer facilitates collaboration, version control, and smooth migration between different tools and platforms by integrating with various data platforms and supporting metric definitions in code.
This layer is enriched by the integration of MetricFlow, which brings a more sophisticated metric framework. The dbt Semantic Layer allows you to define semantic models and metrics on top of dbt models in a code-based environment. At query time, it dynamically generates a semantic graph that connects all the defined metrics and dimensions, making them readily available through robust APIs for downstream tools.
This ensures that every member of the organization has access to a consistent, transparent, and context-rich understanding of key metrics, empowering them to make informed decisions and independently explore data with confidence.
Here’s a quick summary of the dbt Semantic Layer:
Centralized Metric Definitions: The Semantic Layer becomes the single source of truth for metrics across your organization, reducing redundancy and ensuring uniformity in reporting.
Streamlined Metric Creation and Management: With MetricFlow, you can easily establish and oversee company metrics through flexible abstractions and SQL query generation.
Efficient Data Retrieval: Quick access to metric datasets from your data platform is made possible by MetricFlow’s optimized processes.
Seamless Integration with Downstream Tools: The setup process is tailored to enable consistent metric access across a variety of analytics and business intelligence tools.
Tableau (beta)
Google Sheets (beta)
Hex
Klipfolio PowerMetrics
Lightdash
Mode
Push.ai
Delphi
Prerequisites and Compatibility: It requires a dbt Cloud Team or Enterprise account and supports popular data warehouses like Snowflake, BigQuery, Databricks, and Redshift.
Development and Deployment: You’re guided through creating semantic models, defining dimensions and measures, and configuring your deployment environment to harness the full potential of the Semantic Layer.
Customized Querying Capabilities: Metrics defined can be queried directly in dbt Cloud or via API in downstream tools, with the necessary configurations outlined for both dbt Cloud and Core users.
Service Token Authentication: To connect and query metrics in downstream tools, a service token with specific permissions is required, ensuring secure access.
Typical Scenarios:
Business intelligence (BI), reporting, and analytics
Data quality and monitoring
Governance and privacy
Data discovery and cataloging
Machine learning and data science
Have a look at a complete semantic model in the new dbt Semantic Layer from the dbt Docs. Semantic models are defined in the model's .yml configuration file. The process begins with establishing the model defaults. Next, entities are specified; these typically correspond to IDs.

Finally, dimensions and measures are defined to fully encapsulate the data's characteristics. Dimensions serve as the non-aggregatable properties of your records, offering categorical or temporal context that enhances the interpretability and richness of your metrics.

Measures, on the other hand, are the building blocks upon which metrics are constructed. Through the aggregation capabilities of MetricFlow, these measures are transformed into insightful and actionable metrics.
semantic_models:
  # The name of the semantic model.
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    # The name of the dbt model and schema
    model: ref('orders')
    # Entities. These usually correspond to keys in the table.
    entities:
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    # Measures. These are the aggregations on the columns in the table.
    measures:
      - name: order_total
        description: The total revenue for each order.
        agg: sum
      - name: order_count
        expr: 1
        agg: sum
      - name: tax_paid
        description: The total tax paid on each order.
        agg: sum
      - name: customers_with_orders
        description: Distinct count of customers placing orders
        agg: count_distinct
        expr: customer_id
      - name: locations_with_orders
        description: Distinct count of locations with order
        expr: location_id
        agg: count_distinct
      - name: order_cost
        description: The cost for each order item. Cost is calculated as a sum of the supply cost for each order item.
        agg: sum
    # Dimensions. Either categorical or time. These add additional context to metrics. The typical querying pattern is Metric by Dimension.
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
      - name: order_total_dim
        type: categorical
        expr: order_total
      - name: is_food_order
        type: categorical
      - name: is_drink_order
        type: categorical
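Note that measures are not queryable on their own; a metric must be defined on top of them. Here is a minimal sketch of a simple metric built from the order_total measure above:

metrics:
  - name: order_total
    description: Sum of the order total.
    label: Order Total
    type: simple # a simple metric aggregates a single measure directly
    type_params:
      measure: order_total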
Once your semantic models are set, launch a production job in dbt Cloud to materialize your metrics. Then, configure the dbt Semantic Layer on both environment and project levels. This setup enables querying of your metrics through JDBC tools or direct integration with the dbt Semantic Layer.
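For instance, a downstream tool connected over JDBC could retrieve the daily order total with a query along these lines (a sketch of the Semantic Layer's JDBC syntax, using the metric defined above):

select *
from {{
    semantic_layer.query(
        metrics=['order_total'],
        group_by=[Dimension('metric_time').grain('day')]
    )
}}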
The promise of dbt’s Semantic Layer lies in its ability to democratize data understanding and access, thereby reducing data inequality and fostering a unified approach to data-driven decision-making across an organization.
Read more about the dbt Semantic Layer at Explore the dbt Semantic Layer.
dbt Cloud CLI
For those who miss their code editor when working with dbt Cloud, the platform's CLI feature bridges the gap between the comfort of your local development environment and the robust capabilities of dbt Cloud.
It empowers teams to run dbt commands directly from the local command line, enabling a mix of familiarity and advanced cloud functionality.
Seamless Local-Cloud Integration: Work with dbt Cloud straight from your local setup, executing commands without leaving your beloved code editor.
Secure and Convenient: Credentials are safely managed on the dbt Cloud, simplifying your workflow while keeping security tight.
Collaborative Development: With support for dbt Mesh, the CLI promotes effective teamwork through cross-project references.
Optimized Builds: Experience the advantage of quicker and more cost-efficient builds courtesy of dbt Cloud’s infrastructure.
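As a quick sketch of the workflow (installation and credential setup are covered in dbt's docs), once the dbt Cloud CLI is configured, the familiar dbt commands run from your local terminal but execute in dbt Cloud:

# Compile the project, then build a model plus everything downstream of it;
# connection details and credentials are managed by dbt Cloud
dbt compile
dbt run --select my_model+

Here, my_model is a placeholder for one of your own models.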
Read more about the dbt Cloud CLI:
Continuous Integration & Deployment Dedicated Jobs
dbt Cloud has refined its job orchestration by introducing two specific job types: deploy jobs for building production data assets and continuous integration (CI) jobs for validating code changes.
This enhancement streamlines the user experience with tailored defaults and a guided setup, embedding best practices into each job type’s configuration.
The improvement is significant for CI jobs, as dbt Cloud now more efficiently determines when to run builds and tests. State comparisons between pull request (PR) code and production code are made more efficiently, minimizing unnecessary builds. dbt Cloud can now skip tests on unchanged code by deferring to an environment rather than an individual job.
Deploy jobs are designed to build into production databases, running sequentially to prevent conflicts. CI jobs, in contrast, are built to test new code, running in parallel to speed up team workflows. They also feature smart cancellation policies to avoid redundant work, increasing overall efficiency.
To maximize the efficacy of CI jobs, dbt Labs recommends setting them up in a dedicated environment linked to a staging database, ensuring isolation from production data builds. This setup allows for PR-based triggers, and with dbt Cloud’s native Git integration, it’s now simpler to manage these CI processes across GitHub, GitLab, or Azure DevOps.
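Under the hood, a CI job's command list typically boils down to a state-aware build along these lines, which is the command dbt Cloud suggests for CI jobs by default:

# Build and test only the models modified in the PR (plus everything downstream),
# deferring unchanged models to the comparison environment
dbt build --select state:modified+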
To implement jobs, users must have a dbt Cloud account and the appropriate permissions. These jobs can be triggered via schedule or events, ensuring your data assets are always up-to-date.
Both CI and deploy jobs benefit from a detailed overview within dbt Cloud, providing insights into job triggers, commits, run timing, and step-by-step logs. This transparency aids in troubleshooting and optimizing the data build process.
In summary, dbt Cloud’s new CI and deploy job types elevate the efficiency and precision of building and testing data assets, offering organizations a robust and intelligent approach to data transformation workflows.
Read more about dbt Cloud's new job types at:
Fivetran dbt Cloud Integration
We are excited to share that Fivetran has announced a new integration with dbt Cloud, currently scheduled for release in early 2024, that allows you to seamlessly trigger, monitor, and observe your dbt transformations from Fivetran. With this, you'll be able to use best-in-class tooling to move all your data and sync it directly by triggering your transformations for end-to-end pipeline automation.
To learn more, please visit Fivetran's announcement page.
Conclusion
Coalesce 2023 introduced a suite of impressive tools and features that have generated considerable excitement for their potential impact on data transformation. We at phData are eager to put these innovations into action.
For organizations looking to make the most of dbt, phData is ready to assist. As dbt Labs' 2023 Partner of the Year, we have the expertise to ensure your dbt setup is optimized and powerful, driving your organization forward.
Reach out to us to enhance your data capabilities with dbt!