February 8, 2024

What Are dbt Artifacts

By Arnab Mondal

In the world of data modeling, dbt has gradually emerged as a powerful tool that greatly simplifies the process of building and maintaining data pipelines. dbt is an open-source command-line tool that allows data engineers to transform, test, and document data in a single hub while following software engineering best practices.

One of the key features of dbt is the tool’s ability to generate and manage artifacts that are used as a part of the data modeling procedure. 

In this blog, we will explore the different types of dbt artifacts as well as their significance in data modeling. 

What is a dbt Artifact and What are Its Uses?

A dbt artifact is a file generated by dbt that contains information about the dbt project and invocations. These files play significant roles in managing the modeling procedure and ensuring that the dbt project is consistent, error-free, and up-to-date. 

These files are generated and saved automatically by dbt every time you run a command. The dbt artifacts include JSON files such as: 

  • semantic_manifest.json

  • manifest.json

  • catalog.json

  • run_results.json

  • sources.json

JSON files store data in key-value pairs and arrays, which the consuming software then reads. The format lets developers store various types of data in a human-readable form, with keys serving as names and values holding the related data. 

The dbt artifacts also contain valuable information about the dbt project in question, such as the relationships between models, tests, tags, and owners. They can also be utilized to: 

  • Probe into your dbt Semantic Layer

  • Do a calculation of project-level test coverage

  • Execute a longitudinal analysis of run timing

  • Identify all and any historical changes in the table structure

All About dbt Artifacts

Most dbt commands create artifacts on a successful run. The examples below were produced on a new Snowflake account using the sample dataset provided with dbt Cloud: 

Semantic Manifest

The file semantic_manifest.json is one of the most important dbt artifacts. It is an internal file that serves as the point of integration with MetricFlow. MetricFlow needs this file to generate and run metric queries for the dbt Semantic Layer, and it contains substantial, comprehensive information about your Semantic Layer. 

MetricFlow creates a data flow plan with this artifact and generates SQL from the query request within the semantic layer. The file’s importance lies in the fact that it can serve as a valuable reference that can help you develop a deep insight into the structure and details of data models. 

Semantic Manifest is stored in the /target directory of your dbt project and is produced whenever the dbt project is parsed. The dbt parse, dbt run, dbt build, and dbt compile commands can all be used to generate a semantic manifest dbt artifact. 

Top-level Keys: semantic_models, metrics, project_configuration, saved_queries

semantic_manifest.json:

{
  "semantic_models": [],
  "metrics": [],
  "project_configuration": {
    "time_spine_table_configurations": [],
    "metadata": null,
    "dsi_package_version": {
      "major_version": "0",
      "minor_version": "4",
      "patch_version": "1"
    }
  },
  "saved_queries": []
}

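As a quick illustration of working with this artifact, a short Python sketch can summarize a semantic_manifest.json file using the top-level keys shown above; the function name and summary shape are our own:

```python
import json

def summarize_semantic_manifest(path: str) -> dict:
    """Return a small summary of a semantic_manifest.json file:
    the count of semantic models and the names of the metrics."""
    with open(path) as f:
        sm = json.load(f)

    return {
        "semantic_models": len(sm.get("semantic_models", [])),
        "metrics": [m.get("name") for m in sm.get("metrics", [])],
    }
```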
Manifest

Any command that parses your project (except deps, clean, debug, and init) can generate a manifest.json file. This single file is a complete record of your dbt project’s resources, including models, tests, macros, etc., as well as all node configurations and resource properties. 

All resources appear in the manifest with most of their properties, even if you run only a subset of models or tests. A few node properties, such as compiled_code, appear only for executed nodes. dbt uses the manifest file to populate the docs site and to perform state comparisons.

Top-level Keys: metadata, nodes, metrics, exposures, docs, parent_map, child_map, group_map, selectors, disabled, sources, macros.

manifest.json

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/manifest/v11.json",
    "dbt_version": "1.7.2",
    "generated_at": "2023-12-02T08:04:41.051242Z",
    "invocation_id": "56c6d4fe-dd3b-45f2-3fcfc946744c",
    "env": {
      "DBT_CLOUD_PROJECT_ID": "31615",
      "DBT_CLOUD_RUN_ID": "2234901",
      "DBT_CLOUD_JOB_ID": "46788",
      "DBT_CLOUD_RUN_REASON": "Kicked off from UI by amondal@phdata.io",
      "DBT_CLOUD_RUN_REASON_CATEGORY": "other",
      "DBT_CLOUD_RUN_TRIGGER_CATEGORY": "RUN_REASON_CATEGORY_UI",
      "DBT_CLOUD_ENVIRONMENT_ID": "26643",
      "DBT_CLOUD_ACCOUNT_ID": "1788"
    },
    "project_name": "my_new_project",
    "project_id": "faebc42304447d4427374679ecb5",
    "user_id": "a97195d5-021a-441b-ec9b94aba8dd",
    "send_anonymous_usage_stats": true,
    "adapter_type": "snowflake"
  },
  "nodes": {
    "model.my_new_project.TEST_STAGING": {
      "database": "DATAVAULT",
      "schema": "STAGING",
      "name": "TEST_STAGING",
      "resource_type": "model",
      "package_name": "my_new_project",
      "path": "example/TEST_STAGING.sql",
      "original_file_path": "models/example/TEST_STAGING.sql",
      "unique_id": "model.my_new_project.TEST_STAGING",
      "fqn": [
        "my_new_project",
        "example",
        "TEST_STAGING"
      ],
      "alias": "TEST_STAGING",
      "checksum": {
        "name": "sha256",
        "checksum": "5e639d9194d65f1c29ca74146cbfaf7b4a375886af6f8b977102c447528d2600"
      },
      "config": {
        "enabled": true,
      ...
"group_map": {},
  "saved_queries": {},
  "semantic_models": {}
}

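One practical use of the manifest is lineage analysis. The sketch below is our own illustration: it assumes the standard child_map key, which maps each node to its direct children, and walks the graph to find everything downstream of a given node:

```python
import json

def downstream_of(manifest_path: str, unique_id: str) -> set:
    """Collect every node downstream of unique_id using the manifest's
    child_map, following the edges transitively."""
    with open(manifest_path) as f:
        child_map = json.load(f).get("child_map", {})

    seen, stack = set(), [unique_id]
    while stack:
        node = stack.pop()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

This kind of traversal is what powers impact analysis: before changing a model, you can list every model, test, or exposure that would be affected.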
Catalog

The catalog.json file contains metadata, pulled from the data warehouse, about the tables and views generated and defined by the resources in your project. Because it captures table structures at a point in time, comparing catalogs across runs lets you track changes to those structures and confirm that your data model stays consistent with the underlying data. 

dbt uses the catalog file to populate column types, table statistics, and other metadata in the docs site; the file is produced by the dbt docs generate command. 

Top-level Keys: metadata, nodes, sources, errors

catalog.json

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json",
    "dbt_version": "1.7.2",
    "generated_at": "2023-12-02T08:04:47.145856Z",
    "invocation_id": "56c6d4fe-dd3b-45f2-3fcfc946744c",
    "env": {
      "DBT_CLOUD_PROJECT_ID": "31615",
      "DBT_CLOUD_RUN_ID": "22349071",
      "DBT_CLOUD_JOB_ID": "46788",
      "DBT_CLOUD_RUN_REASON": "Kicked off from UI by amondal@phdata.io",
      "DBT_CLOUD_RUN_REASON_CATEGORY": "other",
      "DBT_CLOUD_RUN_TRIGGER_CATEGORY": "RUN_REASON_CATEGORY_UI",
      "DBT_CLOUD_ENVIRONMENT_ID": "26643",
      "DBT_CLOUD_ACCOUNT_ID": "1788"
    }
  },
  "nodes": {
    "model.my_new_project.TEST_STAGING": {
      "metadata": {
        "type": "BASE TABLE",
        "schema": "STAGING",
        "name": "TEST_STAGING",
        "database": "DATAVAULT",
        "comment": null,
        "owner": "ACCOUNTADMIN"
      },
      "columns": {
        "DEP_ID": {
          "type": "NUMBER",
          "index": 1,
          "name": "DEP_ID",
          "comment": null
        },
        "DEP_NAME": {
          "type": "TEXT",
          "index": 2,
          "name": "DEP_NAME",
          "comment": null
        },
        "DEP_CODE": {
          "type": "TEXT",
          "index": 3,
          "name": "DEP_CODE",
          "comment": null
        }
      },
      "stats": {
        "bytes": {
          "id": "bytes",
          "label": "Approximate Size",
          "value": 1536,
          "include": true,
          "description": "Approximate size of the table as reported by Snowflake"
        },
        "last_modified": {
          "id": "last_modified",
          "label": "Last Modified",
          "value": "2023-12-02 08:04UTC",
          "include": true,
          "description": "The timestamp for last update/change"
        },
        "row_count": {
          "id": "row_count",
          "label": "Row Count",
          "value": 2,
          "include": true,
          "description": "An approximate count of rows in this table"
        },
        "has_stats": {
          "id": "has_stats",
          "label": "Has Stats?",
          "value": true,
          "include": false,
          "description": "Indicates whether there are statistics for this table"
        }
      },
      "unique_id": "model.my_new_project.TEST_STAGING"
    }
  },
  "sources": {},
  "errors": null
}

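To illustrate the earlier point about identifying historical changes in table structure, here is a hedged sketch that compares the columns recorded in two catalog.json snapshots; the function name and the shape of the report are our own:

```python
import json

def column_changes(old_catalog: str, new_catalog: str) -> dict:
    """Compare the columns of each node across two catalog.json files
    and report added/removed column names per node."""
    def cols(path):
        with open(path) as f:
            nodes = json.load(f).get("nodes", {})
        return {uid: set(n.get("columns", {})) for uid, n in nodes.items()}

    old, new = cols(old_catalog), cols(new_catalog)
    changes = {}
    for uid in old.keys() | new.keys():
        added = new.get(uid, set()) - old.get(uid, set())
        removed = old.get(uid, set()) - new.get(uid, set())
        if added or removed:
            changes[uid] = {"added": sorted(added), "removed": sorted(removed)}
    return changes
```

Archiving the catalog after each docs generation and diffing consecutive snapshots gives you a lightweight schema-change audit trail.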
Run Results

The run_results.json file provides details about a completed dbt invocation, such as the duration and status of each node (model, test, etc.) that was processed. By analyzing multiple run_results.json files collectively, you can derive insights such as the average runtime of models, the rate of test failures, and the number of record changes captured by snapshots.

The commands that generate a run results dbt artifact are as follows: build, compile, docs generate, run, seed, snapshot, test, and run-operation.

Top-level Keys: metadata, args, elapsed_time, results

run_results.json

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v5.json",
    "dbt_version": "1.7.2",
    "generated_at": "2023-12-02T08:04:43.154074Z",
    "invocation_id": "56c6d4fe-dd3b-45f2-3fcfc946744c",
    "env": {
      "DBT_CLOUD_PROJECT_ID": "31618",
      "DBT_CLOUD_RUN_ID": "22349071",
      "DBT_CLOUD_JOB_ID": "46788",
      "DBT_CLOUD_RUN_REASON": "Kicked off from UI by Arnab Mondal",
      "DBT_CLOUD_RUN_REASON_CATEGORY": "other",
      "DBT_CLOUD_RUN_TRIGGER_CATEGORY": "RUN_REASON_CATEGORY_UI",
      "DBT_CLOUD_ENVIRONMENT_ID": "26643",
      "DBT_CLOUD_ACCOUNT_ID": "1788"
    }
  },
  "results": [
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2023-12-02T08:04:43.138443Z",
          "completed_at": "2023-12-02T08:04:43.148128Z"
        },
        {
          "name": "execute",
          "started_at": "2023-12-02T08:04:43.149773Z",
          "completed_at": "2023-12-02T08:04:43.149787Z"
        }
      ],
      "thread_id": "Thread-1",
      "execution_time": 0.014298439025878906,
      "adapter_response": {},
      "message": null,
      "failures": null,
      "unique_id": "model.my_new_project.TEST_STAGING",
      "compiled": true,
      "compiled_code": "select\n    *\n\nfrom DATAVAULT.STAGING.SRC_DEP",
      "relation_name": "DATAVAULT.STAGING.TEST_STAGING"
    }
  ],
  "elapsed_time": 2.059567451477051,
  "args": {
    "log_format": "json",
    "exclude": [],
    "profiles_dir": "/tmp/jobs/223490741/.dbt",
    "version_check": true,
    "print": true,
    "use_colors": true,
    "use_colors_file": true,
    "printer_width": 80,
    "profile": "user",
    "strict_mode": false,
    "invocation_command": "dbt --log-format json --debug docs generate --target default --profile user --profiles-dir /tmp/jobs/223490741/.dbt --project-dir /tmp/jobs/223490741/target",
    "log_file_max_bytes": 10485760,
    "static": false,
    "empty_catalog": false,
    "quiet": false,
    "debug": true,
    "partial_parse": true,
    "project_dir": "/tmp/jobs/223490741/target",
    "compile": true,
    "log_path": "/tmp/jobs/223490741/target/logs",
    "vars": {},
    "defer": false,
    "indirect_selection": "eager",
    "write_json": true,
    "log_level": "info",
    "partial_parse_file_diff": true,
    "favor_state": false,
    "send_anonymous_usage_stats": true,
    "cache_selected_only": false,
    "log_level_file": "debug",
    "populate_cache": true,
    "static_parser": true,
    "target": "default",
    "log_format_file": "json",
    "warn_error_options": {
      "include": [],
      "exclude": []
    },
    "which": "generate",
    "macro_debugging": false,
    "introspect": true,
    "select": [],
    "show_resource_report": false,
    "enable_legacy_logger": false
  }
}

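As a small example of mining run_results.json for optimization, the following sketch (our own, based on the results and execution_time fields shown above) ranks the nodes of a single invocation by execution time:

```python
import json

def slowest_nodes(run_results_path: str, top_n: int = 5) -> list:
    """Rank executed nodes by execution_time, slowest first.

    Returns (unique_id, seconds) tuples for the top_n slowest nodes.
    """
    with open(run_results_path) as f:
        results = json.load(f).get("results", [])

    ranked = sorted(results, key=lambda r: r.get("execution_time", 0), reverse=True)
    return [
        (r["unique_id"], round(r.get("execution_time", 0), 3))
        for r in ranked[:top_n]
    ]
```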
Sources

The sources.json file contains the results of freshness checks for the data sources defined in your project. It is produced by the dbt source freshness command and helps ensure that sources are properly managed and included in the project. Today, dbt Cloud uses this file to power its source freshness visualization. 

Top-Level Keys: metadata, elapsed_time, results

sources.json

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/sources/v3.json",
    "dbt_version": "1.7.2",
    "generated_at": "2023-12-02T08:08:03.735417Z",
    "invocation_id": "cf253a9e-b1b5-4b7d-7e766b1815d8",
    "env": {
      "DBT_CLOUD_PROJECT_ID": "31615",
      "DBT_CLOUD_RUN_ID": "22349131",
      "DBT_CLOUD_JOB_ID": "46788",
      "DBT_CLOUD_RUN_REASON": "Kicked off from UI by Arnab Mondal",
      "DBT_CLOUD_RUN_REASON_CATEGORY": "other",
      "DBT_CLOUD_RUN_TRIGGER_CATEGORY": "RUN_REASON_CATEGORY_UI",
      "DBT_CLOUD_ENVIRONMENT_ID": "26643",
      "DBT_CLOUD_ACCOUNT_ID": "1788"
    }
  },
  "results": [],
  "elapsed_time": 0
}

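Assuming the entries in results carry a status field (pass/warn/error), as produced by dbt source freshness, a minimal sketch for flagging stale sources could look like this; the function name is our own:

```python
import json

def stale_sources(sources_path: str) -> list:
    """List the unique_ids of sources whose freshness check did not pass."""
    with open(sources_path) as f:
        results = json.load(f).get("results", [])

    return [r.get("unique_id") for r in results if r.get("status") != "pass"]
```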
Best Practices to Keep in Mind

When you are working with dbt artifacts, whether directly or through packages such as dbt_artifacts or the Elementary dbt package, here are a few practices and tips to keep in mind: 

  • Always be consistent about the naming and structure of dbt models. This ensures the generation of reliable artifacts. 

  • For best optimization, use the run_results.json artifact to identify and deal with slow-running models. 

  • Consistently share your models with team members to ensure uniformity in the data modeling process. 
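Building on the tip about slow-running models, here is a hedged sketch for averaging execution times across several saved run_results.json files, which helps distinguish consistently slow models from one-off slowdowns (all names are illustrative):

```python
import json
from collections import defaultdict

def average_execution_time(run_results_paths: list) -> dict:
    """Average execution_time per node across several run_results.json
    files, e.g. snapshots archived after each scheduled job."""
    totals, counts = defaultdict(float), defaultdict(int)
    for path in run_results_paths:
        with open(path) as f:
            for r in json.load(f).get("results", []):
                uid = r.get("unique_id")
                totals[uid] += r.get("execution_time", 0)
                counts[uid] += 1
    return {uid: totals[uid] / counts[uid] for uid in totals}
```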

Conclusion

dbt artifacts are undeniably valuable in data modeling. They provide deep insight into dbt projects, and by analyzing them efficiently, businesses can optimize their data transformation processes and ensure high-quality data. 

It is certainly possible to manage everything on your own, but why not have the assistance of an experienced team close at hand? Contact us today to benefit from the expertise of our team. 

If your organization is looking to succeed with dbt, phData would love to help!

As dbt’s 2023 Partner of the Year, our experts will ensure your dbt instance becomes a powerful transformation tool for your organization.

FAQs

How are dbt artifacts versioned?

The structure of dbt artifacts follows JSON schemas hosted at schemas.getdbt.com. Each artifact has its own version number, which may change in any minor version of dbt (v1.x.0).

What are dbt packages?

dbt packages are essentially self-contained dbt projects that contain models and macros designed to solve a specific problem. As a dbt user, when you add a package to your project, the models and macros in the package become a part of your project. This implies that when you run dbt, the models in the package will be materialized as well.
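For illustration, adding such a package typically means listing it in packages.yml and running dbt deps; a minimal example (the package coordinates and version pin below are illustrative, not a recommendation):

```yaml
# packages.yml — version pin is illustrative; check the dbt Hub for current releases
packages:
  - package: brooklyn-data/dbt_artifacts
    version: 2.6.2
```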
