December 13, 2023

How to Use a dbt Package in Your Project

By Arnab Mondal

In software development, developers often need to split code into segments and package it into libraries. This approach has several advantages, one of which is a more focused grouping of code that aligns with specific business needs.

When multiple team members work on a shared code base, they can search for code that has already been written and refined for a specific use case. They can then reuse it to fulfill part of their own requirement, or study it to learn how to build something similar.

In this blog, we will discuss dbt packages, when you should use a package, and how to use them in your project.

What Are dbt Packages?

dbt has libraries for the same reason, and they are known as "packages." dbt packages are popular because they give easy access to solutions for frequently encountered business use cases. Some examples are as follows:

  • Transformation: Packages can transform data from consistently structured SaaS sources, for example turning Segment or Snowplow pageviews into session data, or converting Facebook Ads or AdWords spend data into a consistent format while keeping the data segregated.

  • Reusable Code: Packages provide dbt macros for common tasks so that people do not need to reinvent the wheel. For example, macros can generate SQL to union two relations, create surrogate keys, or pivot columns.

  • Tool Integration: Packages can provide prebuilt models and macros for a particular tool in your data stack, such as models for understanding Redshift privileges or macros that work with data loaded by Stitch.

Why Should You Use dbt Packages?

dbt packages are standalone dbt projects whose models and macros tackle a specific problem area. When you add a package to your project, its models and macros become part of your project. This means that:

  • All the models in the package will be materialized when you use the command dbt run.

  • You can use the ref function in your models to refer to the models in the package.

  • All the macros in the package are now available in your project.
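
As a hedged illustration of the last point: generic tests that ship with a package are implemented as macros, so they can be referenced from a properties file once the package is installed. The sketch below assumes dbt_utils is installed; the file path, the model name (orders), and the column names are hypothetical.

```yaml
# models/schema.yml (hypothetical path and model)
version: 2

models:
  - name: orders
    tests:
      # Generic test provided by the dbt_utils package
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - customer_id
            - order_date
```

Recent dbt versions also accept the key data_tests in place of tests, so check which one your version expects.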

Note: Defining and installing dbt packages is different from defining and installing Python packages.

Use Cases

dbt v1.6 introduced a new configuration file, dependencies.yml, which can be used in place of packages.yml. Its main benefit is that it can declare both package and project dependencies.

  • Package Dependency: It lets you add the whole source code of someone else’s dbt project, also known as a library, into your project.

  • Project Dependency: It lets you build on top of someone else’s work by referencing their dbt project, rather than adding its source code to yours.

When to Use Project Dependencies

Project dependencies are mainly used for cross-project references and dbt Mesh workflows.

  • Use it to set up a cross-project reference between various dbt projects, like in a dbt Mesh setup.

  • Use it when you want to include both projects and non-private dbt packages. Private packages are currently not supported in dependencies.yml, to ensure compatibility and prevent configuration issues.

  • Use it to specify the dbt Hub packages, for example, dbt_utils. 
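
A minimal sketch of a dependencies.yml that combines both kinds of dependency might look like the following. The upstream project name (jaffle_finance) and the version pin are placeholders, not recommendations.

```yaml
# dependencies.yml (sits next to dbt_project.yml)
packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1   # placeholder version

projects:
  - name: jaffle_finance   # hypothetical upstream dbt project
```

With a project dependency declared this way, models from the upstream project are typically referenced with the two-argument form of ref, for example ref('jaffle_finance', 'some_public_model'), where the model name is again hypothetical.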

When to Use Package Dependencies

Package dependencies allow you to add source code from someone else’s dbt project into your own project.

  • Use it to download various dbt packages into your own dbt project.

  • Use it to reference a private package.

  • Use it when you need support for Jinja rendering and dynamic configurations, for example, when you need to insert an environment variable that holds a Git token into your package specifications.

How Do I Add a Package to My Project?

The following steps show how to add specific or multiple packages to your dbt project.

  1. Add a YAML file named packages.yml or dependencies.yml to your dbt project. This file should be at the same level as your dbt_project.yml file.

  2. As shown below, add the package(s) with the proper syntax.

				
packages:
  - package: dbt-labs/snowplow
    version: 0.7.0

  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 0.9.2

  - local: /opt/dbt/redshift

  3. Run dbt deps to install the packages. By default, packages are installed in the dbt_packages directory.

How Do I Specify a Package for My Project?

There are several ways to reference a package, depending on where the package is stored:

Hub Packages

dbt Hub is a registry that contains numerous packages. Installing from the Hub is recommended where possible because it allows dbt to handle duplicate dependencies, for example when two packages you install both depend on dbt_utils.

Example

packages:
  - package: dbt-labs/snowplow
    version: 0.7.3 # version number
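
Hub packages can also be pinned to a range of versions rather than a single one. The following sketch assumes you want any 0.7.x release of the snowplow package; the bounds are illustrative.

```yaml
packages:
  - package: dbt-labs/snowplow
    version: [">=0.7.0", "<0.8.0"]   # any 0.7.x release
```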
				
			

Git Packages

You can install packages that are stored on a Git server. The revision can point to a branch, a tagged release, or a specific commit (by providing the complete 40-character hash).

Example

packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git" # git URL
    revision: 0.9.2 # tag or branch name

Example 2

packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 4e28d6da126e2940d17f697de783a717f2503188
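
The two examples above pin a tag and a commit. For completeness, a branch can be referenced in the same way. This sketch assumes the repository has a branch named main.

```yaml
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: main   # branch name (assumed to exist in the repository)
```

Because a branch moves over time, the installed code can change the next time dbt deps is run, so a tag or a commit hash is usually the more reproducible choice.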

				
			

Internally Hosted Tarball URL

If your organization has security restrictions and can only pull packages from internal services, you can install packages from internally hosted tarball URLs, for example from a hosted environment such as Artifactory or a cloud storage bucket.

Example

packages:
  - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.9.6
    name: 'dbt_utils'

Note: Here, the name dbt_utils specifies the subfolder of dbt_packages that is created to hold the package source code.

Private Packages

There are several ways to access a private package:

SSH Key Method

This method is only available when you run dbt from the command line. Instead of providing a username and password, you generate an SSH key and add it to your Git provider, for example GitHub or GitLab.

Example

packages:
  - git: "git@github.com:dbt-labs/dbt-utils.git" # git SSH URL

Note: The SSH key method does not work for dbt Cloud, but you can use the HTTPS Git token method instead.

Git Token Method

You can clone via HTTPS if you pass the git token via an environment variable.

GitHub

packages:
  # use this format when accessing your repository via a github application token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a classical personal access token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a fine-grained personal access token (username sometimes required)
  - git: "https://GITHUB_USERNAME:{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

GitLab

packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_DEPLOY_TOKEN')}}@gitlab.example.com/dbt-labs/awesome_project.git" # git HTTPS URL

Azure DevOps

packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@dev.azure.com/dbt-labs/awesome_project/_git/awesome_repo" # git HTTPS URL

Bitbucket

packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@bitbucketserver.com/scm/awesome_project/awesome_repo.git" # for Bitbucket Server

Local Packages

A local package is a dbt project that can be accessed from your local file system. You install it by specifying the path to the project. This works best when the project is nested in a subdirectory relative to your current project’s directory.

Example

packages:
  - local: relative/path/to/subdirectory

Example 2

packages:
  # not recommended - support for these patterns varies
  - local: ../../redshift    # relative path to a parent directory
  - local: /opt/dbt/redshift # absolute path on the system

Local packages are a good fit in the following cases:

  • Monorepo: When you have multiple projects, each nested in a subdirectory, within a monorepo. Local packages allow you to combine the projects so that you can develop and deploy them in a coordinated manner.

  • Testing Changes: When you want to test changes to a project or package in the context of a downstream project or package that uses it. If you switch the installation to a local package, you can change the upstream code and immediately test it in the downstream project or package.

  • Nested Project: When you have a nested project that defines the fixtures and tests for a project of utility macros.

Conclusion

Optimize your project and eliminate unnecessary work by leveraging dbt packages. For additional information, reach out to our team of experts, who can guide you in the effective usage of dbt packages.

FAQs

What is a dbt model?

A dbt model defines a table or view that you want to create in your data warehouse. You write a model as a SQL SELECT statement.

What is a dbt Python model?

A dbt Python model is a model written in Python and defined in a file with the .py extension, in the same way that SQL models are defined in .sql files. It lets you use Python together with dbt.
