March 20, 2024

How to Ace dbt with Jinja

By Ajith Kumar Rajasekaran

dbt’s SQL-based approach democratizes data transformation. However, python and other programming languages edge out SQL with its metaprogramming capabilities. dbt’s Jinja integration bridges the gap between the expressiveness of Python and the familiarity of SQL.

Jinja’s usage will significantly empower you to build dynamic and reusable data pipelines, especially when dealing with conditional logic and templatization within dbt.

In this blog, we will extract the essence of Jinja in dbt.

What is Jinja?

Jinja is a powerful and versatile templating engine. Jinja acts as embedded mini-programs, boosting code dynamism and efficiency. Think of it as a flexible tool that separates the static layout of your code from the dynamic data that fills it. Instead of hardcoding, you can use Jinja’s expressions to integrate data seamlessly.

Pairing Jinja with dbt

This section will dive deep into the powerful synergy between Jinja and dbt.

We are going to explore the following topics in detail:

  • Parameterization

  • Decision-making constructs

  • Iteration

  • Dynamic code building blocks

  • Methods

Parameterization: Allowing Models to Adapt Based on Input Values

Jinja’s variables store frequently used values, environment-specific values, and table prefixes in your dbt_project.yml file. The variable can be assigned as strings, lists, and dictionaries.

				
					{% set schema = var("my_schema") %} 
--var keyword will fetch varible from dbt_project.yml file

{% set states = ['Victoria', 'Queensland', 'Tasmania'] %}

{% set countries = {"state_countries": ['India','United States of America'] ,
                   "province_countries": ['China','Canada']  }
%}

				
			

Jinja provides default variables that will contain predefined values based on the profile. 

A few commonly used default variables are:

Variable Description Variable types Value
ref Reference one model with another ref('model_name') latest model version
ref('model_name', version=2) Custom model version
target Contains connection information target.profile_name active profile
target.name active target
target.schema dbt schema name
target.type active adapter
target.threads number of threads
this Database representation of the current model this fully qualified table name
this.database database name
this.schema schema name

The variables can also be passed by using the --vars command line option as a key-value pair. The variables defined with the --vars command line argument have the highest order of precedence.

				
					dbt build -m model_name --vars '{"schema": "prod"}'
				
			

Decision-making Constructs: The Ability to Make Logic-based Choices

Jinja’s if statements help the models to adapt to the circumstances. Tailor the data transformations depending on dynamic conditions or even build different pipelines for different environments.

				
					{% if target.name == 'prod' %}
    {% if country in state_countries %}
        SELECT
                state_name, district_name, zip_code
        from  {{  var("my_schema") }}.country
    {% elif country in province_countries %}
            SELECT
                province_name, county_name, postal_code
            from  {{  var("my_schema") }}.country
    {% endif %}
{% else %}
   select * from {{  var("my_schema") }}.country
{% endif %}

				
			

Jinja is loaded with various operators to support the conditional statements.

Logic OperatorsComparison Operators
  • AND
  • OR
  • NOT
  • Equal to  (==)
  • Not equal to (!= or <>)
  • Greater than (>)
  • Less than (<)
  • Greater than or equal to (>=)
  • Less than or equal to (<=)

 

Iteration: Repeated Execution of a set of Instructions

Jinja’s for loop provides the ability to automate repetitive tasks, eliminating boilerplate code and streamlining your data transformations. It lets you iterate over lists and dictionaries to dynamically construct SQL statements on the fly, for loops improve the code readability immensely. The snippet below shows the use of a for loop by iterating over a dictionary and list.

				
					{% set countries = {"state_countries": ['India','United_States_of_America'] ,
       "province_countries": ['China','Canada']  }
%}

{%  for  key, value in  countries.items() %}
    {% if key == 'state_countries' %}
        {% for state in value %}
            SELECT 
                state_name, county_name, age_group, 
                access_to_electricity, access_to_technology
                FROM {{  var("my_schema") }}.{{ state }}_stats
            {% if not loop.last %} union all {% endif %}
        {% endfor %}
    {% endif %}
{% endfor %}

				
			

Jinja provides multiple looping properties to unlock more use cases.

PropertyData TypeDescription
loop.firstBooleanReturns True during the first iteration
loop.lastBooleanReturns True in the previous iteration
loop.indexINTReturns the loop iteration number. Start with 1.

Dynamic Code Building Blocks: Ability to Create Dynamic Code

Jinja’s macros are predefined blocks of Jinja code that you can reuse throughout your dbt models. 

Major benefits of macros:

  • Repetitive Code Reduction: Eliminates the need to copy and paste the same logic across different models.

  • Improve Code Maintainability: Updating common transformations in one place.

  • Increase Code Modularity: This breaks down complex logic into smaller reusable units, leading to more organized code.

				
					-- Add the code to a .sql on the macros folder 

{% macro audit_columns() %}
   current_timestamp as load_timestamp,
   current_date as batch_date
{% endmacro %}

				
			

The macro can be used in other models and macros.

				
					select *
 , {{ audit_columns() }} 
FROM {{  var("my_schema") }}.country

				
			

Remember, Jinja compiles the code to plain SQL. Use the dbt compile command to check the compiled code. Ensure that the syntax and logic are correct.

				
					--compiled query
select *
 , current_timestamp as load_timestamp,
   current_date as batch_date
FROM continent.country
				
			

The macro can be executed independently as well with the dbt run-operation command.

				
					{% macro create_new_schema(schema) %}
    create schema if not exists {{ schema} }; 
{% endmacro %}

				
			

dbt run-operation create_new_schema --args '{"schema": "new_schema"}' command will execute the macro.

Methods: Transformation and Formatting Utilities

Jinja streamlines your data transformations by offering built-in methods for handling both text and numbers.

MethodSyntaxOutput/ Description 
Split"dbt transformation".split(' ')["dbt," "transformation"]
Replace"dbt" | replace("d", "t")"dbt"
Trim"dbt   " | trim"dbt"
Length"dbt" | length3
Title"dbt transforms data" | capitalize"Dbt  transforms data"
Capitalize"dbt" | capitalize"Dbt"
Upper"dbt" | upper"DBT"
Lower"DBT" | lower"dbt"
Printprint("dbt is an open source tool")dbt is an open-source tool
String11 | string"11"
Int"11" | int11
Float11 | float11.0
Round11.123 | round(1)11.1
Command block{# comment line #}Codes can be documented using the command line block

Statement

block

{% set dbt = "jinja" %}Jinja functions will be defined inside a statement block
Template block{{ ref("model") }}Methods and variables need to be passed inside a template block
Whitespace removal{%- set dbt = "jinja" -%}- is used for removing white spaces. This will make the compiled code look cleaner.

Conclusion

Jinja offers a dynamic toolkit that enhances your dbt models and elevates our data-wrangling skills. We can unlock valuable insights and drive data-driven decisions efficiently by wielding Jinja’s power.

If you have any further questions regarding how to ace dbt with Jinja, reach out to our team of dbt experts!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit