July 11, 2023

How to use Variables in Matillion

By Adam Vigneau

In this blog, we’ll discuss how Matillion’s variables can help ease the management of metadata configurations across components, jobs, and environments. With the help of variables, we can simplify the management of projects that can quickly become cumbersome to manage in Matillion.

What is Matillion ETL?

Matillion ETL is a platform designed to help you speed up your data pipeline development by connecting it to many different data sources, enabling teams to rapidly integrate and build sophisticated data transformations in a cloud environment with a very intuitive low-code/no-code GUI.

Why Do Variables Matter in Matillion?

Variables can be used in various parameters and expressions, allowing users to pass and centralize environment-specific configurations. This allows code in Matillion to be consistent and repeatable. 

There are two primary types of variables in Matillion. They can be scalar values configured by environment and job variables. They can also take on a 2D-Array form used in Grid Variables.

Variables in Matillion are name-value pairs that can pass configurations across jobs and environments. They are also accessible through Python and Bash scripts, which can be substituted at run time. 

There are four main variable types available for use in Matillion. They are as follows: 

  • Grid Variables

  • Job Variables

  • Environment Variables

  • Automatic Variables

Job Variables

As you might expect, job variables are defined within an individual job. They will also override environment variables of the same name within that specific job. They will be included when jobs are imported and exported.

Creating Job Variables

We start by navigating the job variables window to configure a job variable. We can achieve this in two ways. First, we can navigate to the menu by right-clicking a job, clicking Variables, and clicking Manage Job Variables.

Alternatively, you can click into a job, right-click anywhere within the job’s design grid, and navigate to the Manage Job Variables option on the menu.

Job Variables Menu

Within the Job Variables Menu is some configurations that must be defined. They are as follows:

Environment Variables

Environment Variables are defined across a given environment. Key-value (or values) pairs can act as job variables passed to other jobs. To navigate the menu, click on the Project icon in the top left corner, then Manage Environment Variables in the Environment Variables Menu. Like the Job Variables Menu, some configurations must be filled in.

What makes the Environment Variables menu unique is that there are columns that represent the different environments available in a given project. 

For each of these columns, an additional default value can be set. This can be particularly useful if you have Matillion jobs consistent across environments, such as Development, QA, and Production. 

Rather than having to hard code each individual variable in each job, you can set them per environment all in one place and then call the variables in the job consistently. This helps maintain consistency in your jobs.

Automatic Variables

Another beneficial aspect of Environment Variables is the preset Automatic Variables. These are pre-defined in Matillion, and their values cannot be changed.

Variable

Type

Description

project_group_name

Text

Name of the current project group. It can be set via Rename Project Group from the Project menu.

project_group_id

Numeric

Internal ID of the current project group.

project_name

Text

Name of the current project. It can be set via Manage Project from the Project menu.

project_id

Numeric

Internal ID of the current project.

version_name

Text

Name of the current version. Versions can be renamed via Rename Version from the Project menu unless locked. Read Version Control for more information.

version_id

Numeric

The internal ID of the current version.

environment_name

Text

Name of the current environment. This can be set by right-clicking the environment in the Environments panel and selecting Edit Environment. Read Environments for more information.

environment_default_schema

Text

The name of the default schema for the current environment.

environment_database

Text

The name of the database for the current environment.

environment_endpoint

Text

URL of the Matillion ETL instance with the current environment.

environment_id

Numeric

The internal ID of the current environment.

environment_port

Numeric

The port number of the current environment.

environment_username

Text

Username for the environment connection.

job_name

Text

The name of the current job. This can be set by right-clicking the job in the Project panel and selecting Manage Job.

job_id

Numeric

The internal ID of the current job. All jobs have a unique ID that can be used to refer to it within a project. Note that this is not the ID of a particular run of a job.

component_name

Text

The name of the current component, as defined by the user. Components can be renamed by selecting them and editing the Name property.

component_id

Numeric

The internal ID of a given component in Matillion ETL.

component_message

Text

An error message returned by a component can be used in job error handling.

run_history_id

Numeric

The ID of a task in Matillion ETL. These can also be viewed via Task History.

detailed_error

Text

A detailed error description that contains the job_name, the component_name, and the component_message.

task_id

Numeric

The task ID is associated with the current running job.

Grid Variables

Grid Variables are a special type of job variable. They are 2D arrays that hold scalar values in named columns. They can be used in many different components, mainly using the Use Grid Variable checkbox in the component property menu.

How to set up a Grid Variable

1. In a transformation job, right-click and select Manage Grid Variables on the dropdown menu.

2. This is the Grid Variables management section.  Click the + sign to set up a new grid variable.

When the Use Grid Variable checkbox is selected, you are moved to a menu that displays the configuration settings that need to be filled out.

Populating Grid Variables

There are a couple of ways to populate the values for a grid variable. First, we can set default values like other variable types, like the example below.

Using the Append to Grid job component, we can also populate data into a grid variable. This component can either populate a grid variable with fixed inputs or grab data from another grid variable.

The following are the settings for this component.

NameName of the Component
Append/PrependDetermining whether to append or prepend data to the grid
Target Grid VariableName of the grid variable being populated
Fixed / GridChoose between fixed inputs or populating from a different grid variable
Values Grid
(for Fixed input)
Fixed user inputs that will populate a grid variable
Source Grid Variable
(for Grid Input)
Source Grid Variable if Grid is chosen
Column Mapping
(for Grid Input)
Mapping columns from source to target grid variable

Values Grid Example:

Column Mapping Example:

Variable Behavior

It’s essential to understand how different types of variables behave in Matillion. Variable information can be passed onto other components or jobs in some cases. 

Behavior can be broken down into two different sets of categories. The first is shared and copied. The second is job and environment.

Shared vs. Copied Variables

Copied variables can be updated within individual branches of a job and can do so without affecting other branches. A branch using a copied variable will begin by taking its default value, even if this variable has been updated in another parallel branch. 

Shared variables, in contrast, are updated across all branches of a job. Think of this like local and global variables in other systems.

To demonstrate how this behavior works, let’s use a simple example. 

For this example, we’ve created two simple variables, phData_copy_example and phData_shared_example, and initialized them with the values 2 and 1.

We’ve then created a simple fixed-flow population, using the values from these variables to populate a simple table. For this example, the results are truncated each run.

The syntax in which we call variables is ${variable_name}. Below is an example of calling the two variables we established and using them in our fixed flow.

If we run this job locally, we can see the two columns populated with the above values.

To demonstrate how the shared and copied variables are affected, we’ll create a simple job that runs the fixed flow with an iterator that updates the variables. We’ll first do this.

We’ll update the copy and shared values in this iteration to 4 and 3.

After running the job, we see the table populated with the new values 4 and 3.

Now, let’s add a second iteration of the run, but this time, without its own iterator that updates the variable values.

When we check the table’s contents, we see that the shared value is still 3, and the copied value is 1. This is because the shared variable from our iterator was updated for both components, which means the shared value was populated with a 3 twice. 

Conversely, the copied variable was only updated in the first iteration of the job. So, when the job ran a second time, the Copied column populated with the original default value of 1.

This is an odd example, but it shows the difference between copied and shared variables.

Environment vs. Job Variables

Environment and job variables have a similar parallel to global and local variables.  Environment variables are global. They can be used across Matillion ETL across many components and jobs. 

Job variables are local. They are defined within the scope of a job. Importantly, they also override environment variables of the same name.

To demonstrate this, we’ll create a simple variable at the environment level with the default value environment.

We’ll then create a simple data flow that populates a table with the variable value.

When we run the job, we see that the Variable_Type column is populated with environment.

Now, let’s make a job variable with the same name and give it a different default value of job.

When we look at the new output, we see that the variable_type column is now populated with job. This is because the job variable took precedence over the environment variable, similar to how a local variable overrides a global variable in other programming languages.

Closing

Within this blog, we learned about the different variable types available in Matillion and saw examples of how they work. They can be used in various components and jobs to pass data along and make job development more consistent and efficient for your organization.

Are you looking for more Matillion assistance? phData excels at assisting organizations in achieving success with Matillion. Contact us today for help, advice, and actionable strategies.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit