May 12, 2023

How to Create Loops in Matillion

By Michael Gohn

Loops are one of the most powerful features of any programming language and can make any data solution much more scalable. Since loops have this much power, it becomes imperative that you understand the ins and outs of each kind of loop as well as when and how to use them. With this knowledge, you can start to get the most out of your Matillion ETL instance.

In this blog, we will guide you through the concept of loops and provide you with a step-by-step process for creating them in Matillion to improve your workflow.

What is Matillion ETL?

Matillion ETL is an ETL (or, more specifically, ELT) tool made for cloud database platforms such as the Snowflake Data Cloud. Matillion accomplishes this via a drag-and-drop user interface where components are placed on a canvas and connected together to form jobs.

Each component corresponds to a specific function and is connected to tell Matillion the order in which to run each component.

Matillion breaks down their jobs into two types: orchestration jobs and transformation jobs.

As the name suggests, transformation jobs transform the data already loaded into tables (generally via DML operations).

Orchestration jobs, contrarily, typically pertain to managing database objects and loading data from external sources (generally via DDL operations).

What is a Loop in Matillion?

Within this broader context, loops (called “iterators” in Matillion) are simply one type of component used in orchestration jobs. Functioning similarly to loops in programming, iterator components directly connect to one component, thereby running that component multiple times.

Matillion offers five distinct iterators that can be used for different situations: file, fixed, grid, loop, and table iterators. We will discuss the different types of iterators later.

Why use Iterators in Matillion?

Allow me to paint you a picture. Imagine there is a component that you need to run on data for each of the past 20 years.

With an iterator, you can bring in the component once and loop through it 20 times, once for each year, using a variable. Without the iterator, 20 nearly identical components would have to be brought in and configured once each year to achieve this business requirement.

Although this scenario is small in scale, if the scale grows, it becomes very apparent very quickly that an iterator can save lots of time and effort. It is easy to imagine situations where an iterator would be run not just 20 times but hundreds or thousands of times or even more.

Hence, you should use an iterator whenever you need to run through a component multiple times with different inputs.

Using Iterators in Matillion

Iterators can be found very easily in Matillion, simply search “iterator” in the components pane in the bottom left corner of the user interface, and all five iterators will appear. Once you decide to use an iterator, the first question is, “Which iterator to use?” To answer this, let’s explore each iterator in some detail.

File Iterator

The file iterator is dedicated to looping through files in specific locations: Azure Blob Storage, Cloud Storage, FTP, HDFS, S3, SFTP, or Windows Fileshare. 

After inputting the file system, the configuration finishes with the location of the desired files within the file system. The iterator will then loop through and return the desired information on each file in the configured location. 

The iterator can return the base folder of the file system, the subfolder where each file is located, the filename itself, and the file’s last modified date. Environment variables are used to pass this information to the component attached to the iterator.

Fixed Iterator

A fixed iterator is more straightforward than a file iterator. Like the file iterator, environment variables pass values from the iterator to the connected component. You manually entered the values to be iterated through and passed into the connected component.

Grid Iterator

Grid iterators rely on and loop through grid variables. One or multiple columns in the grid variable are passed into the attached component via environment variables. One row from the selected columns will be run for each iteration.

Loop Iterator

The loop iterator functions very similarly to a for loop in programming. A loop iterator passes on a sequence of numbers defined in the iterator’s configuration. 

The configuration requires a starting, ending, and iteration number. Hence the sequence of numbers will always have a constant difference from one number to the next.

Table Iterator

The table iterator loops over the rows in a preexisting table. The iterator is configured to select the table, the desired columns from that table, and the variables that each column points to. Those variables are then passed into the connected component, one row per iteration.

Additional Iterator Properties

Three additional properties of iterators make iterators even more powerful and can simplify Matillion’s jobs even more.

Number of Iterators

In any individual Matillion job, any number of iterators can be used, as there is no limit to the number of iterators used in a job. This once again frees up iterators to be used in a great number of use cases, as an iterator can be used anywhere in a job, even if there are already other iterators in the job. This is especially important as each iterator is limited to 5000 iterations per run. Thus, if you need to iterate more than 5000 times, you can set up 2 or more iterators next to each other in order to run the iteration however many times is necessary.

Stacking Iterators Together

In Matillion, iterators can be stacked on top of each other, allowing you to iterate through each value in the lower iterator once for each value in the upper iterator. 

An example will help demonstrate why this is useful. Imagine that a component must be run every quarter of every year between 1980 and 2022. If only one iterator was to be used, a fixed iterator would work best, and 172 entries would have to be manually entered from the first quarter of 1980 through the fourth quarter of 2022.

Contrarily, a loop iterator can loop through the years, connected to a fixed iterator that loops through all four quarters. 

This second solution is much quicker and more efficient to configure, but it is also easier to alter the iterators if necessary.

Say the business requirements change and the iterator changes have to run each month instead of quarter for the same years. 

Using only the single fixed iterator, all previously entered 172 entries are now useless, and a new iterator must be created with 516 manual entries for each month from January 1980 to December 2022. 

However, with two iterators, this change is much simpler to handle. The loop iterator, firstly, would not change; then, the fixed iterator would drop the four manually entered quarters and manually add the 12 months, which is a quicker transition than dropping 172 entries and adding 516.

Also important to mention is that nested iterators should not necessarily be your first option in every scenario. Nested iterators open up a host of new possibilities in Matillion, however, they can be slow and inefficient, making them less optimized than other solutions and so should be evaluated on a case-by-case basis.

Iterating Through an Entire Job

One helpful feature in Matillion is that each job (either transformation or orchestration) can be used as a component in an orchestration job (this cannot be done in a transformation job). 

Building on our example above, what if, instead of needing to iterate through a single component, running two or more components on the data for each quarter or month in your data set would be more beneficial? 

Matillion iterators can accomplish this business requirement easily as these components can be placed into their own orchestration or transformation job (whichever is appropriate). Then this job can be brought into the orchestration job with the iterator to be iterated through, thereby allowing the iterator to be applied to the entire job simultaneously

Closing

Iterators are flexible and powerful components that perform looping in Matillion. Like loops in programming, iterators come in different flavors and help reduce repetitions in Matillion jobs, making them very scalable. Iterators can be configured quickly and easily, helping you get the most out of your Matillion ETL instance.

Are you looking for more Matillion assistance? phData excels at assisting organizations in achieving success with Matillion. 

Contact us today for any Matillion help, advice, and actionable strategies.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit