Modern business operations rely heavily on data engineering and transformation processes to turn raw data into valuable insights.
After completing a recent project, I’ve seen firsthand how tools like Matillion can significantly help organizations wrangle their data, making it more accessible and useful for business analysts and teams. Matillion, a robust ELT (Extract, Load, Transform) platform, simplifies data integration and transformation complexities with a no-code or high-code experience. One feature that caught my attention is Matillion’s ability to create and share jobs (also called pipelines), fostering better collaboration and boosting efficiency while developing.Â
In this blog, I’ll guide you through creating and sharing a Matillion job. We’ll leverage Matillion’s Data Productivity Cloud—designed for cloud data platforms like Snowflake, Databricks, and AWS Redshift—to build an example shared job.
What is a Matillion Job?
A Matillion job, also known as a pipeline in Matillion DPC, is a workflow that defines how data should be extracted from various sources, transformed according to specified rules, and loaded into a target destination, such as a data warehouse. Matillion jobs can range from simple data transfers to complex transformations involving multiple steps and logic.
Types of Matillion Jobs
Orchestration Jobs: These jobs control the overall ELT process, managing the sequence and dependencies of various transformation steps.
Transformation jobs: focus on transforming the data, applying business logic, and preparing it for loading into the target system.
Understanding Shared Jobs
Shared jobs in Matillion are reusable components or jobs that can be called and executed within other jobs. They are designed to promote code reuse, streamline development, and enhance collaboration among team members.Â
It is important to note that when a job is executed through the Run Orchestration or Run Transformation components, all the jobs involved will be included in the shared job. This implies that orchestration jobs and the transformation jobs they are associated with can be shared in one action.
Benefits of Using Shared Jobs
Code Reusability: Shared jobs enable you to reuse ELT logic across multiple projects or jobs. This saves time and effort by reducing the need to recreate standard processes.
Use Case: Data Validation in Retail
In a retail environment, data from multiple stores or sales channels must be validated for consistency. Instead of building separate validation logic for each data source, a shared job can handle data validation across all sources. This ensures consistent validation rules and reduces development time.
Consistency:Â When using shared jobs, any modifications made to the shared logic are reflected across all jobs that use it. This provides consistency and reduces the risk of errors.
Use Case: Standardizing Date Formats in Finance
In a financial services company, multiple teams run different ELT jobs that deal with date and time data from various regions. A shared job can standardize date formats across all ELT jobs, ensuring that reports generated across departments follow the same format and eliminating discrepancies.
Collaboration: Shared jobs enable teams to work together. Multiple team members can use and modify the shared jobs, promoting teamwork while reducing the risk of duplicating efforts.Â
Use Case: Transforming Data in Marketing
The marketing and data engineering teams are collaborating on a customer data project. The marketing team wants to contribute to the development of the overall process by adding complex business logic for their upcoming product release. The engineering team can establish the general structure of the joint project and insert notes within the workflow to indicate where the marketing team can update their logic.
Creating a Matillion Job
In this blog, we will focus solely on the essential steps for creating and sharing a Matillion job. We will not cover setting up environments, variables, or navigation within Matillion DPC. For more information on these topics, please refer to the resources linked throughout this blog.
Logging into Matillion
Ensure necessary permissions to create and share jobs.
Matillion offers a 14-day free trial; sign up today!
Creating a Matillion Job/Pipeline
-
In the user interface, click the Pipelines tab.
-
Click Add at the top right of the Pipelines tab, and select either Orchestration pipeline or Transformation pipeline. In our example, we will create the Transformation pipeline first and then the Orchestration pipeline.
Choose a unique name for the Transformation pipeline.Â
We intend to create a pipeline called
dmn_date
to generate a sequence of dates for potential date scaffolding and data visualization techniques for business users and specialists.
Click Add. The new pipeline will open a new tab on the canvas.
Click the Generate Sequence component on the design canvas to start the job.
Update the component settings to establish a Start Value, Increment Value, End Value, and Output Column name.
Use the following blog to understand how to use variables in Matillion.
Click and drag the Calculator component to the design canvas.
Connect to the previous component and update the component settings by creating the following calculations.
--starting_date--
CAST('${job_start_date}' as date)
--calendar_date--
"starting_date"+"sequence"
--calendar_year--
DATE_PART(year,"calendar_date")
--calendar_month--
DATE_PART(mm,"calendar_date")
--calendar_week--
DATE_PART(week,"calendar_date")
--calendar_day--
DATE_PART(day,"calendar_date")
Click and drag another Calculator component to the design canvas.
Connect to the previous component and update the component settings by creating the following calculations.Â
--relative_year--
DATEDIFF(years, CURRENT_DATE, "calendar_date")
--relative_month--
DATEDIFF(months, CURRENT_DATE, "calendar_date")
--relative_day--
DATEDIFF(days, CURRENT_DATE, "calendar_date")
Drag the Map Values component to the design canvas and map values for
calendar_month
. Update the values from digits (1, 2, 3) to month names (January, February, March).Update the component settings to Use Grid Variables.Â
The grid variables must be created before using the Map Value component. Refer to the following blog to understand how to use variables in Matillion.
Identify the Column Mapping Fields and click Save.
Name the Output Column to
month_name
.Other can be set to null, indicating that anything not defined in the Grid Variable will be set to null.
Here’s how your Grid variable should look. By the way, with Matillion, you can easily switch to Text Mode to add your Map Values. You can also create the map values using Matillion’s UI.Â
Click and drag the Window Calculation component to the design canvas. In this component, we will create a new field called
days_in_month
, essentially counting the number of days in a month.Connect to the previous component and update the component settings below.
Partition Data – calendar_year
, calendar_month
Ordering within Partition – calendar_date
, ASC
Functions –Â Count, calendar_date
, days_in_month
Lower Bound – Autofill (unbounded preceding)
Upper Bound – Autofill (unbounded following)
Wrapping up, let’s review and refine our final output by removing unnecessary fields and ensuring field names and data types are designated appropriately. The final output should look something like the sample below.
dmn_date
Transformation pipeline:
dmn_date
Transformation sample:
Lastly, drag the Rewrite Table component to the design canvas.
Connect and update the component settings to establish a Warehouse, Database, Schema, and Target Table.Â
* Remember to use pipeline variables to set parameters instead of hard coding values or relying on the Environment Default. This makes the parameters more transparent and more customizable for the person or system using the pipeline. It also simplifies the process of understanding, modifying, and maintaining the pipeline configuration across different environments or use cases.
Now, let’s create the Orchestration pipeline. Click Add at the top right of the Pipelines tab, and select Orchestration pipeline.
Name the Orchestration pipeline,
dmn_date_orchestration
.
Click and drag the Run Transformation component to the design canvas.
Connect and update the component settings to establish the newly created
dmn_date
transformation pipeline.Set scalar variables. These are the same variables used in the component setting for the Rewrite Table component in the
dmn_date
Transformation pipeline. They can be customized for those using the shared pipeline. In this example, we use variables to update the schema and table name.Click Validate on the design canvas to confirm your orchestration can run successfully.
We are now ready to share our Matillion pipelines!
Sharing a Matillion Job
To start sharing a pipeline, click the “…” next to the pipeline you want to share and then click Share.
* Alternatively, you can export the job/pipeline and share it with the Matillion Community or others outside your team via a YAML file. When sharing externally, avoid sharing sensitive company data.
** In our example, the dmn_date_orchestration
and dmn_date
transformation jobs are not published. Therefore, you may encounter a message on the shared pipeline pop-up indicating, “The pipeline must be published to become accessible in other projects.“
To stop sharing a pipeline, click the “…” next to the pipeline you want to stop sharing and then click Unshare.
Best Practices Tips for Matillion Jobs
Avoid hardcoding default settings into shared pipeline component parameters. Use pipeline variables instead.
Don’t use the same name for a transformation and an orchestration pipeline in the same folder.
Set variable visibility to Private for values the consumer can’t override.
Component names should be both unique and descriptive without being excessively long.
Conclusion
From ingesting raw data to curating enterprise tables, Matillion has become essential for streamlining and managing data effectively. By following the steps outlined in this blog, you can create robust shared jobs that promote code reuse, ensure consistency, and facilitate teamwork. Remember to check out Matillion Exchange for more shared jobs and content.
FAQs
How secure are shared jobs in Matillion?
Shared jobs in Matillion are secure, and access can be managed using Matillion’s robust access control features to ensure that only authorized users can modify them.
Can multiple users modify shared jobs?
Yes, multiple users can modify shared jobs with appropriate permissions, promoting collaboration and teamwork.
What are common errors in shared jobs, and how can they be fixed?
Connection issues, transformation logic errors, and load failures are common errors. Use Matillion’s debugging tools to identify and resolve these issues.
What is the difference between Copied and Shared variable behavior?
Changes made in one job or component will be seen in all others in a shared variable. With a copied variable, changes in one do not affect the others. Shared variables are for global modifications, and copied variables are for independent versions.