A well-structured workflow can make all the difference—it’s the key to working smarter, not harder. With a background in Data Visualization and BI tools, I’ve always approached things with a visual mindset.
That naturally carried over into Matillion, where designing clear, intuitive workflows isn’t just about making things look nice—it’s about making them easier to build, debug, and scale. A messy, hard-to-follow workflow can slow everyone down, but a well-organized one keeps projects moving smoothly and makes collaboration a breeze.
While working on a project in the Matillion Data Productivity Cloud environment, my colleague and I focused on building workflows using separate branches and integrating Git practices. Along the way, we saw firsthand how important it is to create workflows that aren’t just functional but also easy for others to pick up and work with. Whether you’re collaborating with teammates or handing something off to a client, a well-structured workflow saves time, reduces headaches, and makes life easier for everyone.
We’ll start by breaking down what a Matillion pipeline is, then dive into some best practices to keep your workflows clean, scalable, and easy to maintain. As a bonus, we’ll check out Matillion’s AI Copilot and see how AI can help take workflow design to the next level.
For those unfamiliar with Git or Git practices, please refer to Git for Business Users with Matillion DPC.
What is a Matillion Pipeline?
A Matillion pipeline is a collection of jobs that extract, load, and transform (ETL/ELT) data from various sources into a target system, such as a cloud data warehouse like Snowflake. These pipelines are designed using a drag-and-drop interface with components that define the flow of data and transformations applied at each step.
Matillion allows users to create both orchestration jobs (which manage overall workflow execution) and transformation jobs (which handle data processing). Given the complexity of data workflows, following best practices to maintain clarity and efficiency is crucial.
Matillion is now available directly through the Snowflake Marketplace, making it easier to integrate with your Snowflake environment.
Workflow Design Best Practice Recommendations
Matillion provides a visual, component-based approach to data transformation. However, without thoughtful design, workflows can quickly become difficult to navigate. Through experience, I’ve found that following these simple best practices helps keep workflows clean, readable, and easy to maintain.
The workflow we’ll reference throughout this blog was built using customer data from TrellisMart, a fictional retail company. Its primary goal is to create a comprehensive customer data table enriched with information on States, Regions, and Consumer Categories. This workflow will streamline data analysis, uncover shopping trends, and enable more personalized marketing strategies.
TrellisMart aims to enhance decision-making, drive overall sales growth, and curate a final data table that supports robust data visualization by integrating and visualizing key data points, such as customer demographics, purchase history, and geographic trends.
We’ll walk through the overall layout of the workflow but won’t dive into the individual components. For a deeper look at building workflows and jobs, check out this blog:
How You Can Create and Share Your Own Matillion Shared Job.
1. Modularity: Break It Down for Simplicity
One of the biggest takeaways from my recent project was the power of modularity. Structuring workflows into well-defined sections improves clarity, reduces complexity, and makes troubleshooting more efficient. By logically grouping components—such as data ingestion, transformation, reference table joins, calculations, and output formatting—the workflow becomes more structured, easier to navigate, and adaptable to future changes.
Best Practice
Group components into logical sections when designing jobs.
Use shared jobs for repeatable logic (e.g., data cleansing, transformations).
Minimize excessive branching—keep each module focused and concise.
While there’s no one-size-fits-all approach to workflow design, organizing components into clear sections improves readability and efficiency. In this case, we needed to flatten the main data table before integrating it with reference tables to build the final Customer_Clean table.
For this transformation job, we grouped the components into five sections:
Ingestion – Brings data into the workflow.
Transformation – Structures and flattens the customer dataset.
Reference Joins – Integrates additional reference tables.
Calculations – Applies business logic and configures new data fields.
Formatting & Output – Prepares the final structured dataset.
These groupings are flexible—adjust them to fit your specific workflow, job, or dataset while maintaining an organized structure.
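To make the grouping concrete outside the canvas, here is a minimal SQL sketch of the same five-stage flow expressed as chained CTEs. The table and column names (customer_raw, state_ref, and so on) are hypothetical stand-ins, not the actual TrellisMart schema:

-- 1. Ingestion: bring the raw data into scope
WITH ingested AS (
    SELECT * FROM customer_raw
),
-- 2. Transformation: structure and flatten the customer dataset
flattened AS (
    SELECT customer_id, state_code, purchase_date
    FROM ingested
),
-- 3. Reference Joins: enrich with reference data
ref_joined AS (
    SELECT f.*, s.state_name, s.region
    FROM flattened f
    JOIN state_ref s ON f.state_code = s.state_code
),
-- 4. Calculations: apply business logic
calculated AS (
    SELECT *, DATEDIFF('day', purchase_date, CURRENT_DATE) AS days_since_purchase
    FROM ref_joined
)
-- 5. Formatting & Output: select the final, tidy column set
SELECT customer_id, state_name, region, days_since_purchase
FROM calculated;

Each CTE maps to one group on the canvas, so a reviewer can validate a stage in isolation, which is the same benefit modular grouping gives you in Matillion.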
2. Clarity and Readability: Workflows Should Speak for Themselves
Throughout our project, switching between branches meant quickly getting up to speed with each other’s changes. Well-structured workflows made this process much easier. While Git integration provided helpful notes for auditing, a visually organized workflow had an even bigger impact. Personally, I find Git notes useful for tracking changes, but as a developer, nothing beats being able to see the workflow itself to fully understand what’s been updated.
Best Practice
Name components descriptively (e.g., Customer_Raw instead of Job1).
Use consistent naming conventions for jobs, tables, and variables.
Keep components aligned neatly and spaced appropriately—don’t scatter them randomly across the canvas.
Structuring Input Tables
To ensure workflows remain clear and easy to follow, structure input tables in a way that visually represents their relationships:
Perform one join at a time instead of stacking multiple joins in a single component.
Position the main input table at the top of the workflow.
Place secondary/joining tables below the main input, indenting them slightly to indicate hierarchy.
Align transformations and associated workflows with their respective input tables to improve readability.
When joining tables (e.g., A and B), place the Join component on the same line as table A (whichever is logically first).
Maintain vertical alignment so each join step follows a clear, logical data flow without excessive crossing lines.
In this illustration, we see an example of a natural data table hierarchy:
This approach ensures that the workflow remains structured and easy to debug. When multiple joins are combined into a single component, troubleshooting can become difficult, requiring extra effort to reverse engineer potential issues or track down root data problems. By keeping joins separate and aligning tables logically, users can quickly trace data relationships and understand how inputs connect.
While not every workflow will follow this exact structure, using it as a guideline helps maintain clarity, making workflows more intuitive and easier to manage.
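In SQL terms, “one join at a time” is the difference between a single component that stacks every join and a chain of small, verifiable steps. A minimal sketch, again with hypothetical table names:

-- Staged joins: one reference table per step, so each intermediate
-- result can be row-counted and checked before the next join
WITH with_state AS (
    SELECT c.*, s.state_name, s.region_code
    FROM customer c
    JOIN state_ref s ON c.state_code = s.state_code
),
with_region AS (
    SELECT ws.*, r.region_name
    FROM with_state ws
    JOIN region_ref r ON ws.region_code = r.region_code
)
SELECT wr.*, g.category_name
FROM with_region wr
JOIN category_ref g ON wr.category_code = g.category_code;

Collapsing all three joins into one step would still run, but when the output row count looks wrong, you would have no intermediate results to inspect.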
Descriptive Naming Conventions
Zooming in on Group 4 – Calculations, identified in the Modularity section, we can see how clear and descriptive naming conventions improve workflow readability. At a glance, the component names indicate key operations, such as converting data types, creating date calculations, and applying business logic. This structured approach makes it easy to locate specific transformations, simplifying troubleshooting and updates.
The key takeaway? Component names should be self-explanatory, requiring little to no additional explanation. Keeping them short, clear, and descriptive ensures that the workflow’s purpose is immediately understood, making it easier to navigate and maintain.
3. Component Connections: Avoid Tangled Workflows
During debugging, the last thing anyone wants is to untangle a web of crisscrossing connections. A clean, well-structured workflow makes troubleshooting faster and less frustrating.
Best Practice
Minimize overlapping lines—ensure components are aligned vertically or horizontally to avoid tangled connections.
Make the workflow intuitive—if you have to trace or follow lines too much, it might indicate a design that’s too complicated or difficult to follow, making debugging more challenging.
Use flow control components (when necessary)—to manage complexity and keep the workflow organized.
Be mindful of flow logic connectors (when necessary)—Matillion offers connectors for true, false, and error logic:
Blue (top): Followed when the expression evaluates to true.
Red (center): Followed if a runtime error occurs during evaluation.
Orange (bottom): Followed when the expression evaluates to false.
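To ground this, the condition such a component evaluates is just a boolean expression, often over a job variable. Assuming a hypothetical variable named row_count, a condition like ${row_count} > 0 would send execution down the blue connector when true, the orange connector when false, and the red connector if evaluating the expression raises a runtime error.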
Intuitive Workflow Design
Workflows should be easy to follow and visually organized, much like clean, well-structured SQL or Python code. Just as poor SQL practices—such as using SELECT * instead of specifying columns, or relying on complex subqueries instead of efficient joins—can make queries harder to understand and slow performance, the same applies to workflows.
Tightly packed components or crisscrossing connectors can create confusion and hinder progress, making the workflow more difficult to manage and maintain.
-- Bad: Using a subquery
SELECT name
FROM customers
WHERE department_id IN (SELECT id FROM departments WHERE name = 'Sales');

-- Good: Using a JOIN
SELECT c.name
FROM customers c
JOIN departments d
  ON c.department_id = d.id
WHERE d.name = 'Sales';
Matillion is designed as a no/low-code ELT tool, so let’s leave the SQL deep dive for another time and focus on making workflows as clean and intuitive as possible!
Best Practice
Not Best Practice
You might be thinking, “No one actually designs workflows like this…”—but believe it or not, it happens more often than you’d expect. While there may be some attempt at structure—like grouping calculations and aggregations together—the overall design makes it difficult to follow, debug, or scale.
Crisscrossing connectors, inconsistent input table placement, and tightly packed components create unnecessary confusion. A workflow like this might technically work, but as workflows grow in complexity, they become a nightmare for your team or client to maintain.
4. Comments and Notes: Documenting for Future You (or Someone Else)
Good documentation makes life easier—not just for you but for anyone who might need to pick up your work later. Keeping notes within the workflow ensures smoother handoffs and makes it easier to track changes, troubleshoot issues, and validate transformations. Think of it as leaving breadcrumbs for future you (or your teammates) to follow.
Best Practice
Use notes to explain complex sections of your workflow.
Label critical variables and key transformations for clarity.
Document business rules and assumptions directly within the workflow.
Apply consistent color coding to indicate statuses (e.g., success, failure, review).
Keeping Notes Dynamic: A Living Record of Your Workflow
Your documentation should evolve alongside your workflow. As a developer, you should be adding or updating notes as new information comes in. Here are some recommended note-taking styles:
1. Overall Workflow Notes
Think of this as a running log that captures key insights during the data discovery and development process, including:
Business logic gathered from SMEs or business teams.
Issues, successes, and areas for improvement.
Key decisions made and why they were chosen.
Data tables used and their role in the workflow.
Workflow objectives to ensure everything stays aligned with project goals.
This is the backbone of your documentation. Anyone stepping into the workflow (including future you) will immediately get a sense of its purpose, the logic behind it, and any pitfalls to watch for.
Tracking Progress & Collaboration
Adding a Data Discovery/Development note as an Overall Workflow note is an easy way to track objectives, blockers, next steps, and ongoing progress. It’s especially useful for:
Keeping track of personal reminders or tasks in progress.
Documenting updates and changes when working with a team.
Ensuring quick knowledge transfer if you’re unavailable or transitioning projects.
2. Component-Specific Notes
Adding component-specific notes is a great way to keep comments tied to the specific component they describe. It’s especially useful for:
Documenting row counts when joining tables to validate data integrity (a sample validation query follows this list).
Flagging questions about the joining method or approach.
Creating placeholders for follow-ups when meeting with SMEs or business units.
Utilize color-coded notes to provide quick visual cues for status updates (e.g., pending review, required validation, confirmed logic).
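For the row-count checks mentioned above, a quick validation query can be run and its results pasted into the component note. A minimal sketch, assuming hypothetical before/after tables:

-- Compare row counts around a join to catch fan-out or dropped rows
SELECT
    (SELECT COUNT(*) FROM customer_flat)       AS rows_before_join,
    (SELECT COUNT(*) FROM customer_with_state) AS rows_after_join;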
3. AI-Generated Notes
Matillion’s AI-generated notes are a powerful tool for documenting and understanding workflows. They provide clear explanations of individual components or groups of components, making it easier to validate business logic and track transformations.
Formula Summaries – AI notes offer concise summaries of calculations, helping to validate business logic.
Simplified Process Overviews – For complex transformations, AI-generated notes break down the process into digestible insights or provide a detailed step-by-step explanation.
Workflow-Level Summaries – When selecting an entire workflow, AI notes compile a comprehensive breakdown of all selected components into a single note. This eliminates the need to click into each component individually, making it easier to capture the purpose of every step at a glance.
Managing Notes: When to Keep, Remove, or Move Them
Notes aren’t set in stone—they should shift and evolve as your workflow develops. Here’s how to manage them effectively:
Temporary Notes – Used for testing, validation, or blockers. Once the workflow is complete, these can be removed or moved to the bottom of the workspace to declutter.
Long-Term Notes – Important insights like business rules, transformations, and logic should stay within the workflow for future reference.
Team-Agreed Standards – There’s no single “right” way to document. Find what works best for your team and stick to a consistent approach.
By keeping your documentation clear, concise, and well-structured, you make your workflows easier to maintain, debug, and hand off—long after the project wraps up.
Leverage AI Features in Matillion
Matillion has introduced AI-powered features, such as Matillion Copilot, to assist users in designing and optimizing workflows more efficiently. These AI-driven tools can help with job creation, component recommendations, and even in-tool documentation.
Benefits of Using AI in Matillion:
Faster Workflow Creation – AI suggests components and automates repetitive tasks, accelerating development.
Optimization Insights – AI offers recommendations to enhance job performance and resource efficiency.
Enhanced Decision-Making – AI-driven tools analyze data structures and recommend the most efficient transformations.
Notes Using AI – Add notes about a component or workflow using AI, providing context and documentation for easier understanding and collaboration.
Example Use Case: Using Copilot to Optimize String Replacements
Imagine you’re designing a workflow that cleanses customer data by standardizing state names. Your dataset contains multiple variations like “TX”, “Tx”, “texas”, “TEXAS”, and you need to replace them all with “Texas”.
Instead of manually creating multiple Replace functions, Matillion Copilot suggests using a single CASE statement with UPPER(), optimizing performance:
Original Approach

-- One nested REPLACE per variation of 'Texas'
REPLACE(
  REPLACE(
    REPLACE(
      REPLACE(state, 'TX', 'Texas'),
      'Tx', 'Texas'),
    'texas', 'Texas'),
  'TEXAS', 'Texas')

A literal CASE rewrite still enumerates every variation:

CASE WHEN state = 'TX' OR state = 'Tx' OR state = 'texas' OR state = 'TEXAS' THEN 'Texas' END

Optimized Copilot suggestion

CASE WHEN TRIM(UPPER("state")) IN ('TX', 'TEXAS') THEN 'Texas' END
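In practice, you would typically wrap the suggested expression in a full statement and add an ELSE branch so values that aren’t Texas variants pass through unchanged. A minimal sketch, assuming a hypothetical customers table:

SELECT
    customer_id,
    CASE
        WHEN TRIM(UPPER(state)) IN ('TX', 'TEXAS') THEN 'Texas'
        ELSE state  -- leave non-Texas values untouched
    END AS state_clean
FROM customers;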
By leveraging Copilot’s AI-driven recommendations, you reduce redundant logic, improve efficiency, and ensure data consistency across the workflow. For additional Matillion AI Features, visit Matillion AI Overview.
Final Thoughts
Having gone through this process firsthand, I’ve seen how following these best practices makes a real difference. A well-structured workflow is more than just aesthetics—it directly impacts collaboration, troubleshooting, and long-term maintainability.
Whether you’re working within a team, handing off work to a client, or future-proofing your projects, these design principles ensure your workflows remain clear, effective, and easy to manage.
With the rise of AI tools like Matillion Copilot, best practices are evolving to enhance workflow efficiency by automating repetitive tasks, refining transformations, and ensuring consistency. However, the core principles—modularity, workflow clarity and readability, component connections, and note documentation—remain just as important across all ETL/ELT platforms.
While AI can greatly boost efficiency, it’s essential to integrate it into best practices in a way that still allows for human oversight and guidance. This balance ensures that workflows are optimized for performance while retaining the quality and control that comes from human expertise.
By following these best practices, you can create workflows that are clear, scalable, and efficient, all while taking full advantage of Copilot’s capabilities.
Need assistance?
If you have any additional questions or need assistance with creating workflows in Matillion, reach out to our team of Matillion experts at phData!
FAQs
How Do I Set Up Gen AI in Matillion DPC?
Gen AI in Matillion’s Data Productivity Cloud enhances workflow creation by providing AI-driven suggestions and automation. Since Matillion DPC is a fully managed platform, there’s no complex setup required—Gen AI capabilities are built directly into the environment. Users can immediately start leveraging AI-assisted transformations and recommendations within the interface.
Review the following article for detailed steps on managing and setting up Snowflake Cortex LLM or AWS Bedrock functions.
I’m a Matillion customer, but I don’t see Copilot. Why?
Currently, Copilot is only available for transformation pipelines for Enterprise Edition customers and is exclusive to Snowflake projects.
To explore plan options, visit Matillion Pricing for more details.
How does Matillion Copilot handle my data?
Matillion’s AI Note feature is powered by a private large language model (LLM) securely hosted within Matillion’s AWS infrastructure.
No customer data is shared with the large language model.
AI Notes are generated using metadata from pipeline components, structured in Matillion’s proprietary Data Pipeline Language (DPL).
While component parameters are included as metadata, actual table contents and file data are never accessed or processed.
Where are user prompts for Matillion Copilot stored?
User prompts are stored within a database in the control plane of the Data Productivity Cloud (DPC). This allows Matillion to provide context to the Large Language Model (LLM) when a user sends multiple messages within a session.
All user inputs sent to Copilot and the agent’s responses are recorded. This includes:
Information about existing tables within a schema and their structure.
Every action taken by Copilot, such as adding components.
The actual textual responses generated by Copilot.
Can an admin review or maintain a history of user interactions with Copilot?
Currently, this feature isn’t available. However, the idea has been shared in the Matillion Roadmap Portal! If you’d like to see this feature included in Matillion’s development of Copilot, feel free to give it an upvote!