January 19, 2024

Using Matillion Data Productivity Cloud to Call APIs

By Marcus Montenegro

Most systems now expose API endpoints to facilitate application interactions and enable new automation. Matillion Data Productivity Cloud offers API endpoints of its own, and it can also help you integrate other systems by making API calls that you orchestrate inside your workflows!

Let’s see how that works.

What is Matillion Data Productivity Cloud?

Matillion’s Data Productivity Cloud is a versatile platform designed to increase the productivity of data teams. It provides a unified platform for creating and managing data pipelines that are effective for both coders and non-coders. 

The platform features AI-powered tools that enable the integration of large language models (LLMs) into your data pipelines, as well as an extensive connector library and a visual, low-code design that supports a wide range of data movement and transformation operations.

The platform simplifies data pipeline orchestration by providing tools for automation, scheduling, and comprehensive visibility. It’s designed to work with people of all skill levels and interact smoothly with existing technology stacks. 

Matillion is also built for scalability and future data demands, with support for cloud data platforms such as Snowflake Data Cloud, Databricks, Amazon Redshift, Microsoft Azure Synapse, and Google BigQuery, making it future-ready, everyone-ready, and AI-ready.

Its core, PipelineOS, uses stateless microservice agents for scalable data flow and transformation while keeping costs low and performance high. Consumption pricing is based on the time spent running data pipelines rather than the time the platform is simply active. As a result, Matillion is an excellent choice for businesses wishing to optimize their data operations in a scalable and user-friendly environment.

Why Connect to API Endpoints

API stands for Application Programming Interface. It is a type of software interface that allows two applications to speak to each other. Simply put, an API lets you send information to a system and get information back without having to know its internal workings.

Connecting to API endpoints enables the automation of various tasks, which leads to increased process efficiency. An API can automate the process of retrieving data from other applications, thereby avoiding the need to enter it manually. For example, a corporation may use an API to connect an internal application to a CRM (Customer Relationship Management) system to automate data synchronization.

How to Connect to an API Endpoint

A connector is required to link a Data Productivity Cloud job to an API. Matillion provides a large number of connectors to help you connect to various systems, so first check whether a connector already exists. If you cannot find one, you can design your own!

The next items will demonstrate how to build a custom connector to the API you want to use.

Check Whether a Connector Already Exists

Go to Matillion Hub and sign in to your account to see the full list of connectors. On the home page, look for Manage custom connectors at the bottom of the Quicklinks section, which takes you to the list of connectors.

Explore the list of Flex connectors on this page or use the search bar to locate the system you would like to connect to the Data Productivity Cloud. You might find what you’re looking for here, but if you don’t, you’ll have to design a new custom connector.

If you find the system you’re looking for in this list, you can use its Flex connector as a starting point and configure it to your specific needs.

Customizing a Flex Connector

For this example, I used the GitHub Flex connector. When you select a Flex connector from the list, you will be taken to a wizard that helps you configure it properly. Anyone familiar with Postman or similar tools for making API calls will recognize the layout.

As you can see, multiple endpoints are already on the left side, each containing the majority of the information required to connect. You simply need to make a few changes to connect to the repository where you would like to work.

For this example, I’ll configure my Flex connector for the Repository Contributors endpoint, but you can use any other endpoint or even configure multiple ones. If you test with a different connector, you will see different endpoints from those shown here, but configuring them follows the same approach.

The first step in configuring any endpoint for GitHub API requests is to generate a token that will allow you to access it. Log in to your account on the GitHub platform. I won’t get deep into this part, but it is easily accessible. Usually, following these instructions will lead you to the location where you can make a token:

Click your profile picture in the upper-right corner > Settings > Developer settings > Personal access tokens > Tokens (classic).

Return to the Flex Connector configuration once you’ve obtained the token. Select the endpoint you want to set up and enter your token in the Authentication field. Because we are utilizing a token in this scenario, keep Bearer Token as the authentication type.
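For readers who think in code, here is a minimal Python sketch of what Bearer Token authentication amounts to on the wire: the token travels in an Authorization header. The build_request helper, the GITHUB_TOKEN environment variable, and the OWNER/REPO placeholders are illustrative assumptions, and this is not a description of how Matillion makes the call internally. The sketch only builds the request; it does not send it.

```python
import os
import urllib.request

GITHUB_API = "https://api.github.com"


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer credential."""
    return urllib.request.Request(
        url,
        headers={
            # "Bearer <token>" is what the Bearer Token auth type sends.
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Hypothetical variable name; never hard-code a real token.
    token = os.environ.get("GITHUB_TOKEN", "<your-token>")
    req = build_request(f"{GITHUB_API}/repos/OWNER/REPO/contributors", token)
    print(req.get_method(), req.full_url)
```

Reading the token from the environment mirrors the idea of keeping it out of the connector definition itself.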

Now, notice that Matillion has already built the HTTPS URL for you, but it contains {owner} and {repo}. These are parameters that have been added to the URL to make it more dynamic. Let’s go to the Parameters tab and assign them values. Matillion will replace each placeholder with the value you assign before making the API call for us.

Two parameters are waiting for you to assign values to them. Enter the values you want to assign to each parameter, and Matillion will utilize them when making the API request. There are also options to add extra parameter variables if necessary for your particular case.
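The substitution described above can be sketched in a few lines of Python. The URL template is GitHub's public contributors route; the fill_path_params helper and the sample owner and repository names are hypothetical and only illustrate the idea of swapping placeholders for values.

```python
import re
from urllib.parse import quote

# Assumed to match the URL shown in the connector for this endpoint.
URL_TEMPLATE = "https://api.github.com/repos/{owner}/{repo}/contributors"

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_path_params(template: str, params: dict[str, str]) -> str:
    """Replace each {name} placeholder with its URL-encoded value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            # A placeholder left unassigned would produce a broken URL.
            raise KeyError(f"no value assigned to URL parameter {name!r}")
        return quote(params[name], safe="")

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(fill_path_params(URL_TEMPLATE, {"owner": "my-org", "repo": "my-repo"}))
```

Raising an error for an unassigned placeholder reflects why both parameters need values before the request is sent.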

This endpoint does not require any headers or body, although there are tabs for them if your API call needs them.

Once you’ve finished configuring it, click the Send button to test it. If everything ran well, you should have received a response from the API, which will be displayed at the bottom, as seen in the image.

If that doesn’t work, check the configuration. Most likely, something is missing or not yet configured correctly.
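When a test call fails, the HTTP status code in the response usually points at the part of the configuration to revisit. The mapping below is a rough sketch based on general HTTP conventions and on how the GitHub REST API typically responds; the explain_status helper and the wording of the hints are my own, not part of Matillion.

```python
# Hints for status codes commonly seen while configuring an endpoint.
HINTS = {
    204: "Succeeded with no content (GitHub may return this for an empty repository).",
    401: "Token missing, expired, or mistyped; check the Authentication field.",
    403: "Token lacks permission for this resource, or a rate limit was hit.",
    404: "Owner or repo value is wrong, or the token cannot see the repository.",
}


def explain_status(status: int) -> str:
    """Return a short troubleshooting hint for an HTTP status code."""
    if status in HINTS:
        return HINTS[status]
    if 200 <= status < 300:
        return "The request succeeded."
    if 500 <= status < 600:
        return "The API reported a server-side error; try again later."
    return "Unexpected status; compare the request with the API documentation."


if __name__ == "__main__":
    for code in (200, 204, 401, 404, 503):
        print(code, "->", explain_status(code))
```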

Please keep in mind that if you are utilizing the GitHub Flex connector, your repository must have at least one file. It could be as basic as a README.txt file. The GitHub API will not find repositories that contain no files.

Your connector is now ready to use! Save it, and then add it to your orchestration jobs.

Keep in mind that tokens used during the configuration process are not saved in the connector. To use the connector in a pipeline, first create a secret for your token: open the project you want to work in and go to the Secret definitions page before entering the branch you’ll be working on.

Create a Custom Connector

If you need to create your own connector, go to the Manage custom connectors page shown in the previous section. The Add connector button at the top will allow you to begin the connector creation process.

If your environment has no custom connectors, the following screen will show up:

[Image: Create your custom connectors screen]

Otherwise, a list of your existing custom connectors will be displayed. The Add connector button will be on the right side in this scenario.

When you click to add a connector, you will be taken to a new page that will look familiar to anyone comfortable with Postman or similar tools for making API calls. That page provides a wizard for configuring all of the API settings needed for the connection. For this project, we will use a simple OpenLibrary API to find books by subject and publication period. Each API has its own set of requirements, so check the API documentation to discover which parameters must be passed into the API call and configured in this wizard.

Check out the API documentation for our sample.

Now, we’ll make a GET request to the following endpoint, which is set up to look for analytics books released between 2014 and 2024.

"http://openlibrary.org/subjects/analytics.json?published_in=2014-2024"

There are no additional parameters or authorization required for this API. Change the name of the connector at the top and the endpoint name on the left as preferred. You can do this by clicking the pencil icon. I updated the default names to OpenLibrary API and Subjects in this case.

To test your API call, simply click the Send button. The response flag, as well as the response body and headers, can be found at the bottom. Once you’ve completed the configuration, click the Save button in the top right corner.
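If you want to cross-check the wizard's response outside Matillion, the same GET request can be reproduced with a short Python script. This is a minimal sketch: the function names are hypothetical, and only the URL and the published_in parameter come from the example above. Running the script prints the URL; fetch_subject performs a network call only if you invoke it yourself.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://openlibrary.org/subjects"


def build_subject_url(subject: str, start_year: int, end_year: int) -> str:
    """Build the subjects URL with a published_in year range."""
    query = urlencode({"published_in": f"{start_year}-{end_year}"})
    return f"{BASE_URL}/{subject}.json?{query}"


def fetch_subject(subject: str, start_year: int, end_year: int,
                  timeout: float = 10.0) -> dict:
    """Send the GET request and parse the JSON body (requires network access)."""
    url = build_subject_url(subject, start_year, end_year)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_subject_url("analytics", 2014, 2024))
```

Changing the subject or the year range here corresponds to editing the URL in the connector wizard.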

This completes the design of your own connector. Excellent work! 

The next step is to put your new custom connector to use.

Using Your Custom Connector

Now that you’ve created the custom connector, return to your Matillion Hub home page and open the Data Productivity Cloud by selecting the Design data pipelines option. Browse through the options until you reach the branch where you want to use the new connector.

Open or create an Orchestration job and search for the connector’s name in the search bar of the Add Components section. There should be a new component with the name you chose, and the Custom tag on its right side shows that it is a custom connector.

[Image: how to search for new custom connectors]

Drag the new component into your orchestration job and configure it to use the endpoint you specified during the connector creation process. Aside from that, you will choose where the data will be stored in your data warehouse and the staging location. Additional setup is typically optional.
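Conceptually, loading an API response into a table means turning nested JSON into flat rows. The sketch below shows that idea for a subjects-style response. The field names (works, title, authors, first_publish_year) are my assumption about the Open Library payload, the sample data is made up, and Matillion's component handles this step for you in its own way.

```python
def works_to_rows(payload: dict) -> list[dict]:
    """Flatten a subjects-style response into one row per work."""
    rows = []
    for work in payload.get("works", []):
        # Collapse the nested author list into a single text column.
        authors = ", ".join(a.get("name", "") for a in work.get("authors", []))
        rows.append(
            {
                "subject": payload.get("name"),
                "work_key": work.get("key"),
                "title": work.get("title"),
                "authors": authors,
                "first_publish_year": work.get("first_publish_year"),
            }
        )
    return rows


if __name__ == "__main__":
    # Hypothetical payload for illustration; not real API output.
    sample = {
        "name": "analytics",
        "works": [
            {
                "key": "/works/EXAMPLE1",
                "title": "Example Book A",
                "authors": [{"name": "Author One"}, {"name": "Author Two"}],
                "first_publish_year": 2016,
            }
        ],
    }
    for row in works_to_rows(sample):
        print(row)
```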

Validate the workflow to ensure that everything is working correctly. After that, your connection is ready to transfer data from your API endpoint to the selected destination table!

Closing

In this blog, you learned how easy it is to create new connections to the API endpoints you need to interact with. The custom connector works very similarly to the API extract feature in Matillion ETL, and with it you can cover most of the connections you need. For specific scenarios that the current custom connector feature cannot adequately cover, you can always contact the Matillion team to request a new connector or their assistance with your specific issue.

Do you require assistance designing and implementing a data pipeline or leveraging your organization’s Matillion Data Productivity Cloud?

Please contact our team for assistance in accomplishing this goal.
