August 15, 2022

Using Serverless Functions to Extend Matillion for Snowflake

By Jason Kimmerling

Anyone who has used Matillion probably recognizes its resourcefulness in simplifying and accelerating the ETL/ELT process. However, it is not out of the ordinary to run into situations where Matillion’s built-in tooling cannot accomplish all that is necessary on its own. 

In this blog post, I will illustrate how both serverless functions and the Snowflake Data Cloud’s built-in features can bridge this gap, easily managing both structured and unstructured data.

The bulk of this guide will involve the AWS ecosystem, but I will include details on how to carry out these tasks in Azure and GCP as well.

Why Are Extra Tools Needed?

While Matillion provides a diverse array of features and components with which one can connect and then precipitate the ingestion of multisource data, some edge cases can inevitably cause problems. Examples of these edge cases include the following:

  • API endpoints requiring unconventional OAuth configurations 
  • JSONs using inconsistent keys
  • proprietary (non-standard) interfaces

All of these require a custom solution, and serverless functions serve as an excellent option.

To get started, let’s delve into a real-world example where extra tooling is needed.

Real-World Example

One finds that an API has information that needs to be ingested every hour. The API requires a FULL OAuth 1.0 header – signature, nonce, timestamp, etc. Because Matillion’s API tooling cannot accomplish this alone, cloud provider services will be used to fill in the gaps. The additional services which will be employed will be AWS SQS, SNS, S3, and Lambda. The steps for this solution are detailed below:

1. A scheduled Matillion job sends an SNS message to trigger the Lambda function.

A graphic showing the SNS message to trigger the Lambda function

2. The Lambda, utilizing the requests_oauthlib library to meet the API’s OAuth requirements, grabs the data from the API endpoint and saves it to an S3 bucket.

3. Once the Lambda is done loading the data to S3, it sends an SQS message to Matillion.

4. The SQS message triggers a Matillion orchestration job that then grabs the JSON from S3, loading it into a VARIANT column in a staging table in Snowflake.

A screenshot of when the SQS message triggers a Matillion

5. Optional) Matillion loads the VARIANT data from the staging table, transforming it and loading it into fact/dimension tables.

A screenshot showing when Matillion loads the VARIANT data

While the previous example could be viewed as helpful enough on its own, the SNS and SQS services can also be used to send configuration and status information between Matillion and the serverless function. 

For instance, the Lambda function can be coded to interface with multiple API endpoints, and the SNS message from Matillion (which triggers the Lambda function) can also send the config info to the function to specify which API to ingest and which parameters to use. 

On the other hand, the Lambda function can send status information to Matillion using SQS to tell Matillion whether or not the data was successfully sent to S3. Using this information, Matillion can determine whether to load the S3 data into a Snowflake variant in a staging table–or whether errors should be entered into an error logging table.

Because the steps listed above are AWS service-focused, I am including some equivalent tools for the other two primary Cloud providers below:

AWS Lambda:

AWS SQS and SNS:

AWS S3:

Actions taken through other Cloud providers would be nearly indistinguishable from the actions in my example above–with the exception of Azure and GCP having singular services replacing both AWS SNS and AWS SQS.

Conclusion

Hopefully I have successfully demonstrated that serverless functions, used in conjunction with Snowflake’s data type features, can assist Matillion with bringing in any type of data from any source.

The versatility of the serverless functions allows for the ability to connect to any data source and apply any pre-load transformations, and the robustness of data types such as VARIANT, ARRAY, and OBJECT allows any type of data to be loaded into Snowflake. Combining these tools allows for endless possibilities with data extraction and transformation. 

As a Gold partner for Matillion and an Elite Partner in Snowflake, phData specializes in helping growth-forward businesses get the most out of both platforms. If you’re looking for guidance, help, or answers, reach out to phData today!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit