Like many other ETL tools in the market, Azure Data Factory is just one that allows you to bring in data into your reporting environment. Azure Data Factory can be a powerful tool to use for businesses with pre-existing environments that already have it configured and for organizations familiar with the Microsoft Ecosystem.
In this blog, we will cover how to write data from Azure Data Factory, set up your Azure Blob Storage, and how to create a table within the Snowflake Data Cloud so you can map your fields to the table.
Before we begin, some terms that will be expanded on further will be Azure Data Factory, Service Principal, and Azure Blob Storage.
What is a Service Principal?
A service principal is an identity created within Azure to be able to access Azure resources such as applications, tools, and hosted services. To learn more about creating a service principal please refer to the documentation created by Microsoft.
In essence, what this document will walk you through is how to use the Azure portal to create an application registration which you can use to get an application ID and create a client secret which you will need for a step later in this blog.
What is Azure Data Factory?
The purpose of Azure Data Factory in this blog is to use it as a tool to create an ETL process for you to call your Power BI REST API and GET the data from it which you will then store in a source of your choice. For our choice we used Snowflake, however you are more than welcome to copy it to your source of choice.
What is Azure Blob Storage?
Azure Blob Storage will allow you to store and access your unstructured data at scale. What this means for this blog is that you will be connecting your Azure Blob Storage as an interim staging store. The Copy Activity you will use within your Azure Data Factory Pipeline will automatically manage the flow from staging to source for you.
Before we begin talking about how to copy your data using Azure Data Factory, there are a few prerequisites you need to have set up to ensure you are able to configure this for your organization. You will need:
- Azure Blob Storage Linked Service
- Snowflake (Create, Read, Update, Delete (CRUD) Access
- Sink and Source Linked Services Created
How to Copy Data to Snowflake
To begin, we will go through some core steps that have been covered in the Getting Activity Data from Power BI Service with REST APIs blog. All limitations around Direct Copy to Snowflake are listed in the Microsoft Copy and Transform data in Snowflake Document.
Step 1: Setting up Your Linked Services
To start configuring your ADF Pipeline, you will first need to set-up your linked services, which will serve as your Source and Sink for your pipeline. For this blog, we use pre-existing connections that have already been discussed, but you can feel free to use whichever source you would like for your Source connection.
To do so launch your Azure Data Factory Studio and select the buttons in the order shown below:
Step 2: Setting up your Source (REST):
After clicking on the briefcase icon with the wrench in it, click on NEW. Then you will type in or locate REST as that will be your source for the dataset. After you select Continue, you will fill in all of the information and click on Test Connection (Located on the Bottom right.) Once your test connection passes successfully, you will click on SAVE to confirm your credentials.
Step 3: Setting up your Sink (Snowflake):
Repeat Step 1 and then Search up your data store, this may be any sink you would like set-up. In our case, we went with Snowflake.
*Reminder: Your Snowflake must be hosted on Azure for this to succeed*
If you are not familiar with where to find this information within Snowflake, you can refer to the image below:
Step 4: Setting up your Blob Storage for Staging
Repeat Step 1 and set-up you Blob Storage
Step 5: Building Your Copy Activity
Bring in the Copy Data Activity. Feel free to name it as you wish.
Connecting Your Source
After naming your Activity, you can then proceed to the “Source” tab.
Within the steps for “Source” you need to:
- Select your Linked Service that is going to be your Source dataset
- Proceed with adding in the required information for it.
Connecting your Sink (Snowflake)
Next we will walk through the steps to set up your “Sink” tab.
Within the steps for “Sink” you need to:
- Select your Sink Dataset
- *If you do not have a sink dataset* Select the “Open” Pencil Icon
- Select your Linked Service that you set-up earlier
- Select the table you would like to load the data into
- *If you do not have a table within Snowflake* You can create one for your dataset.
Mapping Your Fields
Once you have set-up your source table. You will need to map all fields into your ADF pipeline and you can do so by navigating to the “mapping” button.
After this step you will be mapping all fields as shown below:
Once you have mapped all of your fields you can proceed to the “Settings” tab.
Settings (Copying to Blob Storage Staging)
Once you are in your settings you will connect the Blob Storage Linked Service you created earlier in this blog and use the storage path of your choice.
Ensure these settings match otherwise you may run into some unexpected errors.
Step 6: Testing/Automating Your Pipeline
To wrap, you can test out your pipeline by hitting the “Debug” button and selecting the number of previous days you would like to bring in.
To automate your pipeline, you can set up a trigger with your desired times to refresh. Once daily is what is recommended to pull the prior day’s data.
Please make sure to keep these limitations in mind when creating your Pipelines
Congratulations! You have successfully set up your connection from ADF to Snowflake! You can now directly connect your data to feed it to Snowflake and analyze it further from there. If you would like to automate your pipeline please proceed to the next steps.
For more information on building out your ADF Pipelines to ingest your data and feed it to Snowflake or questions on building out pipelines using ADF for your organization, please reach out to us.