Python is the top programming language used by data engineers in almost every industry. Python has proven proficient in setting up pipelines, maintaining data flows, and transforming data with its simple syntax and proficiency in automation.
Having been built completely for and in the cloud, the Snowflake Data Cloud has become an industry leader in cloud data platforms. Snowflake has a versatility that is unmatched by scaling compute and storage independently to meet the needs of the user.
In this post we’ll explain why connecting these two data engineering powerhouses together is important, explain how to connect them securely, and provide some tips to help make creating your connection from Python to Snowflake go smoothly.
Why Connect Snowflake to Python?
Connecting Snowflake to Python can be a game changer for your data services. Python can be used to migrate your data from a previous platform to Snowflake, create or manage data pipelines for Extract, Transform, and Load (ETL) processes, perform data science tasks such as machine learning or create data analysis visualizations. Truly a must-have tool in your data engineering arsenal!
Before we begin our connection steps, some prerequisites are required:
- Snowflake account
- Snowflake user
- A supported version of Python installed:
How to Connect Snowflake to Python
Step 1: Install the Snowflake Connector for Python
Snowflake provides a specific package to allow easy connection to Python. The connector is available in Linux, Windows, and macOS.
- To install the connector, first check to ensure you have a supported version of Python installed:
- Second, ensure that the latest version of pip is installed to be used to install the connector:
- Finally, run the following command to install the latest version of the Snowflake connector:
- Note: If you require an installation of a specific version of the connector, add ==<version> to the install command where <version> is the version you’d like to install. For example, to install version 2.7.9 the command would be:
Step 2: Create the Connection
Create a new Python file in your favorite IDE. First, we need to import the newly installed Snowflake connector package:
Now that the package is imported, we can create our connection. We use the connect() function of the package to connect to Snowflake. This function requires the Snowflake account identifier to be passed along with authentication information based on the type of auth used.
The account identifier is the first segment of the domain of your Snowflake URL along with the cloud provider and region. Example:
- Snowflake URL: http://abc123.us-east-1.aws.snowflakecomputing.com
- Account identifier: abc123.us-east-1.aws
Three types of authentication are allowed: Default, Single Sign On (SSO), or Key Pair.
Note: All of these examples display hardcoded sensitive information. It is highly recommended to either create environment variables or pass these in as parameters when running your Python code for security reasons.
- Using default authentication
- The default authentication requires a simple username and password
- Using SSO authentication
- To use SSO authentication, simply pass authenticator=’externalbrowser’ in the connect() function. This will open a web browser when the Python code is run.
- Using Key Pair Authentication
- To use key pair authentication, we need to decrypt the private key file and pass it as a parameter to the connect() function to create a connection:
- For more information on how to configure key pair authentication and key rotation in Snowflake, see Key Pair Authentication & Key Pair Rotation.
Step 3: Test the Connection
A simple way to test the connection is to run a metadata query that doesn’t require a warehouse in Snowflake. For example:
If the connection is successful, you should see the current version of Snowflake printed on the console.
Here is a full example using basic authentication:
3 Additional Tips:
- The connect function in Python takes optional parameters of the warehouse, database, and schema to be used when running queries that aren’t metadata.
- If you are unsuccessful at connecting, contact your Snowflake administrator to check if a network policy is enabled within Snowflake. If so, the administrator will have to whitelist the IP addresses here to allow Informatica to connect to Snowflake.
- Instead of using a personal username and password, a service account user and role should be created for Python to use and take the human element out of the equation
Snowflake is leading the way in cloud data platforms and being able to programmatically connect to your data warehouse via Python is essential for any data engineer. Engineers are able to connect Snowflake to Python in a matter of minutes using the Snowflake Connector for Python, as detailed above. Once the connection is established, Python can be used for an almost infinite amount of use cases in your data engineering stack.
Looking For Help With Your Next Data Engineering Project?
From Data Engineering projects to Cloud-Migrations, phData helps organizations make more data-driven decisions.