Navigating Snowflake’s Snowsight and Snowpark can be a challenging task, especially when it involves developing stored procedures using Python and external libraries.
In this blog post, we’ll explore the creation, deployment, and testing of a Snowpark stored procedure using Python within the recently introduced Python worksheets in Snowsight.
What is Snowsight?
Snowsight is a modern, interactive web-based interface that’s part of the Snowflake Data Cloud platform. It was created to replace the classical interface and is designed to elevate the user experience when it comes to exploring, visualizing, and reporting data.
Snowflake has been enhancing worksheets with Snowsight in 2023, and customers are slowly adapting to the new Snowsight from last year. Snowflake is also improving Snowsight to support newer features like Task Observability, Snowpark, etc.
What is Snowflake’s Snowpark?
Snowpark is a new developer experience for Snowflake that allows developers to write code in their preferred language and run that code directly on Snowflake. It exposes new interfaces for development in Python, Scala, or Java to supplement Snowflake’s original SQL interface.
SQL is, of course, the lingua franca for data, but there are many applications and development teams that rely heavily on other languages.
What’s New with Snowpark?
With the introduction of Snowpark, customers can write stored procedures in Python, Scala, Java, etc. There are several IDEs (Integrated development environment) and plugins available to do Snowpark development in the local environment.
But our customers often ask, “Can I write a Python stored procedure in a regular SQL worksheet?” Of course! Snowflake and Snowsight worksheets allow us to create Python or Java stored procedures, but it’s not always intuitive.
Snowflake listened to customers and recently introduced Python worksheets (In Public Preview) in Snowsight, which will significantly improve the development run time. There are several built-in features within the Python worksheet.
In this blog, we’ll look specifically at how to create a Snowpark stored procedure using an external library and deploy them.
How to Create a Snowpark Stored Procedure Using an External Library
Snowflake supports several Anaconda Python packages, but there will always be a need to implement an external library that is not part of the default packages.
What do we need for this blog?
- Snowflake Account
- Local environment (Terminal) to package Python Libraries with SnowSQL installed
What will we be doing?
- Create new Python template worksheet using Snowsight
- Update worksheet with new code
- Set up required packages
- Compile and run
Let’s see how to develop some simple Python code that uses an external library and how to build a Snowpark Stored Procedure.
Step 1: Create a New Python Template Worksheet Using Snowsight
Login to your Snowflake (Snowsight) environment.
Click on the + button and select Python worksheet. Snowflake will create a worksheet with sample Python code, as shown below.
Click on the run button to check if the sample code works after setting database, schema and warehouse.
Step 2: Update Worksheet With New Code
Now let’s write a simple Python code that parses a JSON string using the JSONPath library and click on the run button.
# Parse JSON using JSONPath
import snowflake.snowpark as snowpark
import json
from jsonpath_ng import jsonpath, parse
def main(session: snowpark.Session):
json_string = '{"id":1, "country":"USA"}'
json_data = json.loads(json_string)
jsonpath_expression = parse('$.id')
match = jsonpath_expression.find(json_data)
return match
Snowflake will run the Python code and throw the following error.
This means that the Snowflake Anaconda package does not have this library jsonpath_ng as part of the internally supported list of packages and will need to load externally.
Step 3: Setup Required Python Packages
For this purpose, we will use Snowflake CLI from Snowflake Labs to package the required library.
# Create demo directory
mkdir demo
cd demo
# Create conda environment with python 3.8
conda create -n snowpark python=3.8
conda activate snowpark
# Install necessary packages
pip install snowflake-connector-python==2.7.12
pip install snowflake-cli-labs
# Note: Snowsql must be installed if it is not installed already
# https://docs.snowflake.com/en/user-guide/snowsql-install-config
# Configure Snow CLI environment
snow connection add
# Fill in the following information
Name for this connection: demo
Snowflake account:
Snowflake username:
Snowflake password:
Wrote new connection demo to ~/.snowsql/config
# Configure Snowflake DB/Schema etc
snow configure
# Fill in the required information
Snowflake database: dg_blog
Snowflake schema: demo
Snowflake role: sysadmin
Snowflake warehouse: adhoc
Configuring environment #dev...
# Verify if external package exists
snow package lookup jsonpath_ng
# Snowflake will if the package exists and returns the result. It will also tell if it requires any specific native library. In this case , it is ply, decorator and six
Now that we have verified that the package exists, we can create a package and load it into a Snowflake stage. We can do everything using Snowflake CLI.
# Create required external Python package
snow package create jsonpath_ng
# This creates a package.zip file in the current folder. In this case jsonpath_ng.zip.
# Login to Snowflake and create stage
snow login
Path to Snowsql config [~/.snowsql/config]:
Connection name (for entry in snowsql config): demo
Using connection name demo in ~/.snowsql/config
snow stage create pypackages
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Stage area PYPACKAGES successfully created. │
└─────────────────────────────────────────────┘
# Now upload the package to a stage location. Create Snowflake
snow package upload -f jsonpath_ng.zip -s pypackages
Uploading jsonpath_ng.zip to Snowflake @pypackages/jsonpath_ng.zip...
Package jsonpath_ng.zip UPLOADED to Snowflake @pypackages/jsonpath_ng.zip.
The package is now loaded into the Snowflake environment. Next, go back to the Snowsight worksheet and include the package as shown below. Click on Packages and select tab “Stage Packages” Fill in the package location along with the stage path and click on the Import button.
Before running, click on the Settings menu and change the return type.
Click on the Run button. Now we can see that the Snowflake compiler is throwing an error that one of the native packages is missing.
Let’s return to the Packages menu and ensure we add the ply package under Anaconda Packages.
Step 4: Compile and Run
Now if we click on the Run button, Snowflake will compile and run the code. As we can see below, it is much simpler to develop, debug, and test Python code using the Snowsight worksheet.
It is also easy to deploy or convert this Python code into a Snowflake Stored Procedure. Click on the “Deploy” button at the right corner to convert this into a stored procedure.
Conclusion
We hope this blog gives you an idea of how to use the Snowsight Python worksheet for development and then deploy them into a Stored Procedure. This feature significantly speeds up development without needing to set up an IDE or Jupyter kind environment locally.
At phData, we have completed hundreds of implementations and migrations to Snowflake. If you have any questions or if you’re curious about how we can help migrate your workloads to Snowflake, don’t hesitate to contact us!