July 19, 2023

Snowpark Stored Procedure Development using Python Worksheets

By Deepa Ganiger

Navigating Snowflake’s Snowsight and Snowpark can be a challenging task, especially when it involves developing stored procedures using Python and external libraries. 

In this blog post, we’ll explore the creation, deployment, and testing of a Snowpark stored procedure using Python within the recently introduced Python worksheets in Snowsight.

What is Snowsight?

Snowsight is a modern, interactive web-based interface that’s part of the Snowflake Data Cloud platform. It was created to replace the Classic Console and is designed to elevate the user experience for exploring, visualizing, and reporting on data. 

Snowflake has continued to enhance Snowsight worksheets throughout 2023, and customers have been gradually adopting the new interface since last year. Snowflake is also extending Snowsight to support newer features like Task Observability, Snowpark, etc. 

What is Snowflake’s Snowpark?

Snowpark is a new developer experience for Snowflake that allows developers to write code in their preferred language and run that code directly on Snowflake. It exposes new interfaces for development in Python, Scala, or Java to supplement Snowflake’s original SQL interface.

SQL is, of course, the lingua franca for data, but there are many applications and development teams that rely heavily on other languages. 

What’s New with Snowpark?

With the introduction of Snowpark, customers can write stored procedures in Python, Scala, Java, etc. There are several IDEs (integrated development environments) and plugins available for Snowpark development in a local environment. 

But our customers often ask, “Can I write a Python stored procedure in a regular SQL worksheet?” Of course! Snowflake and Snowsight worksheets allow us to create Python or Java stored procedures, but it’s not always intuitive. 

Snowflake listened to customers and recently introduced Python worksheets (in public preview) in Snowsight, which significantly improve the development experience. There are several built-in features within the Python worksheet. 

In this blog, we’ll look specifically at how to create a Snowpark stored procedure that uses an external library and how to deploy it. 

How to Create a Snowpark Stored Procedure Using an External Library

Snowflake supports several Anaconda Python packages, but there will always be a need to implement an external library that is not part of the default packages.

What do we need for this blog?

  • Snowflake Account
  • Local environment (terminal) with SnowSQL installed, to package Python libraries


What will we be doing?

  • Create a new Python template worksheet using Snowsight
  • Update worksheet with new code
  • Set up required packages
  • Compile and run


Let’s see how to develop some simple Python code that uses an external library and how to build a Snowpark Stored Procedure. 

Step 1: Create a New Python Template Worksheet Using Snowsight

Log in to your Snowflake (Snowsight) environment.

Click on the + button and select Python worksheet. Snowflake will create a worksheet with sample Python code, as shown below.

Several lines of code depicting a Python worksheet.

Click on the run button to check that the sample code works after setting the database, schema, and warehouse.

Step 2: Update Worksheet With New Code

Now let’s write a simple Python script that parses a JSON string using the JSONPath library, and click on the run button.

				
# Parse JSON using JSONPath
import json

import snowflake.snowpark as snowpark
from jsonpath_ng import parse


def main(session: snowpark.Session):
    json_string = '{"id":1, "country":"USA"}'
    json_data = json.loads(json_string)
    jsonpath_expression = parse('$.id')
    match = jsonpath_expression.find(json_data)
    return match

Snowflake will run the Python code and throw the following error.

An error message taken from Snowflake

This means that jsonpath_ng is not part of Snowflake’s internally supported list of Anaconda packages, so the library will need to be loaded externally.
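For context, the JSONPath expression '$.id' in the worksheet simply selects the top-level id key, so you can sanity-check the expected result locally with nothing but the standard library (no jsonpath_ng required):

```python
# What '$.id' extracts from the sample document, using only the stdlib.
import json

json_string = '{"id":1, "country":"USA"}'
json_data = json.loads(json_string)

# JSONPath '$.id' is just a top-level key lookup on this document.
id_value = json_data["id"]
print(id_value)  # prints 1
```

Once the external library is loaded, jsonpath_ng evaluates the same lookup, along with arbitrarily nested paths, filters, and wildcards.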

Step 3: Set Up Required Python Packages

For this purpose, we will use the Snowflake CLI from Snowflake Labs to package the required library.

				
# Create demo directory
mkdir demo
cd demo

# Create conda environment with Python 3.8
conda create -n snowpark python=3.8
conda activate snowpark

# Install necessary packages
pip install snowflake-connector-python==2.7.12
pip install snowflake-cli-labs

# Note: SnowSQL must be installed if it is not already
# https://docs.snowflake.com/en/user-guide/snowsql-install-config

# Configure the Snow CLI environment
snow connection add
# Fill in the following information
   Name for this connection: demo
   Snowflake account: <account name>
   Snowflake username: <user id>
   Snowflake password: <password>
   Wrote new connection demo to ~/.snowsql/config

# Configure Snowflake database, schema, etc.
snow configure
# Fill in the required information
   Snowflake database: dg_blog
   Snowflake schema: demo
   Snowflake role: sysadmin
   Snowflake warehouse: adhoc
   Configuring environment #dev...

# Verify that the external package exists
snow package lookup jsonpath_ng

# Snowflake checks whether the package exists and returns the result.
# It also reports any required native library dependencies;
# in this case, they are ply, decorator, and six.

Now that we have verified that the package exists, we can create the package and load it into a Snowflake stage, all using the Snowflake CLI. 

				
# Create the required external Python package
snow package create jsonpath_ng

# This creates a zip file in the current folder; in this case, jsonpath_ng.zip.

# Log in to Snowflake and create a stage
snow login
   Path to Snowsql config [~/.snowsql/config]:
   Connection name (for entry in snowsql config): demo
   Using connection name demo in ~/.snowsql/config

snow stage create pypackages
   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ┃ status                                      ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   │ Stage area PYPACKAGES successfully created. │
   └─────────────────────────────────────────────┘

# Now upload the package to the stage location
snow package upload -f jsonpath_ng.zip -s pypackages
   Uploading jsonpath_ng.zip to Snowflake @pypackages/jsonpath_ng.zip...
   Package jsonpath_ng.zip UPLOADED to Snowflake @pypackages/jsonpath_ng.zip.
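The zip that snow package create builds is essentially the package directory archived with its relative paths preserved. Here is a rough standard-library sketch of that zipping step (an approximation with a hypothetical directory layout, not the CLI's actual implementation):

```python
# Rough sketch of the zipping step behind `snow package create`
# (an approximation, not the CLI's actual code).
import os
import tempfile
import zipfile

def package_dir(pkg_dir, zip_path):
    """Archive pkg_dir into zip_path, keeping paths relative to its parent."""
    parent = os.path.dirname(pkg_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for fname in sorted(files):
                full = os.path.join(root, fname)
                rel = os.path.relpath(full, parent)  # e.g. jsonpath_ng/__init__.py
                zf.write(full, rel)
                names.append(rel)
    return names

# Demo with a throwaway directory standing in for the downloaded library
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "jsonpath_ng")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("# package marker\n")

archived = package_dir(pkg, os.path.join(tmp, "jsonpath_ng.zip"))
print(archived)
```

Keeping the top-level package directory inside the zip matters: the staged zip is added to the Python path, so imports like import jsonpath_ng resolve against that directory.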

The package is now loaded into the Snowflake environment. Next, go back to the Snowsight worksheet and include the package as shown below: click on Packages, select the “Stage Packages” tab, fill in the package location along with the stage path, and click on the Import button.

A popup window from Snowflake that shows several lines of code.

Before running, click on the Settings menu and change the return type.

Another popup window from Snowflake that contains several lines of code.

Click on the Run button. Now we can see that the Snowflake compiler throws an error because one of the native packages is missing.

An error message from Snowflake that has several lines of code.

Let’s return to the Packages menu and add the ply package under Anaconda Packages.

Step 4: Compile and Run

Now if we click on the Run button, Snowflake will compile and run the code. As we can see below, it is much simpler to develop, debug, and test Python code using the Snowsight worksheet.

Several lines of code

It is also easy to deploy this Python code as a Snowflake stored procedure. Click on the Deploy button in the upper-right corner to convert it into a stored procedure.

Conclusion

We hope this blog gives you an idea of how to use the Snowsight Python worksheet for development and then deploy your code as a stored procedure. This feature significantly speeds up development without needing to set up an IDE or a Jupyter-style environment locally. 

At phData, we have completed hundreds of implementations and migrations to Snowflake. If you have any questions or if you’re curious about how we can help migrate your workloads to Snowflake, don’t hesitate to contact us!
