March 31, 2023

How Snowpark & Hex Support Data Science

By Andrew Evans

As businesses continue to adopt Machine Learning at a rapid pace, their data scientists need tools that are all about time-to-value. With Hex, it has never been easier to unlock insights from your data. With Snowpark, those insights are efficiently processed and production-ready from day one.

In this blog, we will dive deeper into the capabilities of Snowpark for Python and how combining it with Hex supports data science: faster, more efficient data processing and analytics on top of Hex’s easy notebook collaboration and sharing.

So, let’s get started!

How Does Snowpark Enable Data Science?

Snowpark brings the full power of Snowflake SQL running in the Snowflake Data Cloud, as well as the flexibility to bring your favorite transformation and machine learning packages to the data. Snowpark supports several languages: Java, Scala, and Python.

Since the most popular language for data science and ML is Python, the full Snowpark API is available in Python. This puts the power of Snowflake at the fingertips of data science practitioners—all with the familiarity of Spark-like commands.

Dataframe API

We can use the Snowpark dataframe API to codify complex transformation logic as well as build native apps. Data operations are built with an interface similar to Pandas or Spark. With Snowpark, operations are planned lazily and optimized as Snowflake SQL. Computation is pushed down to the warehouse only when execution is required, e.g. by a .collect() call.
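As a minimal sketch of that lazy behavior, the snippet below builds a transformation and only runs it when .collect() is called. The ORDERS table, its AMOUNT and CUSTOMER_ID columns, and the helper names are hypothetical placeholders, and the connection parameters are whatever your own Snowflake account requires.

```python
def create_session(connection_parameters):
    """Open a Snowpark session from a dict of account, user, and so on."""
    # Imported here so the helpers below can be defined without a connection.
    from snowflake.snowpark import Session

    return Session.builder.configs(connection_parameters).create()


def build_large_order_totals(session):
    """Describe the transformation; nothing runs in the warehouse yet."""
    orders = session.table("ORDERS")            # hypothetical table name
    large = orders.filter("AMOUNT > 100")       # filter accepts a SQL expression
    return large.group_by("CUSTOMER_ID").sum("AMOUNT")


# Typical use (requires real credentials):
#   session = create_session({...})
#   totals = build_large_order_totals(session)   # still only a query plan
#   rows = totals.collect()                      # now the warehouse executes it
```

Until the last line runs, the dataframe is only a description of the query, which is what lets Snowflake optimize the whole chain as one statement.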

UDF API

Snowpark also supports user-defined functions (UDFs), which let you write logic with traditional Python data science packages and execute it on rows within a dataframe. For ML inference, model artifacts can be loaded from a Snowflake stage and used to run inference on the rows passed to the UDF.

Scalable Performance

Both APIs combine to form a complete set of tools for performing complex data transformations, including ML inference—all in a Snowflake Compute Warehouse. As business needs grow, the computation will efficiently scale out to meet that need. 

The Snowpark execution planner optimizes operations and runs them in parallel across a warehouse. The warehouse itself can dynamically scale with demand, saving costs when usage is less intensive, and ensuring responsiveness during peak need.

For data-intensive tasks like ML inference, Snowpark lets you send your logic to the data, eliminating the time-consuming process of copying data to where a model is hosted and copying inferences back to your database. With the UDF API, even ML inference tasks can scale out dynamically to right-size compute resources to match immediate demand.
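A rough sketch of sending the logic to the data: the function below scores a table and writes the results back without any rows leaving Snowflake. It assumes a UDF named PREDICT_SCORE has already been registered, and the table and column names are hypothetical.

```python
SOURCE_TABLE = "CUSTOMER_FEATURES"   # hypothetical input table
TARGET_TABLE = "CUSTOMER_SCORES"     # hypothetical output table


def score_table_in_place(session, source=SOURCE_TABLE, target=TARGET_TABLE):
    """Score every row inside Snowflake and persist the result there too."""
    # PREDICT_SCORE is assumed to be an already registered UDF.
    scored = session.sql(
        "SELECT CUSTOMER_ID, PREDICT_SCORE(FEATURE_A, FEATURE_B) AS SCORE "
        f"FROM {source}"
    )
    # save_as_table triggers execution; no rows are pulled to the client.
    scored.write.mode("overwrite").save_as_table(target)
    return target
```

Because both the read and the write happen in the warehouse, the client only sends the query text, so the same call works whether the table holds a thousand rows or a billion.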

How Does Hex Enable Data Science? 

Hex is a powerful and flexible notebooking environment with a ready-built Snowpark Python kernel. Hex also provides an easy connector with Snowflake, making it an incredibly simple and powerful way to perform analysis, prototype, and deploy data logic running on Snowflake. 

Hex notebooks are hosted, can be developed collaboratively, and integrate with GitHub for tracking. Even better, Hex automatically tracks cell I/O and constructs an execution DAG, allowing easy downstream/upstream cell reruns.

Flexible Execution

Hex logic can be built with SQL, Python, R, and no-code cells. Execution in Hex is robust and repeatable. Cell output consumption is tracked to automatically create a DAG of all of the cells. This makes sure that data is up-to-date and processed in the correct order, every time. High-level orchestration is also supported and Hex notebooks can be executed with Airflow.
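To make the DAG idea concrete, here is a toy sketch of how tracking what each cell reads and writes is enough to work out which cells need to rerun after a change. This is only an illustration of the concept using Python's standard graphlib module, not Hex's actual implementation, and the cell names are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical cells: name -> (variables read, variables written).
CELLS = {
    "load_orders": (set(), {"orders"}),
    "clean_orders": ({"orders"}, {"clean"}),
    "load_customers": (set(), {"customers"}),
    "join": ({"clean", "customers"}, {"joined"}),
    "chart": ({"joined"}, set()),
}


def build_dependencies(cells):
    """Map each cell to the set of upstream cells whose outputs it reads."""
    producer = {var: name for name, (_, writes) in cells.items() for var in writes}
    return {
        name: {producer[var] for var in reads}
        for name, (reads, _) in cells.items()
    }


def cells_to_rerun(cells, changed):
    """Return the changed cell plus everything downstream, in a valid run order."""
    deps = build_dependencies(cells)
    order = list(TopologicalSorter(deps).static_order())
    stale = {changed}
    for name in order:
        # A cell is stale if any cell it depends on is stale.
        if deps[name] & stale:
            stale.add(name)
    return [name for name in order if name in stale]


print(cells_to_rerun(CELLS, "clean_orders"))  # ['clean_orders', 'join', 'chart']
```

Editing clean_orders leaves load_orders and load_customers untouched, so only the three affected cells are recomputed, and always in dependency order.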

Hex supports secure connections to your Snowflake data warehouse and has a ready-to-go Python Snowpark kernel. When leveraging Snowflake’s optimized compute is as easy as opening a Hex notebook, teams can move at incredible velocity.

Powerful UI

Hex provides several ways to view a notebook. You can view your notebook in a traditional fashion with the Logic tab, or see the cell execution DAG in the Graph tab. The App view lets you select which cell I/O and markdown to show for quick dashboarding.

Easy Sharing

At the development stage, multiple users can collaborate and develop simultaneously in the notebook, and changes can be tracked and committed with git. Once the notebook is ready, the App view can be published and shared with anyone who has the link, with or without a Hex account. The dashboard can be static or contain no-code input cells for a dynamic experience.

Conclusion

Together, Hex and Snowpark are an epic combo for data teams. Hex delivers incredible time-to-value for data scientists exploring and sharing insights with their company’s data. With Snowpark, they can develop machine learning and complex transformations efficiently and rapidly deploy and scale with all of the computation pushed down to a Snowflake warehouse. 

Want to learn more? We’re hitting the road with Snowflake and giving hands-on labs around the US in spring 2023. Stay tuned to phData’s LinkedIn for more updates.

Can’t wait? Check out these blogs and reach out to our Data Science and ML team!
