June 22, 2022

5 Data Science & Machine Learning Takeaways from Snowflake Summit 2022

By Dominick Rocco

Snowflake Summit 2022 created a whirlwind of announcements and excitement within the Snowflake Data Cloud ecosystem.  One clear trend at Summit was a focus on features for data science and machine learning workloads within Snowflake.  Let’s take a look at some of the most exciting features announced at Summit this year. 

Snowpark Python is in Public Preview

Snowpark Python is Snowflake’s new Python interface for working with data within Snowflake. It allows users to write complex transformations in Python and run those workloads on Snowflake compute.

Snowpark Python has two key components: an API that allows users to perform SQL-style operations across entire tables, and user-defined functions (UDFs) that perform custom logic on individual rows.  The Snowpark API is great for feature engineering and analysis, while UDFs allow machine learning models to perform inference right inside Snowflake.  
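The difference between the two components can be shown with a local, pure-Python analogy. This is not Snowpark code: the "table" is a hypothetical list of dicts, and the threshold function is an illustrative stand-in for a trained model. In Snowpark itself, the table-wide step would use DataFrame methods such as `group_by` and `agg`, and the row-level function would be registered as a UDF (for example with the `udf` decorator from `snowflake.snowpark.functions`), so that both run on Snowflake compute rather than on your laptop.

```python
# A local, pure-Python analogy of the two Snowpark styles (NOT the Snowpark API).
from collections import defaultdict
from statistics import mean

# Hypothetical sample "table": each dict is one row.
ROWS = [
    {"store": "A", "amount": 10.0},
    {"store": "A", "amount": 30.0},
    {"store": "B", "amount": 5.0},
]


def avg_amount_by_store(rows):
    """Table-wide, SQL-style operation: GROUP BY store, AVG(amount)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["store"]].append(row["amount"])
    return {store: mean(values) for store, values in groups.items()}


def is_large_order(amount: float) -> bool:
    """Row-level custom logic, the kind of function a UDF would wrap.

    The 20.0 threshold is an arbitrary, hypothetical stand-in for model inference.
    """
    return amount > 20.0


def score_rows(rows):
    """Apply the row-level function to every row, like calling a UDF in a SELECT."""
    return [{**row, "is_large": is_large_order(row["amount"])} for row in rows]


print(avg_amount_by_store(ROWS))  # {'A': 20.0, 'B': 5.0}
print([r["is_large"] for r in score_rows(ROWS)])  # [False, True, False]
```

The aggregation needs to see many rows at once, while the row-level function only ever sees one row, which is why the latter maps naturally onto per-row model inference.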

Snowpark Python has been in private preview for the last few months but is now available for all Snowflake users to try in public preview. 

[Figure: "UDFs VS API" (code snippets and a graph)]

Python Worksheets

Snowflake announced that it will be adding Python Worksheets to the Snowflake UI. This means that users will be able to write and run Python code directly in the Snowflake UI, much like in a notebook environment. Python Worksheets will be a natural place to work with Snowpark Python and will give data scientists a rich platform for ad hoc analyses on Snowflake.

Streamlit Integration

Earlier this year, Snowflake announced its acquisition of Streamlit. Streamlit is an open-source framework for building rich, interactive data apps and visualizations entirely in Python. While Streamlit apps can be complex and dynamic, the simplest usage of Streamlit looks a lot like Python notebook output.

At Summit, we got a lot more detail on how Streamlit will be integrated into Snowflake. When users incorporate Streamlit into Python Worksheets in Snowflake, the corresponding apps will be rendered right there in the UI. Users will then be able to deploy those Streamlit apps on Snowflake as Native Apps and share them with others through role-based access control, just like data in Snowflake.

Large Memory Instances

Another important announcement was support for Large Memory Instances. ML and data science workloads generally need more memory than traditional data transformation processes. Models have many weights or other parameters that are learned during training, and these must be held in memory, often alongside the training data itself.
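A back-of-the-envelope calculation shows how quickly parameter memory adds up. The sketch below assumes a hypothetical model with 100 million float32 parameters and counts only the parameter-sized arrays; the factor of 4 for training is a common rule of thumb for the Adam optimizer (weights, gradients, and two moment estimates). Activations, input data, and framework overhead are ignored, so real usage will be higher.

```python
def estimate_model_memory_gb(n_params: int, bytes_per_value: int = 4, copies: int = 1) -> float:
    """Rough lower bound on memory (in GB, 1e9 bytes) for a model's parameter arrays.

    n_params:        number of learned weights.
    bytes_per_value: 4 for float32, 2 for float16, 8 for float64.
    copies:          parameter-sized arrays held at once (1 for inference;
                     about 4 for training with Adam: weights, gradients,
                     and two optimizer moment estimates).
    Ignores activations, training data, and framework overhead.
    """
    return n_params * bytes_per_value * copies / 1e9


# Hypothetical model with 100 million float32 parameters.
inference_gb = estimate_model_memory_gb(100_000_000)
training_gb = estimate_model_memory_gb(100_000_000, copies=4)
print(inference_gb, training_gb)  # 0.4 1.6
```

Even this simplified estimate quadruples between inference and training, before any training data is loaded.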

Large Memory Instances are intended to give ML workloads the memory they need to perform training or inference on Snowflake.

Geospatial Improvements

Many data science workloads depend on geospatial data. At Summit, Snowflake announced two major enhancements for geospatial workloads. First, it has added a planar coordinate system (the GEOMETRY data type) to complement the existing round-earth coordinate system (the GEOGRAPHY data type). This will allow data represented in planar coordinates to be used natively within Snowflake. On top of that, Snowflake announced optimizations that speed up geospatial query performance by a factor of five.
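The two coordinate systems compute something as basic as distance differently. The minimal sketch below is plain Python, not Snowflake SQL: for round-earth data it uses the haversine great-circle formula on a sphere of radius 6,371 km (a simplifying assumption; the real Earth is an ellipsoid), while for planar data, whose coordinates are already projected onto a flat surface, distance is ordinary Euclidean distance in the projection's units.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere (round-earth model); inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance in a flat (projected) system, in the inputs' units."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude along the equator on the sphere.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
# Two points in a hypothetical projected system measured in meters.
print(planar_distance(0.0, 0.0, 3.0, 4.0))  # 5.0
```

Applying the planar formula directly to raw longitude/latitude degrees would give results in "degrees", not kilometers, which is why planar data is normally stored in projected coordinates.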

Conclusion

Summit 2022 kicked off an exciting time for data scientists using Snowflake. In the coming year, these features will combine to create a rich environment for ad hoc analysis, model development, and model deployment.

If you’ve got questions on how to best leverage these features, reach out to us at phData to get started!
