The Snowflake AI Data Cloud has been on a roll, building predictive ML capabilities into its platform. With so many teams eager to work with these new products, one of the most common questions we get is: “How do I turn this thing on?”
The great news is that Snowflake builds all of its products with the same core platform technologies and charges through the same warehouse and cloud services credits teams are familiar with.
For IT and data teams looking to better understand how to manage these new Snowflake objects, we’ve put together a cheat sheet for each one that covers what it’s made of and any permissions to be aware of.
Predictive ML in Snowflake
Snowflake’s predictive ML platform covers the lifecycle of a typical machine learning project. In this cheat sheet, we call out four core units: Preprocessing, Feature Store, Training & Inferencing, and the Model Registry.
Preprocessing
Snowflake provides Pythonic ways to efficiently process tons of data, scaled out across Snowflake compute warehouses. The interface offers Spark-style DataFrame and pandas-style functions that execute lazily: operations are only sent to Snowflake when a result is actually requested.
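To make lazy execution concrete, here is a toy sketch in plain Python; it is not the Snowpark API. The hypothetical LazyFrame class only records filter and select calls, compiles them into a single SQL string, and hands that string to a caller-supplied run_query function when collect() is called. Snowpark DataFrames follow the same general idea: chained transformations build up a query, and nothing runs on the warehouse until an action asks for results.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (NOT the Snowpark API)."""
    table: str
    filters: Tuple[str, ...] = ()
    columns: Tuple[str, ...] = ("*",)

    def filter(self, condition: str) -> "LazyFrame":
        # Record the predicate in a new plan; nothing is executed yet.
        return LazyFrame(self.table, self.filters + (condition,), self.columns)

    def select(self, *cols: str) -> "LazyFrame":
        # Record the projection in a new plan; nothing is executed yet.
        return LazyFrame(self.table, self.filters, tuple(cols))

    def to_sql(self) -> str:
        # Compile the recorded plan into one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({f})" for f in self.filters)
        return sql

    def collect(self, run_query: Callable[[str], list]) -> list:
        # Only here is work handed to the engine, as a single query.
        return run_query(self.to_sql())


if __name__ == "__main__":
    df = LazyFrame("orders").filter("amount > 100").select("customer_id", "amount")
    print(df.to_sql())  # SELECT customer_id, amount FROM orders WHERE (amount > 100)
```

Because the whole chain compiles into one statement, the engine can optimize it as a unit instead of materializing each intermediate step.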
What is it?
These are pure APIs. You’ll just need access to the data you’re working with and a warehouse to compute from. Snowpark-optimized warehouses are recommended if you need more memory.
Good-to-Knows
Snowpark pandas DataFrames are not currently accepted by the model training API, so you’ll need to convert them first.
User-defined functions (UDFs) can handle a lot of custom logic, and they can be written in Java, JavaScript, Python, Scala, or SQL.
GPU workloads aren’t currently supported on warehouses, but you can absolutely run them in Snowflake using Snowpark Container Services (SPCS) jobs. See our blog on SPCS.
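As a hedged illustration of the UDF point, the sketch below keeps a hypothetical normalize_email handler as ordinary Python, so it can be unit-tested locally, alongside the CREATE FUNCTION statement that would register the same logic as a Python UDF. The function name and the RUNTIME_VERSION value are assumptions; check which Python runtimes your account supports.

```python
# Hypothetical UDF handler. It is plain Python, so it can be unit-tested
# locally before being registered in Snowflake.
def normalize_email(email):
    """Trim whitespace and lowercase an email address."""
    if email is None:  # a SQL NULL argument arrives as Python None
        return None
    return email.strip().lower()


# DDL that would register the same logic as a SQL-callable Python UDF.
# The function name and RUNTIME_VERSION are assumptions; adjust as needed.
CREATE_UDF_SQL = """
CREATE OR REPLACE FUNCTION normalize_email(email STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
HANDLER = 'normalize_email'
AS $$
def normalize_email(email):
    if email is None:
        return None
    return email.strip().lower()
$$;
"""

if __name__ == "__main__":
    print(normalize_email("  Jane.Doe@Example.COM "))  # jane.doe@example.com
```

Once registered, the UDF can be called from SQL like a built-in function, for example SELECT normalize_email(email) FROM customers; where customers is a hypothetical table.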
Feature Store
Feature Stores are the bread to a data scientist’s sandwich. They might not always get the most attention, but without ML-modeling-specific dataset management, things quickly become a sloppy mess. Snowflake’s Feature Store manages ML feature definitions, computation, and point-in-time correctness to ensure models predict and train on consistent data. It also allows data scientists to collaborate on critical data features that make their projects tick.
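Point-in-time correctness is easiest to see with a small example. The sketch below is a plain-Python as-of join, not Snowflake’s implementation: for each labeled event it picks the most recent feature value whose timestamp is at or before the event time, so values recorded later never leak into training data. The tuple layouts and the point_in_time_join name are illustrative assumptions; the Feature Store handles this kind of join for you when it builds training data.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(events, feature_rows):
    """Attach to each event the latest feature value known at event time.

    events:       list of (entity_id, event_ts, label)
    feature_rows: list of (entity_id, feature_ts, value)
    Feature rows stamped after an event are ignored, so no future
    information leaks into the training set.
    """
    # Per entity: parallel lists of timestamps and values, sorted by time.
    history = defaultdict(lambda: ([], []))
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        timestamps, values = history[entity_id]
        timestamps.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_ts, label in events:
        timestamps, values = history.get(entity_id, ([], []))
        # Index of the last feature row with ts <= event_ts (as-of semantics).
        i = bisect_right(timestamps, event_ts) - 1
        value = values[i] if i >= 0 else None
        joined.append((entity_id, event_ts, value, label))
    return joined


if __name__ == "__main__":
    features = [("u1", 1, 10.0), ("u1", 5, 20.0), ("u2", 3, 7.0)]
    events = [("u1", 4, 1), ("u1", 6, 0), ("u2", 2, 1)]
    print(point_in_time_join(events, features))
    # [('u1', 4, 10.0, 1), ('u1', 6, 20.0, 0), ('u2', 2, None, 1)]
```

Note the u2 event at time 2: its only feature value was recorded at time 3, so the join returns None rather than borrowing a value from the future.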
What is it?
A Snowflake Enterprise Edition capability, built out of a few everyday Snowflake objects. There’s also a Snowsight UI for it.
Good-to-Knows
Feature Store is a schema. You can use an existing one or make a new one.
Feature Views are tables of features organized by common entities and grain. There are two types:
Snowflake Managed: defined as a dynamic table.
Externally Managed: just a regular view on a table managed by your team.
Entities, like a user ID or a product ID, define part of the grain of a Feature View.
Entities are tags managed by Snowflake.
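Because entities set the grain, a useful sanity check before publishing a Feature View is that each row is unique per entity key (plus the timestamp column, when the view has one). The grain_violations helper below is a hypothetical plain-Python check over rows represented as dicts; in practice you would run the equivalent GROUP BY ... HAVING COUNT(*) > 1 query in Snowflake.

```python
from collections import Counter


def grain_violations(rows, entity_keys, timestamp_col=None):
    """Return key tuples that appear more than once at the declared grain.

    rows:          list of dicts, one per Feature View row
    entity_keys:   join-key columns of the view's entities, e.g. ["user_id"]
    timestamp_col: optional timestamp column that completes the grain
    """
    grain = list(entity_keys) + ([timestamp_col] if timestamp_col else [])
    # Count how many rows share each combination of grain columns.
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    rows = [
        {"user_id": 1, "ts": 1, "spend": 5.0},
        {"user_id": 1, "ts": 2, "spend": 7.0},
        {"user_id": 2, "ts": 1, "spend": 3.0},
    ]
    print(grain_violations(rows, ["user_id"]))        # [(1,)]
    print(grain_violations(rows, ["user_id"], "ts"))  # []
```

The same rows pass or fail depending on the declared grain: keyed only by user_id they contain a duplicate, but keyed by user_id plus timestamp they are unique.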
Training & Inferencing
Model training and inferencing on Snowflake let you build your favorite models, say with XGBoost, entirely within the Snowflake environment. Complex tasks like hyperparameter optimization become much more straightforward because training runs can scale out across a compute warehouse.
What is it?
Pure APIs again. Training runs as a temporary stored procedure that Snowflake bundles with your code. Artifacts and files from model training are stored in stages. Once a model is trained, it can be logged to the Snowflake Model Registry, and inferencing is done through that API as well.
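Hyperparameter search scales well because every trial is independent. The sketch below shows that shape in plain Python, with a thread pool standing in for warehouse compute; it is an analogy, not Snowflake’s distributed tuning API. The model (one-dimensional ridge regression through the origin), the lambda grid, and the function names are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_ridge_1d(xs, ys, lam):
    # Closed-form ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    # Mean squared error of the prediction w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lambdas, max_workers=4):
    """Fit one model per lambda concurrently; return (best_lambda, scores)."""
    def trial(lam):
        w = fit_ridge_1d(*train, lam)
        return lam, mse(w, *valid)

    # Trials share no state, so they can run side by side; in Snowflake the
    # warehouse plays the role of this worker pool at much larger scale.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = dict(pool.map(trial, lambdas))
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    train = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
    valid = ([5.0, 6.0], [10.0, 12.0])
    best, scores = grid_search(train, valid, [0.0, 1.0, 10.0])
    print(best)  # 0.0
```

Threads keep the sketch short; CPU-bound training inside a single Python process would normally use processes instead, since the interpreter lock limits thread parallelism.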
Model Registry
A powerful centralized repository for managing and storing trained models, complete with associated metrics and version history.
What is it?
Machine learning models are schema-level objects. Snowflake manages the schema’s contents, which are most easily accessed through the Model Registry API. At a high level, the schema will contain stages for model artifacts/files and tables for metrics.
Good-to-Knows
Models can also have multiple versions.
You must own the schema or have the CREATE MODEL privilege on it to create a model.
To use a model, you must either own it or have USAGE on it. USAGE does not allow you to see the internals of the model.
Models do not currently support replication, nor can they be shared or cloned.
Models can be moved with a RENAME TO (ALTER MODEL ... RENAME TO).
If you have ownership, you can run the underlying operations to view and pull a model’s artifacts.
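The permission rules above map onto a handful of SQL statements. The helper below is a hypothetical sketch that only builds the statement strings (it does not connect to Snowflake); the database, schema, model, and role names are placeholders, identifiers are not quoted or validated, and a real setup will usually need additional grants, such as USAGE on the database and schema.

```python
def model_access_sql(database, schema, model, builder_role, consumer_role,
                     new_schema=None):
    """Build GRANT/ALTER statements for common model permissions (hypothetical helper)."""
    schema_fqn = f"{database}.{schema}"
    model_fqn = f"{schema_fqn}.{model}"
    statements = [
        # Lets builder_role create models in the schema.
        f"GRANT CREATE MODEL ON SCHEMA {schema_fqn} TO ROLE {builder_role};",
        # Lets consumer_role use the model without seeing its internals.
        f"GRANT USAGE ON MODEL {model_fqn} TO ROLE {consumer_role};",
    ]
    if new_schema:
        # Move the model by renaming it to a fully qualified name elsewhere.
        statements.append(
            f"ALTER MODEL {model_fqn} RENAME TO {database}.{new_schema}.{model};"
        )
    return statements


if __name__ == "__main__":
    for stmt in model_access_sql("ml_db", "registry", "churn_model",
                                 "ds_role", "app_role", new_schema="prod"):
        print(stmt)
```

Generating the statements from one function keeps fully qualified names consistent across the grant and rename steps; review the output before running it in your account.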
Next Steps
If your team is ready to take action and expand its capabilities on Snowflake and Snowpark, we highly recommend attending one of our free hands-on Generative AI workshops. You’ll see firsthand what a generative AI use case for the Snowflake Feature Store can do, plus much more!
Lastly, phData made an appearance on a recent webinar with Snowflake, where we discussed how to build a personalized chatbot for Stride, a leader in remote and online learning. This webinar is fantastic to watch, especially if you’re curious about:
How to build consistent downstream ML pipelines that are continuously updated on fresh data with Snowflake Feature Store.
How Stride uses Snowflake Feature Store for fully contextualized chats and automatic personalization that learns from interactions.
Tips for getting started with Snowflake Feature Store.