October 2, 2023

What is a Feature Store?

By Dominick Rocco

This blog was originally written by Dominick Rocco and updated for 2023 by Lawrence Liu.

You’ve got machine learning questions, we’ve got machine learning answers.

In this blog post, we’ll explore what a feature store is, examine a few of its key advantages (and disadvantages), and discuss when it’s the right time for your organization to build or adopt one.

But first, here’s some brief background on what a feature is.

What is a Feature in ML?

Machine learning (ML) models learn to make predictions based on past examples. For the vast majority of use cases, the data used by ML models can be visualized as a table where rows are examples and columns are attributes describing those examples. 

A feature is an attribute used to describe each example.

ML models are useful because, once trained, they can generate predictions for new examples; this process is called inference. The examples used during the learning process are commonly referred to as training data. In a process known as feature engineering, data scientists apply transformations to raw data to create features suitable for ML models to consume.

Example dataset that could be used for an ML model. Each row is an example that can be used for training, each column represents a feature, and each cell is a feature value.
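To make the table concrete, here is a minimal Python sketch of feature engineering. It assumes hypothetical raw purchase records (the field names and values are illustrative, not from a real system) and turns them into one row per customer, where each row is an example and each column is a feature.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: one record per purchase (illustrative only).
raw_purchases = [
    {"customer_id": "c1", "amount": 20.0, "day": date(2023, 9, 1)},
    {"customer_id": "c1", "amount": 35.0, "day": date(2023, 9, 20)},
    {"customer_id": "c2", "amount": 120.0, "day": date(2023, 9, 15)},
]


def engineer_features(purchases, as_of):
    """Turn raw purchase records into one feature row per customer.

    Each returned dict is an example (a row); every key other than
    customer_id is a feature (a column).
    """
    grouped = defaultdict(list)
    for p in purchases:
        grouped[p["customer_id"]].append(p)

    rows = []
    for customer_id, items in sorted(grouped.items()):
        amounts = [p["amount"] for p in items]
        last_day = max(p["day"] for p in items)
        rows.append({
            "customer_id": customer_id,
            "purchase_count": len(items),
            "total_spend": sum(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
            "days_since_last_purchase": (as_of - last_day).days,
        })
    return rows


for row in engineer_features(raw_purchases, as_of=date(2023, 10, 2)):
    print(row)
```

The raw records are not directly usable by most models; the aggregated rows are, which is the point of feature engineering.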

What is a Feature Store?

A feature store is a tool for storing commonly used features. When data scientists develop features for a machine learning model, they can add those features to the feature store, making them available for reuse.

When new examples (e.g., users of an application, customers of a business, or items in a product catalog) arrive, their features can be pre-computed and stored in your feature store so that they are readily available for training or inference.
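As an illustration only, the following Python sketch shows the idea of pre-computing features when a new example arrives. The `InMemoryFeatureStore` class and its methods are hypothetical and do not correspond to any real feature store’s API; a production system would persist the values in a database rather than a dictionary.

```python
from typing import Callable, Dict, List

# A feature function maps a raw example (a dict) to a single numeric value.
FeatureFn = Callable[[dict], float]


class InMemoryFeatureStore:
    """Hypothetical, minimal feature store used only for illustration."""

    def __init__(self, feature_fns: Dict[str, FeatureFn]):
        self.feature_fns = feature_fns
        # entity_id -> {feature_name: value}
        self.values: Dict[str, Dict[str, float]] = {}

    def on_new_example(self, entity_id: str, raw: dict) -> None:
        # Pre-compute every registered feature once, when the example arrives.
        self.values[entity_id] = {
            name: fn(raw) for name, fn in self.feature_fns.items()
        }

    def get_features(self, entity_id: str, names: List[str]) -> List[float]:
        # Training jobs and inference requests read the same stored values
        # instead of recomputing them from raw data.
        row = self.values[entity_id]
        return [row[name] for name in names]


store = InMemoryFeatureStore({
    "order_count": lambda raw: float(len(raw["orders"])),
    "total_spend": lambda raw: float(sum(raw["orders"])),
})
store.on_new_example("customer-42", {"orders": [9.5, 10.5, 5.0]})
print(store.get_features("customer-42", ["order_count", "total_spend"]))  # [3.0, 25.0]
```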

A feature store aims to be a solution for feature management and feature consistency.

Feature management is the ability to maintain a registry of existing features, allowing teams to store, discover, and reuse features for model training and inference.
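Feature management can be sketched as a small registry of feature definitions. The `FeatureDefinition` and `FeatureRegistry` names below are hypothetical; real feature stores keep richer metadata, but the core operations of registering, discovering, and retrieving a definition look broadly like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    entity: str       # the kind of example the feature describes, e.g. "customer"
    description: str
    owner: str        # team responsible for the feature


class FeatureRegistry:
    """Hypothetical registry for storing, discovering, and reusing features."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        # Refuse duplicate names so each feature has a single definition.
        if feature.name in self._features:
            raise ValueError(f"feature {feature.name!r} is already registered")
        self._features[feature.name] = feature

    def search(self, entity: Optional[str] = None, keyword: str = "") -> List[FeatureDefinition]:
        # Case-insensitive match on name or description, optionally filtered by entity.
        keyword = keyword.lower()
        return [
            f for f in self._features.values()
            if (entity is None or f.entity == entity)
            and (keyword in f.name.lower() or keyword in f.description.lower())
        ]

    def get(self, name: str) -> FeatureDefinition:
        return self._features[name]


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    "total_spend_90d", "customer", "Total spend over the last 90 days", "churn-team"))
registry.register(FeatureDefinition(
    "avg_rating", "product", "Mean customer review rating", "catalog-team"))

# Another team discovers an existing customer feature instead of rebuilding it.
print([f.name for f in registry.search(entity="customer", keyword="spend")])  # ['total_spend_90d']
```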

Feature consistency is the ability to keep online and offline data stores in sync so that the feature values used to train a model match the values served to it at inference time.

Example feature store architecture featuring dbt and Snowflake. Feature stores capture features from enterprise data warehouses or streaming applications in an online and an offline store, syncing the values between the two. Those features can then be retrieved and served to model training jobs or inference applications.
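The offline/online split can be sketched with two in-memory structures. In this hypothetical `DualStore`, the offline store keeps the full history of feature values for building training data, while the online store keeps only the latest value per entity for low-latency inference. Routing every write through one method is one simple way (an assumption here, not a description of any particular product) to keep the two consistent.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


class DualStore:
    """Hypothetical sketch of keeping an offline and an online store in sync."""

    def __init__(self) -> None:
        # Offline: full history of values, used to build training data.
        self.offline: List[Tuple[str, str, float, datetime]] = []
        # Online: latest value per (entity, feature), used for fast inference lookups.
        self.online: Dict[Key, Tuple[float, datetime]] = {}

    def write(self, entity_id: str, feature: str, value: float, ts: datetime) -> None:
        # Every write lands in both stores, so they see the same values.
        self.offline.append((entity_id, feature, value, ts))
        current = self.online.get((entity_id, feature))
        # Only replace the online value with strictly newer data, so a
        # late-arriving older record cannot overwrite a fresher one.
        if current is None or ts > current[1]:
            self.online[(entity_id, feature)] = (value, ts)

    def latest_offline(self, entity_id: str, feature: str) -> float:
        rows = [r for r in self.offline if r[0] == entity_id and r[1] == feature]
        # max() returns the first of equally recent rows, matching the online rule.
        return max(rows, key=lambda r: r[3])[2]

    def is_consistent(self) -> bool:
        # The online value must equal the most recent offline value for every key.
        return all(
            self.latest_offline(entity_id, feature) == value
            for (entity_id, feature), (value, _) in self.online.items()
        )


store = DualStore()
store.write("c1", "total_spend", 55.0, datetime(2023, 9, 20))
store.write("c1", "total_spend", 80.0, datetime(2023, 10, 1))
store.write("c1", "total_spend", 40.0, datetime(2023, 9, 1))  # late, older record
print(store.online[("c1", "total_spend")][0], len(store.offline), store.is_consistent())
# 80.0 3 True
```

Training jobs would read the full history from `offline`, while an inference service would do a single dictionary lookup in `online`.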

Feature Stores vs. Feature Platforms

Many open-source feature stores sit on top of your existing data warehouse to provide feature management and feature consistency capabilities.

A downside of many feature stores is that they cannot compute features themselves. Adopting a feature store like Feast, for example, requires a separate feature engineering pipeline to keep your features up to date.

Feature platforms, on the other hand, are managed solutions built around a feature store.

Feature platforms offer all the capabilities of feature stores and more. A major advantage of feature platforms is the feature engine, which lets you define feature computation logic and orchestrate the transformations without third-party orchestration tools like Airflow.
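A feature engine can be thought of as a set of feature definitions plus a scheduler that computes them in dependency order. The sketch below assumes Python 3.9+ (for `graphlib`) and uses a hypothetical `feature` decorator and `run_engine` function; it is not the API of any feature platform, and a real engine would also handle scheduling, incremental updates, and storage.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Tuple

# Hypothetical registry of feature logic: name -> (dependencies, function).
FEATURES: Dict[str, Tuple[Tuple[str, ...], Callable]] = {}


def feature(*depends_on: str):
    """Decorator that registers a feature and the inputs it depends on."""
    def wrap(fn: Callable) -> Callable:
        FEATURES[fn.__name__] = (depends_on, fn)
        return fn
    return wrap


@feature("raw_orders")
def order_count(raw_orders):
    return len(raw_orders)


@feature("raw_orders")
def total_spend(raw_orders):
    return sum(raw_orders)


@feature("total_spend", "order_count")
def avg_order_value(total_spend, order_count):
    return total_spend / order_count if order_count else 0.0


def run_engine(sources: Dict[str, object]) -> Dict[str, object]:
    """Compute every registered feature in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in FEATURES.items()}
    values = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # raw source data, nothing to compute
        deps, fn = FEATURES[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


result = run_engine({"raw_orders": [20.0, 35.0]})
print(result["avg_order_value"])  # 27.5
```

Because each feature declares its inputs, the engine can work out the execution order itself, which is the kind of orchestration that would otherwise be wired up by hand in a tool like Airflow.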

What are the Advantages and Disadvantages of a Feature Store?

Pros

Cons

When Should an Organization Build or Adopt a Feature Store?

Feature stores are especially powerful when an organization has many machine learning use cases with models based on common entities (e.g., customers).

When the same type of example is used for many applications, it makes sense to reuse features across models. In these cases, data scientists can develop features for a single model and then add them to the feature store for use by other models or analyses.

What Are Some Popular Feature Store Tools?

  • Feast is an open-source feature store used to manage features. Feast does not compute features or stream new data; it just tracks features and retrieves them for training or inference. It does not store data itself, but instead manages data stored in other data sources like Google BigQuery, Google Cloud Storage (GCS), and Amazon S3. Feast can run natively on Google Cloud Platform (GCP) or on Kubernetes in AWS, and future releases aim to bring enhanced support for AWS. Since it’s open-source, it is free to use, but it may come with a steeper learning curve because you have to work with the associated storage and transformation technologies yourself.
  • Tecton is an enterprise-grade feature store built on top of Feast. In addition to exposing features through its UI and SDK, it provides a platform for orchestrating feature transformation pipelines. As a fully managed feature platform, Tecton comes with a higher price tag than open-source Feast, but it provides a best-of-breed developer experience for data scientists and ML engineers, and it can be a great accelerator for AI/ML initiatives.
  • Hopsworks is another enterprise-grade feature store that can manage the transformations, storage, and retrieval/serving of features. It can run on an exceptionally wide array of infrastructure options, including AWS, Azure, GCP, Kubernetes, and even on-premises hardware. It also supports a vast selection of data sources, including the Snowflake Data Cloud, Redshift, and HDFS. Like Tecton, Hopsworks includes a web UI for browsing and exploring existing features. Hopsworks has both open-source (free) and supported (paid) offerings.

Takeaways

Feature stores are a powerful tool for organizations that intend to build many models based on one or a few entities (e.g., users, customers, products).

If you find yourself continually repeating efforts to code up feature transformations or copying and pasting feature-engineering code from project to project, a feature store could greatly simplify your life.

For more information on features, be sure to check out our Ultimate MLOps Guide, which has a robust section dedicated to feature stores and so much more.
