What is a Feature Store?

This blog was originally written by Dominick Rocco and updated for 2023 by Lawrence Liu.

You’ve got machine learning questions, we’ve got machine learning answers.

In this blog post, we’ll explain what a feature store is, walk through its key advantages and disadvantages, and discuss when it makes sense for your organization to build or adopt one.

But first, here is some brief background on what a feature is.

What is a Feature in ML?

Machine learning (ML) models learn to make predictions based on past examples. For the vast majority of use cases, the data used by ML models can be visualized as a table where rows are examples and columns are attributes describing those examples.

A feature is an attribute used to describe each example.

ML models are useful because, once trained, they can generate predictions for new examples; this process is called inference. The examples used during the learning process are commonly referred to as training data. In a process known as feature engineering, data scientists apply transformations to raw data to create features suitable for ML models to consume.
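
To make feature engineering concrete, here is a minimal sketch in Python. The raw order records, the field names, and the `build_customer_features` helper are all hypothetical and invented for illustration; the only point is that raw rows get transformed into one row of feature values per example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: one record per order (not from any real system).
RAW_ORDERS = [
    {"customer_id": "c1", "amount": 20.0, "order_date": date(2023, 1, 5)},
    {"customer_id": "c1", "amount": 40.0, "order_date": date(2023, 2, 1)},
    {"customer_id": "c2", "amount": 15.0, "order_date": date(2023, 1, 20)},
]


def build_customer_features(orders, as_of):
    """Aggregate raw orders into one feature row per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer_id"]].append(order)

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [row["amount"] for row in rows]
        last_order = max(row["order_date"] for row in rows)
        features[customer_id] = {
            "order_count": len(rows),
            "avg_order_amount": sum(amounts) / len(amounts),
            "days_since_last_order": (as_of - last_order).days,
        }
    return features


# Each key is an example (a customer); each inner key is a feature.
customer_features = build_customer_features(RAW_ORDERS, as_of=date(2023, 3, 1))
```

Here each customer is an example, and `order_count`, `avg_order_amount`, and `days_since_last_order` are the features a model would consume.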

Example dataset that could be used for an ML model. Each row is an example that can be used for training. The columns represent features, and each cell would be called a feature value.

What is a Feature Store?

A feature store is a tool for storing commonly used features. When data scientists develop features for a machine learning model, those features can be added to the feature store. This makes those features available for reuse.

When new examples (e.g. users of an application, customers of a business, or items in a product catalog) arrive, their features can be pre-computed and stored in your feature store so that they are readily available for training or inference.

A feature store aims to be a solution for feature management and feature consistency.

Feature management is the ability to maintain a registry of existing features, allowing teams to store, discover, and reuse features for model training and inference.

Feature consistency is the ability to sync online and offline data stores to ensure consistent feature values.
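
The two ideas above can be sketched with a toy, in-memory class. This is an illustration only and not how any particular product is implemented; `ToyFeatureStore` and all of its method names are hypothetical. The registry stands in for feature management, and writing every value to both the offline history and the online lookup in one step stands in for feature consistency.

```python
from datetime import datetime


class ToyFeatureStore:
    """In-memory stand-in for a feature store (illustration only)."""

    def __init__(self):
        self.registry = {}  # feature name -> description (feature management)
        self.offline = []   # full history of values, as used for training
        self.online = {}    # latest value per (entity, feature), as used for inference

    def register(self, name, description):
        """Add a feature to the registry so other teams can find it."""
        self.registry[name] = description

    def search(self, keyword):
        """Return registered feature names whose name or description contains keyword."""
        return sorted(
            name
            for name, description in self.registry.items()
            if keyword in name or keyword in description
        )

    def write(self, entity_id, name, value, timestamp):
        """Write one value to both stores so they cannot drift apart."""
        if name not in self.registry:
            raise KeyError(f"unregistered feature: {name}")
        self.offline.append((entity_id, name, value, timestamp))
        current = self.online.get((entity_id, name))
        # Only overwrite the online value if this write is at least as recent.
        if current is None or timestamp >= current[1]:
            self.online[(entity_id, name)] = (value, timestamp)

    def get_online(self, entity_id, names):
        """Fetch the latest values for one entity, as an inference service would."""
        return {name: self.online[(entity_id, name)][0] for name in names}


store = ToyFeatureStore()
store.register("order_count", "Number of orders placed by a customer")
store.write("c1", "order_count", 1, datetime(2023, 1, 5))
store.write("c1", "order_count", 2, datetime(2023, 2, 1))
latest = store.get_online("c1", ["order_count"])
```

After the two writes, the offline history holds both values while the online lookup returns only the most recent one.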

Feature Stores vs. Feature Platforms

Many open-source feature stores sit on top of your existing data warehouse to provide feature management and feature consistency capabilities.

A downside of feature stores is that they do not compute features themselves. Adopting a feature store like Feast requires a separate feature engineering pipeline to keep your features up to date.

Then there are feature platforms, which are managed solutions built around a feature store.

Feature platforms offer all the capabilities of feature stores and more. A major value add of feature platforms is the feature engine, which provides the ability to define feature computation logic and orchestrate the transformations to run without the need for third-party orchestration tools like Airflow.
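
As a rough sketch of what "defining feature computation logic" can look like, the snippet below registers transformations with a decorator and runs them all in one call. Everything here (`feature`, `FEATURE_DEFINITIONS`, `materialize`) is hypothetical and does not reflect the API of any specific platform; a real feature engine would also handle scheduling, backfills, and storage.

```python
# Hypothetical registry mapping feature names to their computation logic.
FEATURE_DEFINITIONS = {}


def feature(name):
    """Register a function as the computation logic for a named feature."""
    def decorator(func):
        FEATURE_DEFINITIONS[name] = func
        return func
    return decorator


@feature("total_spend")
def total_spend(orders):
    return sum(order["amount"] for order in orders)


@feature("max_order_amount")
def max_order_amount(orders):
    return max(order["amount"] for order in orders)


def materialize(orders_by_customer):
    """Run every registered definition for every customer.

    A real platform would schedule this step; here it is a plain function call.
    """
    return {
        customer_id: {name: fn(orders) for name, fn in FEATURE_DEFINITIONS.items()}
        for customer_id, orders in orders_by_customer.items()
    }


orders_by_customer = {
    "c1": [{"amount": 20.0}, {"amount": 40.0}],
    "c2": [{"amount": 15.0}],
}
materialized = materialize(orders_by_customer)
```

The design choice worth noting is that feature logic is declared once, next to its name, and the engine decides when to run it, rather than each project wiring up its own pipeline.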

What are the Advantages and Disadvantages of a Feature Store?

Pros

Feature reusability. Feature stores offer the ability to reuse and discover features, enhancing collaboration between teams.
Quicker time to value. Features are pre-computed for training and inference and exposed through a uniform interface.
Data platform abstraction. Data scientists and ML engineers do not need to worry about feature value consistency between online and offline data stores.
Idempotency. Feature values can be retrieved as of a point in time, so previous training or inference jobs can be repeated with the same inputs.
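
The point-in-time retrieval mentioned in the last item can be sketched as a lookup over a timestamped history. The `HISTORY` data and the `value_as_of` helper are hypothetical; real feature stores typically do this as a join across many entities and features, but the idea is the same.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one customer, sorted by timestamp.
HISTORY = [
    (datetime(2023, 1, 5), 1),
    (datetime(2023, 2, 1), 2),
    (datetime(2023, 3, 10), 3),
]


def value_as_of(history, as_of):
    """Return the latest value recorded at or before as_of, or None if there is none."""
    timestamps = [timestamp for timestamp, _ in history]
    # Index of the first entry strictly after as_of.
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


training_value = value_as_of(HISTORY, datetime(2023, 2, 15))
```

Asking for the value as of a fixed timestamp gives the same answer even after newer values are appended, which is what makes a past training job repeatable.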

Cons

Complex integration. Feature stores can require the integration of diverse technologies, such as data warehouses, streaming pipelines, and processing engines.
No built-in feature computation. You need to bring your own feature engineering solution.
Use case limitations. Feature stores may be limited in the type of features they can store.

When Should an Organization Build or Adopt a Feature Store?

Feature stores are especially powerful when an organization has many machine learning use cases with models based on common entities (e.g. customers).

When the same type of example is used for many applications, it makes a lot of sense to reuse features across many models. In these cases, data scientists develop features for a single model, and then add them to the feature store for use by other models or analyses.

What Are Some Popular Feature Store Tools?

Feast is an open-source feature store used to manage features. Feast does not compute features or stream new data; it just tracks features and retrieves them for training or inference. It does not store data, but simply manages data stored in other data sources like Google BigQuery, Google Cloud Storage (GCS), and Amazon S3. Feast can run natively on Google Cloud Platform (GCP), or on Kubernetes in AWS. Future releases aim to bring enhanced support for AWS. Since it’s open-source, it is free to use, but it may come with a steeper learning curve because you also have to work with the associated storage and transformation technologies.

Tecton is an enterprise-grade feature platform whose team has also been a core contributor to Feast. In addition to exposing features through its UI and SDK, it provides a platform for orchestrating feature transformation pipelines. As a fully managed feature platform, Tecton comes with a higher price tag than open-source Feast, but it provides a best-of-breed developer experience for data scientists and ML engineers, and it can be a great accelerator for AI/ML initiatives.

Hopsworks is another enterprise-grade feature store that can manage the transformation, storage, and retrieval/serving of features. It can run on a wide array of infrastructure options, including AWS, Azure, GCP, Kubernetes, or even on-premises hardware. It also supports a broad selection of data sources, including the Snowflake Data Cloud, Redshift, and HDFS. Like Tecton, Hopsworks includes a web UI for browsing and exploring existing features. Hopsworks has both open-source (free) and supported (paid) offerings.

Takeaways

Feature stores are a powerful tool for organizations that intend to build many models based on one or a few entities (e.g. users, customers, products).

If you find yourself continually repeating efforts to code up feature transformations or copying and pasting feature-engineering code from project to project, a feature store could greatly simplify your life.

For more information on features, be sure to check out our Ultimate MLOps Guide, which has a robust section dedicated to feature stores and so much more.
