What is a Feature Store?

You’ve got machine learning questions, we’ve got machine learning answers.

In this blog post, we’ll explain what a feature store is, walk through a few of its key advantages (and disadvantages), and touch on when it makes sense for your organization to build or adopt one.

But first, let’s cover some brief background on what a feature is.

What is a Feature in ML?

Machine learning (ML) models learn to make predictions based on past examples. For the vast majority of use cases, the data used by ML models can be visualized as a table where rows are examples and columns are attributes describing those examples. A feature is an attribute used to describe each example. The examples used during the learning process are commonly referred to as training data. ML models are useful because, once trained, they can generate predictions for new examples; this process is called inference. In a process known as feature engineering, data scientists apply transformations to raw data to create features suitable for ML models to consume.
Figure: an example dataset that could be used for an ML model. Each row is an example that can be used for training. The columns represent features, and each cell is called a feature value.
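To make the table idea concrete, here is a minimal sketch of feature engineering in plain Python. The customer records, column names, and the `engineer_features` helper are all hypothetical and invented for illustration; real projects would typically read raw data from a warehouse and use a dataframe library.

```python
from datetime import date

# Hypothetical raw records, one per customer (invented for illustration).
raw_customers = [
    {"customer_id": 1, "signup_date": date(2021, 1, 15), "order_totals": [20.0, 35.5, 12.0]},
    {"customer_id": 2, "signup_date": date(2022, 6, 1), "order_totals": [99.0]},
    {"customer_id": 3, "signup_date": date(2020, 11, 30), "order_totals": []},
]


def engineer_features(record, as_of):
    """Turn one raw record into a row of feature values."""
    totals = record["order_totals"]
    return {
        "customer_id": record["customer_id"],
        "tenure_days": (as_of - record["signup_date"]).days,
        "order_count": len(totals),
        # Guard against customers with no orders.
        "avg_order_value": sum(totals) / len(totals) if totals else 0.0,
    }


as_of = date(2023, 1, 1)

# Each row is an example; each key other than customer_id is a feature.
feature_table = [engineer_features(r, as_of) for r in raw_customers]
for row in feature_table:
    print(row)
```

Each dictionary in `feature_table` plays the role of one row in the table described above, and each value is a feature value.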

What is a Feature Store?

A feature store is a tool for storing commonly used features. When data scientists develop features for a machine learning model, those features can be added to the feature store. This makes those features available for reuse. 

When new examples (e.g. users of an application, customers of a business, or items in a product catalog) are added, the previously developed features will be pre-computed so that the features are available for inference.
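As a rough illustration of that idea, the sketch below shows a toy, in-memory feature store. The `MiniFeatureStore` class and its method names are hypothetical and do not correspond to any real product's API; the point is only that feature logic is registered once and then applied automatically when a new example arrives.

```python
class MiniFeatureStore:
    """Toy in-memory feature store: registered transformations plus pre-computed values."""

    def __init__(self):
        self._transforms = {}  # feature name -> function(raw_record) -> value
        self._values = {}      # entity id -> {feature name: value}

    def register_feature(self, name, fn):
        """Add a feature definition so it can be reused later."""
        self._transforms[name] = fn

    def ingest(self, entity_id, raw_record):
        """Pre-compute every registered feature for a new or updated entity."""
        self._values[entity_id] = {
            name: fn(raw_record) for name, fn in self._transforms.items()
        }

    def get_features(self, entity_id, names):
        """Look up already-computed feature values for one entity."""
        row = self._values[entity_id]
        return {name: row[name] for name in names}


store = MiniFeatureStore()
store.register_feature("order_count", lambda r: len(r["order_totals"]))
store.register_feature("total_spend", lambda r: sum(r["order_totals"]))

# A new customer arrives; features are computed at ingest time,
# before any model asks for them.
store.ingest("cust_42", {"order_totals": [10.0, 15.0]})
result = store.get_features("cust_42", ["order_count", "total_spend"])
print(result)
```

A production feature store would persist these values and handle updates over time, but the register, pre-compute, and retrieve steps follow the same shape.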

Diagram: an overview of a full-fledged feature store.

Feature stores pull data from enterprise data warehouses or streaming applications. The data is transformed to produce features for ML models and applications and is then stored in the feature store. Those features can then be retrieved and served to model training jobs or scoring applications.
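The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the warehouse rows, feature names, and helper functions (`transform`, `get_training_rows`, `get_scoring_vector`) are all hypothetical, and a plain dictionary stands in for the feature store's storage layer.

```python
# Hypothetical extract from a data warehouse (values invented for illustration).
warehouse_rows = [
    {"user_id": "u1", "logins_30d": 15, "open_tickets": 0},
    {"user_id": "u2", "logins_30d": 30, "open_tickets": 2},
]

FEATURE_NAMES = ["logins_per_day", "has_open_ticket"]


def transform(row):
    """Turn one raw warehouse row into feature values."""
    return {
        "logins_per_day": row["logins_30d"] / 30,
        "has_open_ticket": int(row["open_tickets"] > 0),
    }


# Pull, transform, and store the features keyed by entity.
feature_store = {row["user_id"]: transform(row) for row in warehouse_rows}


def get_training_rows(labels):
    """Batch retrieval: pair stored features with known labels for training."""
    return [
        ([feature_store[uid][name] for name in FEATURE_NAMES], label)
        for uid, label in labels.items()
    ]


def get_scoring_vector(user_id):
    """Single-entity lookup for a scoring application."""
    return [feature_store[user_id][name] for name in FEATURE_NAMES]


training_rows = get_training_rows({"u1": 0, "u2": 1})
print(training_rows)
print(get_scoring_vector("u2"))
```

Both retrieval paths read from the same stored values, so training and scoring see features computed by the same transformation logic.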

What are the Advantages and Disadvantages of a Feature Store?

Pros

Cons

When Should an Organization Build or Adopt a Feature Store?

Feature stores are especially powerful when an organization intends to build many models based on a common entity, such as customers, users, members, products, or items.  

When the same type of example is used for many applications, it makes a lot of sense to reuse features across many models. In these cases, data scientists develop features for a single model, and then add them to the feature store for use by other models or analyses.
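Here is a small sketch of what that reuse can look like. The customer feature values, the two model names, and the `build_inputs` helper are hypothetical; the sketch assumes the features have already been computed and stored per customer.

```python
# Hypothetical pre-computed features, keyed by a shared entity (customer).
customer_features = {
    "c1": {"tenure_days": 400, "order_count": 7, "avg_order_value": 31.0, "support_tickets": 1},
    "c2": {"tenure_days": 35, "order_count": 1, "avg_order_value": 80.0, "support_tickets": 0},
}

# Each model declares which stored features it needs.
MODEL_FEATURES = {
    "churn_model": ["tenure_days", "order_count", "support_tickets"],
    "upsell_model": ["tenure_days", "order_count", "avg_order_value"],
}


def build_inputs(model_name, customer_ids):
    """Assemble model inputs from the shared features; nothing is recomputed."""
    names = MODEL_FEATURES[model_name]
    return {cid: [customer_features[cid][n] for n in names] for cid in customer_ids}


churn_inputs = build_inputs("churn_model", ["c1", "c2"])
upsell_inputs = build_inputs("upsell_model", ["c1", "c2"])

# Features developed once and used by both models.
shared = sorted(set(MODEL_FEATURES["churn_model"]) & set(MODEL_FEATURES["upsell_model"]))
print(shared)
```

In this toy setup, two of the three features for each model are shared, so the second model only needed one new feature to be developed.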

What Are Some Popular Feature Store Tools?

  • Feast is an open-source feature store used to manage features. Feast does not compute features or stream new data; it just tracks features and retrieves them for training or inference. It does not store data itself, but manages data stored in other data sources like Google BigQuery, Google Cloud Storage (GCS), and Amazon S3. Feast can run natively on Google Cloud Platform (GCP), or on Kubernetes in AWS. Future releases aim to bring enhanced support for AWS. Because it’s open source, it is free to use, but it may come with a steeper learning curve, since you also need to work with the associated storage and transformation technologies.
  • Tecton is an enterprise-grade feature store built on top of Feast. It adds features to make Feast more manageable for organizations, such as storage of the underlying features and execution of pipelines for transformation. It also includes a web UI to browse and explore features. As a managed solution on top of Feast, Tecton comes with a higher price tag but will be easier for organizations to adopt and leverage.
  • Hopsworks is another enterprise-grade feature store that can manage the transformation, storage, and retrieval/serving of features. It can run on a wide range of infrastructure, including AWS, Azure, GCP, Kubernetes, or even on-premises hardware. It also supports many data sources, including Snowflake, Redshift, and HDFS. Like Tecton, Hopsworks includes a web UI for browsing and exploring existing features. Hopsworks has both open-source (free) and supported (paid) offerings.

Takeaways

Feature stores are powerful tools for organizations that intend to build many models based on one or a few entities (e.g. users, customers, products). The key benefit of a feature store is that it encapsulates the logic of feature transformations, so it can automatically transform new data and serve up examples for training or inference. If you find yourself repeatedly coding up the same feature transformations, or copying and pasting feature-engineering code from project to project, a feature store could greatly simplify your life.

For more information on feature stores, be sure to check out our Ultimate MLOps Guide, which has a robust section dedicated to feature stores and much more.
