You’ve got machine learning questions, we’ve got machine learning answers.
In this blog post, we’ll explain what a feature store is, walk through a few of its key advantages (and disadvantages), and touch on when it makes sense for your organization to build or adopt one.
But first, let’s cover some brief background on what a feature is.
What is a Feature in ML?
Example dataset that could be used for an ML model. Each row is an example that can be used for training. The columns represent features, and each cell would be called a feature value.
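To make the row/column/cell vocabulary concrete, here is a small sketch in Python. The customer dataset, its column names, and its values are made up for illustration.

```python
# Hypothetical customer dataset: each dict is one example (row),
# each key is a feature (column), and each value is a feature value (cell).
examples = [
    {"customer_id": 1, "age": 34, "orders_last_30d": 3, "avg_order_value": 42.50},
    {"customer_id": 2, "age": 27, "orders_last_30d": 0, "avg_order_value": 0.00},
    {"customer_id": 3, "age": 51, "orders_last_30d": 7, "avg_order_value": 18.25},
]

# customer_id identifies the entity; the remaining columns are features.
feature_names = [name for name in examples[0] if name != "customer_id"]

# One feature value: the "age" feature for the first example.
first_age = examples[0]["age"]

print(feature_names)  # ['age', 'orders_last_30d', 'avg_order_value']
print(first_age)      # 34
```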
What is a Feature Store?
A feature store is a tool for storing commonly used features. When data scientists develop features for a machine learning model, those features can be added to the feature store. This makes those features available for reuse.
When new examples (e.g. users of an application, customers of a business, or items in a product catalog) are added, the feature store pre-computes the previously developed features for them so that the values are ready for inference.
A full-fledged feature store:
- Transforms raw data into feature values by executing data pipelines.
- Stores and manages feature values.
- Retrieves data for training or inference.
Feature stores pull data from enterprise data warehouses or streaming applications. The data is transformed to produce features for ML models and applications, and the resulting feature values are stored in the feature store. Those features can then be retrieved and served to model training jobs or scoring applications.
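As a rough illustration of the three responsibilities described above (transform, store, retrieve), here is a minimal in-memory sketch in Python. The class `InMemoryFeatureStore`, its method names, and the example features are hypothetical and are not the API of any real product. A production feature store would run pipelines against a warehouse or stream and persist values in a database rather than a dictionary.

```python
from typing import Callable, Dict, List

# A raw record is whatever the upstream source emits, e.g. a warehouse row.
RawRecord = Dict[str, object]
Transform = Callable[[RawRecord], float]


class InMemoryFeatureStore:
    """Toy feature store: registers transformations, stores values, serves them."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Transform] = {}
        self._values: Dict[object, Dict[str, float]] = {}

    def register_feature(self, name: str, transform: Transform) -> None:
        """Add a named transformation so it can be reused for every entity."""
        self._transforms[name] = transform

    def ingest(self, entity_id: object, raw: RawRecord) -> None:
        """Transform one raw record and store the resulting feature values."""
        self._values[entity_id] = {
            name: fn(raw) for name, fn in self._transforms.items()
        }

    def get_features(self, entity_id: object, names: List[str]) -> Dict[str, float]:
        """Retrieve stored feature values for training or inference."""
        row = self._values[entity_id]
        return {name: row[name] for name in names}


store = InMemoryFeatureStore()
store.register_feature("order_count", lambda raw: float(len(raw["orders"])))
store.register_feature("total_spend", lambda raw: float(sum(raw["orders"])))

# A new customer arrives: their features are computed once, at ingestion time.
store.ingest("customer_1", {"orders": [20.0, 35.5, 10.0]})

features = store.get_features("customer_1", ["order_count", "total_spend"])
print(features)  # {'order_count': 3.0, 'total_spend': 65.5}
```

Computing values in `ingest` rather than in `get_features` mirrors the pre-computation idea: retrieval at inference time is a lookup, not a recalculation.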
What are the Advantages and Disadvantages of a Feature Store?
Pros
- Feature reusability. Features developed for one model can be shared across teams, which improves collaboration.
- Quicker time to value. Features are already computed for training or inference.
- Centralization of complex logic. Data scientists and ML engineers don’t need to recalculate complex feature values themselves.
- Monitoring potential. Feature stores can support health monitoring and drift detection, surfacing issues with features before they propagate to ML model predictions.
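To give a feel for the drift detection mentioned in the last point, here is a deliberately crude sketch in Python. It flags a feature when the mean of recent values moves more than a chosen number of baseline standard deviations. The function names, the sample values, and the threshold of 3.0 are assumptions for illustration. Real monitoring tools typically use more robust statistical tests, and this is not how any particular feature store implements it.

```python
from statistics import mean, stdev


def mean_shift(baseline, recent):
    """Distance between the recent and baseline means, in baseline std devs.

    Assumes the baseline has at least two values and is not constant
    (otherwise the standard deviation is undefined or zero).
    """
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)


def has_drifted(baseline, recent, threshold=3.0):
    """Flag drift when the mean shift exceeds the (assumed) threshold."""
    return mean_shift(baseline, recent) > threshold


# Hypothetical values of one feature, e.g. average order value.
baseline = [40.0, 42.0, 38.0, 41.0, 39.0]
stable = [41.0, 39.0, 40.0]
shifted = [60.0, 62.0, 58.0]

print(has_drifted(baseline, stable))   # False
print(has_drifted(baseline, shifted))  # True
```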
Cons
- Potential inflexibility. Organizations may need a separate feature store for each type of entity.
- Complex integration. Feature stores can require the integration of diverse technologies, such as data warehouses, streaming pipelines, and processing engines.
- Limited model customization. Different applications may benefit from different feature encodings, which can be overlooked when every model draws from the same feature store.
When Should an Organization Build or Adopt a Feature Store?
Feature stores are especially powerful when an organization intends to build many models based on a common entity, such as customers, users, members, products, or items.
When the same type of example is used for many applications, it makes a lot of sense to reuse features across many models. In these cases, data scientists develop features for a single model, and then add them to the feature store for use by other models or analyses.
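The reuse pattern described above can be sketched as follows: two models built on the same entity (customers) each declare which shared features they consume, and both read from the same pre-computed values. The model names, feature names, and values are hypothetical.

```python
# Pre-computed feature values keyed by customer ID (hypothetical values).
customer_features = {
    "c1": {"tenure_months": 24, "orders_last_30d": 3, "avg_order_value": 42.5},
    "c2": {"tenure_months": 2, "orders_last_30d": 0, "avg_order_value": 0.0},
}

# Each model declares which of the shared features it uses.
CHURN_MODEL_FEATURES = ["tenure_months", "orders_last_30d"]
UPSELL_MODEL_FEATURES = ["orders_last_30d", "avg_order_value"]


def feature_vector(customer_id, feature_names):
    """Build a model input from the shared features, in the requested order."""
    row = customer_features[customer_id]
    return [row[name] for name in feature_names]


churn_input = feature_vector("c1", CHURN_MODEL_FEATURES)
upsell_input = feature_vector("c1", UPSELL_MODEL_FEATURES)

print(churn_input)   # [24, 3]
print(upsell_input)  # [3, 42.5]
```

Note that `orders_last_30d` is defined once and consumed by both models, which is the duplication a feature store is meant to remove.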
What Are Some Popular Feature Store Tools?
- Feast is an open-source feature store used to manage features. Feast does not compute features or stream new data; it just tracks features and retrieves them for training or inference. It does not store data itself, but manages data stored in other data sources like Google BigQuery, Google Cloud Storage (GCS), and Amazon S3. Feast can run natively on Google Cloud Platform (GCP), or on Kubernetes in AWS. Future releases aim to bring enhanced support for AWS. Since it’s open-source, it is free to use, but it may come with a steeper learning curve because you have to work with the associated storage and transformation technologies yourself.
- Tecton is an enterprise-grade feature store built on top of Feast. It adds capabilities that make Feast more manageable for organizations, such as storage of the underlying features and execution of transformation pipelines. It also includes a web UI to browse and explore features. As a managed solution on top of Feast, Tecton comes with a higher price tag but will be easier for organizations to adopt and use.
- Hopsworks is another enterprise-grade feature store that can manage the transformation, storage, and retrieval/serving of features. It can run on a wide range of infrastructure options, including AWS, Azure, GCP, Kubernetes, or even on-premises hardware. It also supports many data sources, including Snowflake, Redshift, and HDFS. Like Tecton, Hopsworks includes a web UI for browsing and exploring existing features. Hopsworks has both open-source (free) and supported (paid) offerings.
Takeaways
Feature stores are powerful tools for organizations that intend to build many models based on one or a few entities (e.g. users, customers, products). The key benefit of a feature store is that it encapsulates the logic of feature transformations, so new data is transformed automatically and examples are served up for training or inference. If you find yourself repeatedly coding up the same feature transformations, or copying and pasting feature-engineering code from project to project, a feature store could greatly simplify your life.
For more information on features, be sure to check out our Ultimate MLOps Guide, which has a section dedicated to feature stores and much more.