This article aims to answer many frequently asked questions about model registries. For a broader perspective on how model registries fit into an MLOps framework, check out the Tracking section of our 4 Pillars of MLOps: How to Deploy ML Models Into Production.
What is a Model Registry?
A model registry is a repository used to store and version trained machine learning (ML) models. Model registries greatly simplify the task of tracking models as they move through the ML lifecycle, from training to production deployments and ultimately retirement.
In addition to the models themselves, a model registry stores information (metadata) about the data and training jobs used to create the model. Tracking these requisite inputs is essential to establish lineage for ML models. In this way, a model registry serves a function analogous to version control systems (e.g. Git, SVN) and artifact repositories (e.g. Artifactory, PyPI) for traditional software.
Another way to think about model lineage is to consider all of the details that would be necessary to recreate a trained model from scratch. Establishing lineage through a model registry is a vital component of a robust MLOps architecture.
How Does a Model Registry Work?
Each model stored in a model registry is assigned a unique identifier, also known as a model ID or UUID. Many off-the-shelf registry tools also include a mechanism for tracking multiple versions of the same model. Data science and ML teams can use the model ID and version to refer to specific models unambiguously, compare them, and deploy with confidence.
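As a rough illustration (not the API of any particular registry tool), the ID-and-version scheme might look like this minimal sketch, where each registered model name gets an auto-incrementing version and each version gets its own UUID:

```python
import uuid

# Minimal sketch of ID/version assignment; the class and method names
# here are illustrative, not taken from any real registry tool.
class ModelIndex:
    def __init__(self):
        self._versions = {}  # model name -> list of model IDs (index = version - 1)

    def register(self, name):
        model_id = str(uuid.uuid4())  # unique identifier for this model version
        self._versions.setdefault(name, []).append(model_id)
        return model_id, len(self._versions[name])  # (model ID, version number)

    def lookup(self, name, version):
        return self._versions[name][version - 1]

index = ModelIndex()
first_id, first_version = index.register("churn-classifier")   # version 1
second_id, second_version = index.register("churn-classifier") # version 2
```

With this scheme, "churn-classifier version 2" and the raw UUID both resolve to the same artifact, so teams can use whichever reference is more convenient.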
Registry tools also allow for storage of parameters or metrics. For instance, training and evaluation jobs could write hyperparameter values and performance metrics (e.g. accuracy) when registering a model. Storing these values allows for simple comparison of models. As they develop new models, having this data on hand can help teams see whether new versions of a model are improving upon previous versions. Many registry tools also include a graphical interface to visualize these parameters and metrics.
Parameters and metrics tracked by MLflow autologging for LightGBM.
Under the hood, model registries generally comprise the following elements:
- Object storage (such as Amazon S3 or Azure Blob Storage) to hold model artifacts and large binary files
- A structured or semi-structured database to store model metadata
- A graphical user interface (GUI) that can be used to inspect and compare trained models
- A programmatic API that can be used to retrieve model artifacts and metadata by specifying a model ID or query
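The components above can be sketched in a few lines of Python. This is a toy model, not a production design: a dict stands in for object storage (e.g. Amazon S3), and an in-memory SQLite table stands in for the metadata database.

```python
import json
import sqlite3
import uuid

object_store = {}  # stand-in for object storage: key (like an S3 path) -> bytes
db = sqlite3.connect(":memory:")  # stand-in for the metadata database
db.execute("CREATE TABLE models (id TEXT PRIMARY KEY, metadata TEXT)")

def register_model(artifact: bytes, metadata: dict) -> str:
    """Store the artifact in object storage and its metadata in the database."""
    model_id = str(uuid.uuid4())
    object_store[f"models/{model_id}/model.bin"] = artifact
    db.execute("INSERT INTO models VALUES (?, ?)", (model_id, json.dumps(metadata)))
    return model_id

def fetch_model(model_id: str):
    """Programmatic API: retrieve artifact and metadata by model ID."""
    row = db.execute("SELECT metadata FROM models WHERE id = ?", (model_id,)).fetchone()
    return object_store[f"models/{model_id}/model.bin"], json.loads(row[0])

mid = register_model(b"serialized-model-bytes", {"framework": "sklearn"})
artifact, meta = fetch_model(mid)
```

A real registry adds access control, versioning, and a GUI on top, but the storage split shown here — large binaries in object storage, queryable metadata in a database — is the common core.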
What Can Go Wrong Without a Model Registry?
Without a model registry, data scientists and machine learning engineers are more likely to cut corners or make costly mistakes.
Here are some common pitfalls that we’ve seen:
Mislabeled model artifacts – Keeping track of which artifacts (files) came from which training job can be difficult. If these details are shared by email or instant message, it’s not hard for things to get crossed in the mail.
Lost or deleted data – Without a record of which datasets were used to train which models, teams have no way to know which datasets can safely be deleted and which cannot.
Missing source code or unknown versions – Even good models will at times produce surprising or erroneous results. Without care, it is easy to lose track of the source code, or of the exact version of it, used to train the model. This can lead to duplicated effort: the previous model is invalidated, and a new one must be trained just to understand the issue.
Undocumented model performance – As teams iterate, they will quickly end up with many versions of a model for a particular task. If model performance results are stored across different locations or notebooks, it can be hard to compare different versions.
What Information Should a Model Registry Store?
Key forms of information stored in a model registry fall into the following categories: software, data, metrics, and models.
A robust model registry should be able to store all details necessary to establish model lineage.
Model registry tools can also store input parameters to training jobs and performance metrics to enable comparisons between different models or versions of models. These elements can usually be captured completely by storing the following forms of information:
Software – The model registry should contain references to all software used to train the model. If custom code is used to transform data or train the model, the code should live in a separate version control system (e.g. Git) and the model registry should include the latest version ID from that system. Most projects also use external libraries or other software dependencies that must be tracked; this is a common oversight that can inhibit model reproducibility. Docker containers and Conda environments are common tools used to document and recreate software environments. When those tools are used, it makes sense to include a copy of the project Dockerfile or Conda environment file in the version control system or model registry.
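A software-lineage record might look like the following sketch. The commit hash is a placeholder (in practice it would come from `git rev-parse HEAD`), and the dependency pins are example values, not real project requirements:

```python
import hashlib
import platform

# Hedged sketch of a software-lineage record. The commit hash and
# dependency pins below are placeholder/example values.
software_record = {
    "git_commit": "<commit-sha-from-version-control>",  # placeholder
    "python_version": platform.python_version(),
    "dependencies": ["scikit-learn==1.4.2", "pandas==2.2.1"],  # example pins
}

# Hashing the sorted dependency list gives a compact environment
# fingerprint to store alongside the model, so drift is easy to detect.
env_fingerprint = hashlib.sha256(
    repr(sorted(software_record["dependencies"])).encode()
).hexdigest()
```

Storing the fingerprint next to the model ID means two models can be compared for environment equality without diffing full dependency lists.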
Data – Since ML models learn their behavior based on data, reproducing a model requires access to the original training data. A model registry should contain a reference to a static copy, view, or snapshot of the original training data. Copies of data can be placed in object storage such as Amazon S3 and referenced in the model registry. If datasets are too large to be copied, organizations should consider advanced storage solutions such as S3 versioning or Apache Atlas to create snapshots and maintain lineage.
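One simple way to make a data reference verifiable is to store a content hash alongside the storage URI, as in this sketch (the bucket path is illustrative):

```python
import hashlib

def dataset_reference(data: bytes, uri: str) -> dict:
    """Record where a training-data snapshot lives and a hash of its contents."""
    return {"uri": uri, "sha256": hashlib.sha256(data).hexdigest()}

snapshot = b"feature1,feature2,label\n0.1,0.2,1\n"
ref = dataset_reference(snapshot, "s3://example-bucket/training/v1.csv")
# Later, an auditor can re-hash the stored copy and compare it to ref["sha256"]
# to confirm the snapshot has not changed since training.
```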
Metrics – Most model registry tools have a system for storing named parameters as key/value pairs. Storing the values of input parameters and model performance metrics can help quickly compare models when new versions are created. Training jobs should write all configurable input parameters to the model registry. After training, evaluation and performance metrics should be written to the registry to quickly see if new models are performing better or worse than previous versions. Some registry tools also automatically log advanced attributes like feature importance or loss curves.
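Stored as key/value pairs, parameters and metrics make version comparisons trivial. A sketch, with made-up model names and metric values:

```python
# Example records as a registry might store them: one entry per
# (model name, version), each holding params and metrics. Values are made up.
runs = {
    ("churn-classifier", 1): {"params": {"max_depth": 6}, "metrics": {"accuracy": 0.91}},
    ("churn-classifier", 2): {"params": {"max_depth": 8}, "metrics": {"accuracy": 0.94}},
}

def improved(name, new_version, old_version, metric="accuracy"):
    """Did the new version beat the old one on the given metric?"""
    new = runs[(name, new_version)]["metrics"][metric]
    old = runs[(name, old_version)]["metrics"][metric]
    return new > old
```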
Models – While the previous elements establish lineage, storing model artifacts themselves allows organizations to quickly deploy models. ML frameworks generally have some mechanism for preserving a model artifact (for example: exporting a Scikit-learn model to Python Pickle format or a TensorFlow model to its custom SavedModel format). These artifact files should be stored in the model registry so that models are ready to be deployed to production whenever the business requires them.
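The serialization step is framework-specific, but the pattern is the same: serialize the trained model to bytes, then upload those bytes to the registry's object storage. In this sketch a plain dict stands in for a fitted estimator:

```python
import io
import pickle

# Sketch: a plain dict stands in for a fitted model (e.g. a trained
# scikit-learn estimator, which is also commonly pickled).
model = {"weights": [0.4, -1.2], "intercept": 0.7}

buffer = io.BytesIO()
pickle.dump(model, buffer)
artifact_bytes = buffer.getvalue()  # what would be uploaded to object storage

# At deployment time, the registry API returns these bytes and the
# application deserializes them back into a usable model.
restored = pickle.loads(artifact_bytes)
```

Note that unpickling executes code, so in practice artifacts should only be loaded from a registry the organization trusts.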
How Does a Model Registry Help Data Scientists?
Model registry tools help data scientists by enabling reproducible research during model development. You can think of a good model registry as a specialized lab notebook for machine learning models. As such, they simplify the bookkeeping process for data scientists. By logging metrics, data, and software to a model registry, data scientists can quickly see how the changes they make impact model performance. From their observations, they can quickly move on to new experiments because their previous ones have already been documented in the registry.
Reproducible models are also easier to operationalize, which reduces friction for data scientists. A model in a registry is easier to hand off to engineering teams for deployment. When model artifacts are stored in a registry with lineage, the engineering team doesn’t need to invest effort into training the models using a more robust framework. If subsequent retraining is necessary, operations teams can take it on – or engineering teams can automate it, since the process is already documented. This frees up the time of data scientists to create new innovations rather than retraining old models.
Using a registry to track models may initially seem like an extra burden, but data scientists will quickly find that a small amount of extra code greatly accelerates their work.
Experiment Tracking vs. Model Registration
A good model registry should support both experiment tracking and model registration. What’s the difference?
Data scientists often train many experimental models – for instance, when optimizing hyperparameters – many of which never make it to production. Experiment tracking is the practice of capturing models and metadata for each of those experiments.
Model registration, on the other hand, is the process of promoting a model beyond the experiment stage and putting that model on the path to production. To support both experiment tracking and model registration, a tool should allow models to be tagged as follows:
Experiment – An experiment can include one or more models (and metadata) regardless of whether they were ever promoted into an application.
Staging – A staged model (that was originally tracked as part of an experiment) has been promoted for use in an application and staged for testing and quality assurance, but not yet promoted to production.
Production – A production model is actively producing inferences in a production setting.
Archived – An archived model was once used in a production application and should be preserved for historical purposes according to governance policies.
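The tag lifecycle above can be enforced with simple guardrails so that, for example, an experiment cannot jump straight to production. A sketch (the transition rules are one reasonable policy, not a standard):

```python
# Allowed stage transitions, following the lifecycle described above.
# This particular policy is illustrative; teams may allow other paths.
ALLOWED = {
    "Experiment": {"Staging"},
    "Staging": {"Production"},
    "Production": {"Archived"},
    "Archived": set(),
}

def promote(current: str, target: str) -> str:
    """Move a model to a new stage, rejecting disallowed transitions."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current} to {target}")
    return target

stage = promote("Experiment", "Staging")
stage = promote(stage, "Production")
```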
How Does a Model Registry Fit into an MLOps Framework?
Model registries provide a common source of truth for referencing models and underlying versions. When data scientists communicate with engineering teams, they can use the unique ID stored in the registry to refer to a model with zero ambiguity. Similarly, applications can take the unique ID as a parameter in their deployment pipeline and fetch the associated artifacts from the registry to make updating models painless.
Lineage established in a model registry can eliminate the need for engineers to rewrite training code because details necessary to reproduce models are readily available. And if model retraining jobs also publish models and metrics, it is easy for any team member to track performance over time. Monitoring performance can establish the return on a particular ML investment and help justify operational costs.
How Does a Model Registry Contribute to Governance, Compliance, and Auditing?
Model governance, compliance, and audits are increasingly important in the context of machine learning.
Regulations have been increasing in this space; for example, GDPR contains language about a consumer’s right to explanations. But, even without regulations, models may sometimes produce erroneous or confusing predictions that require audit. In these cases, you’ll need to trace the specific version of a model that generated such predictions, as well as the underlying training data. Proper use of a model registry ensures that this is possible.
Data governance can also create issues for ML models. Many models are trained using sensitive data, and some forms of data are required to be destroyed after some period of time. Other times, data might seem relevant and get deleted even though it was used to train a model. What happens to models that were trained using this data? A model registry can help organizations manage the dependence of their models on specific data and put appropriate guardrails around data governance.
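A registry that records data references makes the "which models depend on this dataset?" question answerable with a reverse lookup. A sketch, with made-up model names and dataset URIs:

```python
# Example registry records: model version -> datasets it was trained on.
# Names and URIs are illustrative.
model_data_refs = {
    "model-a:v3": ["s3://bucket/users-2023.parquet"],
    "model-b:v1": ["s3://bucket/clicks-2024.parquet", "s3://bucket/users-2023.parquet"],
}

def models_depending_on(dataset_uri: str):
    """Governance check: which models would be affected if this dataset is deleted?"""
    return sorted(m for m, refs in model_data_refs.items() if dataset_uri in refs)

affected = models_depending_on("s3://bucket/users-2023.parquet")
```

Before honoring a deletion request, a governance team could run this check and decide whether the dependent models must also be retired or retrained.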
What are Some Popular Model Registry Tools?
There are many model registry tools available, but the following can help you and your team get started:
MLflow is an open-source platform for the ML lifecycle that includes a robust model-registry solution. Data scientists can track experiments and runs, with built-in tracking features for Git, Conda, and Docker. There are also logging plugins for common ML frameworks such as Scikit-learn, XGBoost, LightGBM, TensorFlow, and more. With a few lines of code in Python or R, users can have their experiments tracked with performance metrics, software dependencies, and model artifacts.
SageMaker Model Registry is a good solution for users of AWS SageMaker. You can register models with metadata such as metrics and data references, with model artifacts stored automatically in S3 for deployments. SageMaker Model Registry integrates with AWS CI/CD tools for deployment automation and management of model approval status. It also integrates well with SageMaker Endpoints for smooth and reliable deployments.
Neptune is an enterprise-grade model registry solution available as managed SaaS or for installation on your own hardware. In addition to the common model-registry features, it includes more sophisticated features for visualizing and analyzing models. This tool is a good choice for teams that want to pay more for a more refined solution.
Data Science Platforms such as Dataiku and DataRobot include model registry features. Models developed on these platforms can be easily stored to the registry, and lineage is automatically linked based on ML tools integrated within the platform. If you’re already using one of these tools, you may already have exposure to these features.
Using a model registry is a key component of building a robust MLOps framework. Model registries simplify research and development for data scientists and streamline the model deployment process. They also enable auditing and governance practices that would otherwise be virtually impossible.
If your team is interested in integrating a model registry into your MLOps framework but unsure how to start, check out our Ultimate MLOps Guide.