Using AWS SageMaker to Set Up a Production ML Pipeline: Part 1

Moving a machine-learning algorithm to production can be a headache. Actually, it is a headache and I doubt many ML engineers would disagree. But it’s a headache that we enjoy.

To make that headache more manageable on AWS SageMaker, I will outline a few techniques that should make the move easier.

Amazon SageMaker comes in two flavors:

AWS SageMaker Studio
AWS SageMaker Instances

AWS AI implementation is a lot simpler once you understand that distinction, which is something I didn’t get from the documentation when I started.

Amazon SageMaker Studio spins up a full pipeline across AWS services. Because it’s a new service, it can be very buggy. You will also incur more service charges throughout the AWS ecosystem, and it may not play nicely with some of your package versions. If you go with SageMaker Studio, I recommend starting in it from the beginning.

Studio uses a version of AWS CodePipeline. You can bring your own model in a Docker image, which helps. Amazon SageMaker Instances automatically spin up a Jupyter notebook instance where you can run an ML model, but you will have to set up automatic runs yourself, either on EventBridge or via an API.
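As a rough illustration of the EventBridge option, here is a minimal sketch using boto3. The rule name, schedule, and target ARN are hypothetical placeholders, and the sketch assumes you already have something to trigger (for example, a Lambda function that starts your model run). The two builder functions only assemble the request parameters; the lazily imported boto3 call is the part that needs AWS credentials.

```python
def build_schedule_rule(rule_name, schedule_expression="rate(1 day)"):
    """Return keyword arguments for the EventBridge put_rule call."""
    return {
        "Name": rule_name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }


def build_rule_target(rule_name, target_arn, target_id="run-model"):
    """Return keyword arguments for the EventBridge put_targets call."""
    return {
        "Rule": rule_name,
        "Targets": [{"Id": target_id, "Arn": target_arn}],
    }


def schedule_run(rule_name, target_arn, schedule_expression="rate(1 day)"):
    """Create the scheduled rule and attach the target (needs AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3 installed

    events = boto3.client("events")
    events.put_rule(**build_schedule_rule(rule_name, schedule_expression))
    events.put_targets(**build_rule_target(rule_name, target_arn))


if __name__ == "__main__":
    # Hypothetical names; printing the parameters makes no AWS calls.
    print(build_schedule_rule("nightly-model-run", "cron(0 6 * * ? *)"))
```

If the target is a Lambda function, it also needs a resource policy that lets EventBridge invoke it, which this sketch leaves out.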

Setting up AWS SageMaker Studio

It’s actually really simple to set up. By default, you do not have an active SageMaker user role, so you’ll need to set one up when you start. If you work on a team, you can set up a team configuration as well. Talk to your Solutions Architect before messing around with permissions.

Search for SageMaker in the AWS search bar and click on the service to reach the SageMaker dashboard. The big orange button on the right is how you access SageMaker Studio. They couldn’t have made it easier. You will have to set up a user to access the studio: create a username and use the default execution role.
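The same user setup can also be scripted. The following is a minimal sketch with boto3, assuming a Studio domain already exists; the domain ID, username, and role ARN are placeholders you would replace with your own values. It is one possible way to do it, not the only one, and the console flow described above works just as well.

```python
def build_user_profile_request(domain_id, user_name, execution_role_arn):
    """Return keyword arguments for the SageMaker create_user_profile call."""
    return {
        "DomainId": domain_id,
        "UserProfileName": user_name,
        # The execution role is what Studio assumes on this user's behalf.
        "UserSettings": {"ExecutionRole": execution_role_arn},
    }


def create_studio_user(domain_id, user_name, execution_role_arn):
    """Create the Studio user profile (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    sagemaker = boto3.client("sagemaker")
    request = build_user_profile_request(domain_id, user_name, execution_role_arn)
    return sagemaker.create_user_profile(**request)


if __name__ == "__main__":
    # Hypothetical identifiers; printing the request makes no AWS calls.
    print(build_user_profile_request(
        "d-exampledomain",
        "data-scientist-1",
        "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    ))
```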

Once you’ve created it, you will be able to open the studio, which looks very, very cool. Great UX decision, Amazon.

Once inside, you can set up an example project from the JumpStart menu. It’s the first orange icon on the left-hand menu bar.

Setting up an AWS SageMaker Instance

Amazon, which is on a mission to help data scientists everywhere, decided that instead of having you spin up EC2 instances to run Jupyter notebooks, it would provide a simpler option.

Name your instance and then choose your processing speed (the instance type). The more compute power you require, the larger the instance you will need.

You can also encrypt the notebook or allow access to it only from within a VPC, which adds more security to your instance.
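The options above (name, instance size, encryption, VPC access) map onto parameters of the `create_notebook_instance` API. The sketch below shows one way to assemble them with boto3. The instance name, role ARN, KMS key, subnet, and security group values are placeholders, and `ml.t3.medium` is only an assumed default, so pick whatever size your workload needs.

```python
def build_notebook_request(name, role_arn, instance_type="ml.t3.medium",
                           kms_key_id=None, subnet_id=None,
                           security_group_ids=None):
    """Return keyword arguments for the SageMaker create_notebook_instance call."""
    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }
    if kms_key_id:
        # Encrypt the attached storage volume with your own KMS key.
        request["KmsKeyId"] = kms_key_id
    if subnet_id:
        if not security_group_ids:
            raise ValueError("security_group_ids is required with subnet_id")
        # Place the instance in a VPC and turn off direct internet access.
        request["SubnetId"] = subnet_id
        request["SecurityGroupIds"] = list(security_group_ids)
        request["DirectInternetAccess"] = "Disabled"
    return request


def create_notebook(**kwargs):
    """Create the notebook instance (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_notebook_instance(**build_notebook_request(**kwargs))


if __name__ == "__main__":
    # Hypothetical values; printing the request makes no AWS calls.
    print(build_notebook_request(
        "example-notebook",
        "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        subnet_id="subnet-0example",
        security_group_ids=["sg-0example"],
    ))
```

With direct internet access disabled, the notebook can only reach the internet through whatever routing your VPC provides, so package installs may need a NAT gateway or similar.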

Lastly, and one of my favorite features: you can attach an existing Git repository. That repository can be stored on AWS CodeCommit or on GitHub. For GitHub, you will need to use AWS Secrets Manager to manage the credentials. You can also add Git repositories on the main dashboard, but we’ll talk about that another time.
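Registering a repository can be sketched with the `create_code_repository` API in boto3. The repository name, URL, branch, and secret ARN below are hypothetical, and the sketch assumes you have already stored the GitHub credentials in Secrets Manager.

```python
def build_code_repository_request(name, repository_url, branch=None,
                                  secret_arn=None):
    """Return keyword arguments for the SageMaker create_code_repository call."""
    git_config = {"RepositoryUrl": repository_url}
    if branch:
        git_config["Branch"] = branch
    if secret_arn:
        # ARN of the Secrets Manager secret holding the Git credentials.
        git_config["SecretArn"] = secret_arn
    return {"CodeRepositoryName": name, "GitConfig": git_config}


def register_repository(**kwargs):
    """Register the repository with SageMaker (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_code_repository(
        **build_code_repository_request(**kwargs)
    )


if __name__ == "__main__":
    # Hypothetical values; printing the request makes no AWS calls.
    print(build_code_repository_request(
        "example-ml-repo",
        "https://github.com/example-org/example-ml-repo.git",
        branch="main",
        secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
    ))
```

A registered repository can then be named in the `DefaultCodeRepository` parameter when a notebook instance is created, so the code is cloned onto the instance at startup.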

You can also run example ML models from the SageMaker notebooks. Just click ‘SageMaker Examples’ and you can get started.

Which method you choose will depend on your goal and your level of understanding of AWS. The studio is so full-featured that it can be more confusing to start with. Instances will have you up and running quickly, though you will have to set up the automatic run pipeline yourself, which we will cover in a later lesson.

Thanks for reading! Do you have more questions about AWS SageMaker or Machine Learning? Talk to our expert consultants today and have all your questions answered!
