January 1, 2022

Using AWS SageMaker to Set Up a Production ML Pipeline: Part 1

By Christina Bernard

Moving a machine-learning algorithm to production can be a headache. Actually, it is a headache and I doubt many ML engineers would disagree. But it’s a headache that we enjoy. 

To make that process smoother on AWS SageMaker, I'll outline a few techniques that should make it easier.

Amazon SageMaker comes in two flavors:

  • AWS SageMaker Studio
  • AWS SageMaker Instances

Implementing AI on AWS is a lot simpler once you understand that distinction, something I didn't pick up from the documentation when I started.

Amazon SageMaker Studio spins up a full pipeline across AWS services. It can be buggy because it's a newer service, you'll also accrue more service charges throughout the AWS ecosystem, and it may not play nicely with some of your package versions. If you go with SageMaker Studio, I recommend starting in it from the beginning.

It uses a version of AWS CodePipeline. You can bring your own model in a Docker image, which helps. Amazon SageMaker instances, by contrast, automatically spin up a Jupyter notebook instance where you can run an ML model, but you will have to set up automatic runs yourself through EventBridge or via an API.
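To make the bring-your-own-model idea concrete, here is a minimal sketch of the request you would hand to SageMaker's CreateModel API. Every name, ARN, and URI below is a placeholder I made up for illustration, not a real resource; the dict simply mirrors the parameter shape boto3's `create_model` call expects.

```python
# Sketch: a CreateModel request for bringing your own model in a
# Docker image. All names, ARNs, and URIs are placeholders.
create_model_request = {
    "ModelName": "my-byo-model",  # hypothetical model name
    "PrimaryContainer": {
        # Your own inference image, pushed to Amazon ECR
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
        # Trained model artifacts archived as a tarball in S3
        "ModelDataUrl": "s3://my-bucket/model/model.tar.gz",
    },
    # Role SageMaker assumes to pull the image and read the artifacts
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
}

# With AWS credentials configured, you would send it with boto3:
#   import boto3
#   boto3.client("sagemaker").create_model(**create_model_request)
```

The API call itself is left commented out so the sketch stays runnable without an AWS account; swap in your own image URI, artifact path, and role before using it.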

Setting up AWS SageMaker Studio

It's actually really simple to set up. By default, you do not have an active SageMaker user role, so you'll need to create one when you start. If you work on a team, you can set up a team configuration as well; talk to your Solutions Architect before messing around with permissions. Search for SageMaker in the AWS search bar and click on the service, and you will land on the SageMaker dashboard. The big orange button on the right is how you access SageMaker Studio. They couldn't have made it easier. You will have to set up a user to access the Studio: create a username and use the default execution role.
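The same user setup can be done programmatically. This is a sketch of the request behind boto3's `create_user_profile` call; the domain ID, username, and role ARN are placeholders, and the execution role is the one the console's "default execution role" option would fill in for you.

```python
# Sketch: creating a Studio user profile programmatically instead of
# through the console. Domain ID, name, and role ARN are placeholders.
user_profile_request = {
    "DomainId": "d-xxxxxxxxxxxx",    # your Studio domain's ID
    "UserProfileName": "christina",  # the username you pick
    "UserSettings": {
        # Execution role the user's Studio apps run as
        "ExecutionRole": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    },
}

# With boto3 and AWS credentials in place:
#   import boto3
#   boto3.client("sagemaker").create_user_profile(**user_profile_request)
```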

Once you've created it, you will be able to open the Studio, which looks very, very cool. Great UX decision, Amazon.

Once inside, you can set up a cool example project from the JumpStart menu. It's the first orange icon on the left-hand menu bar.

Setting up an AWS SageMaker Instance

Amazon, on a mission to help data scientists everywhere, decided that instead of having you spin up EC2 instances to run Jupyter notebooks, it would provide a simpler option.

Name your instance and then choose your instance type. The more compute power your workload requires, the larger the instance you will need.

You can encrypt the notebook or restrict access to it from within a VPC only, adding more security to your instance.

Lastly, and one of my favorite features: you can attach an existing Git repository. That repository can be stored in AWS CodeCommit or on GitHub. For GitHub, you will need to use AWS Secrets Manager to store the credentials. You can also add Git repositories from the main dashboard, but we'll cover that another time.
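The steps above (name, instance type, encryption, VPC placement, and a Git repo) all map onto one CreateNotebookInstance request. This is a sketch with made-up identifiers; each key mirrors a console option, and the request shape matches boto3's `create_notebook_instance` parameters.

```python
# Sketch: one CreateNotebookInstance request covering the options above.
# Every identifier here is a placeholder.
notebook_request = {
    "NotebookInstanceName": "my-notebook",
    "InstanceType": "ml.t3.medium",  # pick based on compute needs
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    # Encrypt the attached storage volume with a KMS key
    "KmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/abcd-1234",
    # Keep the notebook reachable only from inside your VPC
    "SubnetId": "subnet-0abc123",
    "SecurityGroupIds": ["sg-0abc123"],
    # A CodeCommit URL, or a Git repository resource backed by GitHub
    # credentials stored in AWS Secrets Manager
    "DefaultCodeRepository": "https://github.com/example/repo.git",
}

# With boto3 and credentials configured:
#   import boto3
#   boto3.client("sagemaker").create_notebook_instance(**notebook_request)
```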

You can also run example ML models from the SageMaker notebooks. Just click 'SageMaker Examples' and you can begin exploring.

Which method you choose will depend on your goal and your level of comfort with AWS. Studio is so full-featured that it can be more confusing to start with. An instance will have you up and running quickly, though you'll have to set up the automatic run pipeline yourself, which we will cover in a later lesson.

Thanks for reading! Do you have more questions about AWS SageMaker or Machine Learning? Talk to our expert consultants today and have all your questions answered!

