August 22, 2022

How to Load an AWS RDS Snapshot Into Snowflake

By José Almonte

In this blog, we’re going to walk you through how to load data from an AWS RDS snapshot into the Snowflake Data Cloud. We will be:

  1. Preparing our environments
  2. Backing up our RDS data
  3. Exporting our RDS backup to an S3 bucket
  4. Creating a Snowflake stage 
  5. Querying data & creating tables

Why is Loading RDS Data Into Snowflake Important?

Relational databases are at the heart of your most critical applications. More often than not, we find ourselves building architectures with many different types of databases, holding data that would be far more meaningful if aggregated in one place for analytics.

Throughout this blog, we will walk you through how to load data from Amazon Web Services (AWS) Relational Database Service (RDS) into Snowflake using an RDS snapshot. The snapshot method is great for one-time loads that let you explore the available data and determine its overall value before setting up ongoing incremental ingestion mechanisms.

This solution uses AWS S3 to store the exported data and AWS IAM to manage access to that data between services.

While this post uses an RDS instance with a PostgreSQL engine, these steps should also work with the other database engines that support RDS snapshot export to S3, such as MySQL and MariaDB.

Now let’s begin.

Step 1: Set Up Environment

  • Set Up Snowflake
    • Before you start, you will need a Snowflake virtual warehouse up and running and a user that can connect to your account and run queries. A minimal setup sketch is shown below.
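
The sketch below assumes you are starting from scratch; the warehouse and database names (rds_load_wh, rds_demo) are illustrative, while the staging schema matches the one used in Step 6.

-- Minimal setup sketch. The warehouse and database names are illustrative;
-- the staging schema is the one referenced later in this post.
CREATE WAREHOUSE IF NOT EXISTS rds_load_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS rds_demo;
CREATE SCHEMA IF NOT EXISTS rds_demo.staging;

USE WAREHOUSE rds_load_wh;
USE SCHEMA rds_demo.staging;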

Sample Table Used for the Demo

CREATE TABLE public.airline (
    code VARCHAR,
    airline VARCHAR
);

INSERT INTO public.airline (code, airline) VALUES
    ('UA','United Air Lines Inc.'),
    ('AA','American Airlines Inc.'),
    ('US','US Airways Inc.'),
    ('F9','Frontier Airlines Inc.'),
    ('B6','JetBlue Airways'),
    ('OO','Skywest Airlines Inc.'),
    ('AS','Alaska Airlines Inc.'),
    ('NK','Spirit AirLines'),
    ('WN','Southwest Airlines Co.'),
    ('DL','Delta Air Lines Inc.'),
    ('EV','Atlantic Southeast Airlines'),
    ('HA','Hawaiian Airlines Inc.'),
    ('MQ','American Eagle Airlines Inc.'),
    ('VX','Virgin America')
;

Step 2: Take Snapshot of RDS

  • You will need the “DBSnapshotArn” of this snapshot for Step 3.

#!/bin/bash

# Create a manual snapshot of the RDS instance, named with a timestamp.
timestamp=$(date +%Y-%m-%d-%H-%M-%S)
db_name="phdata-rds-snowflake"
db_backup="${db_name}-backup-${timestamp}"

aws rds create-db-snapshot \
    --db-instance-identifier ${db_name} \
    --db-snapshot-identifier ${db_backup}
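
Snapshot creation is asynchronous. The lines below are a sketch that could be appended to the script above to wait for the snapshot to become available and to capture the “DBSnapshotArn” value needed in Step 3 (they reuse the db_backup variable defined above).

# Wait for the snapshot to become available, then capture its ARN for Step 3.
aws rds wait db-snapshot-available \
    --db-snapshot-identifier ${db_backup}

db_snapshot_arn=$(aws rds describe-db-snapshots \
    --db-snapshot-identifier ${db_backup} \
    --query 'DBSnapshots[0].DBSnapshotArn' \
    --output text)

echo "Snapshot ARN: ${db_snapshot_arn}"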

Step 3: Export Snapshot to S3

#!/bin/bash

# Start an export of the snapshot to S3. You will need the snapshot ARN from
# Step 2, a target S3 bucket, an IAM role allowed to write to that bucket,
# and a KMS key used to encrypt the exported files.
export_task_id="my-export"
db_snapshot_arn="********"
target_bucket_name="********"
iam_rds_exporter_arn="********"
kms_key_id="********"

aws rds start-export-task \
    --export-task-identifier ${export_task_id} \
    --source-arn ${db_snapshot_arn} \
    --s3-bucket-name ${target_bucket_name} \
    --iam-role-arn ${iam_rds_exporter_arn} \
    --kms-key-id ${kms_key_id}
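
The export also runs asynchronously and writes the snapshot out as Parquet files under a prefix named after the export task (here, my-export). A small sketch for checking its progress, appended to the script above:

# Check the status of the export task until it reports that it has completed.
aws rds describe-export-tasks \
    --export-task-identifier ${export_task_id} \
    --query 'ExportTasks[0].Status' \
    --output text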

Step 4: Create STAGE for Querying AWS S3 Bucket

CREATE OR REPLACE FILE FORMAT parquet_format
    TYPE = 'PARQUET';

CREATE OR REPLACE STAGE rds_s3_stage
    URL = 's3://bucket-rds-s3-phdata/my-export/phDataRDS'
    CREDENTIALS = (
        AWS_KEY_ID = '********'
        AWS_SECRET_KEY = '********'
    )
    FILE_FORMAT = parquet_format;

LIST @rds_s3_stage;
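
If you prefer not to embed AWS keys in the stage definition, a storage integration can hold the authorization instead. The sketch below assumes an integration name and an IAM role ARN that are purely illustrative; the role must be granted read access to the export bucket.

-- Optional sketch: authorize the stage through a storage integration
-- instead of AWS keys. The integration name and role ARN are illustrative.
CREATE OR REPLACE STORAGE INTEGRATION rds_s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-rds-export-reader'
    STORAGE_ALLOWED_LOCATIONS = ('s3://bucket-rds-s3-phdata/my-export/');

CREATE OR REPLACE STAGE rds_s3_stage
    URL = 's3://bucket-rds-s3-phdata/my-export/phDataRDS'
    STORAGE_INTEGRATION = rds_s3_int
    FILE_FORMAT = parquet_format;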

Step 5: Confirm Access to Data in S3 Bucket by Querying Parquet

SELECT * FROM
    @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet;
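
When querying staged Parquet files, each row is exposed as a single VARIANT column referenced as $1. A quick peek like the sketch below confirms the field names (here, code and airline) before writing the typed query in Step 6.

-- Sketch: inspect the raw VARIANT column to confirm the Parquet field names.
SELECT $1
FROM @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet
LIMIT 5;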

Step 6: Create Table and Insert S3 Data

CREATE OR REPLACE TABLE staging.airlines AS
    SELECT
        $1:airline::varchar AS "airline",
        $1:code::varchar AS "code"
    FROM
        @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet;
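
Larger tables are typically exported as several Parquet part files rather than one. The sketch below is a variation of the same statement that reads every part file under the table’s prefix, using the named file format and a filename pattern.

-- Sketch: read all Parquet part files under the table's export prefix
-- instead of naming a single file.
CREATE OR REPLACE TABLE staging.airlines AS
    SELECT
        $1:airline::varchar AS "airline",
        $1:code::varchar AS "code"
    FROM @rds_s3_stage/public.airline/
        (FILE_FORMAT => 'parquet_format', PATTERN => '.*[.]parquet');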

Confirm creation and insertion of data.

SELECT * FROM staging.airlines;

Conclusion

In this post, we’ve demonstrated how to load data from an AWS RDS snapshot into Snowflake. This gives you the flexibility to further develop your analytics infrastructure by quickly loading data from your Amazon RDS environment into Snowflake for initial exploration and for determining its overall value.

This is a one-time load of the RDS database and does not set up ongoing ingestion. For ongoing CDC ingestion from RDS to Snowflake, check out solutions from our partner Fivetran/HVR.

If your team is interested in learning more about how to extract more value from your Snowflake account, please reach out! We’d love to help kick off your project! 

If you’re looking for more helpful Snowflake resources, be sure to check out our free guide to getting started with Snowflake.
