August 22, 2022

How to Load an AWS RDS Snapshot Into Snowflake

By José Almonte

In this blog, we’re going to walk you through how to load data from an AWS RDS snapshot into the Snowflake Data Cloud. We will be:

  1. Preparing our environments
  2. Backing up our RDS data
  3. Exporting our RDS backup to an S3 bucket
  4. Creating a Snowflake stage 
  5. Querying data & creating tables

Why is Loading RDS Data Into Snowflake Important?

Relational databases are at the heart of your most critical applications. More often than not, we find ourselves building architectures with many different types of databases, holding data that would be far more meaningful if aggregated in one place for analytics.

Throughout this blog, we will walk you through how to load data from Amazon Web Services (AWS) Relational Database Service (RDS) into Snowflake using an RDS snapshot. The snapshot method is great for one-time loads that let you explore the available data and determine its overall value before setting up ongoing incremental ingestion mechanisms.

This solution uses AWS S3 to store the exported data and AWS IAM to manage access to that data between services.

While this post uses an RDS instance with a PostgreSQL engine, these steps should also work with the other database engines that support RDS snapshot export to S3, such as MySQL and MariaDB.

Now let’s begin.

Step 1: Set Up Environment

  • Set Up Snowflake
    • Before you start, you will need a Snowflake virtual warehouse up and running and a user that can connect to your account and run queries. A minimal setup sketch is shown below.
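
The sketch below assumes you are starting from scratch; the warehouse and database names (rds_load_wh, rds_demo) are illustrative, while the staging schema matches the one used in Step 6.

-- Minimal setup sketch. The warehouse and database names are illustrative;
-- the staging schema is the one referenced later in this post.
CREATE WAREHOUSE IF NOT EXISTS rds_load_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE;

CREATE DATABASE IF NOT EXISTS rds_demo;
CREATE SCHEMA IF NOT EXISTS rds_demo.staging;

USE WAREHOUSE rds_load_wh;
USE SCHEMA rds_demo.staging;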

Sample Table Used for the Demo

CREATE TABLE public.airline (
    code VARCHAR,
    airline VARCHAR
);

INSERT INTO public.airline (code, airline) VALUES
    ('UA','United Air Lines Inc.'),
    ('AA','American Airlines Inc.'),
    ('US','US Airways Inc.'),
    ('F9','Frontier Airlines Inc.'),
    ('B6','JetBlue Airways'),
    ('OO','Skywest Airlines Inc.'),
    ('AS','Alaska Airlines Inc.'),
    ('NK','Spirit AirLines'),
    ('WN','Southwest Airlines Co.'),
    ('DL','Delta Air Lines Inc.'),
    ('EV','Atlantic Southeast Airlines'),
    ('HA','Hawaiian Airlines Inc.'),
    ('MQ','American Eagle Airlines Inc.'),
    ('VX','Virgin America')
;

Step 2: Take Snapshot of RDS

  • You will need the “DBSnapshotArn” of this snapshot for Step 3.

#!/bin/bash

# Create a manual snapshot of the RDS instance, named with a timestamp.
timestamp=$(date +%Y-%m-%d-%H-%M-%S)
db_name="phdata-rds-snowflake"
db_backup="${db_name}-backup-${timestamp}"

aws rds create-db-snapshot \
    --db-instance-identifier ${db_name} \
    --db-snapshot-identifier ${db_backup}
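
Snapshot creation is asynchronous. The lines below are a sketch that could be appended to the script above to wait for the snapshot to become available and to capture the “DBSnapshotArn” value needed in Step 3 (they reuse the db_backup variable defined above).

# Wait for the snapshot to become available, then capture its ARN for Step 3.
aws rds wait db-snapshot-available \
    --db-snapshot-identifier ${db_backup}

db_snapshot_arn=$(aws rds describe-db-snapshots \
    --db-snapshot-identifier ${db_backup} \
    --query 'DBSnapshots[0].DBSnapshotArn' \
    --output text)

echo "Snapshot ARN: ${db_snapshot_arn}"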

Step 3: Export Snapshot to S3

#!/bin/bash

# Start an export of the snapshot to S3. You will need the snapshot ARN from
# Step 2, a target S3 bucket, an IAM role allowed to write to that bucket,
# and a KMS key used to encrypt the exported files.
export_task_id="my-export"
db_snapshot_arn="********"
target_bucket_name="********"
iam_rds_exporter_arn="********"
kms_key_id="********"

aws rds start-export-task \
    --export-task-identifier ${export_task_id} \
    --source-arn ${db_snapshot_arn} \
    --s3-bucket-name ${target_bucket_name} \
    --iam-role-arn ${iam_rds_exporter_arn} \
    --kms-key-id ${kms_key_id}
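
The export also runs asynchronously and writes the snapshot out as Parquet files under a prefix named after the export task (here, my-export). A small sketch for checking its progress, appended to the script above:

# Check the status of the export task until it reports that it has completed.
aws rds describe-export-tasks \
    --export-task-identifier ${export_task_id} \
    --query 'ExportTasks[0].Status' \
    --output text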

Step 4: Create STAGE for Querying AWS S3 Bucket

CREATE OR REPLACE FILE FORMAT parquet_format
    TYPE = 'PARQUET';

CREATE OR REPLACE STAGE rds_s3_stage
    URL = 's3://bucket-rds-s3-phdata/my-export/phDataRDS'
    CREDENTIALS = (
        AWS_KEY_ID = '********'
        AWS_SECRET_KEY = '********'
    )
    FILE_FORMAT = parquet_format;

LIST @rds_s3_stage;
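
If you prefer not to embed AWS keys in the stage definition, a storage integration can hold the authorization instead. The sketch below assumes an integration name and an IAM role ARN that are purely illustrative; the role must be granted read access to the export bucket.

-- Optional sketch: authorize the stage through a storage integration
-- instead of AWS keys. The integration name and role ARN are illustrative.
CREATE OR REPLACE STORAGE INTEGRATION rds_s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-rds-export-reader'
    STORAGE_ALLOWED_LOCATIONS = ('s3://bucket-rds-s3-phdata/my-export/');

CREATE OR REPLACE STAGE rds_s3_stage
    URL = 's3://bucket-rds-s3-phdata/my-export/phDataRDS'
    STORAGE_INTEGRATION = rds_s3_int
    FILE_FORMAT = parquet_format;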

Step 5: Confirm Access to Data in S3 Bucket by Querying Parquet

SELECT * FROM
    @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet;
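
When querying staged Parquet files, each row is exposed as a single VARIANT column referenced as $1. A quick peek like the sketch below confirms the field names (here, code and airline) before writing the typed query in Step 6.

-- Sketch: inspect the raw VARIANT column to confirm the Parquet field names.
SELECT $1
FROM @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet
LIMIT 5;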

Step 6: Create Table and Insert S3 Data

CREATE OR REPLACE TABLE staging.airlines AS
    SELECT
        $1:airline::varchar AS "airline",
        $1:code::varchar AS "code"
    FROM
        @rds_s3_stage/public.airline/part-00000-0f4cd80c-8d8d-46e5-a617-ce11368ebf33-c000.gz.parquet;
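
Larger tables are typically exported as several Parquet part files rather than one. The sketch below is a variation of the same statement that reads every part file under the table’s prefix, using the named file format and a filename pattern.

-- Sketch: read all Parquet part files under the table's export prefix
-- instead of naming a single file.
CREATE OR REPLACE TABLE staging.airlines AS
    SELECT
        $1:airline::varchar AS "airline",
        $1:code::varchar AS "code"
    FROM @rds_s3_stage/public.airline/
        (FILE_FORMAT => 'parquet_format', PATTERN => '.*[.]parquet');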

Confirm creation and insertion of data.

SELECT * FROM staging.airlines;

Conclusion

In this post, we’ve demonstrated how to load data from an AWS RDS snapshot into Snowflake. This gives you the flexibility to further develop your analytics infrastructure by quickly loading data from your Amazon RDS environment into Snowflake for initial exploration and for determining its overall value.

This is a one-time load of the RDS database and does not set up ongoing ingestion. For ongoing CDC ingestion from RDS to Snowflake, check out solutions from our partner Fivetran/HVR.

If your team is interested in learning more about how to extract more value from your Snowflake account, please reach out! We’d love to help kick off your project! 

If you’re looking for more helpful Snowflake resources, be sure to check out our free guide to getting started with Snowflake.
