Looking for an in-depth approach to migrating data from your on-premises systems to Snowflake? Download our step-by-step whitepaper.
Are you still sinking 40% of your IT budget into Oracle licensing fees? If the answer is yes (or if you had to check whether it was actually only 30%), then you’re probably interested in ways to reduce that spend.
Snowflake is a data warehouse built for the cloud and focused on OLAP workloads. It can reduce your dependence on Oracle for analytics, with a pricing model that scales with usage. Beyond the cost savings of scalable, on-demand compute, this cloud-native offering has many features and capabilities that make it an enticing solution for your company’s data warehousing needs.
Making the move to Snowflake is almost certainly the right choice, but when it comes to putting everything into practice, the actual migration of the data is often the trickiest part. You could, of course, buy a product that excels in this space, but then you would just be trading one set of licensing fees for another. Instead, consider using Snowflake components and public cloud infrastructure to build a migration process based on cloud-native principles: one that is highly resilient, scales easily with demand, and uses consumption-based billing. For the remainder of this post, we will walk through a migration plan that does just that, using existing cloud-native technologies.
Oracle to Snowflake Migration Plan
Before getting into the processes that make up the migration, it helps to identify the path that data will take through the pipeline.
At its core, Snowflake is built on public cloud infrastructure and integrates very nicely with the major public cloud platforms: Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). Loading data into Snowflake is not done by connecting to a database and pulling the data in, but by using these platforms’ object storage services as a staging location. This is for good reason: object storage is cheap, nearly infinitely scalable, and many enterprises already use it as a data repository in some way.
So if Snowflake loads data from public cloud object storage, a migration must first move the data from the database into object storage to stage it for Snowflake. This part of the migration has many options, including numerous on-premises and cloud-based ETL/ELT vendors. The three major cloud platforms also offer some form of cloud-native data migration service, and the capabilities of these services grow every year.
Here is a high-level look at the path data takes during the migration from Oracle to Snowflake:
For the purposes of this post, the focus will be on the offerings from AWS, the first cloud platform Snowflake supported. This means that data will be staged for Snowflake in Amazon Simple Storage Service (S3), and the migration from Oracle to Amazon S3 will be done using AWS’s cloud-native data migration service.
Step 1: Oracle to Amazon S3
There are many tools that can move data from Oracle to Amazon S3, but to meet the goal of a cloud-native migration, it is best to start with the services offered by the public cloud itself. Within AWS, that means Database Migration Service (DMS), and upon closer inspection of its capabilities, DMS appears to fit the bill quite well. Below are some details:
- Migration Type: Full load and/or Change Data Capture (CDC).
- Database sources: Oracle, MySQL, Postgres, SQL Server, and more.
- Targets for migration: S3, Kafka, Kinesis, and more, including many of the same RDBMS mentioned in sources.
With DMS, it is possible to migrate from an Oracle source to an Amazon S3 target. The migration can be a single batch migration of the current database, a near-real-time replication from source to target, or both together, capturing all historical and ongoing data in S3.
The AWS resources in play here include the following:
- DMS Replication Instance — The infrastructure that Replication Tasks run on, managed by AWS.
- DMS Replication Task — This is what defines a migration, including the type of migration and what sort of transformations to apply along the way.
- Source Endpoint — Connection details for the source database.
- Target Endpoint — Connection details for the target.
- S3 Bucket — The staging area for Snowflake. This might be something created solely for use by Snowflake, but it is often already an integral part of a company’s greater data repository landscape.
- Oracle Database — The source that DMS will be replicating from.
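To make the Replication Task concrete, here is a hedged sketch of the table-mapping rules and migration type such a task might be created with. The `SALES` schema is a placeholder, real tasks carry many more settings, and this only shows the general shape of the JSON that DMS expects:

```python
import json

# Hypothetical table-mapping rules for a DMS Replication Task.
# "SALES" is a placeholder schema; "%" includes every table in it.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "SALES", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

# Sketch of the task-level choices: a full load of existing rows
# followed by ongoing CDC, with the mappings passed as a JSON string.
task_kwargs = {
    "MigrationType": "full-load-and-cdc",
    "TableMappings": json.dumps(table_mappings),
}
```

Transformation rules (renaming schemas, adding prefixes, and so on) follow the same rule structure and can be appended to the same `rules` list.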
To migrate from Oracle, certain administrative tasks must be performed on the database, such as creating a user for DMS and enabling archive logging and supplemental logging so that DMS can read the redo logs for CDC. An Oracle DBA should be comfortable performing these types of operations.
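As a rough illustration only (the exact privilege list depends on the DMS version and whether LogMiner or Binary Reader is used, so the authoritative list belongs to the AWS DMS documentation), the DBA’s preparation involves statements along these lines. `dms_user` and its password are placeholders:

```python
# Illustrative Oracle statements a DBA might run before a DMS migration.
# dms_user is a placeholder account; this grant list is abbreviated.
oracle_prep = [
    "CREATE USER dms_user IDENTIFIED BY placeholder_password",
    "GRANT CREATE SESSION TO dms_user",
    "GRANT SELECT ANY TABLE TO dms_user",
    # Supplemental logging adds enough detail to the redo logs for CDC.
    "ALTER DATABASE ADD SUPPLEMENTAL LOG DATA",
]
```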
Step 2: Amazon S3 to Snowflake
With data in Amazon S3, it is now possible to use Snowflake components to complete the migration. Depending on the type of migration that was done, the data in S3 represents either the actual table structure (full load) or a time-series of change operations (CDC). Luckily, a single processing pipeline built from Snowflake components can handle both situations.
The primary components that make up this pipeline are:
- Snowflake Pipe (also known as Snowpipe) — Enables immediate loading of data from a stage (S3) to a table, as data arrives at the stage. This effectively keeps a Snowflake table in sync with data that is landing in a staging location.
- Snowflake Stream — A record of DML operations performed against a table, which can be queried like a table. This can be used as a CDC source for a given table.
- Snowflake Task — A periodic process that can execute an arbitrary SQL statement on a fixed schedule.
Combined with an intermediate ‘change’ table, these components produce the final results of the migration. The Snowpipe keeps the change table up to date with the latest data from S3.
A Snowflake stream on the change table tracks the most recent inserts, and a Snowflake task wakes up periodically to merge the contents of the stream into the final reporting table; consuming the stream in the merge clears its state in preparation for the next execution.
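Sketched as Snowflake DDL (generated here as Python strings so the pieces can be checked together), the pipeline might look like the following. All object names, the `ID`/`VAL` columns, and the five-minute schedule are placeholder assumptions; a real task would also need a warehouse assignment, and a real merge would handle updates and deletes per DMS’s CDC operation column:

```python
# Hedged sketch of the three Snowflake objects behind the pipeline.
# MY_STAGE, ID, VAL, and the schedule are placeholders, not real names.
def pipeline_ddl(table):
    changes = f"{table}_CHANGES"
    return [
        # Snowpipe: auto-ingest new S3 files into the change table.
        f"CREATE PIPE {table}_PIPE AUTO_INGEST = TRUE AS "
        f"COPY INTO {changes} FROM @MY_STAGE/{table}/",
        # Stream: records each insert the pipe makes into the change table.
        f"CREATE STREAM {changes}_STREAM ON TABLE {changes}",
        # Task: periodically merge pending changes into the reporting
        # table; consuming the stream in the MERGE resets it.
        f"CREATE TASK {table}_MERGE_TASK SCHEDULE = '5 MINUTE' AS "
        f"MERGE INTO {table} t USING {changes}_STREAM s ON t.ID = s.ID "
        f"WHEN MATCHED THEN UPDATE SET t.VAL = s.VAL "
        f"WHEN NOT MATCHED THEN INSERT (ID, VAL) VALUES (s.ID, s.VAL)",
    ]

ddl = pipeline_ddl("ORDERS")
```

Generating the DDL per table like this also hints at how the same pipeline pattern can be stamped out for every table in the migration.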
Step 3: Migration Success
Using these two technologies together, we have outlined a path for migrating a table from an Oracle database to Snowflake, in near real time if desired. There is no hardware to purchase or provision, no software to purchase or install, and no licensing fees from AWS or Snowflake.
All of this is done with a pay-for-what-you-use pricing model and can be scaled to meet the demands of your migration plan. Hopefully, with a migration to Snowflake, your analytical spend will better reflect the work you are actually performing.
While automating your Oracle migration to Snowflake can certainly be a challenge, phData has built tooling, Streamliner, to ease the transition to cloud-native data pipelines for Snowflake. If you are interested in a more in-depth look at phData’s approach to migrating data from your on-premises systems to Snowflake, take a look at our whitepaper.