March 7, 2023

How To Use Oracle GoldenGate to Ingest Data Into Snowflake

By Michael Gohn

The task of keeping multiple databases in sync so that data is accurate, up-to-date, and highly available is every data consumer’s biggest challenge. Oracle developed a tool to address this challenge and its called GoldenGate.

Oracle GoldenGate can handle both an initial bulk load of data from one database to another and then continue to replicate data from the source database to the target database automatically, keeping the databases in sync. 

But before we get too far ahead of ourselves, let’s take a quick step back to address some of the basics on Oracle and GoldeGate.

What is Oracle?

Oracle is one of the largest IT companies whose flagship product, Oracle Database, is a relational database management system. Oracle offers hundreds of software and cloud products in addition to the Oracle Database, however, including the very popular programming language, Java. 

Oracle boasts offering every service to migrate, build, and run all of your IT, whether that be integrated with existing legacy warehouses or a brand new cloud deployment.

What is Oracle GoldenGate?

Oracle GoldenGate is one of Oracle’s solutions that replicates, filters, and transforms data from one database to another. These replications do not have to be from one Oracle database to another Oracle database either. 

GoldenGate can replicate data between heterogeneous databases, not solely Oracle databases. Additionally, data does not only have to be replicated from one database to another but can be extracted from multiple sources or written to multiple target databases.

Why Use GoldenGate?

There are many benefits to using GoldenGate for data replication, with a wide array of use cases. GoldenGate replicates data in near real-time, boasting sub-second latency, regardless of the databases used or volume of data being replicated. GoldenGate also supports a number of different architectures as described in the image below.

different architectures

GoldenGate not only provides several different architectures but also a wide variety of databases that it can replicate data between.

(Shown in the image below.)

From keeping an active backup to consolidating or broadcasting data between platforms, GoldenGate is a very versatile tool that can handle many different use cases.

Prerequisites

In this blog, we focus on ingesting data into the Snowflake Data Cloud with GoldenGate and so we will pick up the replication process within GoldenGate. As such, we assume that we already have:

  • A GoldenGate extract and distribution path
  • A preexisting Snowflake table that has the same schema as the table in the source database that is being replicated

Steps to Ingest Data into Snowflake with GoldenGate

GoldenGate does not have a direct connector to Snowflake, however, GoldenGate for Big Data has the ability to write data to an internal or external Snowflake stage. From this stage, GoldenGate runs a merge statement to replicate data into Snowflake. Once an extract and distribution path is configured, follow these steps to ingest data into Snowflake.

  1. Create a Directory where GoldenGate will be Installed
  2. Download and Extract GoldenGate for Big Data
    1. This should be extracted into the directory location created in step 1.
  3. Run the Installation Wizard
    1. This is done by running the command ggs_Linux_x64_BigData_64bit_services/Disk1/runInstaller from the directory where GoldenGate was installed.
  4. Select the Oracle Database Version
    1. This is based on your environment.
  5. Set the OGG_HOME Environment Variable
    1. This should be set to the path to the directory where GoldenGate was installed.
  6. Check Over the Summary Page
    1. Then click “Install”. This will complete the installation of GoldenGate for Big Data.
  7. Begin to Create a Deployment
    1. This is done by running the $OGG_HOME/bin/oggca.sh program.
  8. Select an Existing Service Manager or Create a New One
    1. One Service Manager can cover multiple deployments, so it is only necessary to create a new one if one does not already exist. A new Service Manager requires a directory location, the hostname or IP Address of the server. and a port number.
  9. Add a New Deployment
    1. Enter a name and select the home directory for GoldenGate (this will be pre-populated if OGG_HOME is defined).
  10. Enter a Directory for the Deployment
    1. It is recommended for this directory to be outside of OGG_HOME.
  11. Enter Any Additional Environment Variables
    1. This will be prompted, with OGG_HOME pre-populated if it has already been defined. 
  12. Set a Username and Password for GoldenGate Microservices Service Manager
    1. Only if this is a new Service Manager. If this is a preexisting Service Manager, enter the already defined username and password for the Service Manager.
  13. Determine if This is a Secure Deployment
    1. GoldenGate recommends enabling SSL/TLS security and must be used if configuring for sharding support. If it is a secure deployment with SSL/TLS security, enter the server and client certificates.
  14. If Security is Enabled, Select Ciphers
    1. The next page will display a list of ciphers, which can be added, removed, and reordered to fit your business needs. 
  15. If Sharding is Enabled, Import GoldenGate Sharding Certificate
    1. A unique name must also be given for the certificate.
  16. Configure Ports
    1. Enter the Administration Server port number, which will auto-populate the rest of the ports. Monitoring can also be enabled at this time, which requires additional ports, a datastore, and location to be used, which can all be configured from this page.
  17. Review Configurations
    1. Review all of the configurations you have set for your deployment and, when ready, click “Finish” to spin up the deployment. From this page you can also save the configurations into a response file that can be run from the command line to create another identical deployment.
  18. Download the Snowflake-JDBC Driver JAR File
    1. That can be done here.
  19. Log in to the GoldenGate UI
    1. This is done by opening a web browser and entering http://localhost:<admin_service_port_number>/. 
  20. Add Snowflake Credentials
    1. Under the Configuration tab, enter the credential domain (Snowflake), an alias to be used for this credential, the Snowflake user ID and password. Click the connect to database icon to test that the connection is working correctly.
    2. Give Basic Details for the New Replicat
  21. From the home page, click to add a replicat. Then, give the replicat a name, select “Trail” for the source, enter a trail name (must be 2 characters), select a “begin” time and enter “Snowflake” as the target.
  22. Fill Out the Parameter File
    1. Enter details about how you want the data to be replicated into Snowflake. These details include data selection, mapping, transformation, DDL, error resolution, logging and status, and error reporting.
    2. A full list of acceptable parameters can be found here.
  23. Fill Out Properties File
    1. Next a .properties file needs to be filled out. Since Snowflake was selected earlier, GoldenGate will provide a pre-populated file with all of the fields that need to be configured marked with a TODO comment. There are four sections within this pre-populated file, corresponding to the four ways this replication can be performed, via a Snowflake internal stage or the three different kinds of external stages. Only the fields for the stage type being used need to be completed and the other sections can be deleted. It is also possible to create a file with the .properties extension locally and upload it to GoldenGate at this time.
				
					# Note: Recommended to only edit the configuration marked as  TODO

gg.target=snowflake

#The Snowflake Event Handler
#TODO: Edit JDBC ConnectionUrl
gg.eventhandler.snowflake.connectionURL=jdbc:snowflake://<account_name>.snowflakecomputing.com/?warehouse=<warehouse-name>&db=<database-name>
#TODO: Edit JDBC user name
gg.eventhandler.snowflake.UserName=<db user name>
#TODO: Edit JDBC password
gg.eventhandler.snowflake.Password=<db password>

# Using Snowflake internal stage.
# Configuration to load GoldenGate trail operation records 
# into Snowflake Data warehouse by chaining
# File writer handler -> Snowflake Event handler.
#TODO:Set the classpath to include Snowflake JDBC driver.
gg.classpath=./snowflake-jdbc-3.13.7.jar
#TODO:Provide sufficient memory (at least 8GB).
jvm.bootoptions=-Xmx8g -Xms8g

# Using Snowflake S3 External Stage. 
# Configuration to load GoldenGate trail operation records 
# into Snowflake Data warehouse by chaining
# File writer handler -> S3 Event handler -> Snowflake Event handler.

#The S3 Event Handler
#TODO: Edit the AWS region
#gg.eventhandler.s3.region=<aws region>
#TODO: Edit the AWS S3 bucket
#gg.eventhandler.s3.bucketMappingTemplate=<s3 bucket>
#TODO:Set the classpath to include AWS Java SDK and Snowflake JDBC driver.
#gg.classpath=aws-java-sdk-1.11.356/lib/*:aws-java-sdk-1.11.356/third-party/lib/*:./snowflake-jdbc-3.13.7.jar
#TODO:Set the AWS access key and secret key. Provide sufficient memory (at least 8GB).
#jvm.bootoptions=-Daws.accessKeyId=<AWS access key> -Daws.secretKey=<AWS secret key> -DSF_STAGE=s3 -Xmx8g -Xms8g

# Using Snowflake ADLS Gen2 External Stage.
# Configuration to load GoldenGate trail operation records 
# into Snowflake Data warehouse by chaining
# File writer handler -> HDFS Event handler -> Snowflake Event handler.

#The HDFS Event Handler
# No properties are required for the HDFS Event handler.
# If there is a need to edit properties, check example in the following line.
#gg.eventhandler.hdfs.finalizeAction=delete
#TODO: Edit snowflake storage integration to access Azure Blob Storage.
#gg.eventhandler.snowflake.storageIntegration=<azure_int>
#TODO: Edit the classpath to include HDFS Event Handler dependencies and Snowflake JDBC driver.                                                                             
#gg.classpath=./snowflake-jdbc-3.13.7.jar:hadoop-3.2.1/share/hadoop/common/*:hadoop-3.2.1/share/hadoop/common/lib/*:hadoop-3.2.1/share/hadoop/hdfs/*:hadoop-3.2.1/share/hadoop/hdfs/lib/*:hadoop-3.2.1/etc/hadoop/:hadoop-3.2.1/share/hadoop/tools/lib/* 
#TODO: Set property SF_STAGE=hdfs.  Provide sufficient memory (at least 8GB).
#jvm.bootoptions=-DSF_STAGE=hdfs -Xmx8g -Xms8g

# Using Snowflake GCS External Stage.
# Configuration to load GoldenGate trail operation records 
# into Snowflake Data warehouse by chaining
# File writer handler -> GCS Event handler -> Snowflake Event handler.

## The GCS Event handler
#TODO: Edit the GCS bucket name
#gg.eventhandler.gcs.bucketMappingTemplate=<gcs bucket>
#TODO: Edit the GCS credentialsFile
#gg.eventhandler.gcs.credentialsFile=<oggbd-project-credentials.json>
#TODO: Edit snowflake storage integration to access GCS.
#gg.eventhandler.snowflake.storageIntegration=<gcs_int>
#TODO: Edit the classpath to include GCS Java SDK and Snowflake JDBC driver.
#gg.classpath=gcs-deps/*:./snowflake-jdbc-3.13.7.jar
#TODO: Set property SF_STAGE=gcs.  Provide sufficient memory (at least 8GB).
#jvm.bootoptions=-DSF_STAGE=gcs -Xmx8g -Xms8g


				
			

24. Click “Create and Run”

  1. This will officially kick off the replicat and so any changes that happen in the source database will be populated into Snowflake.

Next Steps

Now that GoldenGate has replicated, and will continue to replicate, your data into Snowflake, you can begin to leverage the power of Snowflake. Whether that is utilizing dbt to build out robust models, connecting your analytics, BI or data tool to Snowflake to begin analyzing your data, or any number of other use cases, phData can help you continue your journey to getting the most out of your data.

Have more questions on data ingestion, GoldenGate, or Snowflake?

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit