Businesses are constantly changing due to the economy, weather, and innovation, and their data demands are changing with them. To keep up with that pace of change, phData recommends a CI/CD framework that provides fast feedback and accelerates development on the Snowflake Data Cloud.
A Continuous Integration / Continuous Delivery (CI/CD) framework makes it significantly easier for businesses and developers to make changes, gain confidence in those changes, and get them into production faster than manually deploying their applications.
Database platforms, on the other hand, are a whole different story.
A large number of organizations still deploy and test their data environments manually. This makes sense: applications are stateless and can be restarted and changed easily, while data environments maintain state and tend to be stuck in legacy change-management systems.
So what about implementing CI/CD for Snowflake?
In this post, we’re going to explore the benefits (especially the increased value and agility) of combining a CI/CD process with a data platform like Snowflake.
Setting up the Right Tools for CI/CD in Snowflake
Once an organization decides to adopt CI/CD for its database, a short list of tools facilitates the automation and testing process.
The first and most important is a source control repository and a pipeline tool. Hosted options include Azure DevOps, AWS CodeCommit/CodePipeline, Bitbucket, and GitHub Actions, or their self-hosted, on-premises versions.
Second, you'll need a tool to deploy the changes to the database. This can be a hand-coded script or an open-source tool like Redgate's Flyway. The tool should be able to connect to the Snowflake instance and run the DDL and DML scripts required for deploying changes.
Finally, you'll need a testing framework. In a database, this could be anything from a group of SQL scripts (that load tables, run the code under test, and check expectations) to a regular unit testing framework like pytest or JUnit. For this example, a combination of JUnit and DbFit gives a very easy-to-use framework.
5 Steps to Build a CI/CD Framework for Snowflake
Below, we share an example process for Snowflake using all open-source technology. There can be more or fewer steps depending on what the CI/CD process needs to do.
- Step 1: A developer creates a new branch with code changes.
- Step 2: The code change is deployed to an isolated dev environment, where automated tests run.
- Step 3: Once the tests pass, a pull request can be created and another developer can approve those changes.
- Step 4: After the changes are approved, the pipeline builds and deploys the code to a UAT environment, where designated users can perform validations.
- Step 5: Finally, a developer approves the release, and the deployment pushes the code to production.
Flyway
Regardless of the source control and build system you choose, a tool like Flyway automates the deployment process by pushing the SQL, DDL, and DML to the dev, test, and production environments. Flyway operates on the concept of an evolutionary database, in which changes to an object are applied as layered, versioned scripts. This would be used in steps 2, 4, and 5.
For example, a table would be created in a script, and if the table needed an additional column, a second script would be used to add the column and fill it with a default value. These scripts get layered on each other over time, which allows the database to be modified while maintaining any data it is storing.
As an added benefit, it also allows the database to be created from nothing with all of the scripts, which can be extremely useful for testing.
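As a sketch of that layering (the file names follow Flyway's V{version}__{description}.sql naming convention; the table and columns are illustrative):

-- V1__create_car_table.sql: the original object
CREATE TABLE analytics.CAR (
    id    INTEGER,
    model VARCHAR
);

-- V2__add_color_column.sql: a later change, layered on top
ALTER TABLE analytics.CAR
    ADD COLUMN color VARCHAR DEFAULT 'unknown';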
There are multiple ways to configure Flyway, but for this example, using the Maven plugin during a CI/CD process is relatively straightforward.
First, a pom.xml file is created (normally via an IDE), and that pom needs a plugin section like the one below:
<plugin>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-maven-plugin</artifactId>
    <version>{VERSION}</version>
</plugin>
This will run the Flyway plugin during a build step. But because it's connecting to Snowflake, a dependency on the Snowflake JDBC driver needs to be added like this:
<dependency>
    <groupId>net.snowflake</groupId>
    <artifactId>snowflake-jdbc</artifactId>
    <version>{VERSION}</version>
</dependency>
Now all that is left is to set the properties for Flyway to connect to the Snowflake instance and to tell it where to find the code to deploy.
With multiple Snowflake instances (dev, test, prod), Maven profiles can simplify some of the management. Below is an example of a dev and a stage profile, with the connection details for both and the source folders (the values in braces are placeholders):
<profiles>
    <profile>
        <id>dev</id>
        <activation>
            <activeByDefault>false</activeByDefault>
        </activation>
        <properties>
            <!-- connection URL, e.g. jdbc:snowflake://{ACCOUNT}.snowflakecomputing.com/?private_key_file={KEY_PATH} -->
            <flyway.url>{DEV_JDBC_URL}</flyway.url>
            <flyway.user>{USERNAME}</flyway.user>
            <flyway.schemas>FLYWAY,{SCHEMAS}</flyway.schemas>
            <flyway.cleanDisabled>false</flyway.cleanDisabled>
            <flyway.initSql>create schema if not exists FLYWAY;</flyway.initSql>
            <flyway.locations>filesystem:{RELATIVE_PATH_TO_SQL}</flyway.locations>
        </properties>
    </profile>
    <profile>
        <id>stage</id>
        <activation>
            <activeByDefault>false</activeByDefault>
        </activation>
        <properties>
            <flyway.url>{STAGE_JDBC_URL}</flyway.url>
            <flyway.user>{USERNAME}</flyway.user>
            <flyway.schemas>FLYWAY,{SCHEMAS}</flyway.schemas>
            <flyway.cleanDisabled>false</flyway.cleanDisabled>
            <flyway.initSql>create schema if not exists FLYWAY;</flyway.initSql>
            <flyway.locations>filesystem:{RELATIVE_PATH_TO_SQL}</flyway.locations>
        </properties>
    </profile>
</profiles>
The example above uses a private key and the username to connect, but a password can be used instead. At this point, if SQL scripts were in the folder and the following command was run:
mvn flyway:migrate -P dev
…the SQL code would be deployed to the specified schemas, a new schema called “FLYWAY” would be created, and a history table would record which scripts were applied. The nice thing about profiles is that just by changing the command to this:
mvn flyway:migrate -P stage
…a different user on a different instance can be used to apply those same changes.
Migrations between environments become very easy. Adding another prod profile pointing at the production environment would allow a deployment to prod to happen just as easily, as shown below.
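For example, once a prod profile is defined the same way as the two above:
mvn flyway:migrate -P prod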
Automating Your CI/CD Scripts on Snowflake
At that point, any SQL scripts that follow Flyway's naming convention are automatically deployed. If the pipeline runs this command as a build step on a completed pull request, the entire process is automated.
Now that you're deploying code automatically via a pull request, the concern is how a developer can be sure the code is correct and not breaking anything currently in production.
While unit testing works well for application development, it makes less sense to unit test SQL code, which combines logic and data access in a single statement. Given SQL's high-level nature, it makes more sense to treat these tests as acceptance tests.
While you can use a handful of different tools, DbFit allows for tests to be built very simply and run from JUnit. Running them in JUnit allows the CI/CD pipeline to run them automatically during the build process. This would be used in step 2 when deploying to an isolated environment to automatically run the tests.
For an example of DbFit, using the same pom as before, you simply add JUnit, the FitNesse framework, and DbFit to the pom as dependencies:
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>{VERSION}</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.fitnesse</groupId>
    <artifactId>fitnesse</artifactId>
    <version>{VERSION}</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.github.dbfit</groupId>
    <artifactId>dbfit-core</artifactId>
    <version>{VERSION}</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>com.github.dbfit</groupId>
    <artifactId>dbfit-snowflake</artifactId>
    <version>{VERSION}</version>
    <scope>test</scope>
</dependency>
Some of these dependencies may not be in Maven Central and would need a build step to download them and install them into the local repository, for example with Maven's install:install-file goal (a sketch; the jar file name is illustrative):
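mvn install:install-file -Dfile=dbfit-core-{VERSION}.jar -DgroupId=com.github.dbfit -DartifactId=dbfit-core -Dversion={VERSION} -Dpackaging=jar

Once that is done, the DbFit project (the FitNesse wiki pages) can be added to the repository in its own folder.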
Now you can build the last piece: the JUnit entry point. Create a test directory under the project's src folder and place a Java class for the tests in that directory.
If the tests are stored in DbFit/AcceptanceTests/JavaTests/SnowflakeTests, the Java test file would look like this:
import fitnesse.junit.FitNesseRunner;
import org.junit.Test;
import org.junit.runner.RunWith;

// Runs the FitNesse/DbFit suite through JUnit so the CI/CD pipeline can execute it.
@RunWith(FitNesseRunner.class)
@FitNesseRunner.Suite("DbFit.AcceptanceTests.JavaTests.SnowflakeTests")
@FitNesseRunner.FitnesseDir(".")
@FitNesseRunner.OutputDir("./build/fitnesse-results")
public class TestsExample {
    // JUnit needs at least one @Test method to trigger the runner;
    // the suite annotation above does the real work.
    @Test
    public void dummy() {
    }
}
This code uses JUnit to call the DbFit tests and run them locally.
It also stores the results in “build/fitnesse-results”, which the CI/CD pipeline can use for reporting on the tests. With that defined, a test page can be placed under the suite folder (along with a sample test for a very simple view).
Inside a FitNesse page folder, the actual test lives in a file named content.txt.
The test tells the FitNesse framework that it is a DbFit Snowflake test, then connects to the Snowflake instance via the JDBC URL. It then inserts five rows of test data, selects from the view (created via Flyway), and compares the selected columns against the expected output. This table syntax makes it very easy to test views and other SQL code.
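A sketch of what such a content.txt might look like, assuming the dbfit-snowflake extension provides a SnowflakeTest fixture whose Connect table accepts the JDBC URL, alongside DbFit's standard Insert and Query tables (the connection placeholders and CAR test data are illustrative):

!|dbfit.SnowflakeTest|

!|Connect|jdbc:snowflake://{ACCOUNT}.snowflakecomputing.com/|{USERNAME}|{PASSWORD}|

!|Insert|analytics.CAR|
|color|
|orange|
|orange|
|blue|
|red|
|black|

!|Query|select * from ANALYTICS.CarColorCount|
|OrangeCars|BlueCars|RedCars|BlackCars|
|2|1|1|1|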
And the view code that is under test:
CREATE VIEW ANALYTICS.CarColorCount
AS
SELECT
SUM(CASE WHEN lower(color) = 'orange' THEN 1 ELSE 0 END) AS OrangeCars,
SUM(CASE WHEN lower(color) = 'blue' THEN 1 ELSE 0 END) AS BlueCars,
SUM(CASE WHEN lower(color) = 'red' THEN 1 ELSE 0 END) AS RedCars,
SUM(CASE WHEN lower(color) = 'black' THEN 1 ELSE 0 END) AS BlackCars
FROM analytics.CAR;
Running these tests will result in a failed deployment if they don’t pass.
Another option besides DbFit is to use plain Java or Python to run scripts and report their status back to the testing framework. Either option gets the code tested before it reaches production. As a rough sketch of that approach, here is a hypothetical JUnit smoke test that uses the Snowflake JDBC driver already in the pom (the class name and connection placeholders are illustrative):
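import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import org.junit.Assert;
import org.junit.Test;

public class ViewSmokeTest {
    @Test
    public void carColorCountReturnsARow() throws Exception {
        // Connection placeholders mirror the Flyway profiles above.
        Properties props = new Properties();
        props.put("user", "{USERNAME}");
        props.put("password", "{PASSWORD}");
        String url = "jdbc:snowflake://{ACCOUNT}.snowflakecomputing.com/";
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select * from ANALYTICS.CarColorCount")) {
            // The aggregate view should always return exactly one row.
            Assert.assertTrue("expected a result row", rs.next());
        }
    }
}

From branch to production, the pipeline handles deployment and testing end to end, with people involved only to approve changes before they are deployed.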
Overall, when using a database platform like Snowflake, there are very few reasons not to automate the deployment and testing process. A CI/CD framework for Snowflake delivers faster, more accurate deployments with less chance of human error.
Looking for more best practices for Snowflake and CI/CD? Download our Getting Started with Snowflake Guide that provides actionable steps to help you get the most out of your Snowflake Data Cloud.