May 1, 2023

phData Toolkit April 2023 Update

By Nick Goble

Welcome to the April 2023 update of the phData Toolkit blog series! For those in the Minneapolis area, we were teased with a beautiful week of 80s and sunshine only to be rudely reminded that “second winter” is a real thing.  

We hope that wherever you live, you’ve been able to enjoy all the lovely things that spring brings.

phData staff love to get outdoors and enjoy the weather.  Whether it’s BBQ for all your friends and family, training for a marathon, or making insanely delicious foods, phData knows how to relax, have fun, and get out in the sun.

When we’re not off work living our best lives, we’re working diligently to drive additional value for our customers.  One of the major ways that we do this is through tools and automation to support our customers and their migrations. We regularly evaluate the data space for gaps across customers or the industry and aim to solve those with people and technology.

Over the last month, we’ve mostly focused on bringing our customers a new version of the Provision tool (AKA Tram).  While this tool has been instrumental to the success of our customers, it was time for a refresh and redesign.  We’ve also spent a good amount of time upgrading our Netezza to Snowflake SQL translations.

Let’s take a deeper dive!

Provision Tool

At its core, The Provision tool is a command line tool that uses a templated approach to manage Snowflake Data Cloud resources. The tool converts the templated configuration into a set of SQL commands that are executed against the target Snowflake environment.

This diagram shows the creation of a project in the Provision tool:

A complex diagram showing how the phData Provision tool works.

For those of you that are currently using Tram, we have a migration guide.

If you’ve ever been part of a core rewrite of an application, you know that this is not a minor feat and requires a lot of upfront costs.  You’re spending time copying functionality and, in some cases, coding to a new platform when you could be adding additional value and functionality to your existing application.  In other words, there’s a pause in business value.

So why did we do this?

As with any application, it’s a good practice to regularly evaluate your approach, design, implementation, and overall structure.  Historically, Tram was built, deployed, and ran as a separate utility from the rest of the Toolkit.  

With the growth of the Toolkit CLI, we decided to unify the experience and software development lifecycle (SDLC) into a single utility that can provide a range of functionality.  

This also gave us the ability to rename parts of the application structurally. For example, Tram uses the diction of groups, models, and workspaces. We found that this was both confusing for customers and sometimes confusing internally as well.  Is it a group of resources, a group that people can belong to, or an Active Directory (AD) group?  

Naming things well is one of the hardest tasks there is, and we’ve decided to make updates to use the terms templates, entries, and resource sets.

We’ve also introduced several upgrades to the new Provision tool:

  • Leverage Jinja instead of Handlebars for better utility and extensibility.
  • Snowflake identifiers are quoted automatically and no longer need to be explicitly quoted in the template (in most cases).
  • Better resource property handling and validation for both property names and property values, including automatically quoting string values.
  • Easier to understand role grant syntax when developing templates.
  • Improved error handling and user experience when developing complex information architectures.
    • Error messages for syntax errors when developing templates should be easier to read and more specific.
  • Includes built-in local tutorials to make getting started easy.
  • Snowflake schemas, stages, and tags in templates (models) are no longer nested to make them easier to create and drop.
  • Improved resource create/destroy behavior, making it easy to destroy all types of objects.
    • Tram was not able to destroy some individual objects.

One of the other additional benefits of this design is a significant performance gain.  We noticed that Tram would struggle against very large Snowflake environments due to the single-threaded nature of the application.  

While the majority of our customers didn’t even notice this, it eventually became apparent that scalability needed to be improved. In the new Provision tool, we now leverage a multi-threaded approach in our plan and apply statements.

SQL Translation Updates

One of the biggest challenges when performing platform migrations is handling differences between those two platforms.  What might be a VARCHAR with a specific size in SQL Server might just be a STRING in Snowflake.  

This is a simple example, but it can get much more complicated, and performing all of these translations manually would be error-prone, expensive, and time-consuming.  You’d much rather have engineers focusing on the new system’s information architecture and core components than manually translating hundreds of thousands of lines of SQL.

This is precisely what our SQL Translation tool aims to solve.

We have the ability to run a migration coverage report to identify how many of your SQL files can be fully translated, partially translated, or can’t be translated programmatically.  This allows you to better understand the manual effort required to finish your SQL translations and estimate the duration of your migration.

Over the last few months, we’ve seen an increase in the number of customers that are performing migrations from Netezza to Snowflake. This has caused us to spend additional time in adding support and coverage for SQL translations with this source/target pairing.

Let’s take a look at the fixes and upgrades to this translation!

Netezza Translation Updates

Translate the following functions:

  • trigonometric functions such as SIN, COS, and TAN 
  • LIKE_ESCAPE
  • now()
  • AGE
  • STRPOS
  • CHAR_LENGTH
  • INSTR
  • ISTRUE
  • ISNULL

Examples:

Netezza:

				
					select CHAR_LENGTH('Hello, World!'), strpos('Hello, World!', 'W');
				
			
Snowflake:
				
					SELECT
  LENGTH('Hello, World!'),
  POSITION('W', 'Hello, World!');
				
			

Netezza:

				
					SELECT CASE
  WHEN (name ~~ LIKE_ESCAPE('A' :: "VARCHAR", '\'::"VARCHAR")) THEN 'ONE'::"VARCHAR"
  WHEN (name !~~ LIKE_ESCAPE('B' :: "VARCHAR", '\'::"VARCHAR")) THEN 'TWO'::"VARCHAR"
  ELSE 'THREE'::"VARCHAR" END from test;
				
			
Snowflake:
				
					SELECT
  CASE
    WHEN (name LIKE CAST('A' AS VARCHAR)) THEN CAST('ONE' AS VARCHAR)
    WHEN (name NOT LIKE CAST('B' AS VARCHAR)) THEN CAST('TWO' AS VARCHAR)
    ELSE CAST('THREE' AS VARCHAR)
  END
FROM
  test;
				
			
Netezza:
				
					SELECT ISTRUE(1 = 1);
				
			
Snowflake:
				
					SELECT
  IFF(1 = 1, true, false);
				
			

Wrapping Up

We hope you’re just as excited as we are about the changes our product engineering team has made to the platform. The refresh and integration of our Provision tool into the Toolkit CLI is a massive step to unifying the experience of our engineers and customers who leverage our Toolkit to drive their success.  

If you’re interested in learning more about our Toolkit or how we can accelerate your organization around data, give us a shout!

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2023. 

Be sure to follow: this series for more updates on the phData Toolkit tools and features.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit