September 7, 2023

phData Toolkit August 2023 Update

By Nick Goble

Hello, and welcome to our August update of the phData Toolkit blog series! Summer is in full swing as we head into fall. August brings State Fairs with hundreds of thousands of people, bonfires by the lake, and all the other joys of being outside.

August also brings you another wonderful suite of functionality to the phData Toolkit!

Over the last month, we’ve been heavily focused on adding additional support for SQL translations to our SQL Translations tool. Specifically, we’ve been introducing fixes and features for our Microsoft SQL Server to Snowflake translation.

Let’s dive in!

SQL Translation Updates

As previously mentioned, we’ve been focusing heavily on our SQL Server to Snowflake translations. We’ve had many customers performing migrations between these platforms, and as a result, they have a lot of Data Definition Language (DDL) and Data Manipulation Language (DML) that needs to be translated between SQL dialects.

Let’s take a look at some of the more interesting translations.

If Exists

When writing a particular piece of SQL, it’s common practice to check if something exists as part of your logic. Conditional logic is heavily used within non-SQL coding languages and is very common within SQL as well.

You may decide you want to set a particular value to one thing if some criteria are met. Otherwise, set it to a different value. We have now updated the logic in the translation to support this functionality.

MSSQL

BEGIN
  IF EXISTS (SELECT MAX(VALUE) FROM A WHERE ID = 2)
    BEGIN
      UPDATE B SET CODE = 2 FROM A;
    END
  ELSE
    UPDATE B SET CODE = 123 FROM C;
END;
Snowflake

					EXECUTE IMMEDIATE $$ BEGIN IF (
  EXISTS(
    SELECT
      MAX(VALUE)
    FROM
      A
    WHERE
      ID = 2
  )
) THEN
UPDATE
  B
SET
  CODE = 2
FROM
  A;
ELSE
UPDATE
  B
SET
  CODE = 123
FROM
  C;
END IF;
END;
$$

Similarly, you may want to check whether something doesn’t exist; we’ve added support for that case as well.
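As a rough illustration of the NOT EXISTS case (the table and column names here are hypothetical, and the tool’s actual output formatting may differ), the translation follows the same pattern as the example above:

```sql
-- MSSQL (hypothetical input)
BEGIN
  IF NOT EXISTS (SELECT 1 FROM A WHERE ID = 2)
    BEGIN
      UPDATE B SET CODE = 0 FROM C;
    END
END;

-- Snowflake (expected shape of the translated output)
EXECUTE IMMEDIATE $$ BEGIN IF (
  NOT EXISTS(SELECT 1 FROM A WHERE ID = 2)
) THEN
UPDATE B SET CODE = 0 FROM C;
END IF;
END;
$$
```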

Select @VARIABLE

This pattern isn’t as commonly used but unlocks a lot of flexibility. Say you have multiple values that you want to concatenate together from different tables or columns. One way you could do this is to write a single query with joins to those tables and select those values into one string.

However, if you can’t join those tables together, you would need to concatenate the actual SQL results together. This is commonly handled in application code that pulls data from databases, but you can also do it within the SQL query itself.
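For instance, one well-known MSSQL idiom accumulates row values into a variable as the query executes (names here are hypothetical, and this idiom’s ordering behavior is not guaranteed by SQL Server, so treat it as a sketch):

```sql
-- Concatenate one column's values across rows into a single string
DECLARE @names AS VARCHAR(MAX) = '';
SELECT @names = @names + name + ',' FROM employees;
```

The procedure below shows the variable-selection building blocks that make patterns like this possible, and how they translate.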

MSSQL

create procedure proc1 as
  declare @var1 AS INT, @var2 AS INT;
  SELECT @var1 = col1 FROM t231 WHERE col1 = 10;
  SELECT @var1 += col1 FROM t231 WHERE col1 = 20;
  SELECT @var1 += col1, @var2 = col2 FROM t231 WHERE col1 = 10;
Snowflake

					-- WARNING: line 1, character 0: Syntax difference: STRING is added as a default return type by SQL Translation, as Snowflake expects a RETURN TYPE for PROCEDURES.
CREATE OR REPLACE PROCEDURE proc1()
RETURNS STRING
LANGUAGE SQL
as
$$
BEGIN
LET var1 INT;
LET var2 INT;
SELECT
  col1 INTO var1
FROM
  t231
WHERE
  col1 = 10;
SELECT
  :var1 + col1 INTO var1
FROM
  t231
WHERE
  col1 = 20;
SELECT
  :var1 + col1,
  col2 INTO var1,
  var2
FROM
  t231
WHERE
  col1 = 10;
END;
$$
;

This is a good example of how different SQL dialects can be. While both dialects provide the ability to select values into a named variable, the way they approach it is different.

Both dialects require you to declare a variable and its type, but how you declare them and how you select into them differs. The way you declare the actual stored procedure differs as well. This is where the SQL Translation tool can be a massive accelerator for your migration.
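As a minimal side-by-side sketch of that difference (variable, table, and column names are illustrative):

```sql
-- MSSQL: DECLARE the variable, then assign via SELECT @var = col
DECLARE @total AS INT;
SELECT @total = col1 FROM t WHERE id = 1;

-- Snowflake Scripting: LET declaration, SELECT ... INTO, and :var references
LET total INT;
SELECT col1 INTO total FROM t WHERE id = 1;
SELECT :total + col1 INTO total FROM t WHERE id = 2;
```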

Toolkit CLI

Outside of our SQL Translation tool, we have also made a number of updates to the CLI version of the Toolkit. The CLI allows you to programmatically leverage tools like SQL Translation as well as other functionality within the Toolkit.

Two of the main tools that have been built within the Toolkit CLI are the Provision tool and the Data Source tool. These are two of the most heavily used tools within the Toolkit and frequently receive updates as a result.

Provision Tool Updates

For those who aren’t familiar with the Provision tool, it gives customers the flexibility to define and apply their own information architecture in a standardized way to Snowflake. You can integrate with popular ticketing systems, such as Jira and ServiceNow, to allow self-service project creation and user access grants within existing enterprise approval processes.

As Snowflake introduces new functionality that customers want to be able to manage within an Infrastructure-as-Code approach, we have to make updates to the Provision tool to support it.  As a result, we’ve added support for managing the Account Resource Monitor.  

This allows you to set an account level maximum credit usage to make sure you stay within your Snowflake budget.  We’ve also added support for provisioning password policies, which define how users are allowed to set passwords, how many attempts they get to enter it correctly, and what happens if they have too many failures in a row.
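Under the hood, these capabilities correspond to standard Snowflake DDL. A hedged sketch of the kinds of objects involved (object names, quotas, and thresholds are illustrative, not Provision tool defaults):

```sql
-- Account-level resource monitor: cap credit usage to stay within budget
CREATE RESOURCE MONITOR account_monitor
  WITH CREDIT_QUOTA = 1000
  TRIGGERS ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;
ALTER ACCOUNT SET RESOURCE_MONITOR = account_monitor;

-- Password policy: length requirements, retry attempts, and lockout behavior
CREATE PASSWORD POLICY strict_passwords
  PASSWORD_MIN_LENGTH = 12
  PASSWORD_MAX_RETRIES = 5          -- failed attempts before lockout
  PASSWORD_LOCKOUT_TIME_MINS = 30;  -- lockout duration after too many failures
```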

Data Source Tool Updates

The Data Source tool is one of the more powerful tools within the Toolkit and has many different use cases. Utilizing simple but powerful commands, you can automate your data platform processes at scale to enable things like:

  • Platform migration validation

  • Platform migration automation

  • Metadata collection and visualization

  • Tracking platform changes over time

  • Data profiling and quality at scale

  • Data pipeline generation and automation

  • dbt project generation

We’re excited to announce that we’ve created a new piece of functionality within the Data Source tool called SQL Collect. This tool allows you to collect the DDL for different object types like tables and views and save the collected SQL to an output folder.  This can then be passed to the SQL Translation tool to migrate your DDL to Snowflake.

Wrapping Up

We hope you’ve gained some additional insight into the phData Toolkit and the new functionality we’ve added to the platform. As we identify common use cases and needs among our customers, we will continue to provide new and improved tooling to accelerate projects and help customers build reliable and observable systems.

If you’re interested in learning more about our Toolkit or getting a demo around the functionality and how it could help your organization, give us a shout!

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2023. 

Be sure to follow this series for more updates on the phData Toolkit tools and features.
