August 19, 2022

phData Toolkit August 2022 Update

By Nick Goble

Welcome to our August phData Toolkit update! Our wonderful internal engineering team has been hard at work. We’ve been aggregating feedback from you, our clients, and our delivery team to ensure we’re building out new functionality that helps accelerate your data migration project.

Over the last month, we’ve been focusing very heavily on enabling more SQL translation support between Microsoft SQL Server (MSSQL) and the Snowflake Data Cloud. This continues to be a regular part of the data migration projects that we’re executing for our customers. We’re seeing more of these types of migrations with customers and a need for more functionality.

Let’s talk about these changes, what they enable, and what’s coming next!

SQL Translation Changes

As previously mentioned, our main focus has been on our MSSQL to Snowflake translations.  We have also made some changes to the Hive Family SQL dialect as well.

MSSQL to Snowflake

Within MSSQL, one of the common translations we perform is stored procedures. Stored procedures are an important part of your database and data warehouse. They allow for you to create reusable parameterized functions, protect against SQL injection attacks, and group different pieces of logic together.

We’ve added support for the following statements to further enable stored procedure SQL translation:

  • Cursor FETCH
  • Procedural WHILE
  • Procedural DECLARE
  • DECLARE cursor and OPEN statements
This looks like the following:
				
					Source:
create procedure proc1 as DECLARE vend_cursor CURSOR FOR
SELECT
  *
FROM
  Purchasing.Vendor;
OPEN vend_cursor;

Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
LET vend_cursor CURSOR FOR
SELECT
  *
FROM
  Purchasing.Vendor;
OPEN vend_cursor;
END;
$$;
				
			
				
					Source:
CREATE PROCEDURE proc1 AS
WHILE @stud_value <= 5
BEGIN
Select
  'Please stop execution!'
END;

Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
WHILE(stud_value <= 5) DO
SELECT
  'Please stop execution!';
END
WHILE;
END;
$$;
				
			
				
					Source:
CREATE PROCEDURE proc1 AS
FETCH NEXT
FROM
  contact_cursor INTO @LastName,
  @FirstName;

Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN -- WARNING: line 3, character 6: Equivalent of argument 'NEXT' not found in Snowflake FETCH statement.
FETCH contact_cursor INTO LastName,
FirstName;
END;
$$;


				
			

These additions are important because they unlock a set of functionality that is both very common and important within stored procedures. Engineers need the ability to set and reference variables within stored procedures. This enables you to query a table (or set of tables) once for a value or set of values and then reference those values in the following statements. 

This makes it so you don’t have to re-run the same query multiple times and increases the performance of the stored procedure and reduces the computing necessary to execute the procedure. This also gives you the ability to process data in chunks instead of having to process the entire result set of a query at once.

We’ve also added the ability to translate try/catch statements. This is a critical piece of functionality that allows engineers to handle failures within a stored procedure. You might perform an insert statement into a table and then a subsequent command fails.  

For example, you may have two different tables that you can select data from (maybe as part of a data migration). 

				
					Source:
CREATE PROCEDURE proc1 AS
BEGIN TRY
SELECT
  *
FROM
  NonexistentTable;
END TRY
BEGIN CATCH
SELECT
  *
FROM
  TableCopy;
END CATCH

Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
SELECT
  *
FROM
  NonexistentTable;
EXCEPTION
WHEN OTHER THEN
SELECT
  *
FROM
  TableCopy;
END;
$$;
				
			

We’ve also fixed some bugs in our MSSQL translation:

  • Fixed string concatenation
  • Include bind values in Javascript procedure translation
  • Fixed parsing of = operator in if/else statements
  • Removed duplicate variable declarations
  • Fixed variable references in queries
  • Cast to_char function as Date
  • Enclose Snowflake keywords in quotes

Hive Family to Snowflake

The Hive Family SQL dialect (Hive, Impala, Databricks SparkSQL and SparkSQL) is another area we’ve made some changes to within our SQL translation tool.

We’ve now implemented common table expression support for insert statements. This is important as it enables engineers to create a temporary table to reference on subsequent statements. This is a very common practice in tools like dbt.

				
					Source:
WITH cte AS (
  SELECT
    name,
    id
  from
    employee_stg
  where
    id = 12
)
INSERT INTO
  table employee(name, id)
select
  cte.name,
  cte.id
from
  cte;

Target:
INSERT INTO
  employee (name, id) WITH cte AS (
    SELECT
      name,
      id
    FROM
      employee_stg
    WHERE
      id = 12
  )
SELECT
  cte.name,
  cte.id
FROM
  cte;


				
			

We’ve also fixed some bugs within our translations:

  • Fixed CREATE TABLE with WITH clause (CTE)
  • Fixed TRUNCATE TABLE with PARTITION clause
  • Fixed parse issue with alias within single quotes

What’s Next for the Toolkit?

In the coming month, we’re going to continue to iterate on our MSSQL to Snowflake translations. For each of our translations, we have a set of SQL statements that we’ve built up from various sources that we use for testing. We generate coverage metrics for how much of the translation pair we’re able to support based on real-world SQL with a goal of covering over 90 percent of the translations.

This enables us to focus on what’s missing and used within real-world use cases.

phData Toolkit

If you haven’t already explored the phData Toolkit, we highly recommend checking it out!

We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022. 

Be sure to follow this series for more updates to the phData Toolkit tools and features. 

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit