Welcome to our August phData Toolkit update! Our wonderful internal engineering team has been hard at work. We’ve been aggregating feedback from you, our clients, and our delivery team to ensure we’re building out new functionality that helps accelerate your data migration project.
Over the last month, we’ve been focusing very heavily on enabling more SQL translation support between Microsoft SQL Server (MSSQL) and the Snowflake Data Cloud. This continues to be a regular part of the data migration projects that we’re executing for our customers. We’re seeing more of these types of migrations with customers and a need for more functionality.
Let’s talk about these changes, what they enable, and what’s coming next!
SQL Translation Changes
MSSQL to Snowflake
Within MSSQL, one of the common translations we perform is stored procedures. Stored procedures are an important part of your database and data warehouse. They allow for you to create reusable parameterized functions, protect against SQL injection attacks, and group different pieces of logic together.
We’ve added support for the following statements to further enable stored procedure SQL translation:
- Cursor FETCH
- Procedural WHILE
- Procedural DECLARE
- DECLARE cursor and OPEN statements
Source:
create procedure proc1 as DECLARE vend_cursor CURSOR FOR
SELECT
*
FROM
Purchasing.Vendor;
OPEN vend_cursor;
Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
LET vend_cursor CURSOR FOR
SELECT
*
FROM
Purchasing.Vendor;
OPEN vend_cursor;
END;
$$;
Source:
CREATE PROCEDURE proc1 AS
WHILE @stud_value <= 5
BEGIN
Select
'Please stop execution!'
END;
Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
WHILE(stud_value <= 5) DO
SELECT
'Please stop execution!';
END
WHILE;
END;
$$;
Source:
CREATE PROCEDURE proc1 AS
FETCH NEXT
FROM
contact_cursor INTO @LastName,
@FirstName;
Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN -- WARNING: line 3, character 6: Equivalent of argument 'NEXT' not found in Snowflake FETCH statement.
FETCH contact_cursor INTO LastName,
FirstName;
END;
$$;
These additions are important because they unlock a set of functionality that is both very common and important within stored procedures. Engineers need the ability to set and reference variables within stored procedures. This enables you to query a table (or set of tables) once for a value or set of values and then reference those values in the following statements.
This makes it so you don’t have to re-run the same query multiple times and increases the performance of the stored procedure and reduces the computing necessary to execute the procedure. This also gives you the ability to process data in chunks instead of having to process the entire result set of a query at once.
We’ve also added the ability to translate try/catch statements. This is a critical piece of functionality that allows engineers to handle failures within a stored procedure. You might perform an insert statement into a table and then a subsequent command fails.
For example, you may have two different tables that you can select data from (maybe as part of a data migration).
Source:
CREATE PROCEDURE proc1 AS
BEGIN TRY
SELECT
*
FROM
NonexistentTable;
END TRY
BEGIN CATCH
SELECT
*
FROM
TableCopy;
END CATCH
Target:
CREATE OR REPLACE PROCEDURE proc1 ()
RETURNS STRING
LANGUAGE SQL as $$
BEGIN
SELECT
*
FROM
NonexistentTable;
EXCEPTION
WHEN OTHER THEN
SELECT
*
FROM
TableCopy;
END;
$$;
We’ve also fixed some bugs in our MSSQL translation:
- Fixed string concatenation
- Include bind values in Javascript procedure translation
- Fixed parsing of = operator in if/else statements
- Removed duplicate variable declarations
- Fixed variable references in queries
- Cast to_char function as Date
- Enclose Snowflake keywords in quotes
Hive Family to Snowflake
The Hive Family SQL dialect (Hive, Impala, Databricks SparkSQL and SparkSQL) is another area we’ve made some changes to within our SQL translation tool.
We’ve now implemented common table expression support for insert statements. This is important as it enables engineers to create a temporary table to reference on subsequent statements. This is a very common practice in tools like dbt.
Source:
WITH cte AS (
SELECT
name,
id
from
employee_stg
where
id = 12
)
INSERT INTO
table employee(name, id)
select
cte.name,
cte.id
from
cte;
Target:
INSERT INTO
employee (name, id) WITH cte AS (
SELECT
name,
id
FROM
employee_stg
WHERE
id = 12
)
SELECT
cte.name,
cte.id
FROM
cte;
We’ve also fixed some bugs within our translations:
- Fixed CREATE TABLE with WITH clause (CTE)
- Fixed TRUNCATE TABLE with PARTITION clause
- Fixed parse issue with alias within single quotes
What’s Next for the Toolkit?
In the coming month, we’re going to continue to iterate on our MSSQL to Snowflake translations. For each of our translations, we have a set of SQL statements that we’ve built up from various sources that we use for testing. We generate coverage metrics for how much of the translation pair we’re able to support based on real-world SQL with a goal of covering over 90 percent of the translations.
This enables us to focus on what’s missing and used within real-world use cases.
phData Toolkit
If you haven’t already explored the phData Toolkit, we highly recommend checking it out!
We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022.
Be sure to follow this series for more updates to the phData Toolkit tools and features.