SQLMorph February 2022 Update

Welcome to our latest installment of the SQLMorph blog series! Part of the phData Toolkit, SQLMorph is a free-to-use transpiler that converts code between various combinations of SQL dialects. 

Offered as a Software as a Service (SaaS), SQLMorph is available for use in both bulk transpilations (via the CLI) or for manual usage through the toolkit UI. Depending on the volume of queries that you’re looking to transpile, you may need to contact phData for extended usage.

Let’s dig into the recent changes to the platform.

Log4J

If you don’t know about this vulnerability yet, you can read more about it here.  Essentially, a vulnerability was found within Log4J that could cause Remote Code Execution (RCE). This caused a large ripple within the development community, and SQLMorph was no exception to this.

The surface area of this vulnerability is so large that the Cybersecurity and Infrastructure Security Agency (CISA) published an article regarding it.

There were zero known attacks against SQLMorph, and it has since been patched to ensure this attack vector is blocked.

Microsoft SQL Server

One of the areas we’ve been adding additional functionality to is Microsoft SQL Server (MSSQL) to the Snowflake Data Cloud. As we continue to engage with customers, we see additional use cases for this source-target pair. We’ve added the following features and bug fixes:

  • Truncate table statement
  • Create user statement
  • Fixed translation issue with numeric literals in pivot clause
  • Parse unsupported statements and give error message
				
					Source: 
CREATE INDEX index1 ON schema1.table1 (column1); 
DEALLOCATE @MyCursor;
CLOSE Employee_Cursor;
FETCH NEXT FROM cur1;
    
Target: 
-- ERROR: MSSqlStmtVisitor: line 1, character 0: Snowflake does not support `Create Index` statement.
/* CREATE INDEX index1 ON schema1.table1 (column1); */
-- ERROR: MSSqlStmtVisitor: line 2, character 0: Snowflake does not support `Deallocate` statement.
/* DEALLOCATE @MyCursor; */
-- ERROR: MSSqlStmtVisitor: line 3, character 0: Snowflake does not support `Close` statement.
/* CLOSE Employee_Cursor; */
-- ERROR: MSSqlStmtVisitor: line 4, character 0: Snowflake does not support `Fetch` statement.
/* FETCH NEXT FROM cur1; */


				
			

Hive Family

Another area we’ve been consistently adding functionality to is the Hive Family of SparkSQL. This includes Hive, Impala, Databricks SparkSQL, and SparkSQL. Each of these uses nearly identical SQL dialects and allows us to target a larger number of sources with one transpilation!

We’ve added the following pieces of functionality:

  • Add common table expression (CTE) with insert statement
  • Implement MONTHS_BETWEEN and NULLIFZERO functions

We’ve also added fixes for the following statements:

  • Fixed translation issue with function `unix_timestamp` and `date_sub`
  • Fixed parse issue with  time, option, week, cost, hours, string, system, items, zone, check, end, admin as identifier
  • Fixed `schema.table.*` parse issue
  • Fixed parse issue with explode function
  • Fixed parse issue with “With clause”
  • Fixed identifier parse issue if $ present with alias
  • Fixed Interval expression parse issue

As you can see, not all transpilations are perfect. We regularly try to gather as many test cases as possible for our unit testing. These are more complex usages of SQL from systems like Hive. We continue to iterate and validate edge cases as we see usage.

Here are a couple of Hive Family SQL translations:

				
					Source: 
INSERT OVERWRITE TABLE t1 WITH req AS(SELECT pid FROM t2) select eid from all
    
Target: 
INSERT OVERWRITE INTO t1 WITH req AS (SELECT pid FROM t2)SELECT eid FROM all;


				
			
				
					Source: 
select cast(from_unixtime(unix_timestamp(date_col,"MMddyyyy HH:mm:ss")) as timestamp) as date_col , date_sub(timestamp_col, id) , timestamp_col, id from alltypessmall order by 1  limit 20;
    
Target: 
SELECT TRY_CAST(TO_VARCHAR(TO_VARCHAR(TO_TIMESTAMP_LTZ(DATE_PART(EPOCH_SECOND,TRY_TO_TIMESTAMP(date_col, 'MMDDYYYY HH:MI:SS'))),'yyyy-MM-dd HH:mm:ss.SSSSSS')) AS TIMESTAMP) AS date_col,
            DATEADD(DAY, - (id), timestamp_col), timestamp_col, id FROM alltypessmall ORDER BY 1 LIMIT 20;


				
			
				
					Source: 
select d_id, d_type from d_devices ex1 
                lateral view explode(split((split((split(d_attributes, "\\{")[1]),  "\\}")[0]), ',')) point AS points
    
Target: 
SELECT d_id, d_type, VALUE as points FROM d_devices ex1,
                LATERAL FLATTEN(INPUT = > SPLIT((SPLIT((SPLIT(d_attributes, '\\{') [1]), '\\}') [0]),','));
				
			

phData Toolkit

Over the last few months, we integrated SQLMorph directly into the phData Toolkit. This toolkit is a one-stop shop for all phData SaaS offerings as well as information regarding non-SaaS tooling built and maintained by phData. For those who have historically used SQLMorph from its previously dedicated UI, you can now find it under SQL Translation in the Toolkit.

We will continue to add SaaS solutions to the toolkit as we identify additional tools that benefit our customers.

If you’re using Snowflake for your data warehouse, we urge you to check out the Privilege Audit tool to audit your organization.

Direction Going Forward

Over the last year we’ve had this blog series focused exclusively on the changes we’ve been making to the SQLMorph platform. Now that SQLMorph is included as part of the phData Toolkit, we want to take this opportunity to reflect on how we notify users of changes. 

We believe it’s beneficial to notify users of changes to the phData tooling ecosystem in totality to ensure users stay informed.

As a result of this, we will be modifying this blog series to encompass changes to the phData Toolkit in its entirety as opposed to being SQLMorph specific. This will start in March of 2022.

Accelerate and automate your data projects with the phData Toolkit

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.