March 1, 2021

Intro to SQLMorph Series

By Nick Goble

Here at phData, we strive to build automation for mundane tasks whenever possible.  This allows us to focus on solving the more complex and challenging problems that our customers face.

Over the years, we have built a few different tools (CloudFoundation, Tram, and PAMS to name a few) that increase productivity, security, and development/deployment of data engineering pipelines and processes.  These tools continue to evolve as new functionality comes out from phData partners and as these tools are integrated into customer systems.

One tool that we have built, called SQLMorph, continues to get increased usage.  This free-to-use tool is a transpiler built to convert SQL statements from source to target.  By using this tool either through the UI or published Python script, it allows you to quickly translate your existing SQL statements into their destination.  This lets you spend more time on complex topics such as establishing data governance practices.

As we continue to iterate on SQLMorph, there has become a need to broadcast to our existing and potential customers the additional features and functionality we have added.  This blog series aims to do just that.

What is SQLMorph?

For those who are new to SQLMorph, it’s worth explaining what SQLMorph is and the complexity that it aims to solve.

Many enterprises are still running an on-premises database or data warehouse,  and are looking to transition to a cloud based system.  There are many different providers in the cloud data warehouse space, however we’ve noticed many customers moving to the Snowflake Data Cloud.  As such, SQLMorph currently is heavily focused on Snowflake as a target transpilation.

While many providers follow the ANSI SQL standard, almost every provider extends this standard, or compliments it with additional functionality and syntax.  Just like programming languages, everybody has an opinion of a better way to do things.

Examples

For example, let’s say a customer is looking to migrate from Oracle to Snowflake.  They have some SQL that selects data out of their database:

Source Dialect:  Oracle

				
					select
      d.department_name,
      v.employee_id,
      v.last_name
    from
      departments d
      cross apply (
        select
          *
        from
          employees e
        where
          e.department_id = d.department_id
      ) v
    where
      d.department_name in ('Marketing', 'Operations', 'Public Relations')
    order by
      d.department_name,
      v.employee_id
				
			
Target Dialect: Snowflake
				
					SELECT
  d.department_name,
  v.employee_id,
  v.last_name
FROM
  departments d
  CROSS JOIN (
    SELECT
      *
    FROM
      employees e
    WHERE
      e.department_id = d.department_id
  ) v
WHERE
  d.department_name IN ('Marketing', 'Operations', 'Public Relations')
ORDER BY
  d.department_name,
  v.employee_id;
				
			

While this is a simple translation of Oracle’s Cross Apply to Snowflake’s Cross Join, transpilations can get much more complicated.  For example:

Source Dialect: Oracle

				
					PROCEDURE for_loop_cursor_update_insert_example (GROUP_SEQ NUMBER, TRANS_SEQ NUMBER, ID NUMBER,TRANS_ID NUMBER, TRANS_SUM NUMBER, UNIT VARCHAR(6), DEBET NUMBER, CREDIT NUMBER) IS
BEGIN
PP := 1;
FOR v_rec in (SELECT unit from xxx)
LOOP
   FOR d_rec in (SELECT ID, TRANS_SUM, UNIT from yyy where unit = v_rec.unit )
   LOOP

     UPDATE YYY_SEQUENCE 
       SET SEQ_COUNT = SEQ_COUNT + 1
       WHERE SEQ_NAME = 'GROUP_SEQ'
       COMMIT;

      INSERT INTO AAA (ID, SEQ) VALUES (ID, GROUP_SEQ);
      PP := 1;
      WHILE PP <= 4
      LOOP
         PP := PP +1;

     UPDATE YYY_SEQUENCE 
       SET SEQ_COUNT = SEQ_COUNT + 1
       WHERE SEQ_NAME = 'GROUP_SEQ'
       COMMIT;

         IF (PP < 3) THEN
             DEBET := TRANS_SUM;
             CREDIT := 0;
         ELSE             
             DEBET := 0;
             CREDIT := TRANS_SUM;
         END IF;

         INSERT INTO BBB(ID, SEQ, UNIT, DEBET, CREDIT) VALUES (ID, TRANS_SEQ, UNIT, DEBET, CREDIT);
         COMMIT;
      END LOOP;
   END LOOP;
END LOOP;
-- from https://stackoverflow.com/a/61185793/14042808
END;
				
			
Target Dialect: Snowflake
				
					-- INFO: Comments from translated statement. Future work will list them closer to their original location.
-- Comment(s) from line 38, character 0
-- from https://stackoverflow.com/a/61185793/14042808
create or replace procedure for_loop_cursor_update_insert_example ("GROUP_SEQ" FLOAT, "TRANS_SEQ" FLOAT, "ID" FLOAT, "TRANS_ID" FLOAT, "TRANS_SUM" FLOAT, "UNIT" VARCHAR(6), "DEBET" FLOAT, "CREDIT" FLOAT) 
returns string 
language javascript
as
$$
var PP = 1;

var cursor1 = snowflake.createStatement({
    sqlText: "SELECT unit FROM xxx;"
});
var cursor1Result = cursor1.execute();
while (cursor1Result.next()) {
    var cursor1 = snowflake.createStatement({
        sqlText: "SELECT ID, TRANS_SUM, UNIT FROM yyy WHERE unit = v_rec.unit;"
    });
    var cursor1Result = cursor1.execute();
    while (cursor1Result.next()) {
        var loopResult = snowflake.execute({
            sqlText: "UPDATE YYY_SEQUENCE SET SEQ_COUNT = SEQ_COUNT + 1 WHERE SEQ_NAME = 'GROUP_SEQ';"
        });
        COMMIT;
        var loopResult = snowflake.execute({
            sqlText: "INSERT INTO AAA (ID, SEQ) VALUES (ID, GROUP_SEQ);"
        });
        var PP = 1;
        while (PP <= 4) {
            var PP = PP + 1;
            var sql_stmt = snowflake.createStatement({
                sqlText: "UPDATE YYY_SEQUENCE SET SEQ_COUNT = SEQ_COUNT + 1 WHERE SEQ_NAME = 'GROUP_SEQ';"
            });
            var sql_result = sql_stmt.execute();
            var sql_stmt = snowflake.createStatement({
                sqlText: "COMMIT;"
            });
            var sql_result = sql_stmt.execute();
            if ((PP < 3)) {
                var DEBET = TRANS_SUM;
                var CREDIT = 0;
            } else {
                var DEBET = 0;
                var CREDIT = TRANS_SUM;
            }
            var sql_stmt = snowflake.createStatement({
                sqlText: "INSERT INTO BBB (ID, SEQ, UNIT, DEBET, CREDIT) VALUES (ID, TRANS_SEQ, UNIT, DEBET, CREDIT);"
            });
            var sql_result = sql_stmt.execute();
            var sql_stmt = snowflake.createStatement({
                sqlText: "COMMIT;"
            });
            var sql_result = sql_stmt.execute();
        }
    }
}
$$
;
				
			
As you can see, this is much more complicated to try to convert manually.  While these are fairly simple SQL statements, it’s very common for enterprises to have stored procedures and queries that are thousands of lines of code.

New SQLMorph Features

As previously mentioned, the goal of this blog series is to announce new features to SQLMorph.  These features allow us to continue to automate and abstract differences between SQL syntaxes for customers migrating between providers.

In the last month, we’ve added and made adjustments to the following language specific transpilations:

Oracle to Snowflake

  • Update non-reserved keyword list based on Oracle document. Such as SORT, ENABLE, etc.
  • Translate CONSTRAINT state eg INITIALLY DEFERRED
  • PL/SQL – parse procedure calls without parentheses
  • PL/SQL – fixed bug where SQL in EXECUTE IMMEDIATE wasn’t translated correctly

Impala to Snowflake

  • CAST is now mapped to TRY_CAST due to source CAST behavior
    From: SELECT CAST(I AS INT)
    To: SELECT TRY_CAST(TO_VARCHAR(I) AS INT)
  • String literals using Double Quotes are now mapped to single quotes
    From: SELECT LOWER(“ABC”)
    To: SELECT LOWER(‘ABC’)
  • Snowflake now supports CONCAT_WS and therefore we removed recursive calls to CONCAT
    Impala: SELECT CONCAT_WS(‘A’,’B’,’C’,’D’)
    Was re-written as: SELECT CONCAT(CONCAT(CONCAT(‘A’, ‘B’), ‘C’), ‘D’)
    Now: SELECT CONCAT_WS(‘A’,’B’,’C’,’D’)

Microsoft SQL Server to Snowflake

  • Implemented delimited identifiers
    From: SELECT [Exchange Rate (Value In USD)]
    To: SELECT “Exchange Rate (Value In USD)”
  • Fix alias in front of columnName
    From: SELECT ’empID’ = EMP_ID
    To: SELECT EMP_ID empID
  • Implement CASE expression
  • Add ON PRIMARY in create table statement


We’ve also made some modifications to the command line interface (CLI) that we provide for SQLMorph.  This allows you to bulk translate SQL files rather than manually going through them one at a time in the user interface (UI).

  • Ability to choose the target type for date in Oracle to Snowflake
  • Ability to Support server side configs

SQLMorph is Now SQL Translation

Fantastic news! phData combined all of its tools and apps into a unified interface called the phData Toolkit. With the Toolkit, you’ll have a centralized place to utilize all of your favorite phData tools including SQLMorph, now called SQL Translation.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit