April 5, 2023

Using KNIME’s DB Tools with Snowflake

By John Emery

Toward the end of 2022, Snowflake was in use by nearly half of all Fortune 500 companies and it is expected that this proportion will continue to grow throughout the rest of the decade. 

As one of the premier cloud-based database tools, it sees usage across virtually every industry and vertical. To get the most out of the Snowflake Data Cloud, however, requires extensive knowledge of SQL and dedicated IT and data engineering teams

For many analysts and teams, these skills are either not present or the timelines waiting for other groups to build the tables they need are too lengthy.

What option is there, then?

Using a tool such as KNIME Analytics Platform can be an excellent way for analysts and non-IT teams to connect to Snowflake and utilize data that had previously been unavailable. The great benefit to an analytics engineering tool such as KNIME is that it does not require any SQL or coding knowledge (although it can certainly be helpful).

Throughout the rest of this post, we will discuss how anybody can use KNIME’s database nodes to leverage the power of Snowflake’s engine.

Getting Started with KNIME & Snowflake

When you first download KNIME Analytics Platform and search for Snowflake, you will come away disappointed. To keep download sizes to a minimum, the base installation of KNIME does not include the Snowflake extension

To download the extension, navigate to File → Install KNIME Extensions… and search for “Snowflake” in the search bar. Once you select it, navigate through a few prompts to begin the installation. Once the installation is complete, you may restart KNIME. You will also need to install the KNIME Database extension to obtain the full suite of database nodes.

A screenshot that shows how to Insert Snowflake Extension in KNIME

Once you have installed these two extensions and restarted KNIME you will find dozens of new nodes under the DB folder in the Node Repository. The first step is to connect to your Snowflake instance. If you need help with this, check out our blog post on connecting to Snowflake in KNIME.

Using KNIME’s Database Nodes

Once you have installed the required extensions and established a connection to your Snowflake instance, you can begin leveraging the power of KNIME’s database nodes and the Snowflake engine. 

So, what exactly are KNIME’s database nodes? 

KNIME’s database nodes allow developers to connect to a database—Snowflake in this case—and perform a variety of analytical operations without bringing the data onto your local machine. By keeping the data and all associated operations within the database engine you will use Snowflake’s (likely) much more powerful engine and your workflow’s performance should be significantly better than if you pulled the data into your computer’s memory. 

Let’s look over a list of a few of these database nodes so you can begin to see what operations are available:

  • DB Column Filter:
    This node allows you to select and deselect columns from a given table or query. This is the equivalent of the SELECT portion of a SQL query. The node also includes the ability to select fields based on a regular expression or by data type (strings, integers, etc.)

  • DB GroupBy:
    This node allows you to select fields to group by and then fields which to aggregate. You would use this node if you wanted to sum sales by state, for example. This is the equivalent of a GROUP BY clause in SQL.

  • DB Query:
    This node allows you to write a custom SQL query based on the data that flows into its input port. This is the equivalent of writing a sub-query with SQL. You can also include other tables from your initial database connection with this node for more complex queries. Note that this node requires SQL writing skills.

  • DB Sorter:
    Ths node allows you to sort the table based on one or more fields. This is the equivalent of an ORDER BY clause in SQL.

  • DB Connection Table Writer:
    This node allows you to pass the manipulated table back into the database as a new table.
A screenshot of the KNIME connector in Snowflake

While this only represents a small sample of what KNIME’s database nodes are capable of, it illustrates how you can perform operations using KNIME and Snowflake that would otherwise require a significant level of SQL expertise. 

The performance implications of this method can be substantial. Conducting a large query and pulling that data onto your local machine can be tremendously slow and potentially cause your computer to crash. Performing the same operations and leveraging the Snowflake engine, however, will be considerably faster and won’t bog your system down.

Closing

Here is a recipe for success: one part Snowflake plus one part KNIME, sprinkle in a bit of SQL knowledge and a healthy dose of cleverness and business knowledge, and you can bake an excellent analytical cake.

Did we mention that KNIME Analytics Platform is Free? Any user can go to their website and download it immediately—no strings attached. We have been using KNIME for years now, and still, come away impressed with not only the quantity but also the quality of the nodes and extensions that are simply given away. 

If you’re not using KNIME yet, why not give it a shot? It may just be the boost you need to fully utilize your Snowflake instance.

phData has a team of experts available to assist you in leveraging Snowflake with KNIME!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit