As the 21st century has progressed, the use and storage of data have become increasingly important to businesses across virtually every industry. Companies that fail to leverage their data are left behind while ambitious upstarts displace them.
Additionally, as data has grown in volume and velocity, cloud data platforms have become the norm. Services such as the Snowflake Data Cloud can house massive amounts of data and allow users to write queries that rapidly transform raw data into reports and further analyses.
However, many analysts and other data professionals run into two common problems:
- They are not given direct access to their database
- They lack the skills in SQL to write the queries themselves
The traditional solution to these problems is to rely on IT and data engineering teams. Unfortunately, this solution can be slow and fraught with inaccuracies in the final output. What can be done? One solution is leveraging KNIME, a free low-code, drag-and-drop analytics engineering tool.
What is KNIME?
The KNIME Analytics Platform is a free and open-source analytics engineering platform that allows users to create workflows to solve virtually any analytics challenge. Don’t let the “free” price tag fool you: KNIME Analytics Platform is a profoundly robust solution. Developers can leverage nodes for geospatial analysis, imagery, audio processing, connecting to APIs, coding in Python and Java, and much, much more.
In addition to KNIME Analytics Platform, there is a paid service called KNIME Business Hub which allows teams to publish, share, govern, and schedule workflows.
Rather than connecting to Snowflake, pulling one or more tables locally onto your machine and then doing whatever processing is necessary, KNIME provides a suite of nodes for processing in-database. By simply passing along commands to your Snowflake instance, you simultaneously leverage the power of your Snowflake warehouse and KNIME’s innovative and easy-to-use nodes.
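The difference is easiest to see in code. The sketch below uses Python's built-in sqlite3 module as a stand-in for a Snowflake connection (the table and column names are invented for illustration); the point is the same one KNIME's DB nodes exploit: the aggregation runs inside the database engine, and only the small result set travels back to the client.

```python
import sqlite3

# sqlite3 stands in for a Snowflake connection so this sketch is
# runnable without a Snowflake account; the principle is identical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("East", 250.0), ("West", 75.0)],
)

# In-database processing: the GROUP BY executes in the database engine,
# and only the aggregated rows are fetched by the client.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 350.0), ('West', 75.0)]
conn.close()
```

With a genuinely large table, the difference between fetching three aggregated rows and fetching millions of raw rows is exactly the savings the in-database nodes provide.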
For somebody who cannot access their database directly or who lacks expert-level SQL skills, this provides a significant advantage. Let’s look at a few examples of KNIME’s database nodes; to read through the full list, check out KNIME’s database documentation.
5 Database Node Examples
- DB Concatenate: Concatenates two database tables; equivalent to a UNION ALL in SQL (UNION alone would also remove duplicate rows).
- DB Insert: Inserts rows into a selected database table; equivalent to an INSERT statement in SQL.
- DB Partitioning: Splits rows from a database table into two subsets. The node offers different ways to split the given table. This can be especially useful when building predictive models for training and testing sets.
- DB Query: Runs a custom query against the database connection. The query can be updated dynamically through the use of flow variables and loops.
- DB Table Creator: Allows a user to create a new database table based on a KNIME input table.
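Under the hood, nodes like these issue standard SQL against the connected database. A rough sketch of what a few of them correspond to, again using sqlite3 as a runnable stand-in for Snowflake (the quarterly sales tables are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DB Table Creator: create new tables with a known schema.
cur.execute("CREATE TABLE q1_sales (product TEXT, amount REAL)")
cur.execute("CREATE TABLE q2_sales (product TEXT, amount REAL)")

# DB Insert: append rows to existing tables.
cur.executemany("INSERT INTO q1_sales VALUES (?, ?)", [("widget", 10.0)])
cur.executemany("INSERT INTO q2_sales VALUES (?, ?)", [("gadget", 20.0)])

# DB Concatenate: stack two compatible tables, like UNION ALL.
rows = cur.execute(
    "SELECT * FROM q1_sales UNION ALL SELECT * FROM q2_sales"
    " ORDER BY amount"
).fetchall()
print(rows)  # [('widget', 10.0), ('gadget', 20.0)]
conn.close()
```

In KNIME you would wire these same steps together as nodes rather than writing the SQL yourself.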
Best Practices For Using Snowflake in KNIME
At this point, we hope it is apparent that leveraging KNIME with your Snowflake database can be a powerful combination. Before we leave you, let’s talk about some best practices to employ so you can fully utilize these valuable tools.
- Use KNIME database tools.
Pulling a very large table down from Snowflake onto your machine can take a long time and consume a massive portion of your computer’s resources. It is very likely that your Snowflake warehouse can provide more processing power than your personal machine. Between nodes that perform a predefined action (sorting, grouping, pivoting) and those that allow you to write custom SQL, the sky’s the limit for what can be accomplished.
- Only use the data you need.
Rather than querying an entire table, consider using a row filter and a column filter to remove any unnecessary rows and columns.
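In SQL terms, a row filter and a column filter combine into a single SELECT that names only the columns you need and restricts the rows with a WHERE clause. A minimal sketch, with an invented orders table and sqlite3 standing in for Snowflake:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "East", 100.0, "rush"), (2, "West", 50.0, ""), (3, "East", 75.0, "")],
)

# Column filter (SELECT id, amount) plus row filter (WHERE region = ...):
# only the data you actually need ever leaves the database.
rows = conn.execute(
    "SELECT id, amount FROM orders WHERE region = 'East' ORDER BY id"
).fetchall()
print(rows)  # [(1, 100.0), (3, 75.0)]
conn.close()
```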
- Understand data types.
In an ideal world, all columns in a database table would be set correctly. Text columns would be strings, whole numbers would be integers, and so on. Of course, like Candide, we don’t live in the best of all possible worlds. KNIME’s DB Type Mapper node allows users to update any columns that may not be mapped appropriately.
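The classic symptom is numbers stored as text, which silently breaks arithmetic. In plain SQL the fix is a CAST, which is roughly what a corrected type mapping accomplishes; a sketch with an invented events table, using sqlite3 as the stand-in database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A common mishap: a numeric column landed in the database as text.
conn.execute("CREATE TABLE raw_events (user_id TEXT, clicks TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)", [("u1", "3"), ("u2", "7")]
)

# CAST corrects the type inside the query, so aggregation works as
# intended -- the same problem DB Type Mapper solves inside KNIME.
total = conn.execute(
    "SELECT SUM(CAST(clicks AS INTEGER)) FROM raw_events"
).fetchone()[0]
print(total)  # 10
conn.close()
```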
- Close your connection when you are done.
Leaving database connections open can hinder the performance of the database for you and for other users and services that access it. At the end of your workflow, use the DB Connection Closer node to ensure the connection does not stay open unnecessarily.
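The same discipline applies when scripting against a database directly. In Python, wrapping the connection in contextlib.closing guarantees it is closed even if an error occurs mid-workflow, which is the guarantee the DB Connection Closer node gives you at the end of a KNIME workflow (sqlite3 again stands in for Snowflake):

```python
import sqlite3
from contextlib import closing

# closing() ensures conn.close() runs even if a query raises an error.
# (Note: sqlite3's own context manager manages transactions, not closing.)
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    result = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# The connection is guaranteed closed here.
print(result)  # 1
```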
Snowflake is one of the biggest and most prominent entities in the cloud data space. However, fully utilizing such a great tool can be a challenge for many teams. By skilling up with a tool like KNIME Analytics Platform, you can speed up the analytics process dramatically. We encourage you to browse our posts on Snowflake and KNIME to learn more!
Have questions about KNIME, Snowflake, or how they can work together to solve your analytics challenges?