February 12, 2021

The Four Parts of a Snowflake Query

By Ben Sinchai

Why do you need to know the parts of a Snowflake query?

Running a query in the Snowflake Data Cloud isn’t fundamentally different from running one on other platforms: it takes in SQL and gives you back results. What is different is how Snowflake structures the query process to make it more modular and configurable. While this gives you more flexibility to query your data, it can also produce vastly different query times, and even different query results, depending on the warehouse you choose and the security role you define as part of the query.

Because of this, it’s critical to understand the fundamentals before you can optimize your Snowflake query performance and limit your credit consumption. 

In this article, you will learn the four parts of a Snowflake query, how they unlock greater query flexibility, and how this differs from legacy data warehouses.

To kickstart your journey to the Snowflake Data Cloud, check out this guide for practical tips and guidelines to consider.

The Four Parts of a Snowflake Query

When you log into the Snowflake Data Cloud and want to query data, you will need to select four pieces of context.

Role

Snowflake defines a role as “an entity to which privileges can be granted,” meaning it’s a reusable entity that holds the privileges to access certain objects. As a user, you can assume, or change to, any of the roles that have been granted to your user account. Below, you can see how I can assume either the SYSADMIN or PUBLIC role, both of which are built in. Other common roles would be MARKETING_READ for read-only access to marketing data or ETL_WRITE for system accounts performing ETL operations.
The default role drop-down menu in Snowflake
Learn more about Snowflake Roles and Access Control.
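
In SQL, assuming a role and granting privileges looks like the sketch below. The MARKETING_DB database and the grants to MARKETING_READ are hypothetical examples, not objects that already exist in your account:

    -- Switch to one of the roles granted to your user
    USE ROLE SYSADMIN;

    -- Hypothetical example: give MARKETING_READ read-only access to a schema
    GRANT USAGE ON DATABASE MARKETING_DB TO ROLE MARKETING_READ;
    GRANT USAGE ON SCHEMA MARKETING_DB.PUBLIC TO ROLE MARKETING_READ;
    GRANT SELECT ON ALL TABLES IN SCHEMA MARKETING_DB.PUBLIC TO ROLE MARKETING_READ;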

Warehouse

The next piece of context you will need to select is a warehouse. A warehouse is a set of compute resources. A standard warehouse has a single cluster of resources behind it, but Enterprise Edition accounts can have multi-cluster warehouses, which are used to control user and query concurrency.

A warehouse is technically known as a virtual warehouse because no data is associated with the warehouse itself, only compute resources.

The size of the cluster behind the warehouse is an important tuning parameter for both performance and cost. A simple rule of thumb is that each warehouse size is twice as large as the previous size (and consumes twice the credits), and query performance roughly follows the same scale. And since Snowflake is a true cloud/SaaS offering, you can auto-suspend and auto-resume warehouses so you only pay for compute while it’s running.

Use these options to configure your Snowflake query
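
As a sketch, here is what creating and resizing a hypothetical ANALYTICS_WH warehouse with auto-suspend and auto-resume enabled looks like:

    -- Create a hypothetical warehouse that suspends itself when idle
    CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
      WITH WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60     -- suspend after 60 seconds of inactivity
      AUTO_RESUME = TRUE;   -- wake up automatically when a query arrives

    -- Each size up doubles the compute (and the credit consumption)
    ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'MEDIUM';

    USE WAREHOUSE ANALYTICS_WH;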

Database and Schema

A database belongs to exactly one Snowflake account and contains schemas, while a schema belongs to exactly one database and contains database objects such as tables and views.

In Snowflake, when you want to query, you have to choose both a database and a schema. Together, a database and schema are called a namespace. Unlike Oracle, where a schema is confusingly equivalent to a user, a schema in Snowflake is simply another level for organizing tables.

This means the full coordinates of a table in Snowflake require the database, schema, and table name. That’s slightly more complex than, say, Apache Hive, which has a one-level hierarchy where the terms database and schema are synonymous. However, I prefer the two-level hierarchy of database and schema because it ends up being simpler to use and easier to understand.

A breakdown of Snowflake database, schemas, and tables
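
As a sketch, with hypothetical SALES_DB and REPORTING names, setting the namespace and qualifying a table looks like this:

    -- Set the namespace so unqualified table names resolve against it
    USE DATABASE SALES_DB;
    USE SCHEMA REPORTING;
    SELECT COUNT(*) FROM ORDERS;  -- resolves to SALES_DB.REPORTING.ORDERS

    -- Or skip the context and spell out the full three-part coordinates
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;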

Running a Query

Now that you know what the four pieces of context are, you’re ready to start tuning your Snowflake query performance. Once all four pieces of context are selected, you can run a query from the worksheet.
Running your Snowflake query
It is worth noting that while role and warehouse must be specified in the context to run a query, database and schema can be specified within the query itself.
What the search string in your Snowflake query looks like
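
Everything the worksheet drop-downs do can also be expressed directly in SQL. A minimal sketch, reusing the hypothetical names from the earlier examples:

    USE ROLE SYSADMIN;
    USE WAREHOUSE ANALYTICS_WH;

    -- Database and schema can be set as context...
    USE SCHEMA SALES_DB.REPORTING;
    SELECT COUNT(*) FROM ORDERS;

    -- ...or specified within the query itself via a fully qualified name
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;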

Conclusion

That’s it! Now you understand the basic structure of Snowflake. Go forth and query!

To kickstart your journey to the Snowflake Data Cloud, check out this guide for practical tips and guidelines to consider. 

If you’re looking for additional information on managing your credit consumption and monitoring your Snowflake query performance, check out phData PAMS, a tool for log collection, monitoring, and alert management that we offer as part of our services.

And, as part of our Cloud DataOps service, we also provide Snowflake Monitoring and Dashboard reporting to monitor query, credit, and usage consumption from your Snowflake environment.

Common Snowflake Query Questions

How much does a Snowflake query cost?

Homing in on per-query cost can be an art form due to factors that aren’t related to the query itself, including auto-suspend time, concurrent query usage, and whether or not the query requires a warehouse at all. But the general way of estimating it is to take the query execution time and multiply it by the credit rate of the warehouse. Ex: a 1-minute query on an XL warehouse => (1 minute / 60 minutes) * 16 credits per hour for XL ≈ 0.27 credits; 0.27 credits * a $4 per-credit cost ≈ $1.07.
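
As a rough sketch, the account usage views can approximate this per query. This assumes a $4 per-credit rate, ignores auto-suspend and concurrency, and the size labels or larger sizes may need adjusting for your account:

    -- Approximate per-query cost (sketch: $4/credit, larger sizes omitted)
    SELECT query_id,
           warehouse_size,
           execution_time / 1000 AS execution_seconds,
           CASE warehouse_size
                WHEN 'X-Small' THEN 1
                WHEN 'Small'   THEN 2
                WHEN 'Medium'  THEN 4
                WHEN 'Large'   THEN 8
                WHEN 'X-Large' THEN 16
           END AS credits_per_hour,
           execution_time / 1000 / 3600 * credits_per_hour * 4 AS approx_dollars
    FROM snowflake.account_usage.query_history
    WHERE warehouse_size IS NOT NULL
    ORDER BY start_time DESC;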

Where do Snowflake query results appear?

After executing a query in the worksheet, the results will come back at the bottom of the screen under the “Results” tab. If you are accessing Snowflake via a BI tool or another connection, the results will populate a report, dashboard, or variable for code execution.

How do I view my Snowflake query history?

Clicking the History link at the top of the navigation bar lets you drill into your user account’s query history to view previous executions and results. You can further refine the history by expanding the search parameters to find what you are interested in.
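
History is also queryable in SQL. A minimal sketch using the INFORMATION_SCHEMA table function, which covers roughly the last seven days:

    -- Your most recent queries, newest first
    SELECT query_text, warehouse_name, execution_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER BY start_time DESC
    LIMIT 20;
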
How do I compare query performance between warehouses?

First off, this requires another warehouse to exist, but most organizations have warehouses of almost every size available. To compare the performance of two warehouses, simply execute a query on one warehouse and then execute it on the other, as shown in the sketch below. By analyzing the query history, you will be able to see the execution time, as well as the explain plan, to see how each warehouse acts on the same data.
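
A sketch of that comparison, using hypothetical WH_SMALL and WH_LARGE warehouses and the hypothetical ORDERS table from earlier:

    -- Run the identical query on two warehouses, then compare in History
    USE WAREHOUSE WH_SMALL;
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;

    USE WAREHOUSE WH_LARGE;
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;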

How do I track query performance over time?

This is a more complicated ask because it requires more than looking through the history table. There are queries you can run against the system tables to estimate how performance is trending over time, but the complete solution requires metadata and pipeline tracking to ensure you are comparing the correct query and process.
