February 12, 2021

The Four Parts of a Snowflake Query

By Ben Sinchai

Why do you need to know the parts of a Snowflake query?

Running a query in the Snowflake Data Cloud isn’t fundamentally different from running one on other platforms: it takes in SQL and gives you back results. What is different is how Snowflake structures the query process to make it more modular and configurable. While this gives you more flexibility to query your data, it can also produce vastly different query times, and even different query results, depending on the warehouse you choose and the security role you define as part of the query.

Because of this, it’s critical to understand the fundamentals before you can optimize your Snowflake query performance and limit your credit consumption. 

In this article, you will learn the four parts of a Snowflake query, how they unlock greater query flexibility, and how this differs from legacy data warehouses.

To kickstart your journey to the Snowflake Data Cloud, check out this guide for practical tips and guidelines to consider.

The Four Parts of a Snowflake Query

When you log into the Snowflake Data Cloud and want to query data, you will need to select four pieces of context.

Role

Snowflake defines a role as “an entity to which privileges can be granted,” meaning it’s a reusable entity that holds the privileges to access certain objects. As a user, you can assume, or change to, any of the roles that have been granted to your user account. Below, you can see how I can assume either the SYSADMIN or PUBLIC role, both of which are built in. Other common roles would be MARKETING_READ for read-only access to marketing data or ETL_WRITE for system accounts performing ETL operations.
The default role drop-down menu in Snowflake
Learn more about Snowflake Roles and Access Control.
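
In SQL, assuming a role and granting privileges looks like the sketch below. The MARKETING_DB database and the grants to MARKETING_READ are hypothetical examples, not objects that already exist in your account:

    -- Switch to one of the roles granted to your user
    USE ROLE SYSADMIN;

    -- Hypothetical example: give MARKETING_READ read-only access to a schema
    GRANT USAGE ON DATABASE MARKETING_DB TO ROLE MARKETING_READ;
    GRANT USAGE ON SCHEMA MARKETING_DB.PUBLIC TO ROLE MARKETING_READ;
    GRANT SELECT ON ALL TABLES IN SCHEMA MARKETING_DB.PUBLIC TO ROLE MARKETING_READ;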

Warehouse

The next piece of context you will need to select is a warehouse. A warehouse is a set of compute resources. A standard warehouse has a single cluster of resources behind it, but Enterprise Edition accounts can have multi-cluster warehouses, which are used to control user and query concurrency.

A warehouse is technically known as a virtual warehouse because no data is associated with the warehouse itself, only compute resources.

The size of the cluster behind the warehouse is an important tuning parameter for both performance and cost. A simple rule of thumb is that each warehouse size is twice as large as the previous size (and consumes twice the credits), and query performance roughly follows the same scale. And since Snowflake is a true cloud/SaaS offering, you can auto-suspend and auto-resume warehouses so you only pay for compute while it’s running.

Use these options to configure your Snowflake query
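
As a sketch, here is what creating and resizing a hypothetical ANALYTICS_WH warehouse with auto-suspend and auto-resume enabled looks like:

    -- Create a hypothetical warehouse that suspends itself when idle
    CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
      WITH WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60     -- suspend after 60 seconds of inactivity
      AUTO_RESUME = TRUE;   -- wake up automatically when a query arrives

    -- Each size up doubles the compute (and the credit consumption)
    ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'MEDIUM';

    USE WAREHOUSE ANALYTICS_WH;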

Database and Schema

A database belongs to exactly one Snowflake account and contains schemas, while a schema belongs to exactly one database and contains database objects such as tables and views.

In Snowflake, when you want to query, you have to choose both a database and a schema. Together, a database and schema are called a namespace. Unlike Oracle, where a schema is confusingly equivalent to a user, a schema in Snowflake is simply another level for organizing tables.

This means the full coordinates of a table in Snowflake require the database, schema, and table name. That’s slightly more complex than, say, Apache Hive, which has a one-level hierarchy where the terms database and schema are synonymous. However, I prefer the two-level hierarchy of database and schema because it ends up being simpler to use and easier to understand.

A breakdown of Snowflake database, schemas, and tables
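
As a sketch, with hypothetical SALES_DB and REPORTING names, setting the namespace and qualifying a table looks like this:

    -- Set the namespace so unqualified table names resolve against it
    USE DATABASE SALES_DB;
    USE SCHEMA REPORTING;
    SELECT COUNT(*) FROM ORDERS;  -- resolves to SALES_DB.REPORTING.ORDERS

    -- Or skip the context and spell out the full three-part coordinates
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;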

Running a Query

Now that you know what the four pieces of context are, you’re ready to start tuning your Snowflake query performance. Once all four pieces of context are selected, you can run a query from the worksheet.
Running your Snowflake query
It is worth noting that while role and warehouse must be specified in the context to run a query, database and schema can be specified within the query itself.
What the search string in your Snowflake query looks like
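
Everything the worksheet drop-downs do can also be expressed directly in SQL. A minimal sketch, reusing the hypothetical names from the earlier examples:

    USE ROLE SYSADMIN;
    USE WAREHOUSE ANALYTICS_WH;

    -- Database and schema can be set as context...
    USE SCHEMA SALES_DB.REPORTING;
    SELECT COUNT(*) FROM ORDERS;

    -- ...or specified within the query itself via a fully qualified name
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;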

Conclusion

That’s it! Now you understand the basic structure of Snowflake. Go forth and query!

To kickstart your journey to the Snowflake Data Cloud, check out this guide for practical tips and guidelines to consider. 

If you’re looking for additional information on managing your credit consumption and monitoring your Snowflake query performance, check out phData PAMS, a tool for log collection, monitoring, and alert management that we offer as part of our services.

And, as part of our Cloud DataOps service, we also provide Snowflake Monitoring and Dashboard reporting to monitor query, credit, and usage consumption from your Snowflake environment.

Common Snowflake Query Questions

How much does a Snowflake query cost?

Homing in on per-query cost can be an art form due to factors that aren’t related to the query itself, including auto-suspend time, concurrent query usage, and whether or not the query requires a warehouse at all. But the general way of estimating it is to take the query execution time and multiply it by the credit rate of the warehouse. Ex: a 1-minute query on an XL warehouse => (1 minute / 60 minutes) * 16 credits per hour for XL ≈ 0.27 credits; 0.27 credits * a $4 per-credit cost ≈ $1.07.
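
As a rough sketch, the account usage views can approximate this per query. This assumes a $4 per-credit rate, ignores auto-suspend and concurrency, and the size labels or larger sizes may need adjusting for your account:

    -- Approximate per-query cost (sketch: $4/credit, larger sizes omitted)
    SELECT query_id,
           warehouse_size,
           execution_time / 1000 AS execution_seconds,
           CASE warehouse_size
                WHEN 'X-Small' THEN 1
                WHEN 'Small'   THEN 2
                WHEN 'Medium'  THEN 4
                WHEN 'Large'   THEN 8
                WHEN 'X-Large' THEN 16
           END AS credits_per_hour,
           execution_time / 1000 / 3600 * credits_per_hour * 4 AS approx_dollars
    FROM snowflake.account_usage.query_history
    WHERE warehouse_size IS NOT NULL
    ORDER BY start_time DESC;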

Where do Snowflake query results appear?

After executing a query in the worksheet, the results will come back at the bottom of the screen under the “Results” tab. If you are accessing Snowflake via a BI tool or another connection, the results will populate a report, dashboard, or variable for code execution.

How do I view my Snowflake query history?

Clicking the History link at the top of the navigation bar lets you drill into your user account’s query history to view previous executions and results. You can further refine the history by expanding the search parameters to find what you are interested in.
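
History is also queryable in SQL. A minimal sketch using the INFORMATION_SCHEMA table function, which covers roughly the last seven days:

    -- Your most recent queries, newest first
    SELECT query_text, warehouse_name, execution_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER BY start_time DESC
    LIMIT 20;
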
How do I compare query performance between warehouses?

First off, this requires another warehouse to exist, but most organizations have warehouses of almost every size available. To compare the performance of two warehouses, simply execute a query on one warehouse and then execute it on the other, as shown in the sketch below. By analyzing the query history, you will be able to see the execution time, as well as the explain plan, to see how each warehouse acts on the same data.
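
A sketch of that comparison, using hypothetical WH_SMALL and WH_LARGE warehouses and the hypothetical ORDERS table from earlier:

    -- Run the identical query on two warehouses, then compare in History
    USE WAREHOUSE WH_SMALL;
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;

    USE WAREHOUSE WH_LARGE;
    SELECT COUNT(*) FROM SALES_DB.REPORTING.ORDERS;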

How do I track query performance over time?

This is a more complicated ask because it requires more than looking through the history table. There are queries you can run against the system tables to estimate how performance is trending over time, but the complete solution requires metadata and pipeline tracking to ensure you are comparing the correct query and process.
