Running a query in the Snowflake Data Cloud isn’t fundamentally different from other platforms in the fact it takes in SQL and gives you back results. What is different is how Snowflake goes about the query process to make it more modular and configurable. While this gives you more options to query your data, it can also cause vastly different query times and even query results based on the warehouse you choose and the security role you are running as.
Because of this, it’s critical to understand the fundamentals before you can optimize your Snowflake query performance and limit your credit consumption: the four parts of the query process.
For best practices on launching your own Snowflake instance and reducing credit consumption, check out this guide.
The Four Parts of a Snowflake Query
When logging into the Snowflake Data Cloud and attempting to query data, you will need to select four pieces of context to do so.
A role is “An entity to which privileges can be granted” which means it’s a re-usable entity that is granted privileges to access certain things. As a user, you can assume or change to various roles that have been granted to your user account.
See below how I can assume either the SYSADMIN or the PUBLIC roles, both of which are built-in. Other common roles would be MARKETING_READ for read-only access to marketing data or ETL_WRITE for system accounts performing ETL operations.
Learn more about Snowflake Roles and Access Control.
The next piece of context you will need to select is a warehouse. A warehouse is a set of compute resources. Default warehouses have one cluster of resources behind a warehouse, but Enterprise Edition accounts can have multi-cluster warehouses. Multi-cluster warehouses are used to control user and query concurrency.
A warehouse is technically known as a virtual warehouse because no data is associated with the warehouse itself, only compute resources.
The size of the cluster behind the warehouse is an important tuning parameter for both performance and cost. Of course, since Snowflake is truly a Cloud/SaaS offering you can auto-suspend and auto-resume warehouses.
Database and Schema
A database belongs to exactly one Snowflake account and contains schemas. While a schema belongs to exactly one database and contains database objects such as tables, views, etc.
In Snowflake when you want to query, you have to choose both a Database and Schema. Together a database and schema are called a namespace. Unlike Oracle where a schema is confusingly equal to a user, a schema is simply another level of organizing tables.
This means the full coordinates to a table in snowflake require the database, schema, and table name. That’s slightly more complex than, say, Apache Hive which has a one-level hierarchy where the term database and schema are synonymous. However, I prefer the two-level hierarchy of database and schema because it ends up being simpler and easy to use.
Running a Query
Now that you know what the four pieces of context are, you’re ready to maximize your Snowflake query performance. After all pieces of context are selected, you can run a query from the worksheet.
It is worth noting that while role and warehouse must be specified in the context to run a query, database and schema can be specified within the query itself.
That’s it! Now you understand the basic structure of Snowflake. Go forth and query! And keep an eye out for a follow-up piece on Snowflake queries with a bigger focus on monitoring credit consumption and optimization.
If you’re looking for additional information on managing your credit consumption and monitoring your Snowflake query performance, check out phData PAMs. This is a tool for log collecting, monitoring, and alert management that we offer as part of our services.
And, as part of our Cloud DataOps service, we also provide Snowflake Monitoring and Dashboard reporting to monitor query, credit, and usage consumption from your Snowflake environment.