I often get asked by new SQL writers, “I know the absolute basics. I can get by. But what can I learn next to improve?”
My answer to that question is always Sub-Queries.
Now if you are a seasoned IT professional, SQL writer, or programmer, this concept probably comes naturally to you at this point. Most programming tools have a concept of a sub-query, subroutine, or custom functions. These are similar enough to a sub-query in SQL that using it will come naturally when picking up SQL.
However, more and more across enterprises it is an expectation that business users use SQL. If you don’t come from that IT or programming background, sub-queries can be a daunting conceptual task to learn.
What are the Basics of SQL Query Writing?
First, let’s level set with what are the basics of SQL query writing. If you know and can implement the following clauses in SQL, you know the basics:
- GROUP BY
- ORDER BY
If you are familiar with these commands, you are ready to take the next step and learn about sub-queries. If you don’t, we recommend that you take the SQL on Snowflake Foundations and Practitioner courses on Data Coach.
What is a Subquery?
A subquery is a query that appears inside another query. We often refer to this as being nested: the child query is nested within the parent query. It’s essentially a little program, embedded within a larger program.
The key is that the subquery (or nested query, or embedded query) is executed first. The results of this initial little program are fed into as inputs to the larger overall query.
Why Use Sub-Queries?
So, why do we use subqueries? It’s really simple. It avoids the need to run multiple separate queries. We can have a single query that gets the results that we need.
How to Use Sub-Queries in Snowflake
There are two main commands that we use for sub-queries in most SQL tools, the Snowflake Data Cloud included. They are the IN and WITH commands. Let’s take a look at an example using IN.
For this example, assume that I have a simple customer sales table.
I want to return the customer’s name who has the maximum sales number and only those customers. Assume I don’t know anything about the data.
Now, I could do this with two queries.
Query 1 –
SELECT MAX(SALES) FROM CUSTOMER;
This would return 200.
Query 2 –
SELECT CUSTOMER_NAME FROM CUSTOMER WHERE SALES = 200;
Now this gets me to the right answer. But I had to use two queries. I can consolidate this into one query using sub-queries. There are two ways to do this in Snowflake. Let’s start with the IN command.
Sub-Query Version –
SELECT CUSTOMER_NAME FROM CUSTOMER WHERE SALES IN (SELECT MAX(SALES) FROM CUSTOMER);
Note that we have an entire query within the WHERE clause. Remember to put this query in parentheses. Note that when using the IN statement, the sub-query needs to return only one field.
We have just gone through a real quick example of the most simple type of sub-queries you can use within Snowflake using SQL. To learn more about how to use sub-queries, take our SQL on Snowflake Calculation Specialist course on Data Coach.
You will need to know some SQL. Either finished the SQL on Snowflake Practitioner Data Coach course as a prerequisite or know all of the basic SQL commands (SELECT, FROM, WHERE, GROUP BY, ORDER BY, HAVING).
SQL on Snowflake focuses on how business users can query data artifacts within their organization themselves, without the need of going through IT or data engineering. It focuses on how to extract data from within a enterprise data warehouse, help generic insights for answering business questions. It focuses on the query commands needed by every business-side SQL practitioner to be successful with SQL in any system, but focuses specifically on the best-in-class system: Snowflake.
SnowSQL – as snowflake defines it – is not super applicable for this course. That is more of a data engineering topic. The audience for this course is business users.