The Snowflake Data Cloud makes it easy to scale warehouses up and down to meet your needs. However, it is up to you to ensure that the scale and usage of those warehouses are efficient, fit your budget, and meet performance expectations.
In this blog, we’ll explore five simple queries that will help you identify inefficiencies in your Snowflake data warehouses to ensure you aren’t using more credits than necessary.
Ultimately, this blog will help you get the most out of your Snowflake usage. With that, let’s dive into the queries.
Identify Oversized Warehouses
This query identifies warehouses that are too large for their workloads by surfacing those with consistently low average or maximum query load percentages, which indicates a potential downsizing opportunity for cost efficiency.
SELECT
warehouse_name,
warehouse_size,
MAX(query_load_percent) AS max_query_load,
AVG(query_load_percent) AS avg_query_load,
COUNT(*) AS query_count
FROM
snowflake.account_usage.query_history
WHERE start_time >= DATEADD('DAY', -30, CURRENT_TIMESTAMP())
AND query_load_percent IS NOT NULL
GROUP BY 1, 2
HAVING max_query_load < 100 OR avg_query_load < 75
ORDER BY 3, 4;
Warehouses identified by this query can be resized to a smaller warehouse by running the ALTER WAREHOUSE command against the warehouse:
ALTER WAREHOUSE <warehouse_name> SET WAREHOUSE_SIZE = <smaller_size>;
Identify Undersized Warehouses Using Disk Spillage
Warehouses come with some amount of memory according to their size, but if the memory fills up, the data will spill, first to the local disk and then to the remote disk. Too much spillage to the local disk can slow a query down, and spillage to the remote disk can slow a query even more. Excessive disk spillage indicates that a warehouse is too small for its workload and should be increased in size.
To find warehouses with spillage, use two separate queries for local and remote disk spillages:
SELECT
DATE_TRUNC('day', START_TIME) AS DATE,
WAREHOUSE_NAME,
SUM(BYTES_SPILLED_TO_LOCAL_STORAGE)
/ POWER(1024, 3) AS "Local Spillage (GB)"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
START_TIME >= CURRENT_DATE() - INTERVAL '30 DAYS'
AND BYTES_SPILLED_TO_LOCAL_STORAGE > 0
GROUP BY 1, 2
ORDER BY "Local Spillage (GB)" DESC;
SELECT
DATE_TRUNC('day', START_TIME) AS DATE,
WAREHOUSE_NAME,
SUM(BYTES_SPILLED_TO_REMOTE_STORAGE)
/ POWER(1024, 3) AS "Remote Spillage (GB)"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
START_TIME >= CURRENT_DATE() - INTERVAL '30 DAYS'
AND BYTES_SPILLED_TO_REMOTE_STORAGE > 0
GROUP BY 1, 2
ORDER BY "Remote Spillage (GB)" DESC;
Some spillage is okay, and larger warehouses working on larger data sets can be expected to spill more.
Warehouses identified by this query can be resized to a larger warehouse by running the ALTER WAREHOUSE command against the warehouse:
ALTER WAREHOUSE <warehouse_name> SET WAREHOUSE_SIZE = <larger_size>;
Using a larger warehouse when there is excessive disk spillage can actually decrease total credit consumption, because queries finish faster, but make sure to create a benchmark before and after the change.
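One way to build that benchmark is to compare each warehouse's daily credit consumption from the account_usage schema before and after the resize. This is a sketch; the 7-day window is an arbitrary choice you should adjust to your change date:

```sql
-- Daily credits per warehouse, for comparing consumption
-- before and after a resize (window length is arbitrary)
SELECT
    DATE_TRUNC('day', START_TIME) AS USAGE_DATE,
    WAREHOUSE_NAME,
    SUM(CREDITS_USED) AS DAILY_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= CURRENT_DATE() - INTERVAL '7 DAYS'
GROUP BY 1, 2
ORDER BY 1, 2;
```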
Ensure Warehouses Use auto_suspend and auto_resume
Snowflake can automatically suspend and resume warehouses as needed. Warehouses left running while unused can be very costly, but the fix is easy.
Snowflake warehouses resume almost instantly, so unless you have a very time-sensitive query, using the auto_suspend and auto_resume properties is recommended.
SHOW WAREHOUSES IN ACCOUNT;
SELECT
"name",
"auto_resume",
"auto_suspend",
"resumed_on"
FROM table(result_scan(last_query_id()))
WHERE "auto_resume" = 'false'
OR "auto_suspend" = 0
OR "auto_suspend" IS NULL;
Enable auto_suspend by setting the property to the number of seconds the warehouse should wait after becoming idle before suspending:
ALTER WAREHOUSE <warehouse_name> SET AUTO_SUSPEND = 60;
Warehouses can usually have a low auto_suspend
time unless keeping the caches ‘warm’ by leaving them running will improve performance enough to offset the cost savings.
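As a sketch of that trade-off (the warehouse names here are hypothetical), an ad-hoc warehouse can suspend quickly while a dashboard warehouse that benefits from warm caches waits longer:

```sql
-- Hypothetical warehouse names: suspend ad-hoc work quickly,
-- but keep a dashboard warehouse's caches warm for 10 minutes
ALTER WAREHOUSE ADHOC_WH SET AUTO_SUSPEND = 60;
ALTER WAREHOUSE BI_DASHBOARD_WH SET AUTO_SUSPEND = 600;
```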
Enable auto_resume by setting the value of the parameter to TRUE:
ALTER WAREHOUSE <warehouse_name> SET AUTO_RESUME = TRUE;
Ensure All Warehouses Have Resource Monitors
It’s not fun to get your Snowflake bill and see that a misconfiguration, inefficient query, or bad automation has caused it to balloon.
Snowflake provides resource monitors to either alert administrators or automatically suspend warehouses that have reached some credit threshold. Warehouses should always have resource monitors.
This query simply checks whether the resource_monitor parameter is empty:
SHOW WAREHOUSES;
SELECT
"name"
FROM table(result_scan(last_query_id()))
WHERE "resource_monitor" = 'null';
This example creates a resource monitor that suspends the warehouse once its credit quota is fully used, after running queries finish:
USE ROLE ACCOUNTADMIN;
CREATE RESOURCE MONITOR <monitor_name> WITH CREDIT_QUOTA = 1000
TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE <warehouse_name> SET RESOURCE_MONITOR = <monitor_name>;
For advanced usage (including adding alerts before the quota is completely used up), check out the Snowflake documentation on Working with Resource Monitors.
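As a sketch of that more advanced usage (the monitor name and quota here are hypothetical), a single monitor can notify administrators at intermediate thresholds before any suspension happens:

```sql
-- Hypothetical monitor: warn at 75% and 90% of the quota,
-- block new queries at 100%, and cancel running queries at 110%
CREATE RESOURCE MONITOR ANALYTICS_MONITOR WITH CREDIT_QUOTA = 1000
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;
```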
Identify Idle Warehouses
Idle warehouses that sit unused are (at the very least) at risk for misconfiguration and can be a credit-overconsumption risk if misused. It’s a good practice to only keep around the warehouses being actively used.
This query pulls warehouse events from the account_usage
schema and then sorts by idle days:
WITH wh_events AS (
SELECT
WAREHOUSE_NAME,
MAX(TIMESTAMP) AS LAST_USED_DATE
FROM
snowflake.account_usage.warehouse_events_history
WHERE WAREHOUSE_ID NOT IN (
SELECT DISTINCT WAREHOUSE_ID
FROM snowflake.account_usage.warehouse_events_history
WHERE EVENT_NAME = 'DROP_WAREHOUSE'
)
AND EVENT_NAME IN ('CREATE_WAREHOUSE', 'RESUME_CLUSTER', 'RESUME_WAREHOUSE')
GROUP BY WAREHOUSE_NAME
)
SELECT
WAREHOUSE_NAME,
LAST_USED_DATE,
DATEDIFF(DAY, LAST_USED_DATE, CURRENT_TIMESTAMP()) AS IDLE_DAYS
FROM wh_events
WHERE IDLE_DAYS >= 30
ORDER BY LAST_USED_DATE;
This query helps in pinpointing warehouses that haven’t been used for the last 30 days, allowing you to make decisions about suspending or decommissioning these warehouses to save costs.
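An idle warehouse flagged by this query can be decommissioned with a single statement (the warehouse name here is hypothetical):

```sql
-- Hypothetical warehouse name; drop only after confirming
-- no jobs, tasks, or users still reference it
DROP WAREHOUSE IF EXISTS OLD_REPORTING_WH;
```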
What's Next?
Efficient warehouse management in Snowflake is important to maintaining a high-performing, cost-effective data environment. Regularly implementing these strategies will ensure your Snowflake warehouses are optimally configured, leading to improved overall efficiency and cost savings.
Of course, there are many more opportunities to improve your account’s security, performance, operations, and cost-efficiency.
At phData, we released a free solution called the Advisor Tool that makes it easy to ensure all the queries above are run on a regular schedule while also providing a comprehensive report of all our best practices.
Try out our Snowflake Advisor tool today with one click in the Snowflake Marketplace or via the phData Toolkit!