The Snowflake Data Cloud makes it easy to scale warehouses up and down to meet your needs. However, it is up to you to ensure that the scale and usage of those warehouses are efficient, fit your budget, and meet performance expectations.
In this blog, we’ll explore five simple queries that will help you identify inefficiencies in your Snowflake data warehouses to ensure you aren’t using more credits than necessary.
Ultimately, this blog will help you get the most out of your Snowflake usage. With that, let’s dive into the queries.
Identify Oversized Warehouses
This query identifies warehouses that are too large for their workloads by surfacing those with consistently low average or maximum query load percentages, which indicates a potential downsizing opportunity for cost efficiency.
SELECT
warehouse_name,
warehouse_size,
MAX(query_load_percent) AS max_query_load,
AVG(query_load_percent) AS avg_query_load,
COUNT(*) AS query_count
FROM
snowflake.account_usage.query_history
WHERE start_time >= DATEADD('DAY', -30, CURRENT_TIMESTAMP())
AND query_load_percent IS NOT NULL
GROUP BY 1, 2
HAVING max_query_load < 100 OR avg_query_load < 75
ORDER BY 3, 4;
Warehouses identified by this query can be resized to a smaller warehouse by running the ALTER WAREHOUSE command against the warehouse:
ALTER WAREHOUSE <warehouse_name> SET WAREHOUSE_SIZE = <smaller_size>;
Identify Undersized Warehouses Using Disk Spillage
Warehouses come with some amount of memory according to their size, but if the memory fills up, the data will spill, first to the local disk and then to the remote disk. Too much spillage to the local disk can slow a query down, and spillage to the remote disk can slow a query even more. Excessive disk spillage indicates that a warehouse is too small for its workload and should be increased in size.
To find warehouses with spillage, use two separate queries for local and remote disk spillages:
SELECT
DATE_TRUNC('day', START_TIME) AS DATE,
WAREHOUSE_NAME,
SUM(BYTES_SPILLED_TO_LOCAL_STORAGE)
/ POWER(1024, 3) AS "Local Spillage (GB)"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
START_TIME >= CURRENT_DATE() - INTERVAL '30 DAYS'
AND BYTES_SPILLED_TO_LOCAL_STORAGE > 0
GROUP BY 1, 2
ORDER BY "Local Spillage (GB)" DESC;
SELECT
DATE_TRUNC('day', START_TIME) AS DATE,
WAREHOUSE_NAME,
SUM(BYTES_SPILLED_TO_REMOTE_STORAGE)
/ POWER(1024, 3) AS "Remote Spillage (GB)"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
START_TIME >= CURRENT_DATE() - INTERVAL '30 DAYS'
AND BYTES_SPILLED_TO_REMOTE_STORAGE > 0
GROUP BY 1, 2
ORDER BY "Remote Spillage (GB)" DESC;
Some spillage is okay, and larger warehouses working on larger data sets can be expected to spill more.
Warehouses identified by this query can be resized to a larger warehouse by running the ALTER WAREHOUSE command against the warehouse:
ALTER WAREHOUSE <warehouse_name> SET WAREHOUSE_SIZE = <larger_size>;
Using a larger warehouse when there is excessive disk spillage can actually decrease total credit consumption, because queries finish faster, but make sure to create a benchmark before and after the change.
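One way to build that benchmark is to compare each warehouse's daily credit consumption from the account_usage schema before and after the resize. This is a sketch; the 7-day window is an arbitrary choice you should adjust to your change date:

```sql
-- Daily credits per warehouse, for comparing consumption
-- before and after a resize (window length is arbitrary)
SELECT
    DATE_TRUNC('day', START_TIME) AS USAGE_DATE,
    WAREHOUSE_NAME,
    SUM(CREDITS_USED) AS DAILY_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= CURRENT_DATE() - INTERVAL '7 DAYS'
GROUP BY 1, 2
ORDER BY 1, 2;
```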
Ensure Warehouses Use auto_suspend and auto_resume
Snowflake can automatically suspend and resume warehouses as needed. Warehouses left running while unused can be very costly, but the fix is easy.
Snowflake warehouses resume almost instantly, so unless you have a very time-sensitive query, using the auto_suspend and auto_resume properties is recommended.
SHOW WAREHOUSES IN ACCOUNT;
SELECT
"name",
"auto_resume",
"auto_suspend",
"resumed_on"
FROM table(result_scan(last_query_id()))
WHERE "auto_resume" = 'false'
OR "auto_suspend" = 0
OR "auto_suspend" IS NULL;
Enable auto_suspend by setting the property to the number of seconds the warehouse should wait after becoming idle before suspending:
ALTER WAREHOUSE <warehouse_name> SET AUTO_SUSPEND = 60;
Warehouses can usually have a low auto_suspend
time unless keeping the caches ‘warm’ by leaving them running will improve performance enough to offset the cost savings.
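As a sketch of that trade-off (the warehouse names here are hypothetical), an ad-hoc warehouse can suspend quickly while a dashboard warehouse that benefits from warm caches waits longer:

```sql
-- Hypothetical warehouse names: suspend ad-hoc work quickly,
-- but keep a dashboard warehouse's caches warm for 10 minutes
ALTER WAREHOUSE ADHOC_WH SET AUTO_SUSPEND = 60;
ALTER WAREHOUSE BI_DASHBOARD_WH SET AUTO_SUSPEND = 600;
```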
Enable auto_resume by setting the value of the parameter to TRUE:
ALTER WAREHOUSE <warehouse_name> SET AUTO_RESUME = TRUE;
Ensure All Warehouses Have Resource Monitors
It’s not fun to get your Snowflake bill and see that a misconfiguration, inefficient query, or bad automation has caused it to balloon.
Snowflake provides resource monitors to either alert administrators or automatically suspend warehouses that have reached some credit threshold. Warehouses should always have resource monitors.
This query simply checks whether the resource_monitor parameter is empty:
SHOW WAREHOUSES;
SELECT
"name"
FROM table(result_scan(last_query_id()))
WHERE "resource_monitor" = 'null';
This example creates a resource monitor that suspends the warehouse once its credit quota is fully used, after running queries finish:
USE ROLE ACCOUNTADMIN;
CREATE RESOURCE MONITOR <monitor_name> WITH CREDIT_QUOTA = 1000
TRIGGERS ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE <warehouse_name> SET RESOURCE_MONITOR = <monitor_name>;
For advanced usage (including adding alerts before the quota is completely used up), check out the Snowflake documentation on Working with Resource Monitors.
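As a sketch of that more advanced usage (the monitor name and quota here are hypothetical), a single monitor can notify administrators at intermediate thresholds before any suspension happens:

```sql
-- Hypothetical monitor: warn at 75% and 90% of the quota,
-- block new queries at 100%, and cancel running queries at 110%
CREATE RESOURCE MONITOR ANALYTICS_MONITOR WITH CREDIT_QUOTA = 1000
  TRIGGERS ON 75 PERCENT DO NOTIFY
           ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND
           ON 110 PERCENT DO SUSPEND_IMMEDIATE;
```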
Identify Idle Warehouses
Idle warehouses that sit unused are (at the very least) at risk for misconfiguration and can be a credit-overconsumption risk if misused. It’s a good practice to only keep around the warehouses being actively used.
This query pulls warehouse events from the account_usage
schema and then sorts by idle days:
WITH wh_events AS (
SELECT
WAREHOUSE_NAME,
MAX(TIMESTAMP) AS LAST_USED_DATE
FROM
snowflake.account_usage.warehouse_events_history
WHERE WAREHOUSE_ID NOT IN (
SELECT DISTINCT WAREHOUSE_ID
FROM snowflake.account_usage.warehouse_events_history
WHERE EVENT_NAME = 'DROP_WAREHOUSE'
)
AND EVENT_NAME IN ('CREATE_WAREHOUSE', 'RESUME_CLUSTER', 'RESUME_WAREHOUSE')
GROUP BY WAREHOUSE_NAME
)
SELECT
WAREHOUSE_NAME,
LAST_USED_DATE,
DATEDIFF(DAY, LAST_USED_DATE, CURRENT_TIMESTAMP()) AS IDLE_DAYS
FROM wh_events
WHERE IDLE_DAYS >= 30
ORDER BY LAST_USED_DATE;
This query helps in pinpointing warehouses that haven’t been used for the last 30 days, allowing you to make decisions about suspending or decommissioning these warehouses to save costs.
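An idle warehouse flagged by this query can be decommissioned with a single statement (the warehouse name here is hypothetical):

```sql
-- Hypothetical warehouse name; drop only after confirming
-- no jobs, tasks, or users still reference it
DROP WAREHOUSE IF EXISTS OLD_REPORTING_WH;
```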
What's Next?
Efficient warehouse management in Snowflake is important to maintaining a high-performing, cost-effective data environment. Regularly implementing these strategies will ensure your Snowflake warehouses are optimally configured, leading to improved overall efficiency and cost savings.
Of course, there are many more opportunities to improve your account’s security, performance, operations, and cost-efficiency.
At phData, we released a free solution called the Advisor Tool that makes it easy to ensure all the queries above are run on a regular schedule while also providing a comprehensive report of all our best practices.
Try out our Snowflake Advisor tool today with one click in the Snowflake Marketplace or via the phData Toolkit!