June 22, 2022

How to Estimate Compute Size for Snowflake Virtual Warehouse

By Kenton Steiner

When getting started with Snowflake, one of the largest challenges can be determining the size and structure of the virtual warehouses that are provisioned within your account. 

With various sizes, settings, and billing differences, it can be difficult to find the perfect match of price and performance, so that you aren't overpaying for an architecture beyond your needs or hitting performance bottlenecks from overwhelmed resources. Luckily, Snowflake provides several features that give you more flexibility with your resources and help you reach the right fit for your needs.

In this blog, we will talk about Snowflake Virtual Warehouses and the features Snowflake provides to help you minimize costs while running your warehouses.

How are Credits Charged for Warehouses?

Snowflake Virtual Warehouses are clusters of compute resources (CPU, memory, and storage) that are required to perform operations on data within Snowflake, such as queries, loading, and unloading.

As one of the core aspects of Snowflake, it is important to understand how these resources are billed. I will touch briefly on the key points, but for a full breakdown of Snowflake costs, please check out “How to Control and Estimate Costs with Snowflake!”

[Image: Snowflake warehouse sizes with credits per hour]

Snowflake virtual warehouses come in what they refer to as “T-shirt sizes,” starting with X-Small for the smallest and going all the way up to 4X-Large, with 5X-Large and 6X-Large in private preview at the time of writing.

Each step up in size doubles the number of servers in the warehouse, and therefore also doubles the credits consumed per second.

[Image: running time cost in credits by warehouse size]

The second important aspect of Snowflake billing to note is the minimum charge: each time a warehouse starts running, you are billed for at least 60 seconds. With this in mind, try to ensure each warehouse does at least 60 seconds of work when it becomes active, so you get full value for the time you are paying for.
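As a worked example of the arithmetic (using Snowflake's published rate of 4 credits per hour for a Medium warehouse), a query that finishes in 30 seconds is still billed for the full 60-second minimum:

```sql
-- Billed credits = credits/hour * GREATEST(elapsed_seconds, 60) / 3600
-- A 30-second query on a Medium warehouse (4 credits/hour) is billed for 60 seconds:
SELECT 4 * GREATEST(30, 60) / 3600 AS billed_credits;  -- ~0.067 credits
```

The same formula scales with warehouse size: the same 30-second query on an X-Large warehouse (16 credits per hour) would cost roughly four times as much.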

How to Optimize Your Snowflake Virtual Warehouse Size

As I mentioned earlier, correctly sizing your warehouses may seem daunting, but Snowflake has a number of features to help you find your ideal state.

Creating a Warehouse and Selecting Initial Size 

[Image: creating a warehouse in Snowflake]

First, you need to create your virtual warehouse. Selecting the initial size of the warehouse depends upon the intended usage of the warehouse. 

If you are planning to use the warehouse for smaller data loading operations or queries in a smaller-scale testing environment, then smaller warehouse sizes (X-small, small, medium) should work for your use cases.

 

Alternatively, if you are bulk loading large numbers of files concurrently or performing queries in a large-scale production environment, then you might need a large, X-large, or larger-sized warehouse to be the most cost-effective.  

Snowflake recommends experimenting with running queries against warehouses of multiple different sizes to test and see which size will best meet your needs.
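One way to run such an experiment is to resize a dedicated test warehouse between runs of a representative query and compare the elapsed times in the query history. The sketch below assumes a test warehouse named test_wh (a placeholder); it also disables the result cache so each run actually executes rather than returning a cached result:

```sql
-- Ensure each test run actually executes instead of hitting the result cache
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

ALTER WAREHOUSE test_wh SET WAREHOUSE_SIZE = 'SMALL';
-- run your representative query here and note the elapsed time

ALTER WAREHOUSE test_wh SET WAREHOUSE_SIZE = 'MEDIUM';
-- run the same query again and compare the timings
```

Repeat for each candidate size, then weigh the speedup at each step against the doubled credit rate.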

Automating Warehouse Suspension/Resumption

While selecting the initial size is important, Snowflake offers automatic warehouse suspension and resumption, giving you the flexibility to start with a larger size and then re-evaluate and adjust based on your actual workloads.

Warehouses can be set to auto-suspend after a period of inactivity, such as the 10 minutes in the example above. Since Snowflake bills for the time a warehouse is running rather than only the time it spends processing requests, auto-suspend prevents a warehouse from consuming credits while it sits idle. Unless you have a large, steady workload or need instant availability, setting an auto-suspend time for your warehouse can be very helpful.

Similarly, auto-resume can also give you more control over the costs and access to warehouses.  If you want more manual control over when your warehouse resumes processing, then you can disable auto-resume. Otherwise, the warehouse will spin back up whenever new queries are submitted. 
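Both settings can be specified when the warehouse is created. The statement below is a sketch (my_wh is a placeholder name); note that AUTO_SUSPEND is given in seconds:

```sql
CREATE WAREHOUSE my_wh WITH
  WAREHOUSE_SIZE      = 'MEDIUM'
  AUTO_SUSPEND        = 600    -- suspend after 10 minutes of inactivity (seconds)
  AUTO_RESUME         = TRUE   -- spin back up automatically when a new query arrives
  INITIALLY_SUSPENDED = TRUE;  -- don't consume credits until first use
```

Setting AUTO_RESUME = FALSE instead gives you the manual control described above: the warehouse stays suspended until you explicitly run ALTER WAREHOUSE my_wh RESUME.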

Scaling Up/Down vs. Scaling In/Out

Snowflake also offers two ways to adjust capacity: scaling up/down, which resizes a warehouse, and scaling out/in, which adds or removes clusters.

Warehouse Resizing Improves Performance

Scaling up or down means resizing the warehouse (going from Medium to Large, or from Large to Small) in order to improve query performance. Scaling up is typically done to handle more complex queries, and it can happen at any point in time, even while the warehouse is running.

A good candidate for scaling up is a slow-running query when you plan to run additional queries of similar size and complexity on the same warehouse. Two points to note about scaling up: larger warehouses do not necessarily execute simple queries faster, so scaling up may not yield large improvements in processing speed; and the newly provisioned resources only handle new queries, while queries already running finish on the existing resources.

When scaling down, compute resources and the cache associated with those resources are dropped, which can impact performance because the full cache is no longer available.
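Resizing is a single ALTER statement and, as noted above, can be issued even while the warehouse is running. A minimal sketch (warehouse name is a placeholder):

```sql
-- Scale up ahead of a batch of complex queries...
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ...then scale back down once the heavy workload is finished
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```

Remember that only queries submitted after the resize use the new resources, so issue the scale-up before the heavy workload starts.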

Multi-Cluster Warehouses Improve Concurrency

Scaling out/in is a feature of Snowflake Enterprise Edition and higher that allows you to add clusters to, and remove clusters from, multi-cluster warehouses to handle performance and queuing issues caused by large numbers of concurrent users and queries.

Minimum & Maximum Clusters

From the example above, you can see the parameters for scaling out/in: the minimum and maximum number of clusters, and the scaling policy. Unless the warehouse specifically needs to run in maximized mode (all clusters always running), or high availability is a concern, the recommendation is to configure the warehouse for auto-scale mode by setting the minimum clusters to 1 and the maximum to 8, while remaining mindful of costs.

When dealing with a varying number of users and queries, Snowflake will automatically scale out and add more active clusters during periods of higher activity, and then remove clusters when the load decreases. 
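Following that recommendation, an auto-scale configuration might look like the sketch below (the warehouse name is a placeholder, and multi-cluster warehouses require Enterprise Edition or higher):

```sql
CREATE WAREHOUSE multi_wh WITH
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1   -- auto-scale mode: start with a single cluster
  MAX_CLUSTER_COUNT = 8;  -- add clusters as concurrency demands grow
```

Setting MIN_CLUSTER_COUNT equal to MAX_CLUSTER_COUNT would instead run the warehouse in maximized mode, with all clusters always on.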

Scaling Policy

You can further configure how auto-scale mode behaves through the scaling policy field, which accepts Standard or Economy. Under the Standard policy, the system starts a new cluster as soon as it detects one more query than the current clusters can handle. It shuts a cluster down after 2-3 consecutive successful checks, performed at 1-minute intervals, determine that the load on the least-loaded cluster can be redistributed to the other clusters without spinning the cluster back up.

Under the Economy policy, the system only spins up a new cluster if it determines there is enough work to keep it busy for at least 6 minutes, and it shuts a cluster down after 5-6 consecutive successful checks at 1-minute intervals determine that the least-loaded cluster's work can be redistributed to the other clusters. The Economy policy favors keeping the current clusters running at full capacity: queries may queue somewhat longer because new clusters start later than under Standard, but it does a better job of conserving credits.
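The policy is set per warehouse and can be changed at any time. For example, to switch an existing multi-cluster warehouse toward credit conservation (warehouse name is a placeholder):

```sql
-- Favor conserving credits over minimizing queue time
ALTER WAREHOUSE multi_wh SET SCALING_POLICY = 'ECONOMY';
```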

Tips and Reminders for Sizing Snowflake Warehouses

  • Snowflake charges credits based on the size of the warehouse and time running, with a minimum charge time of 60 seconds
  • Try out queries on warehouses of multiple sizes in order to determine which size to use initially
  • Auto-suspend and Auto-resume can help conserve costs when warehouses are inactive for longer periods of time
  • Snowflake offers scaling up/down and scaling out/in for optimizing your usage based on query complexity and load

Closing

Though there are many factors to consider when determining the most efficient balance between compute power and cost, Snowflake has introduced a number of features to help you avoid overpaying for resources while still maintaining the performance you require.

Features like auto-suspend/resume give you the freedom to stop worrying about provisioning while Snowflake works behind the scenes. As your needs change, resizing and scaling provide the flexibility to keep up with demand and optimize your costs, giving you more peace of mind as you manage your resources. For more tips on working with Snowflake, please visit our blog!

Looking to optimize how your organization uses Snowflake? Our team of experts are here to help!
