November 1, 2023

How To Control and Estimate Costs With Snowflake

By Justin Delisi

This blog was originally written by Keith Smith and updated for 2023/2024 by Justin Delisi.

The Snowflake Data Cloud offers a scalable, cloud-native data warehouse that provides the flexibility, performance, and ease of use needed to meet the demands of modern businesses. While Snowflake offers unparalleled capabilities for data processing and analytics, it’s essential to keep a watchful eye on your costs.

Cloud computing resources come with a price tag, and without proper cost management, you might find your cloud expenses spiraling out of control.

In this blog, we will explore how to effectively estimate costs within the Snowflake data warehousing environment, help determine a baseline of costs with Snowflake, and explore a few key features that will add to your costs.

Addressing Cost Concerns with Snowflake

How do I control my Snowflake costs when everything is designed to charge me money? Isn’t Snowflake in the business of trying to charge me the most credits for my queries? How can I be sure that I stay within my budget based on cost projections?

Part of moving to new technologies (even the most promising) is understanding the question at the back of every buyer’s mind: what is it going to cost to get started?

The answer doesn’t have to be difficult or complicated.

Traditionally, the estimation process had many moving parts and estimations had to be exact for the ROI to prove out. Now, the cloud has flipped this on its head: you, as a consumer, pay only for what you use. 

But what does that actually mean?

If you don’t use a product, you will not pay for it. But if you use a product a lot, then you will pay what appears to be a lot of money for it. 

This is amplified when you use a product heavily and inefficiently. Inefficiency doesn’t scale, and you can pay for it many times over.

Many customers are faced with breaking old habits when moving from on-premises to cloud products in this regard; when running processes on-premises, inefficiencies are generally significantly cheaper.

Today, alleviating cost concerns for Snowflake is about paying for what you’re using and using the platform efficiently.

This fear around continuous cost is normal: every customer we’ve brought on board has faced inefficiencies. In fact, phData has gone through this internally. We continually review how cloud-based products, including Snowflake, are being used internally for development and for driving our business forward.

This is difficult because, in most instances, there aren’t clear guardrails to ensure your 12-month budget isn’t gone in 3 months.

In practice, we see customers solve this problem by setting reasonable constraints around their warehouses, building dashboards to pinpoint issues, and constantly re-evaluating the way data is being used and consumed across the platform.
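One such guardrail can be sketched as a simple burn-rate check: given the credits consumed so far, project how long the annual budget will last at the current pace. The Python below is a minimal illustration, not a phData or Snowflake tool; the function names and the figures in the example are hypothetical.

```python
def projected_months_of_budget(annual_credit_budget: float,
                               credits_used: float,
                               months_elapsed: float) -> float:
    """Estimate how many months the annual budget lasts at the current burn rate."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    if credits_used <= 0:
        # Nothing consumed yet, so the budget is not being drawn down.
        return float("inf")
    monthly_burn = credits_used / months_elapsed
    return annual_credit_budget / monthly_burn


def is_on_track(annual_credit_budget: float,
                credits_used: float,
                months_elapsed: float,
                horizon_months: float = 12) -> bool:
    """True if the budget is projected to last at least the planning horizon."""
    projected = projected_months_of_budget(
        annual_credit_budget, credits_used, months_elapsed
    )
    return projected >= horizon_months


# Hypothetical figures: 12,000 credits budgeted for the year,
# 3,000 consumed in the first month.
months = projected_months_of_budget(12_000, 3_000, 1)
print(f"Budget lasts about {months:.1f} months at this rate")
print("On track for 12 months:", is_on_track(12_000, 3_000, 1))
```

A check like this is only as good as the usage figures fed into it, so it works best when run regularly against actual consumption rather than once at the start of the year.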

Let’s unpack exactly what this means and help you understand where we see customers struggle.

A Breakdown of Snowflake Costs

You can estimate your Snowflake costs in 4 steps: 

  1. Estimate the number of warehouses by size that are required.
  2. Estimate the number of compute hours that each warehouse will use by size.
  3. Estimate the storage size of your data.
  4. Identify the features you need, which determines the Snowflake account level required.

*Note that costs per credit here are based on an account in AWS US-East-1 and may differ depending on the cloud provider and region the Snowflake account is created in.

Once all of the above are determined, it is pretty straightforward to provide an estimate based on cost per credit and number of credits that will be consumed.
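The four steps above can be sketched as a small calculation. The Python below is a rough illustration under stated assumptions: the credits-per-hour table reflects standard warehouse sizes, where each size step doubles the rate, while the price per credit, the storage price, and the warehouse plan are placeholder values. Substitute the rates for your own account level, cloud provider, and region.

```python
from dataclasses import dataclass

# Credits consumed per hour by a standard warehouse of each size
# (each step up in size doubles the rate).
CREDITS_PER_HOUR = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}


@dataclass
class WarehousePlan:
    name: str               # hypothetical warehouse name
    size: str               # key into CREDITS_PER_HOUR
    hours_per_month: float  # estimated running hours per month


def estimate_monthly_cost(warehouses, storage_tb,
                          price_per_credit, price_per_tb_month):
    """Combine compute and storage estimates into one monthly figure."""
    compute_credits = sum(
        CREDITS_PER_HOUR[w.size] * w.hours_per_month for w in warehouses
    )
    compute_cost = compute_credits * price_per_credit
    storage_cost = storage_tb * price_per_tb_month
    return {
        "compute_credits": compute_credits,
        "compute_cost": compute_cost,
        "storage_cost": storage_cost,
        "total_cost": compute_cost + storage_cost,
    }


# Placeholder inputs: two warehouses, 5 TB of storage, and assumed prices
# of 3.00 per credit and 23.00 per TB per month.
plan = [
    WarehousePlan("loading", "S", 60),
    WarehousePlan("reporting", "M", 120),
]
estimate = estimate_monthly_cost(
    plan, storage_tb=5, price_per_credit=3.0, price_per_tb_month=23.0
)
print(estimate)
```

With these placeholder inputs, compute comes to 600 credits (2 × 60 + 4 × 120), which is why the hours-per-warehouse estimate in step 2 tends to dominate the total.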

But that’s the baseline. There are other questions to consider:

  • Are you going to be using replication?
    • Storage: Amount of table data in the primary database, or databases in a replication/failover group, that changes as a result of data loading or DML operations.
    • Compute resources: Frequency of secondary database, or replication/failover group, refreshes from the primary database or replication/failover group.
  • Are you going to be using Materialized Views?
    • Storage: Each materialized view stores query results, which adds to the monthly storage usage for your account.
    • Compute resources: In order to prevent materialized views from becoming out-of-date, Snowflake performs automatic background maintenance of materialized views. When a base table changes, all materialized views defined on the table are updated by a background service that uses compute resources provided by Snowflake.
  • Are you going to be using Snowpipe?
    • Given the number of factors that can differentiate Snowpipe loads, it is very difficult for Snowflake to provide sample costs. File formats and sizes, and the complexity of COPY statements (including any SELECT statement used for transformations), all impact the resource consumption and file overhead charged for a Snowpipe load.
  • Are you going to be using the Search Optimization Service?
    • Storage resources: The search optimization service creates a search access path data structure that requires space for each table on which search optimization is enabled.
    • Compute resources: Adding search optimization to a table consumes resources during the initial build phase. Maintaining the search optimization service also requires resources. Resource consumption is higher when there is high churn (i.e. when large volumes of data in the table change). These costs are roughly proportional to the amount of data ingested (added or changed). Deletes also have some costs.

Why does each of the above matter?

While tracking SQL query usage and credit consumption is relatively straightforward via query history, the costs of the features listed above are often difficult to track. As a new customer, you might not know what to expect. As an experienced customer, it can quickly become a black box.
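One way to open up that black box is to total credits by the service that consumed them. The Python below is a minimal sketch that assumes you have exported rows from a usage view such as SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY, with SERVICE_TYPE and CREDITS_USED columns. The sample rows are made up, and the exact column names and service type values should be checked against your own account.

```python
from collections import defaultdict


def credits_by_service(rows):
    """Total credits per service type, largest consumer first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["SERVICE_TYPE"]] += float(row["CREDITS_USED"])
    # Sort descending so the biggest cost driver appears first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up rows standing in for an export of metering data.
sample_rows = [
    {"SERVICE_TYPE": "WAREHOUSE_METERING", "CREDITS_USED": 120.5},
    {"SERVICE_TYPE": "PIPE", "CREDITS_USED": 8.25},
    {"SERVICE_TYPE": "MATERIALIZED_VIEW", "CREDITS_USED": 14.0},
    {"SERVICE_TYPE": "PIPE", "CREDITS_USED": 3.75},
    {"SERVICE_TYPE": "SEARCH_OPTIMIZATION", "CREDITS_USED": 6.0},
]

totals = credits_by_service(sample_rows)
for service, credits in totals.items():
    print(f"{service:<22} {credits:>8.2f}")
```

Separating warehouse credits from background services in this way makes it easier to see when a feature such as Snowpipe or materialized view maintenance starts to account for a meaningful share of spend.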

Trying to determine how best to transform and serve your data?
Check out our blog on transforming data with tasks and views.

How Do I Ensure Snowflake Costs Are Efficient?

When you blow out your budget in 3 months (or are on track to), how can you pinpoint the areas that need to improve? 

Can you point to bad (long-running) SQL? Is it tied to your ingest process? Is there a business unit that is consuming far more credits than necessary? Are developers letting queries run over the weekend?

It can be difficult for an organization to get clear answers to these questions when it doesn’t have the necessary controls to rein in spending, especially when the promise is on-demand data for anyone and everyone across the organization.
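The questions above can be approached with a simple pass over query history. The Python below is a minimal sketch, assuming query records have already been exported and labeled with a business unit (for example via tagging). The field names, the one-hour threshold, and the sample records are all hypothetical, and elapsed time is used only as a rough proxy for credit consumption.

```python
from collections import Counter
from datetime import datetime


def flag_queries(queries, max_seconds=3600):
    """Return (long-running queries, queries started on a weekend)."""
    long_running = [q for q in queries if q["elapsed_seconds"] > max_seconds]
    # weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = [q for q in queries if q["started_at"].weekday() >= 5]
    return long_running, weekend


def seconds_by_business_unit(queries):
    """Total elapsed seconds per business unit, largest first."""
    totals = Counter()
    for q in queries:
        totals[q["business_unit"]] += q["elapsed_seconds"]
    return totals.most_common()


# Hypothetical records; 2023-10-28 fell on a Saturday.
queries = [
    {"id": "q1", "business_unit": "finance", "elapsed_seconds": 5400,
     "started_at": datetime(2023, 10, 30, 9, 0)},
    {"id": "q2", "business_unit": "marketing", "elapsed_seconds": 120,
     "started_at": datetime(2023, 10, 31, 14, 30)},
    {"id": "q3", "business_unit": "marketing", "elapsed_seconds": 7200,
     "started_at": datetime(2023, 10, 28, 23, 0)},
]

long_running, weekend = flag_queries(queries)
print("Long-running:", [q["id"] for q in long_running])
print("Weekend:", [q["id"] for q in weekend])
print("By business unit:", seconds_by_business_unit(queries))
```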

phData's Recommendation

Combine a well-defined Information Architecture with metadata tagging, and integrate both with reports that tie into SQL history and credit analysis.

This gives users a simple dashboard with the insight needed to alleviate overspending concerns and to act on minor issues before they become major budgetary problems.

phData has also created an Advisor Tool so you can quickly and easily identify opportunities to improve the configuration, security, performance, and efficiency of your Snowflake environment. We have worked closely with our Snowflake data and operations experts to ensure that our Snowflake Advisor delivers actionable recommendations based on best practices and phData’s vast real-world experience on the Snowflake Platform.

Whatever your reason for exploring Snowflake, it’s critical to take the time to define a launch strategy and deploy the platform correctly to drive adoption, efficiency, and long-term cost savings.

Conclusion

Snowflake has truly revolutionized the way modern businesses handle their data processing and analytics needs. However, as we’ve explored in this blog, even the most powerful tools come with a caveat – the cost.

Cloud computing resources can quickly become a financial burden if not managed effectively.

By taking a proactive approach to cost estimation, you can make informed decisions about your data warehouse usage, ensuring that you get the most out of Snowflake while keeping your budget in check.

Looking for more best practices for Snowflake and cost management?

Download our Getting Started with Snowflake Guide for actionable steps on how to get the most out of Snowflake.

Common Snowflake Cost Questions

The most common inefficiencies that we experience typically boil down to design standards. A well-designed architecture means raw data gets enriched and standardized through multiple stages. As standards are defined, this data should be persisted to a physical table instead of joining views together or creating join columns via subselect queries, both of which can cause unnecessary table scans and longer run times.

The most common problem our customers have is ingesting data into Snowflake efficiently. This can manifest in multiple forms (file sizes, file counts, file speed, file types), all of which lead to increased credit consumption. This problem can typically be solved via tweaks to the ingestion process. Sometimes, it requires tooling that complements the Snowflake Data Cloud and helps it meet broader enterprise needs.
