How do I control my Snowflake costs when everything is designed to charge me money? Isn't Snowflake in the business of trying to charge me the most credits for my queries? How can I be sure that I stay within my budget based on cost projections?
Part of moving to new technologies (even the most promising) is understanding the question at the back of every buyer’s mind: what is it going to cost to get started?
The answer doesn’t have to be difficult or complicated.
Traditionally, the estimation process had many moving parts and estimations had to be exact for the ROI to prove out. Now, the cloud has flipped this on its head: you, as a consumer, pay only for what you use.
But what does that actually mean?
If you don’t use a product, you will not pay for it. But if you use a product a lot, then you will pay what appears to be a lot of money for it.
This is amplified when you use a product heavily and use it inefficiently. Inefficiency doesn’t scale, and you can pay for it many times over.
Many customers have to break old habits in this regard when moving from on-premises to cloud products; when running processes on-premises, inefficiencies are generally significantly cheaper.
Addressing Cost Concerns with Snowflake
Today, alleviating cost concerns for Snowflake is about paying for what you’re using and using the platform efficiently.
This fear around continuous cost is normal: every customer we’ve brought on board has faced inefficiencies. In fact, phData has gone through this internally. We continually review how our internal use of the cloud, and subsequently Snowflake, supports development and drives the business forward.
This is difficult because, in most instances, there aren’t clear guardrails to ensure your 12-month budget isn’t gone in 3 months.
In practice, we see customers solve this problem by setting reasonable constraints around their warehouses, building dashboards to pinpoint issues, and constantly re-evaluating the way data is being used and consumed across the platform.
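One way to make the budget guardrail concrete is to project how long a budget will last at the current burn rate. The sketch below is illustrative only: the function name and the example figures are hypothetical, and it assumes consumption continues at the average rate seen so far.

```python
def months_until_budget_exhausted(annual_credit_budget, credits_used, months_elapsed):
    """Project how many months the full budget lasts at the average burn rate so far."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    monthly_burn = credits_used / months_elapsed
    if monthly_burn == 0:
        return float("inf")  # nothing consumed yet, so no projected end date
    return annual_credit_budget / monthly_burn


# Hypothetical figures: a 12,000-credit annual budget with 8,000 credits used after 2 months.
projection = months_until_budget_exhausted(12_000, 8_000, 2)
print(f"At this rate the budget lasts about {projection:.1f} months")
```

A projection well under 12 months is the signal to look at warehouse constraints and usage patterns before the budget is gone.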
Looking to estimate Snowflake costs? Check out our blog on estimating customer size and cost.
Let’s unpack exactly what this means and help you understand where we see customers struggle.
A Breakdown of Snowflake Costs
You can estimate your Snowflake costs in 4 steps:
- Estimate the number of warehouses by size that are required
- Estimate the amount of compute hours that each warehouse will use by size
- Estimate the storage size of your data
- Determine which features you need, since these dictate the Snowflake account level required
Snowflake Warehouse Sizes and Credit Usage per Hour

Size | XSmall | Small | Medium | Large | XLarge | 2X | 3X | 4X |
---|---|---|---|---|---|---|---|---|
Credits Per Hour | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
New in 2022, 5X-Large and 6X-Large warehouses are available as a preview feature on AWS. These use 256 and 512 credits per hour, respectively.
Once all of the above are determined, it is pretty straightforward to provide an estimate based on cost per credit and number of credits that will be consumed.
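As a rough illustration of that arithmetic, here is a minimal sketch in Python. The credits-per-hour values come from the table above; the function name, the price per credit, and the storage price are hypothetical placeholders, not Snowflake list prices, so substitute the figures from your own contract.

```python
# Credits per hour by warehouse size, taken from the table above.
CREDITS_PER_HOUR = {
    "XSmall": 1, "Small": 2, "Medium": 4, "Large": 8,
    "XLarge": 16, "2X": 32, "3X": 64, "4X": 128,
}


def estimate_monthly_cost(warehouse_hours, price_per_credit, storage_tb, storage_price_per_tb):
    """Return (compute_credits, compute_cost, storage_cost, total_cost) for one month.

    warehouse_hours maps a warehouse size to the total hours per month that
    warehouses of that size are expected to run.
    """
    compute_credits = sum(
        CREDITS_PER_HOUR[size] * hours for size, hours in warehouse_hours.items()
    )
    compute_cost = compute_credits * price_per_credit
    storage_cost = storage_tb * storage_price_per_tb
    return compute_credits, compute_cost, storage_cost, compute_cost + storage_cost


# Hypothetical workload: 200 hours on an XSmall and 50 hours on a Medium,
# at an assumed $3.00 per credit, plus 2 TB of storage at an assumed $23.00 per TB.
credits, compute, storage, total = estimate_monthly_cost(
    {"XSmall": 200, "Medium": 50}, 3.00, 2, 23.00
)
print(credits, compute, storage, total)
```

Because the credit rate doubles with each size, 50 hours on a Medium consumes the same 200 credits as 200 hours on an XSmall, which is why the hours estimate per size matters as much as the warehouse count.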
But that’s the baseline. There are other questions to consider:
- Are you going to be using any “serverless” Snowflake functions?
- Are you going to be using Materialized Views?
- Are you going to be using Snowpipe?
- Are you going to be using UDFs or external functions?
Why does each of the above matter?
While tracking SQL query usage and credit consumption is relatively straightforward via query history, the features listed above are less transparent and often difficult to track. As a new customer, you might not know what to expect. As an experienced customer, you may find they quickly become a black box.
Trying to determine how best to transform and serve your data? Check out our blog on transforming data with tasks and views.
How do I Ensure Snowflake Costs Are Efficient?
When you do blow out your budget in 3 months (or are on track to), how can you pinpoint the areas that need to improve?
Can you point to bad (long-running) SQL? Is it tied to your ingest process? Is there a business unit that is consuming far more credits than necessary? Are developers letting queries run over the weekend?
It can be difficult for an organization to get clear answers to these questions when it doesn’t have the necessary controls to rein in spending, especially when the promise is on-demand data for anyone and everyone across the organization.
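The questions above can be sketched as two small aggregations over query records. This is only an illustration: the record fields (`query_id`, `business_unit`, `credits`, `elapsed_seconds`) are a hypothetical schema, not the column names of any Snowflake view, and the one-hour threshold is an arbitrary assumption.

```python
from collections import defaultdict


def credits_by_business_unit(records):
    """Total credits per business unit, largest consumer first."""
    totals = defaultdict(float)
    for record in records:
        totals[record["business_unit"]] += record["credits"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def long_running_queries(records, threshold_seconds=3600):
    """IDs of queries that ran longer than the threshold."""
    return [r["query_id"] for r in records if r["elapsed_seconds"] > threshold_seconds]


# Hypothetical sample records for illustration.
sample = [
    {"query_id": "q1", "business_unit": "finance", "credits": 1.5, "elapsed_seconds": 120},
    {"query_id": "q2", "business_unit": "marketing", "credits": 4.0, "elapsed_seconds": 7200},
    {"query_id": "q3", "business_unit": "finance", "credits": 0.5, "elapsed_seconds": 30},
]
print(credits_by_business_unit(sample))
print(long_running_queries(sample))
```

Attributing credits to a business unit only works if queries or warehouses carry that label in the first place, which is where consistent tagging comes in.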
phData's Recommendation
Combine a well-defined Information Architecture with metadata tagging, and integrate both with reports that tie into SQL history and credit analysis.
This lets us give users insights that alleviate overspending concerns through a simple dashboard, where they can take actionable steps to mitigate minor issues before they become major budgetary problems.
Whatever your reason for exploring Snowflake, it’s critical to take the time to define a launch strategy and deploy the platform correctly to drive adoption, efficiency, and long-term cost savings.
Looking for more best practices for Snowflake and cost management?
Download our “Getting Started with Snowflake Guide”
Common Snowflake Cost Questions
The most common inefficiencies that we see typically boil down to design standards. In a well-designed architecture, raw data gets enriched and standardized through multiple stages. As standards are defined, this data should be persisted to physical tables rather than assembled by joining views together or creating join columns via subqueries, both of which can cause unnecessary table scans and longer run times.
The most common problem our customers have is ingesting data into Snowflake efficiently. This can manifest itself in multiple forms (file sizes, file counts, file speed, file types), and they all lead to increased credit consumption. This problem can typically be solved via tweaks to the ingestion process. Sometimes, it requires tooling that complements the Snowflake data cloud and helps it meet broader enterprise needs.
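As a simple illustration of checking file sizes and file counts before loading, the sketch below summarizes a batch of files and reports how many fall under a size threshold. The function name is hypothetical, and the 100 MB default is an arbitrary placeholder rather than a Snowflake recommendation; pick a threshold that matches your own ingestion guidance.

```python
def summarize_batch(file_sizes_mb, min_size_mb=100):
    """Summarize a batch of files staged for loading.

    min_size_mb is an assumed threshold below which a file counts as "small".
    """
    small = [size for size in file_sizes_mb if size < min_size_mb]
    count = len(file_sizes_mb)
    return {
        "file_count": count,
        "total_mb": sum(file_sizes_mb),
        "small_file_count": len(small),
        "small_file_share": len(small) / count if count else 0.0,
    }


# Hypothetical batch: two tiny files and two larger ones.
print(summarize_batch([5, 8, 150, 200]))
```

A high share of small files in a batch is one hint that the upstream process could combine files before they are loaded.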