Case Study

Streamlining Demand Forecasting: Farm Supplier Modernizes an Excel Headache with Dataiku & Snowflake


Customer's Challenge

A major agricultural cooperative needed to forecast demand for its products at different warehouses throughout the country in order to ensure it had the right quantities in the right locations at the right times. The existing forecasting process was built solely in Excel, crashing laptops with the size of the spreadsheets and leading to over $200 million a year in wasted inventory due to inaccurate forecasts. The company came to phData hoping to move the heavy calculations out of Excel, so their demand planners could spend more time on analysis and less time waiting on Excel Workbooks to complete calculations. 

phData's Solution

Through an eight-week engagement, phData was able to migrate their current forecasts to Dataiku utilizing Snowflake Data Cloud as the underlying computing engine and storage. The timeline also allowed phData to augment the company’s current forecasting approach with machine learning approaches that proved to be more accurate than their existing methodology for over 70 percent of products. 


A burdensome forecasting process that took hours of manual effort a week was moved entirely to the cloud. The daily forecasting pipeline can run in less than 45 minutes, and once per month, a full ML pipeline runs in less than three hours. 

The company is forecasting 10,000+ combinations of products and locations, and in 72% of these instances, phData was able to validate improved forecasting accuracy. With a $200 million inventory problem, even small forecasting improvements can translate into millions of dollars in savings that easily justify an 8-week investment.

The Full Story

A major agricultural cooperative needed to ensure its products were in the correct locations at the correct times to meet farmers’ demands. Many of the products the company sells are perishable, meaning too much stock in one location leads to waste as the product will eventually go bad and cannot be sold. 

For just one of the company’s multiple business lines, the expired product was a $200 million-a-year problem. In addition to the direct financial implications of the forecasts, the process of managing them via Excel had become a nightmare. 

The number of combinations of locations and products that needed forecasts exceeded 10,000. The forecasts also needed to be updated regularly as new sales occurred. The size of the Excel workbook used to update forecasts was causing older laptops to crash and taking 5+ minutes to apply a filter to a single column for users with brand new 16GB laptops. The overall process was inefficient and a prime use case to migrate to Dataiku and leverage machine learning.

phData was brought in with the goal of moving the heavy calculations out of Excel and off individual’s laptops and into the cloud. phData worked with the business to understand the current methodology they used to calculate forecasts and began recreating the logic within Dataiku

Using a combination of Dataiku’s visual recipes, SQL, and Python scripting, phData was able to reproduce the current forecasting methodology in a fashion that could be completely automated and was computed in the cloud. 

With the additional time remaining on the project, phData was also able to provide machine learning-based forecasts to augment the current forecasting methodology and perform some ad-hoc correlation analysis requested by the business. This was accomplished using both Dataiku’s visual machine-learning capabilities and Dataiku Python recipes powered by a Snowpark session within the company’s Snowflake account

Snowpark enabled phData to scale to over 10,000 different forecasts utilizing Facebook’s Prophet algorithm in a pipeline step that ran at about the time of an average lunch break. 

Why phData?

The customer opted to engage phData’s data science team for Dataiku development after previously engaging with our Data Engineering and Elastic Operations teams. phData had been supporting the client for several years, initially with Cloudera and Hadoop, and in recent years, supporting their modern data stack of Snowflake, Fivetran, dbt, and Dataiku. The customer was also impressed with phData’s data science delivery model to ensure that the solutions built met the needs of the business users.

Before phData
After phData

Take the next step
with phData.

Learn how phData can help solve your most challenging data analytics and machine learning problems.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit