
case study

U.S. Manufacturer Overhauls Forecasting with Machine Learning on Databricks, Reducing Stocking Costs in Just 3 Months

1.

THE CUSTOMER:

A large U.S. Outdoor Vehicle Manufacturer.


2.

THE CHALLENGE THEY FACED:

Accurately forecasting demand across hundreds of thousands of unique products, many of which lacked historical sales data. Forecasting difficulties included:

  • Sporadic, “on-and-off” nature of demand for spare parts (i.e., a large percentage of “zero” values)
  • Seasonal variation in demand across vehicle types (e.g., snowmobiles vs. fishing boats)
  • Tendency of historical forecasting to react to old trends that are no longer indicative of future demand
  • Variation in the proportion of identical parts shared between different vehicle models

3.

HOW WE HELPED:

phData’s ML team created a novel solution that has allowed the manufacturer to forecast demand for their entire catalog with high confidence for the first time — even for brand-new products with zero historical sales data to extrapolate from.

By combining advanced algorithms such as XGBoost, Random Forest, k-nearest neighbors (KNN), and locality-sensitive hashing (LSH), the ML pipeline can associate new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies of a large, multinational company.

4.

WHAT WE GOT DONE:

In just three short months, phData’s two-person ML team managed to:

  • Deliver a full end-to-end ML solution on Databricks
  • Overhaul the company’s existing forecasting system for better accuracy and consistency
  • Unlock net-new predictive capabilities to better support product launches and further minimize the costs of stocking unnecessary parts

Full story: Modernizing Demand Forecasting

After a string of recent acquisitions, a top outdoor vehicle manufacturer (10,000+ employees, revenue of $5+ billion) had rapidly expanded their product catalog, not just with new vehicles but also with the thousands of parts required to service them.

The number of SKUs in their inventory ballooned with each new acquisition. It wasn’t long before they found themselves with 700,000 unique SKUs for various vehicle parts. And because they guarantee that replacement parts for any given vehicle will remain available for 10 years, the vast majority of those items aren’t going anywhere any time soon.

Naturally, managing such a massive catalog presents a slew of challenges. One of the largest, as the company found out, was accurately forecasting demand across hundreds of thousands of unique products, particularly when accounting for seasonal or regional factors.

They realized that machine learning (ML) might help supplement their existing forecasting system (which was built on simple extrapolations of historical sales data for a given part) — but weren’t sure exactly how, or where they should start. And so they turned to phData.

Taking the plunge with machine learning

The two-person phData ML team worked closely with the manufacturer to understand their existing forecasting system, identify potential opportunities, and explore how ML might help them meet their key business goals — namely, improving their ability to predict demand and minimize the costs associated with purchasing and storing parts that turn out to be superfluous.

The price of poor forecasting

  • Understocking = risking customers, today and tomorrow: Vehicle owners need the right parts at the right time. Having to wait up to 6 months for out-of-stock parts leads to unhappy customers, undermining brand loyalty and future sales.
  • But overstocking = throwing money down the drain: Every part made that goes unsold is money wasted. And not only that: storing and tracking unused parts can cost just as much, especially if, as this customer was, you’re at risk of having to open new warehouses just to stock more and more parts you’ll never use.

The next step was to review their options, establish the right goals for machine learning, and draw up a workable plan to get there.

The manufacturer’s existing forecasting system could generally provide adequate predictions for spare parts associated with the older vehicles in their catalog. However, these predictions suffered from notable inconsistencies, especially for newer parts with less historical data to extrapolate from.

Forecasting difficulties included:

  • Sporadic, “on-and-off” nature of demand for spare parts (i.e., a large percentage of “zero” values)
  • Seasonal variation in demand across vehicle types (e.g., snowmobiles vs. fishing boats)
  • Tendency of historical forecasting to react to old trends that are no longer indicative of future demand
  • Variation in the proportion of identical parts shared between different vehicle models
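
To make the first difficulty concrete, here is a minimal sketch of how sporadic demand might be quantified for a single part. The function name and the 12-month history are hypothetical and not taken from the customer's data; the share of zero-demand periods and the average inter-demand interval (ADI) are common descriptive measures for intermittent demand, not necessarily the ones used in this project.

```python
def intermittency_profile(monthly_demand):
    """Summarize how sporadic a demand history is (illustrative helper)."""
    periods = len(monthly_demand)
    if periods == 0:
        raise ValueError("demand history must not be empty")

    # Count the periods in which at least one unit was demanded.
    nonzero = sum(1 for units in monthly_demand if units > 0)

    # Share of periods with no demand at all.
    zero_share = (periods - nonzero) / periods

    # Average inter-demand interval: total periods per period with demand.
    adi = periods / nonzero if nonzero else float("inf")

    return {"zero_share": zero_share, "adi": adi}


# Hypothetical 12-month history for one spare part (units sold per month).
history = [0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 3]
profile = intermittency_profile(history)
print(profile)
```

A part like this one, with demand in only 3 of 12 months, gives a simple extrapolation of past sales very little signal to work with.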

The teams using the system lacked visibility into how the forecasts worked and what specific input data underpinned the demand projections, so it was nearly impossible to diagnose what was causing these problems, much less figure out how to solve them. As a result, the customer relied on a small team to create manual forecasts. Due to the size of the catalog, manual forecasting was time-consuming and could only be done for a small percentage of the total items.

Meanwhile, because the customer lacked experience with more sophisticated ML models and algorithms, they weren’t sure what might or might not be possible with a more modern forecasting solution. For example, because their existing system was based solely on historical patterns (requiring at least 3 months of existing sales data), they hadn’t considered that forecasting demand tied to new vehicle launches was even a real possibility.

Unearthing novel opportunities

Our ML team walked the manufacturing company through their own process, helping them understand where their biggest pain points really were and what they would need to do to address them. From there, we designed a modern solution for more powerful, reliable demand forecasting. We leveraged Databricks as an end-to-end solution to manage the full data engineering and ML lifecycle, and we deployed it on Microsoft Azure to take full advantage of the Databricks platform’s cloud-native design.

And we didn’t stop there. While working to overhaul the customer’s existing forecasting, phData ML engineers identified their inability to forecast demand for new vehicle launches as a major gap — one the customer hadn’t realized it was even possible to solve for.

It posed a tricky data science problem. A successful model would need to deliver accurate predictions with zero baseline of historical sales data to extrapolate from — associating new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies described above.

With the right ML expertise, however, a solution was eminently feasible, with the potential to deliver massive results.

Inventing novel solutions

Our team realized early on that the problem of forecasting demand for net-new items couldn’t be solved with a single algorithm. For example, using a single clustering algorithm to group similar parts or vehicles together yielded demand trendlines that either weren’t predictive or placed demand at entirely the wrong scale.

To account for the complexities at play, we designed two distinct ML models: one for predicting the average level of demand factors and another for predicting the level of variance across those factors. Together they combine advanced algorithms such as XGBoost, Random Forest, k-nearest neighbors (KNN), and locality-sensitive hashing (LSH).
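
As a rough illustration of the idea of associating a new part with existing ones, the sketch below uses a plain k-nearest-neighbor lookup to borrow both a demand level and a spread from similar parts that already have sales history. It is a deliberately simplified stand-in, not the production pipeline: the real solution uses two separate models and several algorithms, and the SKUs, feature vectors, and demand figures here are all hypothetical.

```python
import math
from statistics import mean, pvariance


def nearest_parts(new_features, catalog, k=3):
    """Return the k catalog parts closest to new_features (Euclidean distance)."""
    ranked = sorted(
        catalog,
        key=lambda part: math.dist(new_features, part["features"]),
    )
    return ranked[:k]


def forecast_new_part(new_features, catalog, k=3):
    """Estimate demand level and variance for a part with no sales history."""
    neighbors = nearest_parts(new_features, catalog, k)
    demands = [part["avg_monthly_demand"] for part in neighbors]
    return {
        "neighbors": [part["sku"] for part in neighbors],
        "level": mean(demands),          # borrowed average demand
        "variance": pvariance(demands),  # spread among the neighbors
    }


# Hypothetical catalog: two numeric features per part (for example, a scaled
# price and a scaled size) plus the observed average monthly demand.
catalog = [
    {"sku": "A-100", "features": (1.0, 0.2), "avg_monthly_demand": 10.0},
    {"sku": "A-200", "features": (1.1, 0.3), "avg_monthly_demand": 14.0},
    {"sku": "B-300", "features": (5.0, 4.0), "avg_monthly_demand": 90.0},
    {"sku": "A-400", "features": (0.9, 0.1), "avg_monthly_demand": 12.0},
]

# A brand-new part described only by its features.
forecast = forecast_new_part((1.0, 0.25), catalog, k=3)
print(forecast)
```

Even in this toy form, the dissimilar part B-300 is excluded from the estimate, which hints at why the choice of features and neighbors matters so much for getting demand at the right scale.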

As a result, the customer can now forecast demand for new products with high confidence. When we tested the new solution on forecasts for 915 items, 82 percent had a mean absolute error (MAE) below 0.3.
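
For readers unfamiliar with the metric, the sketch below shows how a per-item MAE and the share of items under a threshold could be computed. The item names and numbers are made up for illustration, and the case study does not state the units or scale on which the reported errors were measured, so this example simply assumes raw values.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_below(errors, threshold):
    """Fraction of items whose error is strictly below the threshold."""
    return sum(1 for e in errors if e < threshold) / len(errors)


# Hypothetical (actual, predicted) demand series for four items.
items = {
    "sku-1": ([0, 1, 0, 2], [0.1, 0.9, 0.2, 1.8]),
    "sku-2": ([3, 0, 0], [2, 1, 0]),
    "sku-3": ([1, 1], [1, 1]),
    "sku-4": ([0, 0, 5, 0], [0, 0, 4.5, 0]),
}

mae_by_item = {
    sku: mean_absolute_error(actual, predicted)
    for sku, (actual, predicted) in items.items()
}
share = share_below(list(mae_by_item.values()), threshold=0.3)
print(mae_by_item, share)
```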

As the new solution rolls into production, the customer can forecast their entire SKU catalog for the very first time, because they no longer have to rely on manually produced forecasts.

[Chart: Percentage of items by regression error]


Because our ML teams are highly interdisciplinary, bringing both data science skills and data engineering know-how, we helped ensure the solution would grow with the customer’s needs. Implementing the algorithms with PySpark enables the solution to scale horizontally and process high volumes of data at very high speed.

And best of all? Our two-person ML team managed all this in record time — just three months!

On the art of algorithm arrangement:

“A real solution to a real problem is never just a single algorithm you can pick up from a library in Spark. The art is in how you choose and combine all the right algorithms and models for the job.” — phData ML Engineer

Powerful new predictive powers, in a matter of months

The modern forecasting and ML solution delivered by phData goes beyond improving the company’s existing demand forecasting — unlocking all-new predictive capabilities to better support product launches and further minimize stocking costs.

Business Outcomes:

  • Tangible value in three short months — phData’s ML experts managed to design and build a full end-to-end solution on Databricks, uncover a huge green-field opportunity for the customer, and deliver broader predictive capabilities with improved accuracy and consistency — all in just three months.
  • Better forecasting, lower costs, improved customer service — With all-new forecasting capabilities come all-new savings in stocking costs. The customer now has everything they need to deliver customers the right part at the right time — including for newly launched vehicles that lack historical sales data.
  • A platform for future success — The company now has a sustainable platform to implement the infrastructure and processes they need to scale, optimize, and maintain production-ready ML applications according to Databricks and Azure best practices.

The next phase will include additional refinement of the model, as well as work to automate the customer’s ML pipelines using DevOps processes and toolchain components. This is expected to further improve forecasts and to set a foundation for easily productionizing new ML pipelines in other parts of the business. But already the payoff has been huge for the customer: they have improved their ability to predict demand for products new and old, so they can get customers the parts they need without running up costs stocking warehouses full of superfluous parts.
