U.S. Manufacturer Overhauls Forecasting with Machine Learning on Databricks, Reducing Stocking Costs in Just 3 Months
A large U.S. Outdoor Vehicle Manufacturer.
THE CHALLENGE THEY FACED:
Accurately forecasting demand across hundreds of thousands of unique products, many of which lacked historical sales data. Forecasting difficulties included:
HOW WE HELPED:
phData’s ML team created a novel solution that has allowed the manufacturer to forecast demand for their entire catalog with high confidence for the first time — even for brand-new products with zero historical sales data to extrapolate from.
By combining advanced algorithms such as XGBoost, RandomForest, KNN, and LSH, the ML pipeline can associate new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies of a large, multinational company.
WHAT WE GOT DONE:
In just three short months, phData’s two-person ML team managed to:
Full story: Modernizing Demand Forecasting
After a string of recent acquisitions, a top Outdoor Vehicle Manufacturer (10,000+ employees, revenue of $5+ billion) had rapidly expanded their product catalogue — not just with new vehicles, but with the thousands of parts required to service them as well.
The number of SKUs in their inventory ballooned with each new acquisition. It wasn’t long before they found themselves with 700,000 unique SKUs for various vehicle parts. Because they guarantee that replacement parts for any given vehicle will remain available for 10 years, the vast majority of those items aren’t going anywhere any time soon.
Naturally, managing such a massive catalogue presents a slew of challenges. One of the largest, as the company found out, was how difficult it became to accurately forecast demand across hundreds of thousands of unique products — particularly when accounting for seasonal or regional factors.
They realized that machine learning (ML) might help supplement their existing forecasting system (which was built on simple extrapolations of historical sales data for a given part) — but weren’t sure exactly how, or where they should start. And so they turned to phData.
Taking the plunge with machine learning
The next step was to review their options, establish the right goals for machine learning, and draw up a workable plan to get there.
The price of poor forecasting
The manufacturer’s existing forecasting system could generally provide adequate predictions for spare parts associated with the older vehicles in their catalogue; however, these predictions suffered from notable inconsistencies — especially for newer parts with less historical data to extrapolate from.
Forecasting difficulties included:
The teams using the system lacked visibility into how the forecasts actually worked and what specific input data underpinned the demand projections, so it was nearly impossible to diagnose what was causing these problems, much less figure out how to solve them. As a result, the customer relied on a small team to create manual forecasts. Due to the size of the catalog, manual forecasting was time-consuming and could only be done for a small percentage of the total items.
Meanwhile, because the customer lacked experience with more sophisticated ML models and algorithms, they weren’t sure what might or might not be possible with a more modern forecasting solution. For example, because their existing system was based solely on historical patterns (requiring at least 3 months of existing sales data), they hadn’t considered that forecasting demand tied to new vehicle launches was even a real possibility.
Unearthing novel opportunities
Our ML team walked the manufacturing company through their own process, helping them understand where their biggest pain points really were and what they would need to do to address them. From there, we designed a modern solution for more powerful, reliable demand forecasting. We leveraged Databricks as an end-to-end solution to manage the full data engineering and ML lifecycle, and we deployed it on Microsoft Azure to take full advantage of the Databricks platform’s cloud-native design.
And we didn’t stop there. While working to overhaul the customer’s existing forecasting, phData ML engineers identified their inability to forecast demand for new vehicle launches as a major gap — one the customer hadn’t realized it was even possible to solve for.
It posed a tricky data science problem. A successful model would need to deliver accurate predictions with zero baseline of historical sales data to extrapolate from — associating new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies described above.
With the right ML expertise, however, a solution was both possible and eminently feasible — with the potential to deliver massive results.
Inventing novel solutions
Our team realized early on that the problem of forecasting demand for net-new items couldn’t be solved with a single algorithm. For example, using a single clustering algorithm to group similar parts or vehicles together yielded demand trendlines that weren’t predictive, or that placed demand at entirely the wrong scale.
To account for the complexities at play, we designed two distinct ML models: one for predicting the average level of demand factors and another for predicting the level of variance across those factors. Together, they combine advanced algorithms such as XGBoost, RandomForest, KNN, and LSH.
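To make the two-model idea concrete, here is a minimal sketch in plain Python. It assumes a nearest-neighbor approach only: a new part is matched to its k most similar existing parts, and two separate estimators read the expected demand level and the demand variance off those neighbors. The part names, feature vectors, sales figures, and the value of `k` are all hypothetical, and this is not phData’s actual pipeline, which combined XGBoost, RandomForest, KNN, and LSH.

```python
import math
from statistics import mean, pvariance

# Hypothetical catalog: part id -> (feature vector, monthly unit sales).
# The features stand in for attributes such as price band or part category.
EXISTING_PARTS = {
    "brake-pad-a": ([1.0, 0.2, 0.0], [40, 42, 38, 44]),
    "brake-pad-b": ([0.9, 0.3, 0.0], [35, 37, 36, 40]),
    "air-filter-a": ([0.1, 0.8, 1.0], [10, 12, 9, 11]),
    "air-filter-b": ([0.2, 0.9, 1.0], [8, 9, 10, 9]),
}


def nearest_parts(features, k):
    """Return the ids of the k existing parts closest in Euclidean distance."""
    ranked = sorted(
        EXISTING_PARTS,
        key=lambda pid: math.dist(features, EXISTING_PARTS[pid][0]),
    )
    return ranked[:k]


def predict_level(features, k=2):
    """Model 1: expected monthly demand, averaged over the neighbors' histories."""
    neighbors = nearest_parts(features, k)
    return mean(mean(EXISTING_PARTS[pid][1]) for pid in neighbors)


def predict_variance(features, k=2):
    """Model 2: demand variance, averaged over the neighbors' per-part variances."""
    neighbors = nearest_parts(features, k)
    return mean(pvariance(EXISTING_PARTS[pid][1]) for pid in neighbors)


# A brand-new part with no sales history, described only by its features.
new_part = [0.98, 0.22, 0.0]
neighbors = nearest_parts(new_part, k=2)
level = predict_level(new_part)
variance = predict_variance(new_part)
print(neighbors, level, variance)
```

Keeping the level and the variance as separate estimates means a planner can size safety stock from the spread rather than from the point forecast alone. In a real pipeline each estimate would more plausibly come from a trained regressor than from a raw neighbor average.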
As a result, the customer can now forecast demand for new products with high confidence. When we tested the new solution on forecasts for 915 items, 82 percent had a Mean Absolute Error below 0.3.
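The evaluation described here can be sketched as follows: compute the Mean Absolute Error (MAE) for each item’s forecast, then report the share of items whose MAE falls below a threshold. The item names and demand values are made up for illustration, and the source does not state the units or scaling of its demand figures. Only the 0.3 threshold comes from the reported result.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and forecast values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_below(errors, threshold):
    """Fraction of items whose error is strictly below the threshold."""
    return sum(1 for e in errors.values() if e < threshold) / len(errors)


# Hypothetical (actual, forecast) demand series per item.
forecasts = {
    "item-1": ([1.0, 2.0, 1.5], [1.1, 1.8, 1.6]),
    "item-2": ([0.5, 0.7, 0.6], [0.4, 0.9, 0.6]),
    "item-3": ([3.0, 2.0, 4.0], [2.0, 3.0, 3.0]),
    "item-4": ([2.0, 2.0, 2.0], [2.2, 1.9, 2.0]),
}

mae_by_item = {
    item: mean_absolute_error(actual, predicted)
    for item, (actual, predicted) in forecasts.items()
}
share = share_below(mae_by_item, threshold=0.3)
print(f"{share:.0%} of items have MAE < 0.3")
```

Reporting the share of items under a threshold, rather than one MAE pooled across the catalog, keeps a handful of badly forecast items from being hidden by many well-forecast ones.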
As the new solution rolls into production, the customer can forecast demand for their entire SKU catalog for the very first time, without having to rely on manually produced forecasts.
Percentage of items by regression error
Because our ML teams are highly interdisciplinary, bringing both data science skills and data engineering know-how, we helped ensure the solution would grow with the customer’s needs. The algorithms are implemented with PySpark, which lets the solution scale horizontally and process high volumes of data at very high speed.
And best of all? Our two-person ML team managed all this in record time — just three months!
On the art of algorithm arrangement:
Powerful new predictive powers, in a matter of months
The modern forecasting and ML solution delivered by phData goes beyond improving the company’s existing demand forecasting — unlocking all-new predictive capabilities to better support product launches and further minimize stocking costs.