A leading manufacturer of earth-moving equipment, including construction, mining, and forestry equipment, has increasingly come to rely on sensor data to understand how their machines are performing.
Most of the machines they make, from excavators and front-end loaders to subsurface mining equipment and drills, include sensors that track indicators such as hydraulic pressure, engine RPMs, engine oil temperature, and wheel speed. This Internet of Things (IoT) data allows them not only to predict when individual machines will require maintenance, but also to help customers operate their products more efficiently (by analyzing operational cycles and patterns).
Because many of these machines may run 24×7, and because such large machines often require similarly large capital outlays, these insights provide enormous business value. That value materialized as increased top-line revenue from new products and services, including smart-connected equipment and post-purchase proactive maintenance services.
After several previous iterations, the manufacturer had been using a Hadoop-based solution to process, store, and analyze all their sensor data. However, maintaining the platform required their small analytics team to spend more time administering the cluster than getting value from the data they collected. In addition, the static resource allocation model meant they could not scale dynamically, and their compute costs were increasing.
As a result, they decided to explore how they might take advantage of the latest cloud-native services and data technologies to streamline systems management and improve efficiency, while simultaneously consolidating their siloed data sources.
To meet their goals and justify the cost of moving to a new platform, the manufacturer needed to design a modern, cloud-based data analytics solution. They also needed to ensure this solution could ingest their existing data from Hadoop and handle the high volume of new IoT data pushed daily from their equipment. Key challenges:
The manufacturing corporation realized they were out of their depth. They needed help understanding how best to move on from their Hadoop-based solution and take advantage of the latest cloud-native technologies, but their own data science and analytics team was fully occupied just keeping the current analytics systems up and running. Seeking a partner with proven data engineering expertise, they turned to phData.
The phData team worked closely with the manufacturer’s analytics team to understand both their existing Hadoop-based platform and their goals for overhauling it, then provided the technology recommendations and support they needed to successfully transform it.
To deliver the required improvements in efficiency, maintainability, and data accessibility, phData designed a new architecture around Snowflake. They leveraged both the right mix of cloud-native services and data technologies (such as Spark and Kafka for data processing, and Microsoft Azure and Kubernetes for infrastructure and orchestration) and the right “Cloud 2.0” design and deployment practices (such as taking a containerized, “infrastructure-as-code” approach to deploying the Kafka Connector on Azure Kubernetes Service) to make the most of those technologies. Finally, they proved the viability of the new solution by helping to successfully migrate one of the manufacturer’s large applications from Hadoop to Snowflake.
The data files the sensors generate once per minute are now uploaded to Azure Blob Storage via a proprietary REST API. These hundreds of millions of small files are then processed and normalized by Spark before being transmitted to Snowflake via the Spark-Snowflake connector.
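To make the normalization step concrete, here is a minimal sketch in plain Python of what "processed and normalized" could look like for a batch of small sensor payloads. The case study does not describe the file format or schema, so the JSON layout, the raw field names, and the canonical column names below are all hypothetical; in the actual pipeline this logic would run in Spark rather than in a single Python process.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from raw sensor field names to canonical column names.
# The real schema is not given in the case study.
FIELD_ALIASES = {
    "hyd_press": "hydraulic_pressure_kpa",
    "rpm": "engine_rpm",
    "oil_temp": "engine_oil_temp_c",
    "wheel_spd": "wheel_speed_kph",
}


def normalize_reading(raw_json: str) -> dict:
    """Turn one small sensor payload into a row with a fixed set of columns."""
    raw = json.loads(raw_json)
    record = {
        "machine_id": str(raw["machine_id"]),
        # Assumes the payload carries a Unix epoch timestamp in seconds.
        "recorded_at": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
    }
    for raw_name, canonical in FIELD_ALIASES.items():
        value = raw.get(raw_name)
        # Missing sensors become None so every row has the same columns.
        record[canonical] = float(value) if value is not None else None
    return record


def normalize_batch(payloads):
    """Combine many small payloads into one list of rows.

    Malformed payloads are counted and skipped rather than failing the batch.
    """
    rows, rejected = [], 0
    for payload in payloads:
        try:
            rows.append(normalize_reading(payload))
        except (KeyError, ValueError, TypeError):
            rejected += 1
    return rows, rejected
```

Batching many tiny files into uniform rows before loading is the point of the sketch: every row ends up with the same columns and types, which is what a bulk write into a warehouse table needs.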
Once in Snowflake, the data is consolidated and enhanced, every two minutes, via a series of tables and schemas designed to flatten data structures and introduce new data columns that provide more ways to break down the data.
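The flatten-and-enrich idea can be illustrated with a short Python sketch. It is not the manufacturer's implementation: inside Snowflake this work would more likely be expressed in SQL, and the nested record layout, the derived column names, and the 900 RPM idle threshold are assumptions chosen only for illustration.

```python
from datetime import datetime


def flatten(nested: dict, parent: str = "", sep: str = "_") -> dict:
    """Collapse nested dictionaries into a single level of columns."""
    flat = {}
    for key, value in nested.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def enrich(row: dict) -> dict:
    """Add derived columns that give analysts more ways to slice the data."""
    ts = datetime.fromisoformat(row["recorded_at"])
    enriched = dict(row)
    enriched["recorded_date"] = ts.date().isoformat()
    enriched["recorded_hour"] = ts.hour
    # Hypothetical rule: low engine speed and no wheel movement counts as idle.
    enriched["is_idle"] = (
        row.get("engine_rpm", 0) < 900 and row.get("wheel_speed_kph", 0) == 0
    )
    return enriched


# Hypothetical nested reading, as it might look before flattening.
reading = {
    "machine_id": "EX-7",
    "recorded_at": "2023-05-01T14:30:00+00:00",
    "engine": {"rpm": 750, "oil_temp_c": 88.0},
    "wheel": {"speed_kph": 0.0},
}
row = enrich(flatten(reading))
```

Derived columns such as a date, an hour, or an idle flag are the kind of additions that let a dashboard group and filter readings without recomputing them in every query.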
The final result? A common data warehouse that’s easily accessible via Power BI dashboards.