Log Aggregation, Search, and Alerting on CDH with Pulse

In mid-2017, we were working with one of the world’s largest healthcare companies to put a new data application into production. The customer had grown through acquisition, and to maintain compliance with the FDA, they needed to aggregate data in real time from dozens of different divisions of the company. The consumers of this […]


Spark Job History Server OutOfMemoryError

One of our customers hit an issue where the Spark Job History Server was running out of memory every few hours. The heap size was set to 4GB and the customer was not a heavy user of Spark, submitting no more than a couple of jobs a day. We did notice that they had many long-running spark-shell […]
