Operations

Troubleshooting Spark and Kafka escalations for our managed services customers

My favorite technical task is handling a troubleshooting escalation from one of our Managed Services customers. This week I was lucky enough to work on a problem worth sharing. A Spark job was producing messages to Kafka. Halfway through, the job froze and no more messages were produced to Kafka. After […]


StreamSets – Hadoop Ingestion Made Simple

StreamSets recently announced and open-sourced their first product, DataCollector. I had been given access to a preview version of the product and was quite impressed. Now that the product is public and generally available, I thought I would go through a super-simple demo. In my consulting role at phData, I’ve worked with many customers […]


Rolling Hadoop Upgrades

For those of us working in Hadoop operations, Hortonworks has a great read on Hadoop’s evolution into an ever-breathing organism. In other words, as Hadoop continues to grow into the backbone of data centers, having it up 24/7/365 is a must. The article goes through the process of rolling upgrades. One of the critical […]
