Ingest

Hadoop meets Blockchain: Trust your (Big) Data

At a simple level, blockchains solve a trust problem. Increasingly, companies are relying on third parties to help drive brand recognition and gain consumer trust; this includes trusting third-party data. For these companies to succeed, it is vital that the data they receive is trustworthy and accurate. Each organization involved needs to trust that […]


StreamSets – Hadoop Ingestion Made Simple

StreamSets recently announced and open-sourced their first product, DataCollector. I had been given access to a preview version of the product and was quite impressed. Given that the product is now public and generally available, I thought I would go through a super-simple demo. In my consulting role at phData, I’ve worked with many customers […]


Apache Kafka Performance Numbers

A search for “Apache Kafka performance” returns dozens of articles, but few are useful for estimating real-world results. Specifically, I’ve found few results which run on hardware common to modern data centers, replicate the data with the common factor of 3, and use many parallel producers and consumers. These results are meant to be used […]


Binary Stream Ingest: Flume vs Kafka vs Kinesis

Introduction: The Internet of Things will put new demands on Hadoop ingest methods, specifically on their ability to capture raw sensor data, i.e. binary streams. As discussed, big data will remove previous data storage constraints and allow streaming of raw sensor data at granularities dictated by the sensors themselves. The focus of this post will […]
