AWS

Cloudera Altus – First Look

I was lucky enough to attend StrataEU 2017, and one of the sessions was "Deploying and managing Hive, Spark, and Impala in the public cloud," led by Philip Langdale, Eugene Fratkin, and Jennifer Wu. I assumed this was a Cloudera Director session, a tool we have lots of experience with, but I decided to pop my […]


Apache Kafka Performance Numbers

A search for “Apache Kafka performance” will return dozens of articles but few results that are useful for estimating real-world performance. Specifically, I’ve found few results that run on hardware common to modern data centers, replicate the data with the common replication factor of 3, and use many parallel producers and consumers. These results are meant to be used […]


Getting Into The Cloud With CDH In Minutes

A few weeks ago we wrote an article on the pros and cons of running your Hadoop capabilities in the cloud compared to on-premise. The conclusion was that there isn’t a right or wrong answer. If you’re investing in data centers, it probably makes sense to run Hadoop on-premise. If you’re not investing in data […]
