Blog

Exploring Spark MLlib: Part 3 – Transformation and Model Creation

In the first two posts (1, 2) we ingested the data and explored it with the spark-shell. Now we’ll move on to creating and submitting our code as a standalone Spark application. Again, all the code covered in the posts can be found here. We’ll start by creating a case class and a function for parsing […]


Exploring Spark MLlib: Part 2 – Exploring the data

In the last post we got the environment set up. Now that the data is in the cluster and Spark is set up, we can begin to explore the data. A common way to start is by using the spark-shell, a powerful command-line interpreter for the Spark environment. Let’s get started. Execute the spark-shell […]


Exploring Spark MLlib: Part 1 – Setup and Ingest

This four-part series will introduce Spark MLlib by walking through a basic example, much like a chapter in Advanced Analytics with Spark (which phData highly recommends). The goal is to cover an MLlib workflow end-to-end. The posts assume a basic understanding of Spark and the Scala programming language. As much as possible the code and examples […]


More Good News For Docker Containers On Hadoop

Those who have spent their careers trying to manage multi-tenant infrastructure should be pleased to see the continued progress in running Docker containers in Hadoop. Containers allow business units to define an application and its dependencies in a lightweight container package that can then be deployed on Hadoop. Even more important, the […]


Docker Containers On Hadoop

Most companies have been spending the past few years looking at how to better manage their data centers by using “private cloud” platforms like CloudStack and OpenStack. Outside of providing virtualization, the idea is that data centers can turn over compute creation to business units so they can dabble in DEV, and possibly scale up […]


Hortonworks Sandbox On Azure

Hortonworks recently announced the ability to procure their Sandbox through Microsoft Azure. This is great news for consumers of Hadoop and further lowers the barrier to entry for getting some hands-on experience with the Hortonworks Data Platform (HDP). When working with customers, we continue to recommend small experimental environments for testing operational items like […]


HBase is like a Phone Book

One of the decisions we often come across is when to use HBase and when not to. It’s sometimes a hard question to answer without a good understanding of the use case. However, just the other day a very respected thought leader in Hadoop explained it this way […]


Using Ranger Made Easy With Hortonworks

Security is at the forefront of all enterprises today. Making sure that data are protected and consumed only by authorized entities is a daunting task for development and operations staff, who are often under pressure from business units to focus on features rather than non-revenue-generating infrastructure. The good news is that things are getting easier. The Hadoop […]


Hortonworks Dedication to Cloud

We’re seeing more and more enterprises wanting to dabble in both on-premises and off-premises data center capabilities, Hadoop deployments included. Some of it is based on cost, but more of it is based on flexibility. “Infrastructure friction” is a hard challenge for businesses that need the flexibility to try things out and fail fast. Recognizing this, Hortonworks has a […]
