Blog

Cloudera Altus – First Look

I was lucky enough to attend StrataEU 2017, and one of the sessions was "Deploying and managing Hive, Spark, and Impala in the public cloud," led by Philip Langdale, Eugene Fratkin, and Jennifer Wu. I assumed this was a Cloudera Director session, a tool we have lots of experience with, but I decided to pop my […]


My first year at phData

Last month marked my one-year anniversary working at phData, and since a lot of my friends and future applicants have been asking me about my experience so far, I decided to write up a little blog post about it. My experience at phData can be summed up succinctly as different and constantly evolving. I […]


Archiving Navigator Audit Data with StreamSets and Kafka

Andy Stadtler helped with this post. Many of phData’s customers are heavy users of Cloudera Navigator. Cloudera Navigator provides metadata information to the user, who can also audit all actions performed on data in the cluster. One customer generates an average of 4 GB of audit data per day, which is stored by default in the MySQL […]


phData Names Michael Cleveland as Chief Operating Officer

MINNEAPOLIS, MN – JANUARY 9, 2016 – phData, the global leader in Big Data consulting and managed services, today announced the appointment of Michael Cleveland as its new Chief Operating Officer, a new position within phData, effective immediately.  Cleveland was most recently the Director of Strategy and Business Transformation at BIAS Corporation.  He will report […]


phData Celebrates Second Anniversary of Providing Big Data Solutions

MINNEAPOLIS, Minnesota – December 14, 2016 – phData, Inc., a globally recognized leader in big data managed services and consulting, recently celebrated its second anniversary. The company has seen rapid growth in its two years of existence and is quickly becoming a household name in the big data industry. Founded by two brothers, phData has […]


phData, Inc. Joins Confluent Partner Program to Expand Big Data Solutions

Minneapolis – October 31, 2016 – phData, a big data consulting firm and managed services provider, today announced it has joined the Confluent Partner Program to help drive adoption and implementation of Apache Kafka™ and real-time solutions. Confluent, provider of the first streaming platform based on Apache Kafka, designed the program to enable a rapidly growing community […]


Visualizing NetFlow Data with Apache Kudu, Apache Impala (incubating), StreamSets Data Collector, and D3.js

NetFlow is a data format that reflects the IP statistics of all network interfaces interacting with a network router or switch. NetFlow records can be generated and collected in near real-time for the purposes of cybersecurity, network quality of service, and capacity planning. For network and cybersecurity analysts interested in these data, being able to […]


Configuring Oozie for Spark SQL on a Secure Hadoop Cluster

A secure Hadoop cluster requires actions in Oozie to be authenticated. However, due to the way that Oozie workflows execute actions, Kerberos credentials are not available to actions launched by Oozie. Oozie runs actions on the Hadoop cluster. Specifically, for legacy reasons, each action is started inside a single-task, map-only MapReduce job. Spark does […]
