Enabling Big Data Analytics with Arcadia Data
As distributed data platforms like Hadoop and the cloud grow in adoption, there is an increasing need for a more distributed approach to business intelligence (BI) and
Hadoop Meets Blockchain: Trust Your (Big) Data
At a simple level, blockchains solve a trust problem. Increasingly, companies are relying on third parties to help drive brand recognition and gain consumer trust.
Log Aggregation, Search, and Alerting on CDH with Pulse
In mid-2017, we were working with one of the world’s largest healthcare companies to put a new data application into production. The customer had
Getting Started with Kudu
Five years ago, enabling data science and advanced analytics on the Hadoop platform was hard. Organizations required strong software engineering capabilities to successfully implement complex
Accessing Kerberized Sources From Spark2 In Cluster Mode on Yarn
Many of phData’s customers need to connect to a source secured via Kerberos from a Spark application. A source
4 Strategies For Updating Hive Tables
Apache Hive and complementary technologies, such as Cloudera Impala, provide scalable SQL on Apache Hadoop. Unlike legacy database systems, Hive and Impala have traditionally
Cloudera Altus – First Look
I was lucky enough to attend StrataEU 2017 on behalf of phData, and one of the sessions was Deploying and managing Hive, Spark, and Impala
Archiving Navigator Audit Data with StreamSets and Kafka
Many of phData’s customers are heavy users of Cloudera Navigator. Cloudera Navigator provides metadata to users, who can also audit all actions
Visualizing Netflow Data with Apache Kudu, Apache Impala (Incubating), StreamSets Data Collector, and D3.JS
NetFlow is a data format that reflects the IP statistics of all network interfaces interacting with a network router or switch. NetFlow records can
Configuring Oozie for Spark SQL On a Secure Hadoop Cluster
A secure Hadoop cluster requires actions in Oozie to be authenticated. However, due to the way that Oozie workflows execute actions, Kerberos credentials are not