Event Recap: Data Science Salon Seattle

Photo by Anna Anisin. Last week, I had the pleasure of attending the Data Science Salon conference in Seattle, Washington. It was my first time attending a Data Science Salon event, and I was really impressed! Special thanks to the FormulatedBy team for the invite. The event featured a great lineup of speakers and there …

How to Tame Apache Impala Users with Admission Control

This blog post is intended for users who are familiar with Apache Impala. If you’d like to learn about Apache Impala, read more here. Introduction: A common problem encountered with Apache Impala is resource management. Everyone wants to use as many resources (i.e., memory) as they can to try to increase speed and/or hide query …

How to Query a Kudu Table Using Impala JDBC in Cloudera Data Science Workbench

Introduction: Kudu is an excellent storage choice for many data science use cases that involve streaming, predictive modeling, and time series analysis. However, in industries like healthcare and finance, where data security compliance is a hard requirement, some people worry about storing sensitive data (e.g., PHI, PII, and PCI) on Kudu without fine-grained authorization. …

phData Ranks No. 48 on the 2019 Inc. 5000 with Three-Year Revenue Growth of 5,638%

Inc. magazine announced this week that phData has ranked No. 48 on its annual Inc. 5000 list, the most prestigious ranking of the nation’s fastest-growing privately held companies. With three-year revenue growth of 5,638%, we’re proud to see our name on the Inc. 5000 list and, more importantly, we’re excited about our future. So how did …

The Ultimate Guide to Building a Machine Learning Solution

Introduction: Did you know that more than 90% of the world’s data was generated in the last two years? That’s why machine learning models that find patterns in data and can make decisions are critical for businesses today. Organizations everywhere, across all industries, are fast approaching the largest disruption to hit the business world since …

phData’s Business Grows 100% YOY, Adds Machine Learning Operations

phData, the global leader in Cloudera Managed Services and Big Data consulting, kicked off 2019 by adding Machine Learning Operations to its portfolio of services. Following the success of 2018, which saw a 100% growth rate, the outlook for 2019 is even brighter, highlighted by the completion of the Cloudera and Hortonworks merger. “As a Cloudera …

Enabling Big Data Analytics with Arcadia Data

As distributed data platforms such as Hadoop and the cloud grow in adoption, business intelligence (BI) and visual analytics increasingly need a more distributed approach. Traditional BI tools no longer scale to meet growing business needs. At phData, we continue to run into traditional BI tools failing to adapt to the increasing data …

Hadoop Meets Blockchain: Trust Your (Big) Data

Introduction: At a simple level, blockchains solve a trust problem. Increasingly, companies are relying on third parties to help drive brand recognition and gain consumer trust. This includes trusting third-party data. For these companies to succeed, it is vital that the data they receive is trustworthy and accurate. Each organization involved needs to trust …

Log Aggregation, Search, and Alerting on CDH with Pulse

Introduction: In mid-2017, we were working with one of the world’s largest healthcare companies to put a new data application into production. The customer had grown through acquisition, and to maintain compliance with the FDA, they needed to aggregate data in real time from dozens of different divisions of the company. The consumers of …

Getting Started with Kudu

Five years ago, enabling Data Science and Advanced Analytics on the Hadoop platform was hard. Organizations required strong Software Engineering capabilities to successfully implement complex Lambda architectures or even simply implement continuous ingest. Updating or deleting data was a nightmare. Complying with the General Data Protection Regulation (GDPR) would have been an extreme challenge at that time. …
