Cloudera CDH 6 Support is Ending, Now What?

If you haven’t already started making plans for the CDH 6 End of Support Life (EOSL) coming up in March 2022, now is the time to do so to make sure that you’re fully prepared. In this article, we’ll be covering what the CDH 6 EOSL means for you, what your options are, and why […]

What To Do With Unsupported CDH 6

Let’s say you have thought through your long-term CDH 6 migration plans and found that you will not be able to complete a full migration to another platform before the EOSL date in March 2022. As a result, you will need to continue to use your CDH 6 platform for some time […]

Apache Kudu Integration Testing in Scala/SBT Applications

Introduction to Kudu Integration Testing: Beginning with the 1.9.0 release, Apache Kudu published new testing utilities that include Java libraries for starting and stopping a pre-compiled Kudu cluster. This utility enables JVM developers to easily test against a locally running Kudu cluster without any knowledge of Kudu internal components or its different processes. Cloudera published […]

How to Use the Kudu Quickstart on Windows

This blog post was written by Donald Sawyer and Frank Rischner. Introduction to Apache Kudu: Apache Kudu is a distributed, highly available, columnar storage manager with the ability to quickly process data workloads that include inserts, updates, upserts, and deletes. Kudu integrates very well with Spark, Impala, and the Hadoop ecosystem. At phData, we use […]

CDP Data Warehouse Experience: The Hadoop Paradigm Shift

Cloudera Data Platform (CDP) represents a major step forward toward combining the value-added distributions of Hadoop from both Cloudera (CDH) and Hortonworks (HDP) into a unified, cloud-ready Data and Analytics platform. CDP maps out a new direction to manage and expand large data workloads into single-cloud, multi-cloud, or cloud-data-center hybrids — wherever you need it, […]

Introducing the Cloudera Data Platform: Unlock Adaptive Scaling

As the technology landscape changes, it’s important for businesses to take advantage of the increased efficiencies this competitive landscape offers. The most recent advancement companies are racing to implement is leveraging the benefits of the cloud and elastic compute platforms that allow for on-demand availability for critical business processes. In practice, this means that applications […]

Implementing Metadata as Part of Data Management

Data centralization without careful metadata implementation is like stocking a warehouse without sorting and labeling all the boxes. Yes, you may have everything you need in there; but your end users will be wandering around lost. For example, how would someone looking for manufacturing material know, without access to metadata, that they needed to use […]

Archway: Self-Service Data Engineering on Cloudera CDH

Data engineering in a production environment is complex. Engineers and data scientists need to be onboarded onto a platform where they can share data and resources; and the process is often longer and more difficult than many people initially realize. It can be an adventure just getting the right approvals: Is the data allowed to […]

How to Tame Apache Impala Users with Admission Control

Introduction: A common problem encountered with Apache Impala is resource management. Everyone wants to use as many resources (e.g. memory) as they can to try to increase speed and/or hide query inefficiency. However, it’s not fair to others and it can be detrimental to queries supporting important business processes. What we see at a lot […]

How to Query a Kudu Table Using Impala JDBC in Cloudera Data Science Workbench

Kudu is an excellent storage choice for many data science use cases that involve streaming, predictive modeling, and time series analysis. However, in industries like healthcare and finance where data security compliance is a hard requirement, some people worry about storing sensitive data (e.g. PHI, PII, PCI) on Kudu without fine-grained authorization. Kudu […]