Database

Hive corruption due to newlines and carriage returns

phData has customers across the full spectrum of use cases. One of our customers stores vast volumes of XML. One of our engineers was recently asked: "Hive sometimes corrupts my data and other times it does not. What is going on?" The answer is quite interesting, so I thought I would share it. Specifically, the query they […]

Hands-On Example With Hive Partitioning

Building on our Simple Examples Series, we wanted to take five minutes to show you the power of partitioning. For a more detailed article on partitioning, Cloudera has a nice blog write-up that includes some useful pointers: http://blog.cloudera.com/blog/2014/08/improving-query-performance-using-partitioning-in-apache-hive/ One of the pointers that should resonate is the cardinality of the column, which is another […]

The Paradox of Agile Data Management

At phData, many of us come from a software development background and have witnessed the success of Agile methodologies. Agile started in software development, where it quickly gained popularity, and has since made inroads into other realms. The concept of the "Agile Admin", or, as it's better known, DevOps, takes many of its core […]