How to Query a Kudu Table Using Impala JDBC in Cloudera Data Science Workbench

Kudu is an excellent storage choice for many data science use cases that involve streaming, predictive modeling, and time series analysis. However, in industries like healthcare and finance, where data security compliance is a hard requirement, some people worry about storing sensitive data (e.g., PHI, PII, and PCI data) on Kudu without fine-grained authorization. […]


Getting Started with Kudu

Five years ago, enabling data science and advanced analytics on the Hadoop platform was hard. Organizations needed strong software engineering capabilities to implement complex Lambda architectures, or even just continuous ingest. Updating or deleting data was simply a nightmare. Complying with the General Data Protection Regulation (GDPR) would have been an extreme challenge at that time. […]


Visualizing NetFlow Data with Apache Kudu, Apache Impala (incubating), StreamSets Data Collector, and D3.js

NetFlow is a data format that reflects the IP statistics of all network interfaces interacting with a network router or switch. NetFlow records can be generated and collected in near real time for the purposes of cybersecurity, network quality of service, and capacity planning. For network and cybersecurity analysts interested in these data, being able to […]
