phData is dedicated to Apache Hadoop and its ecosystem. Our expertise spans a wide range of Big Data technologies.

Managing Complexity for Success

phData knows the intricacies of Big Data projects and how to expedite them for organizations. Our technical expertise translates into successful implementations and management, which reduces your business risk, makes data more accessible, and empowers better decision-making.

Our Work

Saving Millions on RDBMS and EDW Storage Costs

One of the largest consumer goods companies in the U.S. uses Hadoop to reduce storage costs. As data volumes and demand for data continue to increase, IT budgets can’t keep up. phData works to reduce the costs associated with storing data in EDW and RDBMS platforms by offloading that data to Hadoop.
Technologies Used: Hadoop, HDFS, Hive, Impala, Sqoop

Empowering Analysts to Work with Data

phData provides managed infrastructure and operational support for a leading cloud-based software company. As the variety and volume of its data continued to grow, analysts needed a more scalable, flexible way to explore and transform data for analysis. Creating and training a Hadoop operations team would have cost time, carried risk, and fallen outside the company’s core competencies. With phData, the company was finally able to use Hadoop for exploratory analytics and ETL.
Technologies Used: Hadoop, HDFS, Spark, Pig

Insight Into Pricing Analytics Like Never Before

phData helped a large medical device company extract pricing data from SAP, ultimately providing time-series insight into individual and group-related pricing. The complexity of SAP hierarchy pricing presented the company with more than 50 billion pricing conditions that had never been viewed holistically because of compute and storage constraints. Using Spark to pre-calculate the data and Impala to explore it, phData helped the company gain unprecedented insight into details such as cross-country pricing comparisons and price erosion over time.
Technologies Used: Spark, HDFS, Impala

Big Data Technologies

  • Ambari
  • CASK CDAP
  • Cloudera Manager
  • Hadoop
  • Hive
  • Impala
  • Kafka
  • Kudu
  • Ranger
  • Scala
  • Sentry
  • Spark
  • StreamSets
  • Trifacta

Do you have questions about Big Data technologies? Contact us for answers.