My first year at phData

Last month marked my one-year anniversary working at phData, and since a lot of my friends and future applicants have been asking me about my experience so far, I decided to write up a little blog post about it.
My experience at phData can be summed up succinctly as different and constantly evolving. I first met Brock at the Twin Cities Hadoop User Group on January 21, 2016, and phData hired me a couple of days later. Although phData has grown from 4 full-time employees at the time to over 30 employees, we’re still agile with hiring. The number of satisfied clients has grown at an even faster pace due to our unique culture, which combines technical excellence with inquisitive souls. Over the past year, I’ve had the opportunity to work on lots of impactful projects both at my client site and at phData. I wanted to use this post to share some aspects of that journey.
When I started as an intern at phData, I dove right in and worked on creating a recommender system using Spark ML and Scala. Recommender systems help users choose items they will like from a wide range of products (movies, books, music); popular apps that use them include Spotify (to recommend music) and Netflix (to recommend movies). I also created a presentation on the variety of tools available for machine learning on Hadoop, the basics of recommender engines, and a slide or two on how I created mine. I was offered a full-time job soon after.
My first day at my full-time job was the best! I was handed a top-spec MacBook Pro and we worked out of COCO, a co-working space in downtown Minneapolis focused on startups and one of the coolest offices around town. Starting from my second day, I underwent a month-long training to get me grounded in the fundamentals of distributed computing and Hadoop. It’s truly surprising how much you can learn in a few weeks when you’re surrounded by people who not only know the Hadoop ecosystem inside and out, but also helped build it! (For those of you who don’t know, phData employs committers, PMC members, and mentors to the Apache projects Hive, MRUnit, Parquet, Sentry, Impala, and Kudu.) As part of my training, I took several hands-on courses on the core Hadoop project (HDFS, YARN, and MapReduce), Spark, SQL-on-Hadoop (Impala, Hive), real-time processing (Kafka, Spark Streaming), and search (Apache Solr). An idea that defines the culture at phData is “there are no stupid questions”. I was made comfortable enough to ask all the questions I had, and that helped me make the most of my training. Although I was nowhere near being an expert (and I knew what expertise meant since I was around a few), I was certainly in an excellent position to start working with my first client.

Me@COCO (COCO was a trading floor earlier)

One of the things that motivated me to join phData, a startup, was the chance to witness the non-engineering aspects of a hyper-growth environment, particularly sales and marketing. I was promised that I’d have the opportunity to do more than engineering, and that began in my first month at phData. I was allowed to listen in on sales calls with a few Fortune 500 companies, and even had the opportunity to witness a couple of in-person pitches. It gave me a sense of not only the significant value that phData provided to our customers, but also of the magnitude of opportunity that phData employees had to make an impact. I was convinced that the only thing that could possibly inhibit my growth in this company was my own ability!
Soon after completing my training, I was placed at a top Fortune 500 company with one of the biggest Hadoop installations in the Twin Cities. My client had over 30 Hadoop clusters containing over 1,200 nodes. For those of you who are trying to understand how big that is, just one cluster had 30 terabytes of RAM! Just a couple of days in, I started working on rebuilding, on Hadoop, a huge pipeline that had originally been built with conventional technology. A complete technical description of the project I undertook would be long enough to warrant another blog post, so I’ll save it for that. Suffice it to say that I had the opportunity to work on some of the coolest technology out there, including Apache Spark, Hive, CDAP, Docker, Elasticsearch, and Scala. It was definitely a lot of learning on the go, but I absolutely loved the challenge. If you love challenges, then phData is definitely the company for you!
As time passed, I started missing our sweet COCO office (I had been working at the client site 5 days a week), and asked our top management if we could create a policy where everyone could work from the phData office at least one day a week. This would help us bring what we learn at our clients back to the company, encourage knowledge sharing, and ultimately help us scale as a company. Management loved the idea and now Fridays@COCO are a thing; we work together and get lunch catered. Now, that’s what I love about phData. No matter your position, your suggestions are always welcome!
My year-long journey at phData has been packed with adventure, and there are several other things I absolutely love about my job. Some of them are:

  1. Being part owner of the company (all full-time employees at phData get stock options).
  2. Meeting Doug Cutting, the creator of Hadoop, and Todd Lipcon, the founder of Apache Kudu.
  3. Getting admin access to our website, and doing some web analytics.
  4. Getting exposure to the managed services side of our business, which taught me a lot about Linux, infrastructure, and the almost magical skills needed to keep massive Hadoop clusters up and running in the face of many challenges.
  5. Being part of such a tight-knit and friendly community.

As I end this blog post, I want to take the opportunity to thank everyone at phData for making my first year this awesome, and to share how excited I am to work at such a great company.
Note for future applicants: We’re hiring!