We recently wrote a detailed post describing the characteristics of a best-in-class data platform, but we didn’t describe how to implement one. In this post, we’ll explore the process and technology behind implementing a successful data platform.
Before we dig into the process of implementing a data platform, let’s take a step back and establish some clarifying context.
Many organizations today are going through some form of digital transformation, which is essentially a modernization process that aims to integrate digital technology into all areas of a business. A significant part of any digital transformation is developing a data strategy for the business: specifically, how data is collected, secured, governed, and (most importantly) used to enhance business performance.
In short, the data strategy is the plan and the data platform is typically part of the manifestation of that plan.
It is important to call out that implementing a data platform is not a simple process in itself, but it is something that will be more easily achieved with a solid data strategy. Much of this post will assume some level of data strategy has been considered and/or defined.
For more details on building a data strategy, check out our comprehensive guide, “How to Build an Actionable Data Strategy.”
The goal of this post isn’t to provide a detailed step-by-step guide on building a data platform, but rather to arm you with an approach that can be followed to achieve a successful implementation.
What is a Data Platform?
A data platform is the set of technologies that provide the capabilities necessary to deliver on the overall business requirements of the data strategy. There is no requirement that a technology fulfill only a single capability; some technologies might fulfill many.
The technologies might be Software-as-a-Service (SaaS), requiring an account and little else, or they could be installed software that you place in a public cloud infrastructure (or even on-premises). The data platform consists of all of these technology components, their configurations, their integration, and the plethora of supporting tooling.
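The relationship between capabilities and technologies can be sketched as a simple mapping. The following is a minimal illustration, not a prescribed model: the capability names and the technology labels (`warehouse`, `transform_tool`) are hypothetical placeholders chosen for this example.

```python
from collections import defaultdict

# Hypothetical capability names, for illustration only.
REQUIRED_CAPABILITIES = {
    "data storage",
    "data retrieval",
    "data transformation",
    "workflow orchestration",
}

# Hypothetical technologies; one technology may fulfill several capabilities.
TECHNOLOGIES = {
    "warehouse": {"data storage", "data retrieval"},
    "transform_tool": {"data transformation"},
}


def coverage(required, technologies):
    """Return (capability -> technologies providing it, uncovered capabilities)."""
    providers = defaultdict(list)
    for tech, capabilities in technologies.items():
        # Only count capabilities the data strategy actually requires.
        for capability in capabilities & required:
            providers[capability].append(tech)
    gaps = required - set(providers)
    return {cap: sorted(techs) for cap, techs in providers.items()}, gaps


providers, gaps = coverage(REQUIRED_CAPABILITIES, TECHNOLOGIES)
```

Here `gaps` holds the required capabilities that no chosen technology provides yet, which is one way to see what remains to be selected.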
How Do I Implement a Data Platform?
Now that it is clear what a data platform is, what is the best way to go about implementing it?
Before getting to the steps themselves, a reminder: at every point of implementation, it is important to establish trust with the stakeholders of the data platform. A trusted data platform is much more likely to be successful than an untrusted one.
Define the Platform Architecture
The platform architecture is the combination of two things:
- The full set of capabilities needed to fulfill all analytical and intelligence use cases of a future state of the data platform.
- The expectation of how the organization will work with the data platform. This covers building the data platform, operating it, developing data pipelines on it, and consuming from it.
Regarding the set of capabilities, some are clear, like transforming data and making it available for business intelligence tools. Others are less clear, like orchestrating workflows and securely sharing data with customers. Some might even be an enigma, like building an effective machine learning practice.
An often overlooked part of building a data platform is understanding how it will be operated, built on, and consumed. Does the workforce currently exist for these functions? Are they skilled in the right disciplines? Will gaps be filled with up-skilling, new hires, or contractors?
It should be clear that a lot of this is going to be unknown upfront, and that is OK. The platform architecture is a vision, and as the journey moves forward, the unknowns will become clearer. Because of these unknowns, the surrounding technology is largely left out of the architecture at this stage, save for the technology needed to fulfill the foundational capabilities of data storage and data retrieval.
The best way to establish trust here is to demonstrate to stakeholders an understanding of what they need from a data platform (capabilities) and an understanding of who is expected to use it and how.
How to Build out Foundational Capabilities
Choosing and implementing the technologies that expose the foundational capabilities is the first step to actual implementation. This is important because, like a foundation, it will support everything else that is to come.
Big Data systems like Impala or Hive on HDFS and cloud-based data warehouses like the Snowflake Data Cloud, Redshift, Synapse, and BigQuery all fit the bill nicely, offering the foundational capabilities necessary to build on: data storage and data retrieval. Choosing the technology is just the first part of building though, as a lot of work needs to go into operationalizing the software:
- Securing access to the software
- Monitoring the health of the software
- Ensuring the software is highly available and has a disaster recovery process
- Ensuring end-users can use the system according to their function
- Establishing data governance best practices
PRO TIP: On-premises systems will require sizing exercises to understand hardware requirements. Take this as a hint to seriously consider cloud offerings that offer ‘near infinite scaling’ among other significant benefits.
Depending on the flexibility of the organization, some of these capabilities, such as disaster recovery or data governance practices, might be incorporated later in the process for the sake of acceleration. It is advisable to secure stakeholder buy-in for such an approach, though, as unexpected downtime or the inability to understand where data came from or what it means could spell disaster and damage any trust that has been established thus far.
Referring back to the platform architecture, this part of the process will need to consider who will build and operate the platform. This might be two different teams, or it might be just one, but the questions of upskilling or finding the right people for the job must be answered at this point. Training existing employees or hiring new employees or contractors must be considered, because moving forward without the proper training will significantly increase the risk of a failed implementation attempt, again damaging trust.
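One way to keep deferrals honest is to track each operational task along with whether its deferral has stakeholder sign-off. The sketch below is only an illustration under that assumption; the `OperationalTask` class, its fields, and the task names are hypothetical and simply mirror the checklist above.

```python
from dataclasses import dataclass


@dataclass
class OperationalTask:
    name: str
    done: bool = False
    deferred: bool = False
    stakeholder_approved: bool = False  # only meaningful when deferred


def blocking_tasks(tasks):
    """Names of tasks that are neither done nor deferred with sign-off."""
    return [
        task.name
        for task in tasks
        if not task.done and not (task.deferred and task.stakeholder_approved)
    ]


# Hypothetical status of the operational checklist.
EXAMPLE_TASKS = [
    OperationalTask("secure access", done=True),
    OperationalTask("health monitoring", done=True),
    OperationalTask(
        "high availability and disaster recovery",
        deferred=True,
        stakeholder_approved=True,
    ),
    OperationalTask("end-user access by function"),
    # Deferred without sign-off, so it still counts as blocking.
    OperationalTask("data governance practices", deferred=True),
]

remaining = blocking_tasks(EXAMPLE_TASKS)
```

In this example, a deferred task only stops blocking once stakeholders have agreed to the deferral, which reflects the buy-in described above.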
Implement Initial Use Case
The next logical step in many situations might be to research what technologies provide the necessary capabilities (from the platform architecture) for the chosen foundational data warehousing solution that was just built and operationalized. While this might be the right step to realizing the full and final form of the data platform, it is also the approach that provides the least value to the business right now.
It is akin to using a waterfall approach to data platform development when an agile approach is available. This is not meant to be a knock on waterfall development, which certainly has its merits in the right circumstances, but this approach applied to data platform implementation will lengthen time-to-value for the business, potentially damaging the trust that is being established.
Instead, the next focus will be on implementing a business use case, preferably a high-profile, well-defined business use case. Implementing this first use case early is beneficial in many ways, such as:
- It helps to confirm technology choices are right for both the capability they are meant to fulfill and the individuals who will be using the technology for the capabilities.
- It gives direction on what technologies and concepts to start upskilling the current implementation team with.
- It provides immediate value to the business, helping to establish trust in the data platform.
Regarding the first bullet, it is important to match the technology choices not only to the necessary capabilities but also to the skills of the individuals who will be implementing the use cases. For example, many technologies provide data transformation capabilities. Some of them are geared more towards use by traditional software developers and are very ‘down in the weeds’ writing SQL and user-defined functions. Others are higher-level visual tools and might be more appropriate for use by a business analyst. If the implementing team is made up of developers, it might not make much sense to use the higher-level tools.
If the team available is not properly skilled, an accelerated approach to hiring external help might be highly beneficial.
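Matching tools to both the capability and the team can be sketched as a simple filter. This is an illustrative example only: the tool names, skill labels, and the `candidate_tools` helper are hypothetical and do not refer to real products.

```python
# Hypothetical tools, each tagged with the capability it provides and the
# skills a team needs to use it effectively.
TOOLS = [
    {
        "name": "sql_transform_tool",
        "capability": "data transformation",
        "skills": {"sql", "software development"},
    },
    {
        "name": "visual_transform_tool",
        "capability": "data transformation",
        "skills": {"visual modeling"},
    },
]


def candidate_tools(capability, team_skills, tools=TOOLS):
    """Tools that provide the capability and whose skill needs the team meets."""
    return [
        tool["name"]
        for tool in tools
        if tool["capability"] == capability and tool["skills"] <= team_skills
    ]


developer_team = {"sql", "software development", "python"}
matches = candidate_tools("data transformation", developer_team)
```

An empty result for a needed capability corresponds to the situation described above, where upskilling or outside help may be required.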
Iterate on Prioritized Business Use Cases
With one use case in the books that delivers value to the business, more will likely follow. A good way to build on this success is to implement more use cases in the order the business prioritizes them. This is a great way to further establish trust with the business units and stakeholders, especially if they are allowed to reprioritize the use cases as they need. That requires flexibility from the implementers, but in the long run, this agility is what is best for the business goals.
In an earlier section, it was mentioned that some operational aspects of the data warehousing software might not need to be implemented right away (assuming this was made clear to stakeholders, and they all agreed to it). While implementing a (hopefully growing) backlog of use cases, it is best to fit in changes and improvements to the data platform and supporting technologies. Oftentimes this can happen organically, such as automating tedious parts of applying transformations or building out self-service capabilities for commonly requested use cases, but other times it requires deliberate action. It is important to fit this platform work in alongside the use cases, as it helps to keep the platform healthy.
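A minimal sketch of this kind of planning might look like the following. It assumes a simple model in which the business assigns a numeric priority (lower means more urgent) and each iteration reserves a small number of slots for platform work; the `WorkItem` class, the `plan_iteration` function, and the backlog entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    priority: int  # lower number = higher priority, set by the business
    is_platform_work: bool = False


def plan_iteration(backlog, capacity, platform_slots=1):
    """Pick up to `capacity` items, reserving slots for platform work."""
    platform_slots = min(platform_slots, capacity)
    ordered = sorted(backlog, key=lambda item: item.priority)
    platform = [i for i in ordered if i.is_platform_work][:platform_slots]
    remaining = capacity - len(platform)
    use_cases = [i for i in ordered if not i.is_platform_work][:remaining]
    return use_cases + platform


# Hypothetical backlog mixing business use cases and platform improvements.
backlog = [
    WorkItem("Sales dashboard", priority=1),
    WorkItem("Churn report", priority=2),
    WorkItem("Inventory forecast", priority=3),
    WorkItem("Disaster recovery setup", priority=4, is_platform_work=True),
    WorkItem("Automate transformation deploys", priority=5, is_platform_work=True),
]

first_plan = [item.title for item in plan_iteration(backlog, capacity=3)]

# The business reprioritizes; the next plan simply reflects the new order.
backlog[2].priority = 0
second_plan = [item.title for item in plan_iteration(backlog, capacity=3)]
```

Because the plan is recomputed from the current priorities each time, reprioritization by the business needs no special handling, and the reserved slot keeps platform health work from being crowded out.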
Continue to Build Towards the Platform Architecture
…but don’t be blinded to the needs of the business!
One thing that may be observed is that certain capabilities from the original platform architecture aren’t being implemented. Assuming the work being done so far has been driven by the business needs, this is not an issue. If priorities lie elsewhere, then it is fine to keep on building and improving the money makers.
But if there is significant value not being realized by the business due to certain capabilities not being utilized by the data platform, it could be because it isn’t understood how to utilize those capabilities to extract the value. This can be a difficult situation to work through because it may require educating the business on how to identify issues that can be solved by the capability or educating the implementation team on how to build solutions with that capability (or both).
Addressing the need will establish further trust with the business, allowing for a wider variety of issues to be solved with the data platform.
Educate the Data Platform Users
As more and more capabilities are built into the data platform, it will become increasingly complex. It is important to educate users on how to use the platform to ensure that this doesn’t become an issue.
Education will be an ongoing process that will help to cement the trust previously established. As the platform matures with enhanced operations, automation, and even self-service processes, new material will be necessary to ensure users can successfully develop and consume from the data platform.
If the data platform will not be available until next year, it’s still just vaporware. If it has been developed, but there is no documentation, no one will know how to use it. If everyone is educated on how to use it, but it doesn’t do what the business needs, who is it meant for? If it does what the business needs, but breaks down twice a week, how can it be trusted to help make business decisions?
So, how does one implement a data platform? Build iteratively, delivering value to the business early and as frequently as possible by working on the capabilities needed for the top priority use cases.
The order by which the use cases are implemented should match the priority assigned by the business. This priority is likely to shift as development goes on, so ensure the team has the flexibility to adjust at the same rate.
It would not really be logical to roll out a new technology or capability if the business has no use for it. Enhancing the operational stance of an existing capability or technology, however, is certainly a possibility.