This guide is intended to help your organization develop a more practical and focused data strategy framework that delivers value.
The eight steps covered in this guide will help ensure you’re building a foundation for your organization when starting to utilize data as a strategic asset.
In the not-so-distant past, leveraging data to make more informed decisions was a strategic advantage available only to large businesses with access to massive amounts of capital.
Today, advances in cloud computing allow companies of ALL sizes to build fit-for-purpose data platforms. Whether you work for a small company with a limited data platform or you’re not getting results from your existing platform, wielding data as a strategic advantage has never been more accessible.
Building a trusted (and effective) data platform starts with a proper data strategy that provides a prescriptive architecture, design, and implementation plan.
phData has had the pleasure of building actionable data strategies across many different businesses, and what we have found is that ANY business can benefit tremendously from data as long as it has the right data framework in place.
With decades of combined data strategy experience, we’ve made several mistakes, unlocked eye-opening results, and acquired an expert-level understanding of how to build a data strategy framework that delivers actionable results.
This guide is not meant to be an all-encompassing library of information on building a complete data strategy; rather, it’s a hyper-focused plan to help your business create an actionable data strategy framework that delivers value.
Our approach is based on a six-week data strategy timeline that consistently delivers the best results for businesses of all sizes.
Throughout this guide, we will walk you through the exact steps we recommend our customers take along with examples and templates you can use to help build a foundation for an actionable data strategy framework.
The first step of formulating a proper data strategy is to identify the key players. The key stakeholders ideally should have a vested interest in the data platform, a healthy dose of excitement, and a genuine passion to make more data-driven decisions across your organization.
The best organizations create a cross-functional team that is typically led by someone in a Data/Analytics role or a leader within the IT organization. In some cases, this team is led by a leader within a business unit or a central business team. This individual serves as the “point person” who is responsible for driving success.
Pro tip: It’s important to ensure the point person has a clear line of sight and understanding of your current data platform architecture. They should be comfortable making technology decisions for the organization and ideally not develop the data strategy in a vacuum.
This team is very collaborative, working cross-functionally to gather input and support from numerous stakeholders across your organization. Here are a few examples of primary stakeholders:
As a valued stakeholder, you will be asked to participate in many activities throughout the data strategy project. It’s important to note that not every stakeholder takes action in each aspect of the project; rather, stakeholders add value within their spans of control and help influence the overall success of the project.
The initial discovery sessions are intended to catalog the current state of data assets, data platform technologies, and any current data use cases. Once all of this information is captured, the next step is to identify any gaps or challenges and then create a prioritized list of potential future use cases.
Discovery is performed through a series of interviews and documentation reviews. Each stakeholder group is interviewed, and detailed notes are taken to document relevant findings. Additional interviews may be scheduled when new information is uncovered in a related discovery session. Interviews are complemented with documentation that covers things such as:
Near the end of Discovery, it is important to catalog and summarize use cases, gaps, and priorities. This summarization will allow for the identification of a Primary Use Case that can drive the development of the platform.
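One way to make this summarization concrete is to score each cataloged use case and sort the list. The sketch below is only an illustration: the `UseCase` fields, the 1 to 5 scales, the scoring formula, and the sample use cases are all assumptions made for this example, not a method prescribed by this guide.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: int  # assumed scale: 1 (low) to 5 (high)
    data_readiness: int  # assumed scale: 1 (data missing) to 5 (data ready)
    effort: int          # assumed scale: 1 (small) to 5 (large)


def priority_score(use_case: UseCase) -> float:
    """Hypothetical formula: value and readiness raise the score, effort lowers it."""
    return (use_case.business_value * use_case.data_readiness) / use_case.effort


def rank_use_cases(use_cases):
    """Return the use cases ordered from highest to lowest priority score."""
    return sorted(use_cases, key=priority_score, reverse=True)


# Made-up catalog entries, standing in for the output of discovery sessions.
catalog = [
    UseCase("Churn dashboard", business_value=4, data_readiness=4, effort=2),
    UseCase("Real-time pricing", business_value=5, data_readiness=2, effort=5),
    UseCase("Inventory forecast", business_value=3, data_readiness=3, effort=3),
]

ranked = rank_use_cases(catalog)
primary = ranked[0]  # candidate Primary Use Case under these assumed scores
```

A score like this is a conversation starter rather than a decision: the stakeholders still need to agree that the top-ranked candidate reflects the business's real priorities.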
Your primary use case is the focal point of your data strategy; it’s what’s going to drive the value home. The ideal primary use case should align with your business’s top priorities and goals while also having the potential to be completely supercharged by data. The primary use case should help achieve two main objectives:
Questions for Consideration as You Identify Your Primary Use Case:
Use cases fall along a spectrum: at one end are Operational Data Products and at the other are Analytical Data Products. While these use cases rely on the same underlying data, they have very different requirements. The primary dimension that differentiates them is the impact they have on generating revenue, producing products, or interacting with clients.
What Are Analytical Data Products?
Analytical data products are most commonly used to inform decision-making and analyze certain business functions. When they are not functioning, there is little impact on customers, revenue, or production. These are the use cases that we typically recommend prioritizing. They often have a significant impact on the overall business but do not require a large upfront investment or ongoing support to manage.
What Are Operational Data Products?
Operational data products are used to run day-to-day business operations. Typically, when they go down, there is a large impact on customers, revenue, or production.
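The distinction above can be expressed as a simple rule of thumb: ask what happens when the data product is down. The sketch below is a deliberate simplification that collapses the spectrum into two labels; the `DataProduct` fields and the sample products are hypothetical and exist only to illustrate the test.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    name: str
    outage_hits_customers: bool
    outage_hits_revenue: bool
    outage_hits_production: bool


def classify(product: DataProduct) -> str:
    """Label a product 'operational' if an outage has a direct business impact."""
    impacted = (
        product.outage_hits_customers
        or product.outage_hits_revenue
        or product.outage_hits_production
    )
    return "operational" if impacted else "analytical"


# Hypothetical examples, not taken from a real assessment.
weekly_sales_report = DataProduct("Weekly sales report", False, False, False)
order_routing_feed = DataProduct("Order routing feed", True, True, False)
```

In practice many products sit somewhere in between, so a real assessment would likely record the degree of impact rather than a yes or no answer.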
Analytical – BI
Drafting an initial architecture of the solution based on information gathered in the discovery phase gives life to the platform. Even if it is incomplete, it will still bring visual representation to subsequent discussions. It also helps frame people’s thinking and tells the story of the platform. Without a visual representation, conversations often end up repeating, causing confusion and ultimately slowing down progress.
The architecture is organized by capabilities. Capabilities represent the logical components of the platform necessary to deliver on a requirement. For example, most data platforms require data warehousing as a capability. The data warehouse allows for efficient storage and querying of data for business intelligence, advanced analytics, and machine learning.
The first architecture diagram focuses on the capabilities of the platform. The capabilities are laid out in the order in which data will be processed. Like a good story, this architecture diagram tells the audience how data gets into the data platform, how it is processed, and how it is consumed. The architecture highlights specific capabilities and data requirements. The goal of this diagram is to get buy-in on a capability view of the data platform.
Capabilities are composed of technologies whose features align to the specific business requirements encapsulated in the capability. For example, the ingestion capability might have different technologies that manage real-time data ingestion vs. batch.
The second architecture diagram incorporates another level of detail, specifying technologies to support a capability. The technology assessment should provide reasoning and justification for technology choices (see Technology Assessment). Justification is derived from discovery phase interviews.
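The relationship between the first two diagrams can also be captured as data, which makes it easy to check that every technology choice has a recorded justification. This is a minimal sketch under stated assumptions: the `Capability` class, the helper functions, and the placeholder technology names are invented for illustration and do not represent a recommended stack.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Capability:
    name: str
    # Maps a technology name to the justification gathered during discovery.
    technologies: Dict[str, str] = field(default_factory=dict)


def capability_view(platform: List[Capability]) -> str:
    """First diagram: capabilities in the order data flows through them."""
    return " -> ".join(capability.name for capability in platform)


def missing_justification(platform: List[Capability]) -> List[Tuple[str, str]]:
    """Second diagram check: technologies that still lack a justification."""
    return [
        (capability.name, technology)
        for capability in platform
        for technology, reason in capability.technologies.items()
        if not reason.strip()
    ]


# Placeholder entries only; real names come from the technology assessment.
platform = [
    Capability("Ingestion", {
        "batch-loader (placeholder)": "Nightly extracts cover most sources",
        "stream-consumer (placeholder)": "Clickstream needs near-real-time data",
    }),
    Capability("Data Warehousing", {
        "warehouse (placeholder)": "Central store for BI and analytics queries",
    }),
    Capability("Consumption", {
        "bi-tool (placeholder)": "",  # justification not yet documented
    }),
]
```

Calling `missing_justification(platform)` before a review session gives a quick list of choices that still need to be tied back to discovery findings.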
Technologies can be deployed and configured in a number of different ways. Everyone must understand how the technology will be operated and leveraged. This often requires a deeper technical representation of the technology. For example, Airflow can run on a standalone server, be deployed as a service on Kubernetes, or be consumed “as a service” through Amazon Managed Workflows for Apache Airflow (MWAA). Having a detailed representation will make it clear how the technology will actually be deployed in the data platform.
The final architecture diagram will be a deep technical document that will be a reference for how the actual platform is used. This version will be most useful to technology domain owners and it should offer clear guidance on the scope and magnitude of deployment. It will also allow for more accurate cost estimation of the platform.
The architecture is meant to be a living document. Each architecture diagram should be kept up to date as the discussion with business and technology stakeholders progresses. The architecture is refined iteratively until there is general acceptance. We’ve found that high-level diagrams give the illusion that the platform is simple to set up, whereas the detailed version lets the reader understand how much work it takes to get all the configurations just right.
As the capability architecture diagram comes into focus, technology assessments will be conducted to determine the technology stack of the platform. This will involve the creation of a document detailing the technology options, selection criteria, and applicable business factors that determine the selected technology.
For example, a company might be deciding whether to continue leveraging Hadoop for its data warehouse or moving over to Snowflake.
The assessment may reveal the need to do a proper Proof of Concept (POC) of technology to build a better understanding of how the proposed technologies stack up to the selection criteria. This is usually noted in the proposal and would represent a phase 0 implementation scope to solidify technology selection.
At phData, we’ve been fortunate enough to have a strong background in implementing data platforms using a variety of technologies. The selection criteria list below is what we use to help our customers maximize their technology investments.
Technology decisions are critical to a successful data platform. The decision drives everything from costs to recruiting for roles on the platform team.
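A common way to make a technology assessment comparable across options is a weighted scoring matrix. The sketch below illustrates the idea only: the criteria names, the weights, the 1 to 5 scores, and the two options are all hypothetical and are not phData's selection criteria or a rating of any real product.

```python
# Hypothetical criteria with weights expressed as whole percentages (sum to 100).
WEIGHTS = {"cost": 50, "scalability": 30, "team_skills": 20}

# Made-up scores on an assumed 1 (poor) to 5 (excellent) scale.
OPTIONS = {
    "Option A": {"cost": 3, "scalability": 5, "team_skills": 2},
    "Option B": {"cost": 4, "scalability": 3, "team_skills": 4},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Combine per-criterion scores into one number using the given weights."""
    if set(scores) != set(weights):
        raise ValueError("Every criterion needs exactly one score")
    return sum(weights[name] * scores[name] for name in weights) / sum(weights.values())


def best_option(options: dict, weights: dict) -> str:
    """Return the name of the option with the highest weighted score."""
    return max(options, key=lambda name: weighted_score(options[name], weights))


totals = {name: weighted_score(scores, WEIGHTS) for name, scores in OPTIONS.items()}
winner = best_option(OPTIONS, WEIGHTS)
```

The weights matter as much as the scores, so it is worth agreeing on them with stakeholders before any option is rated. If two options land close together, that is often the signal that a Proof of Concept is needed.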
As data platforms mature, the impact of lacking specific components of a Data Governance program becomes more and more significant. On the other hand, over-engineering a Data Governance program can slow down progress and limit business value. The key is balance.
Pay special attention to certain business units that have non-negotiable Data Governance requirements. Compliance reasons in certain industries require specific governance of the platform. Identify aspects of Data Governance that are non-negotiable and those that can be developed later in the data platform life cycle. Core Data Governance capabilities are:
Organizations evolve through growth, contraction, acquisitions, mergers, and a whole host of other factors. The teams that support the platform will need to ebb and flow with the organization and platform.
Typically, organizations have an executive leader who is responsible for data and analytics broadly. As organizations grow, sub-teams can form to support specific business units or functions within the organization, and a central IT team can manage fundamental aspects of the platform infrastructure. Below is a simple view of an organizational structure vs. a more complex one.
All of this work culminates in a clear plan for how to get from where you are to where you want to be. Typically, platforms go through a similar implementation process of platform build, migration, use case development, testing/validation, deployment to production, and management. Each of these phases needs to have clear timelines and costs. The right balance of quality, speed, and cost will vary by organization.
Listed below is an example of a phased implementation plan that phData uses for most customers:
If your business is ready for the migration step, we have a vast library of common migration approaches including Hadoop to Snowflake, Oracle to Snowflake, Hadoop to AWS, and much more!
The last step brings everything together. The recommendation process includes the target future state and justification for what data asset will be developed and how the business will create a strategic advantage through the use of data. Here are the five key components that should be included in the recommendations:
The intention of the recommendation is to convey the direction stakeholders should be heading and to convince them of it. It will be the basis for any needed investment in technology, people, process changes, or potential changes in organizational structure to support this new direction.
When developing the recommendation, it is important to bring stakeholders along in the process. They should not be caught off guard by the recommendation. Not all stakeholders will agree with the recommendation but getting their input, understanding their concerns, and addressing them is key to managing change within your organization.
Even though not all stakeholders will need to agree with the recommendations, key decision-makers will. Identifying these decision-makers and including them early in the recommendation process will increase your chances of success.
We hope this guide is helpful and serves as a resource for you as you evolve your organization. The general principles and best practices in this guide truly apply to organizations of all sizes, industries, and data/analytic maturities.
At phData, we know this work is foundational to your success and are here to help. Drawing from years of experience, learning, iterating, and doing this for customers at every stage of their life cycle, we’ve built a set of tools, processes, reference architectures, and a team that is ready to help you get on the right path toward better utilizing data and analytics within your organization.