May 6, 2024

What is Identity Resolution? A Comprehensive Guide

By Sam Hall

Finding and stitching together each individual's interactions across all your channels is a valuable but difficult challenge known as identity resolution.

Whether you're building a team for master data management (MDM), implementing a composable customer data platform (CDP), or just struggling to identify the unique customers for your analytical use cases, understanding the fundamentals of identity resolution will help you make the best decision possible for your organization.

This blog post aims to build a clear understanding of identity resolution, allowing you to have informed conversations and drive initiatives that maximize the value of your customer data.

What Is Identity Resolution?

Identity resolution is the process of identifying and linking fragments of customer data to form a holistic view of the customer. This information may come from various systems inside and outside an organization, encompassing customer attributes, brand interactions, and/or third-party-provided data.

For many leaders, identity resolution is a key to unlocking the next steps toward impactful customer insight with regard to their brand.

The strategic power of identity resolution is its ability to evolve and grow genuine connections with your customers. By connecting the various locations where customer data can be stored, businesses can craft a comprehensive understanding of their customers, adapting to the many touch points across online and offline channels. 

How Does Identity Resolution Impact My Business?

Modern organizations are challenged with understanding how their customers are interacting with their brand. Fragmented customer identities spread across different channels lead to data mix-ups and incomplete profiles, making it difficult to know exactly who is doing what and where. 

Solving this issue is crucial for understanding customer behavior and building a strategy for engaging with your customer base.

A lack of insight into customer behavior directly impacts your ability to predict customer needs, fine-tune engagement strategies, reduce fraud, and deliver top-notch customer experiences. 

Without this insight, customer communication and promotional efforts can fall short, potentially causing ripples in customer loyalty and brand perception. There is also the risk of missing out on opportunities that could be revealed via a more comprehensive view of your customers, leading to inefficiencies in the customer experience.

To create an impact for your business, it’s crucial to understand the journey of your customers across the many touch points they use to interact with your business. 

Identity Resolution Strategy Considerations

Once you decide to pursue an identity resolution solution, several decision points and crucial questions will guide you toward the optimal strategy for the unique needs of your organization. And yes, identity resolution should be viewed as an evolving business strategy, one that is reassessed as your business and customer engagement strategy evolve.

Existing Use Cases

Consider your existing use cases for identity resolution. Take a close look at the current solution(s) your team uses (there may be many of them). What needs are they addressing? How effectively are they meeting those needs? Is there a glaring need for improvement?

Data Quality

Next, dive into the details of your data. Can you confidently use one or more existing fields to represent your customer identities, or is a more probabilistic approach necessary? Also, assess the data you currently have—does it represent all the interactions your customers have with your business? Do you need to explore obtaining or acquiring additional information about your customers for a more comprehensive view?

Data Volume and Complexity

Scale matters. Evaluate the volume and complexity of your data, as it can often dictate the type of solution you should lean towards. Consider whether your team has the expertise to build a scalable solution in-house or if opting for a pre-built solution will be the way to go.

Security and Compliance

Compliance and data security should also affect your decision. Your identity solution should align with whichever regulations apply to your business, such as CCPA and GDPR, to ensure the responsible and respectful handling of customer data. Organizations handling sensitive customer data should also consider whether a given solution risks exposing confidential information to the wrong parties.

Methodologies and Technologies for Identity Resolution

Choosing the right methodology and tooling for your chosen strategy is key to the success of your identity resolution solution. Here we will look at the two main methodologies for implementing identity resolution and the considerations and tooling associated with them.

Deterministic Matching

Deterministic matching links customer data based on explicit, known identifiers like email addresses or phone numbers to build a single customer ID. While not infallible, it ensures a high level of precision (not to be confused with accuracy) and is beneficial when simplicity is desired and/or potential risks and tradeoffs must be explained clearly.

For instance, a common way to deterministically identify customers would be through an email address or phone number. Now, a single customer might use multiple emails or phone numbers, but matching in this way provides a precise definition that could significantly reduce or even eliminate the risk of accidentally associating the actions of multiple customers with one identity. 

The tradeoff is that this matching strategy might not be completely accurate and may not capture the entirety of a single customer’s interaction with your business.

Another benefit of deterministic matching is that the process to build these identities is relatively simple, and tools your teams might already use, like SQL and dbt, can efficiently manage this process within your cloud data warehouse. 
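As a rough illustration of the logic, here is a minimal Python sketch of deterministic matching. In practice this would more likely be written in SQL or dbt inside your warehouse, as noted above. The record layout, the `resolve_identities` function, and the `cust_` ID format are all hypothetical and chosen only for this example. Records that share an exact (case-insensitive) email or phone number are grouped under one customer ID using a union-find structure.

```python
def resolve_identities(records):
    """Assign a shared customer ID to records that share an exact email or phone."""
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the lowest index as the root

    first_seen = {}  # (field, normalized value) -> index of first record carrying it
    for idx, rec in enumerate(records):
        for field in ("email", "phone"):
            value = rec.get(field)
            if not value:
                continue  # missing identifiers never link records
            key = (field, value.strip().lower())
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    return [f"cust_{find(i)}" for i in range(len(records))]


# Hypothetical sample data from three source systems.
records = [
    {"source": "web", "email": "Ana@example.com", "phone": None},
    {"source": "pos", "email": None, "phone": "555-0100"},
    {"source": "crm", "email": "ana@example.com", "phone": "555-0100"},
    {"source": "web", "email": "bo@example.com", "phone": None},
]
customer_ids = resolve_identities(records)
print(customer_ids)
```

In this sample, the CRM record carries both the email and the phone number, so it bridges the web and POS records into one identity, while the fourth record stays separate. The same bridging behavior is what can over-merge customers when an identifier is shared (a family phone number, for example), which is why the choice of identifiers matters.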

Probabilistic Matching

Probabilistic matching, rooted in statistical models and algorithms (fuzzy matching might be a familiar term for many), takes a more flexible approach. It thrives on patterns, combinations of data points, and statistical probabilities. While less precise than deterministic matching, it can yield more comprehensive and accurate results, making it valuable for things like personalization, Customer Lifetime Value (CLV) calculations, and strategic decision-making based on customer identities. 
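To make the idea concrete, the sketch below scores a pair of records with a weighted fuzzy comparison and calls it a match above a threshold. This is a deliberately simplified stand-in, not a trained statistical model and not how any particular vendor works. The field names, weights, and threshold are assumptions that would need tuning against your own data.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and threshold; real values need tuning on your data.
WEIGHTS = {"name": 0.5, "email": 0.3, "postal_code": 0.2}
MATCH_THRESHOLD = 0.85


def similarity(a, b):
    """String similarity in [0, 1]; missing values contribute no evidence."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted sum of per-field similarities (weights sum to 1)."""
    return sum(w * similarity(rec_a.get(f), rec_b.get(f)) for f, w in WEIGHTS.items())


def is_match(rec_a, rec_b):
    return match_score(rec_a, rec_b) >= MATCH_THRESHOLD


# Hypothetical records: a and b differ only by a one-letter name typo.
a = {"name": "Jonathan Smith", "email": "jon.smith@example.com", "postal_code": "55401"}
b = {"name": "Jonathon Smith", "email": "jon.smith@example.com", "postal_code": "55401"}
c = {"name": "Maria Lopez", "email": "mlopez@example.com", "postal_code": "10001"}

print(is_match(a, b), round(match_score(a, b), 3))
print(is_match(a, c), round(match_score(a, c), 3))
```

A deterministic rule on the name field would treat `a` and `b` as different people. The weighted score tolerates the typo, at the cost of a threshold that someone has to choose and defend.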

One of the challenges of probabilistic matching lies in quantifying the accuracy of the solution as a whole. Because the reasons for individual matches are often difficult to deduce and explain, running sanity checks on KPIs like total matches and new matches over time helps build confidence in the solution.
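One possible way to track those two KPIs is sketched below. It assumes each matching run produces a set of matched record pairs keyed by a sortable run label. The `match_kpis` function and the sample record IDs are hypothetical.

```python
def match_kpis(runs):
    """Given {run_label: set of matched record pairs}, report total and new matches per run."""
    seen = set()  # every pair observed in any earlier run
    report = []
    for label in sorted(runs):
        # Treat (a, b) and (b, a) as the same pair.
        pairs = {tuple(sorted(p)) for p in runs[label]}
        new = pairs - seen
        report.append(
            {"run": label, "total_matches": len(pairs), "new_matches": len(new)}
        )
        seen |= pairs
    return report


# Hypothetical output of two weekly matching runs.
runs = {
    "2024-04-01": {("r1", "r2"), ("r3", "r4")},
    "2024-04-08": {("r2", "r1"), ("r3", "r4"), ("r5", "r6")},
}
kpis = match_kpis(runs)
for row in kpis:
    print(row)
```

A sudden jump or drop in either number between runs is a prompt to inspect a sample of matches by hand. It does not prove on its own that anything is wrong.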

Implementing a probabilistic strategy generally demands more design, tuning, and resources than deterministic matching. Fortunately, vendors like Truelty, Zingg, LiveRamp, and Hightouch can easily integrate into your composable CDP architecture, easing the journey into this more nuanced and adaptive realm of identity resolution. 

Expect to spend a significant amount of time tuning the resulting matches no matter what solution or vendor is used for the matching. 

Best Practices for Identity Resolution

Centralize Your Customer Data

Having all the context of your customer interactions in one place is critical for creating a unified customer profile and enabling identity resolution that will be applicable across your various customer touch points. This means bringing together one or more of:

  • Behavioral data like website visits, purchases, engagement with emails, and ads. Store this data in a customer data platform or data lake.

  • Offline actions like in-store purchases and customer service interactions. Connect POS systems and CRM databases to your centralized data store.

  • Derived data produced by data science teams, like customer lifetime value predictions or next-best-offer models. Pipe these insights into the central data store.

  • Identity data like demographics, contact info, and preferences. Combine data from your CRM, surveys, and third-party enrichment sources.

Centralizing this data gives a complete view of each customer and makes sure all systems can draw from the same consistent information, avoiding inconsistencies that can arise across siloed sources.

Understand Your Use Cases and Identity Requirements

Not all use cases demand the same level of identity resolution. For example, delivering personalized emails requires connecting web and email identities to enable coordination. However, targeted web advertising may only require linkage to a browser or device ID.

Start by documenting key use cases such as personalization, customer churn, and targeted marketing. Then map out the specific identity requirements for each one and use them to build requirements for your centralized identity solution. This clarity helps ensure you can meet all identity resolution requirements while minimizing the danger of over-engineering.

Understand the Nuances of Your Data

Expect to find data inconsistencies and one-off nuances as you iterate on your solution, and plan for them in your timeline. Common examples include:

Vague or Misleading Values: If names, addresses, emails, and similar fields contain a lot of variation, typos, or nicknames, you will likely need a more advanced identity solution such as probabilistic matching. The noisier the data, the more complex the resolution approach may need to be.

Anomalies: It’s important to be aware that parts of your customer data might be higher or lower quality than others, which can sometimes be taken advantage of, or may require special consideration. For example, loyalty program data may have more reliable emails and be easier to deterministically match than transactional data. You may be able to use a simpler approach for high-quality subsets like these. However, things like bot traffic, aggregator accounts, and internal employees can produce unusual or undesired results. These could potentially be separated out to avoid corrupting or unnecessarily bogging down the main resolution process.

Reliability: Know which customer fields are actually reliable enough to use in your resolution approach. Prioritizing higher-quality attributes (or eliminating low-quality ones) generally produces better results.
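One rough way to gauge reliability before committing to a field is to profile it. The sketch below computes a fill rate (the share of records with a usable value) and a distinct ratio (how often usable values are unique). The `profile_field` function, the placeholder list, and the sample records are assumptions made for illustration, and what counts as "reliable enough" depends on your data.

```python
# Hypothetical placeholder values that should not count as real identifiers.
PLACEHOLDERS = ("", "n/a", "none", "unknown")


def profile_field(records, field, placeholders=PLACEHOLDERS):
    """Return the fill rate and distinct ratio for one field across all records."""
    values = [(rec.get(field) or "").strip().lower() for rec in records]
    usable = [v for v in values if v not in placeholders]
    fill_rate = len(usable) / len(values) if values else 0.0
    distinct_ratio = len(set(usable)) / len(usable) if usable else 0.0
    return {"field": field, "fill_rate": fill_rate, "distinct_ratio": distinct_ratio}


# Hypothetical sample: email is sparse but unique, phone is complete but heavily shared.
records = [
    {"email": "a@example.com", "phone": "555-0100"},
    {"email": "b@example.com", "phone": "555-0100"},
    {"email": None, "phone": "555-0100"},
    {"email": "N/A", "phone": "555-0199"},
]
email_profile = profile_field(records, "email")
phone_profile = profile_field(records, "phone")
print(email_profile)
print(phone_profile)
```

A field with a low fill rate cannot link many records, and a field with a low distinct ratio may merge customers who only share a value, such as a store's default phone number. Either signal suggests treating the field with caution or excluding it.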

Recognize Multiple Identity Solutions May Be Needed

With different use cases, you may need to facilitate multiple identity solutions:

  • A simple deterministic approach for email marketing campaigns.

  • A data clean room for safely matching to third-party data.

  • An identity graph specifically for advertising/targeting use cases.

You may not find a one-size-fits-all approach. Be prepared to use different purpose-built solutions to get the identity capabilities you need across all your customer experience initiatives. The key is to lay a foundation by first understanding your use cases, data, and requirements, then to build an identity solution architecture that can deliver solutions tailored to each need.


Building an identity resolution solution can sound complicated, but it doesn’t have to be. 

  1. Identify relevant business use cases to drive the solution.

  2. Build your customer data source of truth.

  3. Build the appropriate identity resolution process(es).

At phData, we can work with your organization to identify use cases, as well as use our strong engineering foundation to fill in the gaps when it comes to centralizing and leveraging your customer data to create an identity resolution solution that will bring immediate value to your business.
