December 16, 2021

What Team Supports Your Data Catalog Best?

By Christine Carroll

Welcome to part two of our trilogy on data catalogs. If you missed our first blog on what a data catalog is, be sure to check it out! In this blog, we’ll explore what the ideal team to support your data catalog looks like.  

Who Are the Users of a Data Catalog?

A tool is only as good as the team you have to support and champion it. When setting up your data catalog, it is tempting to leave it with a technical team that can keep the automation running, onboard new datasets, and handle upgrades and downtime. While these people are extremely important, a well-balanced team is key to getting the most out of your data catalog.

Below, we explore the roles commonly found in a data governance program that can also support your data catalog.

A diagram displaying an example org structure of people to support the data catalog tool.

Technical Data Steward

If you are a Data Engineer, this role will be the most familiar. It suits someone who knows how the data is stored, updated, moved, and transformed. You will usually find this person very close to the data. Their key responsibilities will likely include maintaining data quality, keeping jobs running, and making sure storage behaves as expected.

The Technical Data Steward's role is to share their technical knowledge, assist with or lead the setup of connections between your data resources and the data catalog, and help with any policies that require their technical expertise.

Business Steward

This is the person responsible for defining policies and making sure goals and deliverables align with business drivers. The Business Steward will be most familiar with the business drivers covered in the first blog in this series. They will also work with other business areas to build policies and metrics for the data catalog, and with the Lead Steward to make sure these policies are enforced.

Lead/Data Steward

A Lead/Data Steward is responsible for making sure the data meets the standards the company requires. When policies are established for a dataset, the Lead Steward works with the Business Steward and Technical Data Steward to ensure those policies are enforced and followed. The Lead Steward is also responsible for making sure the metadata is properly defined and agreed upon.

Subject Matter Expert

This is someone who knows the actual data in the tables well. A data catalog tool helps a Subject Matter Expert capture this knowledge so that current and future team members can benefit from it.

Executive Sponsor

This is the trickiest role to fill and perhaps the most crucial person on your team: someone in executive leadership who believes in and supports the data catalog tool. They see its value and are kept up to date on progress and the value it adds. This is the person who can keep selling your tool to other leaders and fight for it when needed.

Scope of the Team

If you are a large organization with many different data domains, this team setup (excluding the Executive Sponsor) can be repeated per domain or subject area. Each domain would have a set of stewards with knowledge specific to that domain and the ability to create and support the metadata pertaining to it. 

For most team members, the data catalog will not be a full-time job. They will keep their normal day-to-day responsibilities, with the data catalog as an additional duty while metadata is onboarded and curated. Once onboarding is complete, the stewards can switch to a support model.

More roles can be found in a data governance program, and similar roles with different titles can also be found. Will you need all of these roles for your data catalog? It really depends on your company and how your team members interact with the data. Your Technical Steward might be the Subject Matter Expert and the Data Steward. You might have roles that overlap or even roles that aren’t needed for your program at all.  

The important thing to remember is to not let your passion for the technology blind you to the value that other members of the company can provide. They can provide meaning, support, and exposure to leadership, which can’t always be achieved alone.  

Another important thing to consider when picking your team and your tools is how your human-driven metadata will be populated. 

How Will Human-Driven Metadata Be Captured?

Populating this type of metadata can be as simple as filling in a text box and clicking ‘save’, or as complicated as submitting every piece of metadata to a review panel. This will look different for each company and will closely reflect your data culture.

Your data culture might be very strict and locked down, or it may be open to everyone in your company. Data catalogs often offer a metadata review and approval pipeline, where any metadata entered is sent to a Data Steward for review and approval. Who is allowed to enter metadata is a key question. You might want only a few people creating very accurate metadata, and a few more verifying and approving it before it is visible to other business users. You might go a step further and allow only Data Stewards to create metadata. This can be very time-consuming but typically produces very accurate results.

Crowdsourcing Metadata

At the opposite end from a strict metadata approval workflow is the crowdsourcing approach. Crowdsourcing is the practice of obtaining information from a large group of people outside their formal roles and responsibilities. This approach does not restrict anyone from creating metadata.

This opens your metadata up to many more contributors, but accuracy may go down. The idea behind crowdsourcing is that metadata becomes eventually correct: with enough people reviewing it, a correct definition will emerge.

Another option is to allow anyone to enter metadata while a Data Steward still eventually approves or enriches it. Stewards can become overloaded with metadata in this model, so decide whether you are OK with the possibility of inaccurate metadata or would prefer no metadata instead. If crowdsourcing is the route you take, the last member of your team could be everyone.
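The authoring models above differ mainly in two settings: who may create metadata, and whether a Data Steward must approve it before business users see it. The Python sketch below models that as a small review workflow. Everything in it (`ReviewWorkflow`, `MetadataEntry`, the status names) is hypothetical and not any catalog product's API; real tools implement approval pipelines in their own ways.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class MetadataEntry:
    asset: str   # the dataset or column being described
    text: str    # the human-entered definition
    author: str
    status: Status = Status.PENDING_REVIEW


@dataclass
class ReviewWorkflow:
    """Hypothetical model of a metadata authoring and approval policy."""
    stewards: set[str]
    # None means anyone may author metadata (crowdsourcing).
    allowed_authors: Optional[set[str]] = None
    # False means entries become visible immediately, with no review step.
    require_approval: bool = True
    entries: list[MetadataEntry] = field(default_factory=list)

    def submit(self, asset: str, text: str, author: str) -> MetadataEntry:
        if self.allowed_authors is not None and author not in self.allowed_authors:
            raise PermissionError(f"{author} is not allowed to create metadata")
        entry = MetadataEntry(asset, text, author)
        if not self.require_approval:
            entry.status = Status.APPROVED
        self.entries.append(entry)
        return entry

    def review(self, entry: MetadataEntry, reviewer: str, approve: bool) -> None:
        # Only Data Stewards may approve or reject submitted metadata.
        if reviewer not in self.stewards:
            raise PermissionError(f"{reviewer} is not a Data Steward")
        entry.status = Status.APPROVED if approve else Status.REJECTED

    def visible(self) -> list[MetadataEntry]:
        # Business users only see approved metadata.
        return [e for e in self.entries if e.status is Status.APPROVED]


# Hybrid model: anyone may submit, but a steward approves before it is visible.
demo = ReviewWorkflow(stewards={"sam"})
pending = demo.submit("sales.orders", "One row per customer order", "alex")
print(len(demo.visible()))  # 0 while the entry awaits review
demo.review(pending, "sam", approve=True)
print(len(demo.visible()))  # 1 after the steward approves it
```

Under these assumptions, limiting `allowed_authors` to your stewards gives the strict model, leaving it as `None` with approval on gives the hybrid model, and setting `require_approval=False` gives pure crowdsourcing.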

Now that you have your team assembled, it’s time to explore how to best support that team on your data catalog journey.

How to Support Your Data Catalog

A data catalog is not a set-it-and-forget-it tool. It needs to become part of your daily data culture, and the journey continues long after installation. This section covers ways to keep supporting your data catalog after the initial setup and team establishment.

Continue to Gather Metrics

The first blog in this series discussed metrics (which hopefully you have gathered). Keep gathering them as your data catalog journey continues. Meet with current users of the data catalog to find out what is working and what is not. How much time are your Data Scientists saving? What can be improved?

Iterate, adjust, and gather new metrics. Beyond metrics, document any interesting anecdotes that paint a picture of your data catalog journey. Actively show where you started and how far you have come.

Establish Policies and Stick to Them

To make the data catalog part of your data culture, policies should be established, well understood, and enforced. For instance, a policy might state that a dataset is considered production-ready only once certain criteria are documented for it in the data catalog. Once you establish these policies, follow them to guard against complacency; it is all too easy to slip back into how things were. Be vigilant and consistent.
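As a rough illustration, a production-readiness policy like this can be expressed as a checklist of required catalog fields. The field names in `REQUIRED_FIELDS` and the dict-shaped catalog entry are assumptions for this sketch; your policy defines the real criteria, and your catalog tool defines how entries are stored.

```python
from __future__ import annotations

# Hypothetical policy: catalog fields that must be filled in before a
# dataset may be called production-ready. Substitute your own criteria.
REQUIRED_FIELDS = ("description", "owner", "data_steward", "classification")


def missing_criteria(entry: dict) -> list[str]:
    """Return the required fields that are absent or blank in a catalog entry."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = entry.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def is_production_ready(entry: dict) -> bool:
    # The policy passes only when no required field is missing.
    return not missing_criteria(entry)


orders = {"description": "One row per customer order", "owner": "Sales Ops", "data_steward": "  "}
print(missing_criteria(orders))     # ['data_steward', 'classification']
print(is_production_ready(orders))  # False
```

Reporting which criteria are missing, rather than only pass or fail, gives stewards a concrete to-do list, which makes the policy easier to stick to.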

Don’t Boil the Ocean

Once the tool is in place, getting your data into shape can be daunting. Start with something small that aligns with your business drivers. Some data catalog tools provide usage statistics showing which datasets are popular, which can help you decide where to focus. Don’t be afraid to set small goals and go after them. Sometimes small achievements can have a huge impact on the right people.
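If your tool can export usage data, picking a starting point can be as simple as ranking datasets by query count and skipping those already curated. This sketch assumes a hypothetical export format, a plain list of dataset names with one entry per query; the function name `pick_focus_datasets` is illustrative only, and real tools expose usage statistics in their own formats.

```python
from __future__ import annotations

from collections import Counter


def pick_focus_datasets(query_log: list[str], documented: set[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank datasets by query count, skipping ones that already have curated metadata."""
    counts = Counter(query_log)  # dataset name -> number of queries
    undocumented = [(name, n) for name, n in counts.most_common() if name not in documented]
    return undocumented[:top_n]


# Hypothetical usage export: one entry per query against a dataset.
log = ["sales.orders"] * 5 + ["sales.customers"] * 4 + ["hr.people"] * 2 + ["tmp.scratch"]
print(pick_focus_datasets(log, documented={"sales.customers"}, top_n=2))
# [('sales.orders', 5), ('hr.people', 2)]
```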

Do Not Be Afraid to Iterate

When you started the data catalog, you might have had a different journey in mind, but usage patterns may show that users find value in different areas. Be willing to investigate these use cases and adjust as needed.

Educate and Socialize

As you educate your teams and partners, make sure you share the ‘why’ and not just the ‘what’. Why is the data catalog important to your company? Why are companies building their data culture around these types of tools? Understand your audience and what they will find useful in the tool; a Data Scientist will use it very differently from a Data Engineer. Socialize why and how this tool will improve their day-to-day work.

Add-ons

Evaluate tools and add-ons the same way you evaluated the data catalog.

It is tempting to add new features, but additional tools should go through the same process you went through for your data catalog. Determine what value each one adds and who will support it, gather metrics, establish goals, and build it into your growing data culture. Each add-on will also need to be maintained and nurtured, just like the data catalog.

Beyond IT

Because a data catalog is a tool, it can easily fall into an IT-only support model. It is very important for your data catalog to have support from the business side as well. Usage reports can help you find people in your organization who are interested in the data or willing to play a more active role in it.

Keep building your team with people who have a vested interest in the data being available and correct. If they are members of the data catalog team (even part-time), make sure the data catalog objectives are part of their professional goals. Your data catalog isn’t a hobby but a key part of your data culture. It will need tending and nurturing to keep growing.

In Closing

Now that you have a better picture of the ideal team to run and support a data catalog, it’s time to investigate the last piece: which data catalog tools you should consider. Don’t miss the last blog in our data catalog series!


Need Expert Help Making Your Data Catalog a Success?

phData has years of experience helping businesses of all shapes and sizes unlock more value from their data. Whether you need help building an actionable data strategy or advice on how to make your data catalog a smashing success, our data experts would love to help!
