You Meshing with My Data?

I’ve heard the phrase “data mesh” used for several years now, but I finally decided to do some research after encouragement from my manager.  As always, this post is meant to be anything but a comprehensive overview.  Selfishly, these musings always help crystallize my own thinking.  But I keep in the back of my mind that they may benefit others and encourage more exploration of the topic.

When I first heard the term “data mesh”, my mind conjured up the image of a giant spider web with multiple smaller webs connected to it.  Come to find out, this isn’t too far from the concept. 

Just imagine the smaller webs are domains (sales, finance, etc.) in charge of their data.  Most organizations are already structured in this distributed manner.  A data mesh passes ownership of data products (such as pipelines, data repositories, and dashboards) to the various domains in a business.  Contrast this with the traditional approach, where an organization relies solely on a centralized data and analytics team to build and maintain data products.

When researching the idea (and there’s a lot of content out there), it becomes clear that data mesh isn’t a tool or set of tools.  It’s a philosophy of how data should be distributed and utilized across an organization.  You can’t just go out and buy data mesh.  It’s a framework of thinking that must be agreed upon and standardized across an organization, or at least a subset of an organization.

Characteristics of a data mesh architecture

There appear to be four key principles that define a data mesh:

  1. Domain-driven.  As mentioned above, a data mesh passes the ownership of data products downstream to the business, contrary to the traditional approach where one centralized team is in charge of creating and maintaining them.
  2. Data products.  As mentioned above, businesses with a data mesh architecture think about their data as a product.  Each pipeline, dashboard, and report is built with the rest of the business in mind.  Content is discoverable across domains, and owners are responsible for the trustworthiness, distribution, and overall quality of their product.
  3. Self-service.  A data mesh infrastructure encourages the use of a self-service platform that provides a common place for data products to be shared (Power BI Service, for example).  Business users can access data products on this platform and create their own custom reports tailored to their specific needs.
  4. Federated.  A data mesh doesn’t have to mean the wild west of reporting.  It introduces the concept of federated data governance, similar to the relationship between the federal government and state governments in the United States.  While state governments have certain levels of autonomy, there are overarching laws they abide by as part of the broader country.  In the same way, a data mesh encourages a set of global standards to be applied to each domain.

These are the consistent, key points I find when researching the topic.  If you’re interested, these articles from AWS, Microsoft, and McKinsey were helpful for exploring the topic further.
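
To make the "federated" principle a little more concrete, here is a minimal Python sketch of how a set of global standards might be checked against data products owned by different domains.  Everything in it is my own assumption for illustration: the DataProduct fields, the three rules in GLOBAL_STANDARDS, and the sample catalog are hypothetical and don't come from any particular tool or from the articles above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product owned by a domain."""
    name: str
    domain: str                 # owning domain, e.g. "sales" or "finance"
    owner: str = ""             # person accountable for quality
    description: str = ""       # helps other domains discover the product
    tags: list[str] = field(default_factory=list)


# Assumed global standards: every domain keeps its autonomy, but each
# product must pass these organization-wide checks (the "federal laws").
GLOBAL_STANDARDS = [
    ("has an accountable owner", lambda p: bool(p.owner.strip())),
    ("has a description", lambda p: bool(p.description.strip())),
    ("has at least one tag", lambda p: len(p.tags) > 0),
]


def violations(product: DataProduct) -> list[str]:
    """Return the labels of the global standards this product fails."""
    return [label for label, check in GLOBAL_STANDARDS if not check(product)]


# Illustrative catalog with products from two domains.
catalog = [
    DataProduct(
        name="monthly_revenue_dashboard",
        domain="finance",
        owner="finance-analytics",
        description="Revenue by month and region.",
        tags=["revenue", "dashboard"],
    ),
    DataProduct(name="lead_pipeline", domain="sales"),
]

for product in catalog:
    failed = violations(product)
    status = "meets global standards" if not failed else "fails: " + ", ".join(failed)
    print(f"{product.domain}/{product.name}: {status}")
```

The point of the sketch is the split in responsibility: each domain creates and maintains its own products, while the rules live in one shared place and apply to everyone equally.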

Who should implement a data mesh?

I expect this is the most important question, so I’ll focus here.  After reading through much of the content, a number of key themes emerged for why an organization would implement a data mesh.  I’ve posed them as a series of five key questions.  Depending on your answers, you will likely find a data mesh to be more or less helpful in your specific case.

Are you a small or large organization?
Smaller organizations will not benefit as greatly (if at all) from a data mesh.  A one-person BI team can often provide all the analytics a small firm needs.  The data mesh concept isn’t necessarily unhelpful; it’s just unnecessary.  As an organization’s size and complexity increase, it becomes a better candidate for distributing the ownership of data products.

Does your BI team have a large backlog of data projects?
In other words, are they a bottleneck to data insights?  If so, a data mesh can help speed up delivery timelines, since it reduces dependency on a centralized data and analytics team.  Increased speed can come at the expense of quality, though, so it’s important to make sure the proper governance is in place to ensure the distributed products meet internally determined standards (see “federated”).

Where do your data experts sit?
Most firms have a number of key individuals who are subject-matter experts on particular data sources.  If they have a decent data IQ and can use modern BI tools, these business users are great candidates to take ownership of data products.  However, if most of your data experts sit on the centralized BI team, it may not be time to implement a data mesh.  Consider upskilling your business users before handing them the keys to the car.

Has your business adopted (or does it plan to adopt) data science solutions?
This seems to be more of a minor consideration.  The gist of it is that data science products tend to consume more data, thereby increasing the burden on the centralized data team to surface that data.  A data mesh can help ensure data scientists are able to access and manipulate their data as needed without over-reliance on another team.

Are you comfortable with increased complexity?
Lastly, and this is my own addition, do you have the appetite for this change?  Data mesh appears to be difficult to implement.  Case studies are clear that data mesh can lead to a more complicated data environment, and it requires a broad change in organizational thinking.  If your organization is content with slower time-to-insight, centralized data product development, and less complexity, business as usual may be the best solution.

Much more could be said, but hopefully this is a helpful teaser.  Are there more key considerations I’m missing?  Is your organization implementing, or considering implementing, a data mesh?  What would your concerns be with moving to a data mesh architecture?

