Everyone working in or adjacent to data analytics has surely heard the buzzword spreading like wildfire – Data Mesh.
What is Data Mesh?
But what exactly is Data Mesh, and why are more and more companies looking to implement the latest trend in the data industry? Below, we’ll explore what Data Mesh means and whether you, too, should mesh it up (a nod to Barr Moses’s article What is a Data Mesh — and How Not to Mesh it Up).
In the age of data as a first-class citizen, every enterprise strives to be data-driven, pouring hefty investments into data platforms and enablers. However, the traditional data warehouse or data lake, with its limited real-time streaming capabilities, is no match for the ever-growing data demands.
The need for democratisation and scalability in data pipelines underscores the faults of legacy systems and conflicting business priorities. Fortunately, a new enterprise data architecture is on the rise that offers a way out of bulky, fragile data pipelines. Data Mesh introduces a way of seeing data not as a by-product but as decentralised, self-contained data products.
Software development was the first discipline to transition from monolithic applications to microservice architectures. We are now seeing the data industry follow suit, moving away from massive data teams built around centralised, monolithic data lakes and databases, towards an approach that treats data domains and data products as first-class citizens.
This paradigm shift in data architecture means data teams have to provide sharing, publishing, discoverability, and, crucially, interoperability of all data assets within the mesh. More importantly, this pivot drives teams to prioritise the outcomes and products they deliver to the business at all times, instead of obsessing over the underlying technology or stack used.
Zhamak Dehghani, a ThoughtWorks consultant and the original architect of Data Mesh, defines the concept as a data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design. However, Zhamak emphasises that her support for a domain-oriented approach doesn’t mean she is “advocating for a fragmented, siloed domain-oriented data often hidden in the bowels of operational systems; siloed domain data that is hard to discover, make sense of and consume”, nor is she “advocating for multiple fragmented data warehouses that are the results of years of accumulated tech debt.” She argues that the response to these accidental silos of unreachable data is not a centralised data platform, with a centralised team that owns and curates the data from all domains, as that doesn’t scale.
Zhamak Dehghani instead finds that a paradigm shift is necessary to solve these architectural failure modes. This shift sits at the intersection of techniques instrumental in building modern distributed architecture at scale – techniques that the tech industry at large has adopted at an accelerated rate and that have produced successful outcomes. Dehghani’s idea of the next enterprise data platform architecture lies in the convergence of Distributed Domain-Driven Architecture, Self-serve Platform Design, and Product Thinking with Data.
The promise and premise of the Data Mesh
Although Data Mesh is getting a lot of attention, the fundamental ideas are actually not new, and many forward-thinking organisations have already implemented them. Daniel Tidström, Partner & Management Consultant at Data Edge, has been working with parts of it for quite some time.
According to Daniel, Data Mesh becomes crucial when a company scales quickly.
“With the proliferation of data sources and data consumers, having one central team to manage and own data ingestion, data transformation and serving data to all potential stakeholders will inevitably lead to scaling issues,” states Daniel. “Given the increasing importance of data in our organisations, designing for scalable teams and scalable platforms is really crucial. This is a recognised problem in other areas of software engineering so I can’t see why data must still live inside a monolith.”
The alternative would be to scale out the team by hiring more data engineers, but everyone in the industry knows that finding good data engineers is really hard. So it makes perfect sense to go for a distributed data architecture at scale.
Also, in companies where domain-driven development and microservices architectures are implemented, it just makes sense to also consider moving the ownership of data into the domains, Daniel explains.
Daniel Tidström is currently working for a client that has implemented a Kafka infrastructure binding all domains together. To work with the data, they need to be able to manage service level agreements, know what is published, understand what the schema looks like, and how the schema evolves.
All these things Daniel and his team are doing point in the Data Mesh direction, although they are not necessarily calling it Data Mesh. Either way, it is crucial that the data product owners and the domains treat data as a first-class citizen and deliver it as a product.
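Managing published schemas and their evolution, as described above, usually boils down to compatibility rules: a new schema version must not break existing consumers. Here is a minimal Python sketch of a backward-compatibility check; the schemas and field names are invented for illustration and do not correspond to Daniel’s client or any specific schema-registry tooling.

```python
# A minimal sketch of backward-compatibility checking for an evolving
# data product schema. All names are hypothetical illustrations.

def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """A new schema version is (in this simplified sense) backward
    compatible if every field of the old schema is still present with
    the same type, so existing consumers keep working."""
    for field, field_type in old_schema.items():
        if new_schema.get(field) != field_type:
            return False
    return True

# Version 1 of a hypothetical "orders" data product schema.
orders_v1 = {"order_id": "string", "amount": "double"}
# Version 2 adds a field -- existing consumers are unaffected.
orders_v2 = {"order_id": "string", "amount": "double", "currency": "string"}
# Version 3 drops a field that v1 consumers rely on -- a breaking change.
orders_v3 = {"order_id": "string"}
```

Real schema registries (such as the one often paired with Kafka) apply richer rules, for instance around optional fields and defaults, but the principle of checking every evolution against its predecessor is the same.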
Challenges with Data Mesh
Data Mesh is not a plug-and-play solution. It comes with an array of challenges that companies have to navigate, and many people may find themselves out of their depth when implementing it, states Daniel.
Setting up the contracts for ingesting data is probably one of the more challenging parts of a Data Mesh implementation. “Implementing a Data Mesh is not a purely technical project that you can run in isolation from the rest of the business,” he explains. “It’s not something that you can just start with and have working on the first try; you have to grow and develop with it.”
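To make the idea of an ingestion contract concrete, here is a minimal Python sketch: the producing domain promises that every published record carries certain fields with certain types, and the consumer can verify that promise at ingestion time. The contract, field names, and types are all invented for illustration and are not part of any specific Data Mesh tooling.

```python
# A hypothetical ingestion contract for a "customers" data product:
# every record must carry these fields with these Python types.
CONTRACT = {"customer_id": str, "signup_date": str, "lifetime_value": float}

def violations(record: dict) -> list[str]:
    """Return a list of contract violations for one incoming record;
    an empty list means the record honours the contract."""
    problems = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems
```

In practice such checks are typically expressed in a shared schema format rather than application code, but the design point stands: the contract lives between the domains, so a violation is caught at the boundary instead of surfacing downstream.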
However, the biggest challenge is not technical, but achieving the data maturity an organisation needs to work well in a Data Mesh setting. “It’s a cultural shift for many organisations, in having the product owners really think about data and treating it as a product. Since this is a shift from traditional software engineering, it is no easy feat.”
Of course, it will be a challenge for many, but it needs to be addressed; not solving it will just push the complexity downstream onto the data engineers and the central team, who will have to sort out the mess, continues Daniel.
“And that’s one of the problems with data teams today: they have too many competing priorities. It’s tough to build something that scales and is long-term. Many teams just work through an infinite, never-ending backlog of requests from everyone, and that doesn’t work either. So something needs to be done, and Data Mesh is probably the most interesting scenario for it.”
Data Mesh is one step closer to democratising data and enabling the whole organisation to treat data as a strategic asset. This shift introduces a new way of working in many aspects. “This is where the soft values and the cultural traits are the biggest factors, just making sure to treat data as it should be treated. If you really want to become a data-driven company, data can’t only be a concern for one or two central teams.”
When to start thinking about Data Mesh
Although Daniel says that companies will encounter technical challenges along the way, that shouldn’t stop them from implementing Data Mesh, as technical solutions can always be found.
“The most important thing about Data Mesh is starting to discuss the distribution of data because data creation is inherently distributed in all companies.” With the number of data sources growing every day, many organisations should probably at least consider what their options for scaling are.
If you practise domain-driven development, have started working with microservices, or are doing a cloud migration, that’s a good time to consider Data Mesh, suggests Daniel.
What will happen to data architectures and data warehouse teams
Does a distributed domain-driven architecture mean the end of the central data team? Daniel reassures that there is still a place for centralised management of the core data assets that span domains.
“I don’t think Data Mesh will remove the need for data warehouse teams, but instead, it will make their job easier in a way,” states Daniel. He admits that he often gets the question of what will happen with the data architecture when it becomes distributed.
“In most companies, you probably have one poor data architect who goes into the architects’ forum and tries to be the spokesperson for data across the organisation, often feeling quite lonely. If you instead have distributed ownership of data and manage it as a product, data would have a much more obvious place at the table. So I think it will spark many good and interesting discussions that might be a bit painful in the beginning. But it will be addressed, and things will improve going forward.”
Daniel advises not to forget the centralised teams, as they will still have a role in the distributed architecture. “Looking downstream, I think it is very important to enable the data consumers with good self-service tooling. It is not good for anyone to have the central data team as a gatekeeper or bottleneck for accessing data. There are many good frameworks now, like data build tool (dbt), for example, that solve a lot of issues between data consumers and the data platform. So that is something that I encourage everyone to look into. It’s an amazing tool that does a lot of things really well.”
The relationship between Data Mesh and DataOps
As a paradigm shift that is yet to be widely tested and implemented by companies, the inquisitive mind may ask how it fits into the better-known DataOps environment. Daniel says he does not see any obvious hurdles standing in the way of DataOps and Data Mesh working well together.
Generally speaking, data observability across a distributed pipeline can be a bit of a challenge. But this challenge already exists in most organisations that have some sort of handover, often not even a formalised one. Data Mesh doesn’t remove this challenge, but it addresses it in a better way, says Daniel.