



Data Mesh is founded on four key principles that have reshaped data management: domain-oriented decentralised ownership, data as a product, a self-serve data platform, and federated governance.
The expanding data volume from various sources has become a challenge for traditional data architectures. The complex nature of modern data landscapes has pushed the limitations of data platforms as we know them. In response to this dynamic data environment, a groundbreaking approach emerged in 2018 known as “Data Mesh”. At its core, Data Mesh advocates for a decentralised, domain-centric strategy to manage data, bridging the gap between data producers and consumers.
Imagine Data Mesh as a transformative wave similar to the microservices revolution of the 2010s but applied to data architecture. Data Mesh reshapes how we think about and expand data architecture. At its heart lies a shared principle: the distribution of data ownership among distinct teams, each finely tuned to specific business domains.
Each team manages its own data, both operational and analytical, treating it as a valuable product while navigating the emerging complexities of its domain. By granting data producers more independence and adaptability, organisations begin to cultivate a capable and scalable data ecosystem.
In this article, we will delve into the world of Data Mesh and how it rose to popularity. As the digital landscape continues to evolve, embracing the principles of the Data Mesh philosophy could potentially hold the key to unlocking the full potential of an organisation’s data resources, taking scalability to the next level.
While centralised data platforms are scalable, continually adding new data sources and transformations leads to chaos, often referred to as “data spaghetti”. Data Mesh offers an alternative for larger organisations, distributing data ownership and governance to maintain manageability as complexity grows.
It’s important to note that scalability is a concept bound to the company’s needs, and not every company should adopt Data Mesh. Smaller organisations may find centralised platforms suitable, while larger ones might consider Data Mesh as a solution to scale their data operations effectively.
We also follow up with a small comparison between Data Mesh, Data Fabric, Data Lake, and Data Lakehouse, so you are fully informed about which approach your company might want to follow.
If you’re eager to explore further possibilities and make the most of your data, we invite you to delve into our collection of case studies about data solutions.
Managing data involves a range of methods that shape how an organisation deals with its data, including approaches such as Data Warehouse, Data Fabric, Data Lake, and Data Lakehouse. For now, however, let’s explore the decentralised perspective known as Data Mesh.
If we want to get a better grasp of Data Mesh, it makes sense to know more about centralised data platforms. While they offer clear benefits, some organisations need to enhance their data management practices even more.
This is where Data Mesh becomes relevant. So, let’s begin by examining centralised data platforms first.
Centralised Data Platforms establish a single source of truth, by integrating data into a coherent system. Serving as pivotal centres for data integration, centralised data platforms ensure consistency, streamline data processing, and facilitate the extraction of insights. They empower organisations to excel in an environment where data plays a pivotal role in propelling business operations forward.
However, these platforms are often built by tech-savvy engineers with a limited understanding of domain-specific complexities. This disconnect arises naturally from the wide spectrum of business domains an organisation may span. The resulting lack of domain knowledge exposes organisations with high data demands to hazards, like declining data quality, that culminate in compromised final reports.
Therefore, forging symbiotic collaborations with teams deeply rooted in specific domain expertise becomes not just advantageous, but an absolute necessity to make the most out of data.
Data originates from operational systems that drive businesses forward. The Centralised Data Platform moulds raw data into the desired shape for analysis by data consumers. This complex process, referred to as data processing or the data pipeline, involves several steps: ingesting, validating, aggregating, and complex manipulations among diverse datasets. This brings forth three primary sets of challenges when the domain-specific component is neglected:
First, operational teams end up handling analytical data without explicit responsibilities or specifications for managing it. This results in duplicated workloads across the multiple teams involved in data management and processing, potentially causing bottlenecks as data volume grows.
What’s more, a relaxed validation process can inadvertently introduce incorrect records into the system, leading to inconsistencies. Despite careful planning to prevent such situations, the unpredictable nature of reality sometimes challenges our preparation efforts.
The intricacy arises from the fact that in analytical data platforms, these datasets are eventually integrated to extract insights. Consequently, a glitch in processing one dataset can disrupt numerous downstream pipelines, reports, and workflows, necessitating data engineers to coordinate and resolve issues across multiple areas.
This constant firefighting and operational load leave little room for teams to focus on pioneering new features and innovations. In other words, it limits the potential for growth and scalability.
In essence, creating a data platform without minding domain-specific needs is similar to constructing a structure with missing support beams.
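To ground the validation point above, here is a minimal sketch of a domain-aware validation step in plain Python. The record fields and rules (positive amounts, a fixed currency list) are illustrative assumptions, not a prescription from any specific platform:

```python
from dataclasses import dataclass

# Hypothetical order record for a 'sales' pipeline -- illustrative only.
@dataclass
class OrderRecord:
    order_id: str
    customer_id: str
    amount: float
    currency: str

ALLOWED_CURRENCIES = {"EUR", "USD", "GBP"}  # assumed domain rule

def validate_order(record: OrderRecord) -> list[str]:
    """Return the domain-rule violations for one record."""
    errors = []
    if record.amount <= 0:
        errors.append(f"{record.order_id}: non-positive amount {record.amount}")
    if record.currency not in ALLOWED_CURRENCIES:
        errors.append(f"{record.order_id}: unknown currency {record.currency!r}")
    return errors

def ingest(records: list[OrderRecord]) -> tuple[list[OrderRecord], list[str]]:
    """Split a batch into clean records and rejection reasons, so bad
    rows never silently enter downstream reports."""
    clean, rejected = [], []
    for record in records:
        errors = validate_order(record)
        if errors:
            rejected.extend(errors)
        else:
            clean.append(record)
    return clean, rejected
```

The point of the sketch is that the rules themselves are domain knowledge; without the domain team encoding them, the pipeline has nothing meaningful to validate against.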
Data Mesh’s rise to prominence is a direct response to the limitations of established data architectures. The gap between organisational requirements and prevailing solutions, and the pressing need for a transformation in data engineering, mirrors the progress already witnessed in software engineering.
Both the data and operational issues stem from the very core of the centralised, monolithic structure that underpins conventional data platforms. Organisations are then forced to make a resolute management decision: transform the existing platform into one that matches the organisation’s needs.
A platform designed to break free from the architectural challenges of centralisation allows for enhanced reusability and flexibility. This underscores the critical role played by data platform architecture.
The design process should span the full spectrum: functional essentials such as data transformation, as well as non-functional considerations such as optimising flows and code to minimise manual intervention and operational friction.
Let’s have a look at key considerations in data platform architecture:
Performance Optimisation: Regular performance tuning and optimisation efforts should be part of the architecture’s lifecycle to ensure efficient data processing and analysis.
Think of a modern data platform architecture as a strategic imperative for organisations seeking to harness the full potential of their data assets. This is where Data Mesh comes in, a strategic response devised to tackle these challenges head-on.
Let’s have a look at the challenges, to determine whether your company needs to rethink its data strategy and move towards a Data Mesh concept:
| Challenge | Centralised Data Platforms | Data Mesh Solutions |
|---|---|---|
| Scalability | As data volume grows, centralised platforms may struggle to scale efficiently, causing performance, growth, and extension bottlenecks. | Data Mesh introduces data products, enabling horizontal scalability as domain teams manage their own scalable data products. |
| Agility | Centralised platforms can adapt slowly to changing business requirements, hindering agility and innovation. | Data Mesh empowers domain teams with self-serve infrastructure to iterate and innovate independently. |
| Collaboration | Lack of global, shared context and understanding can impede collaboration between data producers and consumers. | Data Mesh emphasises standardised data contracts, improving understanding and collaboration on data semantics. |
| Resource Allocation | Concentrating resources for maintenance and processing in a single team can lead to inefficiencies. | Data Mesh distributes resource management to domain teams, optimising utilisation and effectiveness. |
| Maintenance | Centralised platforms can become complex and resource-intensive to maintain as they scale. | Data Mesh distributes maintenance responsibilities, allowing domain teams to maintain their specific data domains efficiently. |
| Platform Architecture | Traditional centralised architecture limits flexibility and adaptability in managing diverse data types. | Data Mesh adopts a federated architecture, allowing diverse domains to work together seamlessly under a unified framework. |
Data Mesh is a framework based on four core principles that apply to each domain. With these principles in place, organisations gain an environment where the benefits of Data Mesh are harnessed without falling into the pitfalls of complexity and inefficiency that unstructured decentralisation brings.
The four principles are:
1. Domain-oriented decentralised data ownership and architecture
2. Data as a product
3. Federated computational governance
4. The self-serve data platform
The initial two core principles of Data Mesh, domain-oriented decentralised data ownership and architecture, as well as data as a product, address the challenges integral to centralised data platforms. By distributing ownership to individual domains and treating data as a valuable product, these principles tackle issues of low data quality.
The subsequent two principles of Data Mesh, federated computational governance and the self-serve approach, play a crucial role in mitigating potential drawbacks of decentralisation. Federated computational governance establishes standardised processes and ensures connectivity across domains in an automated way. Simultaneously, the self-serve approach streamlines operations and performance, simplifying data management and interaction across domains.
Let’s go through one by one.
1. Data Mesh: Domain-oriented decentralised data ownership and architecture
At the core of the Data Mesh approach is the idea of breaking large, monolithic data platforms into smaller, more manageable domains. This mirrors how modern businesses employ specialised teams to handle specific aspects of their operations, ultimately improving data-driven decision-making.
In certain situations, these domains may also benefit from further subdividing their data into nodes to better align with the organisation’s needs.
Consider the following example of domains and nodes:
For instance, consider a domain like ‘sales.’ Within this domain, you might gather all the order-related data. In another domain, ‘customers,’ you could collect user information such as addresses and names, among other details.
Now, let’s delve deeper. The node ‘customer behaviour’ might aggregate orders and other behavioural data per customer. This allows you to predict when a customer might run low on a previously ordered product, or when they are likely to return a purchase. That prediction can then trigger a targeted mailing campaign, ultimately boosting sales, or help optimise logistics costs for the enterprise.
By breaking down data into nodes like ‘customer behaviour’ and ‘customer order,’ an organisation gains flexibility and access to high-quality data, custom-tailored to meet specific needs.
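As a hedged illustration of what a node like ‘customer behaviour’ might compute, the sketch below estimates when each customer will reorder a product from the average gap between past orders. The data, names, and the averaging heuristic are all assumptions for illustration:

```python
from collections import defaultdict
from datetime import date, timedelta

# Toy order history for the 'customer behaviour' node -- illustrative data.
orders = [
    ("alice", "dog-food", date(2024, 1, 5)),
    ("alice", "dog-food", date(2024, 2, 4)),
    ("alice", "dog-food", date(2024, 3, 6)),
]

def estimate_next_order(history):
    """Estimate when each (customer, product) pair will reorder,
    using the average gap between past orders."""
    by_pair = defaultdict(list)
    for customer, product, day in history:
        by_pair[(customer, product)].append(day)
    estimates = {}
    for pair, days in by_pair.items():
        days.sort()
        if len(days) < 2:
            continue  # not enough signal to estimate a gap
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        avg_gap = sum(gaps) / len(gaps)
        estimates[pair] = days[-1] + timedelta(days=round(avg_gap))
    return estimates

print(estimate_next_order(orders))  # {('alice', 'dog-food'): 2024-04-05}
```

A real implementation would live inside the domain team that understands the ordering behaviour; that proximity to domain knowledge is exactly what the principle argues for.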
One effective approach to identifying suitable domains within an organisation is Domain-Driven Design (DDD). Applying DDD principles to data architecture involves collaborating with domain experts to define clear boundaries and responsibilities for each domain. This ensures that domains are meaningful reflections of the business reality, instead of just technical divisions.
Within the context of Data Mesh, domains can be categorised into three primary types: source-aligned domains, which publish data close to the operational systems that generate it; aggregate domains, which combine data from several upstream domains; and consumer-aligned domains, which shape data for specific downstream use cases.
The concept challenges the traditional centralised approach, promoting agile ownership and empowering smaller, specialised teams. However, there are potential downsides as well. Without proper connectivity and alignment, these domains may struggle to fulfil their roles effectively.
For instance, if a source domain isolates its data, downstream consuming domains may lack the information needed for meaningful analysis. Similarly, an aggregating domain’s work can become a constant firefighting effort if the data it depends on is inconsistent or incomplete.
Despite these challenges, when aligned with other principles, this approach offers significant advantages, including agility, operational scalability, and improved data quality through a deeper understanding of the business domain.
2. Data Mesh: Data as a Product
The Data as a Product principle within Data Mesh acknowledges the complexity of discovering, exploring, understanding, and ultimately trusting data, especially when it’s spread across various domains. This second principle simplifies that process and enhances data usability for a wide range of consumers, including data analysts, data scientists, and other downstream users.
In essence, it marks a shift in mindset: away from viewing data as a passive resource, towards treating it as a valuable product meticulously designed, developed, and managed to meet the specific needs and expectations of its consumers.
The transformative concept of treating data as a product addresses a critical challenge that has long been a drawback of centralised data platforms: the significant time and effort required for operational support around data. The focus here is squarely on the data itself, with the aim of streamlining its accessibility, quality, and usability.
The Data as a Product principle emphasises that data should embody three key qualities: it must be accessible, discoverable, and reliable for its consumers.
These attributes merge to create a comprehensive understanding of data that aligns with business objectives.
Implementing the Data as a Product principle requires the involvement of key roles, most notably Data Product Owners, who are accountable for a product’s value and evolution, and Data Product Developers, who build and maintain it.
These roles possess sufficient domain knowledge and proficiency in basic programming languages and SQL. They play a crucial role in ensuring data remains accessible, discoverable, secure, and up-to-date. This, in turn, enhances data quality, allowing one domain to serve multiple data products to data consumers.
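To make these qualities tangible, here is a minimal sketch of a data product descriptor and a tiny catalogue. The fields for ownership, description, schema, and freshness are hypothetical assumptions for illustration, not a fixed Data Mesh API:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical minimal 'data product' descriptor -- the fields are
# assumptions illustrating discoverability, ownership, and reliability.
@dataclass
class DataProduct:
    name: str                 # addressable identifier, e.g. "sales.orders"
    owner: str                # accountable data product owner
    description: str          # makes the product discoverable and understandable
    schema: dict              # column name -> type, the published shape
    last_refreshed: datetime  # lets consumers judge freshness and reliability

catalog: dict[str, DataProduct] = {}

def publish(product: DataProduct) -> None:
    """Register a data product so consumers can discover it by name."""
    catalog[product.name] = product

publish(DataProduct(
    name="sales.orders",
    owner="sales-domain-team",
    description="All confirmed orders, one row per order line.",
    schema={"order_id": "str", "amount": "float"},
    last_refreshed=datetime(2024, 1, 1),
))
```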
The Data as a Product principle within Data Mesh serves as a robust solution to combat issues like data siloing. It fosters instant data accessibility and user-friendliness, ensuring smooth operations.
However, like any concept, there are complexities to consider. Different approaches to defining features for Data as a Product may introduce various techniques that could complicate implementation. Challenges such as repetitive efforts and differing interpretations can lead to increased costs in building decentralised data platforms.
This is where the following two principles within Data Mesh come into play.
3. Data Mesh: Federated computational governance
Decentralisation brings its own set of challenges. The absence of common processes and standards often leads to weak connectivity and interoperability issues, which, in turn, hinder the generation of cross-domain insights. The solution to this challenge lies in embracing the Federated Governance principle, which has emerged as a key component in implementing and maintaining a decentralised structure.
Picture this: in a decentralised data landscape, various domains operate independently, each with its own processes and rules. This can result in a lack of coordination and consistency, making it difficult to derive meaningful insights from the data.
This is where Federated Governance comes into play: a guiding principle designed to address these challenges.
Federated Governance revolves around maintaining a high and consistent level of service. Its primary objective is to instil compliance and consistency within domains and the data products residing within them.
In our increasingly interconnected world, data contracts play a pivotal role in ensuring data integrity and coherence in cross-domain collaboration. These contracts serve as explicit agreements, outlining the precise structure, exchange mechanisms, and interpretation guidelines for data shared among different systems, teams, or domains.
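As a small illustration, a data contract can be expressed as a machine-checkable schema agreement between a producing and a consuming domain. The field names, types, and the ISO-date convention below are illustrative assumptions:

```python
# A minimal sketch of a data contract: an explicit, machine-checkable
# agreement on the structure of shared data. Fields are illustrative.
CUSTOMER_CONTRACT = {
    "customer_id": str,   # stable key shared across domains
    "email": str,
    "signup_date": str,   # ISO 8601 date, by agreement between teams
}

def conforms(record: dict, contract: dict) -> bool:
    """Check one record against the agreed contract: every agreed
    field must be present and have the agreed type."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in contract.items()
    )

record = {"customer_id": "c-42", "email": "a@example.com", "signup_date": "2024-01-05"}
assert conforms(record, CUSTOMER_CONTRACT)
```

In practice such checks run automatically wherever data crosses a domain boundary, so a producer cannot silently break its consumers.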
Creating data contracts represents a significant paradigm shift, often necessitating comprehensive organisational restructuring. This transformation demands careful consideration and innovative solutions to ensure the success of cross-domain data collaboration. However, it’s essential to note that within the context of federated computational governance, data contracts alone are not always the optimal solution.
In the realm of Federated Computational Governance, it’s crucial to recognise that data contracts are just one piece of the puzzle. Robust and comprehensive governance mechanisms, such as shared global policies, common standards, and automated quality and compliance checks, work together to provide a holistic framework for managing and governing data across diverse domains.
Incorporating these elements into Federated Computational Governance ensures a more holistic approach to managing data across domains. While data contracts remain fundamental, they are enhanced and complemented by these broader governance practices. Together, they help maintain high-quality data and effective cross-domain collaboration within the evolving landscape of data management.
Ideally, governance components should be automated as much as possible, for two key reasons: automated solutions are inherently better at maintaining high-quality, consistent service levels than manual interventions, and a computational approach ensures governance is implemented efficiently and uniformly across every domain.
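A brief sketch of what ‘computational’ governance can look like in practice: global policies expressed as code and applied uniformly to every registered data product. The policy set and the product dictionaries are assumptions for illustration:

```python
# Global policies expressed as code, applied to every data product.
def check_policies(product: dict) -> list[str]:
    """Return the governance violations for one data product."""
    violations = []
    if not product.get("owner"):
        violations.append("every data product must declare an owner")
    if not product.get("contract"):
        violations.append("every data product must publish a data contract")
    if product.get("contains_pii") and not product.get("access_policy"):
        violations.append("products containing PII must define an access policy")
    return violations

products = [
    {"name": "sales.orders", "owner": "sales-team", "contract": {"order_id": "str"}},
    {"name": "crm.customers", "owner": "", "contains_pii": True},
]

for product in products:
    for violation in check_policies(product):
        print(f"{product['name']}: {violation}")
```

Run automatically (for example on every deployment), checks like these keep domains compliant without a central team reviewing each change by hand.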
4. Data Mesh: The self-serve data platform
Decentralised platforms can result in duplicated and multiplied work when organisations only apply the first three principles. Building, running, monitoring, and deploying each operational domain can lead to repetition, increased costs, and added complexity.
Entrusting complete responsibilities for these tasks to each domain hinders the achievement of consistent and high-quality service levels. In such situations, automation becomes essential to streamline processes and meet standardised service level objectives (SLOs).
The Self-Serve Data Platform automates the complexities of managing, maintaining, and deploying domains. This liberates Domain Data Engineers from operational complexities, allowing them to focus on domain-specific transformations, modelling expertise, and platform interaction capabilities.
Furthermore, the platform simplifies storage, computing, data sharing, and enhances security. All of these factors together make it easier to address the organisation’s needs, maintain consistent processes, and ensure that service level standards are consistently met.
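As a hedged illustration of the self-serve idea, a domain team might hand the platform a declarative specification and let it handle provisioning, scheduling, and monitoring. Every key in the spec below is a hypothetical example, not a real platform’s API:

```python
# Hypothetical declarative spec a self-serve platform might accept.
# The domain team declares *what* it needs; the platform handles the
# provisioning, scheduling, and monitoring. All keys are illustrative.
data_product_spec = {
    "name": "sales.orders",
    "owner": "sales-domain-team",
    "source": {"type": "postgres", "table": "orders"},
    "schedule": "daily",
    "output": {"format": "parquet", "retention_days": 365},
    "slo": {"freshness_hours": 24},
}

def provision(spec: dict) -> None:
    """Platform-side stub: a real platform would create storage,
    register the product in the catalogue, and schedule the pipeline."""
    print(f"Provisioning {spec['name']} (owner: {spec['owner']}), "
          f"refresh {spec['schedule']}, SLO {spec['slo']}")

provision(data_product_spec)
```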
There are two critical facets to self-service data platforms that significantly enhance their value within the Data Mesh framework:
One of the main tasks of the Self-Serve Data Platform is providing profound insights to the entire enterprise through its analytics and data-access capabilities.
The platform’s self-serve capabilities additionally democratise access to data for analysts and other business stakeholders, enabling them to reach deeper insights and conclusions.
These capabilities bridge the gap between complex technology and strategic decision-makers.
The Self-Serve Data Platform also focuses on streamlining operational challenges related to domain management, maintenance, deployment, and continuous monitoring. It allows engineers to holistically monitor the status of Service Level Objectives (SLOs) and intervene before consumers are affected; a minimal monitoring sketch follows below.
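Here is the minimal monitoring sketch referenced above: it flags data products whose last refresh breaches an agreed freshness objective. The product names and the 24-hour objective are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Platform-side SLO monitoring: flag data products whose last refresh
# breaches an agreed freshness objective. All values are illustrative.
FRESHNESS_SLO = timedelta(hours=24)

last_refreshed = {
    "sales.orders": datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
    "crm.customers": datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc),
}

def freshness_breaches(refresh_times: dict, now: datetime) -> list[str]:
    """Return the products whose data is older than the freshness SLO."""
    return [
        name for name, ts in refresh_times.items()
        if now - ts > FRESHNESS_SLO
    ]

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(freshness_breaches(last_refreshed, now))  # ['crm.customers']
```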
Incorporating the self-serve principle into the Data Mesh framework results in a dual advantage.
The Self-Serve Data Platform optimises operational efficiency and maximises the potential of the Data Mesh concept, enabling organisations to harness the full extent of its benefits.
The Data Mesh paradigm marks a transformative shift in how organisations approach their data ecosystems. At its core, it emphasises the integration of different data types, bridging the gap between operational and analytical data. It significantly enhances decision-making, offering a holistic view of the business that aligns real-time operational insights with historical analytical perspectives.
Data Mesh propels organisations into uncharted territory as they move towards the fusion of operational and analytical facets. This novel approach demands technical prowess, cultural shifts, and collaborative endeavour.
Leveraging Data Mesh principles within expansive enterprise-level data platforms can lead to substantial benefits, including improved data quality, greater agility, operational scalability, and better decision-making.
It’s important to recognise that there is no universal formula to implement Data Mesh. The very essence of Data Mesh lies in its adaptability, allowing companies to carve their distinctive paths and data products.
Just as each organisation possesses its unique attributes, goals, and challenges, the resulting data products within the Data Mesh will be equally distinct and tailored to cater to the individuality of the enterprise.
Pivotal aspects, such as crafting a domain-oriented architecture and executive structure, make the journey fraught with considerations. Implementation hinges on the organisation’s readiness and willingness to embrace change, and its capacity to adapt.
This said, let’s delve into the challenges of implementation.
While Data Mesh offers a promising approach to managing complex data ecosystems, organisations need to be prepared to address challenges effectively. A thoughtful implementation strategy, strong leadership support, and a commitment to ongoing refinement are essential to navigate these complexities and reap the benefits of a Data Mesh framework.
Let’s take a consolidated look at the main challenges: the cultural shift Data Mesh requires, the technical proficiency it demands, and the sustained cross-domain collaboration it depends on.
Disclaimer: it’s important to note that the four concepts (Data Mesh, Data Fabric, Data Lake, Data Lakehouse) are not directly comparable; all of them serve as architectural paradigms for building data platforms.
In today’s data-driven landscape, organisations are faced with the challenge of efficiently handling, processing, and extracting value from vast amounts of data. To address these demands, various data management architectures have emerged, each with distinct approaches to data organisation, processing, and governance.
By examining their unique characteristics, strengths, and considerations, we aim to provide a clear understanding of how these architectures differ and how they can potentially cater to different organisational needs.
Data Fabric, as a unified data integration and management framework, stands out by offering organisations a centralised solution for tackling the challenges of data integration, transformation, and governance. It provides a cohesive perspective of data, harmonising information from diverse sources, formats, and locations.
Unlike Data Mesh, which promotes a decentralised approach with domain-oriented data teams, Data Fabric centralises data control and abstraction, offering a more unified and structured solution for data integration and management. Data Fabric’s core strength lies in abstracting the intricacies of data infrastructure, allowing organisations to maintain data consistency, accessibility, and reliability.
It revolves around data pipelines, offering robust capabilities for data discovery and integration. By presenting a unified data layer to users and applications, Data Fabric remains a powerful tool in simplifying complex data ecosystems, making it an indispensable choice for enterprises seeking streamlined data management.
While a Data Lake serves as a centralised repository that efficiently stores vast amounts of raw, unstructured, and structured data, Data Mesh introduces a fundamentally different approach to data management.
Data Lakes excel at handling data from various sources, even without predefined schemas. They are particularly suitable for managing extensive data volumes and serving as a robust foundation for a wide range of data analytics and processing tasks. This empowers data scientists and analysts to explore the data and extract valuable insights.
In contrast, Data Mesh promotes a decentralised model, emphasising domain-oriented data teams and distributing data ownership across an organisation. This distinction highlights how Data Mesh challenges the centralised storage paradigm of Data Lakes, focusing on improved data quality, accessibility, and governance through a more decentralised and team-centric approach to data management.
The choice between Data Mesh and Data Lake hinges on an organisation’s specific data requirements and preferred data governance strategy.
As an emerging architectural concept, the Data Lakehouse combines the strengths of both Data Lakes and traditional Data Warehouses. This innovative approach aims to deliver the scalability and flexibility of Data Lakes while introducing essential features such as schema enforcement, data quality assurance, and optimised query performance, often associated with Data Warehouses.
Data Lakehouses serve as a bridge between data engineering and data analytics, offering a unified platform for the storage, management, and analysis of data. Data Mesh, in contrast, represents a decentralised approach to data management, emphasising domain-specific data teams and distributed data ownership, and thereby rethinking how organisations manage their data.
The Data Lakehouse, meanwhile, enhances traditional data warehousing capabilities with the scalability and flexibility of Data Lakes, making it an appealing option for those seeking to bridge the gap between these two data management paradigms. The choice between Data Mesh and Data Lakehouse ultimately depends on an organisation’s specific data needs and preferred data management approach.
Data Mesh represents a pivotal paradigm shift in the world of data management, holding the promise to reshape how organisations handle their data in the future. Its emphasis on domain-oriented decentralisation, collaboration, and treating data as a product offers a new path towards more agile, scalable, and efficient data ecosystems. As organisations continue to grapple with growing data volumes and evolving requirements, the principles of Data Mesh provide a framework for addressing these challenges head-on.
However, the adoption of Data Mesh is not without its complexities. It requires a cultural shift, technical proficiency, and a commitment to collaboration. Organisations must assess their readiness for Data Mesh adoption, considering factors such as their existing data infrastructure, team dynamics, and willingness to embrace change. While the journey to becoming Data Mesh-ready may involve challenges, the potential benefits in terms of data quality, agility, and decision-making are substantial.
In an era where data-driven insights are paramount, Data Mesh stands as a beacon of innovation, offering a glimpse into a future where data is not just managed but harnessed for its full potential. As organisations continue to explore this transformative approach, the data landscape is poised for a profound evolution, driven by the principles of Data Mesh.
If you liked this article, we recommend reading these articles:
Unlock the power of your analytical data platform for data-driven decisions →
How gradual engineering improvements deliver value and save costs →
Data Mesh is a modern approach to data management that emphasises domain-oriented decentralisation, treating data as a product, employing a self-serve platform, and using a federated governance model. These four principles collectively form the foundation of Data Mesh.
In Data Mesh, data management responsibilities are distributed among domain-oriented teams, whereas centralised platforms typically rely on a single team to manage all data. Decentralisation in Data Mesh aims to empower teams closer to the data source, fostering agility and scalability.
Treating data as a product in Data Mesh means that data is managed with the same level of care, ownership, and accountability as any other product in an organisation. Data is made accessible, discoverable, and reliable for its consumers, promoting higher data quality and usability.
Federated governance in Data Mesh focuses on maintaining consistent data standards and practices across domains, while allowing each domain to have autonomy. Centralised platforms enforce governance from a single point, whereas federated governance ensures compliance and consistency while empowering individual domains.
Data Mesh is a transformative approach that may not be suitable for all organisations. It is well-suited for organisations with complex data needs, a willingness to adapt culturally, and a desire for enhanced agility. Centralised platforms are still effective for organisations with simpler data requirements and established centralised practices.