Digitl | Data Mesh - a Decentralized Data Architecture

Data Mesh - a Decentralized Data Architecture

2023-02-13 | Article | Insights

Cultural and operational challenges present obstacles to becoming a data-driven company

Being a data-driven enterprise has many advantages. As you collect more data, your products and services improve, attracting more customers and deepening relationships with those you already have. In turn, this creates the foundation for better products and innovations that shape future business opportunities. Despite this, organizations often struggle to harness the power of data, and the challenges go beyond technology alone. According to a recent study conducted by Accenture Research, the biggest impediments are cultural and operational, such as companies’ inability to extract tangible and measurable value from data to deploy data-driven business strategies.¹ These impediments have many causes: the volume and velocity of data produced within companies, silos created by factors such as the use of multiple cloud providers, data ownership, a lack of maturity in reporting and analytics processes, the application of AI, and more.²

Data Mesh is the organizational extension of a Data Lake

Data Mesh is a concept that extends the Data Lake idea to provide a more holistic approach to data strategy within an organization. The goal of Data Mesh is to balance data democratization and centralization, making data accessible and usable by all teams while still maintaining control and governance. Establishing a Data Mesh helps organizations address common data challenges such as siloed data, lack of standardization, and poor data governance. It also ensures that data is treated as a valuable asset, is accessible to all teams, and is used effectively to drive business decisions and outcomes.

A Data Mesh combines organization, infrastructure, user governance, and data governance in order to:

Link siloed data to enable large-scale automated analyses.
Offer easy access to a centralized infrastructure with a self-service model for faster data access and SQL queries.
Distribute data ownership across a federation of functions. As every team knows its data well, no additional data experts are needed.

Core tenets of a Data Mesh

The Data Mesh concept consists of four building blocks:

Domain-oriented decentralization - A Data Mesh architecture proposes that federated domain teams such as Sales, Accounting, and Marketing create high-quality data assets, instead of a central data team of technology specialists. These units have complete ownership over all data needed or collected for their specific business purpose. Due to their domain expertise, they are best suited to organize and generate value from the data that originates in their domain. Although domains own the data, data governance (including user management) is handled by the IT unit, which makes this a hybrid between complete centralization and complete decentralization. Domain results can be made available as data products (e.g. Sales Forecast) to other domains, which can in turn use them to build their own data products. This way, autonomous domain teams with a variety of skill sets are formed within a unified team structure. Each data domain maintains its own data warehouse, and these individual warehouses are combined to form a Data Mesh.
Data-as-a-product - This refers to all analytics data produced by one domain and made available to other domains. A data product has a repeatable application, meaning that it serves a recurring use case. Metrics that indicate the value of a data product include the number of its users and the number of questions answered with it. To achieve data-as-a-product, a Data Sharing Agreement is set up between the data-producing domain and a user from another domain. This contract details all relevant information about the data, such as service level, guarantees for data quality and recency, and access instructions. Data products are accessed through a Data Marketplace, which serves as the point of management and access control within the organization.
Self-service data platform - A central marketplace facilitates the publication and use of data in a standardized manner and ensures that data platform functions (security, storage, processing, integration) are centralized or conform to standardized templates. Domains do not need to provide their own infrastructure services or develop their own access control; they can use the centrally provided resources instead. The domains remain responsible for maintaining their diversity of data sources and data requirements. Each domain is equipped with sufficient resources to meet the needs of its users and its internal needs.
Federated governance - Data governance maintains compliance with data access and security standards, which includes providing a certification audit process for data products and educating data product owners about those controls. In doing so, it ensures data security, data usage according to predefined rules and requirements (Service Level Agreements), support for high-quality data products, and the development of standard forms for data sharing.
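
As an illustration of how these building blocks fit together, the following is a minimal sketch in Python. It is not taken from any specific product: the class names (DataProduct, DataSharingAgreement, DataMarketplace), their fields, and the example values are all assumptions chosen for readability. It shows a marketplace that only lists certified data products, grants access based on a Data Sharing Agreement, and counts distinct consumers as a simple value metric.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class DataSharingAgreement:
    """Contract between a producing domain and one consumer (hypothetical fields)."""
    consumer_domain: str
    max_staleness: timedelta   # recency guarantee
    min_completeness: float    # quality guarantee, between 0.0 and 1.0
    access_instructions: str


@dataclass
class DataProduct:
    """A data asset owned by one domain (hypothetical fields)."""
    name: str
    owner_domain: str
    certified: bool = False    # set once the governance audit has passed
    agreements: Dict[str, DataSharingAgreement] = field(default_factory=dict)
    access_log: List[str] = field(default_factory=list)


class DataMarketplace:
    """Single point of publication and access control."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Governance rule in this sketch: only certified products are listed.
        if not product.certified:
            raise PermissionError(f"{product.name} has not passed certification")
        self._products[product.name] = product

    def grant(self, product_name: str, agreement: DataSharingAgreement) -> None:
        # Register the agreement under the consuming domain's name.
        product = self._products[product_name]
        product.agreements[agreement.consumer_domain] = agreement

    def request_access(self, product_name: str, consumer_domain: str) -> str:
        product = self._products[product_name]
        agreement = product.agreements.get(consumer_domain)
        if agreement is None:
            raise PermissionError(
                f"{consumer_domain} has no agreement for {product_name}"
            )
        product.access_log.append(consumer_domain)  # feeds the usage metric
        return agreement.access_instructions

    def distinct_consumers(self, product_name: str) -> int:
        # One possible value metric: how many domains actually use the product.
        return len(set(self._products[product_name].access_log))


# Example: the Sales domain publishes a forecast that Marketing may use.
marketplace = DataMarketplace()
forecast = DataProduct(name="sales_forecast", owner_domain="sales", certified=True)
marketplace.publish(forecast)
marketplace.grant(
    "sales_forecast",
    DataSharingAgreement(
        consumer_domain="marketing",
        max_staleness=timedelta(hours=24),
        min_completeness=0.99,
        access_instructions="read-only view: sales_forecast_v1",
    ),
)
instructions = marketplace.request_access("sales_forecast", "marketing")
```

In this sketch, a domain without an agreement (for example Accounting) is rejected with a PermissionError, and so is an attempt to publish an uncertified product. A real marketplace would enforce these rules in the platform's own access management rather than in application code.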

Using Google Cloud to build a Data Mesh

Google Cloud provides the technical foundation and a comprehensive product portfolio to successfully implement a Data Mesh

Completely serverless and scalable - Domain teams can use a self-service platform to create data products without affecting operations. This enables decentralized, domain-oriented data ownership and a matching architecture.
Storage and compute separation - Provides simultaneous access to data products without splitting or duplicating data. The scalability of BigQuery provides quick and reliable access to new data products for multiple users across the entire company with different needs (operational, analytical, machine learning, etc.).
Centralized user and data governance - With the data governance products of Dataplex, Google Cloud provides a centralized platform for defining a comprehensive data model and database, automating the implementation of data governance policies, and tracking data product usage.

Native connectors for Google data sources such as Google Analytics and Ads Data Hub, along with custom connectors for third-party data sources, can be used to import and consolidate all data in BigQuery. The data can then be stored in Cloud Storage, which forms the central data hub of the Data Lake infrastructure. This way, the data is centralized and made available under data and user governance with Dataplex. With Vertex AI and Jupyter Notebooks, data can be analyzed, visualized, and processed with machine learning models to serve different use cases. Once prepared, the data can be exported to Google Analytics to enrich existing data, build audiences, and activate them in the ad tech tools of the Google Marketing Platform, onsite, or in third-party ad tools such as Meta and TikTok. This infrastructure gives all stakeholders quick and easy access to the information relevant to them via tools like Looker Studio and Power BI, based on a connection to BigQuery and the BigQuery BI Engine.
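
To make the idea of accessing data products in place more concrete, here is a small, hedged sketch in Python. It only builds a BigQuery Standard SQL string that joins two domain-owned tables by their fully qualified names, without copying either one. The project, dataset, table, and column names are hypothetical, and the identifier check is deliberately conservative rather than a complete implementation of BigQuery's naming rules.

```python
import re

# Conservative pattern for "project.dataset.table" (an assumption of this
# sketch; BigQuery itself allows a wider range of names).
_TABLE_ID = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def cross_domain_query(sales_table: str, marketing_table: str) -> str:
    """Build a query that joins two data products where they are stored."""
    for table in (sales_table, marketing_table):
        if not _TABLE_ID.match(table):
            raise ValueError(f"expected project.dataset.table, got {table!r}")
    # Backticks quote the fully qualified table names in Standard SQL.
    # The column names below are hypothetical.
    return f"""
SELECT
  f.region,
  f.forecast_revenue,
  c.campaign_spend
FROM `{sales_table}` AS f
JOIN `{marketing_table}` AS c
  ON f.region = c.region
""".strip()


# Hypothetical data products owned by the Sales and Marketing domains.
sql = cross_domain_query(
    "sales-domain.products.sales_forecast",
    "marketing-domain.products.campaign_spend",
)
# With the google-cloud-bigquery package installed and credentials
# configured, the string could be run via bigquery.Client().query(sql).
```

Each domain keeps ownership of its own table, while a consumer with the appropriate permissions can combine both in a single query.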

In a nutshell

There are many challenges in becoming a data-driven organization, including cultural and operational issues. To tackle these issues, a Data Mesh balances data democratization and centralization by creating organizational domain-oriented teams with complete ownership over their data. This allows for a centralized marketplace for data products within the organization. Building a Data Mesh is not easy, and it takes time and resources to train the different stakeholders. However, it is an attainable goal that can benefit some organizations by lowering their overhead and increasing the ability of different teams to make data-driven decisions.

To achieve this, the four building blocks of a Data Mesh are put in place: domain-oriented decentralization, data-as-a-product, self-service data platform, and federated governance. Google Cloud provides the technical foundation for implementing a Data Mesh, offering a scalable, serverless platform for data products and centralized user and data governance. Please do not hesitate to contact us if you want to find out more about building a Data Mesh.

Resources:
¹ Closing the Data Value Gap: How to become data-driven and pivot to the new, Accenture, White paper, 2019
² Build a modern, distributed Data Mesh with Google Cloud, Google, White paper, February 2022