Sustainable Data Governance with Unity Catalog
The time to view data governance as a luxury, a nice-to-have, has come and gone. In today's data-driven world, proper data governance is essential for survival. It has become a key enabler to organizations' future growth and resilience, especially for highly regulated entities such as financial services institutions (FSIs).
A robust data governance framework is critical to maintain data quality, ensure data privacy and security, and meet regulatory compliance requirements. While this sounds like a simple set of objectives, it is a seemingly impossible outcome and a never-ending journey for most. That is, until now.
With the release of Unity Catalog, FSIs finally have access to a comprehensive, robust toolkit that slots into any organization's data governance framework, designed from the ground up to offer an operationally efficient and sustainable platform. In this multi-part series, we'll discuss the utopian governance end state, the road to get there, and the value captured on the bottom line, all with Unity Catalog at the heart of the ecosystem. In this blog, we'll start by looking at a typical desired data governance framework, what it looks like with Unity Catalog at the core, and how organizations can integrate the solution with other custom or third-party platforms for data governance.
Understanding the Challenges
Implementing and maintaining a robust data governance framework has been a significant challenge to the financial services industry for decades. FSIs typically comprise complex organizational structures, manage an expansive range of data types and sources, and are subject to some of the most stringent security and privacy regulations.
As the industry moves toward digital transformation, the importance of a robust but dynamic data governance framework becomes increasingly evident. It's not simply about compliance but improving operational efficiency, enhancing customer experiences, and driving informed decision-making across the organization. More specifically, FSIs tend to struggle with the following:
- Data Security and Privacy Compliance
- Data Quality Management and Maintenance
- Data Discovery and Collaboration
- Organizational Maturity and Discipline
The industry's challenge, then, lies in overcoming the hurdles of outdated systems, cross-functional alignment and coordination, and intricate legal landscapes. Implementing a data governance strategy that simultaneously leverages the appropriate technologies and fosters a culture of applied data governance is paramount.
"Data governance must become a natural extension of the organization's being. It's the people, process, and technology coming together in a data-minded endeavor to power all decisions made across the organization."
Harnessing the Power of Unity Catalog
Awareness, visibility, access, and observability are all at the heart of the data governance challenge. Databricks Unity Catalog is a unified data and AI governance solution designed from the ground up to address and solve these critical challenges.
Unity Catalog is engineered to help organizations address critical data governance issues and achieve a greater state of data awareness. More specifically, Unity Catalog can help FSIs:
- Improve data visibility and understanding. Unity Catalog's centralized approach provides a unified view of data assets and their utilization. It makes it easy for organizations to understand how their data is used, the decisions it powers, and the value it delivers.
- Strengthen data security and regulatory compliance. Unity Catalog employs a single permission model to ensure consistent application of access policies. Fine-grained access controls further enhance security measures.
- Automate critical governance tasks. Unity Catalog offers native services to automate several functions, including (near real-time) lineage, asset monitoring and observability, and auditing.
- Foster a culture of data ownership and mindedness. Unity Catalog's unified view of data assets naturally cultivates an environment of transparency, consistency, and collaboration – allowing organizations to optimize the responsibility and accountability matrix, elevating the general level of data maturity and discipline.
Implementing Unity Catalog allows FSIs to embed a unified operational governance layer at the core of the data ecosystem – built to centralize visibility, access and permissions models, observability, and data sharing and distribution.
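To make the single permission model more concrete, here is a minimal sketch in Python that builds the kind of SQL GRANT statements Unity Catalog accepts for its three-level namespace (catalog.schema.table). The catalog, schema, table, and group names are hypothetical, and the helper function is our own rather than part of any Databricks library. In a workspace, each statement would typically be run from a SQL editor or a notebook.

```python
def grant_read_access(catalog: str, schema: str, table: str, group: str) -> list[str]:
    """Build the GRANT statements a group needs to read one table.

    Reading a table requires a privilege at each level of the hierarchy:
    USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table.
    """
    principal = f"`{group}`"  # backticks quote the group name
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {principal}",
    ]


# Hypothetical names, for illustration only.
statements = grant_read_access("lending", "loans", "applications", "risk_analysts")
for statement in statements:
    print(statement)
```

Because the same three statements apply regardless of which workspace or tool issues the query, access rules are defined once and enforced everywhere the table is read.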
Interoperability: The Key to Comprehensive Data Governance
FSIs increasingly embrace a more diverse set of specialized systems and technologies to handle the vast data ecosystem. Ensuring compatibility and harmonious collaboration between these technologies is essential for the sustainability and effectiveness of a data governance framework.
Figure 1 shows the vast requirements for the data governance framework across the modern data ecosystem. Just as important is the level of interoperability required between systems and stages. Data quality monitoring, for example, would primarily come into play after the ingestion stage. On the other hand, data lineage and search and discovery must be considered across the entire estate.
A key strength of Databricks Unity Catalog is its modularity, extensibility, and interoperability. It is designed to seamlessly integrate and work with existing data management tools, including catalogs, storage systems, and governance solutions. FSIs can leverage existing investments in complementary platforms and services to capitalize on specialized functionalities and enhance different aspects of the governance framework. Unity Catalog empowers organizations to create a comprehensive and tailored data governance solution that meets their needs without expensive migration costs.
The rest of this discussion will explore how Unity Catalog can be integrated with three popular platforms: Immuta, Anomalo, and Alation. Unity Catalog is the foundational operational layer, providing a centralized hub for managing, applying, monitoring, and auditing governance functions.
"Unity Catalog is the conductor in a symphony of information flow, orchestrating the efficient allocation and use of data to build value for customers and shareholders."
Enhancing the Security and Privacy Model
The financial services industry is synonymous with sensitive data. The complexities of securing this data and complying with the requirements set forth by regulatory frameworks on data privacy (e.g., GDPR and CCPA) typically result in an operating model that either severely restricts data access or prohibits it altogether. The result is a general breakdown in collaboration and data-driven innovation. A recent study by Forbes found that in 2021 only 48.5% of companies surveyed could effectively drive innovation with data.
Unity Catalog can help organizations simplify the security framework with a single permission model for all data and AI assets. The true power, however, lies in integrating with an advanced data security platform like Immuta. In this scenario, Unity Catalog provides a centralized, comprehensive metadata repository with a robust enforcement mechanism for access management. Immuta's cutting-edge technology also allows for fine-grained access control and dynamic data masking, granting authorized users the necessary access while protecting sensitive information.
The tight integration between Unity Catalog and Immuta ensures that data governance policies are enforced and applied consistently across the entire data ecosystem. Additionally, combining the two systems further enhances compliance capabilities, aligning data governance practices holistically with regulatory requirements such as GDPR, CCPA, or PCI DSS. Unity Catalog facilitates automated data classification and tagging to ensure data is managed appropriately based on the relative levels of sensitivity and security. Immuta, in turn, leverages the metadata provided by Unity Catalog to automatically enforce dynamic attribute-based access controls and privacy policies that can be written in plain language, with no SQL coding or technical expertise required. Together, Unity Catalog and Immuta significantly reduce the time and resources required to manage risk and compliance across all data sets.
Ultimately, the collaboration between Unity Catalog and Immuta promotes a culture of trust and transparency that permeates the entire organization and its collection of data assets. Users are empowered to access the necessary information to make decisions while ensuring compliance at the most granular level. The partnership fosters a mindset of security and privacy awareness.
Augmenting the Quality Monitoring Process
For any data governance framework to succeed, trust is a must. Maintaining proper data quality is essential to uphold the organization's trust in its data and, ultimately, its reliance on said data for making decisions. Data discovery views can and should be accompanied by signals from data quality monitoring systems to improve awareness of potential issues and provide mechanisms for remediation. The integration between Unity Catalog and Anomalo marks a significant advancement in data quality management within the data governance landscape.
Anomalo, with its sophisticated suite of anomaly detection capabilities, seamlessly integrates with and complements Unity Catalog to offer continuous data monitoring that can identify deviations from expected patterns and predefined thresholds. Anomalo leverages statistical analysis techniques and machine learning (ML) algorithms to supercharge the data quality management process – automatically detecting outliers, inconsistencies, and other anomalies that might impact or erode the integrity and reliability of the data. The integration with Unity Catalog further enables end users to leverage lineage information to contextualize flagged anomalies, providing deeper insight and understanding of each issue's root cause, potential impact(s), and the significance of discrepancies.
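One way to picture how lineage helps contextualize a flagged anomaly is as a graph walk: starting from the table where the anomaly surfaced, follow lineage edges upstream to list the candidate root-cause sources. The sketch below is a simplified illustration in plain Python. The table names and edge list are invented, and it does not call the Anomalo or Unity Catalog APIs.

```python
from collections import deque

# Hypothetical lineage edges: (source_table, target_table).
LINEAGE_EDGES = [
    ("raw.card.transactions", "curated.card.daily_spend"),
    ("raw.card.customers", "curated.card.daily_spend"),
    ("curated.card.daily_spend", "reporting.risk.exposure"),
    ("raw.lending.loans", "reporting.risk.exposure"),
]


def upstream_tables(flagged_table: str, edges: list[tuple[str, str]]) -> list[str]:
    """Return every table upstream of flagged_table, nearest ancestors first."""
    # Index the edges by target so each table's direct parents are easy to find.
    parents: dict[str, list[str]] = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    seen = {flagged_table}  # guards against cycles and duplicate visits
    ordered: list[str] = []
    queue = deque([flagged_table])
    while queue:  # breadth-first walk toward the sources
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


candidates = upstream_tables("reporting.risk.exposure", LINEAGE_EDGES)
print(candidates)
```

A steward investigating an anomaly in the reporting table would start with the nearest ancestors and work outward, which is why the walk returns them in breadth-first order.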
Moreover, the combination of Unity Catalog and Anomalo empowers FSIs to establish a closed-loop feedback system for data quality management and improvement. Data stewards can, for example, update data quality rules and standards within the respective system(s) as anomalies are detected and their causes identified. This greatly accelerates the resolution process and cultivates a proactive approach to mitigating future issues. The continuous improvement cycle ensures that active data quality management becomes an evolving, self-correcting way of life, driving data excellence throughout the organization.
Democratizing the Search and Discovery Engine
Complete data visibility and transparency have long been a core strategic objective of FSIs looking to harness the full potential of their data assets. The ability to have sight of organizational data holdings and gain a comprehensive understanding of data lineage is crucial for making informed decisions, ensuring (regulatory) compliance, and maintaining a competitive edge in a rapidly evolving industry. Despite recognizing the importance, achieving this state of access and awareness has proven elusive for many organizations.
One of the key challenges hindering FSIs from attaining complete data visibility is the sheer complexity and fragmented nature of the data ecosystem. Organizations often operate with many legacy systems, modern technologies, and disparate data sources across various business units. The innate disjointedness and isolated operating model make creating a unified view of data assets difficult. All these factors compound to limit accessibility and inhibit collaboration.
Integrating Unity Catalog and Alation revolutionizes data search and discovery within the organization. Unity Catalog is the centralized metadata repository, automatically capturing detailed information about how, when, where, and by whom data assets are being used. Alation can leverage this repository across its advanced data intelligence platform, making it easy for technical and non-technical users to find, understand, and utilize data.
For technical users, the combination of Unity Catalog and Alation offers enhanced search capabilities that extend beyond conventional metadata queries. Users can leverage Alation's intelligent search algorithms, which utilize natural language processing (NLP) and machine learning (ML), to find relevant datasets, queries, or reports quickly. On the other hand, Alation's user-friendly interface caters to non-technical users, such as business analysts, executives, and other stakeholders. Alation's storytelling capabilities allow users to access curated, business-friendly data assets, glossaries, and insights in a digestible format.
Combining the two platforms greatly simplifies exploring and understanding data and fosters a data-minded culture across the organization. The complementary relationship between Unity Catalog and Alation empowers organizations to harness the true potential of their data, making informed decisions that directly impact business value drivers.
Compounding Data Governance: The Eighth Wonder of the World
Interoperability between governance solutions should not just be uni- or bi-directional, but rather omnidirectional. In this scenario, the integration and interoperability between Immuta, Anomalo, and Alation amplify the governance framework to deliver a network-effect solution beyond their individual (direct) integrations with Unity Catalog. By integrating these solutions, FSIs can close the loop on a compounding set of capabilities to create a sustainable governance solution that becomes part of the business-as-usual process.
With Immuta and Alation working harmoniously, organizations achieve greater data security and privacy management. For example, Immuta's dynamic data masking capabilities can call on Alation's tagging services to protect sensitive information. The combination enables data stewards to enforce granular permissions policies based on data sensitivity while offering a user-friendly experience for data exploration, discovery, and access management. It's a powerful partnership that can be deployed to foster a culture of data trust and collaboration.
The collaboration between Alation and Anomalo, on the other hand, further augments data discovery, source triaging and classification, and root cause assessments. Integrating the two platforms allows users, for example, to see Anomalo's column-level profile visualizations directly from within Alation. Alation’s table overview also contains a custom subsection for Anomalo’s data quality checks, with visual indicators for each check’s status. You can also click on the hyperlink to access the corresponding table view in Anomalo. The combination of capabilities offers an integrated solution for proactive data curation and quality monitoring, saving time and resources and enabling organizations to identify critical data issues promptly.
With each additional module, this amalgamation of platforms and services on top of Unity Catalog (see Figure 2) empowers FSIs to establish a data governance solution that compounds strengths and capabilities. It provides a centralized, unified operational core that feeds and powers all other governance-related platforms – streamlining data governance processes, promoting secure data handling, driving insightful data exploration, and ensuring high-quality data assets throughout the organization's data lifecycle.
Putting Operational Governance in Practice
"In theory there is no difference between theory and practice – in practice there is". With this simple statement, Yogi Berra encapsulates a fundamental truth that resonates profoundly within the financial services industry, especially regarding data governance. While meticulously designed in theory, data governance frameworks often encounter practical challenges that can deter even the most innovative technology. That is, without Databricks Unity Catalog as the operational core.
Step 1: Laying the Foundation
To demonstrate how Unity Catalog can bridge the gap between theory and practice, we turned to the world of banking. A notoriously tricky environment to govern, banks typically comprise some of the most siloed organizational structures – hampering most efforts to achieve total transparency, visibility, and seamless collaboration. In this scenario, we have a fictitious financial services provider, Summit Financial Group (SFG), with a retail banking division offering three core functions: card, lending, and risk. Each function's data, analytics, and intelligence requirements are supported with individual Databricks Workspaces (see Figure 3).
Before Unity Catalog, these functions (i.e., Workspaces) would operate in complete isolation: no metadata sharing, no cross-functional visibility, and no mechanisms for cross-jurisdictional collaboration. With Unity Catalog, SFG can associate each Workspace with a single Metastore, addressing all these challenges at the click of a button. Figure 4 shows how easy it is for a user from the Card business to get visibility of data from the Loan and Risk businesses.
Step 2: Fortifying the Estate
The most significant advantage of a shared ecosystem, complete visibility, is also its most significant risk. While FSIs like SFG want the ability to have comprehensive visibility across the data estate, they also need the necessary safeguards and controls to restrict access to sensitive datasets subject to regulatory compliance.
In this scenario, we combined the power of Unity Catalog with the data security services of Immuta. With Immuta, we can define both Subscription and Data policies to enforce granular attribute-based controls down to the row, column, and cell levels (see Figure 5).
Subscription policies offer a simple interface to control which users can request access to which data sources, and provide four levels of access control (listed from least to most restrictive):
- Anyone: All users will automatically be granted access.
- Anyone who asks (and is approved): Users will have to explicitly request access and be granted permission by the configured approvers.
- Users with specific groups and/or attributes: Only users within a particular group or with specific attributes will be granted access.
- Individually selected users: Data owners must manually select users to grant access.
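The four levels above can be modeled as a simple decision function. The following is a minimal sketch in plain Python, intended only to illustrate the logic of each level. It is not Immuta's API, and the class names, fields, and sample users are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubscriptionLevel(Enum):
    ANYONE = 1
    ANYONE_WHO_ASKS = 2
    GROUPS_OR_ATTRIBUTES = 3
    INDIVIDUALLY_SELECTED = 4


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)
    approved_requests: set[str] = field(default_factory=set)  # sources an approver signed off


@dataclass
class DataSource:
    name: str
    level: SubscriptionLevel
    allowed_groups: set[str] = field(default_factory=set)
    selected_users: set[str] = field(default_factory=set)


def can_subscribe(user: User, source: DataSource) -> bool:
    """Decide whether a user may subscribe to a data source."""
    if source.level is SubscriptionLevel.ANYONE:
        return True
    if source.level is SubscriptionLevel.ANYONE_WHO_ASKS:
        return source.name in user.approved_requests
    if source.level is SubscriptionLevel.GROUPS_OR_ATTRIBUTES:
        return bool(user.groups & source.allowed_groups)
    # Remaining case: the data owner hand-picks users.
    return user.name in source.selected_users


# Hypothetical user and sources, loosely following the SFG scenario.
card_analyst = User(name="ana", groups={"card"})
loans = DataSource(
    "lending.loans",
    SubscriptionLevel.GROUPS_OR_ATTRIBUTES,
    allowed_groups={"lending"},
)
reference_rates = DataSource("reference.rates", SubscriptionLevel.ANYONE)

print(can_subscribe(card_analyst, loans))
print(can_subscribe(card_analyst, reference_rates))
```

In this example the Card analyst is denied the lending data because the group check fails, while the openly shared reference data remains available to everyone.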
Data policies, on the other hand, can be leveraged to control row-, column-, and cell-level visibility (see Figure 6). Immuta offers a range of policy types to control masking, row redaction, and purpose restrictions.
Combining Subscription and Data policies enables users to quickly and easily define dynamic permission specifications applied to the underlying data source through Unity Catalog's centralized permission model. Figure 7 shows examples of the statements constructed by Immuta (based on the policies defined for SFG) and applied to the source catalogs, schemas, and tables through Unity Catalog.
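Conceptually, a Data policy combines row-level redaction with column masking, both driven by user attributes. The following is a minimal sketch in plain Python that simulates that behavior on in-memory records. It is not Immuta's policy engine, nor the statements it generates, and the attribute names (`regions`, `pii_reader`), columns, and sample rows are assumptions made for illustration.

```python
# Hypothetical sample records.
ROWS = [
    {"customer_id": 1, "region": "EU", "card_number": "4111111111111111", "balance": 1200.0},
    {"customer_id": 2, "region": "US", "card_number": "5500005555555559", "balance": 310.5},
]


def mask_card_number(value: str) -> str:
    """Hide all but the last four characters of a value."""
    hidden = max(len(value) - 4, 0)
    return "*" * hidden + value[-4:]


def apply_data_policy(rows: list[dict], user_attributes: dict) -> list[dict]:
    """Drop rows outside the user's regions, then mask card numbers
    unless the user holds the 'pii_reader' entitlement."""
    allowed_regions = user_attributes.get("regions", set())
    can_read_pii = "pii_reader" in user_attributes.get("entitlements", set())

    visible = []
    for row in rows:
        if row["region"] not in allowed_regions:
            continue  # row-level redaction
        copy = dict(row)  # never mutate the source record
        if not can_read_pii:
            copy["card_number"] = mask_card_number(copy["card_number"])  # column masking
        visible.append(copy)
    return visible


eu_analyst = {"regions": {"EU"}, "entitlements": set()}
result = apply_data_policy(ROWS, eu_analyst)
print(result)
```

The policy is evaluated per query and per user, so the same table can yield different results for different people without maintaining separate copies of the data.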
Comparing Figures 4 and 8, we can see the security and privacy collaboration between Unity Catalog and Immuta. Subscription policies restrict the user's ability to view assets outside their allocated environment, while the applied Data policies limit the visibility of attributes containing sensitive information.
Step 3: Maintaining the Trust
Enforcing the required security and privacy controls is essential, but it's important to remember that it's only part of the overall solution. Beyond safeguarding sensitive and proprietary information, the very essence of innovation and progress lies in the ability of the organization to rely on the data at hand.
As the saying goes, "Garbage in equals garbage out". FSIs need to pay more attention to the importance of data quality. Even the most fortified data fortress can crumble if the data within lacks integrity and accuracy. In the third step of our case study, we connected Unity Catalog with Anomalo to supplement SFG's data quality management capabilities.
Through Anomalo, we get access to a simple, easy-to-use interface for granular data quality monitoring across various factors, including data freshness, volume anomalies, missing data, and table-level anomalies detected by Anomalo's proprietary unsupervised machine learning. Moreover, we can further define custom key metrics and validation rules bespoke to our datasets (see Figures 9 and 10). This comprehensive view is essential to understand the underlying root causes that deteriorate quality and obliterate trustworthiness in the source data. The results of these checks can be viewed directly from the Data Explorer view in Databricks Unity Catalog.
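To give a feel for what checks on freshness, volume, and missing data involve, here is a minimal sketch of three simple rules in plain Python. These are textbook heuristics chosen for illustration, with assumed thresholds and sample numbers. They are not Anomalo's algorithms, which are proprietary and considerably more sophisticated.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_stale(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness: flag a table that has not been updated within max_age."""
    return now - last_updated > max_age


def volume_is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Volume: flag today's row count if it sits more than `threshold`
    standard deviations from the historical mean (needs at least two data points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


def null_rate(values: list) -> float:
    """Missing data: fraction of values that are None."""
    if not values:
        return 0.0
    return sum(value is None for value in values) / len(values)


# Hypothetical observations.
daily_row_counts = [1000, 1020, 990, 1010, 980]
print(volume_is_anomalous(daily_row_counts, today=400))
print(null_rate([1, None, 3, None]))
print(is_stale(datetime(2024, 1, 1), datetime(2024, 1, 3), timedelta(days=1)))
```

Fixed thresholds like these need manual tuning per table, which is part of the motivation for the learned, table-specific approach described above.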
Additionally, Anomalo provides detailed source- and column-level data profiling (see Figure 11). These visualizations and statistical summaries can be used to understand what's happening within the underlying datasets and qualify or quantify the magnitude and implications of upstream quality issues. It's an automated approach to root-cause analysis that can help the organization isolate and contain the fallout from a given data quality problem.
This example reiterates the power of having Unity Catalog provide the operational core for the overall governance solution architecture. The same metastore repository used by Immuta to enforce security and privacy can be leveraged by Anomalo to monitor for inconsistencies and anomalies. It showcases the compounding power of the operational data governance model to deliver a solution that inherently facilitates cross-functional collaboration, achieves transparency and visibility, and builds trust in the underlying data assets themselves.
Step 4: Take it to the People
Last but certainly not least, it's essential to recognize that the path to truly becoming a data-minded and data-driven organization hinges on the fundamental principle of data democratization. Gone are the days when data was confined to the realms of IT departments. In today's fast-paced society, organizations cannot afford to limit data consumption to purely technical functions. The challenge, however, lies not only in opening the floodgates of data but also in providing the necessary clarity and context on the data itself.
For the final step of the case study, we teamed up with Alation to provide a business-friendly interface into the soul of Unity Catalog. Combining the two platforms allows business users to traverse the data ecosystem through a portal that makes finding and understanding trusted data easy (see Figures 12 and 13).
The integration with Unity Catalog allows Alation to leverage information already captured within the metastore. For example, Alation can access lineage information from Unity Catalog to provide a view for business users that is fully consistent with that seen by technical users within Databricks. Further, in this example, the interoperability between Immuta and Alation on top of Unity Catalog allows the organization to leverage data security and privacy policies (defined in Immuta) to control visibility and access in Alation. Users can further leverage Alation features like Tags to dynamically update the security and privacy controls enforced through the Immuta and Unity Catalog relationship. For example, a user can tag a specific attribute as "sensitive" to have that column's values redacted from query results.
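The tag-driven behavior can be illustrated with a small simulation. The sketch below, in plain Python, shows how adding a "sensitive" tag to a column changes what a query returns without touching the data itself. The column names, tag store, and placeholder value are hypothetical, and the code stands in for the Alation, Immuta, and Unity Catalog integration rather than calling any of those products.

```python
# Hypothetical tag store: column name -> set of tags.
column_tags = {
    "customer_id": set(),
    "email": {"sensitive"},
    "credit_limit": set(),
}


def redact_tagged_columns(
    row: dict,
    tags: dict[str, set[str]],
    protected_tag: str = "sensitive",
    placeholder: str = "REDACTED",
) -> dict:
    """Return a copy of the row with values hidden for columns carrying protected_tag."""
    return {
        column: placeholder if protected_tag in tags.get(column, set()) else value
        for column, value in row.items()
    }


row = {"customer_id": 42, "email": "pat@example.com", "credit_limit": 5000}

before = redact_tagged_columns(row, column_tags)

# A steward tags another column; the policy picks it up on the next query.
column_tags["credit_limit"].add("sensitive")
after = redact_tagged_columns(row, column_tags)

print(before)
print(after)
```

The key design point is that the policy refers to the tag rather than to specific columns, so newly tagged attributes are protected without anyone rewriting the policy.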
Finally, in this example, we connected Alation with Anomalo. This combination lets the user get a detailed view of the data quality metrics for a given source directly from within Alation (see Figure 14). It's a simple step that significantly simplifies the overall user experience and allows finding, understanding, and validating data through a single, trusted portal.
This simple example shows the compounding effect of the solution framework. We can leverage Immuta to distill complex governance policies into dynamic permission expressions that can be applied at scale. We can integrate with Anomalo to strengthen our quality monitoring process and gain detailed insights into the underlying issues that cause inconsistencies and anomalies that affect the trustworthiness of the data. And we call on Alation to be our first port of call for data intelligence, making it easy for technical and non-technical users to uncover the wealth of information available to the organization. Most importantly, the case study shows the power of building an operational data governance solution on top of Unity Catalog and its ability to translate theoretical principles into effective real-world practices with remarkable ease and practicality.
Generative Governance: A Peek into the Future
Mark Twain is often credited with saying, "It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so." When it comes to data governance, we must ask ourselves whether what we think we know is true. The onset of generative AI brings a new range of possibilities to reimagine the entire data governance framework.
In a world where customer data constantly morphs and adapts to emulate real-world conditions and where organizations restructure and transform to support new business and operating models, it raises the question of whether any data governance framework can truly become sustainable without the support of generative AI. With the rapid decrease in the total cost of ownership (TCO) of these tools, organizations have within their reach the opportunity to harness the power of generative AI to streamline governance processes, gain insights, and identify patterns that might otherwise remain hidden.
From data synthesis and profiling to policy generation and continuous contextualization, generative data governance holds tremendous potential to define how organizations manage and leverage their data. What's clear is that, with Unity Catalog at the core, organizations can lay the bedrock for a wholly transformed governance framework that can unlock a new era of innovation and foster a culture of data-mindedness.
Navigating the complexities of data governance in today's fast-paced, data-driven world can be daunting for FSIs. But data governance is more than merely a technology challenge. Solving the key challenges requires an organizational mindset shift: a commitment to creating a data culture that ensures quality, respects privacy, promotes transparency, and ultimately unlocks tangible business value from data.
Databricks Unity Catalog significantly simplifies the governance landscape, offering a robust, extensible core to power the operational functions of any data governance framework. Pairing it with complementary platforms like Immuta, Anomalo, and Alation allows us to build an ecosystem that supports a comprehensive, sustainable data governance framework, with all components working in unison to develop and maintain trust in the data.
The question then is straightforward: does your organization have data trust, or will it go bust? If there's any doubt or uncertainty about the answer, it's time to rebuild your data governance strategy with Databricks Unity Catalog. In the age of data-driven decision-making, the quality, security, and accessibility of your data can make or break the organization. Equip your organization with the platform and tools not only to survive but thrive.
Originally published in partnership with Eon Retief on the Databricks blog.