Anomalo Deepens Integration with Databricks Unity Catalog
May 30, 2023
In the age of big data, managing and governing data assets has become increasingly complex. Data is spread across different systems and departments, making it challenging to locate, understand, and verify. To address this issue, many organizations have turned to data catalogs, which are centralized repositories of metadata about data assets. Last year, Databricks launched Unity Catalog, which provides a unified governance solution for all data and AI assets in the lakehouse.
To get the most value out of a catalog, data discovery must be accompanied by signals from a data quality monitoring tool. Otherwise, catalog users won’t know whether they can trust the data they’ve found. They might have to ask someone from data engineering for help, or worse, accidentally use data that’s untrustworthy, leading to hard-to-detect bugs in reports, machine learning models, and products. That’s why Anomalo is committed to deeply integrating our data quality monitoring platform with Unity Catalog.
About Unity Catalog
Unity Catalog (UC) is a collaborative data catalog that enables data teams to discover, manage, and govern data assets across their organization. It’s part of the Databricks Lakehouse Platform, which provides a cloud-based environment for data processing, machine learning, and business analytics. In addition to features like secure search and simplified governance using ANSI SQL, Unity Catalog supports Delta Live Tables (DLT). DLT is a framework for building and managing reliable batch and streaming pipelines on Delta tables, and those pipelines can feed real-time analytics and reporting.
Anomalo + Unity Catalog
Anomalo has supported UC since its release, allowing customers to easily connect tables in UC to Anomalo for comprehensive data quality (DQ) monitoring. Anomalo complements UC by offering a suite of data quality checks and robust data profiling. Together, these give users automatic insight into the structure, completeness, and accuracy of their data, and help ensure that data is consistent and meets quality standards.
This goes beyond standard data validation checks and user-generated policies within UC. As discussed in our previous blog post (Anomalo: The Lighthouse for your Databricks Lakehouse), Anomalo is the only data quality solution that takes full advantage of the power of the Databricks Lakehouse Platform. Anomalo uses machine learning to check data quality automatically, offering deep data quality monitoring that can even identify unexpected trends within the data itself.
New features of the Anomalo + Unity Catalog integration
Users of UC and Anomalo can now experience an even more comprehensive integration. As of today, Anomalo automatically monitors tables in Unity Catalog, and data quality checks are incorporated directly into the Databricks Data Explorer UI. If a check fails, customers can link into Anomalo’s automated root cause analysis to quickly pinpoint the cause of the issue.
Let’s say a data analyst needs to find a particular dataset for a report. Using Unity Catalog, they can search for the dataset by name or metadata, such as the owner or creation date. Once they have found the dataset, they can view its schema, preview the data, and assess its quality using Anomalo’s native integration. If they need to make changes to the dataset, they can use Delta Live Tables to update the data in real time. Furthermore, they could use the Anomalo API for Databricks Workflows to automatically initiate Anomalo checks and confirm that they have passed before running additional jobs. The combination provides trust in every layer of their stack.
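To make the Workflows step a little more concrete, here is a minimal sketch of how a task might gate downstream jobs on check results. It is an illustration under stated assumptions, not Anomalo's actual API: CheckResult, DataQualityGateError, gate_on_checks, and fake_run_checks are all hypothetical names, and the run_checks callable stands in for whatever client call triggers the checks and returns their outcomes in your environment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CheckResult:
    """One data quality check outcome (hypothetical shape, not Anomalo's schema)."""
    name: str
    passed: bool


class DataQualityGateError(RuntimeError):
    """Raised to fail the current task so that dependent jobs do not run."""


def failed_checks(results: Iterable[CheckResult]) -> List[str]:
    """Return the names of all checks that did not pass."""
    return [r.name for r in results if not r.passed]


def gate_on_checks(
    table: str,
    run_checks: Callable[[str], Iterable[CheckResult]],
) -> None:
    """Run checks for `table` and raise if any of them failed.

    `run_checks` is an assumption: it represents the call that initiates
    the checks and returns their results.
    """
    failures = failed_checks(run_checks(table))
    if failures:
        raise DataQualityGateError(
            f"{table}: failed checks: {', '.join(failures)}"
        )


def fake_run_checks(table: str) -> List[CheckResult]:
    """Stand-in for a real client call, used only to exercise the gate."""
    return [CheckResult("row_count", True), CheckResult("null_rate", False)]


if __name__ == "__main__":
    try:
        gate_on_checks("main.sales.orders", fake_run_checks)
    except DataQualityGateError as err:
        print(f"Blocking downstream jobs: {err}")
```

Raising an exception is a deliberate choice here: a failed task in an orchestrated workflow typically prevents dependent tasks from starting, so the jobs that consume the table only run when every check has passed.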
In conclusion, Databricks Unity Catalog is a powerful data catalog that provides a comprehensive set of features for managing and governing data assets. Its tight integration with Delta Lake and support for Delta Live Tables make it a compelling choice for organizations that require fast, scalable, and reliable data processing. And now, with Anomalo’s robust data quality features built directly into the UC Data Explorer UI, users can ensure that their data is trustworthy and accurate, with even less context-switching between tools.
This is only the beginning. We’ll have two more deep integrations into UC coming this summer that we cannot wait to tell you more about. To learn more, contact a member of the Databricks or Anomalo team.