Anomalo Announces Table Observability and Lineage for Databricks
June 28, 2023
Modern businesses are built on data. For a company to operate successfully at scale, it needs a way to manage its data at scale, too. That’s why so many organizations use data lakehouses like Databricks to efficiently store data for both machine learning and business intelligence applications.
And because companies use their data to drive decisions, their data needs to be of the highest quality. For this, companies use Anomalo’s data quality monitoring platform to identify and resolve inevitable issues like missing or corrupted data.
In this post, we’re excited to share expansions to our integration with Databricks and two new features of the Anomalo data quality monitoring platform. With the addition of table observability and lineage, customers can now monitor their entire Lakehouse in a matter of minutes. These new features let customers see the big picture of issues with data moving through their Lakehouse, then zoom in as needed to detect and understand drift in the contents of their datasets using our existing data quality monitoring and root-cause-analysis products.
Table observability and data lineage for Databricks
With table observability through Anomalo, Databricks customers have a lightweight and affordable way to monitor their data quality from a bird’s-eye view. We’ve introduced basic checks and monitoring that take minimal configuration and are compatible with both the Hive Metastore and Databricks’ own Unity Catalog.
Table observability leverages data from the Databricks Unity Catalog to detect base table changes on an hourly basis, answering questions like:
- Does the table exist?
- Have any columns been dropped?
- Has the table been recently updated?
- Is the row count as expected?
Anomalo answers these questions at a frequent cadence, and it can also comprehensively check an entire Databricks Delta Lake without running heavier (and sometimes more expensive) queries on the underlying data.
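To make the four questions above concrete, here is a minimal Python sketch of metadata-only table checks. It is not Anomalo’s implementation: the `TableSnapshot` type, the `observability_issues` function, and the thresholds (24 hours of staleness, a 50% row-count swing) are hypothetical, and a simple relative-change rule stands in for whatever logic decides whether a row count is “as expected”. The sketch assumes the snapshot fields have already been read from catalog metadata rather than computed by scanning the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TableSnapshot:
    """Hypothetical metadata snapshot for one table; no table data is scanned."""
    exists: bool
    columns: frozenset[str]
    last_modified: Optional[datetime]
    row_count: Optional[int]


def observability_issues(
    previous: TableSnapshot,
    current: TableSnapshot,
    now: datetime,
    max_staleness: timedelta = timedelta(hours=24),  # assumed freshness window
    max_row_change: float = 0.5,  # assumed tolerated relative row-count change
) -> list[str]:
    """Compare two metadata snapshots and describe anything that looks wrong."""
    # Does the table exist? If not, the other checks are meaningless.
    if not current.exists:
        return ["table does not exist"]

    issues: list[str] = []

    # Have any columns been dropped since the previous snapshot?
    dropped = previous.columns - current.columns
    if dropped:
        issues.append(f"dropped columns: {sorted(dropped)}")

    # Has the table been updated within the freshness window?
    if current.last_modified is None or now - current.last_modified > max_staleness:
        issues.append("table not updated recently")

    # Is the row count roughly as expected? (Skipped when the old count is 0 or unknown.)
    if previous.row_count and current.row_count is not None:
        change = abs(current.row_count - previous.row_count) / previous.row_count
        if change > max_row_change:
            issues.append(f"row count changed by {change:.0%}")

    return issues


if __name__ == "__main__":
    now = datetime(2023, 6, 28, 12, 0, tzinfo=timezone.utc)
    before = TableSnapshot(True, frozenset({"id", "amount", "ts"}), now - timedelta(hours=2), 1000)
    after = TableSnapshot(True, frozenset({"id", "ts"}), now - timedelta(hours=30), 300)
    print(observability_issues(before, after, now))
```

Because every input is metadata, a check of this shape stays cheap enough to run hourly across every table; deeper, content-level monitoring is a separate step.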
Table observability works hand-in-hand with Anomalo’s new lineage capabilities for Databricks. When something goes wrong, Anomalo uses data from the Unity Catalog to construct a lineage graph and identify both upstream causes and downstream consequences of a data quality issue.
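Identifying upstream causes and downstream consequences amounts to walking a lineage graph in both directions from the affected table. The sketch below assumes lineage is available as a hypothetical list of `(source, target)` table edges and uses breadth-first search; the function names and the medallion-style table names are illustrative only and are not Anomalo’s or Databricks’ API.

```python
from collections import deque


def reachable(adjacency: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start` via the given edges."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def upstream_and_downstream(
    edges: list[tuple[str, str]], table: str
) -> tuple[set[str], set[str]]:
    """Return (upstream sources, downstream dependents) of `table` in a lineage DAG."""
    downstream: dict[str, list[str]] = {}
    upstream: dict[str, list[str]] = {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)  # source feeds target
        upstream.setdefault(target, []).append(source)  # target reads from source
    return reachable(upstream, table), reachable(downstream, table)


# Hypothetical lineage edges for a small bronze -> silver -> gold pipeline.
EXAMPLE_EDGES = [
    ("bronze.orders_raw", "silver.orders"),
    ("silver.orders", "gold.daily_revenue"),
    ("bronze.customers_raw", "silver.customers"),
    ("silver.customers", "gold.daily_revenue"),
]

if __name__ == "__main__":
    causes, consequences = upstream_and_downstream(EXAMPLE_EDGES, "silver.orders")
    print("possible causes:", sorted(causes))
    print("affected tables:", sorted(consequences))
```

For an issue detected in `silver.orders`, the upstream set points at `bronze.orders_raw` as the first place to investigate, while the downstream set lists `gold.daily_revenue` as a table whose consumers may need an alert.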
We’re excited to add these expanded capabilities to Anomalo’s existing integration with Databricks Partner Connect, which offers Databricks customers an exclusive free trial of the Anomalo platform.
Comprehensive Data Quality for the Databricks Lakehouse
Much as the Databricks Lakehouse Platform combines the best elements of data lakes and warehouses, Anomalo now delivers the best of table observability and deeper data quality monitoring. Table observability and lineage are both perfect complements to Anomalo’s existing data quality monitoring platform, which is also available as a free trial through Databricks Partner Connect.
Having confidence in the health of your data pipelines with table observability is one thing, but it’s even more meaningful for Databricks customers to have confidence in the actual data flowing through the pipes. Of all the monitoring solutions on the market, Anomalo is unique in that it emphasizes looking directly at the contents of your data, not just the movement of your data. Using machine learning, Anomalo automatically inspects data for unexpected trends and can intelligently alert the relevant stakeholders.
Between high-level table observability, lineage support, and deeper ML-powered monitoring, Anomalo offers a range of monitoring solutions that support teams as they move along the Databricks Data and AI maturity curve.
Anomalo’s layered solution aligns with Databricks’ layered approach to organizing lakehouse data in a medallion architecture. With gold, silver, and bronze data tiers, tables at each tier have different data quality needs. Our integration with Databricks Workflows makes it easy to automate both table observability and data quality checks at key stages in customers’ orchestration DAGs. Better yet, thanks to Anomalo’s existing Unity Catalog integration, Unity Catalog customers get full context about their data quality directly within the Databricks Data Explorer UI.
These launches add value for mutual customers of Databricks and Anomalo by elevating confidence in data. Anomalo acts as the lighthouse to a Databricks lakehouse, providing end-to-end monitoring at the appropriate level for all customers.
How to start using Databricks and Anomalo together
Anomalo is a member of Databricks Partner Connect, meaning Databricks customers can connect to Anomalo without leaving the Databricks UI. The integration also gives new Anomalo users access to an exclusive free trial. Check out this step-by-step guide to get started.
We’re excited to continue our partnership with Databricks and deliver even more ways to effectively incorporate data quality monitoring into the modern data stack. To see how Anomalo can fit into your data stack, request a personalized demo with a member of our team today.