Chapter 8: Towards a self-driving data future

June 10, 2026

By Team Anomalo

Welcome to “Use AI to modernize your data quality strategy,” a series spotlighting insights from our O’Reilly book, Automating Data Quality Monitoring. This post corresponds to Chapter 8: Operating Your Solution At Scale.

The previous seven chapters of Automating Data Quality Monitoring described how you can build a data quality monitoring solution from the ground up, with unsupervised machine learning at its core. In short, the plan is to choose the best-fit platform, deploy it in a way that suits your needs, and keep an eye on long-term improvement. Each of these steps has a big impact on the success of your data governance program and on the accuracy of your business data.

As you make implementation decisions in the new self-driving data age, you’ll also start seeing opportunities to leverage autonomous data agents in your workflow. Agentic data monitoring means more easy wins, less fuss, and greater transparency. And the decisions you make at the beginning of your data monitoring journey directly influence your tool’s agentic capabilities.

Build or buy?

We sell a data monitoring platform, so obviously, we have a horse in the race. That said, it’s certainly possible to build your own, and some have good reasons for doing so.

As with much of software development these days, there are open source packages you can use, so you don’t have to start from scratch. The two major advantages to building your own tools are greater control and minimal budget outlay. If you’re dealing with a limited amount of data, or have specific requirements such as a consistent internal UX or integration with bespoke data infrastructure, it might be worthwhile.

But — and you knew there’d be a but — anyone who’s tried to put together their own business software will tell you it’s not as easy or straightforward as it might seem, even if you shortcut the process by starting with an open source package. You’ll be responsible not only for the initial design and build, but for all of the support, maintenance, onboarding, and training. As time goes on, you’ll need to constantly update libraries, apply security patches, build new integrations, and adapt to evolving business needs.

Once you have built your data quality solution, you will need to ensure it is easy to adopt and use consistently across your organization. Different teams may have their own custom data quality solutions, and integrating with them to provide consistent visibility may be challenging.

In the long run, you’re likely to question whether saving the expense of a purchased solution was worth all the engineering hours, not to mention the impact of any data quality issues your homegrown system may have missed.

Another potential decision factor here is agentic data monitoring. With self-driving data now a reality, it’s only a matter of time before agentic insights and analysis are non-negotiable capabilities in your data monitoring solution. Are you prepared to build, test, and maintain these components yourself?

Rather than adding another large project to your to-do list, you could go with a vendor. There are many solutions to choose from, each backed by dedicated engineering, research, and support resources. They’re also available sooner, because you don’t have to wait for a development cycle to finish before you can begin onboarding.

One note: we didn’t mention data security or privacy as a reason to go in-house. Nowadays, fully in-VPC or on-premises vendor deployments are available for companies that demand their sensitive data not leave their environment, enabling a high standard for security.

Onboarding and rollout plans

You should also think about how you’ll work your tool into your organization’s systems, budget, and culture. There’ll be lots of switches to toggle, parameters to set, and workflows to modify, and the decisions you make here will dictate your ongoing data warehouse costs and compute needs. These decisions become even more important if you want to take advantage of the many benefits offered by an agentic data platform.

Table coverage

While you could thoroughly monitor every table your catalog knows about, we suggest starting with the most important data to get results quickly and help build trust in your process. Doing a bit of legwork up front to prioritize your coverage can accelerate adoption and demonstrate the value of quality data to your organization.

To start, ask the people who work directly with data which tables merit attention. This might be fairly obvious at a small company, but at a bigger one you’ll want a structure to make use of that information, which could be as simple as allowing individuals to set up monitoring themselves. You can also look at SQL query logs to determine which tables, columns, and segments are frequently queried, and hence what to look at more closely.
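
As a rough illustration of the query-log idea, here is a minimal Python sketch that counts how many logged queries reference each table. The log contents and table names are made up for the example, and the regular expression is a simple heuristic rather than a real SQL parser, so treat the output as a starting point for prioritization and not a precise measurement.

```python
import re
from collections import Counter

# Matches the identifier that follows FROM or JOIN.
# This is a rough heuristic, not a SQL parser: it skips subqueries,
# quoted identifiers, and other edge cases.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def rank_tables(queries):
    """Count how many queries reference each table, most-queried first."""
    counts = Counter()
    for sql in queries:
        # A set ensures a table referenced twice in one query counts once.
        counts.update({name.lower() for name in TABLE_REF.findall(sql)})
    return counts.most_common()


# Hypothetical query log entries, for illustration only.
query_log = [
    "SELECT * FROM sales.orders WHERE order_date = '2026-06-01'",
    "SELECT o.id FROM sales.orders o JOIN sales.customers c ON o.customer_id = c.id",
    "select count(*) from marketing.leads",
]

ranked = rank_tables(query_log)
for table, query_count in ranked:
    print(f"{table}: {query_count}")
```

The same counting approach could be extended to columns or segments, though that generally calls for a proper SQL parser.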

Monitoring vs. not monitoring is likely not a binary decision. Many platforms, such as Anomalo, offer multiple degrees of monitoring. For the long tail of infrequently queried tables, metadata monitoring is cheap and easy, and might be more than sufficient. You can always dial monitoring up or down over time, as well.
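
To make the idea of tiers concrete, here is a small Python sketch that assigns a monitoring level based on how often a table is queried. The tier names, the threshold, and the usage numbers are all assumptions for illustration; they are not Anomalo settings.

```python
def assign_tier(queries_per_month, deep_threshold=100):
    """Pick a monitoring level from query volume.

    The threshold and tier names are hypothetical examples.
    """
    return "deep" if queries_per_month >= deep_threshold else "metadata_only"


# Hypothetical monthly query counts per table.
usage = {
    "sales.orders": 1200,
    "sales.customers": 340,
    "archive.events_2019": 3,
}

tiers = {table: assign_tier(count) for table, count in usage.items()}
for table, tier in tiers.items():
    print(f"{table}: {tier}")
```

Because the tier is derived from usage, re-running the assignment as query patterns change is one simple way to dial monitoring up or down over time.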

Time horizon

If possible, limit monitoring to the most recent data. As we’ve explained over several chapters, Anomalo’s best practice is a daily check that compares only the previous day’s data to a baseline. That said, not all tables record when data was added or modified; one example is updated-in-place tables, where values are often edited within an existing row rather than appended as a new row. Refer to Chapter 5 for strategies for these cases.
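
The shape of a "previous day versus baseline" check can be sketched in a few lines of Python. Note the assumptions: this uses a simple z-score on a single daily metric (row count) as a stand-in, not the unsupervised machine learning approach the book describes, and the function name, threshold, and numbers are invented for the example.

```python
from datetime import date, timedelta
from statistics import mean, stdev


def latest_day_is_anomalous(daily_metric, today, threshold=3.0):
    """Compare yesterday's value to a baseline built from earlier days.

    A simple z-score stand-in for illustration; the threshold is an
    assumed example value.
    """
    yesterday = today - timedelta(days=1)
    baseline = [value for day, value in daily_metric.items() if day < yesterday]
    if len(baseline) < 2:
        raise ValueError("need at least two baseline days")
    center, spread = mean(baseline), stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return daily_metric[yesterday] != center
    return abs(daily_metric[yesterday] - center) / spread > threshold


today = date(2026, 6, 10)
# Hypothetical daily row counts for the seven days before yesterday.
history = [1000, 1010, 990, 1005, 995, 1002, 998]
row_counts = {
    today - timedelta(days=offset): count
    for offset, count in zip(range(8, 1, -1), history)
}
row_counts[today - timedelta(days=1)] = 400  # yesterday's load looks short

flagged = latest_day_is_anomalous(row_counts, today)
print(flagged)
```

Only yesterday's partition is evaluated, which is what keeps the daily cost low; the baseline days are used solely to establish what normal looks like.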

Configuration approach

We recommend thoughtful automation. Once you get your footing as to how your new platform works, use API hooks to configure very similar tables at scale. Resist the temptation to bulk-configure big bunches of heterogeneous datasets; if you rush, you risk creating more headaches than you save.
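
One way to keep bulk configuration "thoughtful" is to group tables by schema first and only apply a shared template to tables that truly look alike. The Python sketch below builds configuration payloads without calling any API. The payload fields (`table`, `time_column`, `checks`) and the table names are hypothetical, so map them to whatever your platform's API actually expects.

```python
import copy
from collections import defaultdict


def group_by_schema(tables):
    """Group table names by their exact (column, type) signature."""
    groups = defaultdict(list)
    for name, columns in tables.items():
        groups[tuple(sorted(columns.items()))].append(name)
    return list(groups.values())


def build_configs(tables, template, min_group_size=2):
    """Build one config payload per table, only for groups of look-alike tables."""
    configs = []
    for group in group_by_schema(tables):
        if len(group) < min_group_size:
            continue  # one-off tables are left for manual configuration
        for name in sorted(group):
            # Deep-copy so each payload can be edited independently later.
            configs.append({"table": name, **copy.deepcopy(template)})
    return configs


# Hypothetical catalog: two near-identical tables and one outlier.
tables = {
    "events.clicks_us": {"event_id": "string", "ts": "timestamp"},
    "events.clicks_eu": {"event_id": "string", "ts": "timestamp"},
    "finance.ledger": {"entry_id": "int", "amount": "decimal", "posted_at": "date"},
}
# Hypothetical payload fields, not a real platform schema.
template = {"time_column": "ts", "checks": ["freshness", "volume"]}

configs = build_configs(tables, template)
for config in configs:
    print(config)
```

Here the two clickstream tables share a template, while the ledger table is skipped and left for someone to configure by hand.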

It’s also important to have the option of configuring monitoring via the UI, whether for adding a net new table or for making monitoring adjustments to a previously configured table. A user-friendly UI helps democratize data governance and allows non-technical subject matter experts to contribute to data monitoring efforts.

User onboarding

You’ll also want to make sure your teams know how to make the most of your new data quality monitoring platform. Even with a relatively intuitive system (if we do say so ourselves) such as Anomalo, you’ll see a lot more success when engineers, analysts, and others understand the potential.

The right approach to user onboarding depends on your company’s size and culture. In a smaller, more collegial company, a few live sessions, weekly office hours, and broad access controls might fit. For a large organization in a regulated industry with several divisions, the right fit is more likely an on-demand curriculum, dedicated support staff, and role- and team-based access controls. Ease of onboarding is often another reason to buy a solution over a DIY approach: the vendor is likely to have much of the material and best practices you’ll need for deployment ready for you to use.

Don’t just maintain, improve

Just like a gym membership, buying into a data monitoring platform won’t do much on its own. Your teams need structure and discipline to take advantage of your new platform’s benefits, and to improve how they use it over time. Unlike your gym routine, when done right, a solid data quality practice reduces how much you sweat.

Here are a few practices that will instill some rigor into your process. Ultimately, these practices help your teams spend less time worrying about data quality and more time using data to drive important outcomes.

Document your procedures. Create a runbook that standardizes the processes for onboarding users, and for triaging and addressing data issues. Teams may want to customize the data quality solution for their own needs, and that’s great: adaptation correlates strongly with adoption.
Define ownership. There’s little point in monitoring a table for quality if it’s unclear who’s supposed to address any issues with that table. See Chapter 6 for advice on this topic, and reach out for a demo of Anomalo’s Data Documentation Agent, which provides a frictionless way to build out table documentation.
Build out your data infrastructure. As you see the value of high-quality monitoring, you may find other points of your data stack to be weak links. Improve them!
Establish internal norms. Create expectations of the timeliness of data delivery and how quickly issues of various severity will be resolved.
Create data quality dashboards. Let senior leadership—and everyone else—see what your platform has been finding. Patterns and spikes might indicate systemic issues that individual alerts may miss.
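
Internal norms are easier to follow when they are written down in a form a script can check. As a minimal sketch, the Python below encodes resolution targets by severity and reports whether an open issue has exceeded its target. The severity names and time windows are assumptions for illustration; choose values that match your own organization's expectations.

```python
from datetime import datetime, timedelta

# Assumed example targets; pick windows that fit your own norms.
RESOLUTION_TARGETS = {
    "critical": timedelta(hours=4),
    "high": timedelta(days=1),
    "low": timedelta(days=5),
}


def is_overdue(severity, opened_at, now):
    """Return True when an open issue has exceeded its resolution target."""
    return now - opened_at > RESOLUTION_TARGETS[severity]


now = datetime(2026, 6, 10, 12, 0)
# Opened six hours ago against a four-hour target.
critical_overdue = is_overdue("critical", datetime(2026, 6, 10, 6, 0), now)
# Opened two days ago against a five-day target.
low_overdue = is_overdue("low", datetime(2026, 6, 8, 12, 0), now)

print(critical_overdue, low_overdue)
```

A check like this could feed a data quality dashboard, so overdue issues are visible alongside the alerts themselves.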

Thanks for coming with us on this data quality journey, and check out our previous chapters if you haven’t had a chance to read them! We’ve shared a lot about how we do what we do because we believe in democratizing data quality. We want you to know that there’s a path forward to more reliable, trustworthy data. Now you’re ready to take the next step.
