CASE STUDIES

How Branch Achieved a Stable Data Operation with Automated Data Quality Monitoring

October 24, 2023

Meaningful Stats

  • 10+ million records and counting covered by automated daily monitoring
  • 5 hours saved on average per data quality incident, thanks to automated root-cause analysis in Anomalo
  • 3+ FTE resources saved by automated anomaly detection in place of exhaustively written data validation rules


“We literally went from nothing to having something that was automated and kicking out alerts to us.”

Carson Wilshire, MBA - Sr. Analytics Manager

The Challenge

Data is one of Branch’s key enterprise assets, informing decisions on everything from how to set insurance rates to how to target marketing campaigns. To do their jobs, salespeople, actuaries, support staff, and others all depend on various forms of data, be it customer data like demographics, or other data from first- and third-party systems.

To that end, Branch had built up a modern data stack equipped to serve these needs at scale. With a growing community of members, Branch was careful to select tools that could handle large volumes of data; their Offers dataset alone contained millions of records. The company went with cloud-forward vendors like Fivetran and Rivery for the ELT pipeline, GitHub for version control, and Google BigQuery for the all-important data warehouse. Even so, Branch found itself constantly firefighting the inevitable issues that plague big data systems. One piece of the stack was missing: data quality monitoring.

Branch’s data quality issues tended to follow a pattern: an internal stakeholder who depended on data to make a decision would flag an anomaly with Branch’s 25-person data team. That team, with limited resources, would have to scramble to unblock the data consumer, who had already lost a degree of trust in the data.

When Carson Wilshire stepped in to lead Branch’s data operations, the data quality problem stood out as a top priority. As he put it, “I knew that it would be unsustainable for us to do [data quality monitoring] manually. It was on my roadmap almost from day one. From a data architecture infrastructure standpoint, I wanted what I would call physical analysis or automated testing on our warehouse itself, so we were alerted to things in an automated fashion.”

In short, Branch needed to augment their data stack with a proactive approach to data quality, one that could identify anomalies automatically, before end users found them.

The Solution

The naive solution to catching data quality problems before they affect end users would be to write out checks and use software tools to enforce those standards. As Carson and his team recognized, however, this is impractical at scale. Many data discrepancies are hard to anticipate, and even for the predictable ones, it’s infeasible to enumerate every potential validation rule for millions of records split across several tables.

Instead, Branch turned to Anomalo to introduce automation into the data quality monitoring process. With machine learning at its core, Anomalo doesn’t require enumerating validation rules to do its job. Rather, the platform scans historical data and intelligently alerts the appropriate stakeholders when there’s a statistically significant abnormality in new data. Combined with its root-cause analysis features, Anomalo made it possible for Branch’s data team to identify and resolve data issues well before they affected downstream consumers.
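
To make the idea concrete (a simplified sketch only, not Anomalo’s actual algorithm; the function and numbers below are invented for illustration), a statistical check of this kind can be as simple as comparing today’s record count for a table against its recent history:

```python
from statistics import mean, stdev

def looks_anomalous(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Return True when today's value sits more than `z_threshold`
    standard deviations away from the historical mean (a basic z-score test)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

# Fourteen days of (made-up) daily record counts, then a sudden drop.
daily_counts = [10_210_004, 10_214_880, 10_209_332, 10_221_117, 10_218_450,
                10_225_006, 10_219_874, 10_230_561, 10_228_093, 10_233_410,
                10_231_775, 10_238_902, 10_236_440, 10_241_288]
print(looks_anomalous(daily_counts, today=9_100_000))  # True: volume fell off a cliff
```

The value of a platform like Anomalo is doing this kind of analysis, and far more sophisticated variants of it, across every monitored table and column automatically, with no rule-writing required.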

Since Branch joins together data from first- and third-party sources in its ELT pipeline, there are several junctures where data quality might be compromised, which Carson calls “Frankenstein points.” His team has been intentional about using Anomalo to mitigate issues at these critical stages in the data journey. “When we’re bringing a bunch of different datasets together, whether it be external or internal, that’s where we sit Anomalo strategically to make sure those key datasets are fresh, accurate, and as expected,” Carson said.

While Branch leans on Anomalo most of all for its automated monitoring, the company also uses Anomalo for running custom checks. With support for both automated and rules-based data quality monitoring, Anomalo enables Branch to have high coverage of their datasets while ensuring the most sensitive ones get additional oversight from explicit validation rules.
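
As a rough sketch of what a rules-based check might look like (the table, columns, and rules below are hypothetical, and the code queries BigQuery directly rather than going through Anomalo):

```python
# Hypothetical rules-based checks run directly against BigQuery.
# Table and column names are invented for illustration.
from google.cloud import bigquery

RULES = {
    "offers_have_positive_premiums":
        "SELECT COUNT(*) FROM `branch.offers` WHERE premium <= 0",
    "offers_have_member_ids":
        "SELECT COUNT(*) FROM `branch.offers` WHERE member_id IS NULL",
}

client = bigquery.Client()
for rule_name, sql in RULES.items():
    # Each query counts rows that violate the rule; zero means the check passes.
    bad_rows = next(iter(client.query(sql).result()))[0]
    if bad_rows:
        print(f"Check failed: {rule_name} ({bad_rows} offending rows)")
```

Hand-written rules like these make sense for a handful of sensitive tables; the point of the automated layer is that nobody has to write thousands of them.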

Today, Branch’s data incident response protocol looks a lot different than it once did. Just five people keep an eye on a dedicated Slack channel for Anomalo alerts. From there, it’s relatively straightforward to dive into Anomalo and discern the cause, leading to a swift resolution. Issues now rarely escalate beyond Carson’s team.

The Outcome

Anomalo squarely addressed Branch’s data quality needs.

No longer was his team constantly reacting to data concerns elsewhere in the company. With a new, proactive approach, the team has achieved a stable state “in a completely different spot from [where they were] a year ago.” That shift introduced material time savings that are precious to a growth-stage startup. Previously, a typical data escalation was not only an interruption to an analyst’s day; understanding the specific issue and tracing its source could easily take five hours or more. With Anomalo, the process is far less ad hoc: analysts don’t need to go back and forth with data consumers, and it’s easy to view the probable root cause instead of trudging through the entire data pipeline.

In addition to saving analysts’ time, Anomalo let Carson keep his team lean. Legacy data quality monitoring products are grounded in rules-based checks, which don’t scale. On manually writing validation rules, Carson said, “I’d have to hire more folks to sit around and query data or build out all those testing scenarios ourselves and it would require us to bloat our team.” By opting for a modern, automated data quality strategy, Carson can instead redeploy headcount to higher-value work elsewhere.

Ultimately, data exists to serve the business. By elevating data quality with Anomalo, Branch has elevated overall trust in their data and in the decisions that flow from it. Internal stakeholders are rarely blocked on data issues because those problems get sorted out well in advance. As an insurance company, Branch has also derived tremendous value from reduced regulatory risk now that their data is monitored more thoroughly.

Looking forward, Branch is intent on expanding their use of Anomalo to even more of their datasets. In the first stage of introducing data quality monitoring, Carson’s team prioritized internal datasets that other stakeholders used for business decisions. As for what’s next, Carson said, “I would like anything that goes external to have automated testing, alerts, monitoring, and statistical analysis. Anything that goes out the door should be complete, accurate, and 100%.” A year ago, setting such an ambition would have been at odds with the immediate fires Branch had to put out. Now, for the first time, that goal is actually within reach.