December 6, 2022
Trust takes years to build and seconds to break. We often hear this phrase applied to relationships, but what about data? The story is the same. The moment bad data appears in an important analytics report, for example, leadership starts to wonder whether the business can really rely on the data it uses to make decisions. That mistrust ends up hurting the adoption of data-driven strategies and tools.
At the DataEngBytes conference in Melbourne, data engineers Akira Takihara Wang and Xinwei Jiang spoke about their experience using Anomalo to promote data trust in their company, Afterpay. Afterpay is a buy now, pay later platform based in Australia that was acquired by Block in early 2022. In this article, we’ll recap the Afterpay team’s framework for data quality and how they scaled their data quality monitoring efforts across their entire organization.
You can also watch the full video of Akira and Xinwei’s talk to learn more about Afterpay’s overall data framework.
Afterpay uses a system of six dimensions to make it clear who owns which specific aspects of data quality. This division of labor enables data engineers and consumers to focus on the domains they know best. In the next section, we’ll see how Anomalo provides visibility into these dimensions.
These six dimensions are split across two main user groups: data engineers and data consumers. Data engineers tend to own the operational dimensions, whereas data consumers own the business-focused dimensions of data quality. The dimensions are:
Operational dimensions (data engineers)
1. Timeliness: Whether data is refreshed and available on time
2. Completeness: Whether all data is ingested into the database
3. Uniqueness: Whether the primary key column is free of duplicate values
Business dimensions (data consumers)
4. Accuracy: Whether the data match the expected values for a given field
5. Consistency: Whether the data stay free of anomalies over time
Shared (data engineers and data consumers)
6. Validity: Operationally, whether the data are of the correct type or not null, and business-wise, whether the data satisfy checks against business logic
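To make the operational side of these dimensions concrete, here is a minimal sketch of what such checks could look like in plain Python. It is not Afterpay's implementation and does not use Anomalo's API; the sample rows, the field names (`order_id`, `amount`, `loaded_at`), the 24-hour freshness window, and the expected row count are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical sample rows; the field names are illustrative assumptions.
rows = [
    {"order_id": 1, "amount": 25.0, "loaded_at": datetime(2022, 12, 6, 1, 0)},
    {"order_id": 2, "amount": 40.0, "loaded_at": datetime(2022, 12, 6, 1, 5)},
    {"order_id": 2, "amount": None, "loaded_at": datetime(2022, 12, 6, 1, 5)},
]


def is_timely(rows, now, max_age):
    """Timeliness: the newest row was loaded within max_age of now."""
    newest = max(r["loaded_at"] for r in rows)
    return now - newest <= max_age


def is_complete(rows, expected_count):
    """Completeness: every row the source reported has been ingested."""
    return len(rows) >= expected_count


def is_unique(rows, key):
    """Uniqueness: the key column holds no duplicate values."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))


def is_valid(rows, column, expected_type):
    """Validity (operational): every value is non-null and of the right type."""
    return all(isinstance(r[column], expected_type) for r in rows)


def run_checks(rows, now, expected_count):
    """Run one check per dimension and return a pass/fail result for each."""
    return {
        "timeliness": is_timely(rows, now, timedelta(hours=24)),
        "completeness": is_complete(rows, expected_count),
        "uniqueness": is_unique(rows, "order_id"),
        "validity": is_valid(rows, "amount", float),
    }


results = run_checks(rows, now=datetime(2022, 12, 6, 9, 0), expected_count=3)
for dimension, passed in results.items():
    print(f"{dimension}: {'pass' if passed else 'FAIL'}")
```

With this sample data, the duplicate `order_id` fails the uniqueness check and the null `amount` fails the validity check, while timeliness and completeness pass. The business dimensions (accuracy and consistency) depend on domain knowledge, which is why data consumers own them in this framework.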
Even though there are discrete responsibilities, the Afterpay team believes in democratizing access to data quality monitoring. Xinwei elaborated, “In our framework, all the metrics and data quality checks are transparent to our users. If we keep all the metrics within the engineering team, then in the future all the requests will come to the engineering team. This is not scalable over time.”
Next, Afterpay’s data engineers shared how Anomalo helps them achieve a centralized and transparent data quality monitoring solution that covers all six of their data quality dimensions.
Thanks to Anomalo’s simple UI and powerful monitoring features, the platform supports both engineers and data consumers in maintaining data quality. Depending on their familiarity and preferences, some Afterpay team members use Anomalo’s no-code UI while others take advantage of the API for programmatic management of data quality efforts. In addition to creating and updating their own data quality checks programmatically, the data engineering team uses the Anomalo API to track aggregated results and stay up to date with all the monitoring rules that data consumers across the company are setting up.
Afterpay uses Anomalo to track multiple types of data quality indicators.
A number of Anomalo’s other features help integrate the platform into Afterpay’s workflows. For instance, Afterpay configures how often Anomalo runs data quality checks and which channels it uses for alerting. Furthermore, Afterpay uses an ETL pipeline to store data quality check results and make sense of them in the context of the business’s overall data. When data quality issues arise, Anomalo’s root cause analysis makes it much easier to resolve problems. Overall, Anomalo “perfectly fits our purpose,” Xinwei explained.
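The idea of landing check results in a queryable store can be sketched as follows. This is only an illustration under stated assumptions: the table name `dq_check_results`, its columns, and the sample results are hypothetical, and an in-memory SQLite database stands in for whatever warehouse the real pipeline writes to. It does not show how results are retrieved from Anomalo.

```python
import sqlite3

# Hypothetical check results, shaped as an ETL job might receive them:
# (table_name, dimension, run_date, passed)
check_results = [
    ("orders", "uniqueness", "2022-12-05", 1),
    ("orders", "timeliness", "2022-12-05", 1),
    ("orders", "uniqueness", "2022-12-06", 0),
    ("payments", "completeness", "2022-12-06", 1),
]

# An in-memory SQLite database stands in for the real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dq_check_results ("
    "table_name TEXT, dimension TEXT, run_date TEXT, passed INTEGER)"
)
conn.executemany(
    "INSERT INTO dq_check_results VALUES (?, ?, ?, ?)", check_results
)

# Aggregate results per monitored table so they can be joined to other data.
failures_by_table = conn.execute(
    "SELECT table_name, COUNT(*) AS total, SUM(1 - passed) AS failed "
    "FROM dq_check_results GROUP BY table_name ORDER BY table_name"
).fetchall()
conn.close()

for table_name, total, failed in failures_by_table:
    print(f"{table_name}: {failed} of {total} checks failed")
```

Once results live alongside the rest of the business data, they can be joined to table ownership or usage metadata, which is one way to put failures in context.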
Afterpay uses a weighted score to measure how well their data discovery and data trust efforts are doing. The ratio of failed checks to the total number of data quality checks represents the data trust component of the weighted score. While we didn’t cover Afterpay’s data discovery efforts in this article, the full talk dives deeper into how the company uses a platform called Amundsen to this end.
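As a rough illustration of how such a score might be computed, the sketch below calculates the failed-check ratio described above and combines it with a discovery component. The article does not state the weights, how the discovery component is measured, or how the ratio is oriented inside the score, so the equal weights, the `0.80` discovery value, and the use of the pass rate (one minus the failed ratio) so that higher means better are all assumptions.

```python
def failed_check_ratio(failed, total):
    """Ratio of failed checks to the total number of data quality checks."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


def weighted_score(components, weights):
    """Weighted average of component scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(components[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical inputs: 5 failed checks out of 100.
ratio = failed_check_ratio(failed=5, total=100)

components = {
    "data_trust": 1 - ratio,   # assumption: use the pass rate so higher is better
    "data_discovery": 0.80,    # assumption: placeholder discovery score
}
weights = {"data_trust": 0.5, "data_discovery": 0.5}  # assumption: equal weights

score = weighted_score(components, weights)
print(f"failed ratio: {ratio:.2f}, weighted score: {score:.3f}")
```

Normalizing by the sum of the weights keeps the score in the same range as its components even if the weights do not add up to one.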
In just four months, the Afterpay team achieved remarkable results. Between Anomalo and Amundsen, Afterpay was able to address 30% of their data-related questions in a self-serve capacity. Anomalo has made data quality metrics transparent to all team members at Afterpay, which is foundational to a trustworthy, and therefore useful, data stack. To learn more about using Anomalo in your own company, request a demo today.