
How retailers use Databricks and Anomalo to deliver an accurate next-gen CX

Personalization you can trust

How do you engage today’s fickle, demanding, distracted customer? By investing in a next-generation customer experience, often abbreviated as “next-gen CX.” Essentially, this means leveraging recent technical innovations, ranging from chatbots to AI to sensors. Arguably the most critical component is personalization, that is, adapting the experience to what you know about someone.

When it works, it’s a win-win: customers like you, spend more, and are more likely to buy again. But making sense of all the data you could possibly use, and then ensuring that it’s accurate, is a daunting challenge. In fact, for all that’s been invested, the majority of US consumers think CX could be improved at most companies, a consequence of issues like fragmented customer data, silos, and legacy systems that can’t deliver data in real time.

This consumer disappointment is your opportunity. You can join the ranks of personalization leaders by maximizing automation and leveraging lots of data. You can stay there by making sure your data is trustworthy.

Here’s how Databricks and Anomalo combine to make a rock-solid foundation for sales-boosting, brand-building personalization.

Retailers use Databricks to understand their customers and deliver personalized experiences

We’re big fans of Databricks (and it seems the feeling is mutual). Retailers are too, because Databricks’ Data Intelligence Platform for Retail combines their industry-leading data governance with innovative built-in intelligence. In plain language, Databricks can take in and manage a whole lot of data and let you use it in sophisticated ways, such as precise personalization for customers.

Databricks does this by combining data from any number and kind of zero-, first- and third-party data sources to maintain a 360-degree real-time view of customers’ past activities, current behaviors, and even likely next steps. That third-party data is easy to incorporate with the streamlined Databricks Marketplace, allowing retailers to minimize the expense and hassle of integrating data from other sources. Instead, they can focus on leveraging all this data to create the experiences that make customers feel good and spend more.

So what’s the catch? Personalization is only as good as the data that goes into it. And with retail, there’s a whole lot that can go wrong.

AI is transforming every aspect of retail, from improving the efficiency of distribution centers through powering real-time personalized experiences with consumers. But AI is 90% the data feeding the engine, and it’s only as good as the quality of that data. What is exciting about Anomalo is their focus on cost-effective monitoring of large volume, ever-changing data. Automation is key and why some of our leading customers rely on Anomalo.

– Rob Saker, Global VP Retail & Manufacturing


Anomalo exists because of retail data quality challenges

This topic is particularly important to us. Our CEO and CTO are former Instacart execs who founded Anomalo because too much retail data is unpredictably erroneous.

Grocery is probably the most complex type of retail in terms of data. Instacart wrestles with an extra order of complexity by ingesting data from many different grocery stores and the various vendors they use to manage their data. But every type of retailer these days manages many different first- and third-party data sources, from warehousing to weather APIs.

Personalization is built on top of this data. Everything from inventory to purchase history to profit margins influences the products, messaging, and offers that a customer sees. To truly trust automated personalization, you have to trust the data you’re using. For all its benefits, automation’s scale also multiplies the impact of bad data.

On an individual level, a customer who’s been steered to a product that can’t be delivered, or been congratulated on the wrong birthday, will lose esteem for you and be that much more likely to shop with a competitor. Not only can more individuals be let down faster than ever, but the problems can compound when generative AI and machine learning base their iterative processes on faulty data.

Personalization requires accuracy

Any part of a data-driven retail organization, from supply chain optimization to demand forecasting, will benefit from thorough data quality monitoring that alerts you to issues before you notice them. Today we’ll focus on personalization because that’s where incorrect data risks damaging your relationship with your most important and least forgiving stakeholder: the customer.

There is no personalization without data. From buyer behavior to real-world conditions, it’s data that informs the automated decisions that go into today’s marketing and on-platform messaging. Some examples include:

  • Recommendations. Machine learning has gotten very good at suggesting products by comparing your purchase (and browsing) history with others.
  • Coupons. Offer discounts on items a shopper buys frequently or that you suspect they’ll like. These can compel a shopper to return to you rather than try a competitor.
  • Bundling. Suggest the purchase of multiple items at once in assortments that make sense to the customer as you know them. For instance, a grocer might present both lighter fluid and beer to a middle-aged suburban man with charcoal in his cart.
  • Communications. Your marketers might write in a different tone for a 47-year-old woman in Ohio versus a 22-year-old Texan man. Data will inform which copy and graphics to send to whom.
  • Ad targeting. Personalization can extend well beyond your site and email list. Ads placed through networks can remind shoppers what’s in their cart, suggest what they might like based on demographics, or even offer discounts to lapsed customers.
  • Support. Adjust prioritization, flexibility, and even appeasement amounts based on the customer’s past behavior.

Shoppers respond to personalization done right by spending more and feeling good about it. According to a BCG study, “when the shopping experience was highly personalized, customers indicated that they were 110% more likely to add additional items to their baskets and 40% more likely to spend more than they had planned.” They also gave a 20% higher Net Promoter Score.

It takes a whole lot of data to do it right. And just one or a few bad data points to mess it up. Here are just a few examples of the sorts of errors that might go unidentified and end up degrading the value of personalization:

  • Suggesting out-of-stock items. The manufacturer accidentally defines a case of pasta sauce as having 120 items, not 12. Since stores order by the case, the inventory management system that ingests this metadata thinks every store has far more stock on hand than it does. The recommendation engine continues to suggest this pasta sauce to people with spaghetti in their cart because the flag to suppress suggestions for out-of-stock items doesn’t trigger. Hundreds of orders across the grocery chain end up with a substitution or no sauce at all.
  • Unnecessary discounts. A cart logging system experiences degraded performance, so its events are timestamped after the transaction has already completed. Because no transaction follows those late timestamps, it appears that these users have abandoned their carts. The retention engine sends discount codes to reliable customers, hurting margins.
  • Mistargeted campaigns. Dates are stored in the original table as integers offset from a reference date. After an acquisition, when customer data is transferred from one platform to another, the birth date column is converted into date format with the wrong reference date. Nobody notices until college students get emails about senior discounts.
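
To make the mistargeted campaign error concrete, here is a minimal Python sketch of how decoding an integer day offset against the wrong reference date shifts every birth date by the same amount. The two reference dates, the sample birth date, and the function names are hypothetical, chosen only for illustration; the article does not say which reference dates were involved.

```python
from datetime import date, timedelta

# Hypothetical reference dates: the original table counts days from
# 1970-01-01, but the migration script assumes 1900-01-01.
SOURCE_EPOCH = date(1970, 1, 1)
WRONG_EPOCH = date(1900, 1, 1)


def decode_offset(days_offset: int, epoch: date) -> date:
    """Convert an integer day offset into a calendar date."""
    return epoch + timedelta(days=days_offset)


def age_on(birth: date, today: date) -> int:
    """Whole years between a birth date and a given day."""
    years = today.year - birth.year
    if (today.month, today.day) < (birth.month, birth.day):
        years -= 1
    return years


# A college-age customer, stored as an integer in the original table.
stored_offset = (date(2003, 5, 14) - SOURCE_EPOCH).days

today = date(2024, 1, 1)
correct_birth = decode_offset(stored_offset, SOURCE_EPOCH)
wrong_birth = decode_offset(stored_offset, WRONG_EPOCH)

print("correct:", correct_birth, "age", age_on(correct_birth, today))
print("wrong:  ", wrong_birth, "age", age_on(wrong_birth, today))
```

Every converted row is still a valid date, so a type or null check would pass. Only the distribution of ages gives the error away, because every customer appears decades older at once.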

Data quality monitoring identifies fires before the flames spread

So how do you evaluate and detect problems? Traditional rules-based data quality monitoring simply doesn’t scale to the volume of data that today’s retailers use. It’s even harder to ensure that third-party data is useful, because there are so many more things that can break when data changes hands.

Anomalo can help prevent all of the above issues. With unsupervised machine learning (UML), it’s constantly analyzing every dataset for things that just don’t look right: values that are uncommonly high or low, a lot more or fewer entries in a given timespan than normal, and so on. This approach can even let you know when there’s something wrong with either the content or delivery of third-party data, such as from the Databricks Marketplace. For example, in the mistargeted campaign example described above, customer birth dates could have been sourced from a third-party provider. Because Anomalo is using UML to identify outlying trends in the data, it can spot issues despite knowing nothing about the data’s provenance or history.

UML is a game-changer for data quality in two main ways. The first is straightforward: it scales effortlessly so it can be applied to every single one of your datasets. The second is less intuitive: because it is looking for unexpected changes of any sort, it can detect a wider range of issues than manual tests that monitor adherence to specific expectations.

To reference the examples above, Anomalo could detect:

  • when the stock of a certain item is suddenly much higher than it ever has been
  • when the rate of apparently abandoned carts jumps
  • when average customer age increases by decades
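
As a rough illustration of flagging a metric that suddenly departs from its own history, the sketch below scores the latest value of a daily metric with a robust z-score based on the median and the median absolute deviation. This is a deliberately simple stand-in, not Anomalo’s actual method, which the article describes only as unsupervised machine learning. The function names, the 3.5 threshold, and the sample abandoned-cart rates are all assumptions made for the example.

```python
from statistics import median


def robust_z(history: list[float], latest: float) -> float:
    """Score how far `latest` sits from the history, in robust units.

    Uses the median and the median absolute deviation (MAD) so that a few
    past outliers do not distort the baseline.
    """
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # A perfectly flat history: any change at all is unexpected.
        return 0.0 if latest == med else float("inf")
    # 0.6745 rescales MAD to be comparable to a standard deviation
    # for normally distributed data.
    return 0.6745 * (latest - med) / mad


def is_anomalous(history: list[float], latest: float,
                 threshold: float = 3.5) -> bool:
    """Flag the latest value if its robust z-score exceeds the threshold."""
    return abs(robust_z(history, latest)) > threshold


# Hypothetical daily abandoned-cart rates for the past week.
abandoned_cart_rate = [0.21, 0.19, 0.22, 0.20, 0.21, 0.18, 0.20]

print(is_anomalous(abandoned_cart_rate, 0.21))  # an ordinary day
print(is_anomalous(abandoned_cart_rate, 0.45))  # a sudden jump
```

The same check applies unchanged to stock counts or average customer age, which is the point made above: nothing in it encodes a rule about what the data should look like, only how it has behaved so far.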

Anomalo’s near-instant alerts include context, so data teams can quickly triage and address issues as they emerge. In some circumstances, such as the birthday goof, errors can be remedied before there’s any impact on personalization. In others, such as the timestamping delays due to degraded performance, the impact can be minimized by addressing the issue sooner.

Because of the tight integration with Databricks, Anomalo is simple to implement and natural to use. Simply connect Anomalo to Databricks and select which tables or views you want to monitor, with no extensive configuration required. A native integration with Unity Catalog (UC), Databricks’ unified governance solution, allows customers to efficiently run hourly observability checks across their entire Databricks Data Intelligence Platform. In addition, Anomalo pushes check results directly into UC’s Data Explorer to serve as trust signals, and it can mirror UC’s access groups to ensure unified access across both products. Finally, this is all available via Databricks Partner Connect, offering Databricks customers an exclusive and convenient free trial of the Anomalo platform.

Personalization is core to next-generation retail CX. Such an important function is largely automated with human guidance and monitoring. It’s time to treat data quality monitoring the same way. Talk to Anomalo about how you can protect both your brand and your margins by ensuring that what you present to customers is based on accurate data.
