How Data Quality Monitoring Caught a Spammer

Data quality monitoring protected a company with millions of subscribers from spam abuse
December 7, 2021
TL;DR Protecting their network from spammers didn’t just keep this content platform’s subscribers happy and informed. It kept their publishers in business. Anomalo data quality monitoring assisted in the detection of a wide-reaching user integrity problem, allowing them to implement processes to better protect publisher revenue and business credibility for their entire network.

In this post, we’ll share how one content platform (referred to as “customer” so they may remain anonymous) used Anomalo to identify and stop spammers from abusing their services and potentially blocking significant revenue to their publishers.

Facing a near-constant threat of bad actors looking for ways to exploit bandwidth for their own nefarious dealings, the customer knew that spam abuse could not only lock up their network but also affect the profitability and livelihoods of their network of publishers. They needed an automated method of monitoring their systems.

How spammers can affect the bottom line

A company that sends millions of emails every week, with high open and engagement rates, is a rich target for spammers seeking to make easy money.

For example, a spammer could create a new account on their content platform, start a new “publication”, upload the email addresses of their “audience,” and begin sending “content” that is a thinly veiled advertisement for whatever the spammer is hawking.

Spam emails get reported through the email client to the ISP and can affect the deliverability of the platform’s other emails.

Receiving spam is not only a bad experience for consumers; it can also harm a publisher’s ability to deliver their real content. When consumers receive spam emails, they report them to Internet Service Providers (ISPs), which flag the customer’s IP address for sending spam. This causes the ISPs to de-prioritize other, genuine content emails from that IP address.

Ultimately, the customer and their publishers could suffer brand damage and loss of customers.

One measure to reduce spam that the customer put into place was to cap the number of accounts a new user could create per day while still meeting the needs of their prolific publishers. Unfortunately, persistent spammers quickly learned these limits and attempted to operate just beneath them.

Because detecting unusual events that occurred in their email data was a high priority for their data team, they turned to Anomalo to look for and immediately report these events before they had a chance to proliferate. Using the Entity Outlier check (for more details, see the Detecting Extreme Data Events post), they began monitoring the maximum number of publications created by new users:

The maximum number of publications created by users on a given day spiked to 20 on multiple days, well above the predicted upper bound
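To make the idea behind this kind of check concrete, here is a minimal sketch in Python. It is not Anomalo's Entity Outlier algorithm, which this post does not describe. It assumes a hypothetical stream of `(day, user_id)` events, one per publication created, and uses a simple stand-in for the predicted upper bound: the median of past daily maxima plus a multiple of their median absolute deviation. The function names, the `k` multiplier, and the `min_history` warm-up are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from statistics import median


def daily_max_per_user(events):
    """events: iterable of (day, user_id), one per publication created.

    Returns {day: (max_count, user_id)} for the most active user each day.
    """
    per_day = defaultdict(Counter)
    for day, user_id in events:
        per_day[day][user_id] += 1
    return {
        day: max((count, user) for user, count in counts.items())
        for day, counts in per_day.items()
    }


def upper_bound(history, k=3.0):
    """Stand-in bound: median + k * MAD of past daily maxima (an assumption)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # Floor the spread at 1 so a perfectly flat history still leaves headroom.
    return med + k * max(mad, 1.0)


def flag_outlier_days(events, min_history=7, k=3.0):
    """Return (day, user_id, count) for days whose maximum exceeds the bound."""
    daily = daily_max_per_user(events)
    history, flagged = [], []
    for day in sorted(daily):  # ISO-formatted dates sort chronologically
        count, user = daily[day]
        if len(history) >= min_history and count > upper_bound(history, k):
            flagged.append((day, user, count))
        else:
            history.append(count)  # only learn from days that look normal
    return flagged
```

Because flagged days are kept out of the history, a spammer who repeatedly hits the same high count does not gradually raise the bound and hide behind it.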

On 2021-04-22, they were alerted that a user had started exactly 20 publications, something that had happened several times over the prior weeks.

Anomalo identified the offending user_id as 35037445:

User_id 35037445 created 4x as many publications (red bar) as the next most active user (grey bar), and 25% more than our model predicted as an upper bound (green line)

Further, Anomalo provided a detailed root cause analysis that profiled these 20 new publications against the rest of the population:

All the publications created by this user were in French and had the same anonymous_id and user_agent string

This root cause analysis helped them to identify that these publications were likely from a spammer. They were all in French, came from the same user agent, and were mostly created in a single visit.
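The kind of profiling described here can be approximated with a simple comparison of value shares. The sketch below is an illustration under stated assumptions, not Anomalo's root cause analysis: rows are plain dictionaries, the `language` column name is hypothetical (`anonymous_id` and `user_agent` come from the figure above), and both row lists are assumed to be non-empty.

```python
from collections import Counter


def profile_segment(flagged_rows, other_rows, columns):
    """Compare the flagged rows against the rest of the population.

    For each column, take the most common value among the flagged rows and
    report its share of the flagged rows and of all other rows. Columns are
    returned with the largest gap between the two shares first.
    """
    report = []
    for col in columns:
        value, n = Counter(row[col] for row in flagged_rows).most_common(1)[0]
        share_flagged = n / len(flagged_rows)
        matches = sum(1 for row in other_rows if row[col] == value)
        share_other = matches / len(other_rows)
        report.append((col, value, share_flagged, share_other))
    report.sort(key=lambda r: r[2] - r[3], reverse=True)
    return report
```

A column where one value covers nearly all flagged rows but almost none of the rest, such as a single shared `user_agent`, rises to the top of the report and is a natural place to start investigating.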

The customer’s data team was able to immediately review the content user ID 35037445 was publishing and manually flag the account as a spammer in their internal systems:

A screenshot of an internal tool the customer used to ban this specific user account

However, the spammer continued to create many new accounts like this, each with slightly different email or spoofed IP addresses, while sending out the same shoddy content. This became evident when the trend in the number of publications created per user was still increasing:

The trend in the maximum number of publications created per day by user, removing seasonality and noise components, steadily increased.
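One simple way to look at a trend like this with weekly seasonality smoothed out is a trailing 7-day moving average. The sketch below is only a rough stand-in for the decomposition behind the chart, which this post does not specify. The function names and the `min_increase` threshold are assumptions made for illustration.

```python
def weekly_trend(values, window=7):
    """Trailing moving average; a 7-day window averages out a weekly cycle."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    trend = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest day, drop the oldest.
        running += values[i] - values[i - window]
        trend.append(running / window)
    return trend


def is_rising(trend, min_increase=0.0):
    """True if the smoothed series ends higher than it started."""
    return len(trend) >= 2 and trend[-1] - trend[0] > min_increase
```

With a daily series of maximum publications per user, `is_rising(weekly_trend(series))` stays false for a stable weekday/weekend pattern and turns true once the underlying level drifts upward.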

The customer’s data team quickly organized several work streams to better protect their platform from spammers, including:

  • More sophisticated account verification processes
  • Advanced email and list validation algorithms
  • Other proprietary innovations and processes

Through the implementation of data monitoring and new processes to analyze anomalous data, they were able to mitigate the risks posed by potential spammers to their network of users and publishers. While the platform continues to experience rapid growth and an overall increase in volume, the maximum number of publications created per user per day has begun to normalize.

The overall trend in the maximum number of publications created per user has begun to normalize

The efforts of the customer’s data team and their implementation of Anomalo data quality monitoring to mitigate spammers have clearly paid off.

In addition to reducing the impact of spammers, the customer protects thousands of tables in their Snowflake data warehouse by implementing Anomalo to:

  • Ensure their data arrives on time and is complete
  • Detect missing or corrupted data in key tables
  • Identify duplicate data or unexpected distribution changes

The Anomalo UI allows fast-moving and rapidly growing companies to leverage intelligent data monitoring to ensure that the data they collect is of high quality and that unexpected or unusual events are flagged. Anomalo automates the querying of a data warehouse like Snowflake, summarizes the results, and sends notifications about unusual behavior through Slack or Microsoft Teams.

This is just one example of how our customer used Anomalo to identify bad actors who were sending spam that annoyed consumers and could have negatively impacted their business. What kind of issues could be found in your data? Request a demo to find out.

Written By
Jeremy Stanley