How Data Quality Monitoring Caught a Spammer
December 7, 2021
TL;DR Protecting their network from spammers didn’t just keep this content platform’s subscribers happy and informed. It kept their publishers in business. Anomalo data quality monitoring assisted in the detection of a wide-reaching user integrity problem, allowing them to implement processes to better protect publisher revenue and business credibility for their entire network.
In this post, we’ll share how one content platform (referred to as “customer” so they may remain anonymous) used Anomalo to identify and stop spammers from abusing their services and potentially blocking significant revenue to their publishers.
With a near-constant threat of bad actors looking for ways to exploit bandwidth for their own nefarious dealings, the customer knew that spam abuse could not only lock up their network but also affect the profitability and livelihoods of their network of publishers. They needed an automated method of monitoring their systems.
How spammers can affect the bottom line
A company that sends millions of emails every week, with high open and engagement rates, is a rich target for spammers seeking to make easy money.
For example, a spammer could create a new account on their content platform, start a new “publication”, upload the email addresses of their “audience,” and begin sending “content” that is a thinly veiled advertisement for whatever the spammer is hawking.
Receiving spam is not only a bad experience for consumers; it can also harm a publisher’s ability to deliver their real content. When consumers receive spam email, they report it to Internet Service Providers (ISPs), which flag the customer’s IP address for sending spam. This causes the ISPs to de-prioritize other genuine content emails from that IP address.
Ultimately, the platform could suffer brand damage and lose customers.
One measure to reduce spam that the customer put into place was to cap the number of accounts a new user could create per day while still meeting the needs of their prolific publishers. Unfortunately, persistent spammers quickly learned these limits and attempted to operate just beneath them.
Because detecting unusual events in their email data was a high priority for the customer’s data team, they turned to Anomalo to look for and immediately report these events before they had a chance to proliferate. Using the Entity Outlier check (for more details, see the Detecting Extreme Data Events post), they began monitoring the maximum number of publications created by new users:
On 2021-04-22, they were alerted that a user had started exactly 20 publications, which had happened several times over the prior weeks.
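To make the idea of an entity outlier check concrete, here is a minimal sketch in Python. It is not Anomalo's implementation: the function names, the synthetic events, the placeholder user IDs, and the median-plus-deviation threshold are all assumptions chosen for illustration. It computes, for each day, the largest number of publications created by any single user, then flags days where that maximum sits far above the typical level.

```python
from collections import Counter
from statistics import median

def daily_max_per_entity(events):
    """events: iterable of (day, user_id) pairs, one per publication created.
    Returns {day: (max_count, user_id_with_max)}."""
    counts = Counter(events)  # (day, user_id) -> publications created
    result = {}
    for (day, user_id), n in counts.items():
        if day not in result or n > result[day][0]:
            result[day] = (n, user_id)
    return result

def flag_outlier_days(daily_max, k=5.0):
    """Flag days whose per-user maximum exceeds median + k * MAD.
    The threshold rule and k are illustrative assumptions."""
    values = [n for n, _ in daily_max.values()]
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0  # avoid a zero-width band
    threshold = med + k * mad
    return {day: hit for day, hit in daily_max.items() if hit[0] > threshold}

# Synthetic data: five ordinary days, then one day with a 20-publication user.
events = []
normal_days = ["2021-04-17", "2021-04-18", "2021-04-19", "2021-04-20", "2021-04-21"]
for i, day in enumerate(normal_days):
    events += [(day, "user_a")] * (1 + i % 3)
    events += [(day, "user_b")] * 2
events += [("2021-04-22", "user_a")] * 2
events += [("2021-04-22", "user_x")] * 20  # hypothetical heavy creator

flagged = flag_outlier_days(daily_max_per_entity(events))
print(flagged)
```

Tracking the per-day maximum rather than the average matters here: one account creating 20 publications barely moves a platform-wide mean, but it dominates the maximum.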
Anomalo identified the offending user_id as 35037445:
Further, Anomalo provided a detailed root cause analysis that profiled these 20 new publications against the rest of the population:
This root cause analysis helped them to identify that these publications were likely from a spammer. They were all French, from the same user agent, and most were created in a single visit.
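The profiling step described above can be approximated with a simple comparison of attribute shares. The sketch below is an assumption-laden illustration, not Anomalo's root cause algorithm: the column names (`language`, `user_agent`), the synthetic rows, and the ranking by share difference are all hypothetical. For each attribute value seen in the flagged publications, it compares how common that value is among the flagged rows versus everything else.

```python
from collections import Counter

def profile_segment(rows, is_flagged, columns):
    """Compare attribute value shares in the flagged rows against the rest.
    Returns (column, value, flagged_share, rest_share) tuples, largest gap first."""
    flagged = [r for r in rows if is_flagged(r)]
    rest = [r for r in rows if not is_flagged(r)]
    findings = []
    for col in columns:
        f_counts = Counter(r[col] for r in flagged)
        r_counts = Counter(r[col] for r in rest)
        for value, n in f_counts.items():
            f_share = n / len(flagged)
            r_share = r_counts[value] / len(rest) if rest else 0.0
            findings.append((col, value, f_share, r_share))
    findings.sort(key=lambda t: t[2] - t[3], reverse=True)
    return findings

# Synthetic population: 20 flagged publications plus 80 ordinary ones.
rows = [{"user": "user_x", "language": "fr", "user_agent": "agent_x"} for _ in range(20)]
rows += [
    {
        "user": f"user_{i}",
        "language": "fr" if i % 4 == 0 else "en",
        "user_agent": "agent_a" if i % 2 == 0 else "agent_b",
    }
    for i in range(80)
]

findings = profile_segment(rows, lambda r: r["user"] == "user_x", ["language", "user_agent"])
for col, value, f_share, r_share in findings:
    print(f"{col}={value}: {f_share:.0%} of flagged vs {r_share:.0%} of the rest")
```

Values that cover nearly all of the flagged rows but only a small share of the remaining population are the ones worth a human look, which is how a shared language and user agent would stand out.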
The customer’s data team was able to immediately review the content user ID 35037445 was publishing and manually flag them as a spammer in their internal systems:
However, the spammer continued to create many new accounts like this, each with slightly different email or spoofed IP addresses, while sending out the same shoddy content. This became evident when the trend in the number of publications created per user was still increasing:
The customer’s data team quickly organized several work streams to better protect their platform from spammers, including:
- More sophisticated account verification processes
- Advanced email and list validation algorithms
- Other proprietary innovations and processes
Through the implementation of data monitoring and new processes to analyze anomalous data, they were able to mitigate the risks posed by potential spammers to their network of users and publishers. While the platform continues to experience rapid growth and an overall increase in volume, the maximum number of publications created per user per day has begun to normalize.
The efforts of the customer’s data team and their implementation of Anomalo data quality monitoring to mitigate spammers have clearly paid off.
In addition to reducing the impact of spammers, the customer protects thousands of tables in their Snowflake data warehouse by implementing Anomalo to:
- Ensure their data arrives on time and is complete
- Detect missing or corrupted data in key tables
- Identify duplicate data or unexpected distribution changes
The Anomalo UI allows fast-moving and rapidly growing companies to leverage intelligent data monitoring to ensure the data they collect is of high quality and is flagged when unexpected or unusual events occur. Anomalo automates the querying of a data warehouse like Snowflake, summarizes the results, and sends notifications about unusual behavior through Slack or Microsoft Teams.
This is just one example of how our customer used Anomalo to identify bad actors who were sending spam that annoyed consumers and could have negatively impacted their business. What kind of issues could be found in your data?
Request a demo to find out.