Facebook’s spam filter blocked the most popular articles about its 50m user breach


When news broke yesterday that Facebook had suffered a breach affecting at least 50,000,000 users, Facebook users (understandably) began to widely share links to articles about the breach.

The articles were so widely and quickly shared that they triggered
Facebook’s spam filters, which blocked the most popular stories about
the breach, including an AP story and a Guardian story.

There’s no reason to think that Facebook intentionally suppressed
embarrassing news about its own business. Rather, this is a cautionary
tale about the consequences of content filtering on big platforms.

Facebook’s spam filter is concerned primarily with stopping spam, not
with allowing through storm-of-the-century breaking news headlines that
everyone wants to share. On a daily basis, Facebook gets millions of
spams and (statistically) zero stories so salient that every Facebook
user shares them at once. Any kind of sanity-check on a spam filter that
allowed through things that appeared to be breaking news would
represent a crack in Facebook’s spam defenses that would let through
much more spam than legitimate everywhere-at-once stories, because those
stories almost never occur, while spam happens every second of every
minute of every hour of every day.
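
To make that base-rate argument concrete, here is a rough sketch with made-up numbers. Every figure in it is an assumption chosen for illustration, not Facebook data: the daily spam volume, the fraction of spam that would look enough like breaking news to slip through a hypothetical exemption, and the rate of genuine everywhere-at-once stories. The function name is hypothetical too.

```python
def exemption_outcomes(spam_per_day, spam_mimic_rate, legit_stories_per_day):
    """Return (spam let through per day, legit share of all exempted items)."""
    # Spam that resembles breaking news closely enough to hit the exemption.
    spam_let_through = spam_per_day * spam_mimic_rate
    total_exempted = spam_let_through + legit_stories_per_day
    # Fraction of exempted items that are genuine breaking news.
    legit_share = legit_stories_per_day / total_exempted if total_exempted else 0.0
    return spam_let_through, legit_share


# All three inputs are illustrative assumptions, not measured values.
SPAM_PER_DAY = 5_000_000          # "millions of spams" a day
SPAM_MIMIC_RATE = 0.001           # assume 1 in 1,000 spams looks like breaking news
LEGIT_STORIES_PER_DAY = 1 / 365   # assume one everywhere-at-once story a year

spam_through, legit_share = exemption_outcomes(
    SPAM_PER_DAY, SPAM_MIMIC_RATE, LEGIT_STORIES_PER_DAY
)
print(f"spam let through per day: {spam_through:,.0f}")
print(f"legit share of exempted items: {legit_share:.7%}")
```

Under these assumptions the exemption waves through thousands of spams a day for the sake of a story that arrives about once a year, which is why a filter tuned for the common case has little incentive to carry one.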

And yet, storm-of-the-century stories are incredibly important (by
definition) and losing our ability to discuss them – or having that
ability compromised by having to wait hours for Facebook to discover,
diagnose and repair the problem – is a very high price to pay.

It’s a problem with the same underlying mechanics as the incident in which a man was sent an image of his mother’s grave decorated with dancing cartoon characters and party balloons
on the anniversary of her funeral. Facebook sends you these annual
reminders a year after you post an image that attracts a lot of “likes,”
and images that attract a lot of likes are far more likely to be happy
news than they are to be your mother’s tombstone. You only bury your
mother once, while you celebrate personal victories repeatedly.

So cartoon characters on your mother’s grave is a corner-case, an
outlier, just like a spam filter suppressing a story about a breach of
50,000,000 Facebook accounts. But they are incredibly important
outliers, outliers that the system should never, ever miss.

It may not ever be possible to design a system with two billion users
that doesn’t involve these kinds of outliers: a one-in-a-billion outlier
in a system with two billion users will happen twice a day, on average.
We don’t really know how to design a system that can address the
majority of cases and also every one-in-a-billion corner-case.
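
The arithmetic behind that "twice a day" figure is easy to check. The sketch below is a back-of-the-envelope illustration only: it assumes each user triggers exactly one relevant event per day (one share, one reminder, one post), which is a simplification and not a figure from Facebook. The function names and parameters are hypothetical.

```python
import math


def expected_outliers_per_day(users, events_per_user_per_day, one_in):
    """Expected daily count of an event that happens once per `one_in` tries."""
    return users * events_per_user_per_day / one_in


def chance_of_at_least_one(expected_count):
    """Poisson approximation: probability of one or more outliers in a day."""
    return 1 - math.exp(-expected_count)


USERS = 2_000_000_000
ONE_IN_A_BILLION = 1_000_000_000

# Assumption: one relevant event per user per day.
per_day = expected_outliers_per_day(USERS, 1, ONE_IN_A_BILLION)
print(f"expected outliers per day: {per_day}")
print(f"chance of at least one today: {chance_of_at_least_one(per_day):.1%}")
```

If each user generates ten such events a day instead of one, the same one-in-a-billion failure turns up about twenty times a day, so the "twice a day" estimate is, if anything, a floor.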

But the answer shouldn’t be to shrug our shoulders and give up. If it’s
impossible to run a system for two billion users without committing
grave, unforgivable sins on a daily basis, then we shouldn’t have
systems with two billion users.

Unfortunately, the rising chorus of calls for the platforms to filter their users is trapped in the idea that the platforms can fix their problems – not that the platforms are
the problems. Filtering for harassment will inevitably end up filtering
out many discussions of harassment itself, in which survivors of
harassment are telling their stories and getting support. Same goes for
filtering for copyright infringement, libel, “extremist content” and
other “bad speech” (including a lot of speech that I personally find
distasteful and never want to see in my own online sessions).

It’s totally true that filtering doesn’t scale up to billion-user
platforms – which isn’t to say that we should abandon our attempts to
have civil and civilized online discussions, but that the problem may
never be solved until we cut the platforms down to manageable scales.


It’s astounding to me how much the issues presented in this story are complete non-issues on functional websites that 1: Only show you content from people you follow and 2: That’s it. Even Tumblr manages this better than Facebook – I’ve never received spam on Tumblr, and I’ve never received an image of my mother’s gravestone decorated with cartoon animals, because none of the people I follow post spam and none of the people I follow create automated decorated pictures. These issues are not because Facebook has two billion users; it’s because Facebook has the audacity to believe they can build automatic systems that serve two billion users.

EDIT: Actually, correction: “can build automated systems at all.” You can’t automate this shit for a user base of five people either. It’s not that they have too many users to automate, it’s that they automate at all.
