Blekko has declared war on Google spam, and has grandiosely demonstrated this by banning 20 sites from its index.
Leaving aside the sticky issue of what "spam" is, who determines if a site is worthy of the designation "spam?" The crowd (through the collective wisdom of individuals that flag pages as "spam")? The assessment of the crowd is great when what you're looking for is the lowest common denominator, such as the unpersonalized results for a one- or two-word search. But what might be "spam" for one query might be a perfectly good match for another. And, of course, what might be considered spam by one user might be considered perfectly good information by another.
The spam flag methodology is at best a blunt instrument, and at worst editorial hubris masquerading as a mathematically-based methodology. This is particularly true of the ban list as it is currently implemented, where only a small number of big sites are being banned: it is not, it appears, a mechanism that is being applied evenly against all content in the index.
But of course what's most disturbing about Blekko's move is that it doesn't simply demote sites but excludes them from the index altogether. Those sites, along with the connections they help form between pages and entities, are gone. And looking just at eHow – as an example of a domain on Blekko's ban list – the information there is unique, even if it is not original (not to mention the inherent uniqueness of at least 150,000 videos). This information is lost to all further inquiry.
One might reasonably ask why Blekko didn't simply incorporate its spam designation into its algorithm as a dampening factor on rankings, rather than eradicating certain high-profile sites from its index altogether. Could it be that query-based content sites are simply too clever for any algorithm to effectively address, or might it rather be a case of a Google rival demonstrating that – unlike Google – it's "getting tough" on spam?
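To make the distinction between banning and dampening concrete, here is a minimal sketch. It is purely illustrative and says nothing about how Blekko or any other engine actually ranks pages: the Result fields, the scores, the example domains and the weight parameter are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    relevance: float   # hypothetical query relevance score in [0, 1]
    spam_score: float  # hypothetical crowd-derived spam score in [0, 1]


def rank_with_ban(results, banned_domains):
    """Drop every result from a banned domain, then sort by relevance."""
    kept = [r for r in results if urlparse(r.url).netloc not in banned_domains]
    return sorted(kept, key=lambda r: r.relevance, reverse=True)


def rank_with_dampening(results, weight=0.5):
    """Keep every result, but scale relevance down as the spam score rises."""
    def score(r):
        return r.relevance * (1 - weight * r.spam_score)
    return sorted(results, key=score, reverse=True)


# Made-up data: one heavily flagged page and one unflagged page.
results = [
    Result("http://flagged.example/how-to", relevance=0.9, spam_score=0.8),
    Result("http://other.example/guide", relevance=0.6, spam_score=0.0),
]

banned = rank_with_ban(results, {"flagged.example"})
dampened = rank_with_dampening(results)
```

Under the ban, the flagged page disappears from the results entirely. Under dampening it drops below the unflagged page but remains available to a searcher who wants it.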
Blekko's "spam ban" can certainly be seen as an enthusiastic response to Vivek Wadhwa's call for a "clean up":
The bottom line is that we’re fighting a losing battle for the web and need alternative ways of finding the information that we need. I hope that Blekko and a new breed of startups fill this void: that they do to Google what Google did to the web in the late 90’s—clean up the spam and clutter.
Blekko's method of addressing "spam and clutter" has not been, to say the least, very nuanced, even if it has very publicly demonstrated that it ain't no spam-lovin' Google. Adages about babies and bath water certainly come to mind. And if this ban is an "innovation" it is certainly worth quoting Michael Martinez on the subject from a recent post:
The problem with innovation is that it requires you to find 2,000 ways NOT to make a light bulb before the light actually goes on and stays on. Meanwhile, you leave this swath of toxic ideas in your wake that threaten to destroy the environment.
At the end of the day, I have no respect for a search engine that censors my results based on notions of quality, rather than relevancy. It ceases to be comprehensive, it smacks of elitist righteousness and – most of all – decisions about the validity of content are being made on my behalf by people I don't know.
So Blekko, can I please have my spam back?
Update: If Blekko intended to position itself as the spam-fighter extraordinaire of the search engine world, it certainly seems to have gained some traction. Here's a sampling of recent Twitter reactions to the ban.
While Blekko may be winning the hearts and minds of spam-weary searchers, the recounted experience of at least one tweeter lends support to my view that blanket bans may not ultimately be in the best interests of searchers (something Marshall Kirkpatrick has discussed very eloquently in a recent piece on ReadWriteWeb):
And @LeonBlade, I like your style. It's nice to be reminded that not everyone lives and breathes search engine lore.