Blekko, Can I Please Have My Spam Back?

by Aaron Bradley on February 2, 2011

in Search Engines

Blekko has declared war on Google spam, and has grandiosely demonstrated this by banning 20 sites from its index.

Leaving aside the sticky issue of what "spam" is, who determines whether a site is worthy of the designation "spam"? The crowd, through the collective wisdom of individuals who flag pages as spam? The assessment of the crowd is great when what you're looking for is the lowest common denominator, such as the unpersonalized results for a one- or two-word search. But what might be "spam" for one query might be a perfectly good match for another. And, of course, what one user considers spam another might consider perfectly good information.

The spam flag methodology is at best a blunt instrument, and at worst editorial hubris masquerading as a mathematically based methodology. This is particularly true of the ban list as it is currently implemented, where only a small number of big sites are being banned: it does not appear to be a mechanism applied evenly against all content in the index.

But of course what's most disturbing about Blekko's move is that it doesn't simply demote sites but excludes them from the index altogether. Those sites, along with the connections they help form between pages and entities, are gone. And looking just at eHow – as an example of a domain on Blekko's ban list – the information there is unique, even if it is not original (not to mention the inherent uniqueness of at least 150,000 videos). This information is lost to all further inquiry.

One might reasonably ask why Blekko didn't simply incorporate its spam designation into its algorithm as a dampening factor on rankings, rather than eradicating certain high-profile sites from its index altogether. Could it be that query-based content sites are simply too clever for any algorithm to address effectively, or is it rather a case of a Google rival demonstrating that – unlike Google – it's "getting tough" on spam?

Blekko's "spam ban" can certainly be seen as an enthusiastic response to Vivek Wadhwa's call for a "clean up":

The bottom line is that we’re fighting a losing battle for the web and need alternative ways of finding the information that we need. I hope that Blekko and a new breed of startups fill this void: that they do to Google what Google did to the web in the late 90’s—clean up the spam and clutter.

Blekko's method of addressing "spam and clutter" has not been, to say the least, very nuanced, even if it has very publicly demonstrated that it ain't no spam-lovin' Google. Adages about babies and bathwater certainly come to mind. And if this ban is an "innovation," it is certainly worth quoting Michael Martinez on the subject from a recent post:

The problem with innovation is that it requires you to find 2,000 ways NOT to make a light bulb before the light actually goes on and stays on. Meanwhile, you leave this swath of toxic ideas in your wake that threaten to destroy the environment.

At the end of the day, I have no respect for a search engine that censors my results based on notions of quality rather than relevancy. It ceases to be comprehensive, it smacks of elitist righteousness and – most of all – decisions about the validity of content are being made on my behalf by people I don't know.

So Blekko, can I please have my spam back?

Update: If Blekko intended to position itself as the spam-fighter extraordinaire of the search engine world, it certainly seems to have gained some traction. Here's a sampling of recent Twitter reactions to the ban.

Twitter Responses to Blekko's Ban on 20 Content Farm Sites

While Blekko may be winning the hearts and minds of spam-weary searchers, the recounted experience of at least one tweeter lends support to my view that blanket bans may not ultimately be in the best interests of searchers (something Marshall Kirkpatrick has discussed very eloquently in a recent piece on ReadWriteWeb):

MyArtisticHome on Blekko Banning eHow

And @LeonBlade, I like your style.  It's nice to be reminded that not everyone lives and breathes search engine lore.

LeonBlade Tweets that Censoring Searches is Stupid

10 comments

1 Peter Bird February 4, 2011 at 12:08 am

I don’t know about spam, but for the search results I follow, the results in Blekko are just plain useless and crap compared to Google and Bing. My own sites rank well in all 3, but it’s the other sites that show up in Blekko that are not close to being useful.

2 Marcos February 4, 2011 at 3:47 am

Actually, taking a hard stance on content farms by banning them altogether is a great way to do it. This will make any content farm think twice about what they are producing and ultimately will lead to higher quality results.

So some “good” content is lost to blekko users from banning those 20 sites, but think about how much collateral damage is caused by Google to genuinely unique sites whenever Google plays around with their algo.

3 Gary Stock February 4, 2011 at 8:29 am

Self-righteous exhortations against “editorial hubris” are all well and good… but that kinda misses the point of blocking spam. Any editor must exert control over content.

People have virtually no idea what’s on the web without asking a search engine. If a search engine observes a glut of unacceptable content — according to its own definition of “unacceptable,” which is subject to judgment over time in the marketplace — then it should be free to exclude that content. I would argue even that they have a business obligation to exclude that content.

I’m glad Blekko did something slightly radical. It’s easy to stand outside and criticize their decision — but when you’ve spent years buried in algorithms and business rules, you realize that direct human intervention is sometimes the only solution.

4 Searchengineman February 11, 2011 at 8:08 am

I think the ball is now in Mahalo’s (Demand Media’s) court – Jason Calacanis must clean up the content farm model and concentrate on the user’s experience. If a visitor is having a substandard experience and feels burned by the results page, I think banning is only a temporary measure. Why not implement a warning, “Content Farm Site – Results may be Auto Generated!”, or use a code to differentiate sites that are determined to be mass aggregators? This would protect the search engines and pretty much warn users what to expect.

Searchengineman

5 Spyros Papaspyropoulos February 14, 2011 at 1:55 pm

Did you check this out?

It looks like Google is doing the same thing, but it is giving the power to the users to choose what sites they want to block using a Google Chrome extension. This means that Google is really thinking of blocking sites. Maybe this could be the 1st step? Could this be implemented into each personalized Google account, affecting personal SERPs?

Cheers

6 Aaron Bradley February 14, 2011 at 2:15 pm

Yes, saw that announced today Spyros – and I’ve been going down some of the same conjectural lines as you. Data from the extension could indeed be used as a dampening factor in rankings for either personalized or non-personalized SERPs. I certainly hope, as per the gist of my post, that if this data is used it is not employed to blacklist sites from Google’s results. Indeed, this extension puts the power of site bans just in the place it should reside, in my opinion – with users.

7 Gail Gardner March 4, 2011 at 2:37 pm

I call b.s. on this “clean up the Internet” nonsense. I do NOT want my choices made for me by some paternalistic entity that believes they know best for me. This all started with Google’s CEO’s Internet Cesspool comment and is turning into outright censorship – which already exists. (I have many links about that saved.)

Speaking of saving links, I highly recommend that Internet users save a copy of the URLs of any sites they use regularly or might want to find again because the time is fast approaching when you won’t find them in any keyword searches. Right now you can sometimes only find them if you know the url and the content already.

I use Tomboy Notes for that purpose but it is slowing down so maybe it wasn’t intended for someone who keeps 2,586 notes and growing every day.

8 Gail Gardner March 4, 2011 at 2:47 pm

Another major fact: If you want to know how well crowd-sourcing works for spam you have only to look at the mess Akismet turned into – banning many of my most regular and active commentators (and me). That prompted me to do a survey of bloggers of what they considered spam.

The results and the comments were quite surprising. Some bloggers consider a comment from anyone they don’t already know spam, others report any comment they don’t happen to like as spam, some will flag as spam any comment that links to a site that sells anything no matter how legit, and a few said they only approve comments they like. See Spam or Not Spam poll and comments for more details.

Akismet probably hates me now because the GrowMap anti-spambot plugin Andy Bailey at CommentLuv wrote for me is spreading rapidly across the Blogosphere replacing Akismet in major blogs. There is a tab on my blog with details and it can be found in the official WordPress plugin repository by searching for GrowMap.

9 Stephen June 4, 2012 at 6:49 pm

I opened Fuze today and it said there was an important update to install and that the program would not work without it. I allowed the update and later, when I went to open the Chrome browser, suddenly my default home page and search engine were Blekko. Spam fighter?!? How about spammer of the worst kind. I thought the days of switching your home page or search engine without asking your permission were past, but it appears the past has come back to haunt us with a Russian-backed search engine that installs crap on your machine without your knowledge. Blekko is Scammo.

10 Ben Oren October 14, 2013 at 10:30 am

By manually removing websites from its index, Blekko shows its users that it does not have spam detection in its algorithm sophisticated enough to handle this kind of task efficiently.
