Towards a credible alternative to trackbacks and pingbacks

Opinion — I have been running this blog for a bit over a month now, and in my initial post I mentioned two aspects where WordPress, like other blogware, could be better. Since then, I’ve had a chance to explore, among other blog-related technologies, the use of XML feeds, the difference between Technorati and del.icio.us when it comes to spreading a meme, and, most importantly, trackbacks and pingbacks. Trackbacks and pingbacks are a great idea, but neither is a credible protocol that will last in the long run. Here are several criticisms of the two protocols that I would like to share.

If you are new to blogging, the purpose of trackbacks and pingbacks is to let a web site automatically link back to you when you mention it in a story. As in: if I write a post on my blog and cite a post on your blog, your blog automatically mentions and links back to my post.

My first concern about the two protocols is, in fact, mainly related to trackbacks. Basically, you need to enter the URLs you want to send trackback pings to by hand, unless of course both ends support trackback auto-discovery. When they do, pingbacks and trackbacks mostly differ in the protocol they use (a plain HTTP POST on one side, XML-RPC on the other). A proper, usable protocol should require no user input whatsoever, except for the manual insertion of an anchor.
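To make that protocol difference concrete, here is a minimal sketch of what each kind of ping carries over the wire. A trackback is a plain form-encoded HTTP POST; a pingback is an XML-RPC call to `pingback.ping`. The URLs and blog names below are made up for illustration.

```python
from urllib.parse import urlencode
import xmlrpc.client

def build_trackback_payload(title, excerpt, url, blog_name):
    """Body of the form-encoded HTTP POST a trackback client sends."""
    return urlencode({"title": title, "excerpt": excerpt,
                      "url": url, "blog_name": blog_name})

def build_pingback_request(source_uri, target_uri):
    """Body of the XML-RPC pingback.ping call a pingback client sends."""
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")

trackback_body = build_trackback_payload(
    "My post", "A short excerpt", "http://example.org/my-post", "Example Blog")
pingback_body = build_pingback_request(
    "http://example.org/my-post", "http://example.com/your-post")
```

Note how little either payload actually contains: the receiving end learns nothing it couldn’t have learned by crawling the source page itself, which is part of why the manual plumbing feels so unnecessary.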

Then, there is link spam. A trackback or a pingback lets link spammers insert links to their customers’ web sites. As I mentioned in a previous column, the rel=nofollow attribute will not stop spam, but it will damage the web. Thus, end users have very limited choices here: either they have decent spam-protection tools or they turn trackbacks and pingbacks off. A proper protocol should include supervised and unsupervised machine learning features that auto-reject unwanted pings.
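As a toy illustration of the supervised side of such filtering, here is a crude token-frequency score trained on hand-labelled excerpts. The training data and the scoring rule are entirely made up for the sketch; a real filter would be far more sophisticated.

```python
from collections import Counter

def train(ham_excerpts, spam_excerpts):
    """Count tokens in hand-labelled legitimate and spammy ping excerpts."""
    ham = Counter(w for e in ham_excerpts for w in e.lower().split())
    spam = Counter(w for e in spam_excerpts for w in e.lower().split())
    return ham, spam

def spam_score(excerpt, ham, spam):
    """Crude score: fraction of tokens seen more often in spam than in ham."""
    tokens = excerpt.lower().split()
    if not tokens:
        return 0.0
    spammy = sum(1 for t in tokens if spam[t] > ham[t])
    return spammy / len(tokens)

ham, spam = train(
    ["interesting take on xml feeds", "great post about pingbacks"],
    ["cheap casino poker", "buy cheap pills casino"],
)
```

The point is not this particular scorer but that the protocol could require receivers to run one before a ping is ever published.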

Dead or irrelevant links also pose problems. Trackbacks and pingbacks should wither when the links die or are no longer relevant. Quite frankly, I’m amazed spammers haven’t already used this to promote their customers’ sites. You could imagine someone hiring a throng of Chinese web surfers to trackback dozens of blogs with more or less relevant comments, then redirect all those comments to doorway pages once the trackback pings have been moderated. If anything, a proper protocol should auto-check and auto-wither links every once in a while.
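A sketch of what such an auto-wither pass could look like: periodically re-fetch each pinging page and drop the trackback if the link back is gone. Here `fetch` is a stand-in for an HTTP client, and the trackback record layout is my own assumption.

```python
import re

def link_still_present(html, target_url):
    """True if the page's HTML still contains an anchor to target_url."""
    hrefs = re.findall(r'href=["\']([^"\']+)["\']', html, re.IGNORECASE)
    return target_url in hrefs

def wither(trackbacks, fetch):
    """Keep only the trackbacks whose source page still links back to us."""
    return [tb for tb in trackbacks
            if link_still_present(fetch(tb["source"]), tb["target"])]
```

Run on a schedule, this would also catch the doorway-page trick above: the moment the comment is redirected and the link disappears, the trackback withers away.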

Worse still, cloaking could make this problem even nastier. You could imagine a spammer pretending to track your story with a more or less relevant post, then memorizing, say, the first 10 IPs that click through from your blog. Obviously, you’re likely to be among them. Any of those 10 IPs would get a normal-looking web site, while everyone else would get, say, a porn site. A proper protocol should be cloak-proof.
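Detecting cloaking reliably is hard, but one crude counter-measure a protocol could mandate is comparing the page as fetched from two different identities, say the blog author’s own IP and a neutral verification host. The similarity threshold below is an arbitrary assumption for the sketch.

```python
from difflib import SequenceMatcher

def looks_cloaked(page_as_reader, page_as_checker, threshold=0.7):
    """True if two fetches of the same URL differ too much to plausibly
    be the same page, which suggests the server is cloaking."""
    similarity = SequenceMatcher(None, page_as_reader, page_as_checker).ratio()
    return similarity < threshold
```

A determined spammer could still cloak against the checker’s IP, which is exactly why this needs to live in the protocol, with rotating vantage points, rather than in each blog.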

Protocol-wise, there is the stateful nature of the trackback and pingback protocols, which makes for unfriendly, not to say buggy, interfaces when it comes to publishing posts. Several times I’ve seen WordPress take minutes to publish a post, only to discover it did not manage to send every ping it was meant to send. A proper protocol should be stateless and should catch such faults with no user intervention.
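One way to catch those faults without blocking the publish step would be to queue outgoing pings and flush the queue in the background, keeping whatever failed for the next run. This is a sketch of my own, with `send` as a stand-in for the actual network call.

```python
def flush_pings(queue, send):
    """Try every queued ping once; return the pings that still failed.

    send() is a stand-in for the real network call and is expected
    to raise OSError when the remote end is unreachable.
    """
    still_pending = []
    for ping in queue:
        try:
            send(ping)
        except OSError:
            still_pending.append(ping)
    return still_pending
```

With this shape, publishing returns immediately and the retry loop, not the user, is responsible for eventually delivering every ping.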

Moreover, the purely active nature of the trackback and pingback auto-discovery process is a problem in itself. If you publish a post, link to a site, and six months later that site implements pingbacks, a ping should occur at that time. Likewise, if you publish a post and your blog fails to properly ping the site you mention, but records that it pinged it anyway, there is no easy way to correct this. A proper protocol should allow passive auto-discovery in addition to active auto-discovery.
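For reference, this is roughly how pingback auto-discovery already works on the active side: check the X-Pingback response header, then fall back to a link element, as the pingback specification describes. A passive variant would simply run the same lookup later and from the other end. The HTML handling below is deliberately simplified and assumes the attributes appear in their usual forms.

```python
import re

def discover_pingback(headers, html):
    """Find a pingback endpoint: X-Pingback header first, then a
    <link rel="pingback" href="..."> element in the page's HTML."""
    if "X-Pingback" in headers:
        return headers["X-Pingback"]
    for tag in re.findall(r"<link\b[^>]*>", html, re.IGNORECASE):
        if re.search(r'rel=["\']pingback["\']', tag, re.IGNORECASE):
            m = re.search(r'href=["\']([^"\']+)["\']', tag)
            if m:
                return m.group(1)
    return None
```

Nothing about this lookup requires it to happen at publish time; re-running it months later is exactly the passive mode the protocols lack.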

Lastly, a proper protocol should process the related referrals and sort them by relevance. Referral spam aside, the more traffic a link brings to your web site, the greater the odds it is a relevant link. Assuming you only display the top 10 trackbacks, for instance, you wouldn’t care at all if a casino tracked your site. And you would eventually spot unauthoritative tracks, because they would bring you no traffic. Thus, a proper protocol should take a link’s authoritativeness into account at the very least.
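That ranking can be sketched in a few lines, assuming the blog keeps a referrer log of where incoming visits came from (the log format and URLs here are my own invention):

```python
from collections import Counter

def top_trackbacks(trackback_urls, referrer_log, n=10):
    """Rank trackback source pages by how many visits they actually sent."""
    hits = Counter(referrer_log)
    return sorted(trackback_urls, key=lambda url: hits[url], reverse=True)[:n]
```

A casino that tracks you but sends no readers simply never makes the cut, which is the whole appeal of traffic-weighted display.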