Twitter Domain Duplicate Content Issues

Posted on March 06, 2009

Found this issue while vanity searching for ‘hwork’, my Bowdoin-issued internet alias. It turns out that Twitter serves each user's page from at least three different hostnames, all crawlable by Google: twitter.com/hwork, http://m.twitter.com/hwork, and explore.twitter.com/hwork. The first is obviously the main site (all requests to www.twitter.com/* are redirected to twitter.com). The second is the mobile site (which you are redirected to if you connect with a mobile user-agent). The third I don’t recognize, but it serves literally the same content as the main domain (with perhaps some differences based on your session data).

Anyway, the fix for Twitter is really simple: disallow all URLs via robots.txt on the m.twitter.com and explore.twitter.com hosts. http://m.twitter.com/robots.txt should look like:

User-agent: *
Disallow: /
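
For anyone who wants to sanity-check a rule like this before deploying it, here is a minimal sketch using Python's standard-library robots.txt parser. The helper name is_crawlable is made up for illustration, and the rules are parsed from an in-memory list rather than fetched from Twitter, so this only shows how a well-behaved crawler should read the two lines above, not what Google actually does.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt proposed in the post, as it would be served
# from the m.twitter.com and explore.twitter.com hosts.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_crawlable(robots_lines, url, user_agent="*"):
    """Hypothetical helper: True if robots_lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory rules; no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://m.twitter.com/hwork",
                "http://explore.twitter.com/hwork"):
        print(url, is_crawlable(BLOCK_ALL, url))
```

The parser only matches on the path, so the rule applies to whichever host serves the file. That is why it has to be placed on the mobile and explore hosts and not on twitter.com itself.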

Blocking those hosts should remove a ton of duplicate Twitter URLs from Google's index. According to the SEO for Firefox plugin, Ev’s regular Twitter page has a PageRank of 7, while the mobile version of his page only has a PageRank of 4. Interestingly, Ev’s explore-domain page also has a PageRank of 7.

Update: Found another duplicate-content domain: http://api.twitter.com/hwork.

Comments

  1. Andy Beard Wed, 25 Mar 2009 13:18:47 UTC

    Robots.txt is a really bad tool for SEO – they would be much better off using meta noindex follow.

    Also please add a simple way for me to just report a spam link in Crunchbase, especially on incoming links in the sidebar.

  2. Henry Work Wed, 25 Mar 2009 16:02:09 UTC

    Hey Andy,

    Thanks for the heads-up on robots.txt. It’s what Matt Cutts recommends (http://www.youtube.com/watch?v=nM2VDkXPt0I) and it seems to work fine with the webmaster console. Why do you recommend using noindex?

    Good idea on the spam link. Those URLs are reviewed before they go live, so they shouldn’t be spam once they get there anyway.

    -Henry