Found this issue when vanity searching for 'hwork', the internet alias Bowdoin handed out to me. Turns out that Twitter maintains (at least) three different hostnames for each user, all crawlable by Google: twitter.com/hwork, http://m.twitter.com/hwork, and explore.twitter.com/hwork. The first is obviously the main site (they redirect all requests from www.twitter.com/* to twitter.com), and the second is the mobile site (which they redirect you to if you connect with a mobile user-agent). The third I don't recognize, but it serves literally the exact same content as the normal domain (maybe with some differences based on your session data).
Anyways, the fix for Twitter is really simple: just disallow all URLs via robots.txt on the m.twitter.com and explore.twitter.com domains. http://m.twitter.com/robots.txt (and its counterpart on explore.twitter.com) only needs two lines: "User-agent: *" followed by "Disallow: /".
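Here's a quick sketch of that disallow-all file, plus a sanity check that it really blocks crawlers. It uses Python's standard urllib.robotparser. The is_crawlable helper and the "Googlebot" user-agent string are just my own illustration, not anything Twitter actually runs.

```python
from urllib.robotparser import RobotFileParser

# The disallow-all rules described above, as they would appear in
# http://m.twitter.com/robots.txt and http://explore.twitter.com/robots.txt.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; the user-agent default is an assumption.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The duplicate profile URLs mentioned in the post.
    for url in ("http://m.twitter.com/hwork", "http://explore.twitter.com/hwork"):
        print(url, "crawlable:", is_crawlable(DISALLOW_ALL, url))
```

Both URLs should come back as not crawlable. The main twitter.com robots.txt stays untouched, so the regular profile pages remain indexable.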
This should remove a ton of duplicate Twitter URLs from Google. According to the SEO for Firefox plugin, Ev's regular Twitter page has a PageRank of 7, while the mobile version of his page only has a PageRank of 4. Interestingly, Ev's explore domain page also has a PageRank of 7.
Update: Found another duplicate-content domain: http://api.twitter.com/hwork.