Published by nick on 13 Feb 2009 at 11:18 am
Google Canonical Href - with Mediawiki
It’s time to unwind the giant mess of 301s, meta tags, and robots.txt hacks that we have in place, all aimed at eliminating "duplicate" content for search engines. We now have a simple way to tell search engines what the canonical representation of a URL is. That’s the promise of the new canonical tag, and I think it will work. Here’s the syntax:
<link rel="canonical" href="/the/trusted/url/of/the/page">
More info here:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
And note that it is also supported by Yahoo! and MSN.
Why am I so excited about it? Because I implemented it at Wikia, which was a Google "trusted user" for the feature (note that Google mentions starwars.wikia.com as one of its examples).
MediaWiki has a problem with duplicate content. First, it has "soft" redirects, where two articles with different URLs can point to the same content (which Google labels as "duplicates"). I had previously written extensions for MediaWiki that turn these into "hard" redirects (by issuing a Location: header with a 301 redirect). This showed a positive uplift for SEO, but it always felt like a hack. The canonical tag is a far more elegant solution, and it improves performance by saving the extra request that each 301 costs.
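To make the redirect case concrete, here is a minimal sketch in Python (not the extension's PHP, and not MediaWiki's API). The `REDIRECTS` table, the titles in it, and the `/wiki/` prefix are assumptions made up for illustration. The point is only that a page reached through a redirect title advertises the target's URL as canonical instead of answering with a 301.

```python
from html import escape

# Hypothetical redirect table: redirect title -> target title.
REDIRECTS = {"Luke": "Luke_Skywalker", "Skywalker,_Luke": "Luke_Skywalker"}


def canonical_tag_for(title: str, redirects: dict = REDIRECTS) -> str:
    """Return the <link rel="canonical"> tag for a requested title."""
    # Single lookup only: this sketch does not follow chains of redirects.
    target = redirects.get(title, title)
    # Assumes articles are served under /wiki/.
    href = "/wiki/" + target
    return '<link rel="canonical" href="%s">' % escape(href, quote=True)
```

Both the redirect title and the real title produce the same tag, so the search engine sees one canonical URL for the shared content.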
Second, there are many entry points into an article in MediaWiki:
/wiki/Article_Name
/wiki/index.php/Article_Name
/wiki/index.php?title=Article+Name
/wiki/Article_Name?action=view
All of the above URLs will produce the *exact* same content in MediaWiki, but search engines will treat them as different URLs, which splits PageRank and may introduce the infamous duplication penalty.
Both of these problems can be easily solved with the new canonical tag, and it’s quite elegant.
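As a rough illustration of the entry-point case, the sketch below maps each of the four URL forms listed above to one canonical path and renders the tag. It is written in Python for brevity and is not the extension's code. The two prefixes are assumptions that mirror the example URLs, and the function names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

WIKI_PREFIX = "/wiki/"             # assumed article path
SCRIPT_PREFIX = "/wiki/index.php"  # assumed script path


def canonical_path(url: str) -> str:
    """Map any of the four entry points to /wiki/Article_Name."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        # /wiki/index.php?title=Article+Name (parse_qs decodes '+' to a space)
        title = query["title"][0]
    elif parts.path.startswith(SCRIPT_PREFIX + "/"):
        # /wiki/index.php/Article_Name
        title = parts.path[len(SCRIPT_PREFIX) + 1:]
    elif parts.path.startswith(WIKI_PREFIX):
        # /wiki/Article_Name, with or without ?action=view
        title = parts.path[len(WIKI_PREFIX):]
    else:
        raise ValueError("not an article URL: %r" % url)
    # Titles use underscores in place of spaces.
    return WIKI_PREFIX + title.replace(" ", "_")


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> tag for the given request URL."""
    return '<link rel="canonical" href="%s">' % escape(canonical_path(url), quote=True)
```

Whichever form the request arrives in, the page head carries the same `href`, which is what lets search engines fold the variants together.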
I’ve written a new MediaWiki extension for supporting the Google canonical href tag at Wikia. It’s open source, and available at Wikia’s SVN repo for all to use. I will be contributing it to the core MediaWiki software as an extension soon. Update: Now available in the Wikimedia SVN repo.
This is a big help outside of MediaWiki as well. Take "printable" pages as an example, or even URLs with extra parameters in the query string: the canonical tag can funnel all of the PageRank into one version of the page.
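For the query-string case, one possible approach is to drop parameters that do not change the content before emitting the canonical URL. The sketch below is in Python, and the `IGNORED_PARAMS` set is an assumption chosen for illustration; a real site would have to decide for itself which parameters are safe to ignore.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that only change presentation or tracking.
IGNORED_PARAMS = {"printable", "utm_source", "utm_medium", "utm_campaign"}


def strip_ignored_params(url: str) -> str:
    """Return the URL without ignored query parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # Rebuild the URL; an empty query or fragment is omitted by urlunsplit.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

The printable view and the normal view then share one canonical URL, while parameters that do select different content (a page number, say) are kept.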
Kudos to Google (esp. Matt Cutts), Yahoo!, and MSN on coming together to provide a clean and elegant solution to help fight the duplicate content problem.
UPDATE: I believe this is part of MediaWiki core now, so the extension shouldn’t be necessary.
Rena on 23 Aug 2009 at 9:34 am #
Hey Nick! I’m very interested in using this extension, but I can’t find it on the listing of MediaWiki’s approved extensions. Is there a problem with the extension? Thanks!
nick on 23 Aug 2009 at 4:45 pm #
It is available in the Wikimedia SVN repo:
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/CanonicalHref/CanonicalHref.php
Although I believe it’s going to be in the MediaWiki core soon, if it’s not already.
-Nick
Jools on 18 Oct 2009 at 11:29 am #
Hi,
This does not work with MediaWiki 1.16svn. I played around and this works (and seems a more compatible way to add a link to the template):
[code]
// Register the handler that runs just before the page is rendered.
$wgHooks['BeforePageDisplay'][] = 'canonicalHref';
$wgExtensionCredits['specialpage'][] = array(
    'path' => __FILE__,
    'name' => 'Canonical Href',
    'author' => 'Nick Sullivan nick at wikia-inc.com',
    'description' => 'This extension prints a link rel="canonical" tag with a canonical representation of the url, which is used by Google, MSN, and Yahoo! to funnel PageRank'
);
// Add a link rel="canonical" element pointing at the title's full URL.
function canonicalHref( &$out, &$skin ) {
    $out->addLink(
        array(
            'rel' => 'canonical',
            'href' => $skin->mTitle->getFullURL()
        )
    );
    return true; // let other BeforePageDisplay handlers run
}
[/code]
Jools on 18 Oct 2009 at 12:50 pm #
I realise this blog is perhaps not the best place to include source. Anyway, I added a bug report to Bugzilla so I could attach a proper diff etc.
Please see
https://bugzilla.wikimedia.org/show_bug.cgi?id=21173
Jools on 20 Oct 2009 at 6:50 am #
Some discussion has taken place on the bugzilla link above. Seems the support in core only handles canonical urls for redirects, and not for other things, which means your examples above
/wiki/Article_Name
/wiki/index.php/Article_Name
/wiki/index.php?title=Article+Name
/wiki/Article_Name?action=view
still don’t contain a canonical URL with 1.16. Please add your thoughts and comments to the thread.