Archive for the 'wikia' Category

Published by nick on 13 Feb 2009

Google Canonical Href - with Mediawiki

It’s time to unwind the giant mess of 301s, meta tags, and robots.txt hacks that we have in place, all aimed at eliminating "duplicate" content for search engines. We now have a simple way to tell search engines what the canonical representation of a URL is. That’s the promise of the new canonical tag, and I think it will work. Here’s the syntax:
<link rel="canonical" href="/the/trusted/url/of/the/page">

More info here:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

And note that it is also supported by Yahoo! and MSN.

Why am I so excited about it? Because I implemented it at Wikia, which was Google’s "trusted user" (note that Google mentions starwars.wikia.com as one of their examples).

Mediawiki has a problem with duplicate content. First, it has "soft" redirects, where two articles with different urls can point to the same content (which Google labels as "duplicates"). I had previously written extensions for Mediawiki that turn these into "hard" redirects (by issuing a Location: header with a 301 redirect). This showed a positive uplift for SEO, but it always felt like a hack. The canonical tag is a far more elegant solution, and improves performance by reducing 301’s.

Second, there are many entry points into an article in Mediawiki:

/wiki/Article_Name
/wiki/index.php/Article_Name
/wiki/index.php?title=Article+Name
/wiki/Article_Name?action=view

All of the above urls will produce the *exact* same content in Mediawiki, but search engines will treat them as different urls, which splits page rank and may introduce the infamous duplication penalty.

Both of these problems can be easily solved with the new canonical tag, and it’s quite elegant.
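A minimal Python sketch of the canonicalization idea follows. It is not the extension's code, which lives inside Mediawiki itself. It assumes only the four URL shapes listed above, uses a hypothetical `REDIRECTS` table in place of Mediawiki's redirect pages, and maps every variant, including a soft redirect, to a single `/wiki/Title` URL for the `<link rel="canonical">` tag.

```python
from html import escape
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Hypothetical stand-in for Mediawiki's redirect pages ("soft" redirects):
# maps the title of a redirect page to the title of its target article.
REDIRECTS = {"Luke": "Luke_Skywalker"}

INDEX_PHP = "/wiki/index.php"


def title_from_url(url: str) -> str:
    """Pull the article title out of any of the common Mediawiki entry points."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs decodes '+' to a space
    if "title" in query:
        # /wiki/index.php?title=Article+Name
        title = query["title"][0]
    elif parts.path.startswith(INDEX_PHP + "/"):
        # /wiki/index.php/Article_Name
        title = unquote(parts.path[len(INDEX_PHP) + 1:])
    elif parts.path.startswith("/wiki/") and parts.path != INDEX_PHP:
        # /wiki/Article_Name, with or without ?action=view
        title = unquote(parts.path[len("/wiki/"):])
    else:
        raise ValueError(f"not an article URL: {url!r}")
    # Mediawiki treats spaces and underscores in titles as the same character.
    return title.replace(" ", "_")


def canonical_link(url: str) -> str:
    """Return the canonical <link> tag that every variant of a page should emit."""
    title = title_from_url(url)
    title = REDIRECTS.get(title, title)  # a soft redirect points at its target
    href = "/wiki/" + quote(title, safe=":/")
    return f'<link rel="canonical" href="{escape(href)}">'
```

For example, `canonical_link("/wiki/index.php?title=Article+Name")` and `canonical_link("/wiki/Article_Name?action=view")` both return `<link rel="canonical" href="/wiki/Article_Name">`, and `canonical_link("/wiki/Luke")` points at `/wiki/Luke_Skywalker`. A real wiki would look redirects up in its own database rather than in a dict.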

I’ve written a new Mediawiki Extension for supporting the Google canonical href tag at Wikia. It’s open source, and available at Wikia’s SVN repo for all to use. I will be contributing it to the core Mediawiki software as an extension soon. Update: It is now available in the Wikimedia SVN repo.

This is a big help outside of Mediawiki as well. Take "printable" pages as an example, or even URLs with extra parameters in the query string: the canonical tag can funnel all of the page rank into one version of the page.
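Outside Mediawiki, picking the canonical URL usually comes down to deciding which query parameters matter. The Python sketch below assumes a hypothetical `IGNORED_PARAMS` set (a `printable` flag plus some tracking parameters) and builds the canonical URL by dropping those and sorting the rest, so that parameter order does not create yet another duplicate.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameters that change presentation or tracking,
# not the content itself, so they are left out of the canonical URL.
IGNORED_PARAMS = {"printable", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Drop ignored parameters and sort the remaining ones into a stable order."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    # The fragment is dropped too: browsers never send it to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Sorting is a judgment call: it is only safe when the application ignores parameter order, which is true of most sites but worth checking first.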

Kudos to Google (esp. Matt Cutts), Yahoo!, and MSN for coming together to provide a clean and elegant solution to help fight the duplicate content problem.

Published by nick on 11 Feb 2009

I only block for memcached


Want to know the magic secret to building scalable apps? Don’t have blocking calls. Think back to all the performance problems you’ve ever had. Chances are your app was waiting on one of the following:

  • Database
  • HTTP (a remote web service)
  • File system (either local or remote [NFS])
  • Some other blocking service

Remove all these, and your app is fast! The one exception that has always treated me well is memcached. memcached in and of itself doesn’t block, and if you can write an app that only blocks on memcached, you’ve written a very well scaled app.

You may think I’m exaggerating. And maybe I am, to make a point. But I just built an app that scales using memcached, and it powers all of Wikia’s ad traffic on two servers, handling approximately 500 transactions a second and storing 7500 pieces of data per second.

Actually, one server can handle the load, I just use two for redundancy. ;-)

Scale with memcached and skip all the other blocking calls! Tip: Ajax isn’t blocking.
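One way to read "only block on memcached" is sketched below in Python: the request handler makes a single cache lookup and never falls through to a database, while a separate job, off the request path, keeps the cache filled. This is an assumption about the pattern, not the code behind Wikia's ad server. `FakeMemcache` is an in-process stand-in for a real memcached client, and the key names, TTL, and fallback ad are hypothetical.

```python
import time
from typing import Any, Dict, Optional, Tuple


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcached client (get/set with TTL).

    It only exists so the sketch runs on its own; a real app would talk to
    memcached servers through a client library."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:  # expired entries behave like misses
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: Any, expire: int = 0) -> None:
        # As in memcached, an expire time of 0 means "no expiry".
        expires_at = time.monotonic() + expire if expire else float("inf")
        self._data[key] = (value, expires_at)


FALLBACK_AD = "house-ad"  # hypothetical ad served when the cache has nothing


def choose_ad(cache: FakeMemcache, slot: str) -> str:
    """Request path: the cache lookup is the only call that can block.

    On a miss we serve a fallback rather than querying a database inline."""
    ad = cache.get(f"ad:{slot}")
    return ad if ad is not None else FALLBACK_AD


def refresh_ads(cache: FakeMemcache, ads_by_slot: Dict[str, str]) -> None:
    """Off the request path (for example a cron job): load fresh choices into the cache."""
    for slot, ad in ads_by_slot.items():
        cache.set(f"ad:{slot}", ad, expire=300)
```

The design choice is that a cache miss degrades to a fallback answer instead of a slow one; the expensive work still happens, just never while a user is waiting.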

I liked this idea so much that I had T-shirts made that say "I only block for memcached". I’ve sold a couple to friends, and I will be giving one to Brad Fitzpatrick, the creator of memcached, as a Thank You from all of the users of memcached. Thanks Brad!

Published by nick on 23 Sep 2008

Good bye OpenX. Hello Google Ad Manager.

Websites need ads. It’s one of the things that make the internet go ’round. That and porn.

If a startup wants a free solution for serving ads, there has really only been one choice for many years: OpenX, formerly known as phpAdsNew. OpenX has been at Wikia for quite some time. After hitting some brick walls with scalability, having downtime and slowness issues, and getting frustrated with basic functionality that wouldn’t work without taking down the server, I decided it was time to try something new.

I looked into Google Ad Manager over the past few days. It seems like it can do the job, and last night I wrote all the code. Today I switched all of the wikia.com websites from OpenX for serving Spotlight Ads to Google Ad Manager.

Here are the compelling reasons I found for switching.

  • OpenX is crap: it is possible to write high-scale web applications in PHP/MySQL. I’ve done it, multiple times. OpenX has not. Sorry for being a bit arrogant here, but I will happily engage an OpenX architect and question numerous design decisions. As an example: logging impressions to a relational database in real time is a horrible idea. Horrible. It will never scale. Telling people that the right way to solve this problem is by logging on the app servers? Even worse.
  • Google’s infrastructure: even if OpenX weren’t horrible, I still wouldn’t want to have to worry about buying servers, system administration time, and bandwidth for my ad infrastructure. I put more faith in Google’s and Yahoo!’s infrastructure than in anything a startup can build.
  • It’s easier to use: I found the interface and code setup far more intuitive than OpenX’s. So have the 4 other people that I’ve been working with to load ads. They love how simple Google Ad Manager is. That being said, there are a couple of less-than-intuitive things in Google Ad Manager, so it wasn’t completely painless. Maybe Apple needs to come out with iAdManager? :)
  • It’s free: Ok. Did you guys hear that? Free. Free hosting of the graphics. Free server infrastructure. Estimates are that this will save Wikia.com $5000 a month in bandwidth and servers.

Is this the death of OpenX? No. There are still some things that Google Ad Manager can’t do. There are also a bunch of technical weirdos who think Google has too much power, so they will continue to use OpenX out of fear.

However, Google just flexed their muscle, and they pulled off a great first product. Good work, Google.

And if someone knows how to short OpenX stock, let me know. ;-)
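The impression-logging criticism in the first bullet is easier to picture next to one common alternative, sketched here in Python under stated assumptions; it is not how OpenX or Google Ad Manager work. Each ad view bumps a shared counter in a memcached-like store, and a background job folds the counters into the database in one transaction. `FakeCounterCache` is a hypothetical stand-in mimicking memcached's `add`/`incr`/`decr`/`get` semantics, the built-in `sqlite3` module stands in for the relational database, and the key and table names are made up. The upsert syntax needs SQLite 3.24 or later.

```python
import sqlite3
from typing import Dict, Iterable, Optional


class FakeCounterCache:
    """Hypothetical in-process stand-in for memcached's add/incr/decr/get commands."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def add(self, key: str, value: int) -> bool:
        # Like memcached "add": store only if the key does not exist yet.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key: str, delta: int = 1) -> Optional[int]:
        # Like memcached "incr": fails on a missing key instead of creating it.
        if key not in self._data:
            return None
        self._data[key] += delta
        return self._data[key]

    def decr(self, key: str, delta: int = 1) -> Optional[int]:
        # Like memcached "decr": never goes below zero.
        if key not in self._data:
            return None
        self._data[key] = max(0, self._data[key] - delta)
        return self._data[key]

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)


def record_impression(cache: FakeCounterCache, ad_id: str) -> None:
    """Request path: bump a shared counter; no database write per ad view."""
    key = f"views:{ad_id}"
    if cache.incr(key) is None:
        cache.add(key, 0)  # if another server created the key first, add() just fails
        cache.incr(key)


def flush_impressions(cache: FakeCounterCache, db: sqlite3.Connection, ad_ids: Iterable[str]) -> None:
    """Background job: fold the cached counters into the database in one transaction."""
    rows = []
    for ad_id in ad_ids:
        key = f"views:{ad_id}"
        count = cache.get(key) or 0
        if count:
            # Subtract only what was read, so views recorded after get() are kept.
            cache.decr(key, count)
            rows.append((ad_id, count))
    db.execute(
        "CREATE TABLE IF NOT EXISTS impressions "
        "(ad_id TEXT PRIMARY KEY, views INTEGER NOT NULL)"
    )
    with db:  # commits all the row updates together
        db.executemany(
            "INSERT INTO impressions (ad_id, views) VALUES (?, ?) "
            "ON CONFLICT(ad_id) DO UPDATE SET views = views + excluded.views",
            rows,
        )
```

The flush job is handed the list of ad IDs because memcached offers no practical way to enumerate keys. The trade-off is exactness: memcached may evict a counter, and a failed database write after the `decr` loses that batch, so this sketch suits statistics better than billing.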

Published by nick on 09 Jul 2008

Searchsig Social Search Panel Recap

I spoke on a panel for Social Search yesterday, representing Wikia Search. More info.

I enjoyed having an audience to demonstrate Wikia Search to, particularly this audience, which I would consider the upper echelon of technology in Silicon Valley. I met a few intelligent people and saw some familiar faces from Yahoo!. Overall I think it went well. I wasn’t aggressive enough about butting in; I didn’t speak much, but when I did, I think it made sense. DJcline.com covered the event and will be doing a write-up in a few days, so we should see pictures and videos up there soon. I’ll update with a link when it is available.

Update: Here it is from Jason’s Ustream:


There was a big focus on monetization/advertising, which I found odd for a search technology conference. Perhaps a sign of what people are worried about? It reminds me of the big focus on monetization we saw right before the Web 1.0 bubble burst. Anyone remember the "B2B" craze when "B2C" fizzled? The cycle goes innovation->consolidation->innovation->consolidation->… I think we are entering a second round of consolidation. Time to buckle down?

It became obvious that the definition of "Social Search" was unclear. The panel was clearly bifurcated (as one audience member eloquently put it). With Facebook and Friendfeed (finding people) on one side and Wikia and Mahalo (community/people-powered search) on the other, there were two camps, and neither one was truly "Social Search".

When this audience was thinking about Social Search, I think they were expecting search results to be filtered based on what people similar to them were interested in. That sounds crazy to me. If I want my friends’ opinion on a local restaurant, I’ll just ask them. I don’t want a search engine to only search through my friends’ comments. No offense to my friends, but I want their input *and* everyone else’s.

I think the valley is still trying to figure out what Social Search really is. It’s probably better to label Wikia as "Community Powered Search" than a "Social Search". We are focused on improving algorithmic search results with people’s input. Same for Mahalo, who was also on the panel (represented by their CEO, Jason Calacanis). I think Mahalo is the closest thing Wikia Search has to a competitor right now. They have a hybrid Wikipedia/Google approach, with paid editors.

I think the folks in the audience were entertained, engaged, and learned some stuff, but at least a few came expecting something different than what the panel had to offer.

Jason from Mahalo lived up to his reputation. He opened the panel by publicly bashing Jimmy Wales over personal issues, and it was up to me to retort. I took a deep breath, rose above it, and focused on what the crowd came to hear about: search and technology, not ego battles. He stopped with the insults and we got down to business, but he kept up the cynicism and aggression, at one point telling Facebook that they will never be able to monetize their traffic through advertising. Audacious!

I sympathize with him; he must spend a lot of effort and energy justifying to himself and/or investors why Mahalo is better than Wikia Search. Wikia Search must be viewed as a big threat, so his insecurity manifests itself in odd ways. I’m told that he has a past reputation for this attitude and that this is "just how he is, it’s not personal". Best of luck to him with this approach. Sometimes I wish I were more bold, brash, and outspoken. It might be a good way to get attention, and it certainly made the panel more lively. ;-) Jason, thank you for providing the entertainment.

It was fun. It was good for me to be in the hot seat, and the preparation I went through ahead of time did help me focus and think through a lot of challenges that lie before us at Wikia, including:

  1. Spam control
  2. Reputation/Quality of ratings
  3. Openness and transparency

Special thanks to Robert and Safa Rashtchy for putting the event together; it was a blast.

Published by nick on 07 Jul 2008

Wikia and Wikia Search in a nutshell

So what is Wikia? What are they up to? Here’s my perspective and opinion as an employee. Wikia was founded a couple of years ago by Jimmy Wales, the founder of Wikipedia. Wikia is a separate company from Wikipedia, even though the names are similar and they share a founder.

Wikia’s core business model is to build community sites based on the Wiki concept. We all know Wikipedia. Wikipedia focuses on encyclopedic-level knowledge of a subject; Wikia goes further and gives the community a place for detailed information about each subject.

For example, I love the TV show Family Guy. It’s great. Now from Wikipedia’s perspective, the information on the Family Guy Wikipedia page should include:

  • Characters and descriptions
  • What network the show is on
  • How long it’s been running
  • Brief Staff credits (important actors, director, creator, etc)
  • Any cultural impacts the show has had
  • Criticisms

You know, encyclopedia-worthy stuff. However, Wikipedia doesn’t want it to become a fan page. Wikipedia discourages trivia in its articles, and it doesn’t want detailed accounts of every Family Guy episode. This is where Wikia steps in. When there is a community around a particular concept, and its interest exceeds what is worthy of encyclopedic content, Wikia provides that community with a way to share all of this information very thoroughly, with a site that is themed appropriately. End users are encouraged to provide in-depth information about the topic for the world to see.

Some notable examples:

  • familyguy.wikia.com - Gather around Spooner Street for the best collection of useless information on Family Guy
  • muppet.wikia.com - a wiki dedicated to everything Muppets. This particular wiki is co-maintained by one of the dedicated Wikia product folks. Go Danny!
  • www.wowwiki.com - a thorough World of Warcraft wiki - this is the 2nd largest Wiki in the world, after Wikipedia.

Now Wikia is also working on Wikia Search.

As a preface: I think that closed-source ranking algorithms are destined for extinction. See a previous post on why I think the community will replace Google. (Note: I wrote that before I worked for Wikia, and before I knew they had a search engine.) We need an open and transparent solution for web search. I don’t know about you guys, but whenever one company grows too powerful and omnipotent, I have these visions of a Big Brother slapping me on the shoulder and telling me what I do and don’t like. I’m not going to name names (*Cough* Microsoft *Cough*), but let’s just say no one likes Big Brother.

On the flip side, the trend we’re seeing with successful web companies is openness, and this will continue. One of the best and most obvious examples is Facebook and their API. Facebook has successfully distanced itself from the competition by enabling users to build applications on top of their platform. Hats off to them.

Psst. Rumor is that Yahoo is working on something similar to enable developers to use Yahoo data and infrastructure to build applications. Shh.

Good companies are open companies. Jimmy Wales likes to say that Wikia Search is a political statement, and in some ways it is. Wikia is saying that search should be an open, transparent effort that is controlled and managed by the community.

For example, on Wikia Search, if I do a search and the results don’t make sense, I can change them. If there is spam, I can remove it. We’ve all done searches where we get back a page that is a link farm. With Wikia, you just remove the result.

For a great demo of this, watch this video:

Often times the 1st result in Google is a good result. But sometimes it’s the 2nd, 3rd, or even 20th result that is the best for a particular query. If you find this, shouldn’t you be able to make this the first result for someone else? Wikia thinks so.
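How Wikia Search actually stores and applies these edits isn't described here, so the Python sketch below only illustrates the idea under stated assumptions: an algorithmic result list for one query, plus a hypothetical `CommunityEdits` record of the results users removed as spam and the results they promoted. Promoted results only move up if the algorithm returned them and nobody removed them.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CommunityEdits:
    """Hypothetical record of what users changed for one query."""

    removed: Set[str] = field(default_factory=set)     # results flagged as spam or link farms
    promoted: List[str] = field(default_factory=list)  # results users moved up, best first


def apply_edits(algorithmic: List[str], edits: CommunityEdits) -> List[str]:
    """Re-rank an algorithmic result list using the community's edits."""
    kept = [url for url in algorithmic if url not in edits.removed]
    # Promoted results jump to the front in the order users chose (duplicates
    # dropped), but only if they survived the removals above.
    top = list(dict.fromkeys(url for url in edits.promoted if url in kept))
    rest = [url for url in kept if url not in top]
    return top + rest
```

With results `["a.com", "linkfarm.com", "b.com", "c.com"]`, removing `linkfarm.com` and promoting `c.com` yields `["c.com", "a.com", "b.com"]`. A real system would also have to weigh conflicting edits from different users, which is where the spam control and reputation challenges from the panel recap come in.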

I’ll wrap it up with this: organizing the Web’s information should not be controlled by one company; it should be controlled by the community as a whole. Wikia is looking to enable developers and users to do that through their platform, and in a nutshell, that’s what Wikia is up to.

Published by nick on 14 Apr 2008

Going to Wikia

I left Yahoo recently; Friday was my last day. After taking some time off for family travel, I’m going to work for Wikia.com. Wikia is building niche community sites on the wiki concept, using the Mediawiki software.

They are also working on Wikia Search, a community-driven search engine and crawling effort. If Jimmy Wales and Wikia can do what they did for Wikipedia and apply that concept to search, I think it has a decent chance of taking on Google.