Published by nick on 11 Feb 2009

I only block for memcached

Want to know the magic secret to building scalable apps? Don’t have blocking calls. Think back to all the performance problems you’ve ever had. Chances are your app was waiting on one of the following:

  • Database
  • HTTP (a remote web service)
  • File system (either local or remote [NFS])
  • Some other blocking service

Remove all these, and your app is fast! The one exception that has always treated me well is memcached. memcached in and of itself doesn’t block, and if you can write an app that only blocks on memcached, you’ve written a very well scaled app.

You may think I’m exaggerating. And maybe I am, to make a point. But I just built an app that scales using memcached, and it powers all of Wikia’s ad traffic on two servers, handling approximately 500 transactions a second, and storing 7500 pieces of data per second.

Actually, one server can handle the load, I just use two for redundancy. ;-)

Scale with memcached, skip all the other blocking calls! Tip: Ajax isn’t blocking.
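To make the idea concrete, here is a minimal sketch of a request path that only touches a cache, with the slow write pushed out of band. It is not the code behind the Wikia ad server: `FakeCache` is a hypothetical in-process stand-in for a memcached client, and `drain` is an invented helper (real memcached has no such call; a worker would read and reset known keys instead).

```python
import threading
from collections import defaultdict


class FakeCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._data = defaultdict(int)
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # Fast in-memory counter bump; this is the only call on the request path.
        with self._lock:
            self._data[key] += delta
            return self._data[key]

    def drain(self):
        # Invented helper: take every counter and reset the cache in one step.
        with self._lock:
            snapshot = dict(self._data)
            self._data.clear()
            return snapshot


def handle_request(cache, ad_id):
    # No database, HTTP, or file system call here, only the cache.
    cache.incr(f"impressions:{ad_id}")
    return "ok"


def flush_to_slow_storage(cache, storage):
    # Runs outside the request path (for example from cron or a worker),
    # so a slow database never holds up a page view.
    for key, count in cache.drain().items():
        storage[key] = storage.get(key, 0) + count


cache = FakeCache()
durable = {}  # plain dict standing in for the slow, durable store
for ad_id in ["a", "b", "a"]:
    handle_request(cache, ad_id)
flush_to_slow_storage(cache, durable)
print(durable)  # {'impressions:a': 2, 'impressions:b': 1}
```

The trade-off in a design like this is that counts sitting in the cache are lost if the cache restarts before the next flush.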

I liked this idea so much that I had T-shirts made that say "I only block for memcached". I’ve sold a couple to friends, and I will be giving one to Brad Fitzpatrick, the creator of memcached, as a Thank You from all of the users of memcached. Thanks Brad!

Published by nick on 19 Jan 2009

Automatically figure out which social bookmarking site to use with css

With dozens of them, displaying them all is ridiculous (although I’ve seen it). Why not just display the ones that the user visits?

This write up (from someone I work with at Wikia) explains how - by using the CSS for "visited" links.

http://www.azarask.in/blog/post/socialhistoryjs/

A bit creepy, but interesting enough to pass along.

Published by nick on 13 Jan 2009

Ultimate vimrc file - Good for php, bash, ruby and others

I’ve built this one up for a few years, and now it’s time to share. Notable features:

  • Syntax highlighting
  • Test compile for syntax errors. This means that every time you write the file, it will check the file for syntax errors and alert you immediately. This saves much back and forth with development. It works with the following languages:
    1. php
    2. bash
    3. perl
    4. httpd.conf
    5. xml
    6. ruby
    7. puppet
    8. javascript (if you have jslint installed)
  • Tab completion of php functions (if you download Rasmus’s function list and put it in your home directory. curl -o ~/.phpfunclist.txt -v http://lerdorf.com/funclist.txt)

    Ok, enough teasing. The ultimate vimrc file can be found here
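The check-on-write feature boils down to picking a syntax-check command based on the file type. The sketch below shows that lookup in Python rather than vimscript; it is not taken from the vimrc itself. The mapping and the function name are assumptions for illustration, though the flags shown are the usual syntax-check-only switches for each tool.

```python
import os

# Assumed mapping from file extension to a "check syntax only" command.
# The real vimrc may key off Vim's filetype detection instead.
SYNTAX_CHECKERS = {
    ".php": ["php", "-l"],
    ".sh": ["bash", "-n"],
    ".pl": ["perl", "-c"],
    ".rb": ["ruby", "-c"],
    ".xml": ["xmllint", "--noout"],
}


def syntax_check_command(path):
    """Return the command list to lint `path`, or None if no checker is known."""
    extension = os.path.splitext(path)[1].lower()
    checker = SYNTAX_CHECKERS.get(extension)
    if checker is None:
        return None
    return checker + [path]


print(syntax_check_command("index.php"))  # ['php', '-l', 'index.php']
print(syntax_check_command("notes.txt"))  # None
```

The returned list could be handed to `subprocess.run` from a save hook, provided the tool in question is installed.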

Published by nick on 07 Nov 2008

Apple - Please put Wireless Broadband in every MacBook

Dear Steve Jobs,

Right now I use Verizon Wireless broadband. It’s better than not having it, but I would much prefer to have it built into my MacBook Pro. Right now I have an external USB device, and while it works, it is clunky at best.

Please partner with AT&T, so that I pay one bill, and get my iPhone and wireless Broadband together. Do your usual routine where you make it easy to use, painless, intuitive, etc.

It shouldn’t be that difficult to pull off, as you already have 3G iPhones available. They already have wireless broadband built into laptops in Europe, check there for more info.

Thanks,

Loyal Apple Customer

Published by nick on 23 Sep 2008

Good bye OpenX. Hello Google Ad Manager.

Websites need ads. It’s one of the things that make the internet go ’round. That and porn.

If a startup wants a free solution for serving ads, there has really only been one choice for many years: OpenX, formerly known as phpAdsNew. OpenX has been at Wikia for quite some time. After hitting some brick walls with scalability, having downtime/slowness issues, and getting frustrated with basic functionality that doesn’t work without taking down the server, I decided it was time to try something new.

I looked into Google Ad Manager over the past few days. It seems like it can do the job, and last night I wrote all the code. Today I switched all of the wikia.com websites from OpenX for serving Spotlight Ads to Google Ad Manager.

Here are the compelling reasons I found for switching.

  • OpenX is crap — It is possible to write high scale web applications in PHP/MySQL. I’ve done it, multiple times. OpenX has not. Sorry for being a bit arrogant here, but I will happily engage an OpenX architect and question numerous design decisions. As an example: Logging impressions to a relational database in real time is a horrible idea. Horrible. It will never scale. Telling people that the right way to solve this problem is by logging on the app servers? Even worse.
  • Google’s infrastructure — Even if OpenX wasn’t horrible, I still don’t want to have to worry about buying servers, system administration time, and bandwidth for my ad infrastructure. I put more faith in Google’s and Yahoo!’s infrastructure than anything a startup can build.
  • It’s easier to use — I found the interface and code setup far more intuitive than OpenX. So have the 4 other people that I’ve been working with to load ads. They love how simple Google Ad Manager is. That being said, there are a couple of less-than-intuitive things with Google Ad Manager, so it wasn’t completely painless. Maybe Apple needs to come out with iAdManager? :)
  • It’s Free — Ok. Did you guys hear that? Free. Free hosting of the graphics. Free server infrastructure. Estimates are that this will save Wikia.com $5000 a month in bandwidth and servers.

    Is this the death of OpenX? No. There are still some things that Google Ad Manager can’t do. There is also a bunch of technical weirdos that think Google has too much power, so they will continue to use OpenX out of fear.

    However - Google just flexed their muscle, and they pulled off a great first product. Good work Google.

    And if someone knows how to short OpenX stock, let me know, ;-)

Published by nick on 07 Sep 2008

Apple: When are you going to make a Game Console?

PS3 and Xbox 360 are ok, but suffer from some major usability issues. Stuff doesn’t work as it should. It’s obvious to me as someone who uses Apple products regularly that the game console world could benefit from Apple influence.

Loyal Apple fans would gladly line up for a console if you made one.

Please put that on the list.

Published by nick on 01 Aug 2008

PHP Performance tip: require versus require_once

One of the big performance oriented complaints with PHP is that it doesn’t do well with large frameworks that have a lot of included files. Symfony and Mediawiki are two that I’ve had this problem with.

Why is it slow to load a lot of files in PHP?

Let’s take a closer look.

Quick note: In this post I’ll assume you are already using a PHP Accelerator, such as APC, or Turk MMCache or eaccelerator. If not, you need to be. My personal pick is APC, mostly because it’s the preferred one at Yahoo!, which has the largest installation of PHP, and the author of PHP works there. The lead maintainer of APC also works there, so I feel good knowing that APC is well supported. There are rumors that PHP 6 will have this accelerator built in, and that it will be based on APC’s code.

With that PHP accelerator plug out of the way, let’s get back to business.

Normally when php does a require to include a file, it does a stat to see if the file has changed, and if not, loads it from the APC cache. Here’s what that looks like at the C level:

 * stat64("./classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2057, ...}) = 0

Tip: Want to know how to look at what code is doing at the C level? Check out this tutorial on using strace to debug web apps

Nice, simple, clean, one stat per file. Note: with APC, you can set apc.stat to off, and this will skip the above stat call as well. The downside: You have to restart apache whenever you change your code.

Now let’s take a look at what happens when you use require_once instead of require:

 * lstat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2058, ...}) = 0

That’s one stat for each directory. It does this for every single file you include. With require_once, php must call realpath (at the C level) to know what the actual path of the file is. Otherwise, it won’t know if require_once '../../../mydir/Class.php'; is the same as require_once '../mydir/Class.php';

Note that it also must do this for every directory in your include_path, so if you don’t have that set up correctly, this is exacerbated even more. Each one of these stats is a system call that takes time. More work for your servers and slower responses for your users.
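The path-resolution step can be illustrated outside PHP. The sketch below is a rough Python model, not PHP's actual implementation: the `Includer` class and its method names are made up for illustration. The point it shows is that a "load once" check has to canonicalize every path (here with `os.path.realpath`) before it can tell that two different spellings name the same file.

```python
import os
import tempfile


class Includer:
    """Toy model of require vs. require_once (hypothetical, not PHP internals)."""

    def __init__(self):
        self._seen = set()
        self.loads = 0

    def require(self, path):
        # Plain require: load every time, no bookkeeping about earlier loads.
        self.loads += 1

    def require_once(self, path):
        # Must resolve "..", symlinks, etc. so that different spellings of the
        # same file compare equal. This is where the per-directory stats come from.
        canonical = os.path.realpath(path)
        if canonical in self._seen:
            return False
        self._seen.add(canonical)
        self.require(canonical)
        return True


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "mydir"))
    direct = os.path.join(root, "mydir", "Class.php")
    open(direct, "w").close()
    # Same file, reached through a redundant "../mydir" hop.
    roundabout = os.path.join(root, "mydir", "..", "mydir", "Class.php")

    inc = Includer()
    first = inc.require_once(direct)
    second = inc.require_once(roundabout)

print(first, second, inc.loads)  # True False 1
```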

Theory: The extra stats required for require_once and include_once introduce a lot of overhead for applications that include a lot of files.

A real world test — At Wikia, we had a common include file that was loading all of our Mediawiki extensions. It had 113 calls to require_once and 172 calls to include_once. By changing these to require and include respectively, the results were significant.

First, strace revealed that it took 2848 syscalls to serve a page, down from 4782 (-40%). Next I went to ab for more testing, and found that the average page request time went down to 36.5 ms from 46.5 ms (-22%), and the server was able to serve 27.1 requests per second, up from 21.4 (+27%). View the complete output from ab
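The relative changes can be recomputed from the raw figures quoted above. This is a small arithmetic check, with `percent_change` as a helper name chosen here:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


syscalls = percent_change(4782, 2848)    # syscalls per page
latency = percent_change(46.5, 36.5)     # mean request time in ms
throughput = percent_change(21.4, 27.1)  # requests per second

print(f"syscalls:   {syscalls:+.1f}%")    # syscalls:   -40.4%
print(f"latency:    {latency:+.1f}%")     # latency:    -21.5%
print(f"throughput: {throughput:+.1f}%")  # throughput: +26.6%
```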

Conclusion: require_once does not perform as well as require. Don’t use require_once unless you need it. require will save system calls and deliver pages faster to end users. This also applies to the include/include_once counterparts.

It would be great if someone would write up a tool that walked through your code base and made recommendations for these types of performance tweaks. Hmm….

Published by nick on 09 Jul 2008

Searchsig Social Search Panel Recap

I spoke on a panel for Social Search yesterday, representing Wikia Search. More info.

I enjoyed having an audience for Wikia Search to be demonstrated and displayed, particularly this audience — which I would consider the upper echelon of technology in Silicon Valley. I met a few intelligent people, and saw some familiar faces from Yahoo!. Overall I think it went well. I wasn’t aggressive enough about butting in; I didn’t speak much, but when I did, I think it made sense. DJcline.com covered the event and is going to be doing a write up in a few days, so we should see pictures and videos up there soon, I’ll update with a link when it is available.

Update: Here it is from Jason’s Ustream:

There was a big focus on monetization/advertising, which I found odd for a search technology conference. Perhaps a sign of what people are worried about? Reminds me of the big focus on monetization we saw right before the Web 1.0 bubble busted. Anyone remember the "B2B" craze when "B2C" fizzled? The cycle goes innovation->consolidation->innovation->consolidation->… I think we are entering a second round of consolidation. Time to buckle down?

It became obvious that the definition of "Social Search" was unclear. The panel was clearly bifurcated (as one audience member eloquently put it). With Facebook and Friendfeed (finding people) on one side, and Wikia and Mahalo (community/people powered search) on the other, there were two camps, and neither one was truly "Social Search".

When this audience was thinking about Social Search, I think they were expecting search results to be filtered based on what people similar to them were interested in. That sounds crazy to me. If I want my friends’ opinion on a local restaurant, I’ll just ask them. I don’t want a search engine to only search through my friends’ comments. No offense to my friends — but I want their input *and* everyone else’s.

I think the valley is still trying to figure out what Social Search really is. It’s probably better to label Wikia as "Community Powered Search" than a "Social Search". We are focused on improving algorithmic search results with people’s input. Same for Mahalo, who was also on the panel (represented by their CEO, Jason Calacanis). I think Mahalo is the closest thing Wikia Search has to a competitor right now. They have a hybrid Wikipedia/Google approach, with paid editors.

I think the folks in the audience were entertained, engaged, and learned some stuff, but at least a few came expecting something different than what the panel had to offer.

Jason from Mahalo lived up to his reputation. He led off the panel by publicly bashing Jimmy Wales based on personal issues, and it was up to me to retort. I took a deep breath, rose above, and focused on what the crowd came to listen to — search and technology — not ego battling. He stopped with the insults and we got down to business, but he kept up the cynicism/aggression, at one point telling Facebook that they will never be able to monetize their traffic through advertising. Audacious!

I sympathize with him — he must spend a lot of effort and energy justifying to himself and/or investors why Mahalo is better than Wikia Search. Wikia Search must be viewed as a big threat, so his insecurity manifests itself in odd ways. I’m told that he has a past reputation for his attitude and this is "just how he is, it’s not personal". Best of luck to him with this approach. Sometimes I wish I was more bold, brash, and outspoken. It might be a good way to get attention; and it certainly made the panel more lively. ;-) Jason — thank you for providing the entertainment.

It was fun. It was good for me to be in the hot seat, and the preparation I went through ahead of time did help focus me and think through a lot of challenges that lay before us at Wikia, including:

  1. Spam control
  2. Reputation/Quality of ratings
  3. Openness and transparency

Special thanks to Robert and Safa Rashtchy for putting the event together, it was a blast.

Published by nick on 07 Jul 2008

Wikia and Wikia Search in a nutshell

So what is Wikia? What are they up to? Here’s my perspective and opinion as an employee. Wikia was founded a couple of years ago by Jimmy Wales, the founder of Wikipedia. Wikia is a separate company from Wikipedia, even though the name is close and they share a founder.

Wikia’s core business model is to build community sites based on the Wiki concept. We all know Wikipedia - Wikipedia focuses on encyclopedic level knowledge of a subject, Wikia goes further and gives the community a place for detailed information about each subject.

For example, I love the TV show Family Guy. It’s great. Now from Wikipedia’s perspective, the information on the Family Guy Wikipedia page should include:

  • Characters and descriptions
  • What network the show is on
  • How long it’s been running
  • Brief Staff credits (important actors, director, creator, etc)
  • Any cultural impacts the show has had
  • Criticisms

You know, encyclopedia worthy stuff. However, Wikipedia doesn’t want it to become a fan page. Wikipedia discourages the use of trivia on their articles, and they don’t want detailed accountings of every Family Guy episode. This is where Wikia steps in. When there is a community around a particular concept, and it exceeds what is worthy of Encyclopedic content, Wikia provides that community with a way to share all of this information very thoroughly, with a site that is themed appropriately. End users are encouraged to provide in-depth information about the topic for the world to see.

Some notable examples:

  • familyguy.wikia.com - Gather around Spooner Street for the best collection of useless information on Family Guy
  • muppet.wikia.com - a wiki dedicated to everything Muppets. This particular wiki is co-maintained by one of the dedicated Wikia product folks. Go Danny!
  • www.wowwiki.com - a thorough World of Warcraft wiki - this is the 2nd largest Wiki in the world, after Wikipedia.

Now Wikia is also working on Wikia Search.

As a preface - I think that the idea of closed source ranking algorithms is destined for extinction. See a previous post on why I think the community will replace Google. Note: I wrote this before I worked for Wikia, and before I knew they had a search. We need an open and transparent solution for web search. I don’t know about you guys, but whenever one company grows too powerful and omnipotent, I have these visions of a Big Brother slapping me on the shoulder and telling me what I do and don’t like. I’m not going to name names. *Cough* Microsoft *Cough*, but let’s just say no one likes Big Brother.

On the flip side, the trend we’re seeing with successful web companies is openness, and this will continue. One of the best and most obvious examples is Facebook and their API. Facebook has successfully distanced itself from the competition by enabling users to build applications on top of their platform. Hats off to them.

Psst. Rumor is that Yahoo is working on something similar to enable developers to use Yahoo data and infrastructure to build applications. Shh.

Good companies are open companies. Jimmy Wales likes to say that Wikia Search is a political statement, and in some ways it is. Wikia is saying that search should be an open, transparent effort that is controlled and managed by the community.

For example, on Wikia Search, if I do a search and the results don’t make sense, I can change them. If there is spam, I can remove it. We’ve all done searches where we get back a page that is a link farm. With Wikia, you just remove the result.

For a great demo of this, watch this video:

Often times the 1st result in Google is a good result. But sometimes it’s the 2nd, 3rd, or even 20th result that is the best for a particular query. If you find this, shouldn’t you be able to make this the first result for someone else? Wikia thinks so.

I’ll wrap it up with this — Organizing the Web’s information should not be controlled by one company, but it should be controlled by the community as a whole, and Wikia is looking to enable developers and users to do that through their platform — and in a nutshell, that’s what Wikia is up to.

Published by nick on 20 Jun 2008

Howto - Simple backups for Linux using rsnapshot

Backups are something you must master to be a great system administrator.

You’ve probably found this because you were looking for a simple backup solution. Yes, you’ve seen Amanda. And Bacula, but they aren’t simple. Amanda and Bacula are great products if you need all of their features — but if you are like me, you don’t want to spend time with your backups, you just want something that works.

My choice — rsnapshot. rsnapshot is a perl script that wraps around rsync. Its most beautiful feature: it uses hard links when it can, so if you are backing up the same file more than once, it just creates a link. This means backups only take up more space if the files change. I’ve heard that this is how Apple’s Time Machine works. I’m now using rsnapshot in multiple production environments. Here’s a quick how-to guide for setting up reliable, robust, efficient, self-rotating backups in just a few minutes with rsnapshot.

  1. Install rsnapshot, either by downloading/compiling the source code, or using this RPM for linux
  2. Edit /etc/rsnapshot.conf for your settings. Warning: The config file makes a distinction between tabs and spaces. Make sure you use tabs! Pay special attention to these settings:
    1. snapshot_root — this is where your backups will be stored.
    2. The backup section will define which directories you want backed up.
      backup	/etc/	localhost/
      backup	/root/	localhost/
      backup	/home/	localhost/
    3. rsnapshot handles the rotation of backups for you! The interval section will define how many daily, weekly, etc. backups are kept. Example:
      interval	hourly	6
      interval	daily	7
      interval	weekly	4
      interval	monthly	3
  3. As a test, you can run: rsnapshot -v -t hourly and this will parse the config file, and show you the commands it will run when it runs hourly.
  4. After you are done tweaking the config file, it’s time to add the crontab entries for the various backups. Scheduling is a bit tricky. Here’s mine as an example:
    # There is a pid file that will prevent two from running at the same time.
    # This is why hourly starts after the others. Hourly should be skipped when daily/weekly/monthly is running.
    19 */3 * * * nice rsnapshot -v hourly
    18 1 * * * rsnapshot -v daily
    17 2 * * 0 rsnapshot -v weekly
    16 3 1 * * rsnapshot -v monthly

That’s it! You’re ready to go. Your backups will be stored in the directory you set as snapshot_root.
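To see why hard-linked snapshots only cost extra space when files change, here is a small sketch of the underlying file system behaviour. It does not call rsnapshot; the `hourly.0` and `hourly.1` directory names and the `passwd` file are illustrative, and everything happens inside a throwaway temp directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    older = os.path.join(root, "hourly.1")
    newer = os.path.join(root, "hourly.0")
    os.makedirs(older)
    os.makedirs(newer)

    # The file as stored in the older snapshot.
    original = os.path.join(older, "passwd")
    with open(original, "w") as handle:
        handle.write("root:x:0:0\n")

    # Unchanged file in the newer snapshot: a hard link, not a second copy.
    linked = os.path.join(newer, "passwd")
    os.link(original, linked)

    same_file = os.path.samefile(original, linked)  # both names point at one inode
    link_count = os.stat(original).st_nlink         # names referring to that inode

print(same_file, link_count)  # True 2
```

Both snapshot directories list the file, but its data is stored once. Deleting one name leaves the other intact, which is what makes rotating old snapshots out safe.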