Archive for February, 2009

Published by nick on 24 Feb 2009

PHP vs Java vs C/C++ for web applications

From Incremental Operations Blog, this call stack shows the layers involved in a [typical?] Java stack.

To be fair, this is not necessarily Java itself, but poorly written Java code. But based on my experience, this type of excessive architecture is accepted best practice in the Java world, and the architecture astronauts seem to gravitate to this technology.

The original pipe dream of Java was "write once, run anywhere". Since that hasn’t exactly materialized, Java is the bastard stepchild of programming. It doesn’t really fit in anywhere. It is neither high performance and robust (C/C++) nor easy to program in (PHP/Python/Ruby). It’s awkwardly stuck in the middle, and doesn’t do either well. If you need performance that exceeds native PHP/Python capabilities (rare in the typical workplace), use a C/C++ extension for the heavy lifting. If that’s not enough, your app is in the top 1% of performance demand, and you need to use C/C++ directly.

I’ve heard defenders of Java claim that it’s faster than C/C++, but fundamentally that’s not possible, since it’s a layer on top of C/C++. Yes, JIT this and caching that, but these add complexity, which violates my #1 rule of software design, and if you added those same JIT and caching layers to C/C++, they would be even faster.

I will give Java the win in 2 areas of web based programming:

  1. Where development time and performance do not matter, and data integrity is the absolute most important factor. For example, stock trading and banking sites. If I were asked to build E*trade.com, I would use Java and Oracle instead of PHP and MySQL. It would take 5 times longer to do everything, hardware/software costs would be 10x higher, and the web site would be slower, but it would be the most robust solution.
  2. Where development time and performance do not matter, and there are advantages in maintaining advanced "state" information with transactions and rollbacks. For example, online poker sites and ticketmaster.com (advanced reservations). I’m sure someone’s done that using Ajax, but I wouldn’t trust money flowing over such a system, and I’d recommend Java.

For the other 99.9% of web applications, scripting languages or C/C++ are a better choice. The complexity that Java introduces is despicable, and in my opinion, choosing Java does your company a disservice in terms of cost (both development time and hardware).

Show me a web application that scales well in Java, and I’ll rewrite it in PHP in half the time, and it will be twice as fast with one more "9" in availability. If it’s still not fast enough, it needs to be done in C.

I am not always popular with this argument. Quite a few of my developer peers, whom I respect, have strong pro-Java arguments. I have a bet with one of these Java ninnies: I think that Java will be less prevalent in 5 years than it is now, because of its excels-at-nothing nature. There is a bottle of expensive tequila riding on this, so I expect to be right. We’ll see. ;-)

Published by nick on 13 Feb 2009

Google Canonical Href - with Mediawiki

It’s time to unwind the giant mess of 301’s, meta tags, and robots.txt hacks that we have in place, all aimed at eliminating "duplicate" content for search engines. We now have a simple way to tell search engines what the canonical representation of a URL is. That’s the promise of the new canonical tag, and I think it will work. Here’s the syntax:
<link rel="canonical" href="/the/trusted/url/of/the/page">

More info here:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

And note that it is also supported by Yahoo! and MSN.

Why am I so excited about it? Because I implemented it at Wikia, which was Google’s "trusted user" (note that Google mentions starwars.wikia.com as one of their examples).

Mediawiki has a problem with duplicate content. First, it has "soft" redirects, where two articles with different urls can point to the same content (which Google labels as "duplicates"). I had previously written extensions for Mediawiki that turn these into "hard" redirects (by issuing a Location: header with a 301 redirect). This showed a positive uplift for SEO, but it always felt like a hack. The canonical tag is a far more elegant solution, and improves performance by reducing 301’s.
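To make the difference between the two approaches concrete, here is a minimal sketch in Python (not the PHP that Mediawiki extensions are written in, and not the extension's actual code). The redirect table, the Star Wars titles, and the `render_article` helper are hypothetical stand-ins for Mediawiki's own redirect handling. The old hack answers an alias URL with a 301 and makes the client fetch again; the canonical approach serves the page right away and names the trusted URL.

```python
# Hypothetical redirect table: alias title -> target title.
# Titles are assumed to be URL-safe already, to keep the sketch short.
REDIRECTS = {"Luke": "Luke_Skywalker"}


def render_article(title: str) -> str:
    """Hypothetical stand-in for rendering an article's HTML body."""
    return f"<h1>{title.replace('_', ' ')}</h1>"


def serve_with_301(title: str):
    """Old hack: turn a soft redirect into a hard 301 with a Location header."""
    target = REDIRECTS.get(title)
    if target is not None:
        # The client must make a second request to /wiki/<target>.
        return 301, {"Location": f"/wiki/{target}"}, ""
    return 200, {}, render_article(title)


def serve_with_canonical(title: str):
    """Canonical approach: serve the content directly and name the trusted URL."""
    target = REDIRECTS.get(title, title)
    head = f'<link rel="canonical" href="/wiki/{target}">'
    return 200, {}, head + render_article(target)
```

With this sketch, `serve_with_301("Luke")` costs an extra round trip, while `serve_with_canonical("Luke")` answers in one request and still tells search engines that `/wiki/Luke_Skywalker` is the page that should collect the rank.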

Second, there are many entry points into an article in Mediawiki:

/wiki/Article_Name
/wiki/index.php/Article_Name
/wiki/index.php?title=Article+Name
/wiki/Article_Name?action=view

All of the above urls will produce the *exact* same content in Mediawiki, but search engines will treat them as different urls, which splits page rank and may introduce the infamous duplication penalty.
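The core of the fix is normalization: every entry point listed above should map to one trusted URL, which is what the canonical link then carries. A minimal Python sketch, assuming the `/wiki/` prefix from the example URLs; `canonical_path` is an illustrative name, and real Mediawiki title normalization (first-letter capitalization, namespaces, and so on) does more than this.

```python
from urllib.parse import parse_qs, unquote, urlsplit

ARTICLE_PREFIX = "/wiki/"  # assumed prefix, taken from the example URLs
INDEX_PREFIX = ARTICLE_PREFIX + "index.php/"


def canonical_path(url: str) -> str:
    """Map any of the article entry points to /wiki/Title_With_Underscores."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    path = parts.path
    if "title" in query:
        # /wiki/index.php?title=Article+Name (parse_qs decodes '+' to a space)
        title = query["title"][0]
    elif path.startswith(INDEX_PREFIX):
        # /wiki/index.php/Article_Name
        title = unquote(path[len(INDEX_PREFIX):])
    elif path.startswith(ARTICLE_PREFIX):
        # /wiki/Article_Name, with or without ?action=view
        title = unquote(path[len(ARTICLE_PREFIX):])
    else:
        raise ValueError(f"not an article URL: {url}")
    # Mediawiki shows spaces in titles as underscores in URLs.
    return ARTICLE_PREFIX + title.strip().replace(" ", "_")
```

All four URLs above come out as `/wiki/Article_Name`, so a single `<link rel="canonical">` value can be emitted no matter which one a crawler used.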

Both of these problems can be easily solved with the new canonical tag, and it’s quite elegant.

I’ve written a new Mediawiki extension for supporting the Google canonical href tag at Wikia. It’s open source, and available in Wikia’s SVN repo for all to use. I will be contributing it to the core Mediawiki software as an extension soon. Update: Now available in the Wikimedia SVN repo.

This is a big help outside of Mediawiki as well. Take "printable" pages as an example, or even URLs with extra parameters in the query string: the canonical tag can funnel all of the page rank into one version of the page.
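Outside Mediawiki, the same idea can be sketched generically in Python: drop every query parameter that does not change the page content, then emit the canonical link. The `SIGNIFICANT_PARAMS` whitelist, the `canonical_link` name, and the example.com URL are assumptions for illustration; each site has to decide for itself which parameters matter.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these parameters change what the page shows. Everything
# else (printable=yes, session ids, tracking tags) is treated as a duplicate.
SIGNIFICANT_PARAMS = {"item"}


def canonical_link(url: str) -> str:
    """Build a <link rel="canonical"> tag that ignores insignificant parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in SIGNIFICANT_PARAMS]
    kept.sort()  # stable order, so ?a=1&b=2 and ?b=2&a=1 collapse to one URL
    href = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # Escape for the HTML attribute (e.g. '&' between parameters becomes '&amp;').
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'
```

For instance, `canonical_link("http://www.example.com/product.php?item=swedish-fish&printable=yes")` yields a tag pointing at `http://www.example.com/product.php?item=swedish-fish`, so the printable view and the normal view share one canonical URL.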

Kudos to Google (esp. Matt Cutts), Yahoo!, and MSN on coming together to provide a clean and elegant solution to help fight the duplicate content problem.

UPDATE: I believe this is part of Mediawiki core now, so the extension shouldn’t be necessary.

Published by nick on 11 Feb 2009

I only block for memcached


Want to know the magic secret to building scalable apps? Don’t have blocking calls. Think back to all the performance problems you’ve ever had. Chances are your app was waiting on one of the following:

  • Database
  • HTTP (a remote web service)
  • File system (either local or remote [NFS])
  • Some other blocking service

Remove all these, and your app is fast! The one exception that has always treated me well is memcached. memcached in and of itself doesn’t block, and if you can write an app that only blocks on memcached, you’ve written a very well scaled app.
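One common way to keep those slow calls off the hot path is the cache-aside pattern: answer from memcached, and touch the slow backend only on a miss. Below is a minimal Python sketch of that generic pattern. It is not necessarily how the ad server mentioned in this post was built. `FakeMemcache` is a hypothetical in-process stand-in, loosely modeled on the get/set interface of Python memcached clients, so the example runs without a memcached server. `cached` and the `load` callback are illustrative names.

```python
import time


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcached client (get/set + TTL)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires and time.monotonic() >= expires:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, expire=0):
        # expire=0 means "never expire", as with memcached.
        self._data[key] = (value, time.monotonic() + expire if expire else 0)
        return True


def cached(cache, key, load, ttl=300):
    """Cache-aside: serve from the cache; call the blocking loader only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()  # the blocking call (database, HTTP, NFS) happens only here
        cache.set(key, value, expire=ttl)
    return value
```

In a real deployment `FakeMemcache` would be replaced by an actual client. After the first request for a key, later requests block on memcached alone until the entry expires.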

You may think I’m exaggerating. And maybe I am, to make a point. But I just built an app that scales using memcached. It powers all of Wikia’s ad traffic on two servers, handling approximately 500 transactions a second and storing 7500 pieces of data per second.

Actually, one server can handle the load, I just use two for redundancy. ;-)
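A quick back-of-envelope check on those figures, assuming the load stays roughly steady around the clock (an assumption, since ad traffic varies by hour). The variable names are illustrative; the two inputs are the approximate numbers quoted above.

```python
# Approximate figures quoted in the post.
transactions_per_sec = 500
items_per_sec = 7500
seconds_per_day = 24 * 60 * 60  # 86,400

items_per_transaction = items_per_sec // transactions_per_sec  # 15
transactions_per_day = transactions_per_sec * seconds_per_day  # 43,200,000
items_per_day = items_per_sec * seconds_per_day                # 648,000,000
per_server_when_split = transactions_per_sec // 2              # 250 on each of two boxes
```

Roughly 15 memcached writes per transaction, on the order of 43 million transactions a day. With the load split across two servers, each one sees only about 250 transactions a second.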

Scale with memcached, skip all the other blocking calls! Tip: Ajax isn’t blocking.

I liked this idea so much that I had T-shirts made that say "I only block for memcached". I’ve sold a couple to friends, and I will be giving one to Brad Fitzpatrick, the creator of memcached, as a Thank You from all of the users of memcached. Thanks Brad!

Optimize your ads with Liftium.com