Archive for the 'php' Category

Published by nick on 24 Feb 2009

PHP vs Java vs C/C++ for web applications

From Incremental Operations Blog, this call stack shows the layers involved in a [typical?] Java stack.

To be fair, this is not necessarily Java itself, but poorly written Java code. But based on my experience, this type of excessive architecture is accepted best practice in the Java world, and the architecture astronauts seem to gravitate to this technology.

The original pipe dream of Java was "write once, run anywhere". Since that hasn’t exactly materialized, Java is the bastard stepchild of programming. It doesn’t really fit in anywhere. It is neither high performance and robust (C/C++) nor easy to program in (PHP/Python/Ruby). It’s awkwardly stuck in the middle, and doesn’t do either well. If you need performance that exceeds native PHP/Python capabilities (rare in the typical workplace), use a C/C++ extension for the heavy lifting, and if that’s not enough, your app is in the top 1% of performance demand, and you need to use C/C++ directly.

I’ve heard the defenders of Java claim that it’s faster than C/C++, but fundamentally that’s not possible, since it’s a layer on top of C/C++. Yes, JIT this and caching that, but these add complexity, which violates my #1 rule of software design, and if you added those same JIT and caching layers to C/C++, they would be even faster.

I will give Java the win in 2 areas of web based programming:

  1. Where development time and performance do not matter, and data integrity is the absolute most important factor. For example, stock trading and banking sites. If I was asked to build E*trade.com, I would use Java and Oracle instead of PHP and MySQL. It would take 5 times longer to do everything, hardware/software costs would be 10x more, and the web site would be slower, but it would be the most robust solution.
  2. Where development time and performance do not matter, and there are advantages in maintaining advanced "state" information with transactions and rollbacks. For example, online poker sites and ticketmaster.com (advanced reservations). I’m sure someone’s done that using Ajax, but I wouldn’t trust money flowing over such a system, and I’d recommend Java.

For the other 99.9% of web applications, scripting languages or C/C++ are a better choice. The complexity that Java introduces is despicable, and in my opinion, choosing Java is doing a disservice to your company in terms of cost (both development time and hardware).

Show me a web application that scales well in Java, and I’ll rewrite it in PHP in half the time, and it will be twice as fast and have one more "9" in availability. If it’s still not fast enough, it needs to be done in C.

I am not always popular with this argument. Quite a few of my developer peers, whom I respect, have strong pro-Java arguments. I have a bet with one of these Java ninnies: I think that Java will be less prevalent in 5 years than it is now, because of its excels-at-nothing nature. There is a bottle of expensive tequila riding on this, so I expect to be right. We’ll see. ;-)

Published by nick on 13 Jan 2009

Ultimate vimrc file - Good for php, bash, ruby and others

I’ve built this one up for a few years, and now it’s time to share. Notable features:

  • Syntax highlighting
  • Test compile for syntax errors. This means that every time you write the file, it will check the file for syntax errors and alert you immediately. This saves much back and forth with development. It works with the following languages:
    1. php
    2. bash
    3. perl
    4. httpd.conf
    5. xml
    6. ruby
    7. puppet
    8. javascript (if you have jslint installed)
  • Tab completion of PHP functions (if you download Rasmus’s function list and put it in your home directory: curl -o ~/.phpfunclist.txt -v http://lerdorf.com/funclist.txt)

    Ok, enough teasing. The ultimate vimrc file can be found here

Published by nick on 23 Sep 2008

Good bye OpenX. Hello Google Ad Manager.

Websites need ads. It’s one of the things that make the internet go ’round. That and porn.

If a startup wants a free solution for serving ads, there has really only been one choice for many years: OpenX, formerly known as phpAdsNew. OpenX has been in use at Wikia for quite some time. After hitting some brick walls with scalability, having downtime/slowness issues, and getting frustrated with basic functionality that doesn’t work without taking down the server, I decided it was time to try something new.

I looked into Google Ad Manager over the past few days. It seems like it can do the job, and last night I wrote all the code. Today I switched all of the wikia.com websites from OpenX for serving Spotlight Ads to Google Ad Manager.

Here are the compelling reasons I found for switching.

  • OpenX is crap: It is possible to write high scale web applications in PHP/MySQL. I’ve done it, multiple times. OpenX has not. Sorry for being a bit arrogant here, but I will happily engage an OpenX architect and question numerous design decisions. As an example: Logging impressions to a relational database in real time is a horrible idea. Horrible. It will never scale. Telling people that the right way to solve this problem is by logging on the app servers? Even worse.
  • Google’s infrastructure: Even if OpenX wasn’t horrible, I still don’t want to have to worry about buying servers, system administration time, and bandwidth for my ad infrastructure. I put more faith in Google’s and Yahoo!’s infrastructure than anything a startup can build.
  • It’s easier to use: I found the interface and code setup far more intuitive than OpenX. So have the 4 other people that I’ve been working with to load ads. They love how simple Google Ad Manager is. That being said, there are a couple of less-than-intuitive things with Google Ad Manager, so it wasn’t completely painless. Maybe Apple needs to come out with iAdManager? :)
  • It’s Free: Ok. Did you guys hear that? Free. Free hosting of the graphics. Free server infrastructure. Estimates are that this will save Wikia.com $5000 a month in bandwidth and servers.

    Is this the death of OpenX? No. There are still some things that Google Ad Manager can’t do. There are also a bunch of technical weirdos who think Google has too much power, so they will continue to use OpenX out of fear.

    However - Google just flexed their muscle, and they pulled off a great first product. Good work Google.

    And if someone knows how to short OpenX stock, let me know. ;-)

Published by nick on 01 Aug 2008

PHP Performance tip: require versus require_once

One of the big performance-oriented complaints with PHP is that it doesn’t do well with large frameworks that have a lot of included files. Symfony and MediaWiki are two that I’ve had this problem with.

Why is it slow to load a lot of files in PHP?

Let’s take a closer look.

Quick note: In this post I’ll assume you are already using a PHP accelerator, such as APC, Turck MMCache, or eAccelerator. If not, you need to be. My personal pick is APC, mostly because it’s the preferred one at Yahoo!, which has the largest installation of PHP, and the author of PHP works there. The lead maintainer of APC also works there, so I feel good knowing that APC is well supported. There are rumors that PHP 6 will have this accelerator built in, and that it will be based on APC’s code.

With that PHP accelerator plug out of the way, let’s get back to business.

Normally when php does a require to include a file, it does a stat to see if the file has changed, and if not, loads it from the APC cache. Here’s what that looks like at the C level:

 * stat64("./classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2057, ...}) = 0

Tip: Want to know how to look at what code is doing at the C level? Check out this tutorial on using strace to debug web apps

Nice, simple, clean, one stat per file. Note: with APC, you can set apc.stat to off, and this will skip the above stat call as well. The downside: You have to restart apache whenever you change your code.

Now let’s take a look at what happens when you use require_once instead of require:

 * lstat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2058, ...}) = 0

That’s one stat for each directory. It does this for every single file you include. With require_once, php must call realpath (at the C level) to know what the actual path of the file is. Otherwise, it won’t know if require_once '../../../mydir/Class.php'; is the same as require_once '../mydir/Class.php';

Note that it also must do this for every directory in your include_path, so if you don’t have that set up correctly, the problem gets even worse. Each one of these stats is a system call that takes time. More work for your servers and slower responses for your users.
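The path comparison problem can be seen in a few lines of PHP. This is only a sketch: the temporary directory layout and variable names are made up for illustration, and it calls the userland realpath() function rather than showing what the engine does internally.

```php
<?php
// Build a small throwaway directory tree so the example is self-contained.
// (Hypothetical layout, not the one from the strace output above.)
$base = sys_get_temp_dir() . '/include_test_' . uniqid();
mkdir($base . '/a/b', 0777, true);
mkdir($base . '/mydir', 0777, true);
file_put_contents($base . '/mydir/Class.php', "<?php\n");

// Two different spellings of the same file.
$pathA = $base . '/a/b/../../mydir/Class.php';
$pathB = $base . '/a/../mydir/Class.php';

// As plain strings they differ...
$sameString = ($pathA === $pathB);

// ...but once canonicalized they point at the same file.
$realA = realpath($pathA);
$realB = realpath($pathB);
$sameFile = ($realA !== false && $realA === $realB);

var_dump($sameString); // bool(false)
var_dump($sameFile);   // bool(true)

// Clean up the throwaway tree.
unlink($base . '/mydir/Class.php');
rmdir($base . '/mydir');
rmdir($base . '/a/b');
rmdir($base . '/a');
rmdir($base);
```

Resolving each path to its canonical form is what lets PHP treat both spellings as one file, and walking the directories to do so is where the extra lstat calls come from.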

Theory: The extra stats required for require_once and include_once introduce a lot of overhead for applications that include a lot of files.

A real world test: At Wikia, we had a common include file that was loading all of our MediaWiki extensions. It had 113 calls to require_once and 172 calls to include_once. By changing these to require and include respectively, the results were significant.

First, strace revealed that there were 2848 syscalls to serve a page, down from 4782 (-40%). Next I went to ab for more testing, and found that the average page request time went down to 36.5 ms from 46.5 ms (-22%), and the server was able to serve 27.1 requests per second, up from 21.4 (+27%). View the complete output from ab

Conclusion: require_once does not perform as well as require. Don’t use require_once unless you need it. require will save system calls and deliver pages faster to end users. This also applies to the include/include_once counterparts.
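If you switch to plain require but still worry about loading a class file twice, one option is a cheap guard. The sketch below is an assumption about how that might look, not the code used at Wikia: loadClass() and the Class1 file are hypothetical, and whether it actually saves system calls on your setup is something to confirm with strace.

```php
<?php
// Hypothetical class file, written to a temp directory so the sketch runs on its own.
$file = sys_get_temp_dir() . '/Class1_' . uniqid() . '.php';
file_put_contents($file, "<?php\nclass Class1 { public function hello() { return 'hi'; } }\n");

// Hypothetical helper: load a class file with plain require, guarded by
// class_exists() so a second call does not redeclare the class.
function loadClass($class, $file)
{
    if (!class_exists($class, false)) { // false: do not trigger autoloading
        require $file;
        return true;  // the file was included on this call
    }
    return false;     // class already defined, nothing included
}

$first  = loadClass('Class1', $file); // includes the file
$second = loadClass('Class1', $file); // skipped, class already exists
$obj    = new Class1();

unlink($file);
```

The guard is an in-memory lookup, so the "have I loaded this already?" question is answered without resolving the path on disk.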

It would be great if someone would write up a tool that walked through your code base and made recommendations for these types of performance tweaks. Hmm….
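A very rough starting point for such a tool might look like this. It is only a sketch: countOnceIncludes() and scanTree() are hypothetical names, it relies on PHP's tokenizer extension, and it only counts the statements rather than judging whether each one is safe to change.

```php
<?php
// Hypothetical helper: count require_once / include_once statements in PHP source.
// Using the tokenizer means matches inside comments or strings are not counted.
function countOnceIncludes($source)
{
    $counts = array('require_once' => 0, 'include_once' => 0);
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ';'
        }
        if ($token[0] === T_REQUIRE_ONCE) {
            $counts['require_once']++;
        } elseif ($token[0] === T_INCLUDE_ONCE) {
            $counts['include_once']++;
        }
    }
    return $counts;
}

// Hypothetical helper: walk a directory tree and report counts per .php file.
function scanTree($dir)
{
    $report = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $counts = countOnceIncludes(file_get_contents($file->getPathname()));
            if (array_sum($counts) > 0) {
                $report[$file->getPathname()] = $counts;
            }
        }
    }
    return $report;
}

// Small in-memory sample so the sketch can be run without a code base.
$sample = "<?php\nrequire_once 'a.php';\ninclude_once 'b.php';\nrequire 'c.php';\n// require_once in a comment\n";
$result = countOnceIncludes($sample);
print_r($result);
```

To try it on a real tree, call scanTree() with the path to your source directory and print the report.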

Published by nick on 23 Sep 2007

When NOT to use a database abstraction layer in PHP

The most common reason that people cite for using a database abstraction layer is — "You can change the underlying database and your application doesn’t need to be rewritten."

From an academic perspective, this makes sense, and it’s hard to argue with. Especially in an interview or when you are talking to the fresh college grad with a Computer Science degree.

In the real world, it doesn’t happen.

Show me one person that has built their application with one database in mind, and then switched to another without having to rewrite some code, and I’ll show you 200 others that built their project using a database abstraction layer and they will never change the underlying database.

The concept of database neutrality is valuable under certain circumstances. If you are writing an open source software platform such as MediaWiki, and people that you don’t know will be installing the software on their own servers, and you want them to be able to use the database of their choice, then using a database abstraction layer for database neutrality is a clear winner.

There are other good reasons to use database abstraction layers:

  • Convenience methods that you want to be able to reuse. My personal favorite example is getRowFromSql(), where you give it a SQL statement that returns 1 row, and it will return an associative array with the columns/values.
  • Common configuration settings, i.e., your username, password, and host information are all in one place.
  • Central configuration of write vs. read connections. Do this right from the start, because you’ll need it when you go to scale MySQL.
  • Reuse of database connections for performance

These are enough for me, and they are the reason I use database abstraction layers. But if the only reason you are using a database abstraction layer is so that you can easily switch your database to something else in the future, you’re nuts, and you may want to make sure you are following the number one rule of software.

Me personally, I just have a simple class that extends PDO.
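For illustration, a class along those lines could look like the sketch below. This is not the author's actual class: SimpleDb is a made-up name, the body of getRowFromSql() is a guess based on the description above, and the in-memory SQLite database (which needs the pdo_sqlite driver) is only there so the example runs without a server.

```php
<?php
// Hypothetical class name; getRowFromSql() is the convenience method
// described in the list above, implemented here as a guess.
class SimpleDb extends PDO
{
    // Run a statement expected to return one row and hand it back as an
    // associative array of column => value, or null if nothing matched.
    public function getRowFromSql($sql, array $params = array())
    {
        $stmt = $this->prepare($sql);
        $stmt->execute($params);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }
}

// Assumption: an in-memory SQLite database stands in for the real one.
$db = new SimpleDb('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (id, name) VALUES (1, 'nick')");

$row     = $db->getRowFromSql('SELECT id, name FROM users WHERE id = ?', array(1));
$missing = $db->getRowFromSql('SELECT id, name FROM users WHERE id = ?', array(99));
print_r($row);
```

Because the class extends PDO, everything PDO already offers (prepare, transactions, and so on) stays available, and the DSN and credentials can live in one place where the object is constructed.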
