Archive for the 'technology' Category

Published by nick on 07 Nov 2008

Apple - Please put Wireless Broadband in every MacBook

Dear Steve Jobs,

Right now I use Verizon Wireless broadband. It’s better than not having it, but I would much prefer to have it built into my MacBook Pro. Right now I have an external USB device, and while it works, it is clunky at best.

Please partner with AT&T, so that I pay one bill, and get my iPhone and wireless Broadband together. Do your usual routine where you make it easy to use, painless, intuitive, etc.

It shouldn’t be that difficult to pull off, as you already have 3G iPhones available. Laptops in Europe already ship with wireless broadband built in, so check there for more info.

Thanks,

Loyal Apple Customer

Published by nick on 23 Sep 2008

Good bye OpenX. Hello Google Ad Manager.

Websites need ads. It’s one of the things that make the internet go ’round. That and porn.

If a startup wants a free solution for serving ads, there has really only been one choice for many years: OpenX, formerly known as phpAdsNew. OpenX has been at Wikia for quite some time. After hitting some brick walls with scalability, having downtime/slowness issues, and getting frustrated with basic functionality that doesn’t work without taking down the server, I decided it was time to try something new.

I looked into Google Ad Manager over the past few days. It seems like it can do the job, and last night I wrote all the code. Today I switched all of the wikia.com websites from OpenX for serving Spotlight Ads to Google Ad Manager.

Here are the compelling reasons I found for switching.

  • OpenX is crap - It is possible to write high scale web applications in PHP/MySQL. I’ve done it, multiple times. OpenX has not. Sorry for being a bit arrogant here, but I will happily engage an OpenX architect and question numerous design decisions. As an example: Logging impressions to a relational database in real time is a horrible idea. Horrible. It will never scale. Telling people that the right way to solve this problem is by logging on the app servers? Even worse.
  • Google’s infrastructure - Even if OpenX wasn’t horrible, I still don’t want to have to worry about buying servers, system administration time, and bandwidth for my ad infrastructure. I put more faith in Google’s and Yahoo!’s infrastructure than anything a startup can build.
  • It’s easier to use - I found the interface and code setup far more intuitive than OpenX. So have the 4 other people that I’ve been working with to load ads. They love how simple Google Ad Manager is. That being said, there are a couple of less-than-intuitive things with Google Ad Manager, so it wasn’t completely painless. Maybe Apple needs to come out with iAdManager? :)
  • It’s Free - Ok. Did you guys hear that? Free. Free hosting of the graphics. Free server infrastructure. Estimates are that this will save Wikia.com $5000 a month in bandwidth and servers.

Is this the death of OpenX? No. There are still some things that Google Ad Manager can’t do. There are also a bunch of technical weirdos who think Google has too much power, so they will continue to use OpenX out of fear.

However - Google just flexed their muscle, and they pulled off a great first product. Good work Google.

And if someone knows how to short OpenX stock, let me know. ;-)

Published by nick on 07 Sep 2008

Apple: When are you going to make a Game Console?

PS3 and Xbox 360 are ok, but suffer from some major usability issues. Stuff doesn’t work as it should. It’s obvious to me as someone who uses Apple products regularly that the game console world could benefit from Apple influence.

Loyal Apple fans would gladly line up for a console if you made one.

Please put that on the list.

Published by nick on 01 Aug 2008

PHP Performance tip: require versus require_once

One of the big performance-oriented complaints with PHP is that it doesn’t do well with large frameworks that have a lot of included files. Symfony and MediaWiki are two that I’ve had this problem with.

Why is it slow to load a lot of files in PHP?

Let’s take a closer look.

Quick note: In this post I’ll assume you are already using a PHP accelerator, such as APC, Turck MMCache, or eAccelerator. If not, you need to be. My personal pick is APC, mostly because it’s the preferred one at Yahoo!, which has the largest installation of PHP, and the author of PHP works there. The lead maintainer of APC also works there, so I feel good knowing that APC is well supported. There are rumors that PHP 6 will have this accelerator built in, and that it will be based on APC’s code.

With that PHP accelerator plug out of the way, let’s get back to business.

Normally when php does a require to include a file, it does a stat to see if the file has changed, and if not, loads it from the APC cache. Here’s what that looks like at the C level:

 * stat64("./classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2057, ...}) = 0

Tip: Want to know how to look at what code is doing at the C level? Check out this tutorial on using strace to debug web apps

Nice, simple, clean, one stat per file. Note: with APC, you can set apc.stat to off, and this will skip the above stat call as well. The downside: You have to restart apache whenever you change your code.

Now let’s take a look at what happens when you use require_once instead of require:

 * lstat64("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes", {st_mode=S_IFDIR|0755, st_size=20480, ...}) = 0
 * lstat64("/home/webuser/src/mediawiki/include_test/classes/Class1.php", {st_mode=S_IFREG|0644, st_size=2058, ...}) = 0

That’s one stat for each directory. It does this for every single file you include. With require_once, php must call realpath (at the C level) to know what the actual path of the file is. Otherwise, it won’t know if require_once '../../../mydir/Class.php'; is the same as require_once '../mydir/Class.php';

Note that it also must do this for every directory in your include_path, so if you don’t have that set up correctly, this is exacerbated even more. Each one of these stats is a system call that takes time. More work for your servers and slower responses for your users.

Theory: The extra stats required for require_once and include_once introduce a lot of overhead for applications that include a lot of files.

A real-world test: at Wikia, we had a common include file that was loading all of our MediaWiki extensions. It had 113 calls to require_once and 172 calls to include_once. Changing these to require and include, respectively, produced significant results.

First, strace revealed that there were 2848 syscalls to serve a page, down from 4782 (-40%). Next I went to ab for more testing, and found that the average page request time went down to 36.5 ms, from 46.5 ms (-22%), and the server was able to serve 27.1 requests per second, up from 21.4 (+27%). View the complete output from ab

Conclusion: require_once does not perform as well as require. Don’t use require_once unless you need it. require will save system calls and deliver pages faster to end users. This also applies to the include/include_once counterparts.
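To try this comparison on your own code base, one rough approach is to save the strace output for a single request and count the stat-family calls before and after the change. The sketch below assumes the trace was written to a file with strace's -o option; count_stat_calls and the log path are made-up names for illustration, and the pattern only covers the call names shown in this post (newer systems may report others, such as statx).

```bash
#!/usr/bin/env bash
# Count path lookups (stat, lstat, stat64, lstat64) in a saved strace log.
# fstat calls are deliberately left out: they work on an already-open file
# descriptor, not on a path, so they are not part of the realpath overhead.
count_stat_calls() {
    grep -cE '(^|[^a-z_])l?stat(64)?\(' "$1"
}

# Example usage (hypothetical log path):
#   strace -o /tmp/request.trace -p "$pid"    # capture one page request
#   count_stat_calls /tmp/request.trace       # prints the number of matches
```

Run it once with require_once in place and once after switching to require, against the same page, and compare the two counts.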

It would be great if someone would write up a tool that walked through your code base and made recommendations for these types of performance tweaks. Hmm….

Published by nick on 20 Jun 2008

Howto - Simple backups for Linux using rsnapshot

Backups are something you must master to be a great system administrator.

You’ve probably found this because you were looking for a simple backup solution. Yes, you’ve seen Amanda. And Bacula. But they aren’t simple. Amanda and Bacula are great products if you need all of their features, but if you are like me, you don’t want to spend time on your backups. You just want something that works.

My choice: rsnapshot. rsnapshot is a Perl script that wraps around rsync. Its most beautiful feature: it uses hard links when it can, so if you are backing up the same file more than once, it just creates a link. This means backups only take up more space if the files change. I’ve heard that this is how Apple’s Time Machine works. I’m now using rsnapshot in multiple production environments. Here’s a quick how-to guide for setting up reliable, robust, efficient, self-rotating backups in just a few minutes with rsnapshot.

  1. Install rsnapshot, either by downloading/compiling the source code, or using this RPM for linux
  2. Edit /etc/rsnapshot.conf for your settings. Warning: The config file makes a distinction between tabs and spaces. Make sure you use tabs! Pay special attention to these settings:
    1. snapshot_root — this is where your backups will be stored.
    2. The backup section will define which directories you want backed up.
      backup /etc/ localhost/
      backup /root/ localhost/
      backup /home/ localhost/
    3. rsnapshot handles the rotation of backups for you! The interval section will define how many daily, weekly, etc. backups are kept. Example:
      interval hourly 6
      interval daily 7
      interval weekly 4
      interval monthly 3
  3. As a test, you can run: rsnapshot -v -t hourly and this will parse the config file, and show you the commands it will run when it runs hourly.
  4. After you are done tweaking the config file, it’s time to add the crontab entries for the various backups. Scheduling is a bit tricky. Here’s mine as an example:
    # There is a pid file that will prevent two from running at the same time.
    # This is why hourly starts after the others. Hourly should be skipped when daily/weekly/monthly is running.
    19 */3 * * * nice rsnapshot -v hourly
    18 1 * * * rsnapshot -v daily
    17 2 * * 0 rsnapshot -v weekly
    16 3 1 * * rsnapshot -v monthly

That’s it! You’re ready to go. Your backups will be stored in the directory you set as snapshot_root.
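If the hard-link trick is new to you, it can be seen in isolation with a few lines of shell. This is only an illustration of the filesystem behavior that makes unchanged files free to keep around; it is not how rsnapshot is implemented, and same_file and the directory names are invented for the demo.

```bash
#!/usr/bin/env bash
# Illustration only: shows why an unchanged file costs no extra space when
# a snapshot hard-links it. The directory names mimic a snapshot layout,
# but nothing here calls rsnapshot itself.

# Succeeds if both paths are hard links to the same file on disk.
same_file() {
    [ "$1" -ef "$2" ]
}

demo_root="$(mktemp -d)"
mkdir -p "$demo_root/hourly.0" "$demo_root/hourly.1"

# The older snapshot holds the original copy of the file.
echo "127.0.0.1 localhost" > "$demo_root/hourly.1/hosts"

# Newer snapshot: the file has not changed, so link it instead of copying.
ln "$demo_root/hourly.1/hosts" "$demo_root/hourly.0/hosts"

if same_file "$demo_root/hourly.0/hosts" "$demo_root/hourly.1/hosts"; then
    echo "unchanged file is stored once and shared by both snapshots"
fi
```

Deleting one of the two names leaves the data in place until the last link is gone, which is why old snapshots can be rotated out safely.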

Published by nick on 05 Jun 2008

Phone Home Script to Protect Your Laptop

Let’s say your laptop is stolen. Wouldn’t that be awful?

Now, what if you had a way to track down the person that took it and get it back?

Using Linux or Mac, it’s easy. Let’s take a look at a script that will do this for you. It will take you less than 5 minutes to set up.

Save the above in /tmp/phone.bash (change $yourserver to a place where you can have a file hosted), then add a crontab entry to have it run every 5 minutes:

*/5 * * * * /tmp/phone.bash
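The script body is not shown in this copy of the post. As a stand-in, here is a minimal sketch of what a script like this might look like, assuming it simply downloads a phonehome file from $yourserver with curl and hands the contents to bash. PHONEHOME_URL, fetch_commands, and phone_home are hypothetical names, not the original script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a phone-home script; not the original from this post.
# PHONEHOME_URL is an assumed name for wherever the phonehome file is hosted.
PHONEHOME_URL="${PHONEHOME_URL:-http://example.com/phonehome}"

# Download the phonehome file and print its contents.
# --silent hides progress output, --fail turns HTTP errors into a non-zero
# exit status, and --max-time keeps cron runs from piling up when offline.
fetch_commands() {
    curl --silent --fail --max-time 30 "$1"
}

# Fetch the file and run whatever commands it contains with bash.
# If the download fails or the file is empty, quietly do nothing.
phone_home() {
    local commands
    commands="$(fetch_commands "$1")" || return 0
    [ -n "$commands" ] || return 0
    bash -c "$commands"
}

# In the real script, the last line would be:
#   phone_home "$PHONEHOME_URL"
```

Keeping the download in its own function makes it easy to swap curl for something else if it isn't installed.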

What does it do? Every 5 minutes, this script will run, and it will execute whatever code you have placed in the phonehome file on $yourserver. By default, I have my phonehome file just set to run true, which does nothing. But if my laptop gets stolen, I modify the phonehome file to include bash commands. I have the full power of bash on my laptop as soon as the thief connects to the internet. Imagine the possibilities. Here are some ideas:

  1. traceroute - Let’s go ahead and get his IP address and where he is at. We should be able to give this to the police who can then contact his ISP and get his address. Save the output and copy it to $yourserver:

    traceroute $yourserver > /tmp/traceroute.out
    scp /tmp/traceroute.out $yourserver:/tmp/theiftraceroute

    Tip: Set up ssh without a password to $yourserver so you can easily send information back and forth with rsync and/or scp.

  2. Keystroke logger - Now let’s record everything he types. Using bash, we can download, install, and run a keystroke logger. Here’s one that works for OSX
  3. Take a picture - If you have a Mac with a built-in iSight camera, you can use isightcapture to take pictures of the thief and send them to you!

    curl --silent "$urlforisightcapturescript" > /tmp/isightcapture
    chmod a+x /tmp/isightcapture
    /tmp/isightcapture --file /tmp/pictureoftheif$RANDOM
    scp /tmp/pictureoftheif* $yourserver:/tmp/

    Update: See my post on automatically capturing pictures with isightcapture

Someone. Please. Steal my laptop. I can’t wait to use this.

Published by nick on 02 Jun 2008

Debugging web apps with strace

Want to be an advanced debugger? My #1 Superman debugging tool is Linux’s strace. If you have ever run into problems where a user complains that the site is slow, and you can’t figure out why, you may want to give strace a try.

From http://sourceforge.net/projects/strace/:

strace is a system call tracer, i.e. a debugging tool which prints out a trace of all the system calls made by a another process/program.

In other words, strace tells you what a program is doing at the system call level. This is great for finding the problems where a page just "hangs" for no apparent reason. Let’s walk through what it takes to set up strace on Apache in a LAMP environment, with some real world examples that I’ve run into.

First, you’ll need to install strace, if it isn’t already installed. My favorite method is just yum install strace, but if you want to, you can download and compile it yourself.

Next, you will need a place where you can test the slow page. For the rest of this article, we will assume you have a development environment that is all your own, where you can start/stop Apache at will, and no one else will be using it. Note: If a separate development environment isn’t available, I suggest running another Apache on a different port, say 81 instead of 80. This way you can still work on the production site without affecting end users.

Environment set up? Good. Let’s get down to debugging.

  1. Start Apache in "Debug Mode" with the -X option. This has Apache start one process, instead of a bunch of children, and then all the requests will go through one process.

    httpd -X

  2. In another terminal window, find the process id for the listening Apache that you just started. ps auxw | grep httpd should do the trick.
  3. Once you have the process id, attach strace with the -p option:

    strace -p $processidofapacheprocess

  4. Go to your browser and go to the url that is hanging. While it is running, watch the output from strace in your terminal window. You’ll see a ton of system calls stream by, but the important thing to look for is when it stops. What is it doing?
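Steps 2 and 3 can be glued together with a small helper so you don't have to read the process id off the screen. This is a sketch, assuming the usual Linux `ps auxw` column layout (PID in column 2, command in column 11); pid_of_command is a made-up name.

```bash
#!/usr/bin/env bash
# Read `ps auxw` output on stdin and print the PID (column 2) of the first
# process whose command (column 11) is the given program name. Matching on
# the command column skips false positives such as a `grep httpd` process.
pid_of_command() {
    awk -v name="$1" '$11 ~ ("(^|/)" name "$") { print $2; exit }'
}

# Usage, left commented out because it needs a real Apache to attach to:
#   (terminal 1)  httpd -X
#   (terminal 2)  pid="$(ps auxw | pid_of_command httpd)"
#                 strace -p "$pid"
```

Because Apache was started with -X there should be only one httpd process, so taking the first match is enough.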

I’ve used this approach to find several "Superman" level problems (problems that other people spent at least a day trying to figure out what was going on — sometimes weeks). Here are some examples.

  1. Sendmail hanging via PHP - The reported problem was that certain pages were slow (30-300 seconds). Load on the machines seemed fine, but certain requests were painfully slow. strace revealed that the PHP script was waiting for sendmail to come back with a response. Upon looking further, sendmail was doing a reverse dns lookup that was timing out, which resulted in a 30+ second delay. Problem resolved by reconfiguring sendmail.
  2. PHP pages slow on an NFS server - The reported problem was a development environment with pages that were slow to load. strace revealed that the pages were hanging at a flock call to a directory that was mounted via NFS. Here’s the actual output from strace:

    …pages of output snipped…
    fcntl(24, F_SETFL, O_RDWR) = 0
    sendto(24, "incr toys:stats:request_with_ses"..., 40, MSG_DONTWAIT, NULL, 0) = 40
    poll([{fd=24, events=POLLIN|POLLERR|POLLHUP, revents=POLLIN}], 1, 500) = 1
    recvfrom(24, "76\r\n", 8192, MSG_DONTWAIT, NULL, NULL) = 4
    open("/home/phpsessions/sess_079113645a3da0fe50f68e4ce6ed58d2", O_RDWR|O_CREAT, 0600) = 25
    flock(25, LOCK_EX

    So we can see here that the file /home/phpsessions/sess_079113645a3da0fe50f68e4ce6ed58d2 has been opened, and the flock call is hanging. Turns out NFS doesn’t deal well with flock. When we saw this, there was a big smack on the forehead. Why on earth were the sessions being stored via NFS anyway? Especially for a development server, where only one box needed to store it. To solve the problem, we changed the session.save_path in the php configuration file to a directory that was not on NFS.

  3. Memcached hanging - Again, certain requests were hanging, causing pages to be slow to load. Again, strace to the rescue! Turns out PHP was hanging when talking to memcached. Once this was determined, we also ran strace on memcached, and found a bug with the particular memcached client we were using via PHP. We upgraded the memcached client to the latest version, and the problem was solved.

In all of the above cases, the problem could have been found through other means, but strace made it much easier and faster to figure out where the slowdowns were.

There are other helpful uses of strace. In addition to finding hanging web pages, I’ve also used strace to find why/where Apache was segfaulting. Just run strace and look at the last thing it did. It should give you an indication of why the script stopped when it did.

Also, I’ve used strace as a troubleshooting tool to find out where most of the time is being spent by analyzing the entire request.
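One way to do that kind of timing analysis, sketched under the assumption that the trace was captured with strace's -T option (which appends the time spent in each call as a trailing `<seconds>` field) and saved with -o. The function name slow_syscalls and the log path are invented for illustration.

```bash
#!/usr/bin/env bash
# Print every line of an `strace -T` log whose system call took at least
# the given number of seconds.
#   $1 = path to the saved strace log
#   $2 = threshold in seconds (for example 0.5)
slow_syscalls() {
    awk -v limit="$2" '
        # -T output ends each completed call with a field like <0.000045>
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (secs >= limit + 0) print
        }
    ' "$1"
}

# Example usage (hypothetical log path):
#   strace -T -o /tmp/request.trace -p "$pid"
#   slow_syscalls /tmp/request.trace 0.5
```

A call that never returns, like the hung flock in the NFS example, has no timing field at all, so it still shows up as the last line of the raw log instead of in this filtered list.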

Good luck in your adventures with strace. It’s been a big help for me. Feel free to leave a comment with your findings.

Published by nick on 25 Apr 2008

Debugging rules

My co-worker, Artur Bergman, gave a talk today at the Web 2.0 Expo.

He highlighted these rules of debugging, and they were great, so I wanted to share.

  1. Understand the system
  2. Make it fail
  3. Quit thinking and look
  4. Divide and conquer
  5. Change one thing at a time
  6. Keep an audit trail
  7. Check the plug
  8. Get a fresh view
  9. If you didn’t fix it, it ain’t fixed.

Brilliant! See http://www.debuggingrules.com/

Published by nick on 11 Mar 2008

Wikinvest wins award for Best Business website of 2007

I work full time at Yahoo!, but another company I work for part time, Wikinvest.com, just won an award at the 11th Annual South by Southwest Interactive Web Awards for best website of 2007 in the business category. See the complete list of winners.

Hooray!

Shameless Plug:
Wikinvest is a website that applies the concept of Wikipedia (community edited content) to the world of investing. We’ve got some great content, some very interesting expansions to the MediaWiki software, and the Search is pretty damned cool too (I built that). ;-)

If you are into investing, I encourage you to take a look.

Published by nick on 07 Mar 2008

Is Microsoft changing its evil ways?

As a Yahoo! employee that stands emphatically against Microsoft’s hostile takeover, I’ve been pretty vocal about how awful Microsoft is because of their poor technology, their anti-open source stance, and their anti competitive history. So much so that Yahoo! management has asked me to "tone it down". Ha. Ever since grade school, I do what I think is right, not what I am told to do.

Could I be wrong? Could Microsoft be changing its ways?

First, Microsoft announced that they were open-sourcing some of their platforms, a clear attempt at making good with the open source community. I cautiously applauded this when it happened, even though I think that the timing was very convenient for appeasing Yahoos.

Today I ran across an article by Robert Scoble, where he highlights what good Microsoft has done in the last 6 months, and I agree the results are encouraging.

I just had dinner with a bunch of Italy’s top tech bloggers and technologists and Marc Canter. Plus I’ve been talking with people all day long. Microsoft hit major Internet home runs today with its announcements, based on what I’m hearing from formerly-skeptical developers.

I haven’t heard this level of excitement about Microsoft’s Internet Strategy in years.

Interesting story, it’s worth a read. I support a change of heart by Microsoft, and I hope it’s genuine.

Not believing that Microsoft could really be trying to be a good citizen, I dug deeper and found an article by Dana Gardner, and he poses some interesting questions on what Microsoft’s ulterior motive may be:

And that raises the same old questions. Will the power increase to a point where the openness declines? Will the standards over time be increasingly set by the de facto market leader? Will the Internet and its efficiencies work best for consumers and users, or those that can manipulate it best?

See the full story here

Hmm. What are they up to? We will see. Most of the past 10 years of Microsoft’s business practices are marred with bad karma. If they are good citizens for the next 5, they can show the world that they have changed.
