Published by nick on 15 Jun 2009

What software developers can learn from BattleBots

I attended Robogames in San Francisco this past weekend with my 3 sons. We had a great time watching the robots try to kill each other. I let out a Tim Allen grunt at several points, especially when we saw one of the robots with a flame thrower.

I had some time for reflection in between matches, and my pattern-recognition-heavy brain kicked into gear to find out what was similar between the engineering efforts behind robot building and software engineering. Not everything lines up, but I did notice that some of my thoughts on simplicity and robustness carried over to the world of attack robots. In no particular order:

  • Staying moving is more important than having a good weapon. While it did occasionally happen that a kick-ass weapon ended a match (and when it did, it was supremely cool!), that was certainly not the majority of matches. Most of the time the losing robot was defeated not by a weapon but by getting "stuck" in one way or another, either by being flipped upside down or by getting caught on an obstacle.

    How does this translate to the software world? If you have a great piece of software, but it doesn’t work, it won’t be successful. Never let any new weapon (feature) compromise stability and robustness.

  • Simplicity wins - The more moving parts, the more things that can go wrong and break. It was easy to spot the robots (products) that were over-engineered by the uber geeks. I laughed when I saw these robots quickly die because something simple went wrong. My favorite example was a group of "rocket scientists" that built a robot with two ginormous spinning wheels as weapons. Each wheel was turned by an elaborate contraption of gears and pulleys. You know the end of the story. Resist the temptation to over-engineer, and pride yourself on the simplest solution that accomplishes the goals. Or, to quote Albert Einstein,

    Everything should be made as simple as possible, but not simpler.

  • Test, Test, Test - You could really tell the difference between the "mature" robots that had seen a couple of fights and the robots whose owners were using them for the first time in the ring. Thorough testing should be done before the robot (a product) goes into battle (the real world).
  • Quality construction and materials. This was almost as important as design. Flimsy aluminum and other cheap materials quickly gave way under battle, and poorly engineered welds or connections suffered the same fate. Same goes in software - spaghetti code may look like it’s going to work, but won’t stand up to a fierce battle.

It’s always fun when we can learn from the convergence of two different sciences.

Published by nick on 11 May 2009

The cure for spam - forever

SPAM. No, not the barely edible processed meat, I’m talking about unsolicited messages we all receive, typically via e-mail. BTW - a common misconception is that SPAM is all junk mail. By the nerdiest of definitions, spam is only the unsolicited messages. If you signed up for a newsletter or an alert every time a new job is posted, that’s not spam, that’s something you asked for. SPAM, by definition, has to be unsolicited.

Another useful tidbit - the term SPAM comes from a famous Monty Python skit. The analogy is that it’s something that keeps coming over and over again even though you neither asked for it nor wanted it. Here’s the original skit on youtube, but fair warning, it’s so repetitive that it is a bit painful to watch… much like, well, SPAM.

Lots of theories abound for how to solve the spam problem. Filters. Bayesian spam detectors. There is an entire anti-spam industry where companies spend hundreds of millions (billions?) of dollars a year fighting spam.

Fear not, I have the ultimate solution so everyone can stop getting spam. It’s quite easy, but it will take effort from the whole world. Intrigued? Read on.

Why does spam exist? Because advertisers can reach millions of eyeballs cheaply and easily. Sure, most of the people that get the e-mail don’t read it. Quite a few are even annoyed. But of those who remain, a few read it. And guess what, a few of those actually go to the website to see what is being sold. And a few of those wind up buying something.

So a spammer’s math works like this. They send out 10,000,000 messages for the cost of bandwidth and processing time, which can be free if they are using a hacked/hijacked machine, or just a few dollars if they pay someone. Let’s say it costs $25. Sure, ISPs don’t like this behavior, so the spammers get kicked off, but they just move on to the next one.

Of those 10,000,000 messages sent out, let’s say 50% bounce, because the spammer’s e-mail list is out of date, or the person’s inbox is full, or the spam filter rejects them. 5,000,000 left.

Of those 5,000,000 people who get the message, 1% read it. 50,000 left.

Of the 50,000 people who actually read the message, 1% go to the website. 500 left.

Of the 500 people that go to the website, 10% buy something, which is 50 buyers. Assuming the spammer makes $25 on each product purchased, that comes to $1,250 (50 × $25). Not too shabby a return for $25. In some parts of the world, $1,250 is more than people make in a year.
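For anyone who wants to play with the numbers, here is a minimal Python sketch of the funnel above. The percentages, the $25 cost, and the $25 profit per sale are the made-up figures from this post; the function name and stage labels are just illustrative.

```python
def spam_funnel(sent, stages):
    """Apply each (label, percent) stage in turn; return how many survive each stage."""
    remaining = sent
    results = []
    for label, percent in stages:
        remaining = remaining * percent // 100  # integer math keeps the counts exact
        results.append((label, remaining))
    return results


# Figures from the post: 50% delivered, 1% read, 1% visit the site, 10% buy.
STAGES = [("delivered", 50), ("read", 1), ("visited site", 1), ("bought", 10)]

funnel = spam_funnel(10_000_000, STAGES)
buyers = funnel[-1][1]
profit_per_sale = 25   # dollars, assumed in the post
campaign_cost = 25     # dollars, assumed in the post
revenue = buyers * profit_per_sale

for label, count in funnel:
    print(f"{label:>12}: {count:,}")
print(f"revenue ${revenue:,} on a ${campaign_cost} campaign")
```

Set the last stage to 0% and the revenue drops to zero no matter how many messages go out, which is the whole point.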

So how do we stop all this madness? Drum roll….

STOP BUYING PRODUCTS FROM SPAMMERS!

If no one buys from spammers, the above formula breaks. Spammers will continue sending spam as long as it works. As long as you one-percenters are out there making it worth their while, they’ll keep leveraging the law of numbers and technology to send us messages we don’t want.

Do your part. Don’t support spammers. Don’t buy products advertised via spam!

Published by nick on 24 Apr 2009

Why on earth doesn’t javascript have a json_encode?!

Every major language now has tools for encoding data as JSON, a text format that is natively read, understood, and used by Javascript. PHP’s json_encode works great, and I use it all the time for data transport. The reasons for using XML are getting harder and harder to come by. JSON is much easier to work with.
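As an illustration of what that built-in support looks like, here is a small sketch using Python's standard json module, where json.dumps and json.loads play the roles of PHP's json_encode and json_decode. The payload is invented for the example.

```python
import json

# Hypothetical payload: dicts, lists, strings, numbers, booleans and None all map to JSON.
payload = {"user": "nick", "tags": ["php", "json"], "active": True, "score": None}

encoded = json.dumps(payload)   # data structure -> JSON text
decoded = json.loads(encoded)   # JSON text -> data structure

print(encoded)
```

Note how True and None come out as true and null in the encoded text, which is exactly the form a browser expects.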

Every major language has a built-in json_encode, except Javascript!

WTF? Irony at its best.

I’m using Douglas Crockford’s JSON.js in the meantime, but Firefox, Internet Explorer, Safari developers - please include support for this in a future release.

Update Jun 29 2009: I was listened to. :) Firefox 3.1+ and IE 8+ now have native JSON support. More info

Published by nick on 31 Mar 2009

Automatic isight capture to have your Mac be a time lapse camera

My Apple MacBook Air has a built-in iSight camera. I wanted to have it take pictures automatically every couple of minutes and save them in a folder for an upcoming home improvement project, effectively turning it into a time lapse camera. It’s also good for taking pictures of people using your computer. Handy when used in combination with my phone home script. ;-)

Here are the steps.

  1. Download isightcapture, a command line utility for capturing images. Drag it into the /Applications/ folder.
  2. Make a folder for where you want the images stored. I used ~/isight_capture/.
  3. Set up a cronjob to take the pictures and name them with the date and time in the filename. From Terminal (you need to know vi):
    1. crontab -e
    2. Paste */2 * * * * /Applications/isightcapture ~/isight_capture/image.`date +\%Y-\%m-\%d_\%H_\%M`.jpg
    3. Save and quit

That’s it! Within two minutes, you’ll start to see images in your ~/isight_capture folder, with a new one every two minutes. Note that they will build up as long as the computer is on, at about 15 megabytes a day, so you may want something else that cleans them up. To capture less often, use */5 in the minutes field of the cron entry to take a picture every 5 minutes instead.
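For the cleanup, something along these lines might do. This is a minimal Python sketch, not part of isightcapture: the function name and the 7-day default are my own assumptions, and it only touches files matching the image.*.jpg naming used in the cron entry above.

```python
import time
from pathlib import Path


def remove_old_captures(folder, max_age_days=7, now=None):
    """Delete image.*.jpg files in `folder` older than `max_age_days`.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    removed = []
    for path in sorted(Path(folder).expanduser().glob("image.*.jpg")):
        if path.stat().st_mtime < cutoff:  # last modified before the cutoff
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling remove_old_captures("~/isight_capture") once a day would keep roughly a week of images around. For a long time lapse project you would want a much larger max_age_days, or to copy the images somewhere else first.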

Published by nick on 24 Feb 2009

PHP vs Java vs C/C++ for web applications

From Incremental Operations Blog, this call stack shows the layers involved in a [typical?] Java stack.

To be fair, this is not necessarily Java itself, but poorly written Java code. But based on my experience, this type of excessive architecture is accepted best practice in the Java world, and the architecture astronauts seem to gravitate to this technology.

The original pipe dream of Java was "write once, run anywhere". Since that hasn’t exactly materialized, Java is the bastard stepchild of programming. It doesn’t really fit in anywhere. It is neither high performance nor robust (C/C++), nor easy to program in (PHP/Python/Ruby). It’s awkwardly stuck in the middle, and doesn’t do either well. If you need performance that exceeds native PHP/Python capabilities (rare in the typical workplace), use a C/C++ extension for the heavy lifting. If that’s not enough, your app is in the top 1% of performance demand, and you need to use C/C++ directly.

I’ve heard the defenders of Java claim that it’s faster than C/C++, but fundamentally that’s not possible, since it’s a layer on top of C/C++. Yes, JIT this and caching that, but these add complexity, which violates my #1 rule of software design, and if you added those same JIT and caching layers to C/C++, they would be even faster.

I will give Java the win in 2 areas of web based programming:

  1. Where development time and performance do not matter, and data integrity is the absolute most important factor. For example, stock trading and banking sites. If I were asked to build E*trade.com, I would use Java and Oracle instead of PHP and MySQL. It would take 5 times longer to do everything, hardware/software costs would be 10x more, and the web site would be slower, but it would be the most robust solution.
  2. Where development time and performance do not matter, and there are advantages in maintaining advanced "state" information with transactions and rollbacks. For example, online poker sites and ticketmaster.com (advanced reservations). I’m sure someone’s done that using Ajax, but I wouldn’t trust money flowing over such a system, and I’d recommend Java.

For the other 99.9% of web applications, scripting languages or C/C++ are a better choice. The complexity that Java introduces is despicable, and in my opinion, choosing Java does a disservice to your company in terms of cost (both development time and hardware).

Show me a web application that scales well in Java, and I’ll rewrite it in PHP in half the time, and it will be twice as fast with one more "9" in availability. If it’s still not fast enough, it needs to be done in C.

I am not always popular with this argument. Quite a few of my developer peers, whom I respect, have strong pro-Java arguments. I have a bet with one of these Java ninnies: I think that Java will be less prevalent in 5 years than it is now, because of its excels-at-nothing nature. There is a bottle of expensive tequila riding on this, so I expect to be right. We’ll see. ;-)

Published by nick on 13 Feb 2009

Google Canonical Href - with Mediawiki

It’s time to unwind the giant mess of 301s, meta tags, and robots.txt hacks that we have in place, all aimed at eliminating "duplicate" content for search engines. We now have a simple way to tell search engines what the canonical representation of a url is. That’s the promise of the new canonical tag, and I think it will work. Here’s the syntax:
<link rel="canonical" href="/the/trusted/url/of/the/page">

More info here:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

And note that it is also supported by Yahoo! and MSN.

Why am I so excited about it? Because I implemented it at Wikia, which was Google’s "trusted user" (note that Google mentions starwars.wikia.com as one of their examples).

Mediawiki has a problem with duplicate content. First, it has "soft" redirects, where two articles with different urls can point to the same content (which Google labels as "duplicates"). I had previously written extensions for Mediawiki that turn these into "hard" redirects (by issuing a Location: header with a 301 redirect). This showed a positive uplift for SEO, but it always felt like a hack. The canonical tag is a far more elegant solution, and improves performance by reducing 301’s.

Second, there are many entry points into an article in mediawiki:

/wiki/Article_Name
/wiki/index.php/Article_Name
/wiki/index.php?title=Article+Name
/wiki/Article_Name?action=view

All of the above urls will produce the *exact* same content in Mediawiki, but search engines will treat them as different urls, which splits page rank and may introduce the infamous duplication penalty.

Both of these problems can be easily solved with the new canonical tag, and it’s quite elegant.
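To make the idea concrete, here is a rough Python sketch of collapsing the four entry points listed above into one canonical path and emitting the tag. This is not the MediaWiki extension (that one is written in PHP); the function names and the /wiki/ and /wiki/index.php defaults are assumptions for illustration, and it deliberately ignores cases such as action=edit, which are not the same content.

```python
from html import escape
from urllib.parse import parse_qs, quote, unquote, urlsplit


def canonical_path(url, article_prefix="/wiki/", script="/wiki/index.php"):
    """Map any of the four entry-point styles to /wiki/Article_Name."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    query = parse_qs(parts.query)           # also decodes '+' into a space
    if "title" in query:                    # /wiki/index.php?title=Article+Name
        title = query["title"][0]
    elif path.startswith(script + "/"):     # /wiki/index.php/Article_Name
        title = path[len(script) + 1:]
    else:                                   # /wiki/Article_Name, with or without ?action=view
        title = path[len(article_prefix):]
    return article_prefix + quote(title.replace(" ", "_"))


def canonical_link(url):
    """Build the <link> tag that goes in the page head."""
    return '<link rel="canonical" href="%s">' % escape(canonical_path(url), quote=True)


for entry_point in [
    "/wiki/Article_Name",
    "/wiki/index.php/Article_Name",
    "/wiki/index.php?title=Article+Name",
    "/wiki/Article_Name?action=view",
]:
    print(canonical_link(entry_point))
```

All four entry points produce the same tag, so search engines can credit a single url no matter which one they crawled.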

I’ve written a new Mediawiki Extension for supporting the google canonical href tag at Wikia. It’s open source, and available at Wikia’s SVN repo for all to use. I will be contributing it to the core Mediawiki software as an extension soon. Update: Now available in the Wikimedia SVN repo

This is a big help outside of Mediawiki as well. Take "printable" pages as an example, or even urls with extra parameters in the query string: the canonical tag can funnel all of the page rank into one version of the page.

Kudos to Google (esp. Matt Cutts), Yahoo!, and MSN on coming together to provide a clean and elegant solution to help fight the duplicate content problem.

Published by nick on 11 Feb 2009

I only block for memcached

[Image: "I only block for memcached" T-shirt]

Want to know the magic secret to building scalable apps? Don’t have blocking calls. Think back to all the performance problems you’ve ever had. Chances are your app was waiting on one of the following:

  • Database
  • HTTP (a remote web service)
  • File system (either local or remote [NFS])
  • Some other blocking service

Remove all these, and your app is fast! The one exception that has always treated me well is memcached. memcached in and of itself doesn’t block, and if you can write an app that only blocks on memcached, you’ve written a very well scaled app.

You may think I’m exaggerating. And maybe I am, to make a point. But I just built an app that scales using memcached, and it powers all of Wikia’s ad traffic on two servers, handling approximately 500 transactions a second, and storing 7500 pieces of data per second.

Actually, one server can handle the load, I just use two for redundancy. ;-)

Scale with memcached, skip all the other blocking calls! Tip: Ajax isn’t blocking.
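Here is one way the shape of such an app could look, as a hedged Python sketch rather than the actual Wikia code. FakeMemcached is an in-process stand-in I made up so the example runs without a server (a real memcached incr, for instance, fails on a missing key instead of creating it), and the ad-serving functions and key names are hypothetical.

```python
import queue


class FakeMemcached:
    """In-process stand-in for a memcached client (get/set/incr only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        # Simplification: creates the counter if it is missing.
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]


cache = FakeMemcached()
pending_writes = queue.Queue()  # drained by a background job, never by a request


def serve_ad(slot):
    """Request path: touches only the cache, never the database."""
    ad = cache.get("ad:" + slot)
    if ad is None:
        ad = "house-ad"  # fall back instead of waiting on a slow lookup
    count = cache.incr("impressions:" + slot)
    pending_writes.put((slot, count))  # remembered for later persistence
    return ad


def flush_to_database(save):
    """Background path: the only place a slow, blocking store gets called."""
    flushed = 0
    while not pending_writes.empty():
        save(*pending_writes.get())
        flushed += 1
    return flushed
```

The design choice is that everything slow happens in flush_to_database, outside of any user request, so the request path never waits on anything but the cache.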

I liked this idea so much that I had T-shirts made that say "I only block for memcached". I’ve sold a couple to friends, and I will be giving one to Brad Fitzpatrick, the creator of memcached, as a Thank You from all of the users of memcached. Thanks Brad!

Published by nick on 19 Jan 2009

Automatically figure out which social bookmarking site to use with css

With dozens of social bookmarking sites out there, displaying links to all of them is ridiculous (although I’ve seen it). Why not just display the ones that the user visits?

This write-up (from someone I work with at Wikia) explains how: by using the CSS for "visited" links.

http://www.azarask.in/blog/post/socialhistoryjs/

A bit creepy, but interesting enough to pass along.

Published by nick on 13 Jan 2009

Ultimate vimrc file - Good for php, bash, ruby and others

I’ve built this one up for a few years, and now it’s time to share. Notable features:

  • Syntax highlighting
  • Test compile for syntax errors. This means that every time you write the file, it will check the file for syntax errors and alert you immediately. This saves much back and forth with development. It works with the following languages:
    1. php
    2. bash
    3. perl
    4. httpd.conf
    5. xml
    6. ruby
    7. puppet
    8. javascript (if you have jslint installed)
  • Tab completion of php functions (if you download Rasmus’s function list and put it in your home directory: curl -o ~/.phpfunclist.txt -v http://lerdorf.com/funclist.txt)

    Ok, enough teasing. The ultimate vimrc file can be found here

Published by nick on 07 Nov 2008

Apple - Please put Wireless Broadband in every MacBook

Dear Steve Jobs,

Right now I use Verizon Wireless broadband. It’s better than not having it, but I would much prefer to have it built into my Macbook Pro. I currently have an external USB device, and while it works, it is clunky at best.

Please partner with AT&T, so that I pay one bill, and get my iPhone and wireless Broadband together. Do your usual routine where you make it easy to use, painless, intuitive, etc.

It shouldn’t be that difficult to pull off, as you already have 3G iPhones available. They already have wireless broadband built into laptops in Europe, check there for more info.

Thanks,

Loyal Apple Customer
