Archive for September, 2007

Published by nick on 25 Sep 2007

JSON vs XML vs serialize() for data

Who is this Jason guy and what does he want with our data? JSON stands for JavaScript Object Notation, and I think there are some pretty compelling reasons to use it every time instead of PHP’s serialize() function, and maybe even to replace XML under some circumstances.

Why use JSON instead of XML for Asynchronous Javascript requests for your favourite web application? Well, why not? After all, it’s the simplest approach.

We’re going to build a widget for looking up the cities within 5 miles of a user-supplied zip code. We want the user to enter their zip code into a form and, without reloading the page, use Javascript to go get data from our web service and display it on the page. Let’s look at using JSON for the data type versus using XML.

JSON for web service

Server side:


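// getLocationsForZip() is the application's own lookup function; $zip comes from the request.
header('Content-Type: application/json');
$locations=getLocationsForZip($zip);
echo json_encode($locations);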

Client side:


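// fetch() stands in for a helper that returns the response body as a string.
var json=fetch(url);
// JSON.parse() only reads data; a bare eval() would run any code in the response.
var locations=JSON.parse(json);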
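
Here is a slightly fuller sketch of the client side, with the network call left out. The response body and its city/state/country fields are made up for illustration (they mirror the XML example in this post), and parseLocations() and formatLocation() are hypothetical helper names.

```javascript
// Hypothetical response body; the real one would come from the web service.
const responseBody =
  '[{"city":"Sunnyvale","state":"CA","country":"US"},' +
  '{"city":"Cupertino","state":"CA","country":"US"}]';

// Parse the body and make sure it has the shape the widget expects.
function parseLocations(body) {
  // JSON.parse accepts data only; unlike eval(), it never executes the input.
  const parsed = JSON.parse(body);
  if (!Array.isArray(parsed)) {
    throw new TypeError("expected a JSON array of locations");
  }
  return parsed;
}

// Turn one location object into a line of text for the page.
function formatLocation(loc) {
  return `${loc.city}, ${loc.state} (${loc.country})`;
}

const locations = parseLocations(responseBody);
const lines = locations.map(formatLocation);
console.log(lines.join("\n"));
```

The result is already a plain array of objects, so there is nothing left to convert before displaying it.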

XML approach

Server side:


$locations=getLocationsForZip($zip);
if (!empty($locations)){
  echo '<?xml version="1.0" encoding="utf-8"?>' . "\n";
  echo "<locations>\n";
  foreach ($locations as $location){
    // Assumes each $location is an associative array with city, state and country keys.
    echo "<location>\n";
    echo "<city>" . htmlspecialchars(utf8_encode($location['city'])) . "</city>\n";
    echo "<state>" . htmlspecialchars(utf8_encode($location['state'])) . "</state>\n";
    echo "<country>" . htmlspecialchars(utf8_encode($location['country'])) . "</country>\n";
    echo "</location>\n";
  }
  echo "</locations>\n";
}

Client side:


var xml=fetch(url);
// TODO: write nasty code for parsing XML into a DOM object
// More nasty code to iterate through the DOM object and put it into a usable array.

Now, that makes the most sense for Asynchronous Javascript requests, but going further, does it make sense to do it even for normal data transport mechanisms?

Language Neutral Data Storage

We need to store data on the file system or in a database, and it needs to be programming language independent. Historically, XML has been the obvious choice for this task. I’ve seen some people use serialized PHP (yuck). You could roll your own pipe-delimited or CSV kludge. I think JSON is the best choice.

  1. JSON is generally faster than XML, both to construct and to parse. Those of us who have [tried] to build scalable applications using XML/XSL have learned… not to.
  2. JSON is language neutral, and built in! Every major language now has JSON encode/decode capabilities.
  3. JSON is simpler. Just run json_encode() in PHP, parse the result in Javascript (JSON.parse() is safer than a bare eval()), and you’re done. See above examples.
  4. JSON is more compact than XML. Less data on the disk, less data over the wire.
  5. JSON is defined as Unicode text (UTF-8 by default), so it handles encoding issues for you (for the most part). PHP’s serialize() does NOT, and this will cause problems for you when you internationalize.
  6. JSON maintains structure and objects, unlike pipe-delimited or CSV hacks.
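
To make points 5 and 6 concrete, here is a small round trip. The record and its field names are invented for the example; the comparison at the end uses a naive pipe-delimited join of the kind mentioned above.

```javascript
// Made-up record with nesting, a number, and non-ASCII text.
const record = {
  zip: "94089",
  radiusMiles: 5,
  cities: [{ name: "São José", tags: ["north|south", "bay"] }],
};

// Store as JSON text, then read it back.
const stored = JSON.stringify(record);
const restored = JSON.parse(stored);

// The same tags pushed through a naive pipe-delimited format.
const piped = record.cities[0].tags.join("|");
const fromPipes = piped.split("|");

console.log(restored.cities[0].name);                        // São José
console.log(restored.cities[0].tags.length, fromPipes.length); // 2 3
```

The JSON copy comes back with the same nesting, types, and characters. The pipe-delimited copy turns two tags into three because one of the values happens to contain the delimiter.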

The more I use JSON for building my web applications, the more I’m finding that my reasons for using XML are fading away. Fast. XML is more difficult to deal with.

If you really feel compelled to use XML, do the rest of us a favor and make your webservice support both formats. Tip: Build a RESTful webservice and allow for a .json extension in your url in addition to .xml.

Props to Chris Cowan for helping me to see the light, and Douglas Crockford for evangelizing the use of Javascript.

-Nick

Published by nick on 23 Sep 2007

When NOT to use a database abstraction layer in PHP

The most common reason that people cite for using a database abstraction layer is that you can change the underlying database and your application doesn’t need to be rewritten.

From an academic perspective, this makes sense, and it’s hard to argue with. Especially in an interview or when you are talking to the fresh college grad with a Computer Science degree.

In the real world, it doesn’t happen.

Show me one person that has built their application with one database in mind, and then switched to another without having to rewrite some code, and I’ll show you 200 others that built their project using a database abstraction layer and they will never change the underlying database.

The concept of database neutrality is valuable under certain circumstances. If you are writing an open source software platform such as MediaWiki, and people that you don’t know will be installing the software on their own servers, and you want them to be able to use the database of their choice, then using a database abstraction layer for database neutrality is a clear winner.

There are other good reasons to use database abstraction layers:

  • Convenience methods that you want to be able to reuse. My personal favorite example is getRowFromSql(), where you give it a sql statement that returns 1 row, and it will return an associative array with the columns/values.
  • Common configuration settings, i.e., your username, password, and host information are all in one place.
  • Reuse of database connections for performance
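
A rough sketch of those three conveniences in one small wrapper, written in Javascript for illustration. Only the getRowFromSql() name comes from this post; the Database class, the injected connect function, and the synchronous query() method are assumptions, not any real driver's API.

```javascript
// Hypothetical wrapper: shared settings, one reused connection, one convenience method.
class Database {
  constructor(config, connect) {
    this.config = config;   // host, user and password live in one place
    this.connect = connect; // driver-specific connect function, injected (assumed)
    this.connection = null;
  }

  // Open the connection on first use, then keep handing back the same one.
  getConnection() {
    if (this.connection === null) {
      this.connection = this.connect(this.config);
    }
    return this.connection;
  }

  // Run a statement that must return exactly one row; return it as a plain object.
  getRowFromSql(sql, params = []) {
    const rows = this.getConnection().query(sql, params);
    if (rows.length !== 1) {
      throw new Error(`expected 1 row, got ${rows.length}`);
    }
    return rows[0];
  }
}

// Stand-in driver so the sketch runs without a real database.
let connectCalls = 0;
function fakeConnect(config) {
  connectCalls += 1;
  return {
    query(sql, params) {
      return [{ id: params[0], username: "smartguy" }];
    },
  };
}

const db = new Database({ host: "localhost", user: "app", password: "secret" }, fakeConnect);
const row = db.getRowFromSql("SELECT id, username FROM people WHERE id = ?", [323]);
const again = db.getRowFromSql("SELECT id, username FROM people WHERE id = ?", [323]);
console.log(row.username, again.id, connectCalls);
```

Nothing here tries to hide which database is underneath. It is a thin layer for convenience, which is the kind of abstraction this post argues for.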

These are enough for me, and they are the reason I use database abstraction layers. But if the only reason you are using a database abstraction layer is so that you can easily switch your database to something else in the future, you’re nuts, and you may want to make sure you are following the number one rule of software.

Published by nick on 23 Sep 2007

Simplicity in software design

Rule #1 for building software. Simplicity.

Most good discussions wind up quoting someone famous, so here’s mine to get us started:

“Everything should be made as simple as possible, but not simpler.”
-Albert Einstein

As geeks, we are drawn to adding layers of abstraction. Maybe it’s because it makes us feel smart. Maybe it makes us feel clever. Maybe we think we are better developers because we are engineering for the future. Maybe we are trying to impress our colleagues by showing that we “follow best practices”.

Why do we add layers of abstraction? Well, we tell ourselves it’s to make future changes easier. That makes sense. But too often, the scenario for the abstraction layer becoming worthwhile is contrived beyond practical reality. When that happens, you have become a cargo cult programmer.

So we add a layer of abstraction to our code under the guise of making it more extensible, usually without fully considering the key consequence of layers of abstraction: increased complexity. More layers to dig through to find problems. More code to maintain. More code to document. More code for the new guy to understand before he can be productive.

Beyond the human implications of having more complexity, don’t forget it’s harder on our servers too, because those extra layers of code also take more CPU cycles to execute.

My suggestion? When you add a layer of abstraction, make sure that it’s needed in the foreseeable future, and that the scenario for its use is very likely and thought through. Otherwise, skip it and keep things simple.

Published by nick on 22 Sep 2007

Don’t use YAML for PHP, use parse_ini_file

YAML is a syntax for configuration files.

Huh?

Why on earth do we need another technology for config files?

I first ran across YAML while working with Symfony. YAML’s use in PHP is especially troubling, because PHP has a built-in function for parsing config files, parse_ini_file(). This config file syntax is the same as the php.ini file, so it is well known by all PHP system administrators. It’s human readable, supports basic name/value pairs, and allows for comments.

It’s also high performance. Since it’s a built-in function, config files are parsed with the speed of C instead of text processing with PHP. This may not matter much for Joe Blow’s blog website, but when you have lots of users, this makes a difference.

From what I saw of its use within Symfony, everything could have been easily done using parse_ini_file(). To make matters worse, when it was noticed that there were performance problems with parsing YAML, the Symfony authors decided to add a caching layer. Great. More complexity. This violates the number one rule of good software: Simplicity.

The Symfony authors should have used an existing technology that was built into PHP for handling config settings. Don’t make the same mistake.

Published by nick on 16 Sep 2007

sudo make me a sandwich

If only our wives behaved like our servers:

Man: Make me a sandwich.
Woman: No.
Man: sudo Make me a sandwich.
Woman: Okay.

For those that don’t get it, sudo is a command in Unix that executes a command as the super user, root. Often you are working with Unix and try to do something as a normal user, but get a permission denied message. You then use sudo with the same command to run it as root, and it works.

-Nick

Published by nick on 16 Sep 2007

Web 4.0

Very interesting write up on what could be coming.

http://sethgodin.typepad.com/seths_blog/2007/01/web4.html

Some of us are likely to freak out about privacy concerns… but I for one am eager to embrace some of the cool technology innovations mentioned here, such as:

“I’m late for a dinner. My GPS phone knows this (because it has my calendar, my location, and the traffic status). So, it tells me, and then it alerts the people who are waiting for me.”

-Nick

Published by nick on 16 Sep 2007

REST for the real world

Then I looked at Wikipedia for the definition of REST, and it let me down. It told me that REST stands for Representational State Transfer and covered the fundamentals of the idea as it relates to Roy Fielding’s doctoral dissertation, but I didn’t find it very helpful. I was left asking:

How does this apply in the real world?

I don’t know about you guys, but for me, when a technical exploration winds up centered in academic trivia, it’s frustrating. I’m always looking for the real world implications.

I kept looking beyond Wikipedia, and I now understand its usage within Yahoo!, and I think it has potential implications for all, so I’ve decided to share my findings.

Here’s how REST can be used in the real world (at least my version).

Take a typical CRUD app that has to do the following:

CREATE
READ
UPDATE
DELETE

Overlay this on a typical web CRUD framework, and you have a resulting hypothetical url structure that looks something like this:

CREATE /people/create?username=smartguy&pass=geeksforfun
READ /people/edit?id=323
UPDATE /people/update?id=323&username=smartguy&pass=geeksforfun
DELETE /people/delete?id=323

This should look familiar to all geeks. It can be handled with separate files for the different actions, or use an MVC framework with a controller that intercepts the call and redirects it to a specific action.

Now, consider the built in HTTP methods GET, PUT, POST, DELETE. We all know GET and POST. PUT and DELETE are less common, but they’ve existed since HTTP 1.1. Notice the following associations:

CREATE -> PUT
READ -> GET
UPDATE -> POST
DELETE -> DELETE

Some of you may have just had the light bulbs start to flicker. Next step. Let’s put these together! While we are at it, shorten the urls in the name of SEO. Let’s take a look at our new url structure:

CREATE - PUT /people/
READ - GET /people/323
UPDATE - POST /people/323
DELETE - DELETE /people/323

This is the first component of REST: using the appropriate HTTP request method to trigger the corresponding CRUD action. Your accepting application should check the request method (which is available in PHP as $_SERVER['REQUEST_METHOD']) and use that to dispatch to the appropriate action.
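
As a minimal sketch of that dispatch step, here it is in Javascript. The dispatch() function and the ACTIONS table are hypothetical names. The mapping is the one used in this post (PUT creates, POST updates); many APIs do it the other way around, using POST to create and PUT to update.

```javascript
// Request method -> CRUD action, following the mapping in this post.
const ACTIONS = new Map([
  ["PUT", "create"],
  ["GET", "read"],
  ["POST", "update"],
  ["DELETE", "delete"],
]);

// Work out which action a request to /people/ or /people/<id> should trigger.
function dispatch(method, path) {
  const match = /^\/people\/(\d+)?$/.exec(path);
  if (!match) {
    return { action: null, id: null }; // not a /people/ url at all
  }
  const action = ACTIONS.get(method.toUpperCase()) ?? null;
  const id = match[1] ? Number(match[1]) : null;
  return { action, id };
}

console.log(dispatch("GET", "/people/323"));
console.log(dispatch("PUT", "/people/"));
console.log(dispatch("DELETE", "/people/323"));
```

The url only names the thing being acted on; the verb comes entirely from the request method.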

Now, the second component of REST is using HTTP status codes for error handling. We already have plenty of status codes built into HTTP. Let’s use them. Here are a few samples:

200 if it worked.
403 for permission denied.
304 for not modified.
404 for an invalid id
50x for errors
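
A small sketch of a read handler that reports its outcome through the status code alone. The in-memory people store, the requester object with its canRead flag, and readPerson() are all invented for the example, and 304 is left out because it depends on conditional request headers.

```javascript
// Made-up in-memory store standing in for a real data source.
const people = new Map([[323, { username: "smartguy" }]]);

// Return a status code and body for a read of one person.
function readPerson(id, requester) {
  try {
    if (!requester.canRead) {
      return { status: 403, body: null }; // permission denied
    }
    const person = people.get(id);
    if (person === undefined) {
      return { status: 404, body: null }; // invalid id
    }
    return { status: 200, body: person }; // it worked
  } catch (err) {
    return { status: 500, body: null };   // something broke on our side
  }
}

console.log(readPerson(323, { canRead: true }).status);
console.log(readPerson(999, { canRead: true }).status);
console.log(readPerson(323, { canRead: false }).status);
```

The caller never has to parse an error message out of the body to find out what happened.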

More lightbulbs?

The third component of REST is handling different representations of the data. This is accomplished by specifying an extension on the request.

GET /people/323.txt
GET /people/323.csv
GET /people/323.pdf
GET /people/323.xml

The above calls would all send back the same data, but in different formats based on the extension.
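
Here is one way that could look, as a sketch. The RENDERERS table, the represent() function, and the sample person are hypothetical, falling back to JSON when there is no extension is my own assumption, and the CSV renderer does no escaping.

```javascript
// Sample resource; in a real service this would come from storage.
const person = { id: 323, username: "smartguy" };

// Extension -> content type and renderer.
const RENDERERS = new Map([
  ["json", { type: "application/json", render: (p) => JSON.stringify(p) }],
  ["txt", { type: "text/plain", render: (p) => `id: ${p.id}\nusername: ${p.username}` }],
  ["csv", { type: "text/csv", render: (p) => `id,username\n${p.id},${p.username}` }],
]);

// Pick a representation of the resource based on the url's extension.
function represent(path, resource) {
  const match = /\.([a-z0-9]+)$/i.exec(path);
  const ext = match ? match[1].toLowerCase() : "json"; // assumed default
  const renderer = RENDERERS.get(ext);
  if (!renderer) {
    // Treating an unsupported extension as "no such url" is a design choice.
    return { status: 404, type: "text/plain", body: "unknown format" };
  }
  return { status: 200, type: renderer.type, body: renderer.render(resource) };
}

console.log(represent("/people/323.csv", person).body);
console.log(represent("/people/323.txt", person).type);
console.log(represent("/people/323.pdf", person).status);
```

Adding a new format means adding one entry to the table; the lookup code and the data stay the same.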

And the fourth component, and this one is key, is that the same methodologies are used by all the data types.

For example:
GET /people/323.txt
GET /stuff/323.txt
GET /things/323.txt
PUT /stuff/
DELETE /things/323.xml

So you have consistency amongst all of the data elements. They all accept a common set of actions, so all of your web services behave the same way.

In summary, a RESTful web service should:
1) Use the GET, POST, PUT, DELETE request methods defined by HTTP to trigger the various actions.
2) Use the extension on the request to determine the content type of the response (xml, csv, txt, jpg, etc).
3) Use HTTP status codes (200, 304, 400, 405) to indicate the status of the operation.
4) Be simple and consistent. Implement the same methods for CRUDding across all data elements.

Interesting model for web services.

Some may be asking: isn’t this just SOAP or XML-RPC? Kinda, in that they are all methodologies for implementing web services. REST relies on HTTP methods and status codes, rather than XML, for triggering actions.

Additional reading that I found useful:
http://doc.opengarden.org/Articles/REST_for_the_Rest_of_Us
http://webservices.xml.com/pub/a/ws/2003/09/30/soa.html
http://www.xfront.com/REST-Web-Services.html
http://tech.groups.yahoo.com/group/rest-discuss/messages

-Nick

Published by nick on 14 Sep 2007

Teach your kids web development

It’s been a fun experience teaching Conor, my oldest son, about HTML. He’s 11 years old. He’s excited, and I have to admit that I am too. There are some thought provoking questions that come up when you start teaching your children your trade. You wind up questioning your own career. Is this what’s right for him? Do I really want him to be me? You see, workaholism is a blessing, but it’s also a disease. Shouldn’t he be allowed to just be a kid?

Well, I’ve wrestled with that moral quandary enough for now. Hell, I am not paying for his college education, so the boy needs to learn. My focus has become how to teach him web development. Where to start with web development?

My first mantra was that he needs to understand vi[m]. F**k WYSIWYG editors. He’s going to learn the right way, right from the start, with no crutches. Conor. Repeat after me. “WYSIWYG editors are for wussies”.

This means he also needs to learn SSH. I recently bought him a little iBook. Did you know that there is a school program for kids with laptops? Amazing! Imagine if Einstein had had access to e-mail from the time he was 11. He wouldn’t have come up with the theory of relativity, but I bet we’d have some revolutionary ways to view porn.

The first day was mostly Unix basics. Vi, ssh, cd, creating accounts, terminal on OSX. He learned the basics of file systems: cp, mv, rm. He learned when [not] to use sudo. He learned when to STOP and ask, and when it was ok to experiment. So many people stunt their learning because they don’t know when it’s ok to mess around.

He learned how to make a change with vi over an SSH terminal, then view it in a browser. Surprisingly, he picked up vi pretty well, although he uses the arrow keys instead of h/j/k/l. Damn vim. It spoils newcomers. When I was his age… I used vi, which forced you to use h/j/k/l for left/down/up/right.

The second day he learned the basic structure of HTML, bold, center, italic, img, anchor tags. Rather uneventful, he learned very quickly. The light bulbs started going off when I told him that he could create a web page for his little brothers and sisters.

The third day we worked on HTML tables. This was an interesting exercise. Should I even teach him tables, or should I just leapfrog tables and go straight to CSS? I decided it was best for him to learn tables, to know what they are, and I still think that they are the best way for tabular data to be represented, even though I haven’t updated my resume to use table-less layout. Someday…

On the fourth day we sat down, and I created a diagram of an HTML page using Omnigraffle (diagram tool for Mac), and then had Conor recreate this using HTML. He learned all about tables, rows, columns, alignment, colspan, and background color.

Then I showed him divs and had him recreate the same layout without using tables. Now he’s working on updating my resume to use a tableless layout. Look at that, he’s already being productive. I hope he knows what he’s in for.

What’s up next? PHP? Javascript? I’m not sure he’s ready. We will let him get good with HTML and CSS first.

Published by nick on 14 Sep 2007

How to get started in computers

Every accomplished computer geek has had someone come up to them and ask them how they get started in computers. Sometimes it’s an awkward conversation, because the person obviously isn’t qualified. It usually goes something like this.

Jimmy: “So what is it that you do for a living?”

(thinking to myself): So you mean you can’t tell by the glasses and the bad choice in clothes?

Me: “I’m a computer geek.”

(Jimmy gets excited)

Jimmy: “Where do you work?”

(thinking to myself): I shouldn’t tell him, he’ll be following me around all night.

Me: “I work for Yahoo!”

Jimmy: “Oh wow! I use Yahoo! Mail. But didn’t Google buy them?”

Me: Sigh. “No. Yahoo and Google are completely separate companies. Google does very well with search. Yahoo does other things very well.” (comparison)

Jimmy: “You know, I’ve been wanting to get started in computers. All the people around me are afraid of ‘em, but not me. I got e-mail and youtube and everything.”

Me: “Huh. Yeah. E-mail is pretty cool. And did you see the fire fart video on Youtube?”

(thinking to myself): Hopefully the firefart video will distract him.

Jimmy: “So do you think you can get me a job at Yahoo! doing computer stuff?”

Me: “Maybe in a few years. Just keep surfing around and teach yourself. That’s all I did.”

When people talk to me, they are often amazed that I’m self-taught. “You mean you didn’t go to school for this stuff?” They always want to know how I got started. That’s a whole ‘nother story.