RESTful Web Services, a book review
After hearing about it on an ITC podcast (also mentioned in a previous post), I ordered the “RESTful Web Services” book from Amazon (and got into an argument with Sam Ruby, one of its authors, about Python pain points; he seems to be a Ruby convert these days). It arrived last week, and I spent a good part of this weekend browsing through it, so this is a review of sorts.
This book has a star rating of 5 out of 5 on Amazon, and I agree it’s worth it. It covers a lot of ground and includes material rarely found elsewhere.
The first part covers REST basics, like the meaning and usage of the HTTP methods GET, PUT, DELETE, and POST, as well as HEAD. I’m pretty familiar with that part of REST and HTTP, but people who don’t know the basic premise of REST, and haven’t heard of conditional GETs, partial GETs, caching, content negotiation, compression, and the various HTTP status codes, should read the first part carefully: the whole premise of REST is to use HTTP the way it was designed, so you’d better know HTTP. The Wikipedia entry on REST gives a good basic introduction, and I find that many web site optimization sites give a good introduction to HTTP features.
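Conditional GETs are a good example of why knowing HTTP pays off. Here’s a minimal Python sketch of the server side, assuming the server derives a strong ETag by hashing the representation (one common choice, not the only one); the function names are my own, not the book’s:

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the representation's bytes (an assumed scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body: bytes, request_headers: dict) -> tuple:
    """Answer a GET, honouring If-None-Match.

    Returns (status, response_headers, response_body). If the client's cached
    copy is still current, reply 304 Not Modified with an empty body.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    # If-None-Match may carry several comma-separated ETags, or "*".
    if_none_match = request_headers.get("If-None-Match", "")
    client_tags = [t.strip() for t in if_none_match.split(",") if t.strip()]
    if "*" in client_tags or etag in client_tags:
        return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


# The first request gets the full body; revalidating with the ETag gets a 304.
page = b"<html><body>Hello</body></html>"
status, headers, _ = conditional_get(page, {})
print(status)  # 200
status, _, body = conditional_get(page, {"If-None-Match": headers["ETag"]})
print(status, body)  # 304 b''
```

The saving is that an unchanged page costs one short response instead of the whole representation; weak ETags (`W/"..."`) are not handled in this sketch.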
The authors also show how to use existing best-practice web services such as Amazon’s S3 and Google.
The meat of the book is in the middle, where the authors show how to design a read-only and a read-write web service in a RESTful way. A few things were new and interesting to me:
- A step-by-step guide to designing a RESTful web services app
- A discussion of URI design conventions
- An overview of hypermedia formats (e.g., XML and microformats) that can be used to represent state
- Discussions about the tricky situations where REST just doesn’t seem to fit, e.g., when you need transactions
I’ll summarize the first three.
A guide for building RESTful web services apps
Or, as they say in the book, a procedure for building resource-oriented architectures:
- Figure out the data set
- Split the data set into resources
- For each kind of resource:
  - Name the resources with URIs (see below)
  - Expose a subset of the uniform interface (GET, PUT, DELETE, POST, HEAD)
  - Design the representation(s) accepted from the client (see below)
  - Design the representation(s) served to the client (see below)
  - Integrate this resource into existing resources, using hypermedia links and forms
  - Consider the typical course of events: what’s supposed to happen?
  - Consider error conditions: what might go wrong?
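To make the procedure a bit more concrete, here’s a minimal Python sketch for a hypothetical bookmark service. The data set, the `/bookmarks` URIs, and the function names are my own illustration, not the book’s example. It covers three of the steps: naming resources with URIs, exposing a subset of the uniform interface, and deciding which status code to return in the typical and error cases.

```python
from http import HTTPStatus
from typing import Optional, Tuple

# Hypothetical data set: bookmarks keyed by id.
BOOKMARKS = {"1": {"url": "http://example.com/", "title": "Example"}}

# Each kind of resource exposes only a subset of the uniform interface.
ALLOWED = {
    "bookmark_list": {"GET", "HEAD", "POST"},      # /bookmarks
    "bookmark": {"GET", "HEAD", "PUT", "DELETE"},  # /bookmarks/{id}
}


def classify(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Name resources with URIs: map a path to (resource kind, id)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["bookmarks"]:
        return "bookmark_list", None
    if len(parts) == 2 and parts[0] == "bookmarks":
        return "bookmark", parts[1]
    return None, None


def status_for(method: str, path: str) -> HTTPStatus:
    """Consider the typical course of events and the error conditions."""
    kind, ident = classify(path)
    if kind is None:
        return HTTPStatus.NOT_FOUND           # no such resource
    if method not in ALLOWED[kind]:
        return HTTPStatus.METHOD_NOT_ALLOWED  # outside the exposed subset
    if kind == "bookmark" and method == "PUT":
        # PUT creates the bookmark if it is new, or replaces it otherwise.
        return HTTPStatus.OK if ident in BOOKMARKS else HTTPStatus.CREATED
    if kind == "bookmark" and ident not in BOOKMARKS:
        return HTTPStatus.NOT_FOUND
    if method == "POST":
        return HTTPStatus.CREATED             # new bookmark under /bookmarks
    return HTTPStatus.OK


for method, path in [("GET", "/bookmarks/1"), ("DELETE", "/bookmarks"),
                     ("GET", "/bookmarks/42"), ("PUT", "/bookmarks/42")]:
    print(method, path, status_for(method, path).value)
```

Running the loop prints 200, 405, 404, and 201 for the four requests. Representations and hypermedia links are left out; they would sit on top of this routing.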
URI design conventions
This is important, and fortunately easy to get right once you know the main conventions to align with:
- Use path variables to encode hierarchy, e.g., /parent/child
- Use punctuation characters in path variables to avoid implying hierarchy where none exists:
  - Use a semicolon when the order is not important, e.g., /parent/child1;child2
  - Use a comma when the order is important, e.g., /coordinate/123.12,42.5
- Use query variables to imply inputs into an algorithm, e.g., /search?q=jellyfish&lang=zh
- If you need key=value pairs in the middle of the URI, you can use matrix URIs, e.g., /demographic/age=30;gender=m/persons/
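Here’s a small Python sketch of how a server might pull those pieces back out of a URI using only the standard library’s `urllib.parse`. The helper names are hypothetical, and the caller is assumed to know which convention applies to which path segment:

```python
from urllib.parse import parse_qs, urlsplit


def path_segments(uri: str) -> list:
    """Slashes encode hierarchy: /parent/child -> ['parent', 'child']."""
    return [s for s in urlsplit(uri).path.split("/") if s]


def unordered_values(segment: str) -> set:
    """Semicolons separate siblings whose order doesn't matter."""
    return set(segment.split(";"))


def ordered_values(segment: str) -> list:
    """Commas separate values whose order matters, e.g. a coordinate pair."""
    return [float(v) for v in segment.split(",")]


def matrix_params(segment: str) -> dict:
    """A matrix URI segment holds key=value pairs separated by semicolons."""
    return dict(kv.split("=", 1) for kv in segment.split(";"))


def algorithm_inputs(uri: str) -> dict:
    """Query variables are inputs to an algorithm, such as a search."""
    return parse_qs(urlsplit(uri).query)


print(path_segments("/parent/child"))                       # ['parent', 'child']
print(ordered_values(path_segments("/coordinate/123.12,42.5")[1]))  # [123.12, 42.5]
print(matrix_params(path_segments("/demographic/age=30;gender=m/persons/")[1]))
print(algorithm_inputs("/search?q=jellyfish&lang=zh"))
```

Note that `urlsplit` (unlike `urlparse`) leaves semicolons in the path alone, which is what the semicolon and matrix conventions need.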
Google’s GData reference explains the additional URI conventions GData adds:
- Categories are encoded in the URI by prepending them with /-/, for example /classifieds/-/jobs
- Query-string arguments with special meaning include:
  - q for full-text queries
  - author for author names
  - updated-min/updated-max to restrict results to documents updated within a time range
  - published-min/published-max, which work like updated-min/updated-max but use the publication time instead of the update time
  - max-results to limit the number of results returned per page
  - start-index to set the (1-based) index of the first result to return
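On the client side, building such a query is mostly string assembly. A minimal Python sketch, assuming a hypothetical feed URL (the parameter names are the ones listed above, and start-index is treated as the 1-based index of the first result):

```python
from urllib.parse import quote, urlencode


def gdata_query_url(feed_url: str, categories: list, params: dict) -> str:
    """Build a GData-style query URL from a feed URL, categories, and parameters."""
    url = feed_url.rstrip("/")
    if categories:
        # Categories go into the path, after the /-/ marker.
        url += "/-/" + "/".join(quote(c, safe="") for c in categories)
    if params:
        # A dict is used because names like max-results aren't Python identifiers.
        url += "?" + urlencode(params)
    return url


url = gdata_query_url(
    "http://example.com/feeds/classifieds",  # hypothetical feed URL
    ["jobs"],
    {"q": "python", "updated-min": "2007-05-01T00:00:00Z",
     "max-results": 10, "start-index": 11},  # second page of ten results
)
print(url)
# http://example.com/feeds/classifieds/-/jobs?q=python&updated-min=2007-05-01T00%3A00%3A00Z&max-results=10&start-index=11
```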
Hypermedia formats
There are many data and representation formats in the world. XML led the last revolution, moving us from obscure binary formats to standardized textual formats.
But although XML is good as an interchange format, it’s not very readable by humans. For that we have HTML (and its XML flavor, XHTML), but HTML as it has traditionally been used is geared mostly towards visual presentation for humans. Isn’t there a format that can satisfy both humans and computers?
Enter microformats: a way of using specific HTML elements and CSS class names to mark up normal HTML so that it’s both easy to render nicely for humans and easy for programs to parse and understand semantically. Most formats you would want to represent are already defined or being defined: events, contact information (people, organizations, companies, places), reviews, resumes, geographic locations, Atom (weblog) entries, voting links, and listings and classifieds.
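To show the “parseable by programs” half, here’s a deliberately minimal Python sketch that reads a few hCard-style properties (fn, org, tel, email) with the standard library’s `html.parser`. The sample markup and person are hypothetical, and the parser only handles flat, non-nested properties; a real microformats parser does much more:

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class HCardParser(HTMLParser):
    """Collect a few hCard properties from flat (non-nested) markup."""

    PROPERTIES = ("fn", "org", "tel", "email")

    def __init__(self) -> None:
        super().__init__()
        self.card: Dict[str, str] = {}
        self._prop: Optional[str] = None  # property currently being read
        self._tag: Optional[str] = None   # tag that opened that property

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # The semantics live in the class attribute, e.g. class="fn".
        classes = (dict(attrs).get("class") or "").split()
        for prop in self.PROPERTIES:
            if prop in classes:
                self._prop, self._tag = prop, tag
                self.card.setdefault(prop, "")
                break

    def handle_endtag(self, tag: str) -> None:
        if tag == self._tag:
            self._prop = self._tag = None

    def handle_data(self, data: str) -> None:
        if self._prop is not None:
            self.card[self._prop] += data.strip()


markup = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
"""
parser = HCardParser()
parser.feed(markup)
print(parser.card)
# {'fn': 'Jane Doe', 'org': 'Example Corp', 'email': 'jane@example.com'}
```

The same markup renders as an ordinary contact block in a browser, which is the point of the approach.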
Often, standardization is just a matter of agreeing on names for concepts, their meanings, and their relationships. One important effort here is the Dublin Core, which defines a common set of metadata attributes (such as title, creator, and date) for content objects. For lists of content objects, RSS and Atom have defined the terms.
I was intrigued by how the different standards build on top of each other, and how one could combine them to achieve maximum compatibility and richness.
- Atom seems to be the base for many initiatives
- OpenSearch is Atom with some more information useful for search results, like the total number of results
- Google’s GData includes OpenSearch and adds even more attributes
- The Atom Publishing Protocol defines a web service for the typical case of listing and publishing Atom entries; any blog service that supports it could be used by third-party blog publishing applications
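To see how the layering looks on the wire, here’s a minimal Python sketch that builds an Atom feed carrying OpenSearch paging elements with the standard library’s ElementTree. I assume the OpenSearch 1.1 namespace; the ids, titles, and counts are made up, and required Atom elements such as author are left out to keep it short:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
OPENSEARCH = "http://a9.com/-/spec/opensearch/1.1/"  # assumed OpenSearch 1.1 namespace
ET.register_namespace("", ATOM)
ET.register_namespace("opensearch", OPENSEARCH)


def search_feed(feed_id, title, updated, entries, total, start_index, per_page):
    """Return an Atom feed with OpenSearch paging elements, as a string.

    entries is a list of (id, title) pairs.
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    # The OpenSearch layer: how many results exist and which slice this is.
    for name, value in (("totalResults", total), ("startIndex", start_index),
                        ("itemsPerPage", per_page)):
        ET.SubElement(feed, f"{{{OPENSEARCH}}}{name}").text = str(value)
    # Plain Atom entries for the current page of results.
    for entry_id, entry_title in entries:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
        ET.SubElement(entry, f"{{{ATOM}}}title").text = entry_title
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    return ET.tostring(feed, encoding="unicode")


doc = search_feed("urn:example:search", "Search results", "2007-05-01T00:00:00Z",
                  [("urn:example:1", "First hit"), ("urn:example:2", "Second hit")],
                  total=42, start_index=1, per_page=2)
print(doc)
```

A plain Atom reader simply ignores the opensearch elements, which is what makes this kind of layering compatible.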
A normal web application consists of list pages and detail pages. It should be possible to make most detail pages use microformats, and to make most list pages use microformats as well as provide a GData-compatible feed and query mechanism.
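The query half of a list page could look something like this minimal Python sketch, which applies q, start-index, and max-results to an in-memory list. The URI, the data, and the plain substring match are all assumptions for illustration; GData’s own full-text semantics are richer:

```python
from urllib.parse import parse_qs, urlsplit


def list_page(uri: str, items: list) -> dict:
    """Answer a GData-style query against a list page's items."""
    query = parse_qs(urlsplit(uri).query)
    text = query.get("q", [""])[0].lower()
    start = max(1, int(query.get("start-index", ["1"])[0]))  # 1-based index
    per_page = max(1, int(query.get("max-results", ["10"])[0]))
    # Naive full-text match: case-insensitive substring search.
    matches = [item for item in items if text in item.lower()]
    return {
        "totalResults": len(matches),
        "startIndex": start,
        "itemsPerPage": per_page,
        "entries": matches[start - 1:start - 1 + per_page],
    }


listings = [f"Listing {n}" for n in range(1, 26)]  # hypothetical list page data
page = list_page("/listings?start-index=11&max-results=10", listings)
print(page["totalResults"], page["entries"][0], page["entries"][-1])
# 25 Listing 11 Listing 20
```

The returned counts line up with OpenSearch’s totalResults, startIndex, and itemsPerPage, so the same result could be rendered as an HTML list with microformats or as a feed.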