Dave’s Blog

Because my handwriting is awful

To centralise or not to centralise 02 Oct 2005

Filed under: architecture, centralisation, essays, microformats, thoughts, web-services — David House @ 8:49 pm

There seems to be a simultaneous movement toward both centralising services and decentralising services.

Web apps like Gmail, Flickr and del.icio.us, ever more popular, are inherently centralised: everyone’s email, and access to that email, is kept on one network. In a similar vein, the rise of things like REST and the Google/Yahoo APIs means that we’re relying on centralised ‘web services’ for our data.

Centralisation makes it easier for the centralised web service to provide statistics on everyone’s data. If the service provides an API, anyone can get at those statistics too, which opens up a huge range of possibilities.

On the flip side we have microformats. The microformat principle is to make it easy to publish data, but do so in a standard way. In an ideal, microformat-filled world, anyone wanting to aggregate everyone’s data can just build a crawler and parser. Compare that to querying one centralised service’s API, and you have the essential difference between centralisation and decentralisation.

Initially, it seems that the microformat guys have got things the wrong way round. Querying one service is a lot easier than writing a crawler. The justification for the decentralised model is that there are a lot of people writing content and far fewer people writing parsers. Thus they’ve made microformats easier to write than to parse.

However, they’ve missed something here. Even though only a minority of people are writing parsers, the reading audience is generally much larger than the producers. Therefore we should make it as easy as possible for people to get access to as much data as possible. Centralised websites like Odeo do this: put the content in one place, then it’s easier for people to get at it.

But then the web wasn’t built that way. Imagine if all the web’s content was centralised into one giant index. Creating a new page wouldn’t be as easy as write-upload, it would be write-inform the index-wait-wait a bit longer-upload. Hmm. I don’t think so. The model the web has followed is a decentralised network. To find anything, you use a crawler. We could use this model for podcasts as well: write a microformat for saying ‘this is a podcast’, then Google writes a ‘Podcast search’ and that becomes everyone’s front door.

I mentioned in the last paragraph that centralising all content in an index makes it slower and more difficult to publish. This isn’t necessarily true: Flickr thrives because it’s easier to upload your photos to Flickr than it is to set up your own website and publish them there. It’s just the fact that the web wasn’t designed for sharing photos, so you’d need to write software if you wanted to share photos on your site. Flickr happens to be a very good system already in place.

So, it seems that centralisation and decentralisation both have their advantages. It seems that for sharing bookmarks and photos, a centralised system is generally easier and more natural. For just generic informational content, the decentralised system that is the web is the way forward.

Let me just finish by mentioning that I avoided using blogging as an example thus far because it’s an interesting case in point. Blogging has traditionally been available in two flavours: centralised systems like Blogger (and now WordPress.com), or decentralised software like Movable Type and, of course, WordPress. Who knows where this will end up?

 

Microformats from a programmer 12 Sep 2005

Filed under: comfort, development, essays, microformats — David House @ 8:29 pm

If you’re interested in structured data or in writing parsers for commonly occurring kinds of content on the web like blog posts, calendars, contact details and so on, then you should check out microformats. Specifically, download the MP3 of the microformats discussion at BarCamp. It serves as a great introduction to microformats from two people who really know what they’re talking about, Ryan King and Kevin Marks.

Even though I’d heard of microformats before, I still learned a lot from listening through the 70 minutes of audio. They talked a lot about the principles for microformat design, which make a lot of sense when explained well. Microformats are a good idea from a producer’s or publisher’s point of view because they’re designed to be easy to read and write, and to fit into existing software.

However, I’m not really a producer or publisher; I definitely fit into the programmer category. What makes microformats really cool for programmers, though, is the fact that they’re standardising problems that every blogger solves every day in his or her own way. They’re saying (for example), ‘If you’re marking up a series of events, do it this way’. This move from implicit structure built on convention to explicit structure built on a formal specification is really exciting. If support for microformats really gets rolling, that means that everyone is marking up their events in the same way. As microformats are also designed to be machine-parseable, that means that everyone’s events are suddenly machine readable and, more excitingly, aggregatable. Just think through the cool things that could be built on a tool that crawls web pages for hCalendar snippets and aggregates them into a centralised list. And that’s just a single microformat. There are so many more. Score one for the openness of the web as a generic publishing platform.
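To make the aggregation idea a bit more concrete, here is a rough sketch in Python. It is not how Comfort or any real aggregator works; it only understands three hCalendar properties (summary, dtstart, location) plus the abbr trick of putting the machine-readable date in a title attribute, and it skips the crawling entirely by taking pages as an in-memory dictionary. The names HCalendarParser, aggregate and PAGES, and the example.org-style URLs, are made up for illustration.

```python
from html.parser import HTMLParser


class HCalendarParser(HTMLParser):
    """Collects events from elements marked up with class="vevent"."""

    FIELDS = ("summary", "dtstart", "location")
    # Void elements never get an end tag, so they must not affect nesting depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None      # event currently being read, if any
        self._depth = 0         # nesting depth inside the current vevent
        self._field = None      # property whose text is being collected
        self._field_depth = 0   # depth at which that property was opened

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._event is None:
            if "vevent" in classes:
                self._event = {}
                self._depth = 1
            return
        self._depth += 1
        if self._field is not None:
            return  # already collecting text for an enclosing property
        for name in self.FIELDS:
            if name in classes:
                if tag == "abbr" and attrs.get("title"):
                    # The title attribute carries the machine-readable value.
                    self._event[name] = attrs["title"]
                else:
                    self._field = name
                    self._field_depth = self._depth
                    self._event[name] = ""
                break

    def handle_data(self, data):
        if self._field is not None:
            self._event[self._field] += data

    def handle_endtag(self, tag):
        if self._event is None or tag in self.VOID:
            return
        if self._field is not None and self._depth == self._field_depth:
            self._event[self._field] = self._event[self._field].strip()
            self._field = None
        self._depth -= 1
        if self._depth == 0:
            self.events.append(self._event)
            self._event = None


def aggregate(pages):
    """Takes a {url: html} mapping; returns every event, sorted by start date."""
    events = []
    for url, html in pages.items():
        parser = HCalendarParser()
        parser.feed(html)
        parser.close()
        for event in parser.events:
            event["source"] = url
            events.append(event)
    return sorted(events, key=lambda e: e.get("dtstart", ""))


# Two hypothetical blogs, each marking up an event in its own surrounding HTML.
PAGES = {
    "http://alice.example/": (
        '<div class="vevent"><span class="summary">Book club</span> on '
        '<abbr class="dtstart" title="2005-10-20">20 October</abbr> at '
        '<span class="location">Town library</span></div>'
    ),
    "http://bob.example/": (
        '<p>Hi!</p><p class="vevent">'
        '<abbr class="dtstart" title="2005-09-30">Friday</abbr>: '
        '<b class="summary">Pub quiz</b></p>'
    ),
}

for event in aggregate(PAGES):
    print(event.get("dtstart"), event.get("summary"), event["source"])
```

The two pages share nothing except the class names, and that is all the parser needs: the publishers write ordinary HTML, and the aggregator only has to look for vevent and its properties.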

You probably don’t know, but I’m working on a generic compound microformat parser, Comfort. The current release does fairly well at parsing the basic hCalendar spat out by Comfort’s parent project, bCal, but I’m working on making it a more useful parser. I’ll be throwing a lot of data at it because I want this to grow into a really powerful parser. Microformats aren’t exactly easy to parse (as they’re designed for humans first, all the trade-offs and compromises that are a natural part of specification design make microformats that bit more difficult to parse), but I think there’s a real need for a generalised compound microformat parser. It would have to be easy to extend, as the microformat project is growing, and hopefully will continue to grow, at unbelievable rates.

If you’ve never seen microformats before, do check out microformats.org and specifically read the introduction on their wiki. Whether you’re just an Average Joe blogger or are interested in building tools that can pull from the biggest database of all, microformats should be the way of the future. Let’s just hope they aren’t crushed by the corporate weight of the W3C and its Semantic Web.