Who, me?

A Parcel of Penguins (geekery)

January 7th, 2004 (09:12 pm)
current mood: geeky
current song: Annwn - The Bard's Exhortation to the Salaryman

I've been thinking for the past few months about the possibility of building a Linux storage cluster. Such a thing would serve the same sort of role as a disk array, but would be vastly cheaper, because it would be built out of commodity hardware. Companies like Hitachi charge something like $5,000 per terabyte; I think a cluster could be built for more like $1,600 per TB, or about a third of the price.

I'm calling it Parcel of Penguins, because "parcel of penguins" is (supposedly) a collective noun for penguins.

There are already a couple of companies that have announced such things; I'm interested in developing it as an open-source project. Of course, I'll eventually need an actual cluster, at least a small one; but I can start out by faking it with multiple sets of files on a single machine (even my handheld, if I want to get really silly).
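To make the "faking it" idea concrete, here's a rough Python sketch of what a pretend node might look like: each node is just a directory, and each stored item is a file in it. The names `FileNode` and `make_fake_cluster` are made up for illustration, and the one-file-per-item layout is an assumption, not a design decision.

```python
import hashlib
import tempfile
from pathlib import Path


class FileNode:
    """One simulated storage node, backed by a directory on the local disk."""

    def __init__(self, root, name):
        self.name = name
        self.dir = Path(root) / name
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string is safe to use as a file name.
        return self.dir / hashlib.sha1(key.encode("utf-8")).hexdigest()

    def put(self, key, value):
        """Store a bytes value under the given key."""
        self._path(key).write_bytes(value)

    def get(self, key):
        """Return the stored bytes, or None if this node has no such item."""
        path = self._path(key)
        return path.read_bytes() if path.exists() else None


def make_fake_cluster(count, root=None):
    """Create `count` fake nodes as sibling directories under `root`.

    If no root is given, a fresh temporary directory is used.
    """
    root = Path(root) if root is not None else Path(tempfile.mkdtemp())
    return [FileNode(root, f"node{i}") for i in range(count)]
```

Something like `nodes = make_fake_cluster(4)` then gives four independent "machines" to experiment with, and taking a node "offline" is just a matter of not talking to it.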

The basic idea is a distributed hashtable: items stored in the cluster are indexed by a key, and the key gets hashed to decide which node stores the item. The hard part is reliability; I need to be able to ensure that writes don't get lost, even if one of the nodes that should store the item is currently offline. Some sort of transaction log is probably called for.
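Here's a minimal in-memory sketch of that idea in Python, under some loud assumptions: every item is stored on two nodes, placement is a plain "hash modulo node count", and the "transaction log" is just a per-node list of writes the node missed while it was offline. All of the names (`Node`, `Cluster`, `nodes_for`, `bring_online`) are hypothetical; this is one possible shape, not a finished design.

```python
import hashlib


class Node:
    """A pretend storage node that can be switched offline."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.items = {}


class Cluster:
    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = [Node(name) for name in node_names]
        self.replicas = replicas
        # Assumed log format: per node, the writes it missed while offline.
        self.pending = {node.name: [] for node in self.nodes}

    def nodes_for(self, key):
        """Hash the key to pick the nodes that should store the item."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for node in self.nodes_for(key):
            if node.online:
                node.items[key] = value
            else:
                # Node is down: log the write so it can be replayed later.
                self.pending[node.name].append((key, value))

    def get(self, key):
        for node in self.nodes_for(key):
            if node.online and key in node.items:
                return node.items[key]
        return None

    def bring_online(self, node):
        """Replay the logged writes in order, then mark the node as up."""
        for key, value in self.pending[node.name]:
            node.items[key] = value
        self.pending[node.name].clear()
        node.online = True
```

In this sketch a write survives as long as at least one of its nodes is up; if they're all down, it lives only in the in-memory log, which a real system would have to keep on disk. Also note that modulo placement remaps most keys whenever the node count changes, so a real cluster would probably want something smarter there.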