Wednesday, July 20, 2005

The search for storage Part I (Editorial)

Storage... The hot topic that is nearly always pushed off into the future but never quite arrives in the present. But what's so hard? It sounds simple, doesn't it? Why is it that so many object storage systems never get delivered (WinFS, Screens Object Storage, Cairo Object Store, Gnome Storage...)? The file system was a very basic design that did x and y and nothing else. It had folders and files with basic meta-data, and although NTFS and BeOS added extra attributes, they were still file systems. That is the success of the file system: it's simple. It might not be what people want, but it's simple enough to explain and implement. Object storage systems, on the other hand, try to be the 'perfect file system' and promise to solve every file system problem you have: location independence, description of file contents, and the ability to search, filter, sort, slice and dice your information in any way you want, wherever you are. Sadly, this search for perfection never goes anywhere, because nothing is perfect and designs that sound good don't always work out when you have to write the implementation. You suddenly find huge performance bottlenecks and problems with compatibility and scalability across unknown systems, and you get frustrated and in the end give up or push the release date further and further away. No one wants to be the middle step between the file system and the perfect storage for fear of becoming obsolete, and so after 30-40 years we are still stuck with the same basic file tree structure and basic meta-data. But is this really bad? Does the actual file paradigm need to change? This editorial explores that question. It's too long to cover in one go, so hopefully I will break it down into multiple parts.

Part I: Simplicity

One of the basic things we first do with data is store it somewhere. So where do we store it? Well... in a file system we store it under a folder. But wait... what if I want to store it under multiple folders? That would be cool, wouldn't it?
The ability for your files to be location independent, letting you access them from multiple locations, is like being able to get your money from any bank. Sounds wonderful, so what's the problem? The problem is quite simple. Computers are very rigid: they do X and Y, and if X fails you must tell them how to handle it, and the same goes for Y. So while storing a file in multiple locations might be handy for a user, when I want to open a document, which location do I open it from? If I delete the document, is it deleted from one specific location or from all of them? A user can decide this; a computer cannot. Computers follow instructions step by step and cannot decide by themselves. You must tell them where to look. So, let's take the bank example... Where is your money REALLY stored? In the one local bank where you hold your account. The other banks just reference your money and transfer it over, but your whole account exists in only one location: your single local bank. Let's apply this to the file system... Your documents exist in a single location, but you can reference them from anywhere. Sound familiar? Shortcuts, aliases, hard links and soft links are all ways of referencing your data. So where is the problem? Well, all these solutions fall short in certain areas, and transparency is one of them. There should be no real difference between using a reference and using the original apart from three actions: copy, move and delete (move is really a copy to the new location followed by a delete of the old one). Apart from those actions, everything else, from open to properties, should work on the original. That sounds good, and it is not that hard to implement, since you are basically passing every method apart from copy, move and delete through to the original object. But... let's take this a step further... If I have a document in a specific location, I might want to associate location-specific information with it.
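To make the forwarding idea concrete, here is a minimal Python sketch under loose assumptions: a toy in-memory store that maps path strings to objects, with hypothetical `Document`, `Store` and `Reference` classes that do not correspond to any real file system API. Only `copy`, `move` and `delete` act on the reference itself; every other attribute or method lookup is passed through to the original document via Python's `__getattr__`/`__setattr__` hooks.

```python
class Document:
    """Hypothetical stored object that lives in exactly one location."""

    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.metadata = {}

    def open(self):
        return self.content

    def write(self, text):
        self.content = text


class Store:
    """Toy store: maps a location (path string) to a Document or a Reference."""

    def __init__(self):
        self.entries = {}

    def put(self, location, obj):
        self.entries[location] = obj

    def get(self, location):
        return self.entries[location]

    def remove(self, location):
        del self.entries[location]


class Reference:
    """Transparent reference: copy, move and delete act on the reference;
    everything else is forwarded to the original document."""

    def __init__(self, store, location, target):
        self._store = store
        self._location = location
        self._target = target

    def copy(self, new_location):
        # A copy of a reference is just another reference to the same original.
        ref = Reference(self._store, new_location, self._target)
        self._store.put(new_location, ref)
        return ref

    def delete(self):
        # Removes only this reference; the original stays where it is.
        self._store.remove(self._location)

    def move(self, new_location):
        # Move is copy-to-new followed by delete-old.
        ref = self.copy(new_location)
        self.delete()
        return ref

    def __getattr__(self, name):
        # Only called for names not found on the Reference itself,
        # so open(), write(), name, metadata, ... all reach the original.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        # Private bookkeeping stays on the reference; other writes go through.
        if name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            setattr(self._target, name, value)


store = Store()
doc = Document("report.txt", "draft 1")
store.put("/work/report.txt", doc)
link = Reference(store, "/home/report.txt", doc)
store.put("/home/report.txt", link)

link.write("draft 2")             # forwarded to the original
print(doc.open())                 # draft 2
link.move("/desktop/report.txt")  # only the reference moves
print(sorted(store.entries))      # ['/desktop/report.txt', '/work/report.txt']
```

Because `move` is built from `copy` plus `delete`, only those operations need to know about the reference's own location; everything else cannot tell whether it was handed a reference or the original.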
For example, if I have a music file in one playlist, I might want it listed in a different position than in another playlist, and since the position is stored in the meta-data, what do I do? Well... you make the reference a derivation of the original, so the reference shows the original plus any changes made through the reference. The reference becomes original + reference. So now what's the problem? Well... if I add data to the reference, it is not going to modify the original, and then what's the point of the reference in the first place? This shows how a simple modification can kill the entire idea, and that's one of the problems we face when creating the perfect system. We are not perfect people and therefore cannot understand perfect ideas. We like simple solutions, not complex ones, and so far the proposed object storage systems have been complex and full-featured rather than simple, turning into bloated designs that keep growing until you can't see what their advantage was in the first place. Tune in for the next part when I can get round to writing it...
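The derived reference from the playlist example, and the catch that comes with it, can be sketched the same way. This is again a hypothetical model (`Track` and `DerivedReference` are illustrative names, not part of any real media library), using Python's standard `collections.ChainMap` to layer each reference's own meta-data over the original's: reads fall through to the original, but writes only ever land in the reference's local overlay.

```python
from collections import ChainMap


class Track:
    """Hypothetical original music file with its own meta-data."""

    def __init__(self, title, **metadata):
        self.title = title
        self.metadata = dict(metadata)


class DerivedReference:
    """Shows the original's meta-data plus this reference's local changes."""

    def __init__(self, original):
        self.original = original
        self.overlay = {}
        # ChainMap looks up keys in the overlay first, then in the original.
        # Writes and deletes only ever touch the first mapping (the overlay).
        self.metadata = ChainMap(self.overlay, original.metadata)


song = Track("Song A", artist="Band X", position=1)
party = DerivedReference(song)    # one playlist
workout = DerivedReference(song)  # another playlist

party.metadata["position"] = 7    # location-specific, stays in the overlay
workout.metadata["position"] = 2

print(party.metadata["position"], workout.metadata["position"])  # 7 2
print(party.metadata["artist"])   # Band X (read through from the original)

# The catch: an edit made through a reference never reaches the original.
party.metadata["artist"] = "Band Y"
print(song.metadata["artist"])    # Band X
```

The last lines show the problem described above: the new `artist` value is visible through `party` but never reaches `song` or the `workout` playlist, so the reference has quietly stopped behaving like a reference.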
