Thursday, 10 February 2011

Cloud storage security

I read this article and was happy to see someone agreeing with me on the issue of cloud storage. Having to trust as many companies as there are web applications you use is silly. If you trust just one cloud storage provider, then a web application using that provider instead of its own storage means you only have to learn to trust one company. One thing to remember in that case is that you might well transmit your sensitive data to the application provider first, before they pass it on to the storage provider. That somewhat defeats the point, and it's something our Google Docs storage implementation on mxgraph.com is currently guilty of.

But then, someone who works with me said, "who the hell stores sensitive information" on the cloud? Well, we do, in Google Docs, and you probably want to assume all information is sensitive rather than deal with each document on a case-by-case basis. Cloud storage providers need to provide JavaScript library implementations of their storage APIs so the client goes directly to the store for loads and saves. Oh, and if you're listening, storage providers, how about a JS user interface implementation so we don't have to write it?
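To make the idea concrete, here's a minimal sketch of the kind of client-side storage library I'm asking for: the browser talks straight to the storage provider with the user's own token, so the application server never sees the document. The endpoint, paths and transport here are all made up for illustration; they don't correspond to any real provider's API.

```javascript
// Hypothetical direct-to-storage client. The transport is injected so a
// browser would pass window.fetch; the app server is never in the path.
class DirectStore {
  constructor(baseUrl, token, transport) {
    this.baseUrl = baseUrl;     // storage provider endpoint (assumed)
    this.token = token;         // the user's own auth token, never shared with the app server
    this.transport = transport; // e.g. fetch in a browser
  }

  // PUT the document straight to the provider
  async save(docId, content) {
    return this.transport(`${this.baseUrl}/docs/${docId}`, {
      method: 'PUT',
      headers: { Authorization: `Bearer ${this.token}` },
      body: content,
    });
  }

  // GET the document straight from the provider
  async load(docId) {
    const res = await this.transport(`${this.baseUrl}/docs/${docId}`, {
      headers: { Authorization: `Bearer ${this.token}` },
    });
    return res.body;
  }
}
```

The point of injecting the transport is that the same library works in any browser, and the application only ever handles a handle (the document id), never the content.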

So now whether or not the application provider is secure doesn't matter? Well, not quite, since very few applications never go back to the server at all (let's face it, virtually none). We have to go back for image export and PDF export on mxgraph.com; to create these, the server obviously has to see the current data, so you have to trust us once more (your mother would approve of Gaudenz; me, no chance). Our solution is going to be to create the images and PDFs on the client, using custom code. It's not quite as hard as it sounds, and it removes the need for the back-end server to implement any of the custom rendering that the JavaScript client performs. Then we'll be completely client-side, with loads and saves going straight to the cloud storage provider, and everyone is happy.
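As a rough sketch of what client-side export looks like: render the diagram into an SVG string and wrap it in a data URI the browser can hand to the user, so the document never travels to the server. The cell format below is invented for illustration and is not mxGraph's real model.

```javascript
// Build an SVG document from a toy list of box-shaped cells, entirely on the
// client. Real diagram rendering is more involved, but the shape is the same:
// walk the model, emit markup, never call the server.
function cellsToSvg(cells, width, height) {
  const shapes = cells.map(c =>
    `<rect x="${c.x}" y="${c.y}" width="${c.w}" height="${c.h}" fill="none" stroke="black"/>` +
    `<text x="${c.x + c.w / 2}" y="${c.y + c.h / 2}" text-anchor="middle">${c.label}</text>`
  ).join('');
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${shapes}</svg>`;
}

// Wrap the SVG in a data URI; encodeURIComponent keeps it valid without
// needing base64, and the browser can offer it as a download link.
function svgToDataUri(svg) {
  return 'data:image/svg+xml;utf8,' + encodeURIComponent(svg);
}
```

In a browser you would set the result as the `href` of an anchor with a `download` attribute; PDF is harder but follows the same principle of generating the bytes client-side.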

Then we add real-time diagram sharing to mxgraph.com. Oh. Now we need hanging requests on a server for everyone listening to a shared diagram, so they can receive updates. Even if the changes transmitted are just deltas, the original diagram can't go straight from storage to the sharing users, because they are not the account holder, so the web application servers are handling the data again.
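The server-side relay that makes those hanging requests work can be sketched like this: each shared diagram keeps a backlog of deltas and a list of pending long-poll responses, and publishing a change wakes every waiting listener. All the names here are illustrative, not our actual implementation.

```javascript
// One channel per shared diagram. Listeners call poll() and the promise hangs
// until an editor calls publish(); deltas that arrive while nobody is waiting
// are queued in the backlog so no change is lost.
class DiagramChannel {
  constructor() {
    this.waiters = [];  // resolve callbacks for hanging long-poll requests
    this.backlog = [];  // deltas that arrived while no request was hanging
  }

  // Called when an editor submits a change to the diagram
  publish(delta) {
    if (this.waiters.length > 0) {
      // wake every hanging request with the new delta
      this.waiters.splice(0).forEach(resolve => resolve([delta]));
    } else {
      this.backlog.push(delta);
    }
  }

  // Called by each listener's hanging request; resolves immediately if
  // deltas are already queued, otherwise hangs until publish() fires
  poll() {
    if (this.backlog.length > 0) {
      return Promise.resolve(this.backlog.splice(0));
    }
    return new Promise(resolve => this.waiters.push(resolve));
  }
}
```

Notice the point the paragraph makes: even though only deltas flow through here, the channel lives on the application server, so that server is back in the data path.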

And so on, you get the idea.

Can you be expected to trust the data handling of every web application provider? No. Is it easy to develop client-side functionality that avoids the server ever seeing the data, even when it's not persisted there? No. The solution is virtualization and/or private clouds, private clouds really being a super-set of the former. Eucalyptus seem to be leading the way, enabling you to create your own pool of hardware resources safe behind your firewall and run Amazon machine images and VMware images on that collective resource. Will this become the norm? Hard to say, but it's catching on in the bigger IT departments. It does present web application providers with a new headache, though: their server sits hidden behind a customer's firewall, and they may be unable to communicate with it. Let's not forget this has been the norm for a long time with license keys and so on for desktop server installs; the control providers enjoy in cloud solutions is spoiling them somewhat. This setup makes the user's life easier and easier, but web application providers clearly prefer the current public way of doing things, and you'll most probably find them arguing that they are secure so that they can remain in control of the hardware.
