We had some downtime on a large chunk of our applications today. I’ll be looking into it more tomorrow, but it looks like it was caused by the failure of the root partition on one of our application servers.
We didn’t lose any data, but it did take a good hour or more to rebuild and reconfigure the affected server.
Apologies to our users. It pains us to see all those unfulfilled requests come in! The one good thing to come out of these situations is that they push us to hone our architecture and procedures so that service is not interrupted when hardware fails.