Keeping a highly available web application online is no joke. Anything above 99% uptime is impressive: it means you battled the forces of erosion and probably even deployed some pretty neat features without a hiccup from your users’ perspective. I always feel great when I get our weekly New Relic status report email; it’s a good indication of how well I did my job in the previous week. And for a couple of weeks now, I’m happy to report, I’ve been very proud indeed, with 100% uptime on the Hipstamatic web application.
How do you achieve numbers like these? Unfortunately, getting to 100% isn’t an easy road, and I want to state up front that I don’t think it’s a realistic goal either. Issues outside your control can ruin your uptime number, and you shouldn’t feel broken up about that. It happens to everybody. But it’s always good to set goals that are difficult to achieve, and this one is no different.
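To put those percentages in perspective, here’s a quick back-of-the-envelope sketch in Python that turns an uptime target into a weekly downtime budget. It isn’t taken from our stack or from New Relic, which may calculate availability differently; the function name `allowed_downtime_seconds` and the one-week window are my own assumptions, picked to match a weekly report.

```python
# Hypothetical helper: how much downtime does an uptime target leave you?
# Assumes a simple "seconds up / seconds in period" definition of uptime.

WEEK_SECONDS = 7 * 24 * 60 * 60  # 604,800 seconds in a week


def allowed_downtime_seconds(uptime_percent: float,
                             period_seconds: int = WEEK_SECONDS) -> float:
    """Return the seconds of downtime permitted in the period for a given target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_seconds * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 100.0):
        budget = allowed_downtime_seconds(target)
        print(f"{target}% uptime leaves {budget / 60:.1f} minutes of downtime per week")
```

Under that definition, 99% leaves roughly 100 minutes a week, 99.9% leaves about 10, and 100% leaves none at all, which is why a single issue you can’t control is enough to knock you off a perfect score.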
So what’s the secret to 100% uptime?