What's your maximum tolerable outage?

If the air conditioning breaks down in a hospital administration department at the height of summer, productivity starts to drop as the temperature rises. It becomes harder to stay focused on the task at hand, people get crabbier on the telephone with patients and suppliers, and the “go the extra mile” motivation your organisation normally prides itself on wanes significantly. Maximum tolerable outage? A day or so, perhaps. On the other hand, if the air conditioning stops functioning in your data centre, then your servers may stop functioning too. Maximum tolerable outage? For vital medical systems, perhaps one minute, if that.

In other words, maximum tolerable outage can vary considerably from one part of an organisation to another. So, therefore, can the appropriate response when things break. IT in an investment bank is another example that spans both extremes: at the most critical level, systems for trading in financial markets may be able to tolerate no more than 10 to 15 minutes of outage; at the least critical level, it may be perfectly acceptable simply to plan to order new machines and have them operational in a week or so.

The current trend is that maximum tolerable outages are shrinking, and so are recovery time objectives (RTOs). In a Gartner Group survey from 2010*, 63% of respondents declared their RTOs for mission-critical businesses to be less than 24 hours. But the same survey also found that, at the other end of the scale, too few organisations were planning for an outage lasting longer than seven days. Some disasters can go on for a long time, as floods and postal strikes in different countries have already demonstrated. Organisations need to ask themselves the same question about disruptions that play out over days or even weeks, because these can still finish off an organisation by slow strangulation if the outage simply lasts too long.