The bigger IT systems get, the more complex they become, and the greater the chance of a failure somewhere inside – and of a need for disaster recovery. The arithmetic is simple: multiply the number of components or the number of procedures called, and you multiply the opportunities for something to go wrong. Even the biggest guns in computing experience this. Google has recently had problems with its Google Drive storage service, and also with its mail and application services. American Airlines suffered a technology failure in April that forced it to ground all its planes in the US for several hours. Does this mean that failure will eventually become a certainty as systems grow even more complex?
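The multiplication effect can be made concrete with a back-of-the-envelope model. Assuming, purely for illustration, that each of n components fails independently with the same small probability p, the chance that at least one fails is 1 − (1 − p)^n – and it climbs quickly as n grows:

```python
# Back-of-the-envelope model (an illustrative assumption, not vendor data):
# n independent components, each failing in a given period with probability p.
# The chance that at least one fails is 1 - (1 - p)**n.

def system_failure_probability(n: int, p: float) -> float:
    """Probability that at least one of n independent components fails."""
    return 1 - (1 - p) ** n

# Even very reliable parts add up: 1,000 components, each 99.99% reliable,
# still give roughly a 9.5% chance of at least one failure.
print(round(system_failure_probability(1000, 0.0001), 3))  # ~0.095
```

Real systems are messier – failures correlate, and components differ – but the direction of the effect is the same: scale alone pushes the odds of *something* breaking towards certainty.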
However, as systems become more sophisticated, so do the possibilities for identifying problems and preventing or working around them before they occur. Predictive maintenance is already built into the IT server offerings of various vendors, allowing customers or their support providers to receive alerts before situations become critical. Research is also underway into self-healing code for very complex systems, where the software takes whatever corrective action is needed all by itself, even before humans find out what’s going on.
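In its simplest form, the self-healing idea is a supervisor loop: detect a failure, apply a recovery action, and retry, escalating to humans only when self-repair runs out of attempts. The sketch below is a minimal illustration of that pattern; all names and the recovery action are hypothetical, not taken from any real self-healing framework:

```python
# Minimal sketch of a self-healing supervisor (illustrative names only):
# run a task, and on failure apply a recovery action and retry before
# escalating to a human operator.

import time


def self_healing_run(task, recover, max_attempts=3, delay=0.0):
    """Run task(); on exception, call recover(exc) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # self-healing failed; escalate to humans
            recover(exc)       # e.g. restart a service, clear bad state
            time.sleep(delay)  # back off before the next attempt


# Usage: a task that fails until its broken state is repaired.
state = {"broken": True}

def flaky_task():
    if state["broken"]:
        raise RuntimeError("service down")
    return "ok"

def repair(exc):
    state["broken"] = False  # the "healing" step

print(self_healing_run(flaky_task, repair))  # ok
```

Production systems layer much more on top – health checks, exponential backoff, circuit breakers – but the shape is the same: the software notices and repairs the fault before anyone is paged.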
Yet large IT systems apparently had minds of their own long before software engineers started to graft on self-healing capabilities. Decades ago, system designers were already aware of a propensity for software applications to develop a kind of irrationality once they became sufficiently complex. Will self-healing code be any better in this respect, or will it merely push the probability of failure downwards without ever making it zero? It looks like the need to plan for disaster recovery from internal system problems will be with us for a while longer – or even forever, if software systems end up as emulations of the human minds that created them.