fallback theory


A page of technology

 

And one of my professional failures

 

By fallback, I mean the ability of a business application to run on an alternative platform when the primary platform is unavailable.

 

Fallback is commonly provided in business computing, even in these days of fault-tolerant systems and networks which provide a better business solution in many cases.

 

Some years ago I worked for a large financial institution which had one major data processing site. We had two high-end mainframes, one slightly larger than the other.

 

The big box was for production. The smaller box was for testing, and also to provided fallback for critical applications that ran on the production system. This was a typical setup in those days.

 

I never got anyone to see that this was crazy.

 

Let me backtrack a little. The smaller testing box ran the same security software as the production box, served by the same operators in the same computer room. There were many shared devices including many strings of hard disk drives (which we called DASD in those days). There was no reason that some production shouldn't run on the testing box, and from time to time it did.

 

Everyone knew that, in the case of a problem with the production box, there was every chance that not all of production would be able to run on the testing box, so there was a list of critical applications for which fallback to the testing box was supported. The owners of these applications (notionally) paid extra for this facility. It was also an open secret that there were some applications that weren't on the list because their canny owners knew that, in the event of such a catastrophe, they would be seen as essential anyway, so they kept their budgets down by not paying for fallback.

 

My proposal? Simply swap them over. Run critical production on the little box, and everything else (testing and non-critical production) on the big one.

 

The advantages:

 

 

 

 

The disadvantages?

 

 

Not long after I first had this discussion with my bosses, we built a second data centre. Again, the same mistake was made. The larger centre ran production. The smaller one ran testing, and was the fallback site for the larger one in the event that a jet landed on it. And we had expensive weekend-long tests to make sure that the fallback plan would work. 

 

The same rules apply. And they apply to fields outside of IT as well.