And one of my professional failures

By fallback, I mean the ability of a business application to run on an alternative platform when the primary platform is unavailable.

Fallback is still commonly provided in business computing, even in these days of fault-tolerant systems and networks, which in many cases provide a better business solution.

Some years ago I worked for a large financial institution which had one major data processing site. We had two high-end mainframes, one slightly larger than the other.

The big box was for production. The smaller box was for testing, and also to provide fallback for critical applications that ran on the production system. This was a typical setup in those days.

I never got anyone to see that this was crazy.

Let me backtrack a little. The smaller testing box ran the same security software as the production box, served by the same operators in the same computer room. There were many shared devices including many strings of hard disk drives (which we called DASD in those days). There was no reason that some production shouldn't run on the testing box, and from time to time it did.

Everyone knew that, in the case of a problem with the production box, there was every chance that not all of production would be able to run on the testing box, so there was a list of critical applications for which fallback to the testing box was supported. The owners of these applications (notionally) paid extra for this facility. It was also an open secret that there were some applications that weren't on the list because their canny owners knew that, in the event of such a catastrophe, they would be seen as essential anyway, so they kept their budgets down by not paying for fallback.

My proposal? Simply swap them over. Run critical production on the little box, and everything else (testing and non-critical production) on the big one.

The advantages:

The list of critical production would be realistic. Everyone would know that, in the event of a calamity to the big box, there is no fallback for it or its applications. Nobody could possibly plead that they thought that there was. If you agree to run your application on the big box, which is notionally cheaper, then you agree by that decision that it's not a critical application.

Far better fallback in the event of a calamity to the critical production box, which is now the little one. You instantly commandeer the big box and start up critical production on it, and only critical production. And again, everyone knows that this is what will happen. The appeal of recovering from an emergency by having more capacity for critical production surely needs no selling!

Fallback from unforeseen capacity emergencies in critical production. If your capacity planners get it wrong, you can fall back to the bigger box. It's not an attractive option, but it's an option you don't even have otherwise.
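The capacity argument can be put as a little arithmetic. What follows is only a sketch: the capacity figures and the load figures are invented for illustration (the text says only that one box was slightly larger than the other), and the names `fallback_headroom` and `compare` are hypothetical.

```python
# Hypothetical capacities in arbitrary units; the only fact taken from the
# text is that one box is somewhat larger than the other.
BIG_BOX = 100
LITTLE_BOX = 70


def fallback_headroom(fallback_capacity: int, critical_load: int) -> int:
    """Spare capacity once critical production has moved to the fallback box.

    A negative result means critical production does not fit.
    """
    return fallback_capacity - critical_load


def compare(critical_load: int) -> dict[str, int]:
    """Headroom after losing the box that normally runs critical production."""
    return {
        # Conventional: critical runs on the big box, falls back to the little one.
        "conventional": fallback_headroom(LITTLE_BOX, critical_load),
        # Swapped: critical runs on the little box, falls back to the big one.
        "swapped": fallback_headroom(BIG_BOX, critical_load),
    }


if __name__ == "__main__":
    for load in (60, 80):  # 80 models a capacity-planning miss (assumed figure)
        print(load, compare(load))
```

With these made-up numbers, a critical load that has grown past the little box's capacity still fits on the big box in the swapped arrangement, whereas in the conventional arrangement the fallback box cannot hold it at all.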

The disadvantages?

You'll find out the real capacity your critical production needs. Ummm, is that a problem?

Not long after I first had this discussion with my bosses, we built a second data centre. Again, the same mistake was made. The larger centre ran production. The smaller one ran testing, and was the fallback site for the larger one in the event that a jet landed on it. And we had expensive weekend-long tests to make sure that the fallback plan would work.

The same rules apply. And they apply to fields outside of IT as well.