In data centre design, a key requirement is to keep the servers housed within the facility operating at all times. This requirement is expressed in a number of terms which, from the outside, can look interchangeable. What are resilience, redundancy, and availability? Are they calculated? How do they apply to data centre facilities?
To start with the easy win: redundancy. This is the installation of additional equipment to take over should another item fail. Suppose that, for the system to work, it requires 5 units (this could be any number and is referred to as ‘N’). So N is the number of units (items of equipment) required to make the system work, and N+1 means that one additional unit is installed; in our N=5 example, the total number of units of equipment is 6. This brings us to our next term, resilience: in this case, the resilience of our system is N+1.
As engineers, we are interested in numbers! So how can we express the resilience of a system numerically? This is where the availability calculation comes in.
Availability is a term given to the percentage of time….
The availability of a data centre or system is typically expressed as a percentage of time. In data centres it is stated as ‘uptime’, quantified by the number of ‘9s’, as in five 9s (99.999%), which equates to about 5.3 minutes of downtime per year. This is impressive when compared with the 8,760 hours in a year. The calculation is a projection based on the facility's level of redundant equipment (generators, cooling systems, UPS (Uninterruptible Power Supply), back-up systems, etc.) and the sophistication of its fault-tolerant control systems. The availability calculation is usually completed in the early design phase to advise on the final system configuration (N+1, 2N+N, etc.), keeping project costs down and offering best value to the client.
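The link between the number of ‘9s’ and annual downtime is simple arithmetic, and a minimal sketch may make it easier to check. The function name and the list of availability levels below are illustrative choices, and a non-leap year of 8,760 hours is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected downtime per year, in minutes, for a given availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR * 60


# Illustrative availability levels, from two 9s up to five 9s.
for label, pct in [("two 9s", 99.0), ("three 9s", 99.9),
                   ("four 9s", 99.99), ("five 9s", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label:>8} ({pct}%): {minutes:.1f} minutes of downtime per year")
```

For five 9s this gives roughly 5.3 minutes per year, in line with the figure quoted above. Each additional ‘9’ cuts the permitted downtime by a factor of ten.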
Refer to IEEE Std 493-2007 (Gold Book) for MDT (Mean Down Time), MTTF (Mean Time to Failure), MTBF (Mean Time Between Failures) and MTTR (Mean Time to Repair).
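As a rough illustration of how these terms feed an availability figure, the sketch below uses the commonly quoted steady-state relationship, availability = MTBF / (MTBF + MTTR), and then compares an N arrangement with an N+1 arrangement for the N=5 example. It assumes identical units that fail independently, with no common-cause failures; the MTBF and MTTR values and the function names are hypothetical and are not taken from the Gold Book.

```python
from math import comb


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one unit from its MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def k_of_n_availability(unit_availability: float, required: int, installed: int) -> float:
    """Probability that at least `required` of `installed` units are available.

    Assumes identical units that fail independently of one another.
    """
    a = unit_availability
    return sum(
        comb(installed, working) * a**working * (1 - a) ** (installed - working)
        for working in range(required, installed + 1)
    )


# Hypothetical unit data, for illustration only.
unit = availability(mtbf_hours=10_000, mttr_hours=8)

n_only = k_of_n_availability(unit, required=5, installed=5)    # N
n_plus_1 = k_of_n_availability(unit, required=5, installed=6)  # N+1

print(f"Single unit: {unit:.6f}")
print(f"N (5 of 5):  {n_only:.6f}")
print(f"N+1 (5 of 6): {n_plus_1:.6f}")
```

With N alone, every unit must be working, so the system availability is lower than that of a single unit. Adding one redundant unit lets the system ride through any single failure, which is why the N+1 figure comes out higher. A real study would also need to account for common-cause failures, maintenance outages and control system behaviour.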
It is important to note that an inadequate design (limited redundancy or ineffective fault-tolerant control) may result in excessive downtime with a slow response to failures. The National Archives and Records Administration in Washington D.C. shows that 93%* of businesses that lost availability in their data centre for 10 days or more filed for bankruptcy within one year. The cost of a single episode of downtime can cripple an organisation, especially an e-business whose competitors are one click away.
JDA (Mission Critical Systems), with over 100 years of experience and specialised skills, can assess your current or future data centre system and advise on the required calculations and design guidance. If you have an application or query, please get in touch; we would be happy to help.
Excerpt of typical Availability block diagram
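An availability block diagram is usually evaluated by combining blocks in series (every block must work) and in parallel (at least one block must work). The sketch below shows that combination under the assumption that blocks fail independently. The block names and availability values are hypothetical and do not represent the diagram above or any real equipment data.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Blocks in series: the path works only if every block works."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Blocks in parallel: the path works if at least one block works."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical block availabilities, for illustration only.
utility_supply = 0.999
standby_generator = 0.99
ups = 0.9995
distribution_board = 0.9999

# Utility and generator back each other up, then feed the UPS and distribution in series.
power_source = parallel(utility_supply, standby_generator)
system = series(power_source, ups, distribution_board)

print(f"Power source (utility or generator): {power_source:.6f}")
print(f"Whole power path: {system:.6f}")
```

The series result can never exceed its weakest block, so a single non-redundant item in the path tends to dominate the overall figure. This is why such diagrams are useful for finding where redundancy gives the best value.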