Data Center Power During Outages and Heatwaves

Keeping the Internet Going During a Power Outage

    1. Determine How the Loss Happened

    When tragedy strikes, one of the best things we can do is learn from it. Knowing what caused your outage can prevent it from happening again. For instance, if an employee error caused the outage, that’s a sign that it’s time for a training update.

    Planning Your Backup Systems

    When you design a data center, whether it is a small computer room with a few servers and a single air conditioner or a large multi-level facility with supercomputers, the necessary backup systems have to be in place. At a minimum, the backup power system should include a UPS that rides through brief outages and power fluctuations and provides enough time for a graceful shutdown. A UPS, however, is only a short-term source of backup power: it carries the load during the transition between mainline power and generators. Power outages usually last between 1 and 6 hours, which is too long for servers to run on batteries alone.

    If you have a small server room and no generator backup, the UPS switching to battery mode should signal the servers to begin a controlled shutdown if main power has not returned within a few minutes.
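
    The shutdown rule described above can be sketched in a few lines. This is a minimal illustration, not a vendor implementation: the UpsStatus fields, the 300-second grace period, and the 30% battery floor are all assumed values chosen for the example, and a real deployment would read them from whatever the UPS monitoring software exposes.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """Snapshot of a UPS as a monitoring agent might report it (hypothetical fields)."""
    on_battery: bool
    seconds_on_battery: float
    battery_percent: float


def should_shut_down(status: UpsStatus,
                     grace_seconds: float = 300.0,
                     min_battery_percent: float = 30.0) -> bool:
    """Return True when servers should begin a controlled shutdown.

    grace_seconds and min_battery_percent are illustrative defaults,
    not recommendations.
    """
    if not status.on_battery:
        # Main power is present (or has returned): keep running.
        return False
    # Shut down once the outage has outlasted the grace period, or if the
    # battery is draining faster than expected.
    return (status.seconds_on_battery >= grace_seconds
            or status.battery_percent <= min_battery_percent)
```

    Checking the battery level as well as the elapsed time guards against an aged battery that cannot actually deliver its rated runtime.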

    A mainline power failure has serious implications for the cooling systems of the Data Center. HVAC and chilled water plant systems cannot function on UPS battery power, so temperatures can quickly rise to levels that cause server failures and thermal shutdowns. Operating IT equipment at elevated temperatures also shortens its life. National Instruments tests noted that just a 5°C increase in temperature can reduce a hard drive’s life by as much as two years.

    Properly sized backup power generators should be installed to ensure the continuous operation of Data Center cooling systems during power outages. In addition, an environmental monitoring system (EMS) in the Data Center hall monitors conditions and raises an alert when temperatures fall outside the prescribed range. Power monitoring and remote generator monitoring systems ensure that backup power systems are ready when called upon and capable of carrying the loads placed on them.
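
    As a rough illustration of the range check an EMS performs, the sketch below compares one sensor reading against configurable limits. The function name and the default limits are assumptions: 18 to 27°C matches the inlet range commonly attributed to ASHRAE's recommended envelope, while the humidity bounds are placeholders, so substitute the range prescribed for your own equipment.

```python
def check_environment(temp_c, humidity_pct,
                      temp_range=(18.0, 27.0),
                      humidity_range=(20.0, 80.0)):
    """Return a list of alert messages for one sensor reading.

    An empty list means the reading is inside both ranges.
    The default ranges are illustrative assumptions.
    """
    alerts = []

    temp_low, temp_high = temp_range
    if temp_c < temp_low:
        alerts.append(f"temperature {temp_c:.1f} C below limit {temp_low:.1f} C")
    elif temp_c > temp_high:
        alerts.append(f"temperature {temp_c:.1f} C above limit {temp_high:.1f} C")

    hum_low, hum_high = humidity_range
    if humidity_pct < hum_low:
        alerts.append(f"humidity {humidity_pct:.0f}% below limit {hum_low:.0f}%")
    elif humidity_pct > hum_high:
        alerts.append(f"humidity {humidity_pct:.0f}% above limit {hum_high:.0f}%")

    return alerts
```

    A real EMS would also track how fast the temperature is changing, since a rapid rise after a cooling failure matters more than a single reading just over the limit.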

    In larger Data Centers, generators are an important component of the backup system. After a mainline power failure, the UPS devices take over and supply the load while the generator starts up and stabilizes, after which power is transferred to the generator. Some larger Data Centers have a rotary flywheel-powered generator, which can by itself fully power the infrastructure for a few seconds until the diesel-powered generators can start up.

    During this power transfer period, it is important to monitor the temperature and humidity levels in the Data Center. There is a risk that a generator fails to start, which leaves the cooling system offline and, once the UPS runtime is exhausted, leads to a total Data Center outage. Generators can also fail after they have started and run for a few hours, for example because of a poor maintenance schedule or insufficient fuel. Both scenarios can be mitigated by a suitable remote generator monitoring system. If the generators do fail to start, Data Centers need automated shutdown solutions that gracefully shut down the servers to avoid data loss.
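
    The decision an automated shutdown tool has to make during the transfer period can be sketched as follows. The names, the 300-second shutdown duration, and the 120-second safety margin are hypothetical values for illustration; the idea is simply to start shutting down while the UPS still has enough runtime left to finish.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"                    # generator is carrying the load
    WAIT_FOR_GENERATOR = "wait_for_generator"
    SHUT_DOWN = "shut_down"                  # begin graceful server shutdown


def transfer_action(generator_online: bool,
                    ups_runtime_left_s: float,
                    shutdown_time_s: float = 300.0,
                    safety_margin_s: float = 120.0) -> Action:
    """Choose what to do while the site is running on UPS power.

    shutdown_time_s is how long a graceful shutdown is assumed to take;
    safety_margin_s is extra headroom. Both are illustrative defaults.
    """
    if generator_online:
        return Action.CONTINUE
    # Not enough battery left to wait any longer: shut down now, while a
    # graceful shutdown can still complete before the UPS is exhausted.
    if ups_runtime_left_s <= shutdown_time_s + safety_margin_s:
        return Action.SHUT_DOWN
    return Action.WAIT_FOR_GENERATOR
```

    The same check covers a generator that fails after running for a few hours: once it drops offline, the function falls through to the UPS runtime comparison again.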

    HVAC and chilled water cooling systems can also fail when there is no power outage. If the cooling system was sized for the typical ambient temperature and heat load of the Data Center equipment, the added stress of higher-than-normal ambient temperatures during a heatwave can result in system breakdowns. Large Data Centers therefore typically have failover backup systems: redundant compressors, for example, can take over in the event of a failure or be turned on during times of increased demand.

    Depending on the tier classification of your Data Center, the backup power systems should be designed with redundancy in mind: if one piece of equipment fails, there should be an alternative to take its place. Because redundancy increases costs, it is mostly found in larger Data Centers. If you are aiming for Tier IV classification, the commonly quoted availability of 99.995% allows only about 0.4 hours (roughly 26 minutes) of downtime per year. This requires all power and cooling components to be 2N redundant, meaning there should be two independent systems, each capable of carrying the full Data Center load. Other Tier classes require varying levels of backup power and cooling systems.
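
    To see how an availability target translates into allowed downtime, the small helper below does the arithmetic. It is only a sketch: the function name is made up for this example, and the availability percentages mentioned afterwards are the figures commonly quoted for the tiers rather than values taken from this article.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into allowed downtime per year."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * HOURS_PER_YEAR
```

    With the commonly quoted 99.995% for Tier IV this gives about 0.44 hours, or roughly 26 minutes per year; 99.982%, the figure usually quoted for Tier III, gives about 1.58 hours.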

    Uptime Institute’s Global Data Center Survey gathered responses from nearly 900 Data Center operators and IT practitioners, from both major Data Center providers and private, company-owned Data Centers. The survey found that the average Power Usage Effectiveness (PUE) of Data Centers hit an all-time low of 1.58 in 2018. By contrast, the average PUE in 2007 was 2.5, dropping to 1.98 in 2011 and to 1.65 in the 2013 survey. After reaching 1.58 in 2018, it rose slightly to 1.67 in 2019.

    PUE is a measure of energy efficiency in the Data Center: the total power drawn by the facility divided by the power drawn by the IT equipment. It is not only an indicator of carbon footprint and operational cost but is also used to calculate the power needed to operate and cool a Data Center. A PUE of 2 means that for every watt the IT systems use, another watt is needed for cooling and other facility overhead. A PUE of 1.5 means that for every watt the IT systems use, half a watt goes to that overhead. A lower PUE therefore means less total power is required for the same IT load, and the PUE should be factored into the backup power requirements for the Data Center.
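
    The relationship between PUE and backup power can be made concrete with a short calculation. This is a sketch under stated assumptions: the function names and the 25% generator headroom are illustrative choices, not sizing guidance, and PUE is taken as total facility power divided by IT power.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_power_kw(it_load_kw: float, pue: float) -> float:
    """Power used for cooling and other non-IT overhead."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


def generator_capacity_kw(it_load_kw: float, pue: float,
                          headroom: float = 0.25) -> float:
    """Backup capacity to carry the whole facility, plus assumed headroom."""
    return facility_power_kw(it_load_kw, pue) * (1.0 + headroom)
```

    For a 100 kW IT load, a PUE of 2 implies 200 kW for the whole facility, while a PUE of 1.5 implies 150 kW, of which 50 kW is overhead. The backup generators have to cover the facility total, not just the IT load.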

