Kriegbaum Hall Data Center Power Outage

Dear FPU Community,

The Facilities and IT Services departments would like to let you know what happened during the FPU data center power outage on March 2 and what is being done to reduce the risk of similar incidents in the future.

On Saturday, March 2, the university suffered a power outage in the data center in Kriegbaum Hall (KBH). An unknown event caused a loss of power to the data center. The backup generator started, but its power was not reaching the data center. IT staff responded as soon as they received notifications from the systems we monitor. Because the root cause was not yet known, it took several hours of work with Facilities and Quinn Power Systems to trace the system from the generator, through the building, and down to the data center before we found the problem that was stopping power from reaching the data center. In the meantime, the local battery backups in the data center were depleted, causing systems to shut down. The event that caused the outage had also tripped several breakers throughout the system. Once these were reset, power began to flow, and IT staff brought systems back up until everything was back online late Saturday evening. Some lingering issues with the telephone system surfaced the following week, and IT Services addressed them as they were reported.

Thank you to the IT and Facilities staff who worked Saturday afternoon and beyond to get things back up and running.

On Monday morning, March 4, Facilities determined that the root causes were a defective breaker and an air conditioning unit serving the data center. While this breaker was being checked, it tripped the 2000-amp main breaker all the way back at the central plant, taking down power to the entire campus for several minutes.

Since the incident, we have taken several steps to reduce the risk of a recurrence, and we are developing plans for powering the data center going forward, in addition to continuing our plan to move services to the cloud. The following initiatives have been identified and are currently being addressed:

  1. Facilities will map the electrical flow to the data center as it relates to the generator. This includes identifying all of the components in this system, creating a flow chart, and clearly labeling all panels and breakers the system uses.
  2. Quinn Power Systems noted that the generator needs repair, and Facilities will schedule the needed repairs.
  3. The generator needs to be on a dedicated circuit that is not tied into any other systems; Facilities will address this.
  4. Several older battery backups in the data center did not handle the switch between power sources correctly; IT Services will replace them.
  5. IT Services and Facilities are evaluating options to replace the aging generator and associated infrastructure with a more modern solution.
  6. IT Services will continue working with our system providers to move authentication for those systems to Office365 credentials (Azure AD), eliminating the need for authentication services hosted on systems physically located in our data center.

Though we don’t wish for issues like these to occur, they do highlight areas of the FPU physical plant and IT infrastructure that need attention, and we are addressing them as quickly as possible.