How tempting it is to use the benefit of hindsight to criticize those in charge of BP PLC’s Deepwater Horizon oil rig in the Gulf of Mexico. The Wall Street Journal’s (WSJ) coverage even informs readers that BP and its drilling contractor Transocean celebrated seven years without a serious injury. Introspective readers, especially IT/Telecom professionals, can learn several valuable lessons from this incident.
Gushing Revenue Can Change into Huge Losses in Minutes
In the course of a few hours, multiple warning signs were not fully appreciated. To use an IT/Telecom term, “alarms” built up faster than they could be assessed. As a result, liquid gold (“revenue”) turned into a fireball and then a toxic sludge that will cost BP billions and is possibly the largest man-made environmental event in recorded history.
Monitoring for Event and Resolution Management
WSJ and other narratives pointed to various warning signs (“alarms”) of potential problems hours before vast amounts of methane rose to the surface and ignited. The lesson is that monitoring and raising alarms is not enough: alarms must also be correlated to identify the underlying event before it becomes a major issue. A telecom/IT example: several low-severity warning “alarms” may mean nothing individually, but viewed together they can point to a power outage. Identifying the event accurately and quickly is necessary to resolve the underlying issue.
In summary, commercial-grade infrastructure management systems should correlate alarms into events, based on past history and sound design. Accurate identification of events leads to the right resolution.
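To make the correlation idea concrete, here is a minimal Python sketch of how several low-severity alarms might be rolled up into a single event. Everything in it is an assumption for illustration: the alarm kinds, the "power outage" signature, the five-minute window and the function name are hypothetical and do not come from any particular management product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    source: str         # device that raised the alarm
    kind: str           # hypothetical alarm type label
    severity: int       # 1 = low; higher means more severe
    raised_at: datetime


# Hypothetical rule: these kinds, seen together within the window,
# suggest a power outage even though each one is low severity.
POWER_OUTAGE_SIGNATURE = frozenset(
    {"ups_on_battery", "link_down", "temperature_rising"}
)
WINDOW = timedelta(minutes=5)


def correlate(alarms, signature=POWER_OUTAGE_SIGNATURE,
              window=WINDOW, event_name="power_outage"):
    """Return event_name if every kind in the signature occurs
    within a single time window, otherwise None."""
    ordered = sorted(alarms, key=lambda a: a.raised_at)
    for i, first in enumerate(ordered):
        seen = set()
        for alarm in ordered[i:]:
            if alarm.raised_at - first.raised_at > window:
                break  # later alarms fall outside this window
            if alarm.kind in signature:
                seen.add(alarm.kind)
        if seen == signature:
            return event_name
    return None


# Three low-severity alarms raised within three minutes of each other.
t0 = datetime(2010, 1, 1, 12, 0)
sample_alarms = [
    Alarm("ups-1", "ups_on_battery", 1, t0),
    Alarm("sw-3", "link_down", 1, t0 + timedelta(minutes=1)),
    Alarm("rack-2", "temperature_rising", 1, t0 + timedelta(minutes=3)),
]
print(correlate(sample_alarms))
```

A real system would carry many such signatures, tuned from past incidents, and would likely weigh topology as well as timing. The point of the sketch is only that the event emerges from the combination of alarms, not from any single one.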
Communication is Key
Dual lines of authority and a lack of event/resolution planning reportedly contributed to missed opportunities to stop the chain of events that led to the Deepwater Horizon catastrophe. The takeaway for IT/Telecom professionals is to figure out how to improve communications using the tools already at hand: our desktops, smartphones and various management systems, especially with the gradual rollout of Unified Communications features such as presence. The challenge is how to communicate the right information (alerts/events/resolution) to the right resources and decision makers as quickly and efficiently as possible.
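One way to picture "the right information to the right resources" is a simple routing table that maps each event to the roles that must hear about it and prefers contacts whose presence shows them as available. The Python sketch below is illustrative only: the roles, the routing table, the contact names and the `present` flag are hypothetical stand-ins for whatever a real Unified Communications or management system would expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    role: str
    present: bool  # hypothetical presence flag from a UC system


# Hypothetical routing table: which roles must be told about which event.
ROUTES = {
    "power_outage": ["facilities", "network_ops"],
    "link_failure": ["network_ops"],
}


def recipients(event, contacts, routes=ROUTES):
    """Pick one contact per required role, preferring present contacts.

    Returns (chosen, unfilled_roles) so that a role with nobody to
    notify is reported instead of being silently skipped.
    """
    chosen, unfilled = [], []
    for role in routes.get(event, []):
        candidates = [c for c in contacts if c.role == role]
        if not candidates:
            unfilled.append(role)
            continue
        # sorted() is stable: present contacts come first,
        # original order is kept otherwise.
        chosen.append(sorted(candidates, key=lambda c: not c.present)[0])
    return chosen, unfilled


sample_contacts = [
    Contact("Ana", "network_ops", present=False),
    Contact("Ben", "network_ops", present=True),
    Contact("Cy", "facilities", present=False),
]
chosen, unfilled = recipients("power_outage", sample_contacts)
print([c.name for c in chosen], unfilled)
```

Returning the unfilled roles is a deliberate choice in this sketch: a gap in the notification chain is itself something a decision maker should see.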
What is your experience with proactively identifying and resolving events? Do you have a way of seeing all of your alarms in a “single pane of glass” in order to correlate them?