LIPA task force: Computer breakdown led to PSEG’s Isaias failures
September 22, 2020
The breakdown of a computer system central to PSEG Long Island’s storm response set the stage for a series of cascading failures that affected more than 500,000 customers after Tropical Storm Isaias, far more than the 420,000 the utility initially reported.
A preliminary report from a task force said PSEG was struggling with the breakdown of the $30.6 million computer system as it tried to field calls, assign work and predict restoration times.
Among the report’s findings:
- The storm resulted in 645,000 customer outages, including some customers who saw multiple outages. Some 513,000 customers lost power on the first day of the storm, Aug. 4.
- The outage management computer system experienced “multiple issues” all tied to a massive stream of data sent to it during the early hours after the storm, “rendering it effectively non-functional at times” while “negatively impacting all communication channels and field management activities.”
- The failure of that computer system blocked text messages from being processed by PSEG’s outage map, preventing it from refreshing as customers sought restoration times.
- The breakdown of the computer system was a “significant cause” of the “substantially exceeded” restoration estimates for customers, pushing the total restoration to eight days from the initially announced 48 hours.
- The Long Island Power Authority had previously found “significant weaknesses” in PSEG’s process for maintaining a list of critical-care customers to be contacted in the event of a storm. PSEG, which operates the Long Island grid under a long-term contract with LIPA, maintains the list of customers on life-support equipment through a single letter sent annually, which draws only a 43% response rate. LIPA has called for an acceleration of that audit, starting this month.
The report confirmed widespread problems of inaccurate, inconsistent and frequently changing estimated restoration times, issues that likely compounded problems by leading customers to contact PSEG again and again to find out why, and to get new estimates. PSEG had initiated a project that would have addressed the problem by issuing blanket messages to customers telling them that the company was assessing the situation, but the project “has not been completed at this time,” the report said. Including this ability in the future is the “single-most important technology change that would have made a notable difference in the customer’s experience and their trust in the information provided by PSEG,” the report found.
The task force that produced the report is made up of senior LIPA oversight and computer system executives, heads of investigations, storm preparedness and enforcement for the state Department of Public Service, and a team of independent utility consultants, LIPA said.
With the system malfunctioning, PSEG communicated outage estimates to customers based on a mathematical calculation that proved to be unreliable. Using it “undermined PSEG Long Island’s credibility as one- and two-day estimated restoration times were subsequently revised up to eight days,” the report says.
The computer problems also hampered the utility’s ability to “assign work to and receive updates from field workers in a timely manner.”
Days after the storm, PSEG abandoned a recently installed version of the system and went back to an older version, then stress-tested it at high call volumes. It worked better but still contained bugs.
The report said if an Isaias-like storm were to hit today, the computer system “would likely function better” because of several fixes already implemented, but it added, “There is insufficient evidence that without implementing the [task force’s] recommendations, the [outage management system] would function at an acceptable level.”
“There’s still more work to do,” LIPA chief executive Tom Falcone said on Monday.
Some problems may require new fixes from the software vendor or an upgrade of the software, the report says.
Among the recommendations LIPA wants in place by Oct. 15: provide “realistic” communications to customers at the outset of storms by omitting estimated restoration times for major storms altogether for the first 24 to 48 hours, replacing them on the customer outage map with a message that says, “Assessing damage, will update expected restoration times in 24 hours.”
The task force also recommends PSEG design and implement a “manual work-around” process to make sure that the current outage management computer system performs in near-term storms.
Because they rely on timely and accurate information from the computer system, PSEG’s text messaging, outage map and municipal portal used by local governments experienced problems. Up to 300,000 text messages failed in the first 24 hours, with many customers receiving “error” messages.
The PSEG outage map, which saw some 629,000 page views per day for each of the 10 days of the restoration, also experienced problems. On Aug. 5, the day after the storm, the vendor Kubra began manually entering data from PSEG into the system. It helped, but ongoing problems led to continued confusion among customers and officials “who could not see outage status changes for their community in a timely manner.”
Customers who turned to PSEG’s MyAccount portal to report information or find out what was wrong also experienced problems. Traffic to the site increased twentyfold, the report found, and the site wasn’t back in full operation until 8 p.m. on Aug. 6.