The Honolulu Advertiser

Posted on: Tuesday, October 5, 2004

Software disasters often have human touch

By Matthew Fordahl
Associated Press

SAN JOSE, Calif. — New software at Hewlett-Packard Co. was supposed to get orders in and out the door faster at the computer giant. Instead, a botched deployment cut deeply into earnings in August, and executives were fired.

[Photo: Passengers played cards to pass the time at Las Vegas' McCarran International Airport after a Sept. 14 radio failure at an FAA control facility forced some Western airports to temporarily ground some flights. The failure was caused by a software shutdown. — Associated Press library photo]

Last month, a system that controls communications between commercial jets and air traffic controllers in Southern California shut off because some maintenance had not been performed. A backup also failed, triggering potential peril.

Computer code foul-ups also recently held Tacoma, Wash.'s budget hostage, delayed financial aid to university students in Indiana and caused retailer Ross Stores Inc.'s profits to plummet 40 percent after a merchandise-tracking system failed.

Such disasters are often blamed on bad software, but the cause is rarely bad programming. As systems grow more complicated, failures instead have far less technical explanations: bad management, poor communication or inadequate training.

"In 90 percent of the cases, it's because the implementer did a bad job, training was bad, the whole project was poorly done," said Joshua Greenbaum, principal analyst at Enterprise Applications Consulting in Berkeley. "At which point, you have a real garbage in, garbage out problem."

As governments, businesses and other organizations become more reliant on technology, the consequences of software failures are rarely trivial. Entire businesses — sometimes even lives — are at stake.

Many experts believe the situation will only worsen as software automates new tasks and more systems interconnect with and rely on other computers. Technical challenges may be surmounted, but managing people never gets easier.

"The limit we're hitting is the human limit, not the limit of software," Greenbaum said. "Technology has gotten ahead of our organizational and command capabilities in many cases. It's amazing when you go into companies and see the kinds of battles that go on."

Big software projects — whether to manage supply chains, handle payroll, track inventory or prepare finances — tend to begin with high expectations and the best intentions. They're all about efficiency, reliability, cost-savings, competitiveness.

Companies might develop their own programs internally, outsource the job or buy from a company such as SAP AG, Oracle Corp. or PeopleSoft Inc. Regardless of the route, it's usually a major undertaking to get things right.

Often, however, the first step toward total disaster is taken before the first line of code is written. Organizations must map out exactly how they do business, refining procedures along the way. All of this must be clearly explained to a project's technical team.

"The risk associated with these projects is not around software but is around the actual business process redesign that takes place," said Bill Wohl, an SAP spokesman. "These projects require very strong executive leadership, very talented consulting resources and a very focused effort if the project is to be successful and not disruptive."

A 2002 study commissioned by the National Institute of Standards and Technology found that software bugs cost the U.S. economy about $59.5 billion annually. The same study found that more than a third of that cost — about $22.2 billion — could be eliminated by improving testing.

The lack of robust testing likely contributed to the Sept. 14 radio system failure over the skies of parts of California, Nevada and Arizona. Though there were a handful of close calls, all 403 planes in the air during the incident managed to land safely, said FAA spokesman Donn Walker. A handful violated rules that dictate how close they are allowed to fly to each other — but the FAA maintains there were no "near misses."

The problem traces back to 2001, when Harris Corp. migrated the Federal Aviation Administration's Voice Switching Control System from Unix-based servers to Microsoft Corp.'s off-the-shelf Windows 2000 Advanced Server.

By most accounts, the move went well, except that the new system required regular maintenance to prevent data overload. When that wasn't done, it turned itself off as it was designed to do. But the backup also failed. In all, the Southern California system was down for three hours, though other FAA centers restored communications within seconds, Walker said.
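The article does not detail the mechanism, but a system "designed to turn itself off" unless periodically maintained often works like a watchdog: an internal counter grows with uptime, and once it nears a limit the software deliberately shuts down, something a scheduled restart would normally prevent. The sketch below is a hypothetical illustration of that general pattern, not the FAA's or Harris Corp.'s actual code; the names, the 32-bit millisecond limit and the shutdown behavior are all assumptions for demonstration.

    # Hypothetical sketch of a designed shutdown after prolonged uptime.
    # Not the FAA/Harris VSCS code; names and the limit are illustrative only.
    import time

    MS_LIMIT = 2**32            # example ceiling: a 32-bit millisecond counter (~49.7 days)
    _start = time.monotonic()   # reset every time the process is restarted

    def uptime_ms() -> int:
        """Milliseconds elapsed since the process started."""
        return int((time.monotonic() - _start) * 1000)

    def check_watchdog() -> None:
        """Deliberately halt once the counter reaches its limit.

        A routine restart ("regular maintenance") resets the counter,
        so this branch is never reached on a properly maintained system.
        """
        if uptime_ms() >= MS_LIMIT:
            raise SystemExit("uptime counter exhausted; restart required")

Skipping that routine restart is what would let such a counter run out, which is consistent with the failure the article describes.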

The FAA's investigation is continuing.