Friday, February 27, 2009

My Favorite Bug

Michael Hunter asked me to write something for his blog about "my favorite bug." At first I thought it was a silly assignment, but, as usual, once I sat down to write, I learned a few things. I don't know when (or if) it will appear on his blog, so I'm putting it here because I think it's too good not to share. (If you think I lack humility, consider the subject: how stupid I can be.)

My First Commercial Program

My favorite bug is also my first bug. It's my favorite because it started me on the road to learning the most important things about programming and testing.

It was the first commercial program I had ever written--a pipeline network analyzer for municipal water systems. (note 1) Essentially, it solved systems of non-linear algebraic equations, by iteration.

It passed all my prepared tests (test first, back in 1956), so we brought in civil engineers with data from a city near San Francisco (not San Jose, which barely existed back then). We set up and started to run the first set of data. Then we waited while the IBM 650 ground away. Thirty minutes--surpassing my largest test case. One hour--surpassing my wildest imagination. Two hours--making me suspect the program was in a closed loop.

I got up from the table where we'd been checking the input data. I was about to stop the machine and find out where the program had gone wrong, but before I reached the machine, it began to punch cards. (We had no on-line printing in those days.) The cards contained the solution, exactly according to spec.

So, what was the bug?

There was nothing wrong with the computation, but the design of the human interface was bad, bad, bad. I had not estimated the performance characteristics as the number of equations grew. I had not accounted for human patience (or impatience) and tolerance (or intolerance) for uncertainty. The case that "looped" was the smallest of our cases. The largest would have run something like twenty hours.

I fixed the bug by having the program punch out a "progress card" after every complete iteration. The card contained a set of numbers that characterized the convergence so far. From the sequence of cards, we could observe the rate of convergence and decide if we wanted the program to continue. If we wanted to stop, we could set a switch on the console and force the program to punch out the full set of values so far--in the input format, so we could resume later, if we wished. (Having no breakpoint in a program that could run twenty hours was another design bug.)
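In modern terms, the progress-card fix might look something like the sketch below. It is Python rather than IBM 650 code, and it makes no claim to reproduce the original program: the names solve_with_progress, stop_requested, report, and cosine_step are invented for illustration, and a simple fixed-point iteration stands in for the pipeline equations. The point is only the shape of the loop: report after every iteration, check the "console switch," and hand back values in the same form the solver accepts as input so the run can be resumed.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def solve_with_progress(
    step: Callable[[Vector], Vector],
    start: Vector,
    tolerance: float = 1e-9,
    max_iterations: int = 10_000,
    stop_requested: Callable[[], bool] = lambda: False,
    report: Callable[[str], None] = print,
) -> Tuple[Vector, bool]:
    """Iterate `step` from `start`, reporting progress after every pass.

    Returns (values, converged). The values have the same form as `start`,
    so an interrupted run can be resumed by passing them back in.
    """
    values = list(start)
    for iteration in range(1, max_iterations + 1):
        new_values = step(values)
        # Largest change in any unknown: a crude measure of convergence.
        change = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        # The "progress card": one line per complete iteration.
        report(f"iteration {iteration}: largest change {change:.3e}")
        if change < tolerance:
            return values, True
        # The "console switch": the operator may give up between iterations.
        if stop_requested():
            return values, False
    return values, False


def cosine_step(values: Vector) -> Vector:
    """Toy stand-in for the real equations: solves x = cos(x)."""
    return [math.cos(values[0])]


if __name__ == "__main__":
    solution, converged = solve_with_progress(cosine_step, [1.0])
    print(solution, converged)
```

To resume a stopped run, pass the returned values back in as start. Watching how quickly the reported change shrinks gives the operator the same information the sequence of progress cards did.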

Learnings

After that experience, I've never failed to consider

a. the relationship between a program and its user

b. the potential performance characteristics of the program

c. the possibility that an error might occur (hardware or software or human) that could cost us thousands of dollars of wasted computer and human time.

d. that I ought to be more humble about my programming skills. Before that, I had never made an error in one of perhaps ten small programs I had written, and I thought I was pretty terrific. This humiliating experience made me appreciative of a good testing process. (We didn't have "testers" in those days. We didn't even have "programmers." My title was "Applied Science Representative.")

This was the beginning of a long career of learning about programmers, programming, and testing. Many of those learnings are captured in my book, Perfect Software and Other Illusions About Testing. (note 2)

Well, it wasn't my last bug, but it was my first, and my best.

Notes

(note 1) Weinberg, Gerald M., and Lyle N. Hoag. "Pipeline Network Analysis by Electronic Digital Computer." Journal of the American Water Works Association 49, no. 5 (1957).

(note 2) Perfect Software (And Other Illusions About Testing)
http://www.dorsethouse.com/savings.html#perf

2 comments:

Brian said...

A couple of tales.

In 1966, I had written programs in FORTRAN and assembler for the CDC 1604 in an academic environment. That got me a gig with a small computer manufacturer using similar architecture.

I was handed a "95% complete" program to bring to completion - the time frame was 2 weeks. 6 weeks later, I realized that when my predecessor had coded from a listing of his old program, the keypunch operator had misread "KAH" for another valid mnemonic, "KXH". Solved, went home for the day.

On to my own code. At around 11 PM on December 30, 1966, I was feverishly trying to find out why my program just wasn't working. Stepping through my code it miraculously dawned on me - the true meaning of "load operand address"! Damn glad to be able to attend the New Year's Eve party where it had been rumored that one of the girl programmers would appear nude - she did, in body paint to look like the (U of) Minnesota Golden Gopher!

Two weeks later, I had my "own" computer on which to check out my program. I was assured that it was OK, they had already corrected 88 back-plane wiring errors (an early "jobs" program had trained basically uneducated women to use wire-wrap guns, but somehow left out the part about reading diagrams).

I was stumped and my boss Bob stoked up his briar with cherry blend to have a look-see. In minutes he had found wiring error #89 - in the adder (yes, the Noah log table joke), a carry from bit 9 automatically propagated to bit 11, which meant that about half of the time, something would go wrong.

Dwayne said...

I have repeated your error often. It is nice to have some progress reported and to give the user the chance to stop the program and keep the hard-earned results.

I don't remember many of my programming errors; I guess there are too many of them to recall. I do recall one error about errors. I was working on my dissertation research at home. I sternly told myself that I did not have the time to make and correct programming errors. Therefore, I would have to program error free from that moment forward.

I failed. I still made programming errors and consumed hours finding and correcting them. I just couldn't will myself to be perfect no matter how hard I tried.