
Sunday, October 29, 2017

My most challenging experience as a software developer

Here is my detailed answer to the question, "What is the most challenging experience you encountered as a software developer?"

We were developing the tracking system for Project Mercury, to put a person in space and bring them back alive. The “back alive” was the challenging part, but not the only one. Some other challenges were as follows:

- The system was based on a world-wide network of fairly unreliable teletype connections. 

- We had to determine the touchdown in the Pacific to within a small radius, which meant we needed accurate and perfectly synchronized clocks on the computer and space capsule.

- We also needed to know exactly where our tracking stations were, but it turned out nobody knew where Australia's two stations were with sufficient precision. We had to create an entire sub-project to locate Australia.

- We needed information on the launch rocket, but because it was also a military rocket, that information was classified. We eventually found a way to work around that.

- Our computers were a pair of IBM 7090s, plus a 709 at a critical station in Bermuda. In those days, the computers were not built for on-line real-time work. For instance, there was no standard interrupt clock. We actually built our own for the Bermuda machine.

- Also, there were no disk drives yet, so everything had to be based on a tape drive system, but the tape drives were not sufficiently reliable for our specs. We beat this problem by building software error-correcting codes into the tape drive system.
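
To give a feel for what "building software error-correcting codes" means: the idea is to record redundant bits alongside each block of data, so that a bit the tape garbles can be detected and repaired on read. The book doesn't describe the actual Mercury code, so the sketch below is only a minimal illustration, using a textbook Hamming(7,4) code in Python.

    # Illustrative only: a textbook Hamming(7,4) code, not the actual
    # Mercury tape format. Four data bits are recorded as seven bits;
    # any single flipped bit is located and repaired on read.

    def hamming74_encode(nibble):
        """Encode 4 data bits (an int 0..15) into a 7-bit codeword."""
        d = [(nibble >> i) & 1 for i in range(4)]
        p1 = d[0] ^ d[1] ^ d[3]
        p2 = d[0] ^ d[2] ^ d[3]
        p3 = d[1] ^ d[2] ^ d[3]
        bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]  # codeword positions 1..7
        return sum(b << i for i, b in enumerate(bits))

    def hamming74_decode(codeword):
        """Repair a single-bit error, then extract the 4 data bits."""
        bits = [(codeword >> i) & 1 for i in range(7)]
        s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
        s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
        s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
        syndrome = s1 + (s2 << 1) + (s3 << 2)  # 1-based position of the bad bit
        if syndrome:
            bits[syndrome - 1] ^= 1            # flip the corrupted bit back
        return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)

    # A bit flipped "on tape" is silently repaired on read:
    word = hamming74_encode(0b1011)
    assert hamming74_decode(word ^ (1 << 3)) == 0b1011

The price of any such scheme is recording extra bits, nearly doubling the volume in this toy version: redundancy traded for reliability the hardware couldn't deliver.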

We worked our way through all these problems and many more smaller ones, but the most challenging problem was the “back alive” requirement. Once we had the hardware and network reliability up to snuff, we still had the problem of software errors. To counter this problem, we created a special test group, something that had never been done before. Then we set a standard that any error detected by the test group and not explicitly corrected would stop any launch.

Our tests revealed that the system could crash for unknown reasons at random times, so it would be unable to bring down the astronaut safely at a known location. When the crash occurred in testing, the two on-line printers simultaneously printed a 120-character line of random garbage. The line was identical on the two printers, indicating that this was not some kind of machine error on one of the 7090s. It could have been a hardware design error or a coding error. We had to investigate both possibilities, but the second possibility was far more likely.

We struggled to track down the source of the crash, but after a fruitless month, the project manager wanted to drop it as a “random event.” We all knew it wasn’t random, but he didn’t want to be accused of delaying the first launch.

To us, however, it was endangering the life of the astronaut, so we pleaded for time to continue trying to pinpoint the fault. “We should think more about this,” we said, to which he replied (standing under an IBM THINK sign), “Thinking is a luxury we can no longer afford.”

We believed (and still believe) that thinking is not a luxury for software developers, so we went underground. After much hard work, Marilyn pinpointed the fault and we corrected it just before the first launch. We may have saved an astronaut’s life, but we’ll never get any credit for it.

Moral: We may think that hardware and software errors are challenging, but nothing matches the difficulty of confronting human errors—especially when those humans are managers willing to hide errors in order to make schedules.



Wednesday, January 11, 2017

Foreword and Introduction to ERRORS book

Foreword


Ever since this book came out, people have been asking me how I came to write on such an unusual topic. I've pondered their question and decided to add this foreword as an answer.

As far as I can remember, I've always been interested in errors. I was a smart kid, but didn't understand why I made mistakes. And why other people made more.

I yearned to understand how the brain, my brain, worked, so I studied everything I could find about brains. And then I heard about computers.

Way back then, computers were called "Giant Brains." Edmund Berkeley wrote a book by that title, which I read voraciously.

Those giant brains were "machines that think" and "didn't make errors." Neither turned out to be true, but back then, I believed them. I knew right away, deep down—at age eleven—that I would spend my life with computers.

Much later, I learned that computers didn't make many errors, but their programs sure did.

I realized when I worked on this book that it more or less summarizes my life's work, trying to understand all about errors. That's where it all started.

I think I was upset when I finally figured out that I wasn't going to find a way to perfectly eliminate all errors, but I got over it. How? I think it was my training in physics, where I learned that perfection simply violates the laws of thermodynamics.

Then I was upset when I realized that when a computer program had a fault, the machine could turn out errors millions of times faster than any human or group of humans.

I could actually program a machine to make more errors in a day than all human beings had made in the last 10,000 years. Not many people seemed to understand the consequences of this fact, so I decided to write this book as my contribution to a more perfect world.
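
To see the scale of this, try a back-of-envelope calculation. Every number below is a rough guess chosen only for illustration, not a measured figure.

    # Back-of-envelope only: every number below is a rough illustrative guess.
    faulty_results_per_second = 1_000_000_000   # a tight loop emitting one bad result per iteration
    machine_errors_per_day = faulty_results_per_second * 86_400   # about 8.6e13

    humans_ever = 100_000_000_000   # rough demographic estimate of people ever born
    errors_per_human = 1_000        # a deliberately low lifetime guess
    human_errors_ever = humans_ever * errors_per_human             # about 1e14

    print(machine_errors_per_day / human_errors_ever)  # ~0.86: one machine-day comes close

On these modest guesses, one machine needs little more than a day to match every error humanity has ever made; a faster loop, or a handful of machines, does it in hours.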

Not perfect, of course, but more perfect. I hope it helps.


Introduction


For more than a half-century, I’ve written about errors: what they are, their importance, how we think about them, our attempts to prevent them, and how we deal with them when those attempts fail. People tell me how helpful some of these writings have been, so I felt it would be useful to make them more widely known. Unfortunately, the half-century has left them scattered among several dozen books, so I decided to consolidate some of the more helpful ones in this book.

I’m going to start, though, where it all started, with my first book where Herb Leeds and I made our first public mention of error. Back in those days, Herb and I both worked for IBM. As employees we were not allowed to write about computers making mistakes, but we knew how important the subject was. So, we wrote our book and didn’t ask IBM’s permission.

Computer errors are far more important today than they were back in 1960, but many of the issues haven’t changed. That’s why I’m introducing this book with some historical perspective: reprinting some of that old text about errors along with some notes with the perspective of more than half a century.

1960’s Forbidden Mention of Errors
From: Chapter 10, "Program Testing," in Leeds and Weinberg, Computer Programming Fundamentals
When we approach the subject of program testing, we might almost conclude the whole subject immediately with the anecdote about the mathematics professor who, when asked to look at a student’s problem, replied, “If you haven’t made any mistakes, you have the right answer.” He was, of course, being only slightly facetious. We have already stressed this philosophy in programming, where the major problem is knowing when a program is “right.”

In order to be sure that a program is right, a simple and systematic approach is undoubtedly best. However, no approach can assure correctness without adequate testing for verification. We smile when we read the professor’s reply because we know that human beings seldom know immediately when they have made errors—although we know they will at some time make them. The programmer must not have the view that, because he cannot think of any error, there must not be one. On the contrary, extreme skepticism is the only proper attitude. Obviously, if we can recognize an error, it ceases to be an error.

If we had to rely on our own judgment as to the correctness of our programs, we would be in a difficult position. Fortunately the computer usually provides the proof of the pudding. It is such a proper combination of programmer and computer that will ultimately determine the means of judging the program. We hope to provide some insight into the proper mixture of these ingredients. An immediate problem that we must cope with is the somewhat disheartening fact that, even after carefully eliminating clerical errors, experienced programmers will still make an average of approximately one error for every thirty instructions written.

We make errors quite regularly
This statement is still true after half a century—unless it's actually worse nowadays. (I have some data from Capers Jones suggesting one error in fewer than ten instructions may be typical for very large, complex projects.) It will probably be true after ten centuries, unless by then we've made substantial modifications to the human brain. It's a characteristic of humans, and it would have been true a hundred centuries ago—if we'd had computers then.
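
A quick calculation shows what those rates imply for programs of realistic size; the program sizes below are hypothetical examples, not data.

    # What the quoted defect rates imply; the program sizes are hypothetical.
    def expected_errors(instructions, instructions_per_error):
        return instructions // instructions_per_error

    print(expected_errors(3_000, 30))     # 1960 rate: about 100 latent errors
    print(expected_errors(300_000, 10))   # Capers Jones's large-project rate: about 30,000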

1960’s Cost of errors
These errors range from minor misunderstandings of instructions to major errors of logic or problem interpretation. Strangely enough, the trivial errors often lead to spectacular results, while the major errors initially are usually the most difficult to detect.

“Trivial” errors can have great consequences
We knew about large errors way back then, but I suspect we didn’t imagine just how much errors could cost. For examples of some billion dollar errors along with explanations, read the chapter “Some Very Expensive Software Errors.”

Back to 1960 again
Of course, it is possible to write a program without errors, but this fact does not obviate the need for testing. Whether or not a program is working is a matter not to be decided by intuition. Quite often it is obvious when a program is not working. However, situations have occurred where a program which has been apparently successful for years has been exposed as erroneous in some part of its operation.

Errors can escape detection for years
With the wisdom of time, we now have quite specific examples of errors lurking in the background for thirty years or more. For example, read the chapter on “predicting the number of errors.”

How was it tested in 1960
Consequently, when we use a program, we want to know how it was tested in order to give us confidence in—or warning about—its applicability. Woe unto the programmer with “beginner’s luck” whose first program happens to have no errors. If he takes success in the wrong way, many rude shocks may be needed to jar his unfounded confidence into the shape of proper skepticism.

Many people are discouraged by what to them seems the inordinate amount of effort spent on program testing. They rightly indicate that a human being can often be trained to do a job much more easily than a computer can be programmed to do it. The rebuttal to this observation may be one or more of the following statements:
  1. All problems are not suitable for computers. (We must never forget this one.)
  2. The computer, once properly programmed, will give a higher level of performance, if, indeed, the problem is suited to a computer approach.
  3. All the human errors are removed from the system in advance, instead of distributing them throughout the work like bits of shell in a nutcake. In such instances, unfortunately, the human errors will not necessarily repeat in identical manner. Thus, anticipating and catching such errors may be exceedingly difficult. Often in these cases the tendency is to overcompensate for such errors, resulting in expense and time loss.
  4. The computer is often doing a different job than the man is doing, for there is a tendency—usually a good one—to enlarge the scope of a problem at the same time it is first programmed for a computer. People are often tempted to "compare apples with houses" in this case.
  5. The computer is probably a more steadfast employee, whereas human beings tend to move on to other responsibilities and must be replaced by other human beings who must, in turn, be trained.
In other words, if a job is worth doing, it is worth doing right.

Sometimes the error is creating a program at all.
Unfortunately, the cost of developing, supporting, and maintaining a program frequently exceeds the value it produces. In any case, no amount of fixing small program errors can eliminate the big error of writing the program in the first place. For examples and explanations, read the chapter on “it shouldn’t even be done.”

The full process, 1960
If a job is a computer job, it should be handled as such without hesitation. Of course, we are obligated to include the cost of programming and testing in any justification of a new computer application. Furthermore we must not be tempted to cut costs at the end by skimping on the testing effort. An incorrect program is indeed worth less than no program at all because the false conclusions it may inspire can lead to many expensive errors.

We must not confuse cost and value.
Even after all this time, some managers still believe they can get away with skimping on the testing effort. For examples and explanations, read the section on “What Do Errors Cost?”

Coding is not the end, even in 1960
A greater danger than false economy is ennui. Sometimes a programmer, upon finishing the coding phase of a problem, feels that all the interesting work is done. He yearns to move on to the next problem.

Programs can become erroneous without changing a bit.
You may have noticed the consistent use of “he” and “his” in this quoted passage from an ancient book. These days, this would be identified as “sexist writing,” but it wasn’t called “sexist” way back then. This is an example of how something that wasn’t an error in the past becomes an error with changing culture, changing language, changing hardware, or perhaps new laws. We don’t have to do anything to make an error, but we have to do a whole lot not to make an error.

We keep learning, but is it enough?
Thus as soon as the program looks correct—or, rather, does not look incorrect—he convinces himself it is finished and abandons it. Programmers at this time are much more fickle than young lovers.
Such actions are, of course, foolish. In the first place, we cannot so easily abandon our programs and relieve ourselves of further obligation to them. It is very possible under such circumstances that in the middle of a new problem we shall be called upon to finish our previous shoddy work—which will then seem even more dry and dull, as well as being much less familiar. Such unfamiliarity is no small problem. Much grief can occur before the programmer regains the level of thought activity he achieved in originally writing the program. We have emphasized flow diagramming and its most important assistance to understanding a program but no flow diagram guarantees easy reading of a program. The proper flow diagram does guarantee the correct logical guide through the program and a shorter path to correct understanding.

It is amazing how one goes about developing a coding structure. Often the programmer will review his coding with astonishment. He will ask incredulously, “How was it possible for me to construct this coding logic? I never could have developed this logic initially.” This statement is well-founded. It is a rare case where the programmer can immediately develop the final logical construction. Normally programming is a series of attempts, of two steps forward and one step backward. As experience is gained in understanding the problem and applying techniques—as the programmer becomes more immersed in the program’s intricacies—his logic improves. We could almost relate this logical building to a pyramid. In testing out the problem we must climb the same pyramid as in coding. In this case, however, we must take care to root out all misconstructed blocks, being careful not to lose our footing on the slippery sides. Thus, if we are really bored with a problem, the smartest approach is to finish it as correctly as possible so we shall never see it again.

In the second place, the testing of a program, properly approached, is by far the most intriguing part of programming. Truly the mettle of the programmer is tested along with the program. No puzzle addict has ever experienced anything to match the miraculous intricacies and subtleties of the trail left by a program gone wrong. In the past, these interesting aspects of program testing have been dampened by the difficulty in rigorously extracting just the information wanted about the performance of a program. Now, however, sophisticated systems are available to relieve the programmer of much of this burden.

Testing for errors grows more difficult every year.
The previous sentence was an optimistic statement a half-century ago, but not because it was wrong. Over all these years, hundreds of tools have been built attempting to simplify the testing burden. Some of them have actually succeeded. At the same time, however, we’ve never satisfied our hunger for more sophisticated applications. So, though our testing tools have improved, our testing tasks have outpaced them. For examples and explanations, read about “preventing testing from growing more difficult.” 

If you're as interested in errors as I am, you can obtain a copy of Errors here:

ERRORS, bugs, boo-boos, blunders

Wednesday, August 17, 2016

How does a tester help with requirement gathering?



The question was posed: "How does a tester help with requirement gathering?" A number of good, but conventional, answers were given, so I decided to take my answer to a meta-level, like "what is the tester's job, generally, and how does that relate to this question?"
The tester’s job is to test, that is, to provide information about the state of a system being built or repaired. Therefore, the tester should help with requirement gathering or any other phase of development where the job of testing might be affected.
A professional tester will involve her/himself in all phases, not so much to "help" others do their job, but to assure that s/he will be able to do a professional job of testing. So, for example, in the requirements work, the tester should obviously monitor any requirements to ensure that s/he can test to see if they're fulfilled. The tester should block any vague statements such as "user-friendly" or "efficient" or "fast"—a vague "fast," for instance, must be pinned down to something measurable, as in the sketch below.
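
Here is a hypothetical illustration of the difference (the function, the queries, and the thresholds are all invented): "search must be fast" is untestable, while "95 percent of searches complete within 2 seconds" is something a tester can automate.

    # Hypothetical sketch: the vague "search must be fast" pinned down to
    # the testable "95% of searches complete within 2 seconds."
    # search_fn, the queries, and both thresholds are invented examples.
    import time

    def check_search_is_fast(search_fn, queries, limit_seconds=2.0, quantile=0.95):
        durations = []
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            durations.append(time.perf_counter() - start)
        durations.sort()
        observed = durations[max(0, int(quantile * len(durations)) - 1)]
        assert observed <= limit_seconds, (
            "%.0f%% of searches took up to %.2fs, over the %.1fs limit"
            % (quantile * 100, observed, limit_seconds))
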
Moreover, requirements are "gathered" in many ways besides an official requirements gathering phase, so the tester must always be on the alert for any way requirements can creep into the project. For instance, developers frequently make assumptions based on their own convenience or preferences, and such assumptions are not usually documented. Or, salespeople make promises to important customers, or put promises in advertisements, so a tester must spend some time speaking (at least informally) to customers—and also reading ads.
In short, a tester must be involved right from the very first moment a project is imagined, with eyes, ears, and, yes, nose always on the alert for what must be tested, how it must be tested, and whether or not it can actually be tested. That's the way a tester "helps" the requirements gathering and all other phases of a project's life: staying alert for anything that could affect accurately documenting the state of the system.

As a consequence of this broad view of the tester's job, a tester has to resist any assertion that "you don't have to be involved in this phase," regardless of what phase it is. Obstacles to testing can arise anywhere.


Sunday, August 07, 2016

What Is Quality? Why Is It Important?


(This post is taken from the beginning of Chapter 1 of Book 1 of my Quality Software Series: How Software is Built.)



"You can fool all of the people some of the time, and some of the people all of the time, but you can't fool all of the people all of the time."- Abraham Lincoln
People in the software business put great stress on removing ambiguity, and so do writers. But sometimes writers are intentionally ambiguous, as in the title of this book. "Quality Software Management" means both "the management of quality software" and "quality management in the software business," because I believe that the two are inseparable. Both meanings turn on the word "quality," so if we are to keep the ambiguity within reasonable bounds, we first need to address the meaning of that often misunderstood term.
1.1 A Tale Of Software Quality
My sister's daughter, Terra, is the only one in the family who has followed Uncle Jerry in the writer's trade. She writes fascinating books on the history of medicine, and I follow each one's progress as if it were one of my own. For that reason, I was terribly distressed when her first book, Disease in the Popular American Press, came out with a number of gross typographical errors in which whole segments of text disappeared (see Figure 1-1). I was even more distressed to discover that those errors were caused by an error in the word processing software she used—CozyWrite, published by one of my clients, the MiniCozy Software Company.
The next day, too, the Times printed a letter from "Medicus," objecting to the misleading implication in the microbe story that diphtheria could ever be inoculated against; the writer flatly asserted that there would never be a vaccine for this disease because, unlike smallpox, diphtheria re-

Because Times articles never included proof—never told how people knew what they claimed—the uninformed reader had no way to distinguish one claim from another.
Figure 1-1. Part of a sample page from Terra Ziporyn's book, showing how the CozyWrite word processor lost text after "re-".

Terra asked me to discuss the matter with MiniCozy on my next visit. I located the project manager for CozyWrite, and he acknowledged the existence of the error.
"It's a rare bug," he said.
"I wouldn't say so," I countered. "I found over twenty-five instances in her book."
"But it would only happen in a book-sized project. Out of over 100,000 customers, we probably didn't have 10 who undertook a project of that size as a single file."
"But my niece noticed. It was her first book, and she was devastated."
"Naturally I'm sorry for her, but it wouldn't have made any sense for us to try to fix the bug for 10 customers."
"Why not? You advertise that CozyWrite handles book-sized projects."
"We tried to do that, but the features didn't work. Eventually, we'll probably fix them, but for now, chances are we would introduce a worse bug—one that would affect hundreds or thousands of customers. I believe we did the right thing."
As I listened to this project manager, I found myself caught in an emotional trap. As software consultant to MiniCozy, I had to agree, but as uncle to an author, I was violently opposed to his line of reasoning. If someone at that moment had asked me, "Is CozyWrite a quality product?" I would have been tongue-tied.
How would you have answered?
1.2 The Relativity of Quality
The reason for my dilemma lies in the relativity of quality. As the MiniCozy story crisply illustrates, what is adequate quality to one person may be inadequate quality to another.
1.2.1. Finding the relativity
If you examine various definitions of quality, you will always find this relativity. You may have to examine with care, though, for the relativity is often hidden, or at best, implicit.
Take for example Crosby's definition:
"Quality is meeting requirements."
Unless your requirements come directly from heaven (as some developers seem to think), a more precise statement would be:
"Quality is meeting some person's requirements."
For each different person, the same product will generally have different "quality," as in the case of my niece's word processor. My MiniCozy dilemma is resolved once I recognize that
a. To Terra, the people involved were her readers.
b. To MiniCozy's project manager, the people involved were (the majority of) his customers.
1.2.2 Who was that masked man?
In short, quality does not exist in a non-human vacuum.
Every statement about quality is a statement about some person(s).
That statement may be explicit or implicit. Most often, the "who" is implicit, and statements about quality sound like something Moses brought down from Mount Sinai on a stone tablet. That's why so many discussions of software quality are unproductive: It's my stone tablet versus your Golden Calf.
When we encompass the relativity of quality, we have a tool to make those discussions more fruitful. Each time somebody asserts a definition of software quality, we simply ask,
"Who is the person behind that statement about quality."
Using this heuristic, let's consider a few familiar but often conflicting ideas about what constitutes software quality:
a. "Zero defects is high quality."
1. to a user such as a surgeon whose work would be disturbed by those defects
2. to a manager who would be criticized for those defects
b. "Lots of features is high quality."
1. to users whose work can use those features—if they know about them
2. to marketers who believe that features sell products
c. "Elegant coding is high quality."
1. to developers who place a high value on the opinions of their peers
2. to professors of computer science who enjoy elegance
d. "High performance is high quality."
1. to users whose work taxes the capacity of their machines
2. to salespeople who have to submit their products to benchmarks
e. "Low development cost is high quality."
1. to customers who wish to buy thousands of copies of the software
2. to project managers who are on tight budgets
f. "Rapid development is high quality."
1. to users whose work is waiting for the software
2. to marketers who want to colonize a market before the competitors can get in
g. "User-friendliness is high quality."
1. to users who spend 8 hours a day sitting in front of a screen using the software
2. to users who can't remember interface details from one use to the next