
Tuesday, November 07, 2017

When do I know I'm not a beginning programmer any more?

I was asked, "When do I know I'm not a beginning programmer any more?"

I wouldn’t answer this question, because it’s the wrong question.

You should not ever want to know you’re not a beginner, because a true professional is always a beginner. The world in general, and the world of programming in particular, is so complex, so huge, that one lifetime is not long enough to stop being a beginner.

Your beginner’s mind is one of your most valuable tools. It requires you to look at each situation afresh, and to innovate. (Fundamentally, that's what the Agile movement is all about.) If you know children, watch how they use their beginner’s minds to conquer their world.

I’m very suspicious of people in the programming field who think they are no longer beginners. Myself, I’ve been programming for more than 60 years, and I still consider myself a beginner.




Monday, October 02, 2017

Can they charge me for bugs?

How likely is it that you can create 0 software bugs?

A contract programmer told us, "For years, my client has aimed for 0 bugs on every software release. However, we can't control the bugs that closely. Now the client has come up with the idea of charging me a penalty: a refund of as much as 3% of what I charge them, per bug. What can I do?"

First of all, stop calling them “bugs.”  They are not independently reproducing life forms. They are made by us humans, and there are no perfect humans.

Next, listen to what experienced S/W developers will tell you. Perfect software is a myth, an illusion.

But suppose you did produce a piece of zero-error software. How would you know that’s what you had? I’ve known software that was thought to be error-free for 30+ years, then an error turned up. Are they still going to be charging you penalties thirty years from now?

Quite simply, perfect software violates the Second Law of Thermodynamics. Then, too, software that might have been perfect yesterday can become imperfect because of changes in the world today.

But, if they want to charge you for errors detected in software you built, that’s okay. What you need to do is charge them more for the software to begin with, to account for what you will eventually have to pay back. Just set a time limit—maybe a year or so, or until someone else modifies the code. And be sure you have an agreed definition of what constitutes an “error.”
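To see how that pricing adjustment works out, here is a minimal sketch with invented numbers (the fee, the penalty rate, and the error estimate are all hypothetical):

```python
# Invented numbers: price the contract so the expected penalty still
# leaves you with the fee you actually need to earn.
base_fee = 50_000        # what you need to end up with, in dollars
penalty_rate = 0.03      # proposed refund: 3% of the charged fee, per error
expected_errors = 8      # your honest estimate for the agreed warranty period

# The penalty is taken from the quoted fee, so solve
#   quoted_fee * (1 - penalty_rate * expected_errors) = base_fee
assert penalty_rate * expected_errors < 1, "penalty would swallow the whole fee"
quoted_fee = base_fee / (1 - penalty_rate * expected_errors)

print(f"Quote about ${quoted_fee:,.0f} instead of ${base_fee:,.0f}")
# Quote about $65,789 instead of $50,000
```

The point is not the particular numbers; it is that the penalty clause has a calculable price, and that price belongs in your quote.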

This is not a simple question. I’ve written at least two books on the subject, and ultimately they don't cover every possible variation. But at least give your client a copy of the books so you can begin your negotiation with some intelligent information, not just myths and illusions:





Monday, September 25, 2017

Dealing With Failure as a Developer

He asked, "How do I not feel like a failure when I went to one of the best schools and got one of the top internships, only to be a bad developer in the end?"

And here's what I told him:

First of all, tell yourself how lucky you are that you found out that you don’t happen to be good at development. Lots of people are good at other things, but aren’t good at development, don’t know it, and persist in doing a bad job. You should be extremely happy you’re not one of those clueless people.

Tell yourself that you failed at one thing, so far. Most people in their lives fail at many things. It’s perfectly normal.

The few people who never fail at anything are generally those who never try anything new, or risky. Tell yourself how lucky you are that you’re not one of those jerks.

When we try things, sometimes we succeed, sometimes we fail. But succeed or fail, we always have the possibility to LEARN. Many of the people who do fail at things never take up the possibility to learn, so ask yourself “What did I learn from this failure?” Keep asking like that for each failure, and you will become a very smart person.


It would also be a good idea to learn to use a different way of speaking about yourself. You are not “a failure.” You are a person who failed at something. Once. Therefore, you are a real human being. That’s pretty good, isn’t it?

For some tools to help you work through this feeling of failure, read: More Secrets of Consulting: The Consultant's Tool Kit

Wednesday, January 11, 2017

Foreword and Introduction to ERRORS book

Foreword


Ever since this book came out, people have been asking me how I came to write on such an unusual topic. I've pondered their question and decided to add this foreword as an answer.

As far as I can remember, I've always been interested in errors. I was a smart kid, but I didn't understand why I made mistakes, or why other people made more.

I yearned to understand how the brain, my brain, worked, so I studied everything I could find about brains. And then I heard about computers.

Way back then, computers were called "Giant Brains." Edmund Berkeley wrote a book by that title, which I read voraciously.

Those giant brains were "machines that think" and "didn't make errors." Neither turned out to be true, but back then, I believed them. I knew right away, deep down—at age eleven—that I would spend my life with computers.

Much later, I learned that computers didn't make many errors, but their programs sure did.

I realized when I worked on this book that it more or less summarizes my life's work, trying to understand all about errors. That's where it all started.

I think I was upset when I finally figured out that I wasn't going to find a way to perfectly eliminate all errors, but I got over it. How? I think it was my training in physics, where I learned that perfection simply violates the laws of thermodynamics.

Then I was upset when I realized that when a computer program had a fault, the machine could turn out errors millions of times faster than any human or group of humans.

I could actually program a machine to make more errors in a day than all human beings had made in the last 10,000 years. Not many people seemed to understand the consequences of this fact, so I decided to write this book as my contribution to a more perfect world.

Not perfect, of course, but more perfect. I hope it helps.


Introduction


For more than a half-century, I’ve written about errors: what they are, their importance, how we think about them, our attempts to prevent them, and how we deal with them when those attempts fail. People tell me how helpful some of these writings have been, so I felt it would be useful to make them more widely known. Unfortunately, the half-century has left them scattered among several dozen books, so I decided to consolidate some of the more helpful ones in this book.

I’m going to start, though, where it all started, with my first book where Herb Leeds and I made our first public mention of error. Back in those days, Herb and I both worked for IBM. As employees we were not allowed to write about computers making mistakes, but we knew how important the subject was. So, we wrote our book and didn’t ask IBM’s permission.

Computer errors are far more important today than they were back in 1960, but many of the issues haven’t changed. That’s why I’m introducing this book with some historical perspective: reprinting some of that old text about errors along with some notes with the perspective of more than half a century.

1960’s Forbidden Mention of Errors
From Chapter 10, “Program Testing,” in Leeds and Weinberg, Computer Programming Fundamentals
When we approach the subject of program testing, we might almost conclude the whole subject immediately with the anecdote about the mathematics professor who, when asked to look at a student’s problem, replied, “If you haven’t made any mistakes, you have the right answer.” He was, of course, being only slightly facetious. We have already stressed this philosophy in programming, where the major problem is knowing when a program is “right.”

In order to be sure that a program is right, a simple and systematic approach is undoubtedly best. However, no approach can assure correctness without adequate testing for verification. We smile when we read the professor’s reply because we know that human beings seldom know immediately when they have made errors—although we know they will at some time make them. The programmer must not have the view that, because he cannot think of any error, there must not be one. On the contrary, extreme skepticism is the only proper attitude. Obviously, if we can recognize an error, it ceases to be an error.

If we had to rely on our own judgment as to the correctness of our programs, we would be in a difficult position. Fortunately the computer usually provides the proof of the pudding. It is such a proper combination of programmer and computer that will ultimately determine the means of judging the program. We hope to provide some insight into the proper mixture of these ingredients. An immediate problem that we must cope with is the somewhat disheartening fact that, even after carefully eliminating clerical errors, experienced programmers will still make an average of approximately one error for every thirty instructions written.

We make errors quite regularly
This statement is still true after half a century, though nowadays the rate may actually be worse. (I have some data from Capers Jones suggesting one error in fewer than ten instructions may be typical for very large, complex projects.) It will probably still be true after ten centuries, unless by then we’ve made substantial modifications to the human brain. It’s a characteristic of humans that would have been true a hundred centuries ago, if we’d had computers then.
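As a rough back-of-the-envelope check, here is what those two rates (one error per thirty instructions, one per ten) predict for a program of a size I've picked arbitrarily:

```python
# Back-of-the-envelope: what the quoted error rates predict.
# The program size is arbitrary; only the ratio matters.
program_size = 100_000  # instructions written

for label, instructions_per_error in [("1960 figure (1 in 30)", 30),
                                      ("large complex project (1 in 10)", 10)]:
    expected = program_size // instructions_per_error
    print(f"{label}: roughly {expected:,} errors in {program_size:,} instructions")
```

Thousands of errors in a program of quite ordinary size, before a single test is run.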

1960’s Cost of errors
These errors range from minor misunderstandings of instructions to major errors of logic or problem interpretation. Strangely enough, the trivial errors often lead to spectacular results, while the major errors initially are usually the most difficult to detect.

“Trivial” errors can have great consequences
We knew about large errors way back then, but I suspect we didn’t imagine just how much errors could cost. For examples of some billion dollar errors along with explanations, read the chapter “Some Very Expensive Software Errors.”

Back to 1960 again
Of course, it is possible to write a program without errors, but this fact does not obviate the need for testing. Whether or not a program is working is a matter not to be decided by intuition. Quite often it is obvious when a program is not working. However, situations have occurred where a program which has been apparently successful for years has been exposed as erroneous in some part of its operation.

Errors can escape detection for years
With the wisdom of time, we now have quite specific examples of errors lurking in the background for thirty years or more. For example, read the chapter on “predicting the number of errors.”

How was it tested in 1960
Consequently, when we use a program, we want to know how it was tested in order to give us confidence in—or warning about—its applicability. Woe unto the programmer with “beginner’s luck” whose first program happens to have no errors. If he takes success in the wrong way, many rude shocks may be needed to jar his unfounded confidence into the shape of proper skepticism.

Many people are discouraged by what to them seems the inordinate amount of effort spent on program testing. They rightly indicate that a human being can often be trained to do a job much more easily than a computer can be programmed to do it. The rebuttal to this observation may be one or more of the following statements:
  1. All problems are not suitable for computers. (We must never forget this one.)
  2. The computer, once properly programmed, will give a higher level of performance, if, indeed,
    the problem is suited to a computer approach.
  3. All the human errors are removed from the system in advance, instead of distributing them
    throughout the work like bits of shell in a nutcake. In such instances, unfortunately, the human errors will not necessarily repeat in identical manner. Thus, anticipating and catching such errors may be exceedingly difficult. Often in these cases the tendency is to overcompensate for such errors, resulting in expense and time loss.
  4. The computer is often doing a different job than the man is doing, for there is a tendency, usually a good one, to enlarge the scope of a problem at the same time it is first programmed for a computer. People are often tempted to “compare apples with houses” in this case.
  5. The computer is probably a more steadfast employee, whereas human beings tend to move on to other responsibilities and must be replaced by other human beings who must, in turn, be trained.
In other words, if a job is worth doing, it is worth doing right.

Sometimes the error is creating a program at all.
Unfortunately, the cost of developing, supporting, and maintaining a program frequently exceeds the value it produces. In any case, no amount of fixing small program errors can eliminate the big error of writing the program in the first place. For examples and explanations, read the chapter on “it shouldn’t even be done.”

The full process, 1960
If a job is a computer job, it should be handled as such without hesitation. Of course, we are obligated to include the cost of programming and testing in any justification of a new computer application. Furthermore we must not be tempted to cut costs at the end by skimping on the testing effort. An incorrect program is indeed worth less than no program at all because the false conclusions it may inspire can lead to many expensive errors.

We must not confuse cost and value.
Even after all this time, some managers still believe they can get away with skimping on the testing effort. For examples and explanations, read the section on “What Do Errors Cost?”

Coding is not the end, even in 1960
A greater danger than false economy is ennui. Sometimes a programmer, upon finishing the coding phase of a problem, feels that all the interesting work is done. He yearns to move on to the next problem.

Programs can become erroneous without changing a bit.
You may have noticed the consistent use of “he” and “his” in this quoted passage from an ancient book. These days, this would be identified as “sexist writing,” but it wasn’t called “sexist” way back then. This is an example of how something that wasn’t an error in the past becomes an error with changing culture, changing language, changing hardware, or perhaps new laws. We don’t have to do anything to make an error, but we have to do a whole lot not to make an error.

We keep learning, but is it enough?
Thus as soon as the program looks correct—or, rather, does not look incorrect—he convinces himself it is finished and abandons it. Programmers at this time are much more fickle than young lovers.
Such actions are, of course, foolish. In the first place, we cannot so easily abandon our programs and relieve ourselves of further obligation to them. It is very possible under such circumstances that in the middle of a new problem we shall be called upon to finish our previous shoddy work—which will then seem even more dry and dull, as well as being much less familiar. Such unfamiliarity is no small problem. Much grief can occur before the programmer regains the level of thought activity he achieved in originally writing the program. We have emphasized flow diagramming and its most important assistance to understanding a program but no flow diagram guarantees easy reading of a program. The proper flow diagram does guarantee the correct logical guide through the program and a shorter path to correct understanding.

It is amazing how one goes about developing a coding structure. Often the programmer will review his coding with astonishment. He will ask incredulously, “How was it possible for me to construct this coding logic? I never could have developed this logic initially.” This statement is well-founded. It is a rare case where the programmer can immediately develop the final logical construction. Normally programming is a series of attempts, of two steps forward and one step backward. As experience is gained in understanding the problem and applying techniques—as the programmer becomes more immersed in the program’s intricacies—his logic improves. We could almost relate this logical building to a pyramid. In testing out the problem we must climb the same pyramid as in coding. In this case, however, we must take care to root out all misconstructed blocks, being careful not to lose our footing on the slippery sides. Thus, if we are really bored with a problem, the smartest approach is to finish it as correctly as possible so we shall never see it again.

In the second place, the testing of a program, properly approached, is by far the most intriguing part of programming. Truly the mettle of the programmer is tested along with the program. No puzzle addict could experience the miraculous intricacies and subtleties of the trail left by a program gone wrong. In the past, these interesting aspects of program testing have been dampened by the difficulty in rigorously extracting just the information wanted about the performance of a program. Now, however, sophisticated systems are available to relieve the programmer of much of this burden.

Testing for errors grows more difficult every year.
The previous sentence was an optimistic statement a half-century ago, but not because it was wrong. Over all these years, hundreds of tools have been built attempting to simplify the testing burden. Some of them have actually succeeded. At the same time, however, we’ve never satisfied our hunger for more sophisticated applications. So, though our testing tools have improved, our testing tasks have outpaced them. For examples and explanations, read about “preventing testing from growing more difficult.” 

If you're as interested in errors as I am, you can obtain a copy of Errors here:

ERRORS, bugs, boo-boos, blunders

Monday, October 31, 2016

What's the most complex thing about software development?

What's the most complex thing about software development?

Interesting question.

So far, on Quora.com, there have been four excellent answers to this question, discussing:
- the confusing role of people,
- the requirements problems,
- the interactions with the physical world.

Each of these factors certainly makes software development more complex, and processes such as Agile are designed to cope with this complexity. But, the ultimate complexity factor is software testing.

Why testing? In the software development literature, testing is not usually treated as a glamorous part of development, but when we're testing, we're up against the Second Law of Thermodynamics, which warns us that perfection is ultimately unobtainable.

So, even if we absolutely knew all the requirements (which we can't, of course), kept all the human factors under control (also impossible), and knew exactly all the physical properties of the real world (once more, impossible), we would still never be able to perform the infinite number of tests to cover all possible situations.

In other words, the software could still surprise us at any time. That's what I call complexity.
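To put a number on "never be able to perform the infinite number of tests," here is a tiny calculation for an absurdly small program: a pure function of just two 32-bit inputs, tested at a rate I've invented to be generous (one test per microsecond):

```python
# Why exhaustive testing is hopeless even for a toy interface:
# a pure function of two 32-bit inputs, tested one case per microsecond.
cases = (2 ** 32) * (2 ** 32)        # every combination of the two inputs
tests_per_second = 1_000_000         # a generously fast, invented rate

seconds = cases / tests_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"{cases:.2e} cases -> about {years:,.0f} years of testing")
# 1.84e+19 cases -> about 584,942 years of testing
```

And real programs have far more than two inputs, plus state, timing, and an environment that keeps changing under them.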

Of course, we can still work hard to solve these other problems. On requirements, for instance, see our Exploring Requirements books.





But no matter how hard you try, you'll still be faced with the testing problem. To understand this problem and what you can do to reduce (but not eliminate) it, take a look at Perfect Software and Other Illusions about Testing.

Friday, January 20, 2012


WIGGLE Charts—A Sketching Tool for Designers
There's no sense being precise about something when you don't even know what you're talking about. - John von Neumann

For systems designers, it is the best of times and the worst of times. For years we muddled through with a few simple graphic tools for design and documentation—flowcharts, block diagrams, and perhaps decision tables. Then came the diagram explosion, with HIPO, HIPO/DB, Warnier-Orr diagrams, Softech's SADT, Nassi-Shneiderman charts, Petri nets, Constantine structure charts and data flow diagrams, Jackson data structure diagrams, and coding schemes. And for each of these diagrams, you need only bend a line or add a symbol to become known as the inventor of yet another graphic design tool.

Although the choice is large, it is really not very wide. Each of these diagrammatic schemes shares the characteristic of precision—wonderful when you know what you're talking about, but time-consuming and thought-stifling when you don't. And, since most design work is spent thinking roughly, few of these diagrams are of much help through large parts of the design process.

In other design fields, such as architecture, the rough sketch is the most frequently used graphic device, and precise detailed drawings are rarely used at all until the creative part of the design work is finished. The rough sketch has several advantages over the precise drawing:

1. It can be drawn much faster, thus using less time.

2. It represents less investment of time, so we're not afraid to throw it away and try something else.

3. Its very roughness conveys important information about where we are in the design process.

In information processing, rough sketches have always existed, but have never been glorified by a name or by favorable publicity. Schools of architecture offer courses in sketching. The student architect who makes clear quick sketches is much admired by faculty and peers alike. It's time we learned from more mature disciplines and put sketching up on a pedestal.

For many years, I've taught a method of sketching usable with most of the diagrammatic techniques now used in information processing. Although it's been received with enthusiasm, it's never received much publicity, perhaps because:

1. It doesn't require a template.

2. It doesn't have a name.

Although I'll continue to resist the template forces, I've decided to bring the baby to life with a catchy acronym, WIGGLE Charts, for Weinberg's Ideogram for Generating Graphics Lacking Exactitude.
A WIGGLE is merely a box, or block, or line, with one or more rough edges. The rough edges indicate which parts represented by the box or line are imprecisely known. For instance, the following figure is a sketch of a system in block diagram form:

A WIGGLE block diagram
Each box represents input coming from the left, processing inside, and output going to the right. Box 1 has a straight line at its left side, indicating the input to Box 1 is clearly defined somewhere. The right side, however, is rough, indicating we haven't decided what its output will be. As indicated in the diagram, some output will be passed to a second box, but we don't know exactly what. The top and bottom of Box 1 are rough lines, indicating we don't know exactly what this process will be.

Box 2 has undefined input and output, but its process is well known to us, and clearly delimited in scope. Perhaps we have decided to use an off-the-shelf sort, though we don't know which one, so we haven't decided upon a record format.

Box 3 takes the unknown output of Box 2 as its unknown input. By a process that's not yet well defined, it produces two outputs, one well defined and one known only roughly. Perhaps the first report is defined by legal requirements, or by input needs of another system, while the second output is an error report whose format is left open at this stage of the design process. The rough arrows between the boxes indicate we haven't yet decided how control will pass from one box to another. They could be subroutines of the same master routine, or steps in the same job, or separate steps manually coordinated.
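For readers who like to see things in code as well as in sketches, here is one possible way (my own, not part of the WIGGLE notation) to record which parts of each box are still rough; the field names are invented:

```python
# Unofficial sketch: recording which parts of each WIGGLE box are still rough.
# True means "precisely defined"; False means "deliberately left wiggly."
from dataclasses import dataclass

@dataclass
class WiggleBox:
    name: str
    input_defined: bool
    process_defined: bool
    output_defined: bool   # simplification: one flag even if a box has two outputs

design = [
    WiggleBox("Box 1", input_defined=True,  process_defined=False, output_defined=False),
    WiggleBox("Box 2", input_defined=False, process_defined=True,  output_defined=False),
    WiggleBox("Box 3", input_defined=False, process_defined=False, output_defined=True),
]

for box in design:
    rough = [part for part, known in (("input", box.input_defined),
                                      ("process", box.process_defined),
                                      ("output", box.output_defined)) if not known]
    print(f"{box.name}: still rough -> {', '.join(rough) or 'nothing'}")
```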
Taken together, these three WIGGLE boxes and their arrows give a sketch of the overall design we have in mind. Perhaps more important is what they don't do:

1. They don't give us or any reader an unjustified feeling of precision.

2. They don't intimidate anyone who has an idea about changing something to improve the design.

3. They haven't wasted a lot of time drawing with templates.


Perhaps the nicest feature of WIGGLE charts is the way they can be used with just about anybody's diagrammatic technique. In the second half of this blog post, we'll look at a few more examples of how WIGGLE charts can be used.
(to be continued)

Source
This material on WIGGLE charts is adapted from my book, Rethinking Systems Analysis and Design.

Tuesday, November 22, 2011

Who is Right, and What is to Be Done About It?

A management consultant, whose client was an international manufacturer, was asked to evaluate an inventory management procedure that the client had used with stunning success in their French operations. As part of his study, he wanted to compare the performance of the French procedure with procedures used in other locations, using historical data from several countries. A programmer with a strong management science background was given the job of programming the simulation of the French procedure.

When the consultant received the results he could not reconcile them with the figures supplied by the French company. After extensive checking he initiated a series of long telephone calls to France, suggesting that perhaps the procedure had not actually performed as well as they had claimed. The French management took offense at the implication of incompetence.

The French manager complained to the manager who had hired the consultant. Tempers mounted and international relations were strained to the breaking point.

By sheer chance, someone examined the programmer's simulation program and noticed that one term was missing and a second term was negative rather than positive. These findings led to a full technical review of the formula as translated. The review showed that the programmer's formula did not match the formula supplied by the French.

The consultant, much relieved, took the program back to the programmer and showed him the error. "That's not an error," the programmer protested. "Actually, the formula was in error, so I corrected it. The formula I programmed is correct, whereas the original formula was simply wrong."

That's the end of Part 1.

Note to Readers

Now, for you readers, the question is this:

"If you were the consultant, how would you handle this situation going forward?"

If I receive a few comments, I'll publish the rest of the story—what actually happened.

And please note: I don't accept anonymous comments. They're automatically rejected. By all means, use a pseudonym, but don't waste your effort trying to post anonymous comments.

Wednesday, September 21, 2011

Why English Will Never Be 100% Automated: Example

One of the nice features of the Kindle eBook service is the way they copy-edit some of their better-selling books. This can be a particularly important service for print books that have been scanned to make eBooks. For instance, Amazon recently wrote about my Kindle book, An Introduction to General Systems Thinking:

Typo issues exist that may have been caused by an Optical Character Recognition (OCR) problem. Few examples are given below:

loc 1561 - "T call them" should be " I call them".
loc 2946 - "But 1 still can't" should be "But I still can't".
loc 3351 - "Shasta the liger" should be "Shasta the tiger".

Please look for the same kind of errors throughout the book.


The first two instances are common OCR (Optical Character Recognition) errors: "T" for "I" and "1" (the numeral) for "I" (the capital letter).

The third example is a not quite so common OCR error, "l" (the letter) for "t". And, in this case, it's not an error at all.

The sentence in question was:

We do not often have our excessively sharp view of the world challenged by phenomena like Shasta the liger at the Salt Lake City Zoo, whose father was an African lion and whose mother was a Bengal tiger.

Still, the sentence contains a more subtle error: the failure of the author (me) to account for the mental state of the typical reader, for whom the term "liger" may be unfamiliar. (Even though it is found in at least 16 on-line dictionaries.)

I corrected this much more subtle error by redrafting the questionable sentence as:

We do not often have our excessively sharp view of the world challenged by phenomena like Shasta the "liger" at the Salt Lake City Zoo, whose father was an African lion and whose mother was a Bengal tiger.

As I dealt with this situation, I kept sighing and thinking, "English will never be entirely automated."

And then I took a deep breath and thought, "I suppose I prefer it that way."

How about you? Would you like English to be entirely clear and logical?
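If you're tempted to answer "yes," here is a toy illustration (my own, not anything Amazon actually runs) of why purely mechanical correction isn't enough:

```python
# A toy illustration of why mechanical OCR "fixes" still need human judgment:
# the same kind of substitution that repairs real scan errors breaks "liger".
def naive_fix(text: str) -> str:
    text = text.replace("T call them", "I call them")   # genuine OCR repair
    text = text.replace("But 1 still", "But I still")   # genuine OCR repair
    return text.replace("liger", "tiger")               # "repair" that creates an error

print(naive_fix("Shasta the liger at the Salt Lake City Zoo"))
# -> Shasta the tiger at the Salt Lake City Zoo   (now genuinely wrong)
```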

An Introduction to General Systems Thinking

Tuesday, January 04, 2011

The Universal Pattern of Huge Software Losses


What Do Failures Cost?
Some perfectionists in software engineering are overly preoccupied with failure, and most others don't rationally analyze the value they place on failure-free operation. Nonetheless, when we do measure the cost of failure carefully, we generally find that great value can be added by producing more reliable software. In Responding to Significant Software Events, I give five examples that should convince you.

The national bank of Country X issued loans to all the banks in the country. A tiny error in the interest rate calculation added up to more than a billion dollars that the national bank could never recover.

A utility company was changing its billing algorithm to accommodate rate changes (a utility company euphemism for "rate increases"). All this involved was updating a few numerical constants in the existing billing program. A slight error in one constant was multiplied by millions of customers, adding up to X dollars that the utility could never recover. The reason I say "X dollars" is that I've heard this story from four different clients, with different values of X. Estimated losses ranged from a low of $42 million to a high of $1.1 billion. Given that this happened four times to my clients, and given how few public utilities are clients of mine, I'm sure it's actually happened many more times.
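The arithmetic of that multiplication is worth spelling out; the numbers below are invented, but the pattern is the one in the story:

```python
# Invented numbers: how a "slight" error in one billing constant adds up.
error_per_bill = 0.75            # dollars mis-billed per customer per cycle
customers = 4_000_000            # bills affected each billing cycle
cycles_before_detection = 12     # monthly billing, caught a year later

unrecoverable = error_per_bill * customers * cycles_before_detection
print(f"Unrecoverable amount: ${unrecoverable:,.0f}")   # $36,000,000
```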

I know of the next case through the public press, so I can tell you that it's about the New York State Lottery. The New York State legislature authorized a special lottery to raise extra money for some worthy purpose. As this special lottery was a variant of the regular lottery, the program to print the lottery tickets had to be modified. Fortunately, all this involved was changing one digit in the existing program. A tiny error caused duplicate tickets to be printed, and public confidence in the lottery plunged with a total loss of revenue estimated between $44 million and $55 million.

I know the next story from the outside, as a customer of a large brokerage firm:
One month, a spurious line of $100,000.00 was printed on the summary portion of 1,500,000 accounts, and nobody knew why it was there. The total cost of this failure was at least $2,000,000, and the failure resulted from one of the simplest known errors in COBOL coding: failing to clear a blank line in a printing area.
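The original was COBOL, but the class of error translates directly; here is my reconstruction of the pattern in Python (not the firm's actual code):

```python
# Reconstruction of the error class (in Python, not the original COBOL):
# a shared print area that is never cleared between accounts.
print_area = ""   # the shared summary line

def print_account_summary(special_total=None):
    global print_area
    if special_total is not None:
        print_area = f"{special_total:>15,.2f}"
    # Bug: when this account has nothing special to report, the stale
    # contents of the print area leak onto its statement anyway.
    print(print_area)

print_account_summary(100_000.00)   # one account legitimately shows $100,000.00
print_account_summary()             # the next account shows it too, and so on
```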

I know this story, too, from the outside, as a customer of a mail-order company, and also from the inside, as their consultant. One month, a new service phone number for customer inquiries was printed on each bill. Unfortunately, the phone number had one digit incorrect, producing the number of a local doctor instead of the mail-order company. The doctor's phone was continuously busy for a week until he could get it disconnected. Many patients suffered, though I don't know if anyone died as a result of not being able to reach the doctor. The total cost of this failure would have been hard to calculate except for the fact that the doctor sued the mail-order company and won a large settlement.

The Pattern of Large Failures
Every such case that I have investigated follows a universal pattern:

1. There is an existing system in operation, and it is considered reliable and crucial to the operation.

2. A quick change to the system is desired, usually from very high in the organization.

3. The change is labeled "trivial."

4. Nobody notices that statement 3 is a statement about the difficulty of making the change, not the consequences of making it, or of making it wrong.

5. The change is made without any of the usual software engineering safeguards, however minimal, that the organization has in place.

6. The change is put directly into the normal operations.

7. The individual effect of the change is small, so that nobody notices immediately.

8. This small effect is multiplied by many uses, producing a large consequence.

Whenever I have been able to trace management action subsequent to the loss, I have found that the universal pattern continues. After the failure is spotted:

9. Management's first reaction is to minimize its magnitude, so the consequences are continued for somewhat longer than necessary.

10. When the magnitude of the loss becomes undeniable, the programmer who actually touched the code is fired—for having done exactly what the supervisor said.

11. The supervisor is demoted to programmer, perhaps because of a demonstrated understanding of the technical aspects of the job. [not]

12. The manager who assigned the work to the supervisor is slipped sideways into a staff position, presumably to work on software engineering practices.

13. Higher managers are left untouched. After all, what could they have done?

The First Rule of Failure Prevention
Once you understand the Universal Pattern of Huge Losses, you know what to do whenever you hear someone say things like:

• "This is a trivial change."

• "What can possibly go wrong?"

• "This won't change anything."

When you hear someone express the idea that something is too small to be worth observing, always take a look. That's the First Rule of Failure Prevention:

Nothing is too small to be worth observing.

It doesn't have to be that way
Disaster stories always make good news, but as observations, they distort reality. If we consider only software engineering disasters, we omit all those organizations that are managing effectively. But good management is so boring! Nothing ever happens worth putting in the paper. Or almost nothing. Fortunately, we occasionally get a heart-warming story such as Financial World telling about Charles T. Fisher III of NBD Corporation, one of their award-winning CEOs for the Eighties:

"When Comerica's computers began spewing out erroneous statements to its customers, Fisher introduced Guaranteed Performance Checking, promising $10 for any error in an NBD customer's monthly statement. Within two months, NBD claimed 15,000 new customers and more than $32 million in new accounts."

What the story doesn't tell is what happened inside the Information Systems department when they realized that their CEO, Charles T. Fisher III, had put a value on their work. I wasn't present, but I could guess the effect of knowing each prevented failure was worth $10 cash.

The Second Rule of Failure Prevention
One moral of the NBD story is that those other organizations do not know how to assign meaning to their losses, even when they finally observe them. It's as if they went to school, paid a large tuition, and failed to learn the one important lesson—the First Principle of Financial Management, which is also the Second Rule of Failure Prevention:

A loss of X dollars is always the responsibility of an executive whose financial responsibility exceeds X dollars.

Will these other firms ever realize that exposure to a potential billion dollar loss has to be the responsibility of their highest ranking officer? A programmer who is not even authorized to make a long distance phone call can never be responsible for a loss of a billion dollars. Because of the potential for billion dollar losses, reliable performance of the firm's information systems is a CEO level responsibility.

Of course I don't expect Charles T. Fisher III or any other CEO to touch even one digit of a COBOL program. But I do expect that when the CEOs realize the value of trouble-free operation, they'll take the right CEO-action. Once this happens, this message will then trickle down to the levels that can do something about it—along with the resources to do something about it.

Learning from others
Another moral of all these stories is that by the time you observe failures, it's much later than you think. Hopefully, your CEO will read about your exposure in these case studies, not in a disaster report from your office. Better to find ways of preventing failures before they get out of the office.

Here's a question to test your software engineering knowledge:
What is the earliest, cheapest, easiest, and most practical way to detect failures?

And here's the answer that you may not have been expecting:

The earliest, cheapest, easiest, and most practical way to detect failures is in the other guy's organization.

Over my half-century in the information systems business, there have been many unsolved mysteries. For instance, why don't we do what we know how to do? Or, why don't we learn from our mistakes? But the one mystery that beats all the others is why don't we learn from the mistakes of others?

Cases such as those cited above are in the news every week, with strong impact on the general public's attitudes about computers. But they seem to have no impact at all on the attitudes of software engineering professionals. Is it because they are such enormous losses that the only safe psychological reaction is, "It can't happen here (because if it did, I would lose my job, and I can't afford to lose my job, therefore I won't think about it)"?

(Adapted from Responding to Significant Software Events )
http://www.smashwords.com/books/view/35783

Wednesday, September 22, 2010

Have S/W Projects Hit a Wall?

I recently received an interesting set of questions about software projects from a French science journalist. I thought my readers would like to see those questions and my answers, so here goes:

Q: Do large software projects fail at a rate significantly higher than other engineering projects in the physical world (which are also quite complex)?

A: Well, as far as total failure goes, yes, I think s/w projects fail totally more often than, say, shipbuilding projects do.

OTOH, the US Navy reported a few years ago that every ship built since World War II has been late and over budget, so that type of "failure" is 100%, even though we've been building ships for hundreds of years. The Wasa (or Vasa) Ship in Sweden is a good historical example of one reason for failure: the piling on of requirements until complexity is too great.

See http://en.wikipedia.org/wiki/Vasa_(ship)

Q: Do we know exactly why? Is it a management problem or a theoretical problem?

A: All the failures I have studied have been management failures. In some cases, the theory might have been wrong, but management failed to notice the signs of impending failure, or noticed them but failed to act in time to prevent the project from becoming a death march.

Q: Have we reached now a critical size of dependable verifiable code, something like a "wall of complexity" ?

A: Such a wall definitely exists, though its thickness is somewhat fuzzy. Some well-managed projects can surpass the "wall" that stops poorly managed projects. But eventually, at any given state of the art, there will be a "wall," and as the project approaches its particular wall, progress becomes sluggish and expensive. When that happens, good managers recognize what's happening and take action--generally pulling back on some of the excess "requirements."

Q: Does it mean that the "Internet of things," or other big "real time" systems we would like to build with high reliability (like FAA air traffic control), are not really possible for the moment?

A: I first worked with the FAA in the late 1950s, trying to help them build the air traffic control system of their dreams. It wasn't possible then to implement their dreams, and it's still not possible. Why? One reason is that their dreams keep growing faster than our ability to implement them. They could build a better system, but when they try to build "the best system for all time," they collapse like the Wasa Ship.

Q: Is there a lack of a theory of building large-scale software (like the rules governing civil engineering in the physical world)? Is it because computer science is still a relatively young science?

A: There is a total lack of theory, but there are some empirical principles gained from experience. I've tried to catalog these principles in my Quality Software Management series (see below).

The trouble with computer science is that it's not a science, but generally a kind of mathematics without connection with the empirical world.

Rules governing civil engineering come largely from real-world experience, followed up by some scientific work in a few areas, like the properties of building materials. In computing, much of what we do know is simply not known to most developers, who are too busy trying to salvage poorly managed projects.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, how do my answers compare with yours? Am I being too pessimistic, or too optimistic?

Tuesday, August 17, 2010

Attendance Too Regular? Try This!

Inspired by Ajay Balamurugadas's blog, Enjoy Testing, and in particular the post at

http://enjoytesting.blogspot.com/2010/08/aware-of-other-side-of-your-application.html

which starts with:

"For the past few months, I left office at sharp 6 p.m. I felt I should not invest more hours just because someone's estimate was wrong. So, I always took the 6 p.m. cab to home instead of the 8 p.m. or 10 p.m. cab."

And Michael Bolton commented:

"...I perceive that resolution to the trickiest part of the problem starts with recognizing people."

Michael is right. It starts with people. And guess who is the first person to recognize?

Yourself, of course.

The very first thing that struck me about the (quite fine) post was the regularity with which you come and go to work. Ordinarily, such regularity is a highly valued trait. For example, people can count on knowing when you'll be there and when you won't. Very good contribution to communication--and thus very high on every tester's list.

However, as an experienced tester, you already know that too regular, too predictable, behavior is a way to miss a great many bugs--and that's true of the regularity in attendance, too.

I would suggest you come in a couple of hours early on some random day next month, and (on a different day, probably) leave quite late. And, if you have people who work night shifts, arrange to be around for one or two of those.

I probably didn't have to explain why, but some of Ajay's readers may be less experienced than others. Experienced testers can probably all tell stories of when they came in early or left late (or were somewhere they weren't usually expected to be, or even prohibited to be) and because of that noticed something that led to a bug they never would have seen otherwise. (Perhaps something they were totally unaware of.)

I myself can tell many such stories, including one that may well have saved astronauts' lives, so I regularly practice being somewhat irregular in my behavior as a consultant (yes, I know that's a paradox).

Friday, February 27, 2009

My Favorite Bug

Michael Hunter asked me to write something for his blog about "my favorite bug." At first I thought it was a silly assignment, but, as usual, once I sat down to write, I learned a few things. I don't know when (or if) it will appear on his blog, so I'm putting it here because I think it's too good not to share. (If you think I lack humility, consider the subject: how stupid I can be.)

My First Commercial Program

My favorite bug is also my first bug. It's my favorite because it started me on the road to learning the most important things about programming and testing.

It was the first commercial program I had ever written--a pipeline network analyzer for municipal water systems. (note 1) Essentially, it solved systems of non-linear algebraic equations, by iteration.

It passed all my prepared tests (test first, back in 1956), so we brought in civil engineers with data from a city near San Francisco (not San Jose, which barely existed back then). We set up and started to run the first set of data. Then we waited while the IBM 650 ground away. Thirty minutes--surpassing my largest test case. One hour--surpassing my wildest imagination. Two hours--making me suspect the program was in a closed loop.

I got up from the table where we'd been checking the input data. I was about to stop the machine and find out where the program had gone wrong, but before I reached the machine, it began to punch cards. (We had no on-line printing in those days.) The cards contained the solution, exactly according to spec.

So, what was the bug?

There was nothing wrong with the computation, but the design of the human interface was bad, bad, bad. I had not estimated the performance characteristics as the number of equations grew. I had not accounted for human patience (or impatience) and tolerance (or intolerance) for uncertainty. The case that "looped" was the smallest of our cases. The largest would have run something like twenty hours.

I fixed the bug by having the program punch out a "progress card" after every complete iteration. The card contained a set of numbers that characterized the convergence so far. From the sequence of cards, we could observe the rate of convergence and decide if we wanted the program to continue. If we wanted to stop, we could set a switch on the console and force the program to punch out the full set of values so far--in the input format, so we could resume later, if we wished. (Having no breakpoint in a program that could run twenty hours was another design bug.)
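In modern dress, the fix looks roughly like the sketch below. The real program was 1950s code for an IBM 650 punching cards, so everything here (names, tolerances, the example equation) is mine, not the original:

```python
# Illustrative sketch only: iterate, report convergence each pass (the
# "progress card"), and return a state the caller can checkpoint and resume.
def solve(x, step, tolerance=1e-6, max_iterations=10_000, report=print):
    """Iterate x = step(x) until the change per pass drops below tolerance."""
    for iteration in range(1, max_iterations + 1):
        new_x = step(x)
        residual = abs(new_x - x)
        report(f"iteration {iteration}: residual {residual:.2e}")
        x = new_x
        if residual < tolerance:
            return x, iteration        # converged
    return x, max_iterations           # not converged: checkpoint x, resume later

# Example: square root of 2 by simple averaging iteration.
root, passes = solve(1.0, lambda x: (x + 2.0 / x) / 2.0)
print(f"converged to {root:.6f} after {passes} passes")
```

The design point hasn't changed since 1956: a long-running computation owes its human operator a visible sign of progress and a way to stop without losing everything.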

Learnings

After that experience, I've never failed to consider

a. the relationship between a program and its user

b. the potential performance characteristics of the program

c. the possibility that an error might occur (hardware or software or human) that could cost us thousands of dollars of wasted computer and human time.

d. that I ought to be more humble about my programming skills. Before that, I had never made an error in one of perhaps ten small programs I had written, and I thought I was pretty terrific. This humiliating experience made me appreciative of a good testing process. (We didn't have "testers" in those days. We didn't even have "programmers." My title was "Applied Science Representative.")

This was the beginning of a long career of learning about programmers, programming, and testing. Many of those learnings are captured in my book, Perfect Software and Other Illusions About Testing (note 2)

Well, it wasn't my last bug, but it was my first, and my best.

Notes

(note 1) Weinberg, Gerald M., and Lyle N. Hoag. "Pipeline Network Analysis by Electronic Digital Computer." Journal of the American Water Works Association 49, no. 5 (1957).

(note 2) Perfect Software (And Other Illusions About Testing)
http://www.dorsethouse.com/savings.html#perf

Monday, February 23, 2009

Three Lessons from a Thirty-Year Bug

Reader Michael Bolton writes:

I'm reading General Principles of Systems Design, and enjoying it. I'm confused by something, and I think it's because of an error in the text.

On page 106, there's a matrix that is intended to describe the bathtubs illustrated in Figure 5.1 and diagrammed in figure 5.2. My interpretation is that the last row of the matrix should read

0 0 1 1 0

The text suggests

0 1 1 1 0

I interpret this as meaning that bathtub 1 could supply water directly to K, the sink; but neither Figure 5.1 nor 5.2 suggest that. Am I misunderstanding, or is there an error?

My response

It's an error in the text, previously unreported.

Seems as though it's been sitting there for 30 years, past tens of thousands of readers.
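For readers without the book at hand, here is a hedged sketch of what the disputed row encodes; the column labels are invented, since only the K (sink) column matters for the correction:

```python
# Column labels are invented; only the "K (sink)" position matters here.
columns = ["part A", "K (sink)", "part B", "part C", "part D"]

printed_row  = [0, 1, 1, 1, 0]   # the row as printed on page 106
intended_row = [0, 0, 1, 1, 0]   # the row Michael reads from Figures 5.1 and 5.2

for name, row in [("printed", printed_row), ("intended", intended_row)]:
    feeds = [label for label, bit in zip(columns, row) if bit]
    print(f"{name:>8} row: bathtub 1 directly supplies {', '.join(feeds)}")
# The rows differ only at K: does bathtub 1 supply the sink directly or not?
```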

Moral Number One:

Several morals, but to me, the most important one is about testability. Figures 5.1 and 5.2 are in the previous chapter, and difficult to look at while you're looking at the matrix. This makes testing quite difficult. I should have repeated one of the figures so it appeared on the same page as the matrix.

Moral Number Two:

In writing, as in software development, there's no such thing as perfection. (For more on this subject, see my book, Perfect Software and Other Illusions About Testing.) Just because nobody's found a bug for 29 years doesn't mean one won't turn up in year 30. If you start believing in perfection, you may be in for a nasty shock.

Moral Number Three:

It would have been easy to blame my readers for being careless, inattentive, or just plain dumb not to have detected and reported this bug (which actually appears twice). If I did that, however, I would have shielded myself from learning the first moral, which puts the responsibility squarely on me. If you don't take responsibility for your mistakes, learning doesn't happen.

Slow Learning is Better than No Learning

So, that's three lessons in thirty years. I'm a pretty slow learner, but at least three is better than zero. So, I'm going to be proud of myself for learning at all.