Showing posts with label systems. Show all posts

Saturday, January 06, 2018

New: #System #Design #Heuristics

You'd think that after publishing books for half a century, I'd know how to write a book. If that's what you think, you'd be wrong.

Sure, I've even written a book on writing books (Weinberg on Writing, the Fieldstone Method), and I've applied those methods to dozens of successful books. But way back around 1960, I started collecting notes on the process of design, thinking I would shortly gather them into a book. Back then, I didn't call these bits and pieces "fieldstones," but that's what they turned out to be: the pieces that, when assembled properly, would ultimately become my design book.

Ultimately? Assembled properly? Aye, there's the rub!

Building walls from randomly found fieldstones requires patience. So does writing books by the Fieldstone Method. My Introduction to General Systems Thinking took fourteen years to write. But a writer only lives one lifetime, so there's a limit to patience. I'm growing old, and I'm beginning to think that fifty years is as close to "ultimately" as I'm going to get.

So, I've begun to tackle the task of properly assembling my collection of design fieldstones. Unfortunately, it's a much larger collection than I'd ever tackled before. My Mac tells me I have more than 36,000,000 digitized bytes of notes. My filing cabinets told me I had more than twenty-five pounds of paper notes, but I've managed to digitize some of them and discard others, so there's only a bit more than ten pounds left to consider.

For the past couple of years, I've periodically perused these fieldstones and tried to assemble them "properly." I just can't seem to do it. I'm stuck.

Some writers would say I am suffering from "writer's block," but I believe "writer's block" is a myth. I've published three other books in these frustrating years, so I can't be "blocked" as a writer, but just over this specific design book. You can hear me talk more about the Writer's Block myth on YouTube 

[https://www.youtube.com/watch?v=77xrdj9YH3M&t=7s]

but the short version is that "blocking" is simply a lack of ideas about how to write. I finally decided to take my own advice and conjure up some new ideas about how to write this design book.

Why I Was Stuck

To properly assemble a fieldstone pile, I always need an "organizing principle." For instance, my recent book, Do You Want to Be a (Better) Manager? is organized around the principle of better management. Or, for my book, Errors, the principle is actually the title. So, I had been thinking the organizing principle for a book on design ought to be Design.

Well, that seemed simple enough, but there was a problem. Everybody seemed to know what design is, but nobody seemed able to give a clear, consistent definition that covered all my notes. I finally came to the conclusion that's because "design" is not one thing, but many, many different things.

In the past, I ran a forum (SHAPE: Software as a Human Activity Practiced Effectively) whose members were among the most skilled software professionals in the world. We held a number of threads on the subject, "What is Design?" The result was several hundred pages of brilliant thoughts about design, all of which were correct in some context. But many of them were contradictory.

Some said design was a bottom-up process, but others asserted it was top-down. Still others talked about some kind of sideways process, and there were several of these. Some argued for an intuitive process, but others laid out an algorithmic, step-by-step process. There were many other variations: designs as imagined (intentional designs), designs as implemented, and designs as evolved in the world. All in all, there were simply too many organizing principles—certainly too many to compress into a title, let alone organize an entire book.

After two years of fumbling, I finally came up with an idea that couldn't have been implemented fifty years ago: the book will be composed of a variety of those consulting ideas that have been most helpful to my clients' designers. I will make no attempt (or very little) to organize them, but release them incrementally in an ever-growing ebook titled System Design Heuristics.

How to Buy System Design Heuristics

My plan for offering the book is actually an old one, using a new technology. More than a century ago Charles Dickens released many of his immortal novels one chapter at a time in the weekly newspaper. Today, using the internet, I will release System Design Heuristics a single element at a time to subscribing readers.

To subscribe to the book, including all future additions, a reader will make a one-time payment. The price will be quite low when the collection is small, but will grow as the collection grows. That way, early subscribers will receive a bargain in compensation for the risk of an unknown future. Hopefully, however, even the small first collection will be worth the price. (If not, there will be a full money-back guarantee.)

Good designs tend to have unexpected benefits. When I first thought of this design, I didn't realize that it would allow readers to contribute ideas that I might incorporate in each new release. Now I am aware of that potential benefit, and look forward to it.

Before I upload the first increment of System Design Heuristics, I'll wait a short while for feedback on this idea from my readers. If you'd like to tell me something about the plan, email me or write a comment on this blog.

Thanks for listening. Tell me what you think.

Sunday, November 26, 2017

How Do I Decide Between appX and appY?

Hardly a day goes by without some developer or tester asking me about some tools or applications. These could be any tools or apps, so let's call them X and Y.

Usually, the question is simple, but asked with heart-stopping urgency:

"Is X better than Y?"

Rather than provide an answer, I tell them they would be better off not asking such "better than?" questions.

Software apps and tools are complex systems. Consequently any X-Y pair will differ on a number of dimensions. X will be better on some; Y will be better on others. Or both will be useless or poor for your needs.

If you're choosing a tool or an app, start with assessing your needs. Then, instead of asking which is better, ask

"Which fits my needs better, X or Y?"

If neither one fits your needs, then look for a third alternative, or a fourth.

In the rare case when both X and Y fit your needs, you might meaningfully ask, "Which is better—for me, at this moment?"

If X and Y still seem equal, then flip a coin. Heads, take X. Tails, take Y.

Then, while the coin is in the air, your mind will usually make the decision, not willing to allow the coin drop to make the decision for you.

But, if your mind doesn't decide, then let the coin drop decide. At that point, it shouldn't matter.

But if you reach this point, wait a moment before you choose X or Y. During that moment, consider the following two questions:

Can I take both X and Y?


What about Z? Is there some third alternative I haven't considered?


Indeed, instead of asking "which is better" questions, ask, "What is the problem I'm trying to solve?"
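The steps in this post (start from your needs, compare fit, look elsewhere if nothing fits, flip a coin only on a true tie) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name choose, the numeric weights, and the sample needs are all hypothetical, and treating "fit" as a weighted count is one simple choice among many. The two follow-up questions ("Can I take both?" and "What about Z?") stay with the human.

```python
import random

def choose(needs, candidates, rng=random):
    """Pick the candidate that fits the stated needs best.

    needs: dict mapping a need to a weight (how much it matters to you).
    candidates: dict mapping a name to the set of needs it satisfies.
    Returns None when no candidate meets any need: keep looking for a Z.
    """
    scores = {
        name: sum(weight for need, weight in needs.items() if need in covered)
        for name, covered in candidates.items()
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return None  # neither fits: look for a third alternative, or a fourth
    tied = sorted(name for name, score in scores.items() if score == best)
    if len(tied) == 1:
        return tied[0]
    return rng.choice(tied)  # still equal: flip a coin

# Hypothetical needs and tools, for illustration only.
needs = {"runs offline": 3, "exports CSV": 2, "free": 1}
candidates = {
    "X": {"runs offline", "free"},   # weighted fit: 3 + 1 = 4
    "Y": {"exports CSV", "free"},    # weighted fit: 2 + 1 = 3
}
print(choose(needs, candidates))
```

Notice that the answer changes as soon as the weights change, which is the point: "better" only means something relative to a particular set of needs.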

Are Your Lights On?: How to know what the problem really is?

Sunday, April 02, 2017

Complexity: Why We Need General Systems Thinking


It isn’t what we don’t know that gives us trouble, it’s what we know that ain’t so. - Will Rogers

The first step to knowledge is the confession of ignorance. We know far, far less about our world than most of us care to confess. Yet confess we must, for the evidences of our ignorance are beginning to mount, and their scale is too large to be ignored!

If it had been possible to photograph the earth from a satellite 150 or 200 years ago, one of the conspicuous features of the planet would have been a belt of green extending 10 degrees or more north and south of the Equator. This green zone was the wet evergreen tropical forest, more commonly known as the tropical rain forest. Two centuries ago it stretched almost unbroken over the lowlands of the humid Tropics of Central and South America, Africa, Southeast Asia and the islands of Indonesia.

... the tropical rain forest is one of the most ancient ecosystems ... it has existed continuously since the Cretaceous period, which ended more than 60 million years ago. Today, however, the rain forest, like most other natural ecosystems, is rapidly changing. ... It is likely that, by the end of this century very little will remain. - Karl Deutsch 

This account may be taken as typical of hundreds filling our books, journals, and newspapers. Will the change be for good or evil? Of that, we can say nothing—that is precisely the problem. The problem is not change itself, for change is ubiquitous. Neither is the problem in the man-made origin of the change, for it is in the nature of man to change his environment. Man’s reordering of the face of the globe will cease only when man himself ceases.

The ancient history of our planet is brimful of stories of those who have ceased to exist, and many of these stories carry the same plot: Those who live by the sword, die by the sword. The very source of success, when carried past a reasonable point, carries the poison of death. In man, success comes from the power that knowledge gives to alter the environment. The problem is to bring that power under control.

In ages past, the knowledge came very slowly, and one man in his life was not likely to see much change other than that wrought by nature. The controlled incorporation of arsenic into copper to make bronze took several thousand years to develop; the substitution of tin for the more dangerous arsenic took another thousand or two. In our modern age, laboratories turn out an alloy a day, or more, with properties made to order. The alloying of metals led to the rise and fall of civilizations, but the changes were too slow to be appreciated. A truer blade meant victory over the invaders, but changes were local and slow enough to be absorbed by a million tiny adjustments without destroying the species. With an alloy a day, we can no longer be sure.

Science and engineering have been the catalysts for the unprecedented speed and magnitude of change. The physicist shows us how to harness the power of the nucleus; the chemist shows us how to increase the quantity of our food; the geneticist shows us how to improve the quality of our children. But science and engineering have been unable to keep pace with the second-order effects produced by their first-order victories. The excess heat from the nuclear generator alters the spawning pattern of fish, and, before adjustments can be made, other species have produced irreversible changes in the ecology of the river and its borders. The pesticide eliminates one insect only to the advantage of others that may be worse, or the herbicide clears the rain forest for farming, but the resulting soil changes make the land less productive than it was before. And of what we are doing to our progeny, we still have only ghastly hints.

Some have said the general systems movement was born out of the failures of science, but it would be more accurate to say the general systems approach is needed because science has been such a success. Science and technology have colonized the planet, and nothing in our lives is untouched. In this changing, they have revealed a complexity with which they are not prepared to deal. The general systems movement has taken up the task of helping scientists unravel complexity, technologists to master it, and others to learn to live with it.

In this book, we begin the task of introducing general systems thinking to those audiences. Because general systems is a child of science, we shall start by examining science from a general systems point of view. Thus prepared, we shall try to give an overview of what the general systems approach is, in relation to science. Then we begin the task in earnest by devoting ourselves to many questions of observation and experiment in a much wider context. 

And then, having laboriously purged our minds and hearts of “things we know that ain’t so,” we shall be ready to map out our future general systems tasks, tasks whose elaboration lies beyond the scope of this small book.

[Thus begins the classic, An Introduction to General Systems Thinking]

Wednesday, November 09, 2016

How do I choose the right career?

The question was, "How do I choose the right career?"

My answer was, "You can’t."

Other responders told you things about how to choose your right JOB, but a job is not a career. Maybe before the 21st century, the world of work was sufficiently stable that one could choose a career, but no longer.

For instance, I’m an old guy so I’ve had sort of a career—in computing. But back in the 1940s, when I asked this question, computers didn’t even exist. At least, none of my career counselors knew of them.

And even for the 20th century, I’ve had a rather stable career. My wife, on the other hand, started out to be a concert pianist, then became a musicologist, then a piano teacher, then an anthropologist, then a management consultant, then a world-class dog trainer, and right now is an animal behavior specialist. She works primarily with canines, but until she was 33 years old, she was deathly afraid of dogs.


In other words, don’t try to choose the right career, but prepare yourself for choosing many careers throughout your working life. Learn the fundamental skills that will serve you well in all your future careers, whatever you choose, whenever you choose them—people skills, problem solving, and systems thinking are what come to my mind as things you'd need in all careers. 

That's why I've studied these things, teach them in workshops, and write books about them.

Tuesday, September 06, 2016

Preventing a Software Quality Crisis



Abstract
Many software development organizations today are so overloaded with quality problems that they are no longer coping with their business of developing software. They display all the classic symptoms of overloaded organizations—lack of problem awareness, lack of self-awareness, plus characteristic behavior patterns and feelings. Management may not recognize the relationship between this overload and quality problems stemming from larger, more complex systems. If not, their actions tend to be counterproductive. In order to cure or prevent such a crisis, management needs to understand the system dynamics of quality.

Symptoms of Overload Due to Poor Quality
In our consulting work, we are often called upon to rescue software development operations that have somehow gotten out of control. The organization seems to have slipped into a constant state of crisis, but management cannot seem to pin the symptoms down to one central cause. Quite often, that central cause turns out to be overload due to lack of software quality, and lack of software quality due to overload.

Our first job as consultants is to study symptoms. We classify symptoms of overload into four general categories: lack of problem awareness, lack of self-awareness, plus characteristic patterns of behavior and feelings. Before we describe the dynamics underlying these symptoms, let's look at some of them as they may be manifest in a typical, composite organization, which we shall call the XYZ corporation.

Lack of Problem Awareness
All organizations have problems, but the overloaded organization doesn't have time to define those problems, and thus has little chance of solving them:

1. Nobody knows what's really happening to them.

2. Many people are not even aware that there is a system-wide problem.

3. Some people realize that there is a problem, but think it is confined to their operation.

4. Some people realize that there is a problem, but think it is confined to somebody else's operation.

5. Quality means meeting specifications. An organization that is experiencing serious quality problems may ignore those problems by a strategy of changing specifications to fit what they actually happen to produce. They can then believe that they are "meeting specifications." They may minimize parts of the specification, saying that it's not really important that they be done just that way. Carried to an extreme, this attitude leads to ignoring certain parts of the specification altogether. Where they can't be ignored, they are often simply forgotten.

6. Another way of dealing with the overload is to ignore quality problems that arise, rather than handling them on the spot, or at least recording them so others will handle them. This attitude is symptomatic of an organization that needs a top-to-bottom retraining in quality.

Lack of Self-Awareness
Even when an organization is submerged in problems, it can recover if the people in the organization are able to step back and get a look at themselves. In the chronically overloaded organization, people no longer have the means to do this. They are ignorant of their condition, and they have crippled their means of removing their ignorance:

7. Worse than not knowing what is going on is thinking you know, when you don't, and acting on it. Many managers at XYZ believe they have a grip on what's going on, but are too overloaded to actually check. When the reality is investigated, these managers often turn out to be wrong. For instance, when quizzed about testing methods used by their employees, most managers seriously overestimate the quality of testing, when compared with the programmers' and testers' reports.

8. In XYZ, as in all overloaded organizations, communication within and across levels is unreliable and slow. Requests for one kind of information produce something else, or nothing at all. In attempting to speed up the work, people fail to take time to listen to one another, to write things down, or to follow through on requested actions.

9. Many individuals at XYZ are trying to reduce their overload by isolating themselves from their co-workers, either physically or emotionally. Some managers have encouraged their workers to take this approach, instructing them to solve problems by themselves, so as not to bother other people.

10. Perhaps the most dangerous overload reaction we observed was the tendency of people at XYZ to cut themselves off from any source of information that might make them aware of how bad the overload really is. The instant reaction to any new piece of information is to deny it, saying there are no facts to substantiate it. But no facts can be produced because the management has studiously avoided building or maintaining information systems that could contradict their claims that "we just know what's going on." They don't know, they don't know they don't know, and they don't want to know they don't know. They're simply too busy.

Typical Behavior Patterns
In order to recognize overload, managers don't have to read people's minds. They can simply observe certain characteristic things they do:

11. The first clear fact that demonstrates overload is the poor quality of the products being developed. Although it's possible to deny this poor quality when no measurements are made of the quality of work in progress, products already delivered have shown this poor quality in an undeniable way.

12. All over the organization, people are trying to save time by short-circuiting those procedures that do exist. This tactic may occasionally work in a normal organization faced with a short-term crisis, but in XYZ, it has been going on for so long it has become part of standard operating procedure.

13. Most people are juggling many things at one time, and thus adding coordination time to their overload. In the absence of clear directives on what must be done first, people are free to make their own choices. Since they are generally unaware of the overall goals of the organization, they tend to suboptimize, choosing whatever looks good to them at the moment.

14. In order to get some feeling of accomplishment, when people have a choice of tasks to do, they tend to choose the easiest task first, so as to "do something." This decision process gives a short-term feeling of relief, but in the long term results in an accumulation of harder and harder problems.

15. Another way an individual can relieve overload is by passing problems to other people. As a result, problems don't get solved, they merely circulate. Some have been circulating for many months.

16. Perhaps the easiest way to recognize an overloaded organization is by noticing how frequently you hear people say, in effect, that they recognize that the way they're working is wrong, but they "have no time to do it right." This seems almost to be the motto of the XYZ organization.

Typical Feelings
If you wait for measurable results of overload, it may be too late. But it's possible to recognize an overloaded organization through various expressions of people's feelings:

17. An easy way to recognize an overloaded organization is by the general atmosphere in the workplace. In many areas at XYZ there was no enthusiasm, no commitment, and no intensity. People were going through the motions of working, with no hope of really accomplishing their tasks.

18. Another internal symptom of overload is the number of times people expressed the wish that somehow the problem would just go away. Maybe the big customer will cancel the contract. Maybe the management will just slip the schedules by a year. Maybe the sales force will stop taking more orders. Maybe the company will fail and be purchased by a larger company.

19. One common way of wishing the problem would go away is to choose a scapegoat, who is the personified source of all the difficulties, and then wish that this person would get transferred, get fired, get sick, or quit. At XYZ, there are at least ten different scapegoats—some of whom have long gone, although the problems still remain.

20. Perhaps the ultimate emotional reaction to overload is the intense desire to run away. When there are easy alternatives for employees, overload is followed by people leaving the organization, which only increases the overload. The most perceptive ones usually leave first. When there are few attractive opportunities outside, as at XYZ, then people "run away" on the job. They fantasize about other jobs, other places, other activities, though they don't act on their fantasies. Their bodies remain, but their hearts do not.

The Software Dynamics of Overload
There are a number of reasons for the overload situation at organizations like XYZ, but underlying everything is the quality problem, which in turn arises from the changing size and complexity of the work. This means that simple-minded solutions like adding large numbers of people will merely make the problems worse. In order for management to create a manageable organization, they will have to understand the dynamics of quality. In particular, they will have to understand how quality deteriorates, and how it has deteriorated in their organization over the years. The XYZ company makes an excellent case study.

The quality deterioration at XYZ has been a gradual effect that has crept up unnoticed as the size and complexity of systems have increased. The major management mistake has been lack of awareness of software dynamics, and of the need for measurement if such creeping deterioration is to be prevented.

The quality deterioration experienced at XYZ is quite a common phenomenon in the software industry today, because management seems to make the same mistakes everywhere—they assume that the processes that would produce quality small systems will also produce quality large systems. But the difficulty of producing quality systems is exponentially related to system size and complexity, so old solutions quickly become inadequate. These dynamics have been studied by a number of software researchers, but it is not necessary to go fully into them here. A few examples will suffice to illustrate specifically what has been happening at XYZ and the kind of actions that are needed to reverse the situation.
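The article gives no formula for this growth, so the following is only a rough intuition pump, not the author's model. It assumes that difficulty tracks the number of ways components can interact, and compares how fast pairs and arbitrary combinations of components grow as a system gets bigger. The function names are hypothetical.

```python
from math import comb

def pairwise_interactions(n):
    """Distinct pairs among n components: n * (n - 1) / 2."""
    return comb(n, 2)

def possible_combinations(n):
    """Subsets of n components that could, in principle, interact: 2 ** n."""
    return 2 ** n

# Doubling the component count each time shows the gap opening up.
for n in (5, 10, 20, 40):
    print(n, pairwise_interactions(n), possible_combinations(n))
```

Under that assumption, a process tuned for ten components faces 45 pairs and about a thousand combinations, while the same process applied to forty components faces 780 pairs and roughly a trillion combinations. That is one way to see why what worked on small systems stops working on large ones.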

NOTE: The remaining two-thirds of this article describes these dynamics and corrective actions, and can be found as a new chapter in the book, 


The book also details a number of the most common and distressing management problems, along with dozens of positive responses available to competent managers.

Saturday, March 19, 2016

Tale of the Recent Gravity Wave Discovery GW150914

[This is a guest blog by Mark G. Gray, a physicist who understands people and many other things. Reading it put a whole lot of perspective in my life.]




Thirteen hundred million years ago in a galaxy thirteen billion trillion kilometers away, a small dark sphere looms ominously near a slightly larger dark sphere.

The small sphere is one hundred thirty kilometers in diameter.  Its surface is neither solid, nor liquid, nor gas, nor plasma.  Nor is it visible except by absence; no light passes through, nor is emitted from, nor reflected by it.  But its presence is felt throughout the universe: any unfortunate structure within a few hundred kilometers of it would be torn apart by tidal forces, with the pieces glowing X-rays as they fall into the surface and disappear from the universe.  Even the light from distant stars streaming around its edges is twisted to form a distorted halo.

The large sphere is nearly identical to the small sphere, but is one hundred seventy kilometers in diameter.

Although they pass within a few hundred kilometers of each other, the spheres do not tear each other apart.  Instead they twirl around a point approximately midway between them, moving closer and faster on each orbit, past the point where their diameters overlap, finally rotating seventy-five times per second.

When the smaller's center collides with the larger's surface, there is no sound, no light, no X-rays, no ejection of debris, nothing to indicate a collision in the conventional sense, but instead just a wobble in the merged dark shapes and a ripple in space-time that alternately doubles and halves nearby lengths relative to widths as it passes.

The remnant of the encounter is a single dark sphere three hundred seventy kilometers in diameter, a spherical wave in space-time expanding at the speed of light, and perhaps the explosion of a nearby star triggered as the wave passes through it.

Meanwhile, only two hundred forty thousand trillion kilometers from the center of the Milky Way galaxy on its Orion arm, the planet Earth is in the middle of its Mesoproterozoic era.  The super-continent Rodinia has just formed from three pre-existing continents. Eukaryotes, cells with a well-defined nucleus and organelles, have emerged, but not yet evolved into multi-cellular life.  The Moon, which is still geologically active, orbits the Earth in a little over three weeks.

Two hundred thousand years ago the wave front reaches the Small Magellanic Cloud, a dwarf galaxy in the Milky Way's neighborhood.  The planet Earth is in the late Pleistocene epoch of its Cenozoic era. The seven contemporary continents are in place, the glaciers are in retreat, and modern humans have just emerged and invented agriculture. The Moon, geologically dead for over a billion years, orbits the Earth in a little under four weeks.





One hundred years ago, as the wave front passes through the stars in the Milky Way's Hydrus constellation, the human Albert Einstein uses his theory of General Relativity to show that accelerating masses can produce gravitational waves in space-time.  Karl Schwarzschild publishes the first solution to Einstein's General Relativity for a spherical mass.

Seventy-seven years ago J. Robert Oppenheimer uses S. Chandrasekhar's work on stellar deaths to predict massive stars that have exhausted their nuclear fuel would collapse under their own weight to form a singularity.

Fifty-eight years ago David Finkelstein uses Schwarzschild's solution to show Oppenheimer's singularity would be surrounded by a spherical event horizon, a black hole, from which nothing, not even light, can escape.

Fifty-four years ago M. E. Gertsenshtein and V. I. Pustovoit describe how interfering perpendicular beams of correlated light can detect gravitational waves.

Thirty-two years ago Kip Thorne, Ronald Drever, and Rainer Weiss establish the Laser Interferometer Gravitational-wave Observatory (LIGO).

Twenty-eight years ago they secure funding for LIGO.

Twenty-two years ago LIGO construction begins.

Fourteen years ago LIGO becomes operational.  LIGO operates for eight years without seeing a gravitational wave.

Six years ago LIGO is shut down for improvements.  The gravitational wave moves among our sun's nearest neighbor stars.

On September 12, 2015 the Advanced LIGO starts its first operational run, with just enough sensitivity to detect the gravitational wave that is now about four times further from Earth than Voyager 1.

At 09:50:45 UTC on September 14, 2015 the Advanced LIGO at Livingston, Louisiana detects the gravitational wave when its four kilometer length oscillates relative to its four kilometer width by a fraction of the size of a subatomic particle.  Several thousandths of a second later and three thousand two kilometers away, the Advanced LIGO at Hanford, Washington detects the gravitational wave.  The signal, designated GW150914, cycles eight times, increasing in both intensity and frequency, until it reaches an intense chirp at its crescendo.

On February 11, 2016 the Advanced LIGO team announces their detection of a gravitational wave.  The coincidence of the signal at the two detectors implies a non-local source.  The similarity of the two signals implies the detection of a single, real event.  The time difference between the signals triangulates a direction, and the red-shift of the signal gives a distance to the source.  The spectrum of the signal matches general relativity's prediction for the inspiral and merger of binary black holes and lets them reconstruct what happened:

    A black hole twenty-nine times the mass of our sun encounters
    another black hole thirty-six times the mass of our sun.  As the
    black holes scatter around their center-of-mass by mutual
    gravitational attraction, they lose kinetic energy radiated away
    as gravitational waves.  The binary black holes, now trapped in
    orbit, centripetally accelerate around their center, radiating
    more gravitational waves, losing more energy, moving ever
    closer together and orbiting ever faster.  They finally merge,
    emitting a blast of gravitational waves, to form a single black
    hole about sixty-two times the mass of the sun, with three solar
    masses converted entirely to gravitational wave energy in a
    spherical front moving outward at the speed of light.  At its peak
    the merger produces several times more power than all the stars
    in the observable universe.
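For readers who like to check the arithmetic, here is a small sketch using only the round figures quoted in this tale: 29 and 36 solar masses in, 62 out, and detectors three thousand two kilometers apart. The solar mass value is an approximate textbook figure, the function names are hypothetical, and the real measurements carry uncertainties that this ignores.

```python
C = 299_792_458.0   # speed of light in m/s (exact by definition)
M_SUN = 1.989e30    # mass of the sun in kg (approximate)

def radiated_energy_joules(m1, m2, m_final):
    """E = m c^2 for the mass that goes missing in the merger (solar masses in)."""
    missing = m1 + m2 - m_final
    return missing * M_SUN * C ** 2

def light_travel_ms(distance_km):
    """Time for a wave moving at light speed to cover distance_km, in milliseconds."""
    return distance_km * 1000.0 / C * 1000.0

energy = radiated_energy_joules(29, 36, 62)   # 3 solar masses go missing
delay = light_travel_ms(3002)                 # straight-line detector separation

print(f"energy radiated: {energy:.2e} J")     # about 5.4e47 joules
print(f"largest possible delay: {delay:.1f} ms")
```

The delay figure is an upper bound: it applies only if the wave travels exactly along the line joining the two sites. A wave arriving at an angle reaches the second detector sooner, which fits the "several thousandths of a second" in the account.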

p.s. No, the GW doesn't stand for Gerald Weinberg, nor for anything as small as our Earth or even our little corner of the Universe.

Sunday, October 11, 2015

One Little Change

Note: From time to time, I will be adding material to my new book, Errors: Bugs, Boo-boos, Blunders. All purchasers of the book from Leanpub.com will receive all of the new material free of additional charge. The following chapter will be added soon, along with some feedback from my earliest readers.

Dani, my wife, is an anthropologist by profession, but now has become a world-class dog trainer. <http://home.earthlink.net/~hardpretzel/DaniDogPage.html> The combination of the two produces some interesting ideas. For instance, she told me about the way attack dogs are trained to keep them from being dangerous. As usual, the big problem with attack dogs is not the dogs, but the people.

When an untrained person hears that a dog is attack-trained, chances are about one in three that they'll turn to the dog and command, "Kill!" As a joke. Or just to see what the dog will do. To protect against this idiotic human behavior, trainers never use command words like "kill." Instead, they use innocent words, like "breathe," that would never be given in jest in a command voice.

This kind of protection is needed because a trained dog is an information processing machine, in some ways very much like a computer. A single arbitrary command could mean anything to a dog, depending on how it was trained (or programmed). The arbitrariness does not matter much if it's not an attack dog. The owner may be embarrassed when Rover heels on the Stay command, but nothing much is lost. If, however, Rover is trained to go for the throat, it's an entirely different matter.

It's the same with computers. Because they are programmed, and because the many meanings in a program are arbitrary, a single mistake can turn a helpful computer into one that can attack and kill an entire enterprise. That's why I've never understood managers who take a casual approach to software maintenance. Time and again I hear managers explain that the maintenance can be done by less intelligent people operating without all the formal controls of development—because it's not very critical. And no amount of argument seems able to convince them differently—until they have a costly maintenance blunder.

Fortunately, costly maintenance blunders are rather common, so some managers are learning, but the tuition is enormous. I keep a list of the world's most expensive programming errors, and all of the top ten are maintenance blunders. Some have cost over a billion dollars each, and some have led to deaths. Often, the blunder involved changing a single digit in a previously functioning program.

In those horrendous losses, the change was deemed "so trivial" it was instituted casually by a supervisor telling a low-level maintenance programmer to "change that digit"—with no written instructions, no test plan, nobody to review the change, and, indeed, no controls whatsoever between the one programmer and the organization's day-to-day operations. It was exactly like having an attack dog trained to respond to KILL—or even HELLO.

I've done some studies, confirmed by others, about the chances of maintenance changes being done incorrectly. Contrary to simple-minded intuition, it turns out that "tiny" changes are more likely than larger ones to be flawed. Roughly, a one-line change has about a 50/50 chance of producing an error, while a 20-line change is similarly wrong only about one-third of the time.
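To see how lopsided those rough figures are, it helps to divide each chance of error by the number of lines changed. The sketch below is only an illustration: it uses the two approximate rates quoted above (one-half and one-third), and the names in it are invented for this example.

```python
from fractions import Fraction

# Approximate rates quoted in the text: chance that a change of a given
# size introduces an error. The dictionary layout is illustrative only.
CHANCE_OF_ERROR = {1: Fraction(1, 2), 20: Fraction(1, 3)}


def error_chance_per_line(lines_changed):
    """Spread the chance of a flawed change over the lines it touches."""
    return CHANCE_OF_ERROR[lines_changed] / lines_changed


one_line = error_chance_per_line(1)      # 1/2 per line
twenty_line = error_chance_per_line(20)  # 1/60 per line
ratio = one_line / twenty_line

print(f"One-line change: {one_line} per line")
print(f"Twenty-line change: {twenty_line} per line")
print(f"Ratio: {ratio}")
```

Measured per line, the "tiny" change carries thirty times the risk of the larger one.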

Developers are often shocked to see this high one-line rate, for two reasons. In the first place, development changes are simpler because they are being made to cleaner, smaller, better structured code—code that has not been changed many times before so does not have unexpected linkages. Such linkages were involved in many of my top-ten disasters.

Secondly, the consequences of an erroneous change during development are smaller because the error can be corrected without affecting real operations. Thus, developers don't take that much notice of their errors, so they tend to underestimate their frequency. In development, you simply fix errors and go on your merry way. Not so in maintenance, where you must mop up the damage the error causes. Then you spend countless hours in meetings explaining why such an error will never happen again—until the next time.

For these two reasons, developers interpret such high rates of maintenance errors as indications of the ignorance or inexperience of maintenance programmers. They're wrong. Maintenance programmers are perfectly capable of doing better work than their record with tiny changes seems to indicate. Their competence is proved by the decrease in error rates as the size of the change increases.

If tiny changes are not taken seriously, they are done carelessly and without proper controls. A higher rate of error is an inevitable consequence.

How many times have you heard a developer say, "No problem! It's just a small change. All I have to do is change one line!"?

That statement would be sensible if "small" changes were truly small, if software maintenance were actually like maintenance of an apartment building. The janitor can change one washer in a dripping sink without much risk of causing the building to collapse. It's not safe to make the same assumption for a program once it's in production.

Whoever coined the term "maintenance" for computer programs was as careless and unthinking as the person who trains an attack dog to kill on the command, KILL. With the wisdom of hindsight, I would suggest that a maintenance programmer is more like a brain surgeon than a janitor. Would maintenance be easier to manage well if it were called "software brain surgery"?


Think about it this way. Suppose you had a bad habit, like saying KILL to attack dogs. Would you go to a brain surgeon and say, "Just open up my skull, Doc, and remove that one little habit. It's just a small change! Just a little maintenance job!"?

Wednesday, October 05, 2011

Medium-number Systems

A reader recently wrote asking about Medium Number systems. I thought other readers of An Introduction to General Systems Thinking might be interested in my answers to his questions:

Reader: I'm interested in learning more about the "Law of Medium Numbers" described in your book An Introduction to General Systems Thinking. I feel the "Law of Medium Numbers" sheds light on many puzzles I have run into in dealing with nature and society. Thus I wonder whether I may ask you a few related questions?

1. As I searched the literature trying to learn more about this subject and the "Law of Medium Numbers", I was surprised by how little I could find. I wonder whether I was not doing right searches or you may have more insights into this?

Jerry: It's not a popular subject with many scientists, because it shows how science, though amazingly successful in some areas, is awfully limited in dealing with some of the most interesting problems. As someone pointed out years ago, "Physics is merely the study of those systems for which the methods of physics work."

Reader: 2. I wonder whether the "Law of Medium Numbers" might imply a kind of defiance of classical Western science and technology (which emphasize certainty and perfection)?

Jerry: That's well put. We'd like things to be different, but they aren't.

Reader: 3. Perhaps human individuals are medium number systems, and this may explain why we all have certain limitations (in other words, no one is perfect). So medium number systems may also be called imperfect systems. Am I right on this?

Jerry: Precisely right, except it's not the systems that are imperfect, but our understanding of them. You can see this idea worked out in detail in my book on software testing: Perfect Software and Other Illusions about Testing. It's extremely popular among software testers. They use it to explain to their customers what amounts to the Law of Medium Numbers applied to software. And, a few software developers have shown violent reactions to the claim that perfection is simply not possible.