Showing posts with label crisis. Show all posts
Showing posts with label crisis. Show all posts

Sunday, October 29, 2017

My most challenging experience as a software developer

Here is my detailed answer to the question, "What is the most challenging experience you encountered as a software developer?:

We were developing the tracking system for Project Mercury, to put a person in space and bring them back alive. The “back alive” was the challenging part, but not the only one. Some other challenges were as follows:

- The system was based on a world-wide network of fairly unreliable teletype connections. 

- We had to determine the touchdown in the Pacific to within a small radius, which meant we needed accurate and perfectly synchronized clocks on the computer and space capsule.

- We also needed to knew exactly where our tracking stations were, but it turned out nobody knew where Australia's two stations were with sufficient precision. We had to create an entire sub-project to locate Australia.

- We needed information on the launch rocket, but because it was also a military rocket, that information was classified. We eventually found a way to work around that.

- Our computers were a pair of IBM 7090s, plus a 709 at a critical station in Bermuda. In those days, the computers were not built for on-line real-time work. For instance, there was no standard interrupt clock. We actually built our own for the Bermuda machine.

- Also, there were no disk drives yet, so everything had to be based on a tape drive system, but the tape drives were not sufficiently reliable for our specs. We beat this problem by building software error-correcting codes into the tape drive system.

We worked our way through all these problems and many more smaller ones, but the most challenging problem was the “back alive” requirement. Once we had the hardware and network reliability up to snuff, we still had the problem of software errors. To counter this problem, we created a special test group, something that had never been done before. Then we set a standard that any error detected by the test group and not explicitly corrected would stop any launch.

Our tests revealed that the system could crash for unknown reasons at random times, so it would be unable to bring down the astronaut safely at a known location. When the crash occurred in testing, the two on-line printers simultaneously printed a 120-character of random garbage. The line was identical on the two printers, indicating that this was not some kind of machine error on one of the 7090s. It could have been a hardware design error or a coding error. We had to investigate both possibilities, but the second possibility was far more likely.

We struggled to track down the source of the crash, but after a fruitless month, the project manager wanted to drop it as a “random event.” We all knew it wasn’t random, but he didn’t want to be accused of delaying the first launch.

To us, however, it was endangering the life of the astronaut, so we pleaded for time to continue trying to pinpoint the fault. “We should think more about this,” we said, to which he replied (standing under an IBM THINK sign), “Thinking is a luxury we can no longer afford.”

We believed (and still believe) that thinking is not a luxury for software developers, so we went underground. After much hard work, Marilyn pinpointed the fault and we corrected it just before the first launch. We may have saved an astronaut’s life, but we’ll never get any credit for it.

Moral: We may think that hardware and software errors are challenging, but nothing matches the difficulty of confronting human errors—especially when those humans are managers willing to hide errors in order to make schedules.



Sunday, September 10, 2017

False Urgency


What should I do with a client or boss who insists that a certain task is urgent, but it turns out to be likely a false alarm?

In your mind, subtract 10% from this persons trust account, then watch for the second occurrence. It might be a one-time mistake or it might be a person who thinks every little thing is "urgent."

If it happens again, tell him or her that you charge double (or triple) for urgent tasks. If he or she isn’t willing to pay, then find another client.

If you're an employee and this is your boss, you obviously can't charge them with money, so you have to find another way to make them pay. My favorite way was simply to ignore them and proceed at my normal pace, in priority order. I never got fired for doing that, particularly when it became evident to everyone that the urgency was false.


This is just one of the ways you have to train your clients and your managers if you want to be a successful employee, contractor, or consultant.

Sunday, April 02, 2017

Complexity: Why We Need General Systems Thinking


It isn’t what we don’t know that gives us trouble, it’s what we know that ain’t so. - Will Rogers

The first step to knowledge is the confession of ignorance. We know far, far less about our world than most of us care to confess. Yet confess we must, for the evidences of our ignorance are beginning to mount, and their scale is too large to be ignored!

If it had been possible to photograph the earth from a satellite 150 or 200 years ago, one of the conspicuous features of the planet would have been a belt of green extending 10 degrees or more north and south of the Equator. This green zone was the wet evergreen tropical forest, more commonly known as the tropical rain forest. Two centuries ago it stretched almost unbroken over the lowlands of the humid Tropics of Central and South America, Africa, Southeast Asia and the islands of Indonesia.

... the tropical rain forest is one of the most ancient ecosystems ... it has existed continuously since the Cretaceous period, which ended more than 60 million years ago. Today, however, the rain forest, like most other natural ecosystems, is rapidly changing. ... It is likely that, by the end of this century very little will remain. - Karl Deutsch 

This account may be taken as typical of hundreds filling our books, journals, and newspapers. Will the change be for good or evil? Of that, we can say nothing—that is precisely the problem. The problem is not change itself, for change is ubiquitous. Neither is the problem in the man-made origin of the change, for it is in the nature of man to change his environment. Man’s reordering of the face of the globe will cease only when man himself ceases.

The ancient history of our planet is brimful of stories of those who have ceased to exist, and many of these stories carry the same plot: Those who live by the sword, die by the sword. The very source of success, when carried past a reasonable point, carries the poison of death. In man, success comes from the power that knowledge gives to alter the environment. The problem is to bring that power under control.

In ages past, the knowledge came very slowly, and one man in his life was not likely to see much change other than that wrought by nature. The controlled incorporation of arsenic into copper to make bronze took several thousand years to develop; the substitution of tin for the more dangerous arsenic took another thousand or two. In our modern age, laboratories turn out an alloy a day, or more, with properties made to order. The alloying of metals led to the rise and fall of civilizations, but the changes were too slow to be appreciated. A truer blade meant victory over the invaders, but changes were local and slow enough to be absorbed by a million tiny adjustments without destroying the species. With an alloy a day, we can no longer be sure.

Science and engineering have been the catalysts for the unprecedented speed and magnitude of change. The physicist shows us how to harness the power of the nucleus; the chemist shows us how to increase the quantity of our food; the geneticist shows us how to improve the quality of our children. But science and engineering have been unable to keep pace with the second-order effects produced by their first-order victories. The excess heat from the nuclear generator alters the spawning pattern of fish, and, before adjustments can be made, other species have produced irreversible changes in the ecology of the river and its borders. The pesticide eliminates one insect only to the advantage of others that may be worse, or the herbicide clears the rain forest for farming, but the resulting soil changes make the land less productive than it was before. And of what we are doing to our progeny, we still have only ghastly hints.

Some have said the general systems movement was born out of the failures of science, but it would be more accurate to say the general systems approach is needed because science has been such a success. Science and technology have colonized the planet, and nothing in our lives is untouched. In this changing, they have revealed a complexity with which they are not prepared to deal. The general systems movement has taken up the task of helping scientists unravel complexity, technologists to master it, and others to learn to live with it.

In this book, we begin the task of introducing general systems thinking to those audiences. Because general systems is a child of science, we shall start by examining science from a general systems point of view. Thus prepared, we shall try to give an overview of what the general systems approach is, in relation to science. Then we begin the task in earnest by devoting ourselves to many questions of observation and experiment in a much wider context. 

And then, having laboriously purged our minds and hearts of “things we know that ain’t so,” we shall be ready to map out our future general systems tasks, tasks whose elaboration lies beyond the scope of this small book.

[Thus begins the classic, An Introduction to General Systems Thinking]

Tuesday, September 06, 2016

Preventing a Software Quality Crisis



Abstract
Many software development organizations today are so overloaded with quality problems that they are no longer coping with their business of developing software. They display all the classic symptoms of overloaded organizations—lack of problem awareness, lack of self-awareness, plus characteristic behavior patterns and feelings. Management may not recognize the relationship between this overload and quality problems stemming from larger, more complex systems. If not, their actions tend to be counterproductive. In order to cure or prevent such a crisis, management needs to understand the system dynamics of quality.

Symptoms of Overload Due to Poor Quality
In our consulting work, we are often called upon to rescue software development operations that have somehow gotten out of control. The organization seems to have slipped into a constant state of crisis, but management cannot seem to pin the symptoms down to one central cause. Quite often, that central cause turns out to be overload due to lack of software quality, and lack of software quality due to overload.

Our first job as consultants is to study symptoms. We classify symptoms of overload into four general categories—lack of problem awareness, lack of self-awareness, plus characteristic patterns of behavior and feelings. Before we  describe the dynamics underlying these symptoms, lets look at some of them as they my be manifest in a typical, composite organization, which we shall call the XYZ corporation.

Lack of Problem Awareness
All organizations have problems, but the overloaded organization doesn't have time to define those problems, and thus has little chance of solving them:

1. Nobody knows what's really happening to them.

2. Many people are not even aware that there is a system-wide problem.

3. Some people realize that there is a problem, but think it is confined to their operation.

4. Some people realize that there is a problem, but think it is confined to somebody else's operation.

5. Quality means meeting specifications. An organization that is experiencing serious quality problems may ignore those problems by a strategy of changing specifications to fit what they actually happen to produce. They can then believe that they are "meeting specifications." They may minimize parts of the specification, saying that it's not really important that they be done just that way. Carried to an extreme, this attitude leads to ignoring certain parts of the specification altogether. Where they can't be ignored, they are often simply
forgotten.

6. Another way of dealing with the overload is to ignore quality problems that arise, rather than handling them on the spot, or at least recording them so others will handle them. This attitude is symptomatic of an  organization that needs a top-to-bottom retraining in quality.

Lack of Self-Awareness
Even when an organization is submerged in problems, it can recover if the people in the organization are able to step back and get a look at themselves, In the chronically overloaded organization, people no longer have the means to do this. They are ignorant of their condition, and they have crippled their means of removing their ignorance:

7. Worse than not knowing what is going on is thinking you know, when you don't, and acting on it. Many managers at XYZ believe they have a grip on what's going on, but are too overloaded to actually check. When the reality is investigated, these managers often turn out to be wrongs For instance, when quizzed about testing methods used by their employees, most managers seriously overestimate the quality of testing, when compared with the programmers' and testers' reports.

8. In XYZ) as in all overloaded organizations, communication within and across levels is unreliable and slow. Requests for one kind of information produce something else, or nothing at all. In attempting to speed up the work, people fail to take time to listen to one another, to write things down, or to follow through on requested actions.

9. Many individuals at XYZ are trying to reduce their overload by isolating themselves from their co-workers, either physically or emotionally. Some managers have encouraged their workers to take this approach, instructing them to solve problems by themselves, so as not to bother other people.

10. Perhaps the most dangerous overload reaction we observed was the tendency of people at XYZ to cut themselves off from any source of information that might make them aware of how bad the overload really is. The instant reaction to any new piece of information is to deny it, saying there are no facts to substantiate it. But no facts can be produced because the management has studiously avoided building or maintaining information systems that could contradict their claims that "we.just know what's going on." They don't know, they don't know they don't know, and they don't want to know they don't know. They're simply too busy.

Typical Behavior Patterns
In order to recognize overload, managers don't have to read people's minds. They can simply observe certain characteristic things they do:

11 . The first clear fact that demonstrates overload is the poor quality of the products being developed. Although it's possible to deny this poor quality when no measurements are made of the quality of work in progress, products already delivered have shown this  poor quality in an undeniable way.

12. All over the organization, people are trying to save time by short-circuiting those procedures that do exist. 'This tactic may occasionally work in a normal organization faced with a short-term crisis, but in XYZ, it has been going on for so long it has become part of standard operating procedure.

13. Most people are juggling many things at one time, and thus adding coordination time to their overload. In the absence of clear directives on what must be done first, people are free to make their own choices. Since they are generally unaware of the overall goals of the organization, they tend to suboptimize, choosing whatever looks good to them at the moment.

14. In order to get some feeling of accomplishment, when people have a choice of tasks to do, they tend to choose the easiest task first, so as to "do something." This decision process gives a short-term feeling of relief, but in the long term results in an accumulation of harder and harder problems.

15. Another way an individual can relieve overload is by passing problems to other people. As a result, problems don't get solved, they merely circulate. Some have been circulating for many months.

16. Perhaps the easiest way to recognize an overloaded organization is by noticing how frequently you hear People say, in effect, that they recognize that the way they're working is wrong, but they "have no time to do it right." This seems almost to be the motto of the XYZ organization.

Typical Feelings
If you wait for measurable results of overload, it may be too late. But its possible to recognize an overloaded organization through various expressions of peoples' feelings:

17. An easy way to recognize an overloaded organization is by the general atmosphere in the workplace. In many areas at XYZ there was no enthusiasm, no commitment, and no intensity. People were going through the motions of working, with no hope of really accomplishing their tasks.

18. Another internal symptom of overload is the number of times people expressed the wish that somehow the problem would just go away. Maybe the big customer will cancel the contract. Maybe the management will Just slip the schedules by a year. Maybe the sales force will stop taking more orders. Maybe the company will fail and be purchased by a larger company.

19. One common way of wishing the problem would go away is to choose a scapegoat, who is the personified source of all the difficulties, and then wish that this person would get transferred, get fired, get sick, or quit. At XYZ, there are at least ten different scapegoats—some of whom have long gone, although the problems still remain.

20. Perhaps the ultimate emotional reaction to overload is the intense desire to run away. When there are easy alternatives for employees, overload is followed by people leaving the organization, which only increases the overload. The most perceptive ones usually leave first. When there are few attractive opportunities outside, as at XYZ, then people "run away" on the job. They fantasize about other jobs, other places, other activities, though they don't act on their fantasies. Their bodies remain, but their hearts do not.

The Software Dynamics of Overload
There are a number of reasons for the overload situation at organizations like XYZ, but underlying everything is the quality problem, which in turn arises from the changing size and complexity of the work. This means that simple-minded solutions like adding large numbers of people will merely make the problems worse. In order for management to create a manageable organization, they will have to understand the dynamics of quality. In particular, they will have to understand how quality deteriorates, and how it has deteriorated in their organization over the years. The XYZ company makes an excellent case study.

The quality deterioration at XYZ has been a gradual effect that has crept up unnoticed as the size and complexity of systems has increased. The major management mistake has been lack of awareness of software dynamics, and the need for measurement if such creeping deterioration is to be prevented.

The quality deterioration experienced at XYZ is quite a common phenomenon in the software industry today, because management seems to make the same mistakes everywhere—they assume that the processes that would produce quality small systems will also produce quality large systems. But the difficulty of producing quality systems is exponentially related to system size and complexity, so old solutions quickly become inadequate. These dynamics have been studied by a number of software researchers, but it is not necessary to go fully into them here. A few examples will suffice to illustrate specifically what has been happening at XYZ and the kind of actions that are needed to reverse the situation.

NOTE: The remaining two-thirds of this article describes these dynamics and corrective actions, and can be found as a new chapter in the book, 


The book also details a number of the most common and distressing management problems, along with dozens of positive responses available to competent managers.