Monday, January 05, 2009

Testability and Reproducibility

Doug Szabo has asked me two interesting questions, and we thought we would share them, along with my answers.

Q1: Can you point me to some guiding literature that explains how to make code testable?

Explanation: I noticed at least one mention in Perfect Software: And Other Illusions About Testing of making code testable. I have seen that language in a couple of other software testing books, including Software Testing Techniques. I have even been told by more than one developer that they needed to "make the code more testable" before they wanted me to start testing. Since the developers who told me this didn't have the faintest clue what testing was about, I really don't know what they intended to do with their code. Apparently, neither did they: when I eventually asked, "What do you need to do to make it more testable?", they replied that they didn't know. I was, and still am, confounded by the lack of any explanation of what makes code testable. I understand when code is very difficult to test, like when you have a "horrible loop" (Software Testing Techniques), or when multithreaded code is not first tested in single-threaded contexts, but I don't understand what needs to be done to code to make it testable. Are we talking about instrumenting the code with debug statements, assert statements, or some symbol that a test tool can detect?

Jerry: I share your dismay at the lack of publications about how to make code testable. There are many, many little techniques (like initialization on entry, not exit; eliminating as many special cases as possible; and general simplification for the reader), but they're not as important as three things:

1. All code should be open code that anyone in the project can read and critique.

2. All code must be reviewed in a technical review (see my Handbook of Walkthroughs, Inspections, and Technical Reviews), in which at least one professional tester is present and fully participating.

3. Same as 2, but for design reviews and requirements reviews.

If you do these things, the organization will quickly learn how to make code testable.

But, yes, someone should write the book and start with these three things.
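Jerry lists "initialization on entry, not exit" without elaborating. One reading, assumed here, is that a routine should establish its own starting state when it begins, rather than trusting the previous call to clean up on the way out. The minimal Python sketch below contrasts the two; the class names and the negative-item rule are hypothetical. The reset-on-exit version leaks state whenever an error skips its cleanup, so the result of one test can depend on whether an earlier test failed, which makes the code harder both to test and to reproduce.

```python
class ResetOnExit:
    """Fragile: relies on the previous call having cleaned up after itself."""

    def __init__(self):
        self.seen = []

    def process(self, items):
        for item in items:
            if item < 0:
                # Raising here skips the reset at the bottom, leaving stale state.
                raise ValueError("negative item")
            self.seen.append(item)
        result = sum(self.seen)
        self.seen = []  # reset on exit
        return result


class ResetOnEntry:
    """Robust: every call establishes its own starting state."""

    def __init__(self):
        self.seen = []

    def process(self, items):
        self.seen = []  # initialize on entry
        for item in items:
            if item < 0:
                raise ValueError("negative item")
            self.seen.append(item)
        return sum(self.seen)


def run_after_failure(processor):
    """Make one call fail, then make a normal call and return its result."""
    try:
        processor.process([1, -1])
    except ValueError:
        pass
    return processor.process([2])


print(run_after_failure(ResetOnExit()))   # 3: the 1 from the failed call leaked in
print(run_after_failure(ResetOnEntry()))  # 2
```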

Q2: Do you have some strategies for triaging bugs that do not reproduce consistently?

Explanation: I was a developer before my current role of tester. Hey. Don't roll your eyes at me. I was an engineer (a real one, with a professional license and all that hoopla) before I became a developer. So I already knew the value of testing, even if I didn't know what software testing really was. Well, as an engineer it was crucial that tests be designed so that the results would be reproducible and measurable against a control. When I got into software development, I tried to stick to that engineering principle. Unfortunately, as I worked on larger software projects, particularly those where my code talked to other processes or ran multiple threads, I started to see bugs where the steps to reproduce did not consistently reproduce the bug. Uh-oh. I knew that meant something was wrong; I just didn't have anything I could run under a debugger knowing where to set a breakpoint. As a developer, I always seemed to have enough reproducible bugs to give me plenty of excuses to avoid the ones that might not reproduce. Now, as a tester, I empathize with developers. I have a self-imposed guideline to log any non-reproducible issue, but I don't assign it to the pool of issues to fix until I've found a set of steps that consistently reproduces it. What I got out of Perfect Software is that perhaps I should be passing the hard-to-reproduce issues over to the pool to fix. But then what triage approach would you recommend, to convince stakeholders to take those issues as seriously as the ones that do consistently reproduce?

Jerry: Great question, Doug.

First answer. Try changing "not reproducible" to "I am unable to reproduce."

Then parse "I am unable to reproduce" into subcategories such as:

a. I saw this one, but was never able to make it happen again.

b. I see this when I run with this setup, but sometimes I don't.

c. This is the same anomaly I see under several setups, but not all the time.

d. Under X setup, this happens sometimes and not other times. There may be something different from one X to the other, but I'm not seeing it.

In each case, you are unable to pinpoint the fault underlying the failure. There may be several faults producing the same failure, or different failures arising from the same underlying fault. Since you don't really know the frequency of this failure, the way I triage it is to ask: "What would it cost us every time this failure appears in the field?"

If that number is high, then get a team working on the bug. If it's low, let the bug rest while (perhaps) new data accumulates or the bug disappears mysteriously as other changes are made to the code.
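Jerry's triage rule can be sketched as a simple filter. This is a minimal illustration, not a prescribed method: the `Failure` type, the cost figures, and the threshold are all hypothetical. It deliberately ranks by cost per occurrence and does not multiply by frequency, because for these failures the frequency is exactly what you don't know.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    summary: str
    cost_per_occurrence: float  # estimated cost each time it shows up in the field


def triage(failures, threshold):
    """Split failures into 'put a team on it' and 'let it rest' by per-occurrence cost."""
    work_now = [f for f in failures if f.cost_per_occurrence >= threshold]
    let_rest = [f for f in failures if f.cost_per_occurrence < threshold]
    # Most expensive first, so the team starts where a field failure hurts most.
    work_now.sort(key=lambda f: f.cost_per_occurrence, reverse=True)
    return work_now, let_rest


failures = [
    Failure("two-minute restart after a rare hang", 20.0),
    Failure("order silently dropped", 50_000.0),
]
now, later = triage(failures, threshold=1_000.0)
print([f.summary for f in now])    # ['order silently dropped']
print([f.summary for f in later])  # ['two-minute restart after a rare hang']
```

Items in the second list stay logged; they simply rest while new data accumulates.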


Unknown said...

"What would it cost us every time this failure appears in the field?"

Excellent question. I have used this thought for years in many different types of software. It never made any sense to me to spend thousands of person-hours to fix an error that showed itself once a year and cost the user two minutes to restart the application.

Money isn't everything in life, but sometimes it is a good idea to look at the cost in money (and the cost in other things) when testing software.

Brian said...

Another important "testability" strategy would be to require that each module be specified *in English* before coding begins.

Describe inputs, computations, validity tests, error conditions and messages, and finally outputs.

I realize that such standards are the exception rather than the rule, which makes for expensive, buggy software and meaningless testing: you can't test when you don't know what to expect.
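Brian's list (inputs, computations, validity tests, error conditions and messages, outputs) and Michael's point, in the reply that follows, that a table might be more comprehensible than narrative can be combined. A minimal sketch, assuming a hypothetical `parse_quantity` module with an invented 1 to 999 range: the docstring carries the written specification, and `SPEC_TABLE` restates it as rows that a reviewer can read and a test can check.

```python
def parse_quantity(text: str) -> int:
    """Hypothetical module, specified before (or alongside) coding.

    Input:       a string typed by a user, e.g. " 12 ".
    Computation: strip surrounding whitespace, convert to an integer.
    Validity:    must be a whole number from 1 to 999 inclusive.
    Errors:      raise ValueError with a message naming the problem.
    Output:      the quantity as an int.
    """
    stripped = text.strip()
    if not stripped.isdecimal():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(stripped)
    if not 1 <= value <= 999:
        raise ValueError(f"out of range 1-999: {value}")
    return value


# The same specification as a reviewable table: (input, expected output or error message).
SPEC_TABLE = [
    (" 12 ", 12),
    ("1", 1),
    ("999", 999),
    ("0", "out of range 1-999: 0"),
    ("abc", "not a whole number: 'abc'"),
]


def check_against_table():
    """Run every row of the table against the implementation."""
    for given, expected in SPEC_TABLE:
        try:
            actual = parse_quantity(given)
        except ValueError as err:
            actual = str(err)
        assert actual == expected, (given, expected, actual)


check_against_table()
```

Either form can be reviewed before, or in parallel with, coding; the table has the advantage that it can also be executed.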

Michael Bolton said...

Hi, Brian...

I like the sentiment, but I'm not sure I agree entirely with the idea "that each module be specified *in English* before coding begins." I used to think that way too, but these days, I might be more inclined to say "that each module be specified *in some comprehensible form*", and not necessarily before coding begins, but perhaps in parallel with it. Why? First, English (or the local vernacular) will certainly be important, but a working prototype, a sketch, a table, or a diagram might be less expensive, quicker to produce, and more comprehensible than a written narrative. The key, to me, is that the specification be reviewable by the people who matter, and that they actually review it.

Second, it might be unreasonable and restrictive to expect that we can get it all right the first time, so cycles of experiments and learning might be a more reasonable approach. I think in the end, we certainly want to be able to describe inputs, computations, validity tests, error conditions and messages, and finally outputs. An important function of testing might be to aid in discovering and reporting that information, not merely in confirming it to be correct.

> You can't test when you don't know what to expect.

That might be true to some degree--but Jerry has helped me to learn that sometimes you don't know what to expect until you've tested.


---Michael B.

Michael Bolton said...

By the way, speaking of testing--does it bother anyone else that Blogger shows the time of a posting, but not the date? That seems to apply both for blog posts and for comments.

---Michael B.

Gerald M. Weinberg said...

Michael wrote: "By the way, speaking of testing--does it bother anyone else that Blogger shows the time of a posting, but not the date? "

Bothers me. It did right from the beginning. I've tried to imagine what could have possessed someone to think this is all right. Maybe they don't want you to realize that a post is three years old with zero responses.

But you see, this is a type of decision that can't easily be caught by most kinds of testing. It is testing the design, I think, not the code. To do this kind of testing, the testers have to understand how the product will be used.

Michael Bolton said...

> But you see, this is a type of decision that can't easily be caught by most kinds of testing. It is testing the design, I think, not the code. To do this kind of testing, the testers have to understand how the product will be used.

Implicit in this is the idea that most kinds of testing aren't useful to the people who are actually using the product. I hope we can change that. How do we do it, other than tying them to a chair and reading Perfect Software: And Other Illusions About Testing or Lessons Learned in Software Testing to them? :)

---Michael B.

Michael Bolton said...

One more thing for Doug, who asked the original question: here are a couple of references to testability that you might find useful. One is a PDF on James Bach's Web site, called Heuristics of Software Testability. Another, mostly compiled by me, is on Adam Goucher's blog. This latter list suffers from a couple of omissions, the most serious of which Jerry pointed out in the SHAPE Forum (may it rest in peace): one of the most important dimensions of testability is, paradoxically, bug-free code. In a given unit of time, we can accomplish much more testing if we're not spending time investigating, recording, and reporting bugs. Thus when a programmer carefully tests her code before giving it to someone else to test, the tester can obtain different, and presumably broader and deeper, test coverage.

Maybe I'll write that book. :)

---Michael B.

Matisse Enzer said...

> the lack of publications about how to make code testable.

There are, I believe, MANY publications that address this from the "make it testable from the start" point of view; virtually every publication on extreme programming or test-driven development takes this approach.

On the other hand, for code that was not written with tests in mind, I also perceive a dearth. To my knowledge, though, there is one really good book on this exact issue:

Michael Feathers' "Working Effectively with Legacy Code": in fact, he defines "legacy code" not by age but as any code without good test coverage. The book provides many very practical techniques for making code testable, and I highly recommend it.
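As a small illustration in the spirit of the dependency-breaking techniques Feathers catalogs (this one resembles parameterizing a constructor), here is a hypothetical class whose result used to depend on the real date. Passing the clock in creates a "seam", Feathers' term for a place where a test can substitute different behavior without editing the code under test. The class, method, and fee rate are invented for the example; this is a sketch, not code from the book.

```python
import datetime
from typing import Callable, Optional


class LateFeeCalculator:
    """Hypothetical legacy class after a small dependency-breaking change.

    It originally called datetime.date.today() directly, so its result depended
    on the day the test ran. Accepting a 'today' function in the constructor
    lets production code keep the real clock while tests pass a fixed one.
    """

    def __init__(self, today: Optional[Callable[[], datetime.date]] = None):
        self._today = today or datetime.date.today

    def fee(self, due: datetime.date, daily_rate: float = 0.25) -> float:
        days_late = (self._today() - due).days
        return max(days_late, 0) * daily_rate


# In a test, substitute a fixed date so the result is reproducible.
calc = LateFeeCalculator(today=lambda: datetime.date(2009, 1, 5))
print(calc.fee(datetime.date(2009, 1, 1)))  # 1.0
```

Production code calls `LateFeeCalculator()` with no argument and behaves as before, so the change is small enough to make before any tests exist.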



Matisse Enzer said...

By the way, you can change how Blogger shows comments to include the date: it's a setting in the "Comments" section of your blog settings. Look for "Comments Timestamp Format".
There are about two dozen formats to choose from.

DrummerDaveF said...

Late to the party, I know, but this may help someone one day.
Regarding the question of logging defects that you saw once and cannot reproduce, or that reproduce only inconsistently: yes, log them.

Part of the benefit of testing is reducing unknown risks for the stakeholder. The more defects they are aware of, the better they can assess the risks of accepting the software.

Where a defect is difficult to reproduce, the tester should add commentary to the defect record explaining the overall impact if the defect occurs, again so that the stakeholder can make a risk assessment and (hopefully) accept the defect into their UAT, pre-prod, and/or production environment(s).
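A defect record that follows this advice might carry both Jerry's "unable to reproduce" subcategories (a through d above) and the tester's impact commentary. The sketch below is hypothetical: the field names, enum labels, and example values are invented for illustration, and any real defect tracker will have its own schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reproducibility(Enum):
    # Jerry's subcategories of "I am unable to reproduce."
    SEEN_ONCE = "a: saw it once, never again"
    INTERMITTENT_ONE_SETUP = "b: this setup, only sometimes"
    INTERMITTENT_SEVERAL_SETUPS = "c: same anomaly, several setups, not always"
    SETUP_DIFFERENCE_SUSPECTED = "d: setup X, sometimes; difference not yet seen"


@dataclass
class DefectRecord:
    title: str
    reproducibility: Reproducibility
    setups: list[str] = field(default_factory=list)
    attempts: int = 0
    failures_seen: int = 0
    impact_if_it_occurs: str = ""  # the tester's commentary for the stakeholder

    def observed_rate(self) -> str:
        """Report what was actually observed, without claiming a true frequency."""
        if self.attempts == 0:
            return "not yet measured"
        return f"{self.failures_seen}/{self.attempts} attempts"


record = DefectRecord(
    "Report totals differ from screen totals",
    Reproducibility.INTERMITTENT_ONE_SETUP,
    setups=["two users sharing one database"],
    attempts=20,
    failures_seen=3,
    impact_if_it_occurs="A customer could be invoiced the wrong amount.",
)
print(record.observed_rate())  # 3/20 attempts
```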

Michael Bolton said...

It occurred to me that I hadn't posted this blog post by James Bach, which might be helpful to answering the first of the original two questions.

---Michael B.