testing early for the first time

Testing early for the first time is a real-life story from Mike Kelly, telling what happened in his group when they introduced testing during the normal development cycle.

“On schedule, several of the developers indicated their code was complete at a status meeting. I configured my local server to run the code and began my testing. Surprisingly (or perhaps not) this caused some problems.”

It showed up problems in the code, half-truths in project tracking and problems with configuration management. But as the initial shock wore off, and the developers became used to the idea, some powerful transformations occurred.

“We had automated hundreds of tests at the component level. We had validated many of the mapping documents. We had become active participants in design meetings (not all of them, but enough that we felt like we had a small victory). And by the end of our testing, we had developers (again not all of them) coming over to us and asking us to review their work. After the initial growing pains of the added visibility to the status of the code, most of the emotions died down, and we were able to release some great software.”

It’s a great story, and well worth a read. And please tell me about other stories like this, in which development groups learn to be agile by dipping a toe in the water…


use tests as a failsafe

Yesterday I wrote about tolerating the red bar, when a few scary tests fail only occasionally. And today it strikes me that one of the contributors to the persistence of this situation is our tool-set.

As I said, we run the tests, get a red bar (sometimes), and then open up the test list to check that only the expected tests have red dots against them. If we see nothing untoward we just get on with the next task. But of course we aren’t really looking at the reason for the failure. Perhaps the tool itself is making it too easy for us. Or perhaps we interpret those dots too liberally?

So here’s the thought: What would life be like if our tools actually prevented check-ins while there are any failing tests? This would effectively “stop the line” until the problem was sorted out. And it would force us to address each problem while that part of the codebase was fresh in our minds.
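
As a minimal sketch of that idea (assuming a Python project whose tests live under tests/ and a version-control system that calls a pre-commit hook), something like this could refuse the check-in while the bar is red:

    #!/usr/bin/env python3
    # Hypothetical check-in gate: run the whole suite and abort the
    # commit (non-zero exit) if any test fails.
    import subprocess
    import sys

    def tests_pass():
        # unittest discovery over the assumed tests/ directory;
        # a non-zero return code means at least one red test.
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", "tests"])
        return result.returncode == 0

    if __name__ == "__main__":
        if tests_pass():
            sys.exit(0)
        print("Red bar: fix the failing tests before checking in.")
        sys.exit(1)

Wired in as a hook like that, nobody can check anything in past a red bar – which is exactly the “stop the line” effect.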

I also suspect that peer pressure (“hey, I can’t check anything in now!”) might quickly cause us to develop a culture in which we tried to eradicate the root causes of test failures. Instead of relying on CruiseControl to “deodorise” our stinky practices…

(If you’ve tried this I’d love to hear your experiences. Drop me a line.)

open quality

Today on the Yahoo XP list, Kent Beck posted this link to Agitar’s open quality initiative. I applaud their openness, and would definitely encourage all other development groups to follow suit. (There’s a small danger, of course, that publication of such “dashboards” can be manipulated for the purposes of chest-thumping. I’m sure that isn’t the case with Agitar.)

It seems to me that the mere act of putting together the dashboard publication scheme would provide a group with important insights and impetus. And being able to “compare” numbers across the community offers both security (“Phew! most teams are as bad at UI testing as us”) and challenges (“Blimey, most folks test over 95% of their classes”). Perhaps every group that publishes a dashboard page should make it easy to Google – maybe we could agree on standard phrases to include on the page…? (with a link from the C2 wiki to the Google search, so that the standard is enshrined in a working implementation)

Update, 10 nov 05
Agitar’s Mark DeVisser has commented on this post.

a second pair of eyes

I’ve just been working with a team which has a pairing policy: every item of code must have been seen by two pairs of eyes before it can be checked in. It doesn’t work.

The effect of the policy is to replace pair programming – instead developers do a “pair check-in” at the end of each development episode. So a developer will beaver away working on a feature for a day or so, getting it right, making it work, passing all the tests. And then he’ll call over to another team member to request a “pair check-in”. The other team member comes to the developer’s station and is walked through the changes in the version control tool. And then the code is checked in and the two team members part company again.

The problem here is that the process sets the two people up to be in opposition: the developer is effectively asking for approval, instead of asking for help. It’s natural for the developer to feel a sense of ownership, because he’s worked hard to get that code complete and correct. Not many people can graciously accept negative feedback after all that hard work.

It can also be hard for the reviewer – the “second pair of eyes” – to come up to speed quickly enough. The developer knows these changes intimately, but the reviewer is being asked to understand them cold. He has little chance of being effective in that situation.

So this process has all of the demerits of Inspections, with none of the advantages. The team would be more effective adopting true pair programming, I feel.

better tester, worse code

I’ve recently been observing a couple of very similar development teams who had one major difference: The tester in Team 1 was very good at his job, whereas the tester in Team 2 wasn’t. And as a result, the developers in Team 1 produced significantly poorer code than those in Team 2! It turns out that the very good tester was highly trusted by the rest of his team – so much so that they were happy to delegate complete responsibility for product quality to him. In turn, this freed them to churn out code at an alarming rate, without regard to whether it worked particularly well.

Team 2 ended up hardly using their tester, preferring to rely on TDD to catch most defects before they made them. But they were able to release product at the drop of a hat, because they knew and trusted the quality of their code at all times. On the other hand, Team 1 required over a week of full-team manual testing and defect fixing before they were prepared to believe they were ready to release.

Team 1 were applauded for their speed of coding, and for the obviously great work of their tester. Defects? Rework? Manual tests? They’re a fact of life in software development aren’t they? Just an overhead we have to live with. But look how fast we go!

By comparison, Team 2 were castigated for their slowness. They did very little fire-fighting. Releases were a non-event, an anti-climax almost. Unnoticed, unheralded, they produced working product on a weekly basis.

I’m sure none of this is a revelation to you, but to see it in action is quite impressive.

what is “quality”?

Just what is software quality? I hear talk of software being of “high” or “low” quality, as if there is one Quality that something can possess to varying degrees. Then I see Specifications that break that Quality down into Maintainability, Supportability etc. Each of these is a boolean attribute that is either present or not – high Quality is the possession of all of these “ilities”.

Back in the days of TQM I recall the mantra “quality is fitness for purpose.” At the time this struck us fresh graduates as a radical and deep thought, but it now seems merely a more bullish version of Gerry Weinberg’s “quality is value to some person.” Of course, every person – every stakeholder – will assign different values to a software system. A developer will want different things than a user, or the user’s manager, or the support engineer. I guess we’re back to the “ilities” – an expression of how the design must take account of each stakeholder’s needs.

Last week I suggested that non-functional requirements could usually be rewritten as user stories. Am I now disagreeing with that statement? Hopefully not. It occurs to me now that such non-functional requirements can probably be translated into the form ‘if user A does X, Y will happen after at most time Z.’ This is pure conjecture – I’ll report back here when I’ve tested it in real life. I also conjecture that all of the “ilities” for a system can be expressed in these same terms. This would have clear advantages:

  • the stakeholder is clearly identified
  • his values are clearly enumerated
  • the qualities are close to being testable

(By the way, I’m proposing a thinking tool here, not necessarily a document format.)
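
To make the thinking tool concrete, here’s a minimal sketch of one such statement written as an executable test. (report_for() and the three-second budget are hypothetical stand-ins, not a real system or an agreed target.)

    import time
    import unittest

    def report_for(customer_id):
        # Hypothetical system-under-test: build the monthly report for a customer.
        return "report for " + customer_id

    class SupportEngineerValues(unittest.TestCase):
        def test_report_arrives_within_three_seconds(self):
            # Stakeholder: the support engineer. X = ask for a report,
            # Y = the report arrives, Z = an assumed budget of 3 seconds.
            start = time.monotonic()
            report = report_for("customer-42")
            elapsed = time.monotonic() - start
            self.assertIn("customer-42", report)
            self.assertLessEqual(elapsed, 3.0)

    if __name__ == "__main__":
        unittest.main()

The stakeholder, the value and the measure are all explicit – which is really all the list above is asking for.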

So what do people mean when they talk about “good” or “bad”, “high” or “low” Quality? I think I’m going to duck the question altogether. Henceforward I’ll try to avoid the Q-word. Instead I’ll try to talk only about each of the “ilities” as a small-q quality, and look for measures of that quality to reflect the needs of each of the system’s stakeholders. It will be a real challenge to find testable measures appropriate to expressing Maintainability, Adaptability etc. But I genuinely believe these will be more useful to everyone in the long run.