The second session I attended at SPA2006 was Designing metrics for agile process improvement, run by Jason Gorman and Duncan Pierce. Their thesis is that metrics are difficult to design, and making a mistake can lead an organisation down the wrong path entirely.
To demonstrate this, they divided us into four groups and assigned us each a “goal”. My group’s goal was “more frequent releases“. We had to brainstorm metrics that might act as measures of an organisation’s progress towards that goal, and then pick one to carry forward to the remainder of the exercise. We spent some time thrashing around wondering why an organisation might have such a goal, and finally we plumped for the metric average time in days between releases.
At this point, one person in each group took his group’s metric to the next table, where that group (minus one) would attempt to “game” it. The objective of this part of the afternoon was to try to get the metric looking right, but such that the organisation was clearly doing something very undesirable behind the scenes. Our table was joined by Sid, bringing along a metric whose stated goal was “fewer bugs in the product” and whose metric was number of bugs per function point per release. During the next half-hour we had great fun trying to find ways to write more bugs, while still having fewer per function point. These included testing previous release really hard and fill up the bugs database, then hardly test future releases at all; and making it really difficult for users to report bugs. Meanwhile another group was gaming our own metric, by producing zero-change releases and by saving up releases into rapid-fire batches all at once.
Next, if I recall, we had to re-design our original metrics in the light of the panning they had received in the hands of the gamers. We ended up measuring the goal of “more frequent releases” using the metric days per story point between customer feature request and customer acceptance. We tried to phrase it so that it averaged out over easy and difficult features, and so that the count forced us to release the feature to an end user. This neatly side-steps all attempts at gaming (that we could discover), and is roughly equivalent to throughput – or to Kent Beck’s software in process metric.
For the workshop’s final segment, we recognised that a single metric by itself is rarely sufficient to direct process improvement. Change one thing for the better and something else may well slip out of line. Jason and Duncan’s thesis is that we need a system of metrics, each of which balances the others in some way. So as a final exercise, we began an attempt to take the metrics of all four groups and weave them into a system, showing which measures reinforced which others etc. For me this is the key observation from the whole afternoon, and I would like to have spent more time here. I had never really seen the attraction of Systems Thinking, but these models derived from the need for metrics do seem to be a good application of the technique. Food for thought indeed…