rocks into gold

My very talented friend Clarke Ching has self-published his second novel Rocks into Gold (his first, Rolling Rocks Downhill, is due out later this year).

Rocks into Gold is a “parable for software developers who want to survive — and then thrive — through the Credit Crunch”. If you’re a subscriber to this blog you’ll probably know the book’s main message already; but read it anyway, because you probably also know someone whose project, or job, or organisation might just benefit from Clarke’s excellent re-telling. Buy them a copy.

inherent simplicity

This week I’ve been doing a lot of reading around Goldratt’s latest big idea: inherent simplicity. The premise is that any complex system (Goldratt typically considers companies here) always has an “inherent simplicity”. Once found, this simple model of the complex organism can be used to reason about it and create breakthrough analyses of, say, the route to exponential growth in profits.

The more I read, the more I realised that this idea has formed the basis of much of my career. Having a PhD in mathematics, I have always looked for — and enjoyed looking for — the elegant and simple solution at the heart of any problem. And in my work life I’ve applied the same aesthetic to solving business problems. Here’s a neat example from a consulting gig in the 1990s…

A group of us (designers, developers, business analysts) had been tasked with re-engineering an old suite of applications used by a parcel delivery firm. The idea was to replace a bunch of disparate applications with a single, distributed enterprise solution, and at the same time evolve from “parcel tracking” to “parcel management” (whatever that means). We had a 200-page spec describing the business rules for parcel management, and it was unbelievably complex. A parcel might have a barcode, or a hand-written label, or no legible markings at all. It might be waiting to be loaded on a van to its final destination, or waiting to be sorted at the hub, or in the wrong place entirely. If it was in the wrong place, it may have been labelled correctly or incorrectly, legibly or illegibly. And so on. We were becoming bogged down in the detail, and everywhere we looked there was more. The object model for the new application was looking like spaghetti, and the mess got larger each day.

So one day, frustrated by the complexity, I took a break from facilitating the analysis sessions and took the problem home, for some peace and quiet. And I found the system’s inherent simplicity: look at everything from the parcel’s point of view. Where I’ve been doesn’t matter; whether I’m in the right or wrong place now doesn’t matter. All that matters is where I should go next. And for each parcel, that’s unique. Given any parcel located anywhere in the system, there’s a “best” route to its destination, and therefore a single next place to which it should be moved. The means of transport (hand, van, plane, forklift, …) is likewise unimportant at this level. All of the business rules lived inside Parcel.get_next_location(), and everything else was implementation detail.
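To make that concrete, here’s a minimal Ruby sketch of the shape of the model. It’s illustrative only: apart from get_next_location() itself, the names (Location, delivers_to?, next_hop_towards) are hypothetical, not what we actually built.

    # Illustrative sketch only: Location and its two methods are
    # invented for this example. The point is that every routing
    # decision funnels through one method on Parcel.
    class Parcel
      def initialize(destination, current_location)
        @destination = destination
        @current_location = current_location
      end

      # Where should this parcel go next? The answer depends only on
      # where it is now and where it is ultimately headed -- never on
      # how it got here, or whether "here" was the right place.
      def get_next_location
        return @destination if @current_location.delivers_to?(@destination)
        @current_location.next_hop_towards(@destination)
      end
    end

Everything else (barcodes, illegible labels, vans, sorting hubs) becomes implementation detail hidden behind those two hypothetical Location methods.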

It took the other members of the analysis team a couple of days to grasp this simplicity and peel away the layers of complexity. And then the project was canned, so this elegant solution was never implemented.

Anyroadup, you get the idea…

downstream testing implies a policy constraint

As usual, it takes me multiple attempts to figure out what I really want to say, and how to express myself. Here’s a bit more discussion of what I believe is implied by downstream testing:

The very fact that downstream testing occurs, and is heavily consuming resources, means that management haven’t understood that such activity is waste. (If management had understood that, they would re-organise the process and put the testing up front: prevention of defects, instead of detection.) No amount of tinkering with analysis or development will alter that management perception, and therefore the process will always be wasteful and low in quality. So the constraint to progress is management’s belief that downstream testing has value.

why YAGNI acts to EXPLOIT the bottleneck

Clarke asked me to explain my earlier throw-away remark that YAGNI forms part of the EXPLOIT step in moving the bottleneck away from development, so here goes…

YAGNI (You Aren’t Gonna Need It) is an exhortation from the early days of XP. It has been discussed and misunderstood a great deal, so I’m not going to get into the finer points of its meaning here. For our purposes, it reminds the developer not to work on features or generalisations that merely might be needed some day, telling him instead to focus his present efforts on delivering only what he knows is the current requirement. (In the interests of brevity, I’ll refer below to YAGNI only in terms of added behaviour, and I’ll use the word “feature” for any fragment of any kind of behaviour; all other forms of YAGNI are assumed.)

(In my practice I use a similarly attention-grabbing soundbite. Whenever I see a developer do something “because it may be needed in the future” I accuse him of crystal ball gazing. I remind the whole team that it can be risky and dangerous to get your balls out, and that seems to help the message stick. Other times there’s an embarrassed silence.)

Writing crystal ball code has three effects: in the present moment, it means that the developer is spending current time investing in one of many possible futures; in the period from now until that possible future, it means that there is code in the system that doesn’t need to be there; and when the future arrives, it may look different from the one the developer predicted.

First, then, crystal ball code uses up current development time. This is bad when development is the bottleneck, batch sizes are relatively small, development order has been defined in terms of business value, and feature cycle time is a KPI. The time spent developing a crystal ball feature will delay the current batch and all batches up to the imagined future. There is a tiny chance that development of that future batch will be faster (see below), but all interim ROI (for example) will be reduced by the delay introduced right now.
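As a back-of-envelope illustration of that delay cost (all numbers invented, and the model deliberately crude):

    # Invented numbers: a crystal ball feature costs 3 days now, so
    # every batch between now and the imagined future ships 3 days
    # later than it otherwise would have.
    delay_days      = 3
    batches_delayed = 4       # batches between now and that future
    value_per_day   = 1_000   # value each batch earns per day, once shipped

    # Each delayed batch gives up delay_days' worth of its value stream.
    lost_interim_roi = batches_delayed * delay_days * value_per_day
    puts "interim ROI given up: #{lost_interim_roi}"   # prints 12000

Even if the guess eventually pays off, that interim cost has already been incurred.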

Second, the crystal ball code represents inventory, and it has a carrying cost. This code, which may never be required by the end user, must always build, integrate and pass all tests; if ever it doesn’t, time must be spent fixing it. Furthermore, a larger codebase will always require more time and effort to understand and navigate (think of having to drive around piles of inventory in order to fetch anything, or of the lean practice of 5S). Even if the guess turns out to be correct, the additional carrying cost of this inventory will slow down the development of all batches of features between now and the imagined future.

Third, the developer’s guess may be just plain wrong. Either the imagined “requirement” is never requested, or it is requested but by that time the codebase is radically different from what it is now. The developer may have to spend time removing the feature (for instance if it would confuse or endanger the user) or completely re-designing it to match how reality turned out. It is assumed that the “wow, that’s exactly what we needed” outcome is sufficiently unlikely that the costs of the other outcomes dominate.

So YAGNI is based on a few core assumptions:

  • The product is to be built incrementally in batches of features
  • Each increment should be potentially shippable in terms of quality and cohesiveness
  • It is hard to predict what features will be requested in later batches
  • It is hard to predict what future code may look like
  • Development is the bottleneck
  • Speed of development is crucial
  • The present value of current features is higher than the future value of future features

Under these conditions, YAGNI is part of the EXPLOIT step because it helps to maximise the amount of current development effort going into delivering current value.

TOC and YAGNI

My apologies if this has been said or written a thousand times before: YAGNI is XP’s way of exploiting the constraint.

Which means that XP, and hence most agile methods, are set up on the assumption that the development team is – or soon will be – the bottleneck. And having identified that bottleneck, our first task is to EXPLOIT it – that is, we make sure that the bottleneck resource only works on stuff that contributes to overall throughput. YAGNI and test-driven development do that. Oh, and a relentless pursuit of quality, so that the bottleneck doesn’t have to spend time on rework. And effective communication, so we get to spend more time developing and less time writing documents and attending meetings. And tight feedback loops, so that we can identify which stuff is valuable and which isn’t.

Next we must SUBORDINATE the whole flow to the pace of the bottleneck. Fixed-length iterations help us to measure that pace, and the various forms of planning game and iteration commitment help to prevent work arriving too fast.

And only when all of that is working well is it safe to ELEVATE the constraint, perhaps by expanding the size of the team. I’m fairly sure I’ve never seen a real-life case in which this step was required. For most of my engagements, successfully exploiting the bottleneck caused the constraint to move elsewhere; and in the rest, the SUBORDINATE step revealed a deeper constraint elsewhere in the organisation.

insurance for software defects

The more I think about it, the more astonished I become. Maintenance contracts for (bespoke) software: buying insurance against the possibility that the software doesn’t work.

I know the consumer electronics industry does the same, and I always baulk at the idea of spending an extra fifty quid in case the manufacturer cocked up. I wonder what percentage of purchasers buy the insurance? And I wonder what percentage of goods are sent back for repairs? Perhaps the price could be increased by 10% and all defects fixed for free. Or perhaps the manufacturer could invest a little in defect prevention.

It seems to me that software maintenance contracts are an addiction. Software houses undercut each other to win bids, and then rely on the insurance deal to claw back some profits. So no-one is incentivised to improve their performance, and in fact the set-up works directly against software quality. Perhaps it’s time now to break that addiction…

If a software house were able to offer to fix all defects for free, would that give them enough of a market advantage to pay for the investment needed to prevent those defects? Is “zero defects or we fix it for free” a viable vision? (Does any software house offer that already?) And how many software companies would have to do it before the buyer’s expectations evolved to match?

As an industry, do we now know enough to enable a few software houses to compete on the basis of quality?

blame

This week I’ve come across a few articles that clicked together as I read them, and in so doing they reinforced one of my deepest beliefs about software development – or any other profession, for that matter. The articles were:

  • Train Wreck Management by Mary Poppendieck, in which Mary chronicles the origins of “management” and “accountability”. It seems that hierarchical management structures were first applied in the mistaken belief that it was one person’s dereliction that caused a train crash, and that in future such things could be prevented by management and accountability. Lean thinking says the opposite – that the train crash was an inevitable consequence of the structure of the system, and that specific individuals should not be blamed.
  • Survey Blames Blame for Lean Struggles by Mark Graban, in which Mark notices that a recent survey seems to indicate that employees blame each other for their business failing to adopt lean thinking. Mark’s subsequent analysis of the survey shows, however, that it seems to lead the participants to ask “who is responsible for…?” – so it should be no surprise that the top answers mostly involve job titles! Mark’s response is to design a new survey in “5-whys” style – an effort I applaud, even though I disliked the example embedded in the survey.
  • Risk Aversity by Henrik Mårtensson, in which Henrik uses the Theory of Constraints thinking tools to dig into why many organisations are immune to change. One of the root causes, according to Henrik, is that “mistakes” are punished in the average workplace – and so after a while everyone becomes afraid to innovate, or even to change the status quo. A truly lean organisation will reward even the person who makes a change that leads to lower throughput, because at least they contributed, and knowledge about the whole system has been improved as a result. But even the use of the word “mistake” shows how deep-seated is our culture’s desire to blame, and hence to discourage.
  • The Secret Sauce of Highly Productive Software Development by Amr Elssamadisy and Deb Hartmann, in which the authors propose that inability to learn is the bottleneck (in the Goldratt / TOC sense) in most software teams. I need to think about their “bottleneck” claim, but I do agree that learning is the key to agile success, and that a learning organisation will out-perform any other in the longer term.
  • The QnEK Horse has left the Barn by Hal Macomber, in which Hal opens the lid on a community for sharing QnEK (Quick-n-Easy Kaizen) ideas. QnEK can only succeed in an organisation where “mistakes” don’t exist, where blame is replaced by systemic learning.

For me, these articles all dovetail:

  • learning is the key to long-term success
  • the system can always be improved
  • systemic learning requires a constant flow of improvement suggestions
  • blame discourages innovation

It seems clear to me that blame itself lies at the root of many organisations’ problems. The system can always be improved, and is always the source of problems. People should be rewarded for discovering those problems and trying to fix them, especially when they “only” create new knowledge about how to perform less well.

carnival of the agilists, 3-aug-07

John Brothers has posted the latest edition of the Carnival, entitled I’m running late. This is a perennial topic for all software development projects, and doubly so for those of us who take a lean or TOC view of productivity and change, so props to John for bringing that focus to the carnival this time around.

Speaking of TOC, one recent post that escaped John’s attention is Multi-Tasking: Why projects take so long and still go late by Kevin Fox. Every time I work in a large organisation I find myself writing about multi-tasking – and wondering why “common sense” is so rare…

a ruby sparkline showing variation

Back in May’s Carnival of the Agilists I referenced a post by Clarke Ching in which he suggests we can learn a lot about variation in a complex process by simply flipping coins. When I tried the simulation a few times with Excel I found, as expected, that heads and tails don’t always occur in equal measure. But that was a pain to do, so I’ve made it easier. I’ve written a little Ruby program that simulates 1000 coin tosses; you can get it from github.
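For a flavour of how it works, here’s a minimal sketch of the approach (illustrative only, not the actual program on github):

    # Minimal sketch: simulate 1000 coin tosses and draw the running
    # excess of heads over tails as an SVG sparkline.
    tosses = Array.new(1000) { rand(2).zero? ? 1 : -1 }   # head = +1, tail = -1

    # Cumulative sum: a random walk around zero.
    sum  = 0
    walk = tosses.map { |t| sum += t }

    # One point per toss; y is flipped so heads-heavy runs go up.
    points = walk.each_with_index.map { |y, x| "#{x},#{100 - y}" }.join(" ")

    File.write("sparkline.svg", <<~SVG)
      <svg xmlns="http://www.w3.org/2000/svg" width="1000" height="200">
        <polyline points="#{points}" fill="none" stroke="black"/>
      </svg>
    SVG

Run it a few times and you’ll get a visibly different walk each time, which is exactly Clarke’s point about variation.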

The simulation generates SVG images; if you can view those in your browser, you can see three consecutive runs here, here and here.

(Disclaimer: the images are SVG. They display fine in Firefox, but if you use IE you may need to install some kind of viewer.)