If you’ve tried the Happy numbers kata you may have noticed a couple of things about the algorithm. Firstly,
13 => 32 + 12 => 10 => 1
is the same as
31 => 12 + 32 => 10 => 1.
That is, happiness and unhappiness are preserved under all permutations of the digits. So if you have calculated the status of 134, you also know the status of 143, 314, 341, 413 and 431!
And secondly, zeroes have no effect on the outcome. So 103 is just as happy as 13; and similarly 1300, 1000300000 and 30010 are all happy.
In a sense, that means it is not meaningful to ask how many numbers can be examined in 5 seconds, say — because the answer is always infinite! So let’s redefine the kata:
Write a program to report the number of happy numbers in the range [1, n]. What is the largest n you can deal with in 5 seconds?
Some interesting questions now come to mind:
- Where does the time go? Is it faster to calculate the happiness of a number than it is to create all permutations of its digits?
- What is the best data structure for holding the results? Can a recursive or functional algorithm compete with a procedural approach?
- What is the best way to deal with numbers containing zeroes?
- Are the answers to these questions dependent on programming language?
- Is this really an algorithm on the integers, or is it an algorithm on lists of digits?
Finally, something quite bizarre. The observations above regarding zeroes and digit permutations mean that we can acquire the answer to the challenge while examining fewer integers. How many fewer?
My rough calculations suggest that we only need to examine 54 of the integers below 100 in order to have complete information about every number in the range 1-100. Thats 54%. But we only need to examine 229 of the first 1000, which is 22.9%. To cover 1-10000 we only need to examine 714 numbers, and for 1-100000 we only need to examine 2002 numbers — ie. 2% of them! It would appear that this algorithm gets significantly faster as we move to larger and larger data sets. How much difference does this make to what your implementation can achieve?
This weekend my 9 year old son was given the following homework:
Choose a two-digit number (eg. 23), square each digit and add them together. Keep repeating this until you reach 1 or the cycle carries on in a continuous loop. If you reach 1 then the number you started with is a “happy number”.
Can you find all the happy numbers between 1 and 100?
For example, 31 is happy because
31 => 32 + 12 => 10 => 12 + 02 => 1
But 4 is not happy because
4 => 16 => 37 => 58 => 89 => 145 => 42 => 20 => 4
and we have a cycle.
I thought this might make an interesting code kata, so while he beavered away with pencil and paper I set to writing a little program to do the job. Why not give it a try yourself?
- What tests, if any, did you write? How many times did your stack overflow? What data structure(s) do you need?
- Did you use recursion? Or loops? Or filters? Or something else?
- Did you manage to eliminate all of the duplication? Can you eliminate all of the conditionals? Is any of this altered by your choice of programming language?
- How long does it take to run? At what range size (eg. 1-100, 1-1000, 1-10000) does it take more than, say, 5 seconds to run? Does it run in linear time? Polynomial time? What mathematical tricks can you find to speed it up?
- What will you do differently next time?
First time through, I wrote no tests. My stack overflowed three times. I used a dictionary to hold the results, with a simple array for cycle detection and a recursive algorithm. I made no attempt to eliminate either duplication or conditionals, and I used none of the arithmetic tricks that my son found while doing the exercise by hand.
Next time, however, will be different…
Supplementary (for the mathematicians among you):
Are there any numbers for which the series neither terminates at 1 nor eventually repeats forever?
My son was given this challenge for his school homework tonight:
Pick a number in the range 2-100. Next, pick a number that is a factor or a multiple of the first, again using only the numbers 2-100. Continue like this, building a chain of numbers in the range 2-100 in which each number is either a factor or a multiple of its predecessor in the chain. What is the longest non-repeating chain you can build?
I feel some gentle programming about to occur…
Last night was the October 2010 meeting of XP-Manchester, a local group set up by me and Jim McDonald. As always the meeting consisted of two halves, the first being a workshop (this time led by me) and the second being a coding dojo.
For the workshop this month I ran a version of James Shore’s Offing the Offsite Customer game, as described by Kane Mar and using Kane’s drawings as the requirements. I hadn’t run the session before, and it turned out really well. We had 19 participants, so we split into two roughly equal-sized teams, with each team further split equally between a group of Product Owners and a group of Developers.
In the first run-through neither team managed to create a diagram that looked anything like the requirement; whereas in the second attempt both teams produced very good diagrams, and well inside the allotted time. The difference was born in the team retrospectives between the two runs. Both teams independently decided to work much more iteratively and interactively second time around, and it paid off. It could be said that in the first run, the teams focussed on perfecting the written spec, whereas in the second run the teams focussed on perfecting the working diagram. This focus on evolving a diagram using direct feedback was “invented” independently by both teams, and towards the end of the second run they even had free time available for fine-grained polishing.
The second part of the evening was a dojo. Jim introduced us to the Minisculus challenge set by Eden Development at the recent Software Craftsmanship 2010 event. Jim decided we should attempt the Mark I problem in Ruby, and due to the relative lack of Ruby knowledge in the room last night this meant that Mike Josephson did most of the driving. We didn’t get very far, but we did have some very interesting discussions about TDD style: After you’ve faked a return value to get quickly to GREEN, what’s the best step to take next? Is it better to add another test in order to triangulate towards a more general solution, or is it better to treat the fake return value as duplication and fix that by moving specifics up into the test? We explored the latter approach last night, and no doubt we’ll continue the debate next month.
Many thanks to Jim and Mike for running things, and to everyone else for joining in!
XP-Manchester happens after work on the second Thursday of every month at Madlab in Manchester’s Northern Quarter. Everyone is welcome, and if you want to come along you can get details of upcoming meetings by joining the mailing list at http://groups.google.com/group/xp-manchester.
In Ruby, one easy way to check for palindromes would be to write a method such as this:
self == self.reverse
But the C++ part of my brain screams that this is likely to be very inefficient. So, today’s kata: test-drive a palindrome? method for String, ensuring (with tests) that it is much faster than the above.
In The Cardinality of a Fluent Interface Michael Feathers posed a code kata in which one has to create a DSL in which constructs such as the following all provide the expected numeric value:
My first pass at a solution in Ruby is available on github. There’s still some duplication in there, and hopefully I’ll get around to removing it soon.
In his post, Mike asks whether the DSL’s “cardinality” — the number of classes required to implement the DSL — says anything about the DSL’s grammar. Well, my solution has 7 classes (although PartialValue exists only to remove a little duplication). I have no idea what that might mean in the DSL world.
You can get the code from the github repository.
I’ve just noticed that one of my practices has changed subtly, probably sometime during the last six months or so:
I always keep a to-do list during each programming episode. The list begins with one entry, describing the goal I’m working towards. And as I read and write code, new entries are added to make sure I go back and fix or refactor everything I see along the way. Until recently, those additional entries tended to describe an outcome – “split class Foo to create Bar and Xyzzy”, or “wrap x and y into a Mouse class”, for example. But recently I’ve noticed myself tending to write entries that describe the problem – “class Foo too large”, “x,y travel around together”.
I have no idea how, why or when this change happened, but I’m pleased it did. Between the time I write a particular to-do item and the time I finally get around to executing it, I may significantly change the code in that area. Or I may learn something about the code and where it wants to go next. Or the problem may disappear, as a side-effect of some other change. Either way, the solution I might have on my to-do list has a chance of being inappropriate now that the code and I have moved on. So by listing the problem as I originally saw it, I’m giving myself a much better chance of creating the right solution for it – because I’m deciding on that solution in the presence of the full facts.
I’ve recently been re-doing a few kata that I first tried a couple of years ago. I’ve noticed that my results are dramatically better now, mostly I think due to delaying the moment at which I decide how to solve each micro-problem. This effect is an example of Mary Poppendieck’s “decide as late as possible” maxim. There are clear benefits in terms of the quality of my designs; and there are psychological benefits too, because I don’t have to spend time trying to remember why I wanted to do each item on the list. Everyone wins.