coding is like cooking

Friday, 21 September 2012

SOLID exercises in Python

This post has moved to http://coding-is-like-cooking.info/2012/09/solid-exercises-in-python/

Tuesday, 4 September 2012

SOLID principles and TDD

This post has moved to http://coding-is-like-cooking.info/2012/09/solid-principles-and-tdd/

Thursday, 16 August 2012

Comparing Refactoring Kata Feedback Tools

For a little while now I've been collecting Refactoring Kata exercises in a github repo, (you're welcome to clone it and try them out). I've recently facilitated working on some of these katas at various coding dojo meetings, and participants seem to have enjoyed doing them. I usually give a short introduction about the aims of the dojo and the refactoring skills we're focusing on, then we split into pairs and work on one of these Refactoring Katas for a fixed timebox. Afterwards we compare designs and discuss what we've learnt in a short retrospective. It's satisfying to take a piece of ugly code and after only an hour or so make it into something much more readable, flexible, and significantly smaller.

Test Driven Development is a multifaceted skill, and one aspect is the ability to improve a design incrementally, applying simple refactorings one at a time until they add up to a significant design improvement. I've noticed in these dojo meetings that some pairs do better than others at refactoring in small steps. I can of course stand behind them and coach to some extent, but I was wondering if we could use a tool that would watch more consistently, and help pairs notice when they are refactoring poorly.

I spent an hour or so doing the GildedRose refactoring kata myself in Java, and while I was doing it I had two different monitoring tools running in the background, the "Codersdojo Client" from http://content.codersdojo.org/codersdojo_client/, and "Sessions Recorder" from Industrial Logic. (This second tool is commercial and licenses cost money, but I actually got given a free license so I could try it out and review it). I wanted to see if these tools could help me to improve my refactoring skills, and whether I could use them in a coding dojo setting to help a group.

Setting it up and recording your Kata
The Codersdojo Client is a ruby gem that you download and install. When you want to work on a kata, you have to fiddle about a bit on the command line getting some files in the right places, (like the junit jar), then modify and run a couple of scripts. It's not difficult if you follow the instructions and know basicly how to use the command line. You have a script running all the time you are coding, and it runs the tests every time you save a source file.

The Sessions Recorder is an Eclipse plugin that you download and install in the same way as other Eclipse plugins. It puts a "record" button on your toolbar. You press "record" before you start working on the kata.

Uploading the Kata for analysis
When you've finished the kata, you need to upload your recording for analysis. With the Codersdojo Client, when you stop the script, it gives you the option of doing the upload. When that's completed it gives you a link to a webpage, where you can fill in some metadata about the Kata and who you are. Then it takes you to a page with the full final code listing and analysis.

The Sessions Recorder is similar. You press the button on the Eclipse toolbar to stop recording, and save the session in a file. Then you go to the Indutrial Logic webpage, log into your account, and go to the page where you upload your recorded Session file. You don't have to enter any metadata, since you have an account and it remembers who you are, (you did pay for this service after all!) It then takes you to a page of analysis.

Codersdojo Client Analysis
The codersdojo client creates a page that you can make public if you want - mine is available here. It gives you a graph like this:

It's showing how long you spent between saving the file, (a "move"), and whether the tests were red or green when you saved. There is also some general statistics about how long you spent in each move, and how many modifications there were on average. It points out your three longest moves, and has links to them so you can see what you were doing in the code at those points.

I think this analysis is quite helpful. I can see that I'm going no more than two or three minutes between saving the file, and usually if the tests go red I fix them quickly. Since it's a refactoring kata I spend quite a lot of moves at the start where it's all green, as I build up tests to cover the functionality. In the middle there is a red patch, and this is a clear sign to me that I could have done that part of the kata better. Looking over my code I was doing a major redesign and I should have done it in a better way that would have kept the tests running in the meantime.

Towards the end of the kata I have another flurry of red moves, as I start adding new functionality for "Conjured" items. I tried to move into a more normal TDD red-green-refactor cycle at that point, but it actually doesn't look like I succeeded very well from this graph. I think I rushed past the "green" step without running the tests, then did a big refactoring. It worked in the end but I think I could have done that better too.

Sessions Recorder Analysis
The Sessions Recorder produces a page which is personal to me, and I don't think it allows me to share it publicly on the web. On the page is a graph that looks like this:

As you can see it also shows how long I spend with passing and failing tests, in a slightly different way from the Codersdojo Client's graph. It also distinguishes compiler errors from failing tests, (pink vs red).

This graph also clearly shows the areas I need to improve - the long pink patch in the middle where I do a major redesign in too large a step, and at the red bit at the end when I'm not doing TDD all that well.

The line on the graph is a "score" where it is awarding me points when I successfully perform the moves of TDD. Further down the page it gives me a list of the "events" this score is based on:

(This is just some of the events, to show you the kinds of things this picks up on.) "New Green Test" seems to score zero points, which is a bit disappointing, but adding a failing test gets a point, and so does making it pass. "Went green, but broke other tests" gets zero points. It's clearly designed to help me successfully complete red-green-refactor cycles, not reward me for adding test coverage to existing code, then refactoring it.

There is another graph, more focused on the tests:

This graph has mouseover texts so when you hover over a red dot, it shows all the compilation errors you had at that point, and if you hover over a green dot it tells you which tests were passing. It also distinguishes "compler errors" from a "compiler rash". The difference is that a "compiler rash" is a more serious compilation problem, that affects several files.

You can clearly see from this graph that the first part of the kata I was building up the test coverage, then just leaning on these tests and refactoring for the rest. It hasn't noticed that I had two @Ignore 'd tests until the last few minutes though. (I added failing tests for Conjured Items near the start then left them until I had the design suitably refactored near the end).

I actually found this graph quite hard to use to work out what I need to improve on. There seem to be three long gaps in the middle, full of compilation errors where I wasn't running the tests. Unlike with the Codersdojo Client, there isn't a link to the actual code changes I was making at those points. I'm having trouble working out just from the compiler errors what I should have been doing differently. I think one of these gaps is the same major redesign I could see clearly in the Codersdojo Client graph as a too big step, but I'm not so sure what the other two are.

There are further statistics and analysis as well. There is a section for "code smells" but it claims not to have found any. The code I started with should qualify as smelly, surely? Not sure why it hasn't picked up on that.

Conclusions
I think both tools could help me to become better at Test Driven Development, and could be useful in a dojo setting. I can imagine pairs comparing their graphs after completing the kata, discussing how they handled the refactoring steps, and where the design ended up. Pairs could swap computers and look through someone else's statistics to get some comparison with their own.

The Codersdojo Client is free to use, and works with a large number of programming languages, and any editor. You do have to be comfortable with the command line though. The Sessions Recorder tool only supports Java and C# via Eclipse. It has more detailed analysis, but for this Refactoring Kata I don't think it was as helpful as it could have been.

The other big difference between the tools is about openness. The Sessions Recorder keeps your analysis private to you, and if you want to discuss your performance, it lets you do so with the designers of the tool via a "comment on this page" function. I havn't tried that out yet so I'm not sure how it works, that is, whether you get feedback from a real person as well as the tool.

The Codersdojo Client also lets you keep your analysis private if you want, but in addition lets you publish your Kata performance for general review, as I have done. You can share your desire for feedback on twitter, g+ or facebook. People can go in and comment on specific lines of code and make suggestions. That wouldn't be so needed during a dojo meeting, but might be useful if you were working alone.

Further comparison needed
On another occasion I tried out the Sessions Recorder on a normal TDD kata, and found the analysis much better. For example this graph of me doing the Tennis kata from scratch:

This shows a clear red-green pattern of small steps, and steadily increasing score rewarding me for doing TDD correctly. Unfortunately I didn't do a Codersdojo Client session at the same time as this one, for comparison. A further blog post is clearly needed for this case... :-)

Monday, 6 August 2012

Principles of Agile Test Automation

Please note - As of March 2013, I have rewritten this post in the light of further experience and discussions. The updated post is available here.

I feel like I've spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn't take long before I was hooked - unit tests for everything in sight. That gratifying green bar is near-instant feedback that everthing is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

Often, I've written functional tests as regression tests, after the functionality is supposed to work. In other situations, I've been able to write these kinds of tests in advance, as part of an ATDD, or BDD process. In either case, I've found the regression tests you end up with need to have certain properties if they're going to be useful in an agile environment moving forward. I think the same properties are needed for good agile functional tests as for good unit tests, but it's much harder. Your mistakes are amplified as the scope of the test increases.

I'd like to outline four principles of agile test automation that I've derived from my experience.

Coverage

If you have a test for a feature, and there is a bug in that feature, the test should fail. Note I'm talking about coverage of functionality, not code coverage, although these concepts are related. If your code coverage is poor, your functionality coverage is likely also to be poor.

If your tests have poor coverage, they will continue to pass even when your system is broken and functionality unusable. This can happen if you have missed out needed test cases, or when your test cases don't check properly what the system actually did. The consequences of poor coverage is that you can't refactor with confidence, and need to do additional (manual) testing before release.

The aim for automated regression tests is good Coverage: If you break something important and no tests fail, your test coverage is not good enough. All the other principles are in tension with this one - improving Coverage will often impair the others.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful. When it fails you will have to dig though other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong and you will rewrite the test to check for something else, or simply delete it.

As you improve Coverage, you will likely add more and more test cases. Each one may be fairly readable on its own, but taken all together it can become hard to navigate and get an overview.

Robustness

When a test fails, it means the functionality it tests is broken, or at least is behaving significantly differently from before. You need to take action to correct the system or update the test to account for the new behaviour. Fragile tests are the opposite of Robust: they fail often for no good reason.

Aspects of Robustness you often run into are tests that are not isolated from one another, duplication between test cases, and flickering tests. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

As you improve Coverage you'll want to add more checks for details of your system. This will give your tests more and more reasons to fail.

Speed

As an agile developer you run the tests frequently. Both (a) every time you build the system, and (b) before you check in changes. I recommend time limits of 2 minutes for (a) and 10 minutes for (b). This fast feedback gives you the best chance of actually being willing to run the tests, and to find defects when they're cheapest to fix.

If your test suite is slow, it will not be used. When you're feeling stressed, you'll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You'll fix the one or two problems in a given run and kick off a new test run, but in the meantime someone else has checked in other changes, and the new run is not green either. You're developing all the while the tests are running, and they never quite catch up. This can become pretty demoralizing.

As you improve Coverage, you add more test cases, and this will naturally increase the execution time for the whole test suite.

How are these principles useful?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I'm doing on each principle from time to time as I develop.

I also find these principles useful when I'm trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

Invest in hardware and run tests in parallel (costs $)
Use a profiler to optimize the tests for speed the same as you would production code (may affect Readability)
push down tests to a lower level of granularity where they can execute faster. (may reduce Coverage and/or increase Readability)
Identify key test cases for essential functionality and remove the other test cases. (sacrifice Coverage to get Speed)

Explaining these principles can promote useful discussions with people new to agile, particularly testers. The test suite is a resource used by many agile teamembers - developers, analysts, managers etc, in its role as "Living Documentation" for the system, (See Gojko Adzic's writings on this). This emphasizes the need for both Readability and Coverage. Automated tests in agile are quite different from in a traditional process, since they are run continually throughout the process, not just at the end. I've found many traditional automation approaches don't lead to enough Speed and Robustness to support agile development.

I hope you will find these principles will help you to reason about the automated tests in your suite.

Images Attribution: DaPino Webdesign, Lebreton, Asher Abbasi, Woothemes, Iconshock, Andy Gongea, from www.iconspedia.com

Monday, 23 April 2012

Joseph Wilk on Acceptance Testing in a Startup

I've heard Joseph Wilk speak before, so I was delighted when he accepted our invitation to come to Scandinavian Developer Conference and share some ideas around "Acceptance testing in the land of the Startup". In this post I aim to summarize my understanding of what Joseph said. I think what he is doing at Songkick is really interesting and innovative, and is applicable beyond that particular environment.

Introduction

Joseph began by introducing what he meant by "Startup" using Eric Ries' definition:

"a human institution designed to create new products and services under conditions of extreme uncertainty"

"Acceptance testing" is usually about checking the product meets the previously agreed requirements - but if everything is changing all the time, that is not as simple as it sounds. Joseph's acceptance tests reflect what "done" looks like in conditions of extreme uncertainty.

Joseph talked about the Cynefin model (by Dave Snowden) and how working in a startup feels like you're in chaos a lot of the time, although parts of the business may be merely complicated or even simple. An effective strategy in the "chaos" part of the Cynefin model is "Act, Sense, Respond", which he related to the lean startup cycle of "Build, Measure, Learn".

The "Build" part of that cycle for their software product often takes more time than the other parts, and work should be coordinated between several people. During the "Build" phase you can use automated tests to help with communication and feedback. In the "Measure" phase, you're primarily looking for feedback from the actual end users, rather than automated tests.

The rest of the talk was largely about how Songkick use automated tests during the "Build" phase, and how they do acceptance testing with end users in the "Measure" phase.

The startup

The software Joseph is working on is for the startup Songkick which is devoted to live music. They are about 20-30 people, and the majority are programmers. They have two dedicated UX specialists and two QA specialists. Most people are multi-skilled, care deeply about the quality of the product, and are not afraid to learn how to help outside of their speciality.

Songkick holds three principles dear:

Embrace Change
Usability
Testing

The rest of his talk was about the concrete practices they use around this last principle - Testing. Joseph stressed that he was reporting what they had found to work, and not holding up any "best practice" or industry gold standard.

Features

At Songkick, they use Behaviour Driven Development. They are continually building and improving a "ubiquitous language" to describe their system, by writing Features and Scenarios in a semi-formal syntax. The tool Cucumber is used to turn these features and scenarios into Executable Specifications (aka Automated Tests).

Joseph gave a short intro to how Cucumber works - take a look at this site if you're not familiar with it. In short, Cucumber lets you define many "Features", each of which define numerous "Scenarios" which exemplify the desired feature behaviour. It requires you use a particular syntax called "Gherkin". The gist of it is that you use the following formats when writing features and scenarios:

Feature:
"In order to... I want... So that..."

Scenario:
"Given... When... Then..."

If you put in some extra work and also write "Step Definitions", your scenarios become executable, and can automatically check the application behaves as expected when run.

The primary purpose of the Cucumber specifications is to promote communication within the team: the words are carefully chosen to have clear meanings related to the domain of the system you're building.

Hypotheses

In Lean Startup you don't talk so much about building a "Feature", more about having a "Hypothesis". As in "if we add a pink button here, do we get more traffic to the site?"

Joseph showed the following example of a Feature with a Hypothesis:

Facebook signup
In order to increase signup
I want visitors to sign up via Facebook
So that we see a 10% increase in signups

That last line is the measurable part of the hypothesis, that makes it more than just a normal Cucumber Feature. When you've delivered a system that has this feature, it should lead to 10% more signups. If it doesn't, the system might be modified to remove this feature. Features written in this form (In order to... I want... So that...) are prioritized in the product backlog.

When a developer is ready to start working on a feature, she calls a meeting to discuss it. At the meeting there should be representatives from four different job roles: a developer, tester, business analyst and usability expert. Together they discuss what the feature is, the details of how it should work, and perhaps create some wireframes or UI mockups. They will also brainstorm perhaps 7-8 scenarios, such as:

successful signup via facebook
failed signup via facebook
facebook changes their API and no one can sign in any more

After the meeting, the developer starts working on the code to implement the feature, bearing in mind the discussion and the scenarios. The people in the other job roles will also get involved in development, and assessing whether the feature is ready for deployment. Some or all of the scenarios they have come up with will be automated as executable with Cucumber.

There is an overhead to automating scenarios with Cucumber: implementing "Step Definitions" takes time, and afterwards the scenarios are relatively slow to execute. Joseph said that they were more and more only automating a few scenarios for each feature, often just a happy and a sad path. The other scenarios would perhaps be used in unit tests, or exploratory testing. The most important aspect of writing many scenarios was to discover issues with a feature before implementation.

All their Cucumber features and scenarios are published in a searchable web interface called "Relish" which he said encouraged developers to use more business-friendly language, since they knew non-programmers would look there. Joseph said features containing incomprehensible technobabble like "Given the asynchronous message buffer queue has been emptied" (!) do appear occasionally, but don't last long. Their ubiquitous language is highly visible and in daily use.

Actual Acceptance Tests

Songkick uses a Continuous Delivery approach, where anyone in the company can deploy the latest code, at the push of a button. They have invested a lot of effort to make it all automated, so that when anyone checks in code changes, Jenkins builds it, runs all the tests, (including Cucumber specs), and allows anyone to deploy any successful build. He said deployment was so easy even a dog could do it, (and had on one occasion!).

Joseph explained that many features would be built in such a way that they were "turned off" in production for a few days while they were being built. Only once the whole feature was implemented and present would they activate the code, and perhaps only then for a subset of all the visitors to the site. In this way you can build a feature that takes a few days to implement, and continue to release the new code every few hours. The half built feature is being deployed, it's just switched off.

The reason for all this effort to get code out of the door is that a feature can't be considered Done until its hypothesis has been validated. As Joshua Kerievsky said:

"A story isn't done until it's being used by real users in production and has been validated to be a useful part of a product"

By deploying often, you get that confirmation or rejection as soon as possible, so you can evaluate whether the feature was worth building. "Build, Measure, Learn". The "Measure" part happens after an acual customer has got hold of the code and started using it. That is the real acceptance testing that's going on, not the Cucumber scenarios.

So that's the way they work today, but it hasn't always been quite like that. Joseph also talked about why they have arrived at this process.

The build time issue

The Cucumber scenarios are run at every checkin, and Joseph said that at one point it took 4 hours to run them all. This was a serious problem, delaying feedback and slowing the continuous deployment process to a crawl.

Their first approach was to use their technical skills to divide up the tests and run them in parallel on the cloud. The build time went down from 4 hours to just 16 minutes, but at a cost of about $7000 per month in cloud servers. That proved too expensive to be sustainable, so they reconsidered.

The automated tests are there to give confidence that the system works before you deploy. They realized that some of the tests gave more confidence than others, so they removed something like 60% of them. These less valuable tests were then run much less frequently. This took them off the critical path to deployment, and enabled more frequent releases. So far this strategy has led to only insignificant problems in production.

They also re-evaluated their policy for which scenarios to automate in Cucumber, and started putting much more emphasis on unit tests. When you have comprehensive Cucumber tests, it's easy to have lots of confidence that the system works, and not bother with unit tests. Joseph said they were missing the design benefits of unit testing though - unit tests force your code to be decomposed into units! These days they write fewer Cucumber scenarios, and more unit tests.

Duplicating effort with QA

In addition to automated Cucumber scenarios, the QA people do manual testing before deployment. At one point the developers were surprised to learn than maybe 60-70% of these manual tests were covering functionality that was already well covered by Cucumber specs. They realized that the QA people had little confidence in the automated tests.

Once they had noticed the problem, they began to include the QA people more in the scenario writing process, so they could learn about what the tests did, and how they work. This helped give QA confidence to remove some of the manual tests, so now they only test the most crucial functionality manually before deployment.

Metrics as feedback

Joseph invented a tool called "Limited Red" that records metrics from every test run. It shows failure rates for each scenario, and correlates that with changes in the corresponding feature file, using the Git log. Using this data, he can plot graphs that show each feature, how many times it has failed, and how often it has been edited.

For example, he might find one of the features has failed 16 times, while the code in the feature was only changed 4 times. This could indicate the feature is testing an area of the code that is poorly implemented - the code is broken more often than the test is invalid. This gives developers feedback that they can act on to improve the quality of both the code and the Cucumber scenarios.

Joseph has a blog post with more detail about this tool and metrics approach.

My Concluding Thoughts

Most startups fail, and I have little insight into whether Songkick will be successful in the long run. I am fascinated though, by the way they are working. It seems to me they have a great process that promotes communication and feedback, coupled with enough introspection to adapt it as they learn more.

I'm interested to note that Songkick has extended the "Rule of Three" introduced by Lisa Crispin and Janet Gregory. In their book on "Agile Testing" they encourage you to include a tester in all discussions between a developer and customer. At Songkick they appear to have business analysts in the customer role, and additionaly include a UX (usability) expert in the discussion.

Joseph didn't mention it, but I suspect that his experiences automating fewer scenarios with Cucumber may be one of the reasons Liz Keogh wrote her blog post "Step away from the tools". Or maybe the blog came first, I don't know. The advice is the same, in any case - BDD is about communication first, test automation second.

The "Limited Red" tool is quite new, and I think Joseph is still learning how to act on the data he's gathering. Having said that, I have high hopes that eventually this kind of measurement and feedback will find its way into more automated testing tools and Jenkins builds. It seems to me that finding out which of your tests are giving the best value for money and which are costing the most to maintain will be generally useful.

My next challenge is to work out how to apply these kinds of ideas to the situation I find myself in right now. It's a large company not a startup, the customer seems totally inaccessible, the release cycle is years not hours, the build is slow (when it works at all), and UX experts are thin on the ground. Hmm. As Joseph said, his story is about what they've found to work, not some universal "best practices". It's given me some new ideas and encouragement - now I just have to work out what principles and practices I can reasonably introduce where I am.

Tuesday, 20 December 2011

Global Day of Code Retreat

A little while back I was back in Stockholm facilitating another Code Retreat, (see previous post), this time as part of Global Day of Code Retreat, (GDCR). (Take a look at Corey Haines' site for loads of information about this global event that comprised meetings in over 90 cities worldwide, with over 2000 developers attending.)

I think the coding community could really do with more people who know how to do TDD and think about good design, so I'm generally pretty encouraged by the success of this event. I do feel a bit disappointed about some aspects of the day though, and this post is my attempt to outline what I think could be improved.

I spent most of the Global Day of Code Retreat walking around looking over people's shoulders and giving them feedback on the state of their code and tests. I noticed that very few pairs got anywhere close to solving the problem before they deleted the code at the end of each 45 minute session. Corey says that this is very much by design. You know you don't have time to solve the problem, so you can de-stress: no-one will demand delivery of anything. You can concentrate on just writing good tests and code.

In between coding sessions, I spent quite a lot of effort reminding people about simple design, SOLID principles, and how to do TDD (in terms of states and moves). Unfortunately I found people rarely wrote enough code for any of these ideas to really be applicable, and TDD wan't always helping people.

I saw a lot of solutions that started with the "Cell" class, that had an x and y coordinate, and a boolean to say whether it was alive or not. Then people tended to add a "void tick(int liveNeighbourCount)" method, and start to implement code that would follow the four rules of Conway's Game of Life to work out if the cell should flip the state of its boolean or not. At some point they would create a "World" class that would hold all the cells, and "tick()" them. Or some variant of that. Then, (generally when the 45 minutes were running out), people started trying to find a data structure to hold the cells in, and some way to find out which were neighbours with each other. Not many questioned whether they actually needed a Cell class in the first place.

Everyone at the code retreat deleted their code afterwards, of course, but you can see an example of a variant on this kind of design by Ryan Bigg here (a work in progress, he has a screencast too).

Of course, as facilitator I spent a fair amount of time trying to ask the kind of questions that would push people to reevaluate their approach. I had partial success, I guess. Overall though, I came away feeling a bit disillusioned, and wanting to improve my facilitation so that people would learn more about using TDD to actually solve problems.

At the final retrospective of the day, everyone seemed to be very positive about the event, and most people said they learnt more about pair programming, TDD, and the language and tools they were working with. We all had fun. (If you read Swedish, Peter Lind wrote up the retrospective in Valtech's blog) This is great, but could we tweak the format to encourage even more learning?

I think to solve Conway's Game of Life adequately, you need to find a good datastructure to represent the cells, and an efficient algorithm to "tick" to the next generation. Just having well named classes and methods, although a good idea, probably won't be enough.

For me, TDD is a good method for designing code when you already mostly know what you're doing. I don't think it's a good method for discovering algorithms. I'm reminded of the debate a few years back when Peter Norvig posted an excellent Sudoku solver (here) and people thought it was much better than Ron Jeffries' TDD solution to the same problem (here). Peter was later interviewed about whether this proved that TDD was not useful. Dr Norvig said he didn't think that it proved much about TDD at all, since

"you can test all you want and if you don’t know how to approach the problem, you’re not going to get a solution"
(from Peter Siebel's book "Coders At Work", an extract of which is available in his blog)

I felt that most of the coders at the code retreat were messing around without actually knowing how to solve the problem. They played with syntax and names and styles of tests without ever writing enough code to tell whether their tests were driving a design that would ultimately solve the problem.

Following the GDCR, I held a coding dojo where I experimented with the format a little. I only had time to get the participants do two 45 minute coding sessions on the Game of Life problem. At the start, as a group we discussed the Game of Life problem, much as Corey recommends for a Code Retreat introduction. However, in addition, I explained my favoured approach to a solution - the data structure and algorithm I like to use. I immediately saw that they started their TDD sessions with tests and classes that could lead somewhere. I feel that if they had continued coding for longer, they should have ended up with a decent solution. Hopefully it would also have been possible to refactor it from one datastructure to another, and to switch elements of the algorithm without breaking most of the tests.

I think this is one way to improve the code retreat. Just give people more clues at the outset as to how to tackle the problem. In real life when you sit down to do TDD you'll often already know how to solve similar problems. This would be a way for the facilitator to give everyone a leg-up if they havn't worked on any rule-based games involving a 2D infinite grid before.

Rather than just explaining a good solution up front, we could spend the first session doing a "Spike Solution". This is one of the practices of XP:

"the idea is just to drive through the entire problem in one blow, not to craft the perfect solution first time out."
From "Extreme Programming Installed" by Jeffries, Anderson, Hendrickson, p41

Basically, you hack around without writing any tests or polishing your design, until you understand the problem well enough to know how you plan to solve it. Then you throw your spike code away, and start over with TDD.

Spending the first 45 minute session spiking would enable people to learn more about the problem space in a shorter amount of time than you generally do with TDD. By dropping the requirement to write tests or good names or avoid code smells, you could hopefully hack together more alternatives, and maybe find out for yourself what would make a good data structure for easily enumerating neighbouring cells.

So the next time I run a code retreat, I think I'll start with a "spiking" session and encourage people to just optimize for learning about the problem, not beautiful code. Then after that maybe I'll sketch out my favourite algorithm, in case they didn't come up with anything by themselves that they want to pursue. Then in the second session we could start with TDD in earnest.

I always say that the code you end up with from doing a Kata is not half as interesting as the route you took to get there. To this end I've published a screencast of myself doing the Game of Life Kata in Python. (The code I end up with is also published, here). I'm hoping that people might want to prepare for a Code Retreat by watching it. It could be an approach they try to emulate, improve on or criticise. On the other hand, showing my best solution like this is probably breaking the neutrality of how a code retreat facilitator is supposed to behave. I don't think Corey has ever published a solution to this kata, and there's probably a reason for that.

I have to confess, I already broke all the rules of being a facilitator, when I spent the last session of the Global Day of Code Retreat actually coding. There was someone there who really wanted to do Python, and no-one else seemed to want to pair with him. So I succombed. In 45 minutes we very nearly had a working solution, mostly because I had hacked around and practiced how to do it several times before. It felt good. I think more people should experience the satisfaction of actually completing a useful piece of code using TDD at a Code Retreat.

Scandinavian Developer Conference

Scandinavian Developer Conference will be held in Göteborg in April, and last week we launched the detailed programme on the website. I've been involved with the conference ever since the first event in 2009, but this year I've taken on increased responsibilities, acting as Programme Chair.

When P-A Freiholtz, the Managing Director at Apper Systems*, (the company behind the conference), approached me about taking this role, I jumped at the chance. My business is all about professional development for software developers, and a conference in Göteborg is a really good opportunity for a lot of local people to hear about what's going on in the world outside. Many of the developers I meet in my work don't often get away from their desks, spending the majority of their time caught up in the intrigues and deadlines of a single project and company environment. SDC2012 will be a chance for a lot of local people to broaden their horizons without it costing their boss a fortune, or disrupting their usual schedule too much.

I wrote an editorial on the front page of the conference website which explains more about what's on the programme and who should come. Everyone involved in software development, basically :-)

These days, we compete in a global marketplace for software development. I'm hopeful about the future, I'm enthusiastic about Swedish working culture, and I think Scandinavian Developer Conference is helping Göteborg grow as a center of competence. Please do join us at the conference.

* formerly named Iptor - now under new ownership.