Last week at Scottish Ruby Conference I chatted with Brian Marick about software design. The week before that, he had been in Göteborg for Scandinavian Developer Conference, and had spent a morning pair programming with Geoff on TextTest. I took the chance to ask Brian what he thought about text-based testing now that he had seen it in action.
Brian's view seems to be that text-based testing may be effective as a testing technique, but it just doesn't offer the design benefits you get with standard TDD. It doesn't give you guidance about small-scale design decisions, or entice you to structure your code into really small methods and classes. He thought the TextTest codebase wasn't bad, but that the methods were larger than he would prefer, some classes were doing too much, and some ideas were not expressed clearly.
I've previously read about this design ethic of having really really small classes and methods in Bob Martin's book "Clean Code". Bob recommends one or two line methods. In contrast, I believe Steve McConnell's "Code Complete" advocates methods small enough to fit comfortably on one screen. Geoff's design for TextTest seems to land at about 5 or 6 lines for a typical method, which is somewhere in between.
Brian said the main drawback of code structured largely into small classes and one- and two-line methods is that it is harder for people who are unfamiliar with the codebase to get to grips with it. You get lost in the trees, and can't easily get an overview of the forest. The big benefit is that people who are familiar with the code can potentially make sweeping improvements through very small, localized changes.
I've had the opportunity to work on this kind of codebase recently, and my experience hasn't been entirely positive. Just as Brian predicted, I've found it hard to get into the code and grasp what it is doing. All the methods are one or two lines, and call each other. My pair and I ended up creating a temporary file where we pasted a whole call chain of about 6 methods so we could see them all at once, and read them in the order they called each other. There were 3 defects hidden in those 15 or so lines of code, and it took us a day or so to identify and fix them all. Yet the code had been TDD'd, and the methods were well named. On the surface it looked very good. It actually took me some time to convince myself that we genuinely had found a defect, and that we hadn't just misunderstood what the code was supposed to do.
The trouble seemed to stem from the fact that almost all the tests were for the "happy path" and there were several edge cases they never considered. Also, in one case the test had stubbed out the answer that one of the lower-level methods would return, and provided an answer the real code never gave. It took a long time for us to add missing tests for edge cases, and localize the defects to particular methods in the long call chain.
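To make the stubbing problem concrete, here is a minimal Python sketch. It is not the code we were working on: the `Order` class, its methods, and the blank-input case are all invented for illustration. The stub answers `0` for blank input, which the real `parse_quantity` never does (it returns `None`), so the stubbed check passes while the real call chain fails.

```python
from unittest import mock


class Order:
    """Hypothetical example class, invented for this sketch."""

    def __init__(self, unit_price):
        self.unit_price = unit_price

    def parse_quantity(self, text):
        # Real behaviour: blank input yields None, never 0.
        text = text.strip()
        return int(text) if text else None

    def total(self, text):
        return self.unit_price * self.parse_quantity(text)


def stubbed_total_for_blank_input():
    # The stub supplies 0 for blank input, an answer the real
    # parse_quantity never gives, so this path looks healthy.
    order = Order(unit_price=5)
    with mock.patch.object(Order, "parse_quantity", return_value=0):
        return order.total("")


def real_total_for_blank_input():
    # Without the stub the same call fails: 5 * None raises TypeError.
    order = Order(unit_price=5)
    try:
        return order.total("")
    except TypeError as error:
        return error
```

A test built on `stubbed_total_for_blank_input` stays green, and only exercising the real lower-level method exposes the mismatch.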
I'm very interested in whether we would have found it easier to find the defects if the code had been structured as two or three 5-line methods instead of 6 one- and two-line methods. I'm also interested in whether we would have found the issues more easily if the code had been built with text-based testing, with fewer, more coarse-grained tests, and a few log statements printing key intermediate values. I'm considering getting the original versions of the files out of git and refactoring them to see how else they could have looked.
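As a rough idea of what that alternative might look like, here is a small Python sketch. It does not use TextTest itself or its file layout, and the `price_order` function and its pricing rules are made up for illustration. One coarser method logs its key intermediate values, and a single test compares the whole log against a previously approved text.

```python
import difflib
import io
import logging


def price_order(quantity, unit_price, discount_percent, log):
    # One coarser method that logs its key intermediate values.
    subtotal = quantity * unit_price
    log.info("subtotal=%s", subtotal)
    discount = subtotal * discount_percent // 100
    log.info("discount=%s", discount)
    total = subtotal - discount
    log.info("total=%s", total)
    return total


def run_and_capture(quantity, unit_price, discount_percent):
    # Run the code and collect its log output as plain text.
    stream = io.StringIO()
    log = logging.getLogger("price_order_sketch")
    log.setLevel(logging.INFO)
    log.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    log.addHandler(handler)
    try:
        price_order(quantity, unit_price, discount_percent, log)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()


def diff_against_approved(actual, approved):
    # An empty list means the run matches the approved text.
    return list(difflib.unified_diff(
        approved.splitlines(), actual.splitlines(),
        "approved", "actual", lineterm=""))
```

When an intermediate value goes wrong, the diff points at the first line that changed, which may help localize a defect without a separate test per method.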
I don't want you to conclude that I am against building designs with really small methods, or TDD, or using stubs or anything like that. I think there is value in all these techniques, each can be done well or badly, and you make tradeoffs when you choose your approach. You still need design skills and testing skills, whether you're doing TDD or text-based testing. If Brian Marick had built TextTest the design might well have turned out differently. I don't know how much of that would have been because of his use of TDD, and how much because of his skill, and views on design.
I'm actually relishing the prospect of working on more code with very small methods, and using TDD to build on it. I've got loads to learn about software design and testing :-)
I think my methods average longer than Mr. Martin's. How do the ones in http://github.com/marick/critter4us compare to Geoff's?