coding is like cooking: code coverage and tests

At GothPy yesterday, Geoff talked about code coverage and tests. Geoff has spent a lot of his evenings lately working on PyUseCase and getting its test coverage (statement coverage) up to 100%, a feat he achieved last week. The evidence is available for all to see on the texttest site (which is updated daily, so if it is not green and 100% on the day you read this post, then clearly Geoff had a bad day yesterday).

I have limited experience of using coverage statistics to evaluate my tests, so it was interesting to hear Geoff summarize his findings. He thought it had been well worth the effort to get coverage to 100%: he'd found some bugs and some dead code, and improved his design along the way. Actually, saying he has 100% coverage is a statement that needs qualification. The tool he's been using, coverage.py, has a feature whereby you can mark lines of code as # pragma: no cover, i.e. "I don't want this line counted for coverage purposes". He has marked 37 of 3242 lines like this.

The reason for excluding these lines is mostly practical. Due to the nature of PyUseCase, you can't test it automatically when it is in "interactive" mode without physically pressing the buttons yourself, so automated tests for that part are impossible. Some excluded lines are for error cases which should never occur, but for which it would be useful to have a good error message if they ever did.
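To make the marker concrete, here is a minimal sketch of what such an exclusion might look like. The function parse_port and its error message are invented for illustration and are not taken from PyUseCase; only the # pragma: no cover comment itself is coverage.py's convention. Putting the comment on the if line asks coverage.py to leave the whole block under it out of the count.

```python
def parse_port(text):
    """Convert a port number given as text into an int (hypothetical example)."""
    port = int(text)
    if port < 0:  # pragma: no cover
        # Should never happen for our callers, but a clear message
        # is worth having if it ever does.
        raise ValueError("port must be non-negative, got %r" % text)
    return port


print(parse_port("8080"))
```

The error branch stays in the code and still runs if the impossible happens; it is simply not counted against the coverage figure.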

Overall, Geoff thinks coverage is very useful for helping you identify:

poorly tested areas of your code
mistakes in your tests
dead code
refactoring opportunities

The first one is obvious, but the others might take more explanation. Generally, each test is for a specific feature. If you think you have a test for a feature, but the code coverage shows that the implementation of that feature is not covered, then there is probably a mistake in your test.
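A small sketch of how a mistaken test can show up as a coverage gap. Everything here is invented for illustration: shipping_cost, the test, and the helper executed_lines, which is a toy line tracer built on the standard library's sys.settrace and only stands in for a real coverage tool.

```python
import sys


def shipping_cost(weight_kg, express=False):
    if express:
        return 10 + 2 * weight_kg
    return 5 + weight_kg


def test_express_shipping():
    # Meant to exercise express shipping, but the express flag was
    # forgotten, so the test passes without touching that branch.
    assert shipping_cost(3) > 0


def executed_lines(test, target):
    """Run test() and return the line offsets (counted from the def line)
    that were executed inside target."""
    code = target.__code__
    hits = set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            hits.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(None)
    return hits


hits = executed_lines(test_express_shipping, shipping_cost)
# Offset 2 is the express branch; it is missing from the executed lines.
print(sorted(hits))
```

The test is green, yet the line that implements the feature it is named after never ran, which is the clue that the test itself is wrong.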

Similarly, if your tests cover all your features and some code is still not covered, maybe that code is not important at all and could safely be removed. Geoff's tests are not unit tests; they test the whole of PyUseCase, and that may make a difference on this particular point. If I just had unit tests and a piece of code wasn't covered, I'm not sure I could as easily infer that it wasn't needed as part of a larger feature.

Refactoring opportunities can be identified from gaps in coverage too. The idea is that poor test coverage is a clue that the code has other problems as well. Perhaps you find that two pieces of code are similar, and one copy has a gap in coverage. This could indicate that they originate from copy-paste programming and could be combined into one routine with full test coverage.
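As a sketch of the copy-paste case, suppose two near-identical functions exist and the tests only reach the empty-name branch in one of them. All names below are hypothetical; the point is only that merging the copies leaves a single routine for the tests to cover.

```python
# Before: two near-copies. Imagine the tests hit the "not name" branch
# in format_customer but never in format_supplier.
def format_customer(name):
    if not name:
        return "Customer: (unknown)"
    return "Customer: " + name.strip()


def format_supplier(name):
    if not name:
        return "Supplier: (unknown)"
    return "Supplier: " + name.strip()


# After: one routine, so every branch is covered by whichever
# caller's tests exercise it.
def format_party(role, name):
    if not name:
        return role + ": (unknown)"
    return role + ": " + name.strip()


print(format_party("Supplier", ""))
print(format_party("Customer", " Ann "))
```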

Geoff had some tips for people who wanted to use coverage statistics to improve their tests.

Don’t design your tests around coverage. Write appropriate tests, and then measure coverage.
This applies even when working with coverage results. See the coverage report as containing clues for new tests, not commands.
Use “# pragma: no cover” in your code to be explicit about code that you decide not to try and cover. Review these periodically.
Don’t be fanatical about absolute numbers. Commands like “Aim for at least 85% coverage” are counterproductive. (You get what you measure).
It’s always good to increase feasible coverage. It’s sometimes better to spend your limited time on other things. But if you don’t measure, you can’t make that decision effectively.
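One reason not to fixate on the absolute number is that it depends on what you choose to count. A rough sketch using the figures quoted earlier in this post, under the assumption (mine, not Geoff's) that the 3242 total includes the 37 excluded lines; coverage_percent is a hypothetical helper, not part of coverage.py.

```python
def coverage_percent(covered, total, excluded=0):
    """Statement coverage as a percentage of the lines meant to be covered."""
    measurable = total - excluded
    if measurable <= 0:
        raise ValueError("no measurable lines")
    return 100.0 * covered / measurable


# Assumption: 3242 is the overall line count, 37 of which are excluded.
total, excluded = 3242, 37
covered = total - excluded

with_exclusions = coverage_percent(covered, total, excluded)
without_exclusions = coverage_percent(covered, total)

print(with_exclusions)               # 100.0
print(round(without_exclusions, 2))  # 98.86
```

The same code and the same tests give either figure depending on the exclusions, which is why the list of excluded lines deserves a periodic review of its own.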

These last points are mostly also made in an article by Brian Marick which is quite old (1997). Geoff found the article when he was researching the talk for GothPy, and thought it was very good, and fits his experience.

Inspired by Geoff's talk, I spent some time today trying to get some coverage numbers for the code and tests I'm working on at present. Unfortunately, it turned out to be a bit tricky to get the coverage tool to work. It's not Python, of course, and that may have something to do with it. Hopefully I'll sort it out and be able to write a new blog post about my own experiences with coverage statistics sometime soon.

2 comments:

Andrew Dalke said... (6 February 2010 at 03:21):

I like Marick's statement "coverage tools don't give commands, they give clues".

BTW, I think it was interesting that the test coverage of the kata solutions revealed a missing test case.

Emily Bache said... (7 February 2010 at 19:56):

Yes, coverage can reveal missing test cases but not missing production code :-)

(I didn't mention in my post, but we spent some time after Geoff's talk tackling KataPokerHands and measuring coverage, which ended up being 100%, apart from one pair which missed one line of branch coverage.)

Friday, 5 February 2010
