Some thoughts on naming tests by Mark Seemann
What is the purpose of a test name?
Years ago I was participating in a coding event where we did katas. My pairing partner and I were doing silent ping pong. Ping-pong style pair programming is when one programmer writes a test and passes the keyboard to the partner, who writes enough code to pass the test. He or she then writes a new test and passes control back to the first person. In the silent variety, you're not allowed to talk. This is an exercise in communicating via code.
My partner wrote a test and I made it pass. After the exercise was over, we were allowed to talk to evaluate how it went, and my partner remarked that he'd been surprised that I'd implemented the opposite behaviour of what he'd intended. (It was something where there was a fork in the logic depending on a number being less than or greater than zero; I don't recall the exact details.)
We looked at the test that he had written, and sure enough: He'd named the test by clearly indicating one behaviour, but then he'd written an assertion that looked for the opposite behaviour.
I hadn't even noticed.
I didn't read the test name. I only considered the test body, because that's the executable specification.
How tests are named
I've been thinking about test names ever since. What is the role of a test name?
In some languages, you write unit tests as methods or functions. That's how you do it in C#, Java, and many other languages:
    [Theory]
    [InlineData("Home")]
    [InlineData("Calendar")]
    [InlineData("Reservations")]
    public void WithControllerHandlesSuffix(string name)
    {
        var sut = new UrlBuilder();

        var actual = sut.WithController(name + "Controller");

        var expected = sut.WithController(name);
        Assert.Equal(expected, actual);
    }
Usually, when we define new class methods, we've learned that naming is important. Surely, this applies to test methods, too?
Yet, other languages don't use class methods to define tests. The most common JavaScript frameworks don't, and neither does Haskell HUnit. Instead, tests are simply values with labels.
This hints at something that may be important.
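To make the idea of tests as labelled values concrete, here is a minimal, framework-free sketch in TypeScript. It does not show how any particular JavaScript framework or HUnit is implemented. The `Test` type, the `withController` function, and the `assertEqual` helper are hypothetical names invented for illustration, with `withController` only loosely mimicking the UrlBuilder example above.

```typescript
// Hypothetical model: a test is just a value that pairs a label with a body.
type Test = { label: string; run: () => void };

// Hypothetical system under test (an assumption, not the real UrlBuilder):
// strips a trailing "Controller" suffix if there is one.
function withController(name: string): string {
  return name.replace(/Controller$/, "");
}

// Hypothetical assertion helper: throws when the two values differ.
function assertEqual<T>(expected: T, actual: T): void {
  if (expected !== actual) {
    throw new Error(`Expected ${String(expected)}, but got ${String(actual)}`);
  }
}

// The tests are ordinary array elements. The label is an arbitrary string,
// not a method name that has to follow identifier rules.
const tests: Test[] = ["Home", "Calendar", "Reservations"].map((name) => ({
  label: `WithController handles suffix (${name})`,
  run: () =>
    assertEqual(withController(name), withController(name + "Controller")),
}));
```

Because the tests are plain values, a runner could simply iterate over `tests` and call `run` on each. Nothing about the mechanism requires the label to be unique, or even informative.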
The role of test names
If tests aren't necessarily class methods, then what role do names play?
Usually, when considering method names, it's important to provide a descriptive name in order to help client developers. A client developer writing calling code must figure out which methods to call on an object. Good names help with that.
Automated tests, on the other hand, have no explicit callers. There's no client developer to communicate with. Instead, a test framework such as xUnit.net scans the public API of a test suite and automatically finds the test methods to execute.
The most prominent motivation for writing good method names doesn't apply here. We must reevaluate the role of test names, also keeping in mind that with some frameworks, in some languages, tests aren't even methods.
Mere anarchy is loosed upon the world
The story that introduces this article has a point. When considering a test, I tend to go straight to the test body. I only read the test name if I find the test body unclear.
Does this mean that the test name is irrelevant? Should we simply number the tests: Test1, Test212, and so on?
That hardly seems like a good idea - not even to a person like me who considers the test name secondary to the test definition.
This begs the question, though: If Test42 isn't a good name, then what does a good test name look like?
Naming schemes
Various people suggest naming schemes. In the .NET world many people like Roy Osherove's naming standard for unit tests: [UnitOfWork_StateUnderTest_ExpectedBehavior]. I find it too verbose for my tastes, but my point isn't to attack this particular naming scheme. In my Types + Properties = Software article series, I experimented with using a poor man's version of Given When Then:
    [<Property>]
    let ``Given deuce when player wins then score is correct``
        (winner : Player) =

        let actual : Score = scoreWhenDeuce winner

        let expected = Advantage winner
        expected =! actual
It was a worthwhile experiment, but I don't think I ever used that style again. After all, Given When Then is just another way of saying Arrange Act Assert, and I already organise my test code according to the AAA pattern.
These days, I don't follow any particular naming scheme, but I do keep a guiding principle in mind.
Information channel
A test name, whether it's a method name or a label, is an opportunity to communicate with the reader of the code. You can communicate via code, via names, via comments, and so on. A test name is more like a mandatory comment than a normal method name.
Books like Clean Code make a compelling case that comments should be secondary to good names. The point isn't that all comments are bad, but that some are:
var z = x + y; // Add x and y
It's rarely a good idea to add a comment that describes what the code does. This should already be clear from the code itself.
A comment can still provide important information that code can't easily convey. It may explain the purpose of the code. I try to take this into account when naming tests: not to repeat what the code does, but to hint at its raison d'être.
I try to strike a balance between Test2112 and Given deuce when player wins then score is correct. I view the task of naming tests as equivalent to producing section headings in an article like this one. They offer a hint at the kind of information that might be available in the section (The role of test names, How tests are named, or Information channel), but sometimes they're more tongue-in-cheek than helpful (Mere anarchy is loosed upon the world). I tend to name tests with a similar degree of precision (or lack thereof): HomeReturnsJson, NoHackingOfUrlsAllowed, GetPreviousYear, etcetera.
These names, in isolation, hardly tell you what the tests are about. I'm okay with that, because I don't think that they have to.
What do you use test names for?
I occasionally discuss this question with other people. It seems to me that it's one of the topics where Socratic questioning breaks down:
Them: How do you name tests?
Me: I try to strike a balance between information and not repeating myself.
Them: How do you like this particular naming scheme?
Me: It looks verbose to me. It seems to be repeating what's already in the test code.
Them: I like to read the test name to see what the test does.
Me: If the name and test code disagree, which one is right?
Them: The test name should follow the naming scheme.
Me: Why do you find that important?
Them: It's got... electrolytes.
Okay, I admit that I'm being uncharitable, but the point that I'm after is that test names are different, yet most people seem to reflect little on this.
When do you read test names?
Personally, I rarely read or otherwise use test names. When I'm writing a test, I also write the name, but at that point I don't really need the name. Sometimes I start with a placeholder name (Foo), write the test, and change the name once I understand what the test does.
Once a test is written, ideally it should just be sitting there as a regression test. The less you touch it, the better you can trust it.
You may have hundreds or thousands of tests. When you run your test suite, you care about the outcome. Did it pass or fail? The outcome is the result of a Boolean AND operation. The test suite only passes when all tests pass, but you don't have to look at each test result. The aggregate result is enough as long as the test suite passes.
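As a small illustration of that aggregation, the following TypeScript sketch models test outcomes as data. The `TestResult` type and the `suitePassed` and `failures` functions are hypothetical and not part of any test framework.

```typescript
// Hypothetical record of a single test outcome.
type TestResult = { name: string; passed: boolean };

// The suite passes only if every test passed: a Boolean AND over all outcomes.
function suitePassed(results: TestResult[]): boolean {
  return results.every((r) => r.passed);
}

// Individual results only matter for the tests that failed.
function failures(results: TestResult[]): TestResult[] {
  return results.filter((r) => !r.passed);
}

// Made-up sample data, using deliberately uninformative names.
const results: TestResult[] = [
  { name: "Test1", passed: true },
  { name: "Test212", passed: true },
  { name: "Test1337", passed: false },
];
```

With this sample data, `suitePassed(results)` is `false` because of a single failing test, and `failures(results)` contains only `Test1337`. The names play no part in computing the verdict.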
You only need to look at a test when it fails. When this happens, most tools enable you to go straight to the failing test by clicking on it. (And if this isn't possible, I usually find it easier to navigate to the failing test either by line number or by copying the test name and navigating to it by pasting the name into my editor's navigation UI.) You don't really need the name to find a failing test. If the test was named Test1337 it would be as easy to find as if it was named Given deuce when player wins then score is correct.
Once I look at a failing test, I start by looking at the test code and comparing that to the assertion message.
Usually, when a test fails, it breaks for a reason. A code change caused the test to fail. Often, the offending change was one you did ten seconds earlier. Armed with an assertion message and the test code, I usually understand the problem right away.
In rare cases the test is one that I've never seen before, and I'm confused about its purpose. This is when I read the test name. At that point, I appreciate if the name is helpful.
Conclusion
I'm puzzled that people are so passionate about test names. I consider them the least important part of a test. A name isn't irrelevant, but I find the test code more important. The code is an executable specification. It expresses the desired truth about a system.
Test code is code that has the same lifetime as the production code. It pays to structure it as well as the production code. If a test is well-written, you should be able to understand it without reading its name.
That's an ideal, and in reality we are fallible. Thus, providing a helpful name gives the reader a second chance to understand a test. The name shouldn't, however, be your first priority.
Comments
I often want to run selected tests from the command line and thus use the test runner's ability to filter all available tests. Where the set of tests I want to run is all the tests below some point in the hierarchy of tests I can filter by the common prefix, or the test class name.
But I also often find myself wanting to run a set of tests that meet some functional criteria, e.g. Validation approval tests, or All the tests for a particular feature across all the levels of the code base. In this case if the tests follow a naming convention where such test attributes are included in the test name, either via the method or class name, then such test filtering is possible.
Mark, are you a Classicist or a Mockist? I'm going to go out on a limb here and say you're probably a classicist. Test code written in a classicist style probably conveys the intent well already. I think code written in a Mockist style may not convey the intent as well, hence the test name (or a comment) becomes more useful to convey that information.
There are (at least) two ways of using test names (as well as test module names, as suggested by Struan Judd) that we make extensive use of in the LinkedIn code base and which I have used in every code base I have ever written tests for:
To indicate the intent of the test. It is well and good to say that the assertions should convey the conditions, but often it is not clear why a condition is intended to hold. Test names (and descriptive strings on the assertions) can go a very long way, especially when working in a large and/or unfamiliar code base, to understand whether the assertion remains relevant, or how it is relevant.
Now, granted: it is quite possible for those to get out of date, much as comments do. However, just as good comments remain valuable even though there is a risk of stale comments, good test names can be valuable even though they can also become stale.
The key, for me, is exactly the same as good comments—and you could argue that comments therefore obviate the need for test names. If we only cared about tests from the POV of reading the code, I would actually agree! However, because we often read the tests as a suite of assertions presented in some other UI (a terminal, a web view, etc.), the names and assertion descriptions themselves serve as the explanation when reading.
To provide structure and organization to the test suite. This is the same point Struan Judd was getting at: having useful test names lets you filter down to relevant chunks of the suite easily. This is valuable even on a small code base (like the Maybe and Result library in TypeScript a friend and I maintain), but it becomes invaluable when you have tens of thousands of tests to filter or search through, as in the main LinkedIn app!
For that reason, we (and the Ember.js community more broadly) make extensive use of QUnit's module() hook to name the set of modules under test (module('Rendering | SomeComponent', function () { ... } or module('Unit | some-utility', function () { ... }) as well as naming test() (test('returns `null` if condition X does not hold', function (assert) { ... }) and indeed providing descriptive strings for assert() calls. We might even nest module() calls to make it easy to see and filter from how our test UI presents things: Rendering | SomeComponent > someMethod > a test description.
Now, how that plays out varies library to library. The aforementioned TS library just names the test with a decent description of what is under test (here, for example) as well as grouping them sensibly with overarching descriptions, and never uses assertion descriptions because they wouldn’t add anything. A couple of the libraries I wrote internally at LinkedIn, by contrast, make extensive use of both. It is, as usual, a tool to be employed as, and only as, it is useful. But it is indeed quite useful sometimes!
Struan, thank you for writing. I can't say that I've given much thought to the need to run subsets of a test suite. You have a point, though, that if that's a requirement, you need something on which to filter.
Is the name the appropriate criterion for that, though? It sounds brittle to me, but I grant that it depends on which alternatives are available. In xUnit.net, for example, you can use the [Trait] attribute to annotate tests with arbitrary metadata. I think that NUnit has a similar feature, but there's no guarantee that every unit testing framework on any platform or language supports such a feature.
Whenever a framework supports such metadata-based filtering, I'd favour relying on that instead of naming conventions. Naming conventions are vulnerable to misspellings and other programmer errors. That may also be true of metadata-based categorisation, but hopefully to a lesser degree, as these might enable you to use ordinary language features to keep the categories DRY.
Using names also sounds restrictive to me. Doesn't this mean that you have to be able to predict your filtering requirements when you decide on a naming scheme?
What if, later, you find that you need to filter on a different dimension? With metadata annotations, you should be able to add a new category to the affected tests, but how will you do that with an established naming scheme?
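The difference between the two approaches can be sketched in a few lines of TypeScript. This is a hypothetical model rather than the API of xUnit.net, NUnit, or any other framework: `TaggedTest`, `Category`, `byNamePrefix`, and `byTag` are names made up for this example.

```typescript
// Hypothetical model of a test that carries metadata next to its name.
type TaggedTest = { name: string; tags: readonly string[] };

// Categories are defined once, so a misspelled category is a compile error
// instead of a silently empty filter result.
const Category = { Validation: "Validation", Approval: "Approval" } as const;
type CategoryName = (typeof Category)[keyof typeof Category];

// Name-based filter: only finds tests that follow the naming convention.
const byNamePrefix = (prefix: string) => (t: TaggedTest): boolean =>
  t.name.indexOf(prefix) === 0;

// Metadata-based filter: independent of how the test happens to be named.
const byTag = (tag: CategoryName) => (t: TaggedTest): boolean =>
  t.tags.some((x) => x === tag);

// Made-up sample suite. The second test was tagged later with an extra
// category without being renamed.
const suite: TaggedTest[] = [
  { name: "Validation_RejectsEmptyName", tags: [Category.Validation] },
  { name: "RejectsNegativeQuantity", tags: [Category.Validation, Category.Approval] },
  { name: "HomeReturnsJson", tags: [] },
];
```

In this sample, `suite.filter(byNamePrefix("Validation"))` finds only the one test that follows the naming convention, while `suite.filter(byTag(Category.Validation))` finds both validation tests. Adding the `Approval` dimension required no renaming.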
Overall, though, the reason that I haven't given this much thought is that I've never had the need to filter tests in arbitrary ways. You must be doing something different from how I work with tests. Why do you need to filter tests?
Eddie, thank you for writing. I don't find the linked article illuminating if one hasn't already heard about the terms mockist and classicist. I rather prefer the terms interaction-based and state-based testing. In any case, I started out doing interaction-based testing, but have since moved away from that. Even when I mainly wrote interaction-based tests, though, I didn't like rigid naming schemes. I don't see how that makes much of a difference.
I agree that a test name is a fine opportunity to convey intent. Did that not come across in the article?
Chris, thank you for writing. As I also responded to Eddie Stanley, I agree that a test name is a fine opportunity to convey intent. Did that not come across in the article?
To your second point, I'll refer you to my answer to Struan Judd. I'm still curious to learn why you find it necessary to categorise and filter tests.