Test-specific Eq by Mark Seemann
Adding Eq instances for better assertions.
Most well-written unit tests follow some variation of the Arrange Act Assert pattern. In the Assert phase, you may write a sequence of assertions that verify different aspects of what 'success' means. Even so, it boils down to this: You check that the expected outcome is equal to the actual outcome. Some testing frameworks like to turn the order around, but the idea remains the same. After all, equality is symmetric.
The ideal assertion is one that simply checks that actual is equal to expected. Some languages allow custom infix operators, in which case it's natural to define this fundamental assertion as an operator, such as @?=.
Since this is Haskell, however, the @?= operator comes with type constraints. Specifically, what we compare must be an Eq instance. In other words, the type in question must support the == operator. What do you do when a type is no Eq instance?
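To make the constraint concrete, here is a minimal sketch of how an equality assertion operator could be defined. The operator name `@==` and the `Either`-based result are hypothetical stand-ins chosen for illustration; this is not the real operator's implementation.

```haskell
-- Hypothetical stand-in for an equality assertion operator. The name (@==)
-- and the Either-based result are illustrative assumptions only.
infix 1 @==

(@==) :: (Eq a, Show a) => a -> a -> Either String ()
actual @== expected
  | actual == expected = Right ()
  | otherwise =
      Left ("expected: " ++ show expected ++ "\n but got: " ++ show actual)

-- Compiles, because Int is both an Eq and a Show instance.
passes :: Either String ()
passes = (2 + 2 :: Int) @== 4

-- Would not compile, because functions have no Eq instance:
-- broken = (+ 1) @== (+ 1)
```

The `Eq` constraint is what lets the operator call `==`, and the `Show` constraint is what lets it print a useful failure message. A type that lacks either instance can't be used with such an operator.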
No Eq #
In a recent article you saw how a complicated test induced a tautological assertion. The main reason that the test was complicated was that the values involved were not Eq instances.
This got me thinking: Might test-specific equality help?
The easiest way to find out is to try. In this article, you'll see how that experiment turns out. First, however, you need a quick introduction to the problem space. The task at hand was to implement a cellular automaton, ostensibly modelling Galápagos finches meeting. When two finches encounter each other, they play out a game of Prisoner's Dilemma according to a strategy implemented in a domain-specific language.
Specifically, a finch is modelled like this:
data Finch = Finch
  { finchID :: Int,
    finchHP :: HP,
    finchRoundsLeft :: Rounds,
    -- The colour is used for visualisation, but has no semantic significance.
    finchColour :: Colour,
    -- The current strategy.
    finchStrategy :: Strategy,
    -- The expression that is evaluated to produce the strategy.
    finchStrategyExp :: Exp
  }
The Finch data type is not an Eq instance. The reason is that Strategy is effectively a free monad over this functor:
data EvalOp a
  = ErrorOp Error
  | MeetOp (FinchID -> a)
  | GroomOp (Bool -> a)
  | IgnoreOp (Bool -> a)
Since EvalOp is a sum of functions, it can't be an Eq instance, and this then applies transitively to Finch, as well as the CellState container that keeps track of each cell in the cellular grid:
data CellState = CellState { cellFinch :: Maybe Finch, cellRNG :: StdGen }
An important part of working with this particular code base is that the API is given, and must not be changed.
Given these constraints and data types, is there a way to improve test assertions?
Smelly tests #
The lack of Eq instances makes it difficult to write simple assertions. The worst test I wrote is probably this, making use of a predefined example Finch value named flipflop:
testCase "Cell 1 reproduces" $
  let cell1 = Galapagos.CellState (Just flipflop) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 6) -- seeded to reprod
      actual = Galapagos.reproduce Galapagos.defaultParams (cell1, cell2)
  in do
    -- Sanity check on first finch. Unfortunately, CellState is no Eq
    -- instance, so we can't just compare the entire record. Instead,
    -- using HP as a sample:
    (Galapagos.finchHP <$> Galapagos.cellFinch (fst actual)) @?= Just 20
    -- New finch should have HP from params:
    (Galapagos.finchHP <$> Galapagos.cellFinch (snd actual)) @?= Just 14
    -- New finch should have lifespan from params:
    (Galapagos.finchRoundsLeft <$> Galapagos.cellFinch (snd actual))
      @?= Just 23
    -- New finch should have same colour as parent:
    ( Galapagos.finchColour <$> Galapagos.cellFinch (snd actual)) @?=
      Galapagos.finchColour <$> Galapagos.cellFinch cell1
    -- More assertions, described by their error messages:
    ( (Galapagos.finchID <$> Galapagos.cellFinch (fst actual)) /=
      (Galapagos.finchID <$> Galapagos.cellFinch (snd actual))) @?
      "Finches have same ID, but they should be different."

    ((/=) `on` Galapagos.cellRNG) cell2 (snd actual) @?
      "New cell 2 should have an updated RNG."
As you can tell from the apologies, all these assertions leave something to be desired. The first assertion uses finchHP as a proxy for the entire finch in cell1, which is not supposed to change. Instead of an assertion for each of the first finch's attributes, the test 'hopes' that if finchHP didn't change, then neither did the other values.
The test then proceeds to verify various fields of the new finch in cell2, checking them one by one, since the lack of Eq makes it impossible to simply check that the actual value is equal to the expected value.
In comparison, the test you saw in the previous article is almost pretty. It uses another example Finch value named cheater.
testCase "Cell 1 does not reproduce" $
  let cell1 = Galapagos.CellState (Just cheater) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 1) -- seeded: no repr.
      actual = Galapagos.reproduce Galapagos.defaultParams (cell1, cell2)
  in do
    -- Sanity check that cell 1 remains, sampling on strategy:
    ( Galapagos.finchStrategyExp <$> Galapagos.cellFinch (fst actual)) @?=
      Galapagos.finchStrategyExp <$> Galapagos.cellFinch cell1
    ( Galapagos.finchHP <$> Galapagos.cellFinch (snd actual)) @?= Nothing
The apparent simplicity is mostly because at that time, I'd almost given up on more thorough testing. In this test, I chose finchStrategyExp as a proxy for each value, and 'hoped' that if these properties behaved as expected, other attributes would, too.
Given that I was following test-driven development and thus engaging in grey-box testing, I had reason to believe that the implementation was correct if the test passed.
Still, those tests exhibit more than one code smell. Could test-specific equality be the answer?
Test utilities for finches #
The fundamental problem is that the finchStrategy field prevents Finch from being an Eq instance. Finding a way to compare Strategy values seems impractical. A more realistic course of action might be to compare all other fields. One option is to introduce a test-specific type with proper Eq and Show instances.
data FinchEq = FinchEq
  { feqID :: Int
  , feqHP :: Galapagos.HP
  , feqRoundsLeft :: Galapagos.Rounds
  , feqColour :: Galapagos.Colour
  , feqStrategyExp :: Exp }
  deriving (Eq, Show)
This data type only exists in the test code base. It has all the fields of Finch, except finchStrategy.
While I could use it as-is, it quickly turns out that helper functions to turn a Finch value, or the optional finch in a CellState value, into a FinchEq value would also be useful.
finchEq :: Galapagos.Finch -> FinchEq
finchEq f = FinchEq
  { feqID = Galapagos.finchID f
  , feqHP = Galapagos.finchHP f
  , feqRoundsLeft = Galapagos.finchRoundsLeft f
  , feqColour = Galapagos.finchColour f
  , feqStrategyExp = Galapagos.finchStrategyExp f }

cellFinchEq :: Galapagos.CellState -> Maybe FinchEq
cellFinchEq = fmap finchEq . Galapagos.cellFinch
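Since the Galapagos module isn't shown here, the following self-contained sketch reproduces the same pattern with hypothetical stand-in types. Bird, BirdEq, and birdEq are invented for illustration; only the shape of the technique is taken from the code above.

```haskell
-- Stand-in for a production type that cannot be an Eq instance, because
-- one of its fields is a function. All names here are hypothetical.
data Bird = Bird
  { birdID :: Int
  , birdHP :: Int
  , birdMove :: Int -> Bool } -- functions have no Eq or Show instance

-- Test-specific mirror of Bird: every field except the function.
data BirdEq = BirdEq
  { beqID :: Int
  , beqHP :: Int }
  deriving (Eq, Show)

-- Projection from the production type to the comparable type.
birdEq :: Bird -> BirdEq
birdEq b = BirdEq { beqID = birdID b, beqHP = birdHP b }

-- Two values that differ only in the function field.
sparrow, wren :: Bird
sparrow = Bird { birdID = 1, birdHP = 20, birdMove = even }
wren = sparrow { birdMove = odd }

-- True: the projections are equal, although the birds behave differently.
blindSpot :: Bool
blindSpot = birdEq sparrow == birdEq wren
```

The last definition shows the trade-off of this kind of projection: whatever field is left out of the test-specific type is invisible to the assertion.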
Finally, the System Under Test (the reproduce function) takes a tuple as input, and returns a tuple of the same type as output. To avoid some code duplication, it's practical to introduce a data type that can map over both components.
newtype Pair a = Pair (a, a) deriving (Eq, Show, Functor)
This newtype wrapper makes it possible to map both the first and the second component of a pair (a two-tuple) using a single projection, since Pair is a Functor instance.
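As a small illustration of the difference, the sketch below contrasts the built-in tuple Functor with Pair. The names tupleMapped and pairMapped are hypothetical, and the DeriveFunctor pragma is an assumption about which language extensions are enabled.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Same definition as in the article; deriving Functor needs the extension
-- above, unless it is already enabled by default.
newtype Pair a = Pair (a, a) deriving (Eq, Show, Functor)

-- The built-in tuple Functor only maps the second component.
tupleMapped :: (Int, Int)
tupleMapped = (+ 1) <$> (1, 2) -- (1, 3)

-- Pair maps both components with a single function.
pairMapped :: Pair Int
pairMapped = (+ 1) <$> Pair (1, 2) -- Pair (2, 3)
```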
That's all the machinery required to rewrite the two tests shown above.
Improving the first test #
The first test may be rewritten as this:
testCase "Cell 1 reproduces" $
  let cell1 = Galapagos.CellState (Just flipflop) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 6) -- seeded to reprod
      actual = Galapagos.reproduce Galapagos.defaultParams (cell1, cell2)

      expected =
        Just $ finchEq $ flipflop
          { Galapagos.finchID = -1142203427417426925 -- From Character. Test
          , Galapagos.finchHP = 14
          , Galapagos.finchRoundsLeft = 23 }
  in do
    (cellFinchEq <$> Pair actual) @?= Pair (cellFinchEq cell1, expected)

    ((/=) `on` Galapagos.cellRNG) cell2 (snd actual) @?
      "New cell 2 should have an updated RNG."
That's still a bit of code. If you're used to C# or Java code, you may not bat an eyelid over a fifteen-line code block (that even has a few blank lines), but fifteen lines of Haskell code is still significant.
There are compound reasons for this. One is that the Galapagos module is a qualified import, which makes the code more verbose than it otherwise could have been. It doesn't help that I follow a strict rule of staying within an 80-character line width.
That said, this version of the test has stronger assertions than before. Notice that the first assertion compares two Pairs of FinchEq values. This means that all five comparable fields of each finch are compared against the expected value. Since the assertion compares two Pairs, that's ten comparisons in all. The previous test only made five comparisons on the finches.
The second assertion remains as before. It's there to ensure that the System Under Test (SUT) remembers to update its pseudo-random number generator.
Perhaps you wonder about the expected values. For the finchID, hopefully the comment gives a hint. I originally set this value to 0, ran the test, observed the actual value, and used what I had observed. I could do that because I was refactoring an existing test that exercised an existing SUT, following the rules of empirical Characterization Testing.
The finchID values are in practice randomly generated numbers. These are notoriously awkward in test contexts, so I could also have excluded that field from FinchEq. Even so, I kept the field, because it's important to be able to verify that the new finch has a different finchID than the parent that begat it.
Derived values #
Where do the magic constants 14 and 23 come from? Although we could use comments to explain their source, another option is to use Derived Values to explicitly document their origin:
testCase "Cell 1 reproduces" $
  let cell1 = Galapagos.CellState (Just flipflop) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 6) -- seeded to reprod
      actual = Galapagos.reproduce Galapagos.defaultParams (cell1, cell2)

      expected =
        Just $ finchEq $ flipflop
          { Galapagos.finchID = -1142203427417426925 -- From Character. Test
          , Galapagos.finchHP = Galapagos.startHP Galapagos.defaultParams
          , Galapagos.finchRoundsLeft =
              Galapagos.lifespan Galapagos.defaultParams }
  in do
    (cellFinchEq <$> Pair actual) @?= Pair (cellFinchEq cell1, expected)

    ((/=) `on` Galapagos.cellRNG) cell2 (snd actual) @?
      "New cell 2 should have an updated RNG."
We now learn that the finchHP value originates from the startHP value of the defaultParams, and similarly for finchRoundsLeft.
To be honest, I'm not sure that this is an improvement. It makes the test more abstract, and if we wish that tests may serve as executable documentation, concrete example values may be easier to understand. Besides, this gets uncomfortably close to duplicating the actual implementation code contained in the SUT.
This variation only serves as an exploration of alternatives. I would strongly consider rolling this change back, and instead add some comments to the magic numbers.
Improving the second test #
The second test improves even more.
testCase "Cell 1 does not reproduce" $
  let cell1 = Galapagos.CellState (Just cheater) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 1) -- seeded: no repr.
      actual = Galapagos.reproduce Galapagos.defaultParams (cell1, cell2)
  in
    (cellFinchEq <$> Pair actual) @?= Pair (cellFinchEq cell1, Nothing)
Not only is it shorter, the assertion is much stronger. It achieves the ideal of verifying that the actual value is equal to the expected value, comparing five data fields on each of the two finches.
Comparing cells #
The reproduce function uses the pseudo-random number generators embedded in the CellState data type to decide whether a finch reproduces in a given round. Thus, the number generators change in deterministic, but by human cognition unpredictable, ways. It makes sense to exclude the generators from the assertions, apart from the above assertion that verifies the change itself.
Other functions in the Galapagos module also work on CellState values, but are entirely deterministic; that is, they don't make use of the pseudo-random number generators. One such function is groom, which models what happens when two finches meet and play out their game of Prisoner's Dilemma by deciding to groom the other for parasites, or not. The function has this type:
groom :: Params -> (CellState, CellState) -> (CellState, CellState)
By specification, this function has no random behaviour, which means that we expect the number generators to stay the same. Even so, due to the lack of an Eq instance, comparing cells is difficult.
testCase "Groom when right cell is empty" $
  let cell1 = Galapagos.CellState (Just flipflop) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 1)
      actual = Galapagos.groom Galapagos.defaultParams (cell1, cell2)
  in do
    ( Galapagos.finchHP <$> Galapagos.cellFinch (fst actual)) @?=
      Galapagos.finchHP <$> Galapagos.cellFinch cell1
    ( Galapagos.finchHP <$> Galapagos.cellFinch (snd actual)) @?= Nothing
Instead of comparing cells, this test only considers the contents of each cell, and it only compares a single field, finchHP, as a proxy for comparing the more complete data structure.
With FinchEq we have a better way of comparing two finches, but we don't have to stop there. We may introduce another test-utility type that can compare cells.
data CellStateEq = CellStateEq
  { cseqFinch :: Maybe FinchEq
  , cseqRNG :: StdGen }
  deriving (Eq, Show)
A helper function also turns out to be useful.
cellStateEq :: Galapagos.CellState -> CellStateEq
cellStateEq cs = CellStateEq
  { cseqFinch = cellFinchEq cs
  , cseqRNG = Galapagos.cellRNG cs }
We can now rewrite the test to compare both cells in their entirety (minus the finchStrategy).
testCase "Groom when right cell is empty" $
  let cell1 = Galapagos.CellState (Just flipflop) (mkStdGen 0)
      cell2 = Galapagos.CellState Nothing (mkStdGen 1)
      actual = Galapagos.groom Galapagos.defaultParams (cell1, cell2)
  in (cellStateEq <$> Pair actual) @?= cellStateEq <$> Pair (cell1, cell2)
Again, the test is both simpler and stronger.
A fly in the ointment #
Introducing FinchEq and CellStateEq allowed me to improve most of the tests, but a few annoying issues remain. The most illustrative example is this test of the core groom behaviour, which lets two example Finch values named samaritan and cheater interact.
testCase "Groom two finches" $
  let cell1 = Galapagos.CellState (Just samaritan) (mkStdGen 0)
      cell2 = Galapagos.CellState (Just cheater) (mkStdGen 1)
      actual = Galapagos.groom Galapagos.defaultParams (cell1, cell2)
      expected =
        Just <$> Pair
          ( finchEq $ samaritan { Galapagos.finchHP = 16 }
          , finchEq $ cheater { Galapagos.finchHP = 13 }
          )
  in (cellFinchEq <$> Pair actual) @?= expected
This test ought to compare cells with CellStateEq, but only compares finches. The practical reason is that defining the expected value as a pair of cells entails embedding the expected finches in their respective cells. This is possible, but awkward, due to the nested nature of the data types.
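To give an impression of that awkwardness, here is a minimal sketch with hypothetical stand-in types. Bird and Cell are invented for illustration and are much simpler than Finch and CellState; in particular, cellSeed is a plain Int rather than a real number generator.

```haskell
-- Hypothetical stand-ins for Finch and CellState; not the Galapagos types.
data Bird = Bird { birdID :: Int, birdHP :: Int } deriving (Eq, Show)

data Cell = Cell { cellBird :: Maybe Bird, cellSeed :: Int }
  deriving (Eq, Show)

before :: Cell
before = Cell
  { cellBird = Just (Bird { birdID = 1, birdHP = 20 })
  , cellSeed = 0 }

-- Changing one field of the nested bird takes a record update inside a
-- mapping over Maybe, inside another record update.
expected :: Cell
expected =
  before { cellBird = (\b -> b { birdHP = 16 }) <$> cellBird before }
```

Even in this toy version, a one-field change to the expected value requires two levels of record-update syntax. With qualified names and an 80-character line width, the real equivalent only gets longer.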
It's possible to do something about that, too, but that's the topic for another article.
Conclusion #
If a test is difficult to write, it may be a symptom that the System Under Test (SUT) has an API which is difficult to use. When doing test-driven development you may want to reconsider the API. Is there a way to model the desired data and behaviour in such a way that the tests become simpler? If so, the API may improve in general.
Sometimes, however, you can't change the SUT API. Perhaps it's already given. Perhaps improving it would be a breaking change. Or perhaps you simply can't think of a better way.
An alternative to changing the SUT API is to introduce test utilities, such as types with test-specific equality. This is hardly better than improving the SUT API, but may be useful in those situations where the best option is unavailable.