Typing and testing problem 23 by Mark Seemann
Yet another reflection on the relationship between types and tests, this time with a simple example.
The debate about dynamic typing versus static typing still goes on. If it ever gets resolved, I suppose it'll be in the far future. Until then, one's position is bound to be determined mostly by experience and belief. I openly admit that I prefer statically typed languages like F# and Haskell.
As I've previously touched on, I can't help seeing types as a slider. The more to the right you pull it, the stronger the type system. The more to the left you pull it, the more you'll need automated tests to give you a sense of confidence in your code.
In this article, I'll share a small revelation recently given to me.
Problem 23 #
The problem is elementary:
λ> rndSelect "abcdefgh" 3
"fag"
The first argument to rndSelect is the candidates from which to pick elements; in this case the letters a to h. The second argument is the number of values to select; in this case the number 3.
Test plan #
How does one test a function like that? Clearly, when randomness is involved, you'll need some way to regulate the randomness in order to make tests deterministic. With my blinders on, I assumed that this was the main problem, so I answered with the following plan for a few properties:
- The length of the returned list should be equal to the requested number of elements.
- All elements in the returned list should be elements of the list of candidates.
In response to this plan, the user chi commented on my second suggestion:
"I think this it is a consequence of the free theorem. If so, no need to test for that!"
Sometimes, I find it difficult to shake my object-oriented TDD-influenced way of thinking, but chi is right. Here's why:
Parametric polymorphism #
.NET, including C# and F#, has a platform feature called generics. Haskell has generics as well, although normally, that language feature is called parametric polymorphism. What I had in mind was a set of parametrically polymorphic functions with these types:
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
rndSelect :: Integral i => [a] -> i -> IO [a]
Notice that both functions return lists of a values, where a is a type variable (in C#, you'd call it a generic type argument). It could be Day, or a custom domain type you'd added to the code base two minutes earlier.
Given a completely unrestricted type variable, Haskell has no way of creating values. How could it, logically?
In C#, you can write default(T), which tends to mostly produce null references. Haskell doesn't have null, so with that option cut off, how would it be able to produce values of arbitrary types? It can't.
When returning a list of a values, the only option open to a parametrically polymorphic function is to pick values from its input arguments. For both rndGenSelect and rndSelect, there's only a single source of a values, so there's no reason to test that the functions return values from those lists of candidates. It's the only thing they can do. That's the free theorem for these functions.
It would have been an entirely different story if the function had had concrete types. If, for example, the function had had the type RandomGen g => g -> String -> Int -> String, I could have written a function like this one:
rndGenSelect' :: RandomGen g => g -> String -> Int -> String
rndGenSelect' _ _ count = replicate count 's'
Because the type of elements is known at compile-time, we can pick an arbitrary Char value ('s'). This is possible because we know the type, and therefore can come up with a strategy to hard-code known values of that type. When the type argument is unknown, this is no longer possible. To paraphrase Robert C. Martin, as the types get more generic, the tests become more redundant.
Taming randomness #
Before we look at automated testing, let's consider how to turn randomness into deterministic behaviour. This is (seemingly) always a problem with unit testing when the desired behaviour contains randomness, because tests should be deterministic. Once again, however, it turns out that functional design is intrinsically testable. Since Haskell design favours pure functions, the core of System.Random is deterministic.
This is, in fact, not much different from C#, where the Random class encapsulates an algorithm that computes a series of random-looking values based on an initial seed value. If you give it the same seed, it'll produce the same sequence of random-looking numbers. Haskell works the same way.
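A minimal sketch of that determinism, assuming the random package is available: the helper rolls is a hypothetical name introduced only for this illustration. It involves no IO, so the same seed always produces the same list. The concrete numbers depend on the version of the random package, so none are shown here.

```haskell
import System.Random (mkStdGen, randomRs)

-- Five pseudo-random die rolls computed from a seed (hypothetical helper).
-- The function is pure: equal seeds always yield equal lists.
rolls :: Int -> [Int]
rolls seed = take 5 (randomRs (1, 6) (mkStdGen seed))

-- Evaluating the same seed twice gives identical results.
sameSeedSameRolls :: Int -> Bool
sameSeedSameRolls seed = rolls seed == rolls seed
```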
This led me to a design with a 'core' function that does all the work, and a 'wrapper' function that only adds one extra feature: randomness.
Starting my design with types, I wanted a function with this type:
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
This is the type that I've already discussed above. Because of the free theorem, we already know that the returned list can only contain values selected from the input list. In other words, there's no need to test for that.
This function takes a RandomGen argument, which is a type class of pure functions. RandomGen itself is pure; the source of randomness comes from how it's produced. More on that later. This, however, should enable me to write deterministic tests.
Before we start adding deterministic tests, let's see how far we can get with property-based testing. First, designing with types, I need to implement the function so that it compiles. This is the simplest implementation I could think of:
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
rndGenSelect _ xs _ = [head xs]
This implementation is both incorrect and unsafe, but it compiles. In TDD fashion, then, I found it appropriate to add a test - in this case a QuickCheck property:
lenProp :: Integral i => Int -> [a] -> NonNegative i -> Bool
lenProp seed xs (NonNegative i) =
  i == genericLength (rndGenSelect (mkStdGen seed) xs i)
This little piece of test code is the only surviving property from my original test plan. It states that for any non-negative count, the list returned from rndGenSelect should have the requested length.
Writing this property, however, quickly forced me to deal with the case where the count is negative. It's easy to forget about edge cases when your function is nothing but a pie in the sky, but QuickCheck (and property-based testing in general) is really effective at grounding you in reality. Even with a language like Haskell, I still find the fast feedback loop from tests helpful.
The original exercise specification doesn't mention what should happen if the count is negative, so after a short deliberation, I decided to write another property:
negLenProp :: Integral i => Int -> [a] -> Positive i -> Bool
negLenProp seed xs (Positive i) =
  0 == genericLength (rndGenSelect (mkStdGen seed) xs (-i))
This property simply states that for all negative counts, the returned list should be empty.
Both of these properties obviously fail, because of the incorrect implementation.
The simplest implementation I could think of that passes both properties is this:
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
rndGenSelect _ xs count = genericReplicate count (head xs)
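Why this passes the negative-count property may not be obvious. A small sketch, using only Data.List, shows that genericReplicate returns an empty list for any count of zero or less, and a list of the requested length otherwise. The names below are hypothetical and exist only for this illustration.

```haskell
import Data.List (genericLength, genericReplicate)

-- genericReplicate treats a negative count as "no elements".
noElements :: String
noElements = genericReplicate (-3 :: Integer) 'x'

-- A count of zero behaves the same way.
stillNone :: String
stillNone = genericReplicate (0 :: Integer) 'x'

-- For positive counts, the length matches the count.
threeLong :: Integer
threeLong = genericLength (genericReplicate (3 :: Integer) 'x')
```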
At this point, I don't see how TDD or property-based testing can help me move forward. The remaining work required is to add randomness to the mix. In this case, I'll need to use the RandomGen argument to produce random values, but since I don't know how its algorithm works, even if I had a seed value known at compile-time, I wouldn't be able to predict which values it'd produce.
Selecting random indices #
I admit that I don't know how to write the next test a priori. I do know, however, that if I implement what's missing, I have a deterministic function, and I can use it to write regression tests. In other words, I'll reverse direction and write the code first, and then the test. What a novel idea.
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
rndGenSelect rnd xs count =
  let indices = genericTake count $ randomRs (0, length xs - 1) rnd
  in fmap (xs !!) indices
This function first uses randomRs to produce an infinite list of values. These values are indices because they all fall between 0 and length xs - 1. In other words, they are indices into xs.
While the list is infinite, it's lazily evaluated, so infinity itself isn't a problem. We only need count elements, though, so we can simply take the first count.
Finally, the function maps over the list of indices, and for each index value, selects the element at that position.
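To make the intermediate step visible, here is a hedged sketch that exposes the list of indices on its own, assuming the random package is available. Both selectIndices and allInBounds are hypothetical helpers that are not part of the original code. They only illustrate that, for a non-empty list, every generated index is a valid position for (xs !!).

```haskell
import Data.List (genericTake)
import System.Random (mkStdGen, randomRs)

-- The intermediate list of indices that rndGenSelect maps over,
-- exposed here (as a hypothetical helper) so that it can be inspected.
selectIndices :: Integral i => Int -> [a] -> i -> [Int]
selectIndices seed xs count =
  genericTake count $ randomRs (0, length xs - 1) (mkStdGen seed)

-- Every index must be a valid position in xs for (xs !!) to be safe.
allInBounds :: [a] -> [Int] -> Bool
allInBounds xs = all (\i -> 0 <= i && i < length xs)
```

For example, allInBounds "abcdefgh" (selectIndices 42 "abcdefgh" 3) evaluates to True, whichever three indices the generator happens to produce.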
I could inline indices in the return expression, like this:
rndGenSelect :: (RandomGen g, Integral i) => g -> [a] -> i -> [a]
rndGenSelect rnd xs count =
  fmap (xs !!) $ genericTake count $ randomRs (0, length xs - 1) rnd
I find that more obscure than the first alternative, but both versions pass the properties and do what they're supposed to do.
Regression testing #
How do I know that my code works? Well, that's always difficult with code that contains randomness, but you can load the function into GHCi and perform some sanity testing:
λ> rndGenSelect (mkStdGen 42) "foo" 3
"ofo"
λ> rndGenSelect (mkStdGen 1337) "bar" 10
"rabbaarrra"
λ> rndGenSelect (mkStdGen (-197221)) ['a'..'z'] 5
"ntfnc"
That looks, I suppose, random enough... What's more important is that this is completely repeatable. This means that I can write parametrised tests that protect against regressions:
"rndGenSelect of chars returns correct result" ~: do
  (seed, xs, count, expected) <-
    [
      (     42,      "foo",  3, "ofo"),
      (   1337,      "bar", 10, "rabbaarrra"),
      (-197221, ['a'..'z'],  5, "ntfnc")
    ]
  let rnd = mkStdGen seed
  let actual = rndGenSelect rnd xs count
  return $ expected ~=? actual
These tests don't drive the design, but they prevent regressions. If, at a later time, I, or someone else, inadvertently revert rndGenSelect to genericReplicate count (head xs), these tests will fail.
Humble function #
The original problem statement is to write a function without an explicit RandomGen argument. In the spirit of xUnit Test Patterns' Humble Object pattern, we can now click all our pieces together into a function that does what is required:
rndSelect :: Integral i => [a] -> i -> IO [a]
rndSelect xs count = do
  rnd <- newStdGen
  return $ rndGenSelect rnd xs count
The only thing of interest here is that the function is impure, because it uses newStdGen to produce a random RandomGen value. It then delegates all work to rndGenSelect, which is covered by tests.
As you can see, this function does not exhibit repeatable behaviour:
λ> rndSelect "abcdefgh" 3
"add"
λ> rndSelect "abcdefgh" 3
"daf"
This should, I think, address the original problem statement.
All source code for this article is available on GitHub.
The first time I encountered parametric polymorphism was when C# got generics in 2005. Back then it was mostly explained as a mechanism to avoid boxing, although it also seriously reduced the amount of boilerplate code you'd have to write in order to have type-safe collections. For many years, I mostly understood C# generics as a language feature aimed at efficiency and programmer productivity.
It wasn't until I started to program in F#, with its stronger type inference, that it began to dawn on me that parametric polymorphism could also be a design tool. Making a function more generic tightens its contract, so to speak. The more generic a function becomes, the less wriggle room it has. This may sound like a disadvantage to a programmer, but it's a boon to a code reader. When you, as a reader, encounter a parametrically polymorphic function, you already know that there are things that function can't do. Such functions come with invariants, or 'theorems', for free.