Test Data Generator monad by Mark Seemann
With examples in C# and F#.
This article is an instalment in an article series about monads. In other related series previous articles described Test Data Generator as a functor, as well as Test Data Generator as an applicative functor. As is the case with many (but not all) functors, this one also forms a monad.
This article expands on the code from the above-mentioned articles about Test Data Generators. Keep in mind that the code is a simplified version of what you'll find in a real property-based testing framework. It lacks shrinking and referentially transparent (pseudo-)random value generation. Probably more things than that, too.
SelectMany #
A monad must define either a bind or join function. In C#, monadic bind is called SelectMany
. For the Generator<T>
class, you can implement it as an instance method like this:
public Generator<TResult> SelectMany<TResult>(Func<T, Generator<TResult>> selector) { Func<Random, TResult> newGenerator = r => { Generator<TResult> g = selector(generate(r)); return g.Generate(r); }; return new Generator<TResult>(newGenerator); }
SelectMany
enables you to chain generators together. You'll see an example later in the article.
Query syntax #
As the monad article explains, you can enable C# query syntax by adding a special SelectMany
overload:
public Generator<TResult> SelectMany<U, TResult>( Func<T, Generator<U>> k, Func<T, U, TResult> s) { return SelectMany(x => k(x).Select(y => s(x, y))); }
The implementation body always looks the same; only the method signature varies from monad to monad. Again, I'll show you an example of using query syntax later in the article.
Flatten #
In the introduction you learned that if you have a Flatten
or Join
function, you can implement SelectMany
, and the other way around. Since we've already defined SelectMany
for Generator<T>
, we can use that to implement Flatten
. In this article I use the name Flatten
rather than Join
. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.
public static Generator<T> Flatten<T>(this Generator<Generator<T>> generator) { return generator.SelectMany(x => x); }
As you can tell, this function has to be an extension method, since we can't have a class typed Generator<Generator<T>>
. As usual, when you already have SelectMany
, the body of Flatten
(or Join
) is always the same.
Return #
Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function return (because that's the name that Haskell uses):
public static Generator<T> Return<T>(T value) { return new Generator<T>(_ => value); }
This function ignores the random number generator and always returns value
.
Left identity #
We needed to identify the return function in order to examine the monad laws. Let's see what they look like for the Test Data Generator monad, starting with the left identity law.
[Theory] [InlineData(17, 0)] [InlineData(17, 8)] [InlineData(42, 0)] [InlineData(42, 1)] public void LeftIdentityLaw(int x, int seed) { Func<int, Generator<string>> h = i => new Generator<string>(r => r.Next(i).ToString()); Assert.Equal( Generator.Return(x).SelectMany(h).Generate(new Random(seed)), h(x).Generate(new Random(seed))); }
Notice that the test can't directly compare the two generators, because equality isn't clearly defined for that class. Instead, the test has to call Generate
in order to produce comparable values; in this case, strings.
Since Generate
is non-deterministic, the test has to seed
the random number generator argument in order to get reproducible results. It can't even declare one Random
object and share it across both method calls, since generating values changes the state of the object. Instead, the test has to generate two separate Random
objects, one for each call to Generate
, but with the same seed
.
Right identity #
In a manner similar to above, we can showcase the right identity law as a test.
[Theory] [InlineData('a', 0)] [InlineData('a', 8)] [InlineData('j', 0)] [InlineData('j', 5)] public void RightIdentityLaw(char letter, int seed) { Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100))); Generator<string> m = f(letter); Assert.Equal( m.SelectMany(Generator.Return).Generate(new Random(seed)), m.Generate(new Random(seed))); }
As always, even a parametrised test constitutes no proof that the law holds. I show the tests to illustrate what the laws look like in 'real' code.
Associativity #
The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the demonstration test I'm going to conjure three nonsense functions. While this may not be as intuitive, it on the other hand reduces the noise that more realistic code tends to produce. Later in the article you'll see a more realistic example.
[Theory] [InlineData('t', 0)] [InlineData('t', 28)] [InlineData('u', 0)] [InlineData('u', 98)] public void AssociativityLaw(char a, int seed) { Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100))); Func<string, Generator<int>> g = s => new Generator<int>(r => r.Next(s.Length)); Func<int, Generator<TimeSpan>> h = i => new Generator<TimeSpan>(r => TimeSpan.FromDays(r.Next(i))); Generator<string> m = f(a); Assert.Equal( m.SelectMany(g).SelectMany(h).Generate(new Random(seed)), m.SelectMany(x => g(x).SelectMany(h)).Generate(new Random(seed))); }
All tests pass.
CPR example #
Formalities out of the way, let's look at a more realistic example. In the article about the Test Data Generator applicative functor you saw an example of parsing a Danish personal identification number, in Danish called CPR-nummer (CPR number) for Central Person Register. (It's not a register of central persons, but rather the central register of persons. Danish works slightly differently than English.)
CPR numbers have a simple format: DDMMYY-SSSS
, where the first six digits indicate a person's birth date, and the last four digits are a sequence number. An example could be 010203-1234
, which indicates a woman born February 1, 1903.
In C# you might model a CPR number as a class with a constructor like this:
public CprNumber(int day, int month, int year, int sequenceNumber) { if (year < 0 || 99 < year) throw new ArgumentOutOfRangeException( nameof(year), "Year must be between 0 and 99, inclusive."); if (month < 1 || 12 < month) throw new ArgumentOutOfRangeException( nameof(month), "Month must be between 1 and 12, inclusive."); if (sequenceNumber < 0 || 9999 < sequenceNumber) throw new ArgumentOutOfRangeException( nameof(sequenceNumber), "Sequence number must be between 0 and 9999, inclusive."); var fourDigitYear = CalculateFourDigitYear(year, sequenceNumber); var daysInMonth = DateTime.DaysInMonth(fourDigitYear, month); if (day < 1 || daysInMonth < day) throw new ArgumentOutOfRangeException( nameof(day), $"Day must be between 1 and {daysInMonth}, inclusive."); this.day = day; this.month = month; this.year = year; this.sequenceNumber = sequenceNumber; }
The system has been around since 1968 so clearly suffers from a Y2k problem, as years are encoded with only two digits. The workaround for this is that the most significant digit of the sequence number encodes the century. At the time I'm writing this, the Danish-language wikipedia entry for CPR-nummer still includes a table that shows how one can derive the century from the sequence number. This enables the CPR system to handle birth dates between 1858 and 2057.
The CprNumber
constructor has to consult that table in order to determine the century. It uses the CalculateFourDigitYear
function for that. Once it has the four-digit year, it can use the DateTime.DaysInMonth method to determine the number of days in the given month. This is used to validate the day parameter.
The previous article showed a test that made use of a Test Data Generator for the CprNumber
class. The generator was referenced as Gen.CprNumber
, but how do you define such a generator?
CPR number generator #
The constructor arguments for month
, year
, and sequenceNumber
are easy to generate. You need a basic generator that produces values between two boundaries. Both QuickCheck and FsCheck call it choose
, so I'll reuse that name:
public static Generator<int> Choose(int min, int max) { return new Generator<int>(r => r.Next(min, max + 1)); }
The choose
functions of QuickCheck and FsCheck consider both boundaries to be inclusive, so I've done the same. That explains the + 1
, since Random.Next excludes the upper boundary.
You can now combine choose
with DateTime.DaysInMonth
to generate a valid day:
public static Generator<int> Day(int year, int month) { var daysInMonth = DateTime.DaysInMonth(year, month); return Gen.Choose(1, daysInMonth); }
Let's pause and consider the implications. The point of this example is to demonstrate why it's practically useful that Test Data Generators are monads. Keep in mind that monads are functors you can flatten. When do you need to flatten a functor? Specifically, when do you need to flatten a Test Data Generator? Right now, as it turns out.
The Day
method returns a Generator<int>
, but where do the year
and month
arguments come from? They'll typically be produced by another Test Data Generator such as choose
. Thus, if you only map (Select
) over previous Test Data Generators, you'll produce a Generator<Generator<int>>
:
Generator<int> genYear = Gen.Choose(1970, 2050); Generator<int> genMonth = Gen.Choose(1, 12); Generator<(int, int)> genYearAndMonth = genYear.Apply(genMonth); Generator<Generator<int>> genDay = genYearAndMonth.Select(t => Gen.Choose(1, DateTime.DaysInMonth(t.Item1, t.Item2)));
This example uses an Apply
overload to combine genYear
and genMonth
. As long as the two generators are independent of each other, you can use the applicative functor capability to combine them. When, however, you need to produce a new generator from a value produced by a previous generator, the functor or applicative functor capabilities are insufficient. If you try to use Select
, as in the above example, you'll produce a nested generator.
Since it's a monad, however, you can Flatten
it:
Generator<int> flattened = genDay.Flatten();
Or you can use SelectMany
(monadic bind) to flatten as you go. The CprNumber
generator does that, although it uses query syntax syntactic sugar to make the code more readable:
public static Generator<CprNumber> CprNumber => from sequenceNumber in Gen.Choose(0, 9999) from year in Gen.Choose(0, 99) from month in Gen.Choose(1, 12) let fourDigitYear = TestDataBuilderFunctor.CprNumber.CalculateFourDigitYear(year, sequenceNumber) from day in Gen.Day(fourDigitYear, month) select new CprNumber(day, month, year, sequenceNumber);
The expression first uses Gen.Choose
to produce three independent int
values: sequenceNumber
, year
, and month
. It then uses the CalculateFourDigitYear
function to look up the proper century based on the two-digit year
and the sequenceNumber
. With that information it can call Gen.Day
, and since the expression uses monadic composition, it's flattening as it goes. Thus day
is an int
value rather than a generator.
Finally, the entire expression can compose the four int
values into a valid CprNumber
object.
You can consult the previous article to see Gen.CprNumber
in use.
Hedgehog CPR generator #
You can reproduce the CPR example in F# using one of several property-based testing frameworks. In this example, I'll continue the example from the previous article as well as the article Danish CPR numbers in F#. You can see a couple of tests in these articles. They use the cprNumber
generator, but never show the code.
In all the property-based testing frameworks I've seen, generators are called Gen
. This is also the case for Hedgehog. The Gen
container is a monad, and there's a gen
computation expression that supplies syntactic sugar.
You can translate the above example to a Hedgehog Gen
value like this:
let cprNumber = gen { let! sequenceNumber = Range.linear 0 9999 |> Gen.int32 let! year = Range.linear 0 99 |> Gen.int32 let! month = Range.linear 1 12 |> Gen.int32 let fourDigitYear = Cpr.calculateFourDigitYear year sequenceNumber let daysInMonth = DateTime.DaysInMonth (fourDigitYear, month) let! day = Range.linear 1 daysInMonth |> Gen.int32 return Cpr.tryCreate day month year sequenceNumber } |> Gen.some
To keep the example simple, I haven't defined an explicit day
generator, but instead just inlined DateTime.DaysInMonth
.
Consult the articles that I linked above to see the Gen.cprNumber
generator in use.
Conclusion #
Test Data Generators form monads. This is useful when you need to generate test data that depend on other generated test data. Monadic bind (SelectMany
in C#) can flatten the generator functor as you go. This article showed examples in both C# and F#.
The same abstraction also exists in the Haskell QuickCheck library, but I haven't shown any Haskell examples. If you've taken the trouble to learn Haskell (which you should), you already know what a monad is.
Next: Functor relationships.
Comments
CsCheck is a full implementation of something along these lines. It uses the same random sample generation in the shrinking step always reducing a Size measure. It turns out to be a better way of shrinking than the QuickCheck way.
Anthony, thank you for writing. You'll be pleased to learn, I take it, that the next article in the series about the epistemology of interaction testing uses CsCheck as the example framework.