# Test Data Generator monad by Mark Seemann

*With examples in C# and F#.*

This article is an instalment in an article series about monads. In other related series previous articles described Test Data Generator as a functor, as well as Test Data Generator as an applicative functor. As is the case with many (but not all) functors, this one also forms a monad.

This article expands on the code from the above-mentioned articles about Test Data Generators. Keep in mind that the code is a simplified version of what you'll find in a real property-based testing framework. It lacks shrinking and referentially transparent (pseudo-)random value generation. Probably more things than that, too.

### SelectMany #

A monad must define either a *bind* or *join* function. In C#, monadic bind is called `SelectMany`

. For the `Generator<T>`

class, you can implement it as an instance method like this:

public Generator<TResult> SelectMany<TResult>(Func<T, Generator<TResult>> selector) { Func<Random, TResult> newGenerator = r => { Generator<TResult> g = selector(generate(r)); return g.Generate(r); }; return new Generator<TResult>(newGenerator); }

`SelectMany`

enables you to chain generators together. You'll see an example later in the article.

### Query syntax #

As the monad article explains, you can enable C# query syntax by adding a special `SelectMany`

overload:

public Generator<TResult> SelectMany<U, TResult>( Func<T, Generator<U>> k, Func<T, U, TResult> s) { return SelectMany(x => k(x).Select(y => s(x, y))); }

The implementation body always looks the same; only the method signature varies from monad to monad. Again, I'll show you an example of using query syntax later in the article.

### Flatten #

In the introduction you learned that if you have a `Flatten`

or `Join`

function, you can implement `SelectMany`

, and the other way around. Since we've already defined `SelectMany`

for `Generator<T>`

, we can use that to implement `Flatten`

. In this article I use the name `Flatten`

rather than `Join`

. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.

public static Generator<T> Flatten<T>(this Generator<Generator<T>> generator) { return generator.SelectMany(x => x); }

As you can tell, this function has to be an extension method, since we can't have a class typed `Generator<Generator<T>>`

. As usual, when you already have `SelectMany`

, the body of `Flatten`

(or `Join`

) is always the same.

### Return #

Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function *return* (because that's the name that Haskell uses):

public static Generator<T> Return<T>(T value) { return new Generator<T>(_ => value); }

This function ignores the random number generator and always returns `value`

.

### Left identity #

We needed to identify the *return* function in order to examine the monad laws. Let's see what they look like for the Test Data Generator monad, starting with the left identity law.

[Theory] [InlineData(17, 0)] [InlineData(17, 8)] [InlineData(42, 0)] [InlineData(42, 1)] public void LeftIdentityLaw(int x, int seed) { Func<int, Generator<string>> h = i => new Generator<string>(r => r.Next(i).ToString()); Assert.Equal( Generator.Return(x).SelectMany(h).Generate(new Random(seed)), h(x).Generate(new Random(seed))); }

Notice that the test can't directly compare the two generators, because equality isn't clearly defined for that class. Instead, the test has to call `Generate`

in order to produce comparable values; in this case, strings.

Since `Generate`

is non-deterministic, the test has to `seed`

the random number generator argument in order to get reproducible results. It can't even declare one `Random`

object and share it across both method calls, since generating values changes the state of the object. Instead, the test has to generate two separate `Random`

objects, one for each call to `Generate`

, but with the same `seed`

.

### Right identity #

In a manner similar to above, we can showcase the right identity law as a test.

[Theory] [InlineData('a', 0)] [InlineData('a', 8)] [InlineData('j', 0)] [InlineData('j', 5)] public void RightIdentityLaw(char letter, int seed) { Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100))); Generator<string> m = f(letter); Assert.Equal( m.SelectMany(Generator.Return).Generate(new Random(seed)), m.Generate(new Random(seed))); }

As always, even a parametrised test constitutes no *proof* that the law holds. I show the tests to illustrate what the laws look like in 'real' code.

### Associativity #

The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the demonstration test I'm going to conjure three nonsense functions. While this may not be as intuitive, it on the other hand reduces the noise that more realistic code tends to produce. Later in the article you'll see a more realistic example.

[Theory] [InlineData('t', 0)] [InlineData('t', 28)] [InlineData('u', 0)] [InlineData('u', 98)] public void AssociativityLaw(char a, int seed) { Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100))); Func<string, Generator<int>> g = s => new Generator<int>(r => r.Next(s.Length)); Func<int, Generator<TimeSpan>> h = i => new Generator<TimeSpan>(r => TimeSpan.FromDays(r.Next(i))); Generator<string> m = f(a); Assert.Equal( m.SelectMany(g).SelectMany(h).Generate(new Random(seed)), m.SelectMany(x => g(x).SelectMany(h)).Generate(new Random(seed))); }

All tests pass.

### CPR example #

Formalities out of the way, let's look at a more realistic example. In the article about the Test Data Generator applicative functor you saw an example of parsing a Danish personal identification number, in Danish called *CPR-nummer* (*CPR number*) for *Central Person Register*. (It's not a register of central persons, but rather the central register of persons. Danish works slightly differently than English.)

CPR numbers have a simple format: `DDMMYY-SSSS`

, where the first six digits indicate a person's birth date, and the last four digits are a sequence number. An example could be `010203-1234`

, which indicates a woman born February 1, 1903.

In C# you might model a CPR number as a class with a constructor like this:

public CprNumber(int day, int month, int year, int sequenceNumber) { if (year < 0 || 99 < year) throw new ArgumentOutOfRangeException( nameof(year), "Year must be between 0 and 99, inclusive."); if (month < 1 || 12 < month) throw new ArgumentOutOfRangeException( nameof(month), "Month must be between 1 and 12, inclusive."); if (sequenceNumber < 0 || 9999 < sequenceNumber) throw new ArgumentOutOfRangeException( nameof(sequenceNumber), "Sequence number must be between 0 and 9999, inclusive."); var fourDigitYear = CalculateFourDigitYear(year, sequenceNumber); var daysInMonth = DateTime.DaysInMonth(fourDigitYear, month); if (day < 1 || daysInMonth < day) throw new ArgumentOutOfRangeException( nameof(day), $"Day must be between 1 and {daysInMonth}, inclusive."); this.day = day; this.month = month; this.year = year; this.sequenceNumber = sequenceNumber; }

The system has been around since 1968 so clearly suffers from a Y2k problem, as years are encoded with only two digits. The workaround for this is that the most significant digit of the sequence number encodes the century. At the time I'm writing this, the Danish-language wikipedia entry for CPR-nummer still includes a table that shows how one can derive the century from the sequence number. This enables the CPR system to handle birth dates between 1858 and 2057.

The `CprNumber`

constructor has to consult that table in order to determine the century. It uses the `CalculateFourDigitYear`

function for that. Once it has the four-digit year, it can use the DateTime.DaysInMonth method to determine the number of days in the given month. This is used to validate the day parameter.

The previous article showed a test that made use of a Test Data Generator for the `CprNumber`

class. The generator was referenced as `Gen.CprNumber`

, but how do you define such a generator?

### CPR number generator #

The constructor arguments for `month`

, `year`

, and `sequenceNumber`

are easy to generate. You need a basic generator that produces values between two boundaries. Both QuickCheck and FsCheck call it `choose`

, so I'll reuse that name:

public static Generator<int> Choose(int min, int max) { return new Generator<int>(r => r.Next(min, max + 1)); }

The `choose`

functions of QuickCheck and FsCheck consider both boundaries to be inclusive, so I've done the same. That explains the `+ 1`

, since Random.Next excludes the upper boundary.

You can now combine `choose`

with `DateTime.DaysInMonth`

to generate a valid day:

public static Generator<int> Day(int year, int month) { var daysInMonth = DateTime.DaysInMonth(year, month); return Gen.Choose(1, daysInMonth); }

Let's pause and consider the implications. The point of this example is to demonstrate why it's practically useful that Test Data Generators are monads. Keep in mind that monads are functors you can flatten. When do you need to flatten a functor? Specifically, when do you need to flatten a Test Data Generator? Right now, as it turns out.

The `Day`

method returns a `Generator<int>`

, but where do the `year`

and `month`

arguments come from? They'll typically be produced by another Test Data Generator such as `choose`

. Thus, if you only *map* (`Select`

) over previous Test Data Generators, you'll produce a `Generator<Generator<int>>`

:

Generator<int> genYear = Gen.Choose(1970, 2050); Generator<int> genMonth = Gen.Choose(1, 12); Generator<(int, int)> genYearAndMonth = genYear.Apply(genMonth); Generator<Generator<int>> genDay = genYearAndMonth.Select(t => Gen.Choose(1, DateTime.DaysInMonth(t.Item1, t.Item2)));

This example uses an `Apply`

overload to combine `genYear`

and `genMonth`

. As long as the two generators are independent of each other, you can use the applicative functor capability to combine them. When, however, you need to produce a new generator from a value produced by a previous generator, the functor or applicative functor capabilities are insufficient. If you try to use `Select`

, as in the above example, you'll produce a nested generator.

Since it's a monad, however, you can `Flatten`

it:

Generator<int> flattened = genDay.Flatten();

Or you can use `SelectMany`

(monadic *bind*) to flatten as you go. The `CprNumber`

generator does that, although it uses query syntax syntactic sugar to make the code more readable:

public static Generator<CprNumber> CprNumber => from sequenceNumber in Gen.Choose(0, 9999) from year in Gen.Choose(0, 99) from month in Gen.Choose(1, 12) let fourDigitYear = TestDataBuilderFunctor.CprNumber.CalculateFourDigitYear(year, sequenceNumber) from day in Gen.Day(fourDigitYear, month) select new CprNumber(day, month, year, sequenceNumber);

The expression first uses `Gen.Choose`

to produce three independent `int`

values: `sequenceNumber`

, `year`

, and `month`

. It then uses the `CalculateFourDigitYear`

function to look up the proper century based on the two-digit `year`

and the `sequenceNumber`

. With that information it can call `Gen.Day`

, and since the expression uses monadic composition, it's flattening as it goes. Thus `day`

is an `int`

value rather than a generator.

Finally, the entire expression can compose the four `int`

values into a valid `CprNumber`

object.

You can consult the previous article to see `Gen.CprNumber`

in use.

### Hedgehog CPR generator #

You can reproduce the CPR example in F# using one of several property-based testing frameworks. In this example, I'll continue the example from the previous article as well as the article Danish CPR numbers in F#. You can see a couple of tests in these articles. They use the `cprNumber`

generator, but never show the code.

In all the property-based testing frameworks I've seen, generators are called `Gen`

. This is also the case for Hedgehog. The `Gen`

container is a monad, and there's a `gen`

computation expression that supplies syntactic sugar.

You can translate the above example to a Hedgehog `Gen`

value like this:

let cprNumber = gen { let! sequenceNumber = Range.linear 0 9999 |> Gen.int32 let! year = Range.linear 0 99 |> Gen.int32 let! month = Range.linear 1 12 |> Gen.int32 let fourDigitYear = Cpr.calculateFourDigitYear year sequenceNumber let daysInMonth = DateTime.DaysInMonth (fourDigitYear, month) let! day = Range.linear 1 daysInMonth |> Gen.int32 return Cpr.tryCreate day month year sequenceNumber } |> Gen.some

To keep the example simple, I haven't defined an explicit `day`

generator, but instead just inlined `DateTime.DaysInMonth`

.

Consult the articles that I linked above to see the `Gen.cprNumber`

generator in use.

### Conclusion #

Test Data Generators form monads. This is useful when you need to generate test data that depend on other generated test data. Monadic *bind* (`SelectMany`

in C#) can flatten the generator functor as you go. This article showed examples in both C# and F#.

The same abstraction also exists in the Haskell QuickCheck library, but I haven't shown any Haskell examples. If you've taken the trouble to learn Haskell (which you should), you already know what a monad is.

**Next:** Functor relationships.

## Comments

CsCheck is a full implementation of something along these lines. It uses the same random sample generation in the shrinking step always reducing a Size measure. It turns out to be a better way of shrinking than the QuickCheck way.

Anthony, thank you for writing. You'll be pleased to learn, I take it, that the next article in the series about the epistemology of interaction testing uses CsCheck as the example framework.