With examples in C# and F#.

This article is an instalment in an article series about monads. In other related series previous articles described Test Data Generator as a functor, as well as Test Data Generator as an applicative functor. As is the case with many (but not all) functors, this one also forms a monad.

This article expands on the code from the above-mentioned articles about Test Data Generators. Keep in mind that the code is a simplified version of what you'll find in a real property-based testing framework. It lacks shrinking and referentially transparent (pseudo-)random value generation. Probably more things than that, too.

SelectMany #

A monad must define either a bind or join function. In C#, monadic bind is called SelectMany. For the Generator<T> class, you can implement it as an instance method like this:

public Generator<TResult> SelectMany<TResult>(Func<T, Generator<TResult>> selector)
{
    Func<Random, TResult> newGenerator = r =>
    {
        Generator<TResult> g = selector(generate(r));
        return g.Generate(r);
    };
    return new Generator<TResult>(newGenerator);
}

SelectMany enables you to chain generators together. You'll see an example later in the article.

Query syntax #

As the monad article explains, you can enable C# query syntax by adding a special SelectMany overload:

public Generator<TResult> SelectMany<UTResult>(
    Func<T, Generator<U>> k,
    Func<T, U, TResult> s)
{
    return SelectMany(x => k(x).Select(y => s(x, y)));
}

The implementation body always looks the same; only the method signature varies from monad to monad. Again, I'll show you an example of using query syntax later in the article.

Flatten #

In the introduction you learned that if you have a Flatten or Join function, you can implement SelectMany, and the other way around. Since we've already defined SelectMany for Generator<T>, we can use that to implement Flatten. In this article I use the name Flatten rather than Join. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.

public static Generator<T> Flatten<T>(this Generator<Generator<T>> generator)
{
    return generator.SelectMany(x => x);
}

As you can tell, this function has to be an extension method, since we can't have a class typed Generator<Generator<T>>. As usual, when you already have SelectMany, the body of Flatten (or Join) is always the same.

Return #

Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function return (because that's the name that Haskell uses):

public static Generator<T> Return<T>(T value)
{
    return new Generator<T>(_ => value);
}

This function ignores the random number generator and always returns value.

Left identity #

We needed to identify the return function in order to examine the monad laws. Let's see what they look like for the Test Data Generator monad, starting with the left identity law.

[Theory]
[InlineData(17, 0)]
[InlineData(17, 8)]
[InlineData(42, 0)]
[InlineData(42, 1)]
public void LeftIdentityLaw(int xint seed)
{
    Func<int, Generator<string>> h = i => new Generator<string>(r => r.Next(i).ToString());
    Assert.Equal(
        Generator.Return(x).SelectMany(h).Generate(new Random(seed)),
        h(x).Generate(new Random(seed)));
}

Notice that the test can't directly compare the two generators, because equality isn't clearly defined for that class. Instead, the test has to call Generate in order to produce comparable values; in this case, strings.

Since Generate is non-deterministic, the test has to seed the random number generator argument in order to get reproducible results. It can't even declare one Random object and share it across both method calls, since generating values changes the state of the object. Instead, the test has to generate two separate Random objects, one for each call to Generate, but with the same seed.

Right identity #

In a manner similar to above, we can showcase the right identity law as a test.

[Theory]
[InlineData('a', 0)]
[InlineData('a', 8)]
[InlineData('j', 0)]
[InlineData('j', 5)]
public void RightIdentityLaw(char letterint seed)
{
    Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100)));
    Generator<stringm = f(letter);
    Assert.Equal(
        m.SelectMany(Generator.Return).Generate(new Random(seed)),
        m.Generate(new Random(seed)));
}

As always, even a parametrised test constitutes no proof that the law holds. I show the tests to illustrate what the laws look like in 'real' code.

Associativity #

The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the demonstration test I'm going to conjure three nonsense functions. While this may not be as intuitive, it on the other hand reduces the noise that more realistic code tends to produce. Later in the article you'll see a more realistic example.

[Theory]
[InlineData('t',  0)]
[InlineData('t', 28)]
[InlineData('u',  0)]
[InlineData('u', 98)]
public void AssociativityLaw(char aint seed)
{
    Func<char, Generator<string>> f = c => new Generator<string>(r => new string(c, r.Next(100)));
    Func<string, Generator<int>> g = s => new Generator<int>(r => r.Next(s.Length));
    Func<int, Generator<TimeSpan>> h =
        i => new Generator<TimeSpan>(r => TimeSpan.FromDays(r.Next(i)));
 
    Generator<stringm = f(a);
 
    Assert.Equal(
        m.SelectMany(g).SelectMany(h).Generate(new Random(seed)),
        m.SelectMany(x => g(x).SelectMany(h)).Generate(new Random(seed)));
}

All tests pass.

CPR example #

Formalities out of the way, let's look at a more realistic example. In the article about the Test Data Generator applicative functor you saw an example of parsing a Danish personal identification number, in Danish called CPR-nummer (CPR number) for Central Person Register. (It's not a register of central persons, but rather the central register of persons. Danish works slightly differently than English.)

CPR numbers have a simple format: DDMMYY-SSSS, where the first six digits indicate a person's birth date, and the last four digits are a sequence number. An example could be 010203-1234, which indicates a woman born February 1, 1903.

In C# you might model a CPR number as a class with a constructor like this:

public CprNumber(int dayint monthint yearint sequenceNumber)
{
    if (year < 0 || 99 < year)
        throw new ArgumentOutOfRangeException(
            nameof(year),
            "Year must be between 0 and 99, inclusive.");
    if (month < 1 || 12 < month)
        throw new ArgumentOutOfRangeException(
            nameof(month),
            "Month must be between 1 and 12, inclusive.");
    if (sequenceNumber < 0 || 9999 < sequenceNumber)
        throw new ArgumentOutOfRangeException(
            nameof(sequenceNumber),
            "Sequence number must be between 0 and 9999, inclusive.");
 
    var fourDigitYear = CalculateFourDigitYear(year, sequenceNumber);
    var daysInMonth = DateTime.DaysInMonth(fourDigitYear, month);
    if (day < 1 || daysInMonth < day)
        throw new ArgumentOutOfRangeException(
            nameof(day),
            $"Day must be between 1 and {daysInMonth}, inclusive.");
 
    this.day = day;
    this.month = month;
    this.year = year;
    this.sequenceNumber = sequenceNumber;
}

The system has been around since 1968 so clearly suffers from a Y2k problem, as years are encoded with only two digits. The workaround for this is that the most significant digit of the sequence number encodes the century. At the time I'm writing this, the Danish-language wikipedia entry for CPR-nummer still includes a table that shows how one can derive the century from the sequence number. This enables the CPR system to handle birth dates between 1858 and 2057.

The CprNumber constructor has to consult that table in order to determine the century. It uses the CalculateFourDigitYear function for that. Once it has the four-digit year, it can use the DateTime.DaysInMonth method to determine the number of days in the given month. This is used to validate the day parameter.

The previous article showed a test that made use of a Test Data Generator for the CprNumber class. The generator was referenced as Gen.CprNumber, but how do you define such a generator?

CPR number generator #

The constructor arguments for month, year, and sequenceNumber are easy to generate. You need a basic generator that produces values between two boundaries. Both QuickCheck and FsCheck call it choose, so I'll reuse that name:

public static Generator<intChoose(int minint max)
{
    return new Generator<int>(r => r.Next(min, max + 1));
}

The choose functions of QuickCheck and FsCheck consider both boundaries to be inclusive, so I've done the same. That explains the + 1, since Random.Next excludes the upper boundary.

You can now combine choose with DateTime.DaysInMonth to generate a valid day:

public static Generator<intDay(int yearint month)
{
    var daysInMonth = DateTime.DaysInMonth(year, month);
    return Gen.Choose(1, daysInMonth);
}

Let's pause and consider the implications. The point of this example is to demonstrate why it's practically useful that Test Data Generators are monads. Keep in mind that monads are functors you can flatten. When do you need to flatten a functor? Specifically, when do you need to flatten a Test Data Generator? Right now, as it turns out.

The Day method returns a Generator<int>, but where do the year and month arguments come from? They'll typically be produced by another Test Data Generator such as choose. Thus, if you only map (Select) over previous Test Data Generators, you'll produce a Generator<Generator<int>>:

Generator<intgenYear = Gen.Choose(1970, 2050);
Generator<intgenMonth = Gen.Choose(1, 12);
Generator<(intint)> genYearAndMonth = genYear.Apply(genMonth);
 
Generator<Generator<int>> genDay =
    genYearAndMonth.Select(t => Gen.Choose(1, DateTime.DaysInMonth(t.Item1, t.Item2)));

This example uses an Apply overload to combine genYear and genMonth. As long as the two generators are independent of each other, you can use the applicative functor capability to combine them. When, however, you need to produce a new generator from a value produced by a previous generator, the functor or applicative functor capabilities are insufficient. If you try to use Select, as in the above example, you'll produce a nested generator.

Since it's a monad, however, you can Flatten it:

Generator<intflattened = genDay.Flatten();

Or you can use SelectMany (monadic bind) to flatten as you go. The CprNumber generator does that, although it uses query syntax syntactic sugar to make the code more readable:

public static Generator<CprNumber> CprNumber =>
    from sequenceNumber in Gen.Choose(0, 9999)
    from year in Gen.Choose(0, 99)
    from month in Gen.Choose(1, 12)
    let fourDigitYear =
        TestDataBuilderFunctor.CprNumber.CalculateFourDigitYear(year, sequenceNumber)
    from day in Gen.Day(fourDigitYear, month)
    select new CprNumber(day, month, year, sequenceNumber);

The expression first uses Gen.Choose to produce three independent int values: sequenceNumber, year, and month. It then uses the CalculateFourDigitYear function to look up the proper century based on the two-digit year and the sequenceNumber. With that information it can call Gen.Day, and since the expression uses monadic composition, it's flattening as it goes. Thus day is an int value rather than a generator.

Finally, the entire expression can compose the four int values into a valid CprNumber object.

You can consult the previous article to see Gen.CprNumber in use.

Hedgehog CPR generator #

You can reproduce the CPR example in F# using one of several property-based testing frameworks. In this example, I'll continue the example from the previous article as well as the article Danish CPR numbers in F#. You can see a couple of tests in these articles. They use the cprNumber generator, but never show the code.

In all the property-based testing frameworks I've seen, generators are called Gen. This is also the case for Hedgehog. The Gen container is a monad, and there's a gen computation expression that supplies syntactic sugar.

You can translate the above example to a Hedgehog Gen value like this:

let cprNumber =
    gen {
        let! sequenceNumber = Range.linear 0 9999 |> Gen.int32
        let! year           = Range.linear 0   99 |> Gen.int32
        let! month          = Range.linear 1   12 |> Gen.int32
        let fourDigitYear   = Cpr.calculateFourDigitYear year sequenceNumber
        let daysInMonth     = DateTime.DaysInMonth (fourDigitYear, month)
        let! day            = Range.linear 1 daysInMonth |> Gen.int32
        return Cpr.tryCreate day month year sequenceNumber }
    |> Gen.some

To keep the example simple, I haven't defined an explicit day generator, but instead just inlined DateTime.DaysInMonth.

Consult the articles that I linked above to see the Gen.cprNumber generator in use.

Conclusion #

Test Data Generators form monads. This is useful when you need to generate test data that depend on other generated test data. Monadic bind (SelectMany in C#) can flatten the generator functor as you go. This article showed examples in both C# and F#.

The same abstraction also exists in the Haskell QuickCheck library, but I haven't shown any Haskell examples. If you've taken the trouble to learn Haskell (which you should), you already know what a monad is.

Next: Functor relationships.


Comments

CsCheck is a full implementation of something along these lines. It uses the same random sample generation in the shrinking step always reducing a Size measure. It turns out to be a better way of shrinking than the QuickCheck way.

2023-02-27 22:51 UTC

Anthony, thank you for writing. You'll be pleased to learn, I take it, that the next article in the series about the epistemology of interaction testing uses CsCheck as the example framework.

2023-02-28 7:33 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 27 February 2023 07:10:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 27 February 2023 07:10:00 UTC