The Test Data Generator functor by Mark Seemann
A Test Data Generator modelled as a functor.
In a previous article series, you learned that while it's possible to model Test Data Builders as a functor, it adds little value. You shouldn't, however, dismiss the value of functors. It's an abstraction that applies broadly.
Closely related to Test Data Builders is the concept of a generator of random test data. You could call it a Test Data Generator instead. Such a generator can be modelled as a functor.
A C# Generator #
At its core, the idea behind a Test Data Generator is to create random test data. Still, you'd like to be able to control various parts of the process, because you often need to pin parts of the generated data to deterministic values, while allowing other parts to vary randomly.
In C#, you can write a generic Generator like this:
public class Generator<T>
{
    private readonly Func<Random, T> generate;
    public Generator(Func<Random, T> generate)
    {
        if (generate == null)
            throw new ArgumentNullException(nameof(generate));
        this.generate = generate;
    }
    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null)
            throw new ArgumentNullException(nameof(f));
        Func<Random, T1> newGenerator = r => f(this.generate(r));
        return new Generator<T1>(newGenerator);
    }
    public T Generate(Random random)
    {
        if (random == null)
            throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}
The Generate method takes a Random object as input, and produces a value of the generic type T as output. This enables you to deterministically reproduce a particular randomly generated value, if you know the seed of the Random object.
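To make the point about seeds concrete, here is a minimal sketch. It repeats a condensed copy of Generator<T> so that it compiles on its own; the SeedExample class, the SmallInt generator, and the GenerateWithSeed method are names invented for this illustration.

```csharp
using System;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

public static class SeedExample
{
    // Hypothetical generator of integers from 0 to 999.
    public static readonly Generator<int> SmallInt =
        new Generator<int>(r => r.Next(1000));

    // Creates a fresh Random from the seed and generates a single value.
    public static int GenerateWithSeed(int seed)
    {
        return SmallInt.Generate(new Random(seed));
    }
}
```

Calling SeedExample.GenerateWithSeed twice with the same seed should return the same number, which is what lets you replay a failing test case.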
Notice how Generator<T> is a simple Adapter over a (lazily evaluated) function. This function also takes a Random object as input, and produces a T value as output. (For the FP enthusiasts, this is simply the Reader functor in disguise.)
The Select method makes Generator<T> a functor. It takes a map function f as input, and uses it to define a new generate function. The return value is a Generator<T1>.
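A functor's map function is expected to obey the identity and composition laws. The following sketch spot-checks both for Select with a given seed; it's a sanity check rather than a proof. The FunctorLawsExample class and its members are hypothetical names, and Generator<T> is repeated in condensed form so the block stands alone.

```csharp
using System;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

public static class FunctorLawsExample
{
    // Hypothetical generator of integers from 0 to 999.
    public static readonly Generator<int> SmallInt =
        new Generator<int>(r => r.Next(1000));

    // Identity: mapping with x => x should not change the generated value.
    public static bool IdentityHolds(int seed)
    {
        var expected = SmallInt.Generate(new Random(seed));
        var actual = SmallInt.Select(x => x).Generate(new Random(seed));
        return expected == actual;
    }

    // Composition: mapping f and then g should equal mapping their composition.
    public static bool CompositionHolds(int seed)
    {
        Func<int, int> f = x => x + 1;
        Func<int, string> g = x => x.ToString();
        var left = SmallInt.Select(f).Select(g).Generate(new Random(seed));
        var right = SmallInt.Select(x => g(f(x))).Generate(new Random(seed));
        return left == right;
    }
}
```

Both methods compare values generated from Random objects created with the same seed, so that any difference would have to come from Select itself.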
General-purpose building blocks #
Functors are eminently composable. You can compose complex Test Data Generators from simpler building blocks, like the following.
For instance, you may need a generator of alphanumeric strings. You can write it like this:
private const string alphaNumericCharacters =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static Generator<string> AlphaNumericString =
    new Generator<string>(r =>
    {
        var length = r.Next(25); // Arbitrarily chosen max length
        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            var idx = r.Next(alphaNumericCharacters.Length);
            chars[i] = alphaNumericCharacters[idx];
        }
        return new string(chars);
    });
This Generator<string> can generate a random string with alphanumeric characters. It randomly picks a length between 0 and 24, and fills it with randomly selected alphanumeric characters. The maximum length of 24 is arbitrarily chosen. The generated string may be empty.
Notice that the argument passed to the constructor is a function. It's not evaluated at initialisation, but only when Generate is called.
The r argument is the Random object passed to Generate.
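If you want to convince yourself that nothing runs until Generate is called, you could count invocations of the wrapped function. This is only a sketch; LazinessExample and CallCounts are hypothetical names, and Generator<T> is repeated in condensed form.

```csharp
using System;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

public static class LazinessExample
{
    // Returns the number of times the wrapped function has run,
    // first after defining the generators, then after one Generate call.
    public static int[] CallCounts()
    {
        var calls = 0;
        var gen = new Generator<int>(r => { calls++; return r.Next(10); });
        var mapped = gen.Select(x => x * 2); // Defining a mapping runs nothing.
        var afterDefinition = calls;

        mapped.Generate(new Random(1));      // Only now does the function run.
        var afterGenerate = calls;

        return new[] { afterDefinition, afterGenerate };
    }
}
```

The first count is taken after both generators are defined, and the second after a single call to Generate.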
Another useful general-purpose building block is a generator that can use a single-object generator to create many objects:
public static Generator<IEnumerable<T>> Many<T>(Generator<T> generator)
{
    return new Generator<IEnumerable<T>>(r =>
    {
        var length = r.Next(25); // Arbitrarily chosen max length
        var elements = new List<T>();
        for (int i = 0; i < length; i++)
            elements.Add(generator.Generate(r));
        return elements;
    });
}
This method takes a Generator<T> as input, and uses it to generate zero or more T objects. Again, the maximum length of 24 is arbitrarily chosen. It could have been a method argument, but in order to keep the example simple, I hard-coded it.
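If you'd rather pass the maximum length as an argument, an overload could look like the following sketch. The maxLength parameter, its inclusive upper bound, the ManyExample class, and the Digit generator are all assumptions made for this illustration; Generator<T> is repeated in condensed form.

```csharp
using System;
using System.Collections.Generic;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

public static class ManyExample
{
    // Hypothetical single-value generator used to demonstrate Many.
    public static readonly Generator<int> Digit =
        new Generator<int>(r => r.Next(10));

    // Generates between 0 and maxLength elements (both inclusive).
    public static Generator<IEnumerable<T>> Many<T>(
        Generator<T> generator,
        int maxLength)
    {
        if (generator == null)
            throw new ArgumentNullException(nameof(generator));
        if (maxLength < 0 || maxLength == int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(maxLength));

        return new Generator<IEnumerable<T>>(r =>
        {
            // Random.Next takes an exclusive upper bound, hence the + 1.
            var length = r.Next(maxLength + 1);
            var elements = new List<T>();
            for (int i = 0; i < length; i++)
                elements.Add(generator.Generate(r));
            return elements;
        });
    }
}
```

With this overload, ManyExample.Many(ManyExample.Digit, 5) would generate sequences of at most five digits.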
Domain-specific generators #
From such general-purpose building blocks, you can define custom generators for your domain model. This enables you to use such generators in your unit tests.
In order to generate post codes, you can combine the AlphaNumericString and the Many generators:
public static Generator<PostCode> PostCode =
    new Generator<PostCode>(r =>
    {
        var postCodes = Many(AlphaNumericString).Generate(r);
        return new PostCode(postCodes.ToArray());
    });
The PostCode class is part of your domain model; it takes an array of strings as input to its constructor. The PostCode generator uses the AlphaNumericString generator as input to the Many method. This generates zero or more alphanumeric strings, which you can pass to the PostCode constructor.
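Since Generator<T> is a functor, you could also define the post code generator by mapping the output of Many with Select, instead of calling Generate inside a new function. The following sketch shows that variation. The PostCode class here is a hypothetical stand-in (the Parts property is an assumption), and the other types are repeated so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

// Hypothetical stand-in for the domain class; only the constructor
// taking an array of strings is described in the article.
public class PostCode
{
    public PostCode(params string[] parts) { Parts = parts; }
    public string[] Parts { get; }
}

public static class Gen
{
    private const string alphaNumericCharacters =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static readonly Generator<string> AlphaNumericString =
        new Generator<string>(r =>
        {
            var chars = new char[r.Next(25)];
            for (int i = 0; i < chars.Length; i++)
                chars[i] = alphaNumericCharacters[r.Next(alphaNumericCharacters.Length)];
            return new string(chars);
        });

    public static Generator<IEnumerable<T>> Many<T>(Generator<T> generator)
    {
        return new Generator<IEnumerable<T>>(r =>
        {
            var length = r.Next(25);
            var elements = new List<T>();
            for (int i = 0; i < length; i++)
                elements.Add(generator.Generate(r));
            return elements;
        });
    }

    // Declared after AlphaNumericString, because static fields are
    // initialised in textual order and this initialiser reads that field.
    public static readonly Generator<PostCode> PostCode =
        Many(AlphaNumericString).Select(xs => new PostCode(xs.ToArray()));
}
```

The result should behave like the generator above; the difference is only that the composition is expressed with Select rather than with an explicit function over Random.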
This, in turn, gives you all the building blocks you need to generate Address objects:
public static Generator<Address> Address =
    new Generator<Address>(r =>
    {
        var street = AlphaNumericString.Generate(r);
        var city = AlphaNumericString.Generate(r);
        var postCode = PostCode.Generate(r);
        return new Address(street, city, postCode);
    });
This Generator<Address> uses the AlphaNumericString generator to generate street and city strings. It uses the PostCode generator to generate a PostCode object. All these objects are passed to the Address constructor.
Keep in mind that all of this logic is defined in lazily evaluated functions. Only when you invoke the Generate method on a generator does the code execute.
Generating values #
You can now write tests similar to the tests shown in the article series about Test Data Builders. If, for example, you need an address in Paris, you can generate it like this:
var rnd = new Random();
var address = Gen.Address.Select(a => a.WithCity("Paris")).Generate(rnd);
Gen.Address is the Address generator shown above; I put all those generators in a static class called Gen. If you don't modify it, Gen.Address will generate a random Address object, but by using Select, you can pin the city to Paris.
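The WithCity method isn't shown in this article. One plausible shape is a copy-and-update method on an immutable class, as in the following sketch. Everything here is an assumption made for illustration: the simplified Address class keeps its post code as a plain string, and the Gen.Address generator is a stand-in that produces made-up values.

```csharp
using System;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

// Hypothetical, simplified Address with a string post code.
public class Address
{
    public Address(string street, string city, string postCode)
    {
        Street = street;
        City = city;
        PostCode = postCode;
    }

    public string Street { get; }
    public string City { get; }
    public string PostCode { get; }

    // Copy-and-update: returns a new Address; the original is unchanged.
    public Address WithCity(string newCity)
    {
        return new Address(Street, newCity, PostCode);
    }
}

public static class Gen
{
    // Stand-in generator producing made-up street, city, and post code values.
    public static readonly Generator<Address> Address =
        new Generator<Address>(r => new Address(
            "Street " + r.Next(1000),
            "City " + r.Next(1000),
            r.Next(100000).ToString("D5")));
}

public static class PinExample
{
    // Pins the city to Paris while street and post code still vary.
    public static Address AddressInParis(int seed)
    {
        return Gen.Address.Select(a => a.WithCity("Paris")).Generate(new Random(seed));
    }
}
```

With the same seed, the pinned address should keep the street and post code that the unmodified generator would have produced; only the city is replaced.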
You can also start with one type of generator and use Select to map to another type of generator, like this:
var rnd = new Random();
var address = Gen.PostCode
    .Select(pc => new Address("Rue Morgue", "Paris", pc))
    .Generate(rnd);
You use Gen.PostCode as the initial generator, and then Select a new Address in Rue Morgue, Paris, with a randomly generated post code.
Functor #
Such a Test Data Generator is a functor. One way to see that is to use query syntax instead of the fluent API:
var rnd = new Random();
var address = (from a in Gen.Address select a.WithCity("Paris")).Generate(rnd);
Likewise, you can also translate the Rue Morgue generator to query syntax:
var address = (
    from pc in Gen.PostCode
    select new Address("Rue Morgue", "Paris", pc)).Generate(rnd);
This is, however, awkward, because you have to enclose the query expression in brackets in order to be able to invoke the Generate method. Alternatively, you can separate the query from the generation, like this:
var g = from a in Gen.Address select a.WithCity("Paris");
var rnd = new Random();
var address = g.Generate(rnd);
Or this:
var g =
    from pc in Gen.PostCode
    select new Address("Rue Morgue", "Paris", pc);
var rnd = new Random();
var address = g.Generate(rnd);
You'd probably still prefer the fluent API over this syntax. The reason I show this alternative is to demonstrate that the functor gives you the ability to separate the definition of data generation from the actual generation. In order to emphasise this point, I defined the g variables before creating the Random object rnd.
Property-based testing #
The above Generator<T> is only a crude example of a Test Data Generator. In order to demonstrate how such a generator is a functor, I left out several useful features. Still, this should have given you a sense for how the Generator<T> class itself, as well as such general-purpose building blocks as Many and AlphaNumericString, could be packaged in a reusable library.
The examples above show how to use a generator to create a single random object. You could, however, easily generate many (say, 100) random objects, and run unit tests for each object created. This is the idea behind property-based testing.
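A bare-bones version of that idea can be sketched with the Generator<T> class from this article. The HoldsForAll helper below is hypothetical and deliberately naive: it only reports whether a predicate held for every generated value, and does none of the shrinking or reporting that real property-based testing libraries offer.

```csharp
using System;

// Condensed copy of the Generator<T> class shown above.
public class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate ?? throw new ArgumentNullException(nameof(generate));
    }

    public Generator<T1> Select<T1>(Func<T, T1> f)
    {
        if (f == null) throw new ArgumentNullException(nameof(f));
        return new Generator<T1>(r => f(this.generate(r)));
    }

    public T Generate(Random random)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        return this.generate(random);
    }
}

public static class PropertyExample
{
    private const string alphaNumericCharacters =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Same behaviour as the AlphaNumericString generator shown earlier.
    public static readonly Generator<string> AlphaNumericString =
        new Generator<string>(r =>
        {
            var chars = new char[r.Next(25)];
            for (int i = 0; i < chars.Length; i++)
                chars[i] = alphaNumericCharacters[r.Next(alphaNumericCharacters.Length)];
            return new string(chars);
        });

    // Generates 'count' values and returns false at the first value
    // for which the property does not hold; otherwise returns true.
    public static bool HoldsForAll<T>(
        Generator<T> generator,
        Func<T, bool> property,
        Random random,
        int count = 100)
    {
        for (int i = 0; i < count; i++)
            if (!property(generator.Generate(random)))
                return false;
        return true;
    }
}
```

For example, HoldsForAll(AlphaNumericString, s => s.Length < 25, new Random(42)) checks the length claim made earlier against 100 generated strings.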
There's more to property-based testing than generation of random values, but the implementations I've seen are all based on Test Data Generators as functors (and monads).
FsCheck #
FsCheck is an open source F# library for property-based testing. It defines a Gen functor (and monad) that you can use to generate Address values, just like the above examples:
let! address = Gen.address |> Gen.map (fun a -> { a with City = "Paris"} )
Here, Gen.address is a Gen<Address> value. By itself, it'll generate random Address values, but by using Gen.map, you can pin the city to Paris.
The map function corresponds to the C# Select method. In functional programming, map is the most common name, although Haskell calls the function fmap; the Select name is, in fact, the odd man out.
Likewise, you can map from one generator type to another:
let! address =
    Gen.postCode
    |> Gen.map (fun pc ->
        { Street = "Rue Morgue"; City = "Paris"; PostCode = pc })
This example uses Gen.postCode as the initial generator. This is, as the name implies, a Gen<PostCode> value. For every random PostCode value generated, map turns it into an address in Rue Morgue, Paris.
There's more going on here than I'd like to cover in this article. The use of let! syntax actually requires Gen<'a> to be a monad (which it is), but that's a topic for another day. Both of these examples are contained in a computation expression, and the implication of that is that the address values represent a multitude of randomly generated Address values.
Hedgehog #
Hedgehog is another open source F# library for property-based testing. With Hedgehog, the Address code examples look like this:
let! address = Gen.address |> Gen.map (fun a -> { a with City = "Paris"} )
And:
let! address =
    Gen.postCode
    |> Gen.map (fun pc ->
        { Street = "Rue Morgue"; City = "Paris"; PostCode = pc })
Did you notice something?
This is literally the same syntax as FsCheck! This isn't because Hedgehog is copying FsCheck, but because both are based on the same underlying abstraction: functor (and monad). There are other parts of the API where Hedgehog differs from FsCheck, but their generators are similar.
This is one of the most important advantages of using well-known abstractions like functors. Once you understand such an abstraction, it's easy to learn a new library. With professional experience with FsCheck, it only took me a few minutes to figure out how to use Hedgehog.
Summary #
Functors are well-defined objects from category theory. They may seem abstract, and far removed from 'real' programming, but they're extraordinarily useful. Many category theory abstractions can be applied to a host of different situations. Once you've learned what a functor is, you'll find it easy to learn to use new libraries that build on that abstraction.
In this article you saw a sketch of how the functor abstraction can be used to model Test Data Generators. Contrary to Test Data Builders, which turned out to be a redundant abstraction, a Test Data Generator is truly useful.
Many years ago, I had the idea to create a Test Data Generator for unit testing purposes. I called it AutoFixture, and although it's had some success, the API isn't as clean as it could be. Back then, I didn't know about functors, so I had to invent an API for AutoFixture. This API is proprietary to AutoFixture, so anyone learning AutoFixture must learn this particular API, and its abstractions. It would have been so much easier for all involved if I had designed AutoFixture as a functor instead.
Comments
I'm curious as to what the "useful features" are that you left out of the Test Data Generator?
Stuart, thank you for writing. Test Data Generators like the one described here are rich data structures that you can do a lot of interesting things with. As described here, the generator only generates a single value every time you invoke its Generate method. What property-based testing libraries like QuickCheck, FsCheck, and Hedgehog do is that instead of a single random value, they generate many values (the default number seems to be 100).
These property-based testing libraries tend to then 'elevate' their generators into another type of data structure called Arbitraries, and these again into Properties. What typically happens is that they use the Generators to generate values, but for each generated value, they evaluate the associated Property. If all Properties succeed, nothing more happens, but in the case of a test failure, no more values are generated. Instead, the libraries switch to a state where they attempt to shrink the counter-example to a simpler counter-example. They use a Shrinker associated with the Arbitrary to do this. The end result is that if your test doesn't hold, you'll get an easy-to-understand example of the input that caused the test to fail.
Apart from that, there are many other features of Test Data Generators that I left out. Some of these include ways to combine several Generators to a single Generator. It turns out that Test Data Generators are also Applicative Functors and Monads, and you can use these traits to define powerful combinators. In the future, I'll publish more articles on this topic, but it'll take months, because my article queue has quite a few other articles in front of those.
If you want to explore this topic, I'd recommend playing with FsCheck. While it's written in F#, it also works from C#, and its documentation includes C# examples as well. Hedgehog may also work from C#, but being a newer, more experimental library, its documentation is still sparse.
That's right. Hedgehog may be used from C# as well.