Generalised Test Data Builder

This article presents a generalised Test Data Builder.

This is the second in a series of articles about the relationship between the Test Data Builder design pattern and the identity functor. The previous article was a review of the Test Data Builder pattern.

Boilerplate #

While the Test Data Builder is an incredibly versatile and useful design pattern, it has a problem. In languages like C# and Java, it's difficult to generalise. This leads to an excess of boilerplate code.

Expanding on Nat Pryce's original example, an InvoiceBuilder is composed of other builders:

public class InvoiceBuilder
{
    private Recipient recipient;
    private IReadOnlyCollection<InvoiceLine> lines;
 
    public InvoiceBuilder()
    {
        this.recipient = new RecipientBuilder().Build();
        this.lines = new List<InvoiceLine> { new InvoiceLineBuilder().Build() };
    }
 
    public InvoiceBuilder WithRecipient(Recipient newRecipient)
    {
        this.recipient = newRecipient;
        return this;
    }
 
    public InvoiceBuilder WithInvoiceLines(
        IReadOnlyCollection<InvoiceLine> newLines)
    {
        this.lines = newLines;
        return this;
    }
 
    public Invoice Build()
    {
        return new Invoice(recipient, lines);
    }
}

In order to create a Recipient, a RecipientBuilder is used. Likewise, in order to create a single InvoiceLine, an InvoiceLineBuilder is used. This pattern repeats in the RecipientBuilder:

public class RecipientBuilder
{
    private string name;
    private Address address;
 
    public RecipientBuilder()
    {
        this.name = "";
        this.address = new AddressBuilder().Build();
    }
 
    public RecipientBuilder WithName(string newName)
    {
        this.name = newName;
        return this;
    }
 
    public RecipientBuilder WithAddress(Address newAddress)
    {
        this.address = newAddress;
        return this;
    }
 
    public Recipient Build()
    {
        return new Recipient(this.name, this.address);
    }
}

In order to create an Address object, an AddressBuilder is used.

Generalisation attempts #

You can describe the pattern in a completely automatable manner:

1. For each domain class, create a corresponding Builder class.
2. For each class field or property in the domain class, define a corresponding field or property in the Builder.
3. In the Builder's constructor, initialise each field or property with a 'good' default value.
   - If the field is a primitive value, such as a string or integer, hard-code an appropriate value.
   - If the field is a complex domain type, use that type's corresponding Builder to create the default value.
4. For each class field or property, add a With[...] method that changes the field and returns the Builder itself.
5. Add a Build method that returns a new instance of the domain class with the constituent values collected so far.
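
As a sketch of what these steps produce when followed by hand, here is a builder for Address, with the steps numbered in the order they are listed above. AddressBuilder is mentioned but not shown in this article, so its members are an assumption; PostCodeBuilder and the cut-down, mutable Address and PostCode classes are hypothetical stand-ins that only exist to keep the example self-contained:

```csharp
// Hypothetical, minimal domain classes; the real ones may well differ.
public class PostCode
{
}

public class Address
{
    public Address(string street, string city, PostCode postCode)
    {
        this.Street = street;
        this.City = city;
        this.PostCode = postCode;
    }

    public string Street { get; set; }
    public string City { get; set; }
    public PostCode PostCode { get; set; }
}

// Step 1: one Builder class per domain class.
public class PostCodeBuilder
{
    public PostCode Build()
    {
        return new PostCode();
    }
}

public class AddressBuilder
{
    // Step 2: one field per value needed to create the domain object.
    private string street;
    private string city;
    private PostCode postCode;

    public AddressBuilder()
    {
        // Step 3: hard-code primitive defaults, and delegate complex
        // types to their own Builder.
        this.street = "";
        this.city = "";
        this.postCode = new PostCodeBuilder().Build();
    }

    // Step 4: one With[...] method per field, returning the Builder itself.
    public AddressBuilder WithStreet(string newStreet)
    {
        this.street = newStreet;
        return this;
    }

    public AddressBuilder WithCity(string newCity)
    {
        this.city = newCity;
        return this;
    }

    public AddressBuilder WithPostCode(PostCode newPostCode)
    {
        this.postCode = newPostCode;
        return this;
    }

    // Step 5: create the domain object from the values collected so far.
    public Address Build()
    {
        return new Address(this.street, this.city, this.postCode);
    }
}
```

Nothing in this class requires a decision; every line follows mechanically from the shape of Address, which is what makes the pattern a candidate for automation.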

When you can deterministically describe an automatable process, you can write code to automate it.

People have already done that. After having written individual Test Data Builders for a couple of months, I got tired of it and wrote AutoFixture. It uses Reflection to build objects at run-time, but I've also witnessed attempts to automate Test Data Builders via automated code generation.

AutoFixture has been moderately successful, but some people find its API difficult to learn. Likewise, code generation comes with its own issues.

In languages like C# or Java, it's difficult to identify a better generalisation.

Generic Builder #

Instead of trying to automate the Test Data Builder pattern, you can pursue a different strategy. At first, it doesn't look all that promising, but if you soldier on, it'll reveal meaningful insights.

As an alternative to replicating the Test Data Builder pattern exactly, you can define a single generically typed Builder class:

public class Builder<T>
{
    private readonly T item;
 
    public Builder(T item)
    {
        if (item == null)
            throw new ArgumentNullException(nameof(item));
 
        this.item = item;
    }
 
    public Builder<T1> Select<T1>(Func<T, T1> f)
    {
        var newItem = f(this.item);
        return new Builder<T1>(newItem);
    }
 
    public T Build()
    {
        return this.item;
    }
 
    public override bool Equals(object obj)
    {
        var other = obj as Builder<T>;
        if (other == null)
            return base.Equals(obj);
 
        return object.Equals(this.item, other.item);
    }
 
    public override int GetHashCode()
    {
        return this.item.GetHashCode();
    }
}

The Builder<T> class reduces the Test Data Builder design pattern to the essentials:

A constructor that initialises the Builder with default data.
A single fluent interface Select method, which returns a new Builder object.
A Build method, which returns the built object.

Perhaps you wonder about the name of the Select method, but there's a good reason for that; you'll learn about it later.

This example of a generic Builder class overrides Equals (and, therefore, also GetHashCode). It doesn't have to do that, but there's a good reason for it, which we'll also come back to later.

It doesn't seem particularly useful, and a first attempt at using it seems to confirm such scepticism:

var address = Build.Address().Select(a =>
{
    a.City = "Paris";
    return a;
}).Build();

This example first uses Build.Address() to create an initial Builder object with appropriate defaults. This static method is defined on the static Build class:

public static Builder<Address> Address()
{
    return new Builder<Address>(new Address("", "", PostCode().Build()));
}

Contrary to Builder<T>, which is a reusable, general-purpose class, the static Build class is an example of a collection of Test Utility Methods specific to the domain model you're testing. Notice how the Build.Address() method uses Build.PostCode().Build() to create a default value for the initial Address object's post code.

The above example passes a C# code block to the Select method. It takes the a (Address) object as input, specifically mutates its City property, and returns it. This syntax is crude, but works. It may look acceptable when pinning a single City property, but it quickly becomes awkward:

var invoice = Build.Invoice().Select(i =>
    {
        i.Recipient = Build.Recipient().Select(r =>
        {
            r.Address = Build.Address().WithNoPostCode().Build();
            return r;
        }).Build();
        return i;
    }).Build();

Not only is it difficult to get right when writing such nested statements, it's also hard to read. You can, however, correct that problem, as you'll see in a little while.

Before we commence on making the code prettier, you may have noticed that the Select method returns a Builder with a different generic type argument than it contains. The Select method on a Builder<T> object has the signature public Builder<T1> Select<T1>(Func<T, T1> f). Until now, however, all the examples you've seen return the input object. In those examples, T is the same as T1. For completeness' sake, here's an example of a proper change of type:

var address = Build.PostCode()
    .Select(pc => new Address("Rue Morgue", "Paris", pc))
    .Build();

This example uses a Builder<PostCode> to create a new Address object. Plugging in the types, T becomes PostCode, and T1 becomes Address.
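
To make the type change easier to follow, here is the same call with the intermediate types spelled out instead of inferred. This is only a sketch: Builder<T> is a trimmed copy of the class above (Equals and GetHashCode omitted), Address and PostCode are minimal stand-ins for the domain classes, and TypeChangeExample is a hypothetical name:

```csharp
using System;

// Minimal stand-ins for the domain classes; the real ones may differ.
public class PostCode
{
}

public class Address
{
    public Address(string street, string city, PostCode postCode)
    {
        this.Street = street;
        this.City = city;
        this.PostCode = postCode;
    }

    public string Street { get; set; }
    public string City { get; set; }
    public PostCode PostCode { get; set; }
}

// Trimmed copy of Builder<T>; Equals and GetHashCode are left out.
public class Builder<T>
{
    private readonly T item;

    public Builder(T item)
    {
        if (item == null)
            throw new ArgumentNullException(nameof(item));

        this.item = item;
    }

    public Builder<T1> Select<T1>(Func<T, T1> f)
    {
        return new Builder<T1>(f(this.item));
    }

    public T Build()
    {
        return this.item;
    }
}

public static class TypeChangeExample
{
    public static Address Run()
    {
        // Here, T is PostCode.
        Builder<PostCode> postCodeBuilder =
            new Builder<PostCode>(new PostCode());

        // The function has the type Func<PostCode, Address>, so the
        // compiler infers T1 to be Address.
        Func<PostCode, Address> f =
            pc => new Address("Rue Morgue", "Paris", pc);
        Builder<Address> addressBuilder = postCodeBuilder.Select(f);

        return addressBuilder.Build();
    }
}
```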

Perhaps you noticed that this example looks a little better than the previous examples. Instead of having to supply a C# code block, with return statement and all, this call to Select passes a proper (lambda) expression.

Expressions from extensions #

It'd be nice if you could use expressions, instead of full code blocks, with the Select method. As a first step, you could write some test-specific extension methods for your domain model, like this:

public static Address WithCity(this Address address, string newCity)
{
    address.City = newCity;
    return address;
}

This is the same code as one of the code blocks above, only refactored to a named extension method. It simplifies use of the generic Builder, though:

var address = Build.Address().Select(a => a.WithCity("Paris")).Build();

That looks good in such a simple example, but unfortunately isn't much of an improvement when it comes to a more complex case:

var invoice =
    Build.Invoice()
        .Select(i => i
            .WithRecipient(Build.Recipient()
                .Select(r => r
                    .WithAddress(Build.Address()
                        .WithNoPostCode()
                        .Build()))
                .Build()))
        .Build();

If, at this point, you're tempted to give up on the overall strategy with a single generic Builder, you'd be excused. It will, however, turn out to be beneficial to carry on. There are more obstacles, but eventually, things will start to fall into place.

Copy and update #

The above WithCity extension method mutates the input object, which can lead to surprising behaviour. While it's a common way to implement fluent interfaces in object-oriented languages, nothing prevents you from making the code saner. Instead of mutating the input object, create a new object with the single value changed:

public static Address WithCity(this Address address, string newCity)
{
    return new Address(address.Street, newCity, address.PostCode);
}

Some people will immediately be concerned about the performance implications of doing this, but you're not one of those people, are you?

Granted, there's allocation and garbage collection overhead from creating new objects like this, but I'd digress if I started to discuss this here. In most cases, the impact is insignificant.
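
To make the 'surprising behaviour' concrete, here is a minimal sketch that puts the two variants side by side. The name WithCityMutating is hypothetical and exists only so that both versions can coexist in one example; Address and PostCode are cut-down stand-ins for the domain classes, and AliasingExample is likewise made up for illustration:

```csharp
// Cut-down, still mutable stand-ins for the domain classes.
public class PostCode
{
}

public class Address
{
    public Address(string street, string city, PostCode postCode)
    {
        this.Street = street;
        this.City = city;
        this.PostCode = postCode;
    }

    public string Street { get; set; }
    public string City { get; set; }
    public PostCode PostCode { get; set; }
}

public static class AddressExtensions
{
    // The first version: changes the object it was given.
    public static Address WithCityMutating(
        this Address address, string newCity)
    {
        address.City = newCity;
        return address;
    }

    // Copy and update: leaves the input alone and returns a new object.
    public static Address WithCity(this Address address, string newCity)
    {
        return new Address(address.Street, newCity, address.PostCode);
    }
}

public static class AliasingExample
{
    public static string[] Run()
    {
        var london = new Address("", "London", new PostCode());

        // Copy and update creates a separate object.
        var paris = london.WithCity("Paris");
        var afterCopy = london.City;     // still "London"

        // Mutation returns the very same object it was given.
        var berlin = london.WithCityMutating("Berlin");
        var afterMutation = london.City; // now "Berlin"

        return new[] { paris.City, afterCopy, berlin.City, afterMutation };
    }
}
```

After the mutating call, the variable named london holds an address in Berlin, because london and berlin refer to the same object. With the copying version, that kind of action at a distance can't happen.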

Fluent domain model #

Using extension methods enables you to use a more elegant syntax with the Select method, but there's still some maintenance overhead. If, for now, you accept such maintenance overhead, you could ask: given that you have to define and maintain all those With[...] methods, why limit them to your test code?

Would there be any harm in defining them as proper methods on your domain model?

public Address WithCity(string newCity)
{
    return new Address(this.Street, newCity, this.PostCode);
}

The above example shows the WithCity method as an instance method on the Address class. Here's the entire Address class, refactored to an immutable class:

public class Address
{
    public string Street { get; }
    public string City { get; }
    public PostCode PostCode { get; }
 
    public Address(string street, string city, PostCode postCode)
    {
        if (street == null)
            throw new ArgumentNullException(nameof(street));
        if (city == null)
            throw new ArgumentNullException(nameof(city));
        if (postCode == null)
            throw new ArgumentNullException(nameof(postCode));
 
        this.Street = street;
        this.City = city;
        this.PostCode = postCode;
    }
 
    public Address WithStreet(string newStreet)
    {
        return new Address(newStreet, this.City, this.PostCode);
    }
 
    public Address WithCity(string newCity)
    {
        return new Address(this.Street, newCity, this.PostCode);
    }
 
    public Address WithPostCode(PostCode newPostCode)
    {
        return new Address(this.Street, this.City, newPostCode);
    }
 
    public override bool Equals(object obj)
    {
        var other = obj as Address;
        if (other == null)
            return base.Equals(obj);
 
        return object.Equals(this.Street, other.Street)
            && object.Equals(this.City, other.City)
            && object.Equals(this.PostCode, other.PostCode);
    }
 
    public override int GetHashCode()
    {
        return
            this.Street.GetHashCode() ^
            this.City.GetHashCode() ^
            this.PostCode.GetHashCode();
    }
}

Technically, you could introduce instance methods like WithCity even if you kept the class itself mutable, but once you start down that path, it makes sense to make the class immutable. As Eric Evans recommends in Domain-Driven Design, modelling your domain with (immutable) Value Objects has many benefits. Such objects should also have structural equality, which is the reason that this version of Address also overrides Equals and GetHashCode.

While it looks like more work in a language like C# or Java, there are many benefits to be derived from modelling your domain with Value Objects. As an interim result, then, observe that working with unit testing (in this case a general-purpose Test Data Builder) has prompted a better design of the System Under Test.

You may still think that this seems unnecessarily verbose, and I'd agree. This is one of the many reasons I prefer languages like F# and Haskell over C# or Java. F# and Haskell have such a copy and update feature built in. Here's an F# example of updating an Address record with a specific city:

let address = { a with City = "Paris" }

This capability is built into the language. You don't have to add or maintain any code in order to be able to write code like that. Notice, even, how with is a keyword. I'm not sure about the etymology of the word with used in this context, but I find the similarity compelling.

In Haskell, it looks similar:

address = a { city = "Paris" }

In other words, domain models created from immutable Value Objects are laborious in some languages, but that only suggests a deficiency in such a language.
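
For comparison, C# 9.0 and later, which were released after this article was written, added record types together with a with expression that performs the same kind of copy and update. The following is only a sketch: the record definitions are hypothetical stand-ins, not the Address and PostCode classes used elsewhere in this article, and RecordExample is a made-up name:

```csharp
// Hypothetical record versions of the domain types (requires C# 9.0 or later).
public record PostCode
{
}

// A positional record: the compiler generates the constructor, init-only
// properties, structural equality, and support for 'with' expressions.
public record Address(string Street, string City, PostCode PostCode);

public static class RecordExample
{
    public static Address InParis(Address a)
    {
        // Copies 'a' and replaces only City; 'a' itself is not modified.
        return a with { City = "Paris" };
    }
}
```

In this sketch, the hand-written With[...] methods, Equals, and GetHashCode from the class above are no longer needed, since the compiler supplies equivalents.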

Default Builders as values #

Now that the domain model is immutable, you can define default builders as values. Previously, to start building e.g. an Address value, you had to call the Build.Address() method. When the domain model was mutable, containing a single default value inside of a Builder would enable tests to mutate that default value. Now that domain classes are immutable, this is no longer a concern, and you can instead define test-specific default builders as values:

public static class Builder
{
    public readonly static Builder<Address> Address;
    public readonly static Builder<Invoice> Invoice;
    public readonly static Builder<InvoiceLine> InvoiceLine;
    public readonly static Builder<PostCode> PostCode;
    public readonly static Builder<PoundsShillingsPence> PoundsShillingsPence;
    public readonly static Builder<Recipient> Recipient;
 
    static Builder()
    {
        PoundsShillingsPence = new Builder<PoundsShillingsPence>(
            DomainModel.PoundsShillingsPence.Zero);
        PostCode = new Builder<PostCode>(new PostCode());
        Address =
            new Builder<Address>(new Address("", "", PostCode.Build()));
        Recipient =
            new Builder<Recipient>(new Recipient("", Address.Build()));
        Invoice = new Builder<Invoice>(
            new Invoice(Recipient.Build(), new List<InvoiceLine>()));
        InvoiceLine = new Builder<InvoiceLine>(
            new InvoiceLine("", PoundsShillingsPence.Build()));
    }
 
    public static Builder<Address> WithNoPostCode(this Builder<Address> b)
    {
        return b.Select(a => a.WithPostCode(new PostCode()));
    }
}

This enables you to write expressions like this:

var address = Builder.Address.Select(a => a.WithCity("Paris")).Build();

To be clear: such a static Builder class is a Test Utility API specific to your unit tests. It would often be defined in a completely different file than the Builder<T> class, perhaps even in separate libraries.
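
The reason a shared default is now safe can be shown in a few lines. This is a sketch under assumptions: Builder<T> is a trimmed copy of the class above, Address is reduced to what the example needs, the static Builder class holds only the Address default, and SharedDefaultExample is a hypothetical name:

```csharp
using System;

public class PostCode
{
}

// Reduced version of the immutable Address class.
public class Address
{
    public string Street { get; }
    public string City { get; }
    public PostCode PostCode { get; }

    public Address(string street, string city, PostCode postCode)
    {
        this.Street = street;
        this.City = city;
        this.PostCode = postCode;
    }

    public Address WithCity(string newCity)
    {
        return new Address(this.Street, newCity, this.PostCode);
    }
}

// Trimmed copy of Builder<T>; Equals and GetHashCode are left out.
public class Builder<T>
{
    private readonly T item;

    public Builder(T item)
    {
        if (item == null)
            throw new ArgumentNullException(nameof(item));

        this.item = item;
    }

    public Builder<T1> Select<T1>(Func<T, T1> f)
    {
        return new Builder<T1>(f(this.item));
    }

    public T Build()
    {
        return this.item;
    }
}

public static class Builder
{
    // A single default value, shared by every test that uses it.
    public readonly static Builder<Address> Address =
        new Builder<Address>(new Address("", "", new PostCode()));
}

public static class SharedDefaultExample
{
    public static string[] Run()
    {
        // Two 'tests' derive their own values from the same default.
        var paris = Builder.Address.Select(a => a.WithCity("Paris")).Build();
        var london = Builder.Address.Select(a => a.WithCity("London")).Build();

        // The shared default still has its original, empty city.
        var untouched = Builder.Address.Build();

        return new[] { paris.City, london.City, untouched.City };
    }
}
```

Because Select returns a new Builder and WithCity returns a new Address, no test can change what another test sees in Builder.Address.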

Summary #

Instead of trying to automate Test Data Builders to the letter of the original design pattern description, you can define a single, reusable, generic Builder<T> class. It enables you to achieve some of the expressivity of Test Data Builders.

If you still don't find this strategy's prospects fertile, I understand. We're not done, though. In the next article, you'll see why Select is an appropriate name for the Builder's most important method, and how it relates to good abstractions.

Next: The Builder functor.

Comments

Mikhail Shilkov #

When I found myself writing too many With() methods, I created an extension to the Fody code weaving tool: Fody.With.

Basically I declare the With() methods without body implementation, and then Fody does the implementation for me. It can also convert a generic version to N overloads with an implementation per each public property.

The link above has some usage examples that hopefully make the idea clear.

2017-08-21 12:32 UTC

Harshdeep Mehta #

C# does have Object Initializer to build "address" with specified "city", similar to F# and Haskell.

2017-08-22 12:40 UTC

Mark Seemann #

Harshdeep, thank you for writing. C# object initialisers aren't the same as F# Copy and Update Record Expressions. Unless I misunderstand what you mean, when you write

var address = new Address { City = "Paris" };

address will have "Paris" as City, but all other properties, such as Street and PostCode will be null. That's not what I want. That's the problem the Test Data Builder pattern attempts to address. Test values should be populated with 'good' values, not null.

I admit that I'm not keeping up with the latest developments in C#, but if I try to use the C# object initializer syntax with an existing value, like this:

var defaultAddress =
    new Address { Street = "", PostCode = new DomainModel.PostCode(), City = "" };
var address = defaultAddress { City = "Paris" };

it doesn't compile.

I'm still on Visual Studio 2015, though, so that may be it...

2017-08-22 13:27 UTC

Harshdeep Mehta #

Aah. Now I get it. Thanks for explaining. I am from the C# world and certainly not into F# yet, so I confused "Copy & Update Expression" with "Object Initializer".

2017-08-23 5:22 UTC

Published: Monday, 21 August 2017 06:09:00 UTC

Generalised Test Data Builder by Mark Seemann