The Builder functor by Mark Seemann
The Test Data Builder design pattern as a functor.
This is the third in a series of articles about the relationship between the Test Data Builder design pattern and the identity functor. The previous article introduced this generic Builder class:
    public class Builder<T>
    {
        private readonly T item;

        public Builder(T item)
        {
            if (item == null)
                throw new ArgumentNullException(nameof(item));

            this.item = item;
        }

        public Builder<T1> Select<T1>(Func<T, T1> f)
        {
            var newItem = f(this.item);
            return new Builder<T1>(newItem);
        }

        public T Build()
        {
            return this.item;
        }

        public override bool Equals(object obj)
        {
            var other = obj as Builder<T>;
            if (other == null)
                return base.Equals(obj);

            return object.Equals(this.item, other.item);
        }

        public override int GetHashCode()
        {
            return this.item.GetHashCode();
        }
    }
The workhorse is the Select method. As I previously promised to explain, there's a reason I chose that particular name.
Query syntax #
C# comes with a small DSL normally known as query syntax. People mostly think of it in relation to ORMs such as Entity Framework, but it's a general-purpose language feature. Still, most developers probably associate it with the IEnumerable<T> interface, but it's more general than that. In fact, any type that comes with a Select method with a compatible signature supports query syntax.
I deliberately designed the Builder's Select method to support query syntax:
    Address address =
        from a in Builder.Address
        select a.WithCity("Paris");
Builder.Address is a Builder<Address> object that contains a 'good' default Address value. Since Builder<T> has a compatible Select method, you can 'query it'. In this example, you use the WithCity method to explicitly pin the Address object's City property, while all the other Address values remain the default values.
There's an extra bit (pun intended) of compiler magic at work. Did you wonder how a Builder<Address> automatically turns into an Address value? After all, address is of the type Address, not Builder<Address>.
I specifically added an implicit conversion so that I didn't have to surround the query expression with brackets in order to call the Build method:
    public static implicit operator T(Builder<T> b)
    {
        return b.item;
    }
This conversion is defined on Builder<T>. It's the reason I explicitly use the type name when I declare the address variable above, instead of using the var keyword. Declaring the type forces the implicit conversion.
You can also use query syntax to map one constructed Builder type into another (and ultimately to the value it contains):
    Address address =
        from pc in Builder.PostCode
        select new Address("Rue Morgue", "Paris", pc);
This expression starts with a Builder<PostCode> object, transforms it into a Builder<Address> object, and then finally uses the implicit conversion to turn the Builder<Address> into an Address object.
Even a more complex 'query' looks almost palatable:
    Invoice invoice =
        from i in Builder.Invoice
        select i.WithRecipient(
            from r in Builder.Recipient
            select r.WithAddress(Builder.Address.WithNoPostCode()));
Again, the implicit type conversion makes the syntax much cleaner.
Functor #
Isn't it amazing that the C# designers were able to come up with such a generally useful language feature? It certainly is a nice piece of work, but it's based on an existing body of knowledge.
A type like Builder<T> with a suitable Select method is a functor. This is a term from category theory, but I'll try to avoid turning this article into a category theory lecture. Likewise, I'm not going to talk specifically about monads here, although it's a closely related topic. A functor is a mapping between categories; it maps an object from one category into an object of another category.
Although I've never seen Microsoft explicitly acknowledge the connection to functors and monads, it's clear that it's there. One of the principal designers of LINQ was Erik Meijer, who definitely knows his way around category theory and functional programming. A functor is a simple, but widely applicable abstraction.
In order to be a functor, a type must have an associated mapping. In C#'s query syntax, this is a method named Select, but more often it's called map.
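To make 'an associated mapping' concrete, here's a small sketch in Haskell, where the mapping function is called fmap. It only uses Maybe and lists from the standard Prelude; the names incrementedMaybe and lengths are made up for this illustration.

```haskell
-- fmap applies a function to the value(s) inside a functor,
-- while leaving the surrounding structure unchanged.

-- Maybe is a functor: the function is applied inside Just.
incrementedMaybe :: Maybe Int
incrementedMaybe = fmap (+ 1) (Just 41)

-- Lists are functors too: the function is applied to every element.
lengths :: [Int]
lengths = fmap length ["foo", "quux"]
```

If you load this in GHCi, incrementedMaybe evaluates to Just 42 and lengths to [3,4]. In both cases the 'shape' of the container is preserved; only the contained values change.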
Haskell Builder #
In Haskell, the mapping is called fmap, and you can define the Builder functor like this:
    newtype Builder a = Builder a deriving (Show, Eq)

    instance Functor Builder where
      fmap f (Builder a) = Builder $ f a
Notice how terse the definition is, compared to the C# version. Despite the difference in size, they accomplish the same goal. The first line of code defines the Builder type, complete with structural equality (Eq) and the ability to convert a Builder value to a string (Show).
This Builder type is explicitly defined as a Functor in the instance declaration, where the fmap function is implemented. The code is similar to the Select method in the above C# example: f is a function that takes the generic type a (corresponding to T in the C# example) as input, and returns a value of the generic type b (corresponding to T1 in the C# example). The mapping pulls the underlying value out of the input Builder, calls f with that value, and puts the return value into a new Builder.
In Haskell, Functor is an explicit part of the language (a type class in the standard library), so Builder is explicitly declared to be a Functor instance.
If you define some default Builder values, corresponding to the above Builder.Address, you can use them to build addresses in the same way:
Builder address = fmap (\a -> a { city = "Paris" }) addressBuilder
Here, addressBuilder is a Builder Address value, corresponding to the C# Builder.Address value. \a -> a { city = "Paris" } is a lambda expression that takes an Address value as input, and returns a similar value as output, only with city explicitly bound to "Paris".
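The snippet above relies on an Address type and an addressBuilder value that aren't shown here. As a minimal, self-contained sketch, you could fill in the blanks like this. Note that the Address record, its field names, and the default values are all assumptions made for the sake of illustration, as is the name parisAddress.

```haskell
-- Hypothetical Address record. The field names and the default values
-- below are assumptions; they're not taken from the original code base.
data Address = Address
  { street   :: String
  , city     :: String
  , postCode :: String
  } deriving (Show, Eq)

newtype Builder a = Builder a deriving (Show, Eq)

instance Functor Builder where
  fmap f (Builder a) = Builder $ f a

-- A Builder that wraps a 'good' default Address (assumed values).
addressBuilder :: Builder Address
addressBuilder = Builder (Address "Main Street 1" "Copenhagen" "1000")

-- Pin the city to Paris; all other fields keep their default values.
parisAddress :: Address
parisAddress = address
  where
    Builder address = fmap (\a -> a { city = "Paris" }) addressBuilder
```

The pattern match on Builder plays the same role as the implicit conversion in the C# example: it unwraps the Address value once the mapping is done.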
F# example #
Unlike Haskell, F# doesn't treat functors as an explicit construct, but you can still define a Builder functor:
    type Builder<'a> = Builder of 'a

    module Builder =
        // ('a -> 'b) -> Builder<'a> -> Builder<'b>
        let map f (Builder x) = Builder (f x)
You can see how similar this is to the Haskell example. In F#, it's common to define a module with the same name as a generic type. This example defines a generic Builder<'a> type and a supporting Builder module. Normally, a module would contain other functions, in addition to map.
Just like in C# and Haskell, you can build an address in Paris with a predefined Builder value as a start:
let (Builder address) = addressBuilder |> Builder.map (fun a -> { a with City = "Paris" })
Again, addressBuilder is a Builder<Address> that contains a 'default' Address (test) value. You use Builder.map with a lambda expression to map the default value into a new Address value where City is bound to "Paris".
Functor laws #
In order to be a proper functor, an object must obey two simple laws. It's not enough that a mapping function exists; it must also obey the laws. While that sounds uncomfortably like mathematics, the laws are simple and intuitive.
The first law is that when the function you map with returns its input unchanged, the functor returned by the mapping must be equal to the input functor. There's only one (generic) function that returns its input unmodified. It's called the identity function (often abbreviated id).
Here's an example test case that illustrates the first functor law for the C# Builder<T>:
    [Fact]
    public void BuilderObeysFirstFunctorLaw()
    {
        Func<int, int> id = x => x;
        var sut = new Builder<int>(42);

        var actual = sut.Select(id);

        Assert.Equal(sut, actual);
    }
The .NET Base Class Library doesn't come with a built-in identity function, so the test case first defines it as id. Normally, the identity function would be defined as a function that takes a value of the generic type T as input, and returns the same value (still of type T) as output. This test is only an example for the type int, so it also defines the identity function as constrained to int.
The test creates a new Builder<int> with the value 42, and calls Select with id. Since the first functor law says that mapping with the identity function must return the input functor, the expected value is the sut variable.
This test is only an example of the first functor law. It doesn't prove that Builder<T> obeys the law for all generic types (T) and for all values. It only proves that it holds for the integer 42. You get the idea, though, I'm sure.
The second functor law says that if you chain two functions to make a third function, and map your functor using that third function, the result should be equal to the result you get if you chain two mappings made out of those two functions. Here's an example:
    [Fact]
    public void BuilderObeysSecondFunctorLaw()
    {
        Func<int, string> g = i => i.ToString();
        Func<string, string> f = s => new string(s.Reverse().ToArray());
        var sut = new Builder<int>(1337);

        var actual = sut.Select(i => f(g(i)));

        var expected = sut.Select(g).Select(f);
        Assert.Equal(expected, actual);
    }
This test case (which is, again, only an example) first defines two functions, f and g. It then creates a new Builder<int> and calls Select with the composed function i => f(g(i)). This returns the actual result, which is a Builder<string>.
This result should be equal to first calling Select with g (which returns a Builder<string>), and then calling Select with f (which returns another Builder<string>). These two Builder objects should be equal to each other, which they are.
Both these tests compare an expected Builder to an actual Builder, which is the reason that Builder<T> overrides Equals in order to have structural equality. In Haskell, the above Builder type has structural equality because it uses the default instance of Eq, and in F#, Builder<'a> has structural equality because that's the default equality for the immutable F# data types.
We can't easily talk about the functor laws without being able to talk about functor values being (or not being) equal to each other, so structural equality is an important element in the discussion.
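The same two laws can also be sketched in Haskell, where they're usually written fmap id = id and fmap (f . g) = fmap f . fmap g. The following is only a sketch that checks the laws for given Builder Int values; it's not a proof. The names firstLawHolds and secondLawHolds are made up for this example, and show and reverse stand in for the g and f functions from the C# test.

```haskell
newtype Builder a = Builder a deriving (Show, Eq)

instance Functor Builder where
  fmap f (Builder a) = Builder $ f a

-- First law: mapping with the identity function returns
-- a value equal to the input.
firstLawHolds :: Builder Int -> Bool
firstLawHolds b = fmap id b == b

-- Second law: mapping with a composed function gives the same
-- result as composing the two mappings.
secondLawHolds :: Builder Int -> Bool
secondLawHolds b = fmap (f . g) b == (fmap f . fmap g) b
  where
    g = show     -- Int -> String, like i.ToString() in the C# test
    f = reverse  -- String -> String, like the string reversal in the C# test
```

Both comparisons rely on the derived Eq instance, which is why structural equality matters here as well. Evaluating firstLawHolds (Builder 42) and secondLawHolds (Builder 1337) in GHCi mirrors the two C# test cases.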
Summary #
You can define a Test Data Builder as a functor by defining a generic Builder type with a Select method. In order to be a proper functor, it must also obey the functor laws, but these laws are quite natural; you almost have to go out of your way in order to violate them.
A functor is a well-known abstraction. Instead of trying to come up with a half-baked, ad-hoc abstraction, modelling an API based on already known and understood abstractions such as functors will make the API easier to learn. Everyone who knows what a functor is, will automatically have a good understanding of the API. Even if you didn't know about functors until now, you only have to learn about them once.
This can often be beneficial, but for Test Data Builders, it turns out to be a red herring. The Builder functor is nothing but the Identity functor in disguise.
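As a rough sketch of what 'in disguise' could mean, you can convert back and forth between the Haskell Builder type and the Identity type from Data.Functor.Identity in the base library. The functions toIdentity, fromIdentity, and mapCommutes are hypothetical names introduced for this illustration; the sketch only hints at the correspondence, and doesn't replace a proper treatment.

```haskell
import Data.Functor.Identity (Identity (..))

newtype Builder a = Builder a deriving (Show, Eq)

instance Functor Builder where
  fmap f (Builder a) = Builder $ f a

-- Rewrap a Builder value as an Identity value...
toIdentity :: Builder a -> Identity a
toIdentity (Builder a) = Identity a

-- ...and back again. Neither direction loses information.
fromIdentity :: Identity a -> Builder a
fromIdentity (Identity a) = Builder a

-- Mapping before or after the conversion gives the same result
-- (checked here only for one function and a given Builder Int).
mapCommutes :: Builder Int -> Bool
mapCommutes b = toIdentity (fmap (+ 1) b) == fmap (+ 1) (toIdentity b)
```

Since the two conversions are each other's inverses, and mapping behaves the same way on both sides, Builder doesn't seem to add anything that Identity doesn't already offer.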
Next: Builder as Identity.
Comments
Maybe I am missing something, so could you explain in a few words what the advantage is of having this _generic builder_?
I mean, immutable entities and _with methods_ seem to be enough to easily create test data without builders, for example:
Andrés, thank you for writing. I hope that the next two articles in this article series will answer your question. It seems, however, that you've already predicted where this is headed. A fine display of critical thinking!
It's worth reading just for the discussion of the DSL and implicit conversion.
_Implicit/explicit_ conversion is a must-have to lighten _value object_ usage, and it contributes to delivering a proper API.
Thank you Mark.