ploeh blog danish software design
Unit Testing with F# Pluralsight course
My latest Pluralsight course is an introduction to unit testing with F#.
Perhaps you already know all about unit testing. Perhaps you already know all about F#. But do you know how to write unit tests in F#?
My new Pluralsight course explains how to write unit tests with F#. If you already know F# and unit testing on .NET, it's quite straightforward. This is my first beginner-level course on Pluralsight, so regular readers of this blog may find it too basic.
Still, if you don't know what Unquote is and can do for you, you may want to consider watching module four, which introduces this great assertion library, and provides many examples.
This entire course will, together with some of my existing Pluralsight courses, serve as a basis for more courses on F# and Test-Driven Development.
POSTing JSON to an F# Web API
How to write an ASP.NET Web API service that accepts JSON in F#.
It seems that many people have problems with accepting JSON as input to a POST method when they attempt to implement an ASP.NET Web API service in F#.
It's really quite easy, with one weird trick :)
You can follow my recipe for creating a pure F# Web API project to get started. Then, you'll need to add a Data Transfer Record and a Controller to accept your data:
[<CLIMutable>]
type MyData = { MyText : string; MyNumber : int }

type MyController() =
    inherit ApiController()
    member this.Post(myData : MyData) = this.Ok myData
That's quite easy; there's only one problem with this: the incoming myData value is always null.
The weird trick #
In addition to routes etc. you'll need to add this to your Web API configuration:
GlobalConfiguration.Configuration.Formatters.JsonFormatter.SerializerSettings.ContractResolver <- Newtonsoft.Json.Serialization.CamelCasePropertyNamesContractResolver()
You add this in your Application_Start method in your Global class, so you only have to add it once for your entire project.
The explanation #
Why does this work? Part of the reason is that when you add the [<CLIMutable>] attribute to your record, it causes the record type to be compiled with auto-generated internal mutable fields, and these are named by appending an @ character - in this case, the field names become MyText@ and MyNumber@.
Apparently, the default JSON Contract Resolver (whatever that is) goes for those fields, even though they're internal, but the CamelCasePropertyNamesContractResolver doesn't. It goes for the properly named MyText and MyNumber writeable public properties that the compiler also generates.
As the name implies, the CamelCasePropertyNamesContractResolver converts the names to camel case, so that the JSON properties become myText and myNumber instead, but I only find this appropriate anyway, since this is the convention for JSON.
Example HTTP interaction #
You can now start your service and make a POST request against it:
POST http://localhost:49378/my HTTP/1.1
Content-Type: application/json

{
    "myText": "ploeh",
    "myNumber": 42
}
This request creates this response:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"myText":"ploeh","myNumber":42}
That's all there is to it.
You can also receive XML instead of JSON using a similar trick.
Property Based Testing without a Property Based Testing framework
Sometimes, you don't need a Property-Based Testing framework to do Property-Based Testing.
In my previous post, I showed you how to configure FsCheck so that it creates char values exclusively from the list of the upper-case letters A-Z. This is because the only valid input for the Diamond kata is the set of these letters.
By default, FsCheck generates 100 random values for each property, and runs each property with those 100 values. My kata code has 9 properties, so that means 900 function calls (taking just over 1 second on my Lenovo X1 Carbon).
However, why would we want to select 100 random values from a set of 26 valid values? Why not simply invoke each property (which is a function) with those 26 values?
That's not so hard to do, but if there's a way to do it with FsCheck, I haven't figured it out yet. It's fairly easy to do with xUnit.net, though.
What you'll need to do is to change the Letters type to an instance class implementing seq<obj[]> (IEnumerable<object[]> for the single C# reader still reading):
type Letters () =
    let letters = seq {'A' .. 'Z'} |> Seq.cast<obj> |> Seq.map (fun x -> [|x|])
    interface seq<obj[]> with
        member this.GetEnumerator () = letters.GetEnumerator()
        member this.GetEnumerator () =
            letters.GetEnumerator() :> Collections.IEnumerator
This is simply a class that enumerates the char values 'A' to 'Z' in ascending order.
You can now use xUnit.net's Theory and ClassData attributes to make each Property execute exactly 26 times - one for each letter:
[<Theory; ClassData(typeof<Letters>)>]
let ``Diamond is as wide as it's high`` (letter : char) =
    let actual = Diamond.make letter

    let rows = split actual
    let expected = rows.Length
    test <@ rows |> Array.forall (fun x -> x.Length = expected) @>
Instead of 900 tests executing in just over 1 second, I now have 234 tests executing in just under 1 second. A marvellous speed improvement, and, in general, a triumph for mankind.
The point is that if the set of valid input values (the domain) is small enough, you may consider simply using all of them, in which case you don't need a Property-Based Testing framework. However, I still think this is probably a rare occurrence, so I'll most likely reach for FsCheck again next time I need to write some tests.
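For the C# reader, the same idea can be sketched without any test framework at all: enumerate the entire domain and check the property for each value. This is only an illustration; the Diamond.Make function below is a hypothetical stand-in written for this sketch, not the kata implementation from the post, and IsAsWideAsItIsHigh is a name made up for the occasion.

```csharp
using System;
using System.Linq;

public static class Diamond
{
    // Hypothetical stand-in for the kata's implementation (an assumption,
    // not the code from the post): returns the rows of the diamond.
    public static string[] Make(char letter)
    {
        int n = letter - 'A';
        int size = 2 * n + 1;
        var rows = new string[size];
        for (int i = 0; i < size; i++)
        {
            // k is the zero-based index of the letter shown on row i.
            int k = i <= n ? i : 2 * n - i;
            var row = new string(' ', size).ToCharArray();
            row[n - k] = (char)('A' + k);
            row[n + k] = (char)('A' + k);
            rows[i] = new string(row);
        }
        return rows;
    }
}

public static class Program
{
    // The property: every row is as long as the number of rows.
    public static bool IsAsWideAsItIsHigh(char letter)
    {
        var rows = Diamond.Make(letter);
        return rows.All(r => r.Length == rows.Length);
    }

    public static void Main()
    {
        // The whole domain is only 26 values, so use all of them.
        var letters = Enumerable.Range('A', 26).Select(i => (char)i);
        foreach (var letter in letters)
        {
            if (!IsAsWideAsItIsHigh(letter))
                throw new Exception("Property failed for " + letter);
        }
        Console.WriteLine("Property holds for all 26 letters.");
    }
}
```

With a domain this small, exhaustive enumeration gives a stronger guarantee than 100 random samples, which is the trade-off the post describes.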
A simpler Arbitrary for the Diamond kata
There's a simple way to make FsCheck generate letters in a particular range.
In my post about the Diamond kata with FsCheck, I changed the way FsCheck generates char values, using this custom Arbitrary (essentially a random value generator):
type Letters =
    static member Char() =
        Arb.Default.Char()
        |> Arb.filter (fun c -> 'A' <= c && c <= 'Z')
This uses the default, built-in Arbitrary for char values, but filters its values so that most of them are thrown away, and only the letters 'A'-'Z' are left. This works, but isn't particularly efficient. Why generate a lot of values only to throw them away?
It's also possible to instruct FsCheck to generate values from a particular set of valid values, which seems like an appropriate action to take here:
type Letters =
    static member Char() = Gen.elements ['A' .. 'Z'] |> Arb.fromGen
Instead of using Arb.Default.Char() and filtering the values generated by it, this implementation uses Gen.elements to create a Generator of the values 'A'-'Z', and then an Arbitrary from that Generator.
Much simpler, but now it's also clear that this custom Arbitrary will be used to generate 100 test cases (for each property) from a set of 26 values; that doesn't seem right...
Library Bundle Facade
Some people want to define a Facade for a bundle of libraries. Is that a good idea?
My recent article on Composition Root reuse generated some comments:
"What do you think about pushing these factories and builders to a library so that they can be reused by different composition roots?"
"We want to share the composition root, because otherwise when a component needs a new dependency and a constructor parameter is added, we'd have to change the same code in two different places."
These comments are from two different people, but they provide a decent summary of the concerns being voiced.
Is it a good idea to provide one or more Facades, for example in the form of Factories or Builders, for the libraries making up a Composition Root? More specifically, is it a good idea to provide a Factory or Builder that can compose a complete object graph spanning multiple libraries?
In this article, I will attempt to answer that question for various cases of library bundles. To make the terminology a bit more streamlined, I'll refer to any Factory or Builder that composes object graphs as a Composer.
Single library #
In the case of a single library, I think I've already answered the question in the affirmative. There's nothing wrong with including a Facade in the form of a Factory or Builder in order to make that single library easier to use.
Two libraries #
When you introduce a second library, things start becoming interesting. If we consider the case of two libraries, for example a Domain Model and a Data Access Library, the Composition Root will need to compose an object graph where some of the objects in the graph are from the Domain Model, and some of the objects are from the Data Access Library.
In the spirit of Agile Principles, Patterns, and Practices (APPP), it turns out that simply drawing dependency diagrams can be helpful. From the Dependency Inversion Principle (DIP) we know that the "clients [...] own the abstract interfaces" (APPP, chapter 11), which means that for our two example libraries, the dependency graph must look like this:
At least, if you follow the most common architectures for loosely coupled code, the Domain Model is the 'client', so it gets to define the interfaces it needs. Thus, it follows that the Data Access Library, in order to implement those interfaces, must have a compile-time dependency on the Domain Model. That's what the arrow means.
From this diagram, it should be clear that you can't put a Factory or Builder in the Domain Model library. If the Composer should compose object graphs from both libraries, it would need to reference both of those libraries, and the Domain Model can't reference the Data Access Library, since that would result in a circular reference.
You could put the Composer in the Data Access Library, but that somehow doesn't feel right, and in any case, as we shall see later, this solution can't be generalised to n libraries.
A solution that many people reach for, then, is to pull the interfaces out into a separate library, like this:
It's a bit like cheating, according to the DIP, but it's as decoupled as before. In this diagram, the Domain Model depends on the interfaces because it uses them, while the Data Access Library depends on the interface library because it implements the interfaces. Unfortunately, this doesn't solve the problem at all, because there's still nowhere to put a Composer without getting into a problem with either the DIP, or circular references (exercise: try it!).
A possible option is to keep the libraries as the DIP dictates, and then add a third Composer library:
The Composer library references both the Domain Model and the Data Access Library, so it's possible for it to compose object graphs with objects from both libraries. The only purpose of this library, then, is to compose those object graphs, so it'll likely only contain a single class.
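As a rough sketch of that arrangement, here are the three libraries simulated as three namespaces in a single file. All the type names (IUserRepository, UserService, InMemoryUserRepository, ServiceComposer) are hypothetical and only serve to illustrate the direction of the references; in a real code base each namespace would be a separate assembly.

```csharp
using System;

// Hypothetical Domain Model library: the 'client' that owns the interface.
namespace DomainModel
{
    public interface IUserRepository
    {
        string FindUserName(int id);
    }

    public class UserService
    {
        private readonly IUserRepository repository;

        public UserService(IUserRepository repository)
        {
            if (repository == null)
                throw new ArgumentNullException("repository");

            this.repository = repository;
        }

        public string Greet(int id)
        {
            return "Hello, " + this.repository.FindUserName(id);
        }
    }
}

// Hypothetical Data Access Library: references the Domain Model in order
// to implement its interface.
namespace DataAccess
{
    public class InMemoryUserRepository : DomainModel.IUserRepository
    {
        public string FindUserName(int id)
        {
            return "user" + id;
        }
    }
}

// Hypothetical Composer library: references both of the other libraries,
// and contains a single class that composes the object graph.
namespace Composer
{
    public static class ServiceComposer
    {
        public static DomainModel.UserService Compose()
        {
            return new DomainModel.UserService(
                new DataAccess.InMemoryUserRepository());
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var service = Composer.ServiceComposer.Compose();
        Console.WriteLine(service.Greet(42)); // prints "Hello, user42"
    }
}
```

Notice that neither DomainModel nor DataAccess knows about Composer; all the arrows point away from it.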
Multiple libraries #
Do the above conclusions change if you have more than two libraries? Only in the sense that it further restricts your options. As the analysis of the special case with two libraries demonstrated, you only have two options for adding a Composer to your bundle of libraries:
- Put the Composer in the Data Access Library
- Put the Composer in a new dedicated library
Imagine that you have two Data Access Libraries instead of one:
For instance, the SQL Access Library may implement various interfaces defined by the Domain Model, based on a SQL Server database; and the Web Service Access Library may implement some other interfaces by calling out to some web service.
If the Composer must be able to compose object graphs with objects from all three libraries, it must reside in a library that references all of the relevant libraries. The Domain Model is still out of the question because you can't have circular references. That leaves one of the two Data Access libraries. It'd be technically possible to e.g. add a reference from the SQL Access Library to the Web Service Access Library, and put the Composer in the SQL Access Library:
However, why would you ever do that? It's clearly wrong to let one Data Access library depend on another, and it doesn't help if you reverse the arrow.
Thus, the only option left is to add a new Composer library:
As before, the Composer library has references to all other libraries, and contains a single class that composes object graphs.
Over-engineering #
The point of this analysis is to arrive at the conclusion that no matter how you twist and turn, you'll have to add a completely new library with the only purpose of composing object graphs. Is it warranted?
If the only motivation for doing this is to avoid duplicated code, I would argue that this looks like over-engineering. In the questions quoted above, it sounds as if the Rule of Three isn't even satisfied. It's always important to question your motivations for avoiding duplication. In this case, I'd be wary of introducing so much extra complexity only in order to avoid writing the same lines of code twice - particularly when it's likely that the duplication is accidental.
TL;DR #
Attempting to provide a reusable Facade to compose object graphs across multiple libraries is hardly worth the trouble. Think twice before you do it.
Comments
Dear Mark,
Thanks again for another wonderful post. I'm one of the guys you mentioned at the beginning of this text and I totally agree with you on this subject.
Maybe I wasn't specific enough in my comment that I didn't mean to introduce factories and builders handling dependencies from two or more assemblies, but only from a single one. The composition root wires the independent subgraphs from the different assemblies together.
However, I want to reemphasize the Builder Pattern in this context: it provides default dependencies for the object graph to be created and (usually) a fluent API (i.e. some sort of method chaining) to exchange these dependencies. This has the following consequences for the client programmer using the builder(s): he or she can easily create a default object graph by calling new Builder().Build() and still exchange dependencies he or she cares about. This can keep the composition root clean (at least for Arrange phases of tests this is true).
Why am I so excited about this? Because I use this all the time in my automated tests, but haven't really used it in my production composition roots (or seen it in others). After reading this post and Composition Root Reuse, I will try to incorporate builders more often in my production code.
Mark - thank you for making me think about this.
Kenny, thank you for writing. The Builder pattern is indeed a good way to implement Facades for a single library.
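To make the idea concrete, here is a minimal sketch of such a Builder for a single library. Every type in it (IClock, ILogger, Greeter, GreeterBuilder, and so on) is hypothetical and invented for this illustration; the point is only that new GreeterBuilder().Build() yields a default object graph, while the With methods let a client exchange the dependencies it cares about.

```csharp
using System;

public interface IClock { DateTime Now { get; } }
public interface ILogger { void Log(string message); }

// Default dependencies supplied by the library itself.
public class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

public class NullLogger : ILogger
{
    public void Log(string message) { }
}

// A replacement dependency that a client (or a test) might supply.
public class FixedClock : IClock
{
    private readonly DateTime now;
    public FixedClock(DateTime now) { this.now = now; }
    public DateTime Now { get { return this.now; } }
}

public class Greeter
{
    private readonly IClock clock;
    private readonly ILogger logger;

    public Greeter(IClock clock, ILogger logger)
    {
        this.clock = clock;
        this.logger = logger;
    }

    public string Greet()
    {
        this.logger.Log("Greeting requested.");
        return this.clock.Now.Hour < 12 ? "Good morning" : "Good day";
    }
}

// Immutable Builder: each With method returns a new Builder, so a
// partially configured Builder can be shared safely.
public class GreeterBuilder
{
    private readonly IClock clock;
    private readonly ILogger logger;

    public GreeterBuilder() : this(new SystemClock(), new NullLogger()) { }

    private GreeterBuilder(IClock clock, ILogger logger)
    {
        this.clock = clock;
        this.logger = logger;
    }

    public GreeterBuilder WithClock(IClock newClock)
    {
        return new GreeterBuilder(newClock, this.logger);
    }

    public GreeterBuilder WithLogger(ILogger newLogger)
    {
        return new GreeterBuilder(this.clock, newLogger);
    }

    public Greeter Build()
    {
        return new Greeter(this.clock, this.logger);
    }
}

public static class Program
{
    public static void Main()
    {
        // Default object graph.
        var defaultGreeter = new GreeterBuilder().Build();
        Console.WriteLine(defaultGreeter.Greet());

        // Same graph, but with one dependency exchanged.
        var morningGreeter = new GreeterBuilder()
            .WithClock(new FixedClock(new DateTime(2015, 1, 19, 9, 0, 0)))
            .Build();
        Console.WriteLine(morningGreeter.Greet()); // prints "Good morning"
    }
}
```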
From Primitive Obsession to Domain Modelling
A string is sometimes not a string. Model it accordingly.
Recently, I was reviewing some code that looked like this:
public IHttpActionResult Get(string userName)
{
    if (string.IsNullOrWhiteSpace(userName))
        return this.BadRequest("Invalid user name.");

    var user = this.repository.FindUser(userName.ToUpper());
    return this.Ok(user);
}
There were a few things about this that struck me as a bit odd; most notably the use of IsNullOrWhiteSpace. When I review code, IsNullOrWhiteSpace is one of the many things I look for, because most people use it incorrectly.
This made me ask the author of the code why he had chosen to use IsNullOrWhiteSpace, or, more specifically, what was wrong with a string with white space in it?
The answer wasn't what I expected, though. The answer was that it was a business rule that the user name can't be all white space.
You can't really argue about business logic.
In this case, the rule even seems quite reasonable, but I was just so ready to have a discussion about invariants (pre- and postconditions) that I didn't see that one coming. It got me thinking, though.
Where should business rules go? #
It seems like a reasonable business rule that a user name can't consist entirely of white space, but is it reasonable to put that business rule in a Controller? Does that mean that everywhere you have a user name string, you must remember to add the business rule to that code, in order to validate it? That sounds like the kind of duplication that actually hurts.
Shouldn't business rules go in a proper Domain Model?
This is where many programmers would start to write extension methods for strings, but that's just putting lipstick on a pig. What if you forget to call the appropriate extension method? What if a new developer on the team doesn't know about the appropriate extension method to use?
The root problem here is Primitive Obsession. Just because you can represent a value as a string, it doesn't mean that you always should.
The string data type literally can represent any text. A user name (in the example domain above) cannot be any text - we've already established that.
Make it a type #
Instead of string, you can (and should) make user name a type. That type can encapsulate all the business rules in a single place, without violating DRY. This is what is meant by a Domain Model. In Domain-Driven Design terminology, a primitive like a string or a number can be turned into a Value Object. Jimmy Bogard already covered that ground years ago, but here's how I would define a UserName class:
public class UserName
{
    private readonly string value;

    public UserName(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (!UserName.IsValid(value))
            throw new ArgumentException("Invalid value.", "value");

        this.value = value;
    }

    public static bool IsValid(string candidate)
    {
        if (string.IsNullOrEmpty(candidate))
            return false;

        return candidate.Trim().ToUpper() == candidate;
    }

    public static bool TryParse(string candidate, out UserName userName)
    {
        userName = null;
        if (string.IsNullOrWhiteSpace(candidate))
            return false;

        userName = new UserName(candidate.Trim().ToUpper());
        return true;
    }

    public static implicit operator string(UserName userName)
    {
        return userName.value;
    }

    public override string ToString()
    {
        return this.value.ToString();
    }

    public override bool Equals(object obj)
    {
        var other = obj as UserName;
        if (other == null)
            return base.Equals(obj);

        return object.Equals(this.value, other.value);
    }

    public override int GetHashCode()
    {
        return this.value.GetHashCode();
    }
}
As you can tell, this class protects its invariants. In case you were wondering about the use of ToUpper, it turns out that there's also another business rule that states that user names are case-insensitive, and one of the ways you can implement that is by converting the value to upper case letters. All business rules pertaining to user names are now nicely encapsulated in this single class, so you don't need to remember where to apply them to strings.
If you want to know the underlying string, you can either invoke ToString, or take advantage of the implicit conversion from UserName to string. You can also compare two UserName instances, because the class overrides Equals. If you have a string, and want to convert it to a UserName, you can use TryParse.
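Here's a small usage sketch of those four features. To keep it self-contained, it repeats a condensed copy of the UserName class; the Program class and the sample input "  foo " are only there for illustration.

```csharp
using System;

// Condensed copy of the UserName class, repeated here only so that the
// example compiles on its own.
public class UserName
{
    private readonly string value;

    public UserName(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (!UserName.IsValid(value))
            throw new ArgumentException("Invalid value.", "value");
        this.value = value;
    }

    public static bool IsValid(string candidate)
    {
        if (string.IsNullOrEmpty(candidate))
            return false;
        return candidate.Trim().ToUpper() == candidate;
    }

    public static bool TryParse(string candidate, out UserName userName)
    {
        userName = null;
        if (string.IsNullOrWhiteSpace(candidate))
            return false;
        userName = new UserName(candidate.Trim().ToUpper());
        return true;
    }

    public static implicit operator string(UserName userName)
    {
        return userName.value;
    }

    public override string ToString() { return this.value; }

    public override bool Equals(object obj)
    {
        var other = obj as UserName;
        if (other == null)
            return base.Equals(obj);
        return object.Equals(this.value, other.value);
    }

    public override int GetHashCode() { return this.value.GetHashCode(); }
}

public static class Program
{
    public static void Main()
    {
        // TryParse converts a raw string into a UserName.
        UserName userName;
        bool parsed = UserName.TryParse("  foo ", out userName);
        Console.WriteLine(parsed);                  // prints "True"

        // Two ways to get at the underlying string.
        string viaToString = userName.ToString();
        string viaConversion = userName;            // implicit conversion
        Console.WriteLine(viaToString);             // prints "FOO"
        Console.WriteLine(viaConversion);           // prints "FOO"

        // Value-based equality.
        Console.WriteLine(userName.Equals(new UserName("FOO"))); // prints "True"

        // All-white-space input is rejected without an exception.
        UserName none;
        Console.WriteLine(UserName.TryParse("   ", out none));   // prints "False"
    }
}
```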
The original code example above can be refactored to use the UserName class instead:
public IHttpActionResult Get(string candidate)
{
    UserName userName;
    if (!UserName.TryParse(candidate, out userName))
        return this.BadRequest("Invalid user name.");

    var user = this.repository.FindUser(userName);
    return this.Ok(user);
}
This code has the same complexity as the original example, but now it's much clearer what's going on. You don't have to wonder about what looks like arbitrary rules; they're all nicely encapsulated in the UserName class.
Furthermore, as soon as you've left the boundary of your system, which isn't Object-Oriented, you can express the rest of your code in terms of the Domain Model; in this case, the UserName class. Here's the IUserRepository interface's FindUser method:
User FindUser(UserName userName);
As you can tell, it's expressed in terms of the Domain Model, so you can't accidentally pass it a string. From that point, since you're receiving a UserName instance, you know that it conforms to all business rules encapsulated in the UserName class.
Not only for OOP #
While I've used the term encapsulation once or twice here, this way of thinking is in no way limited to Object-Oriented Programming. Scott Wlaschin describes how to wrap primitives in meaningful types in F#. The motivation and the advantages you gain are the same in Functional F# as what I've described here.
Over-engineering? #
Isn't this over-engineering? A 56-line class instead of a string? Really? The answer to such a question is always context-dependent, but I rarely find that it is (over-engineering). You can create a Value Object like the above UserName class in less than half an hour - even if you use Test-Driven Development. When I created this example, I wrote 27 test cases distributed over five test methods in order to make sure that I hadn't done something stupid. It took me 15 minutes.
You may argue that 15 minutes is a lot, compared to the 0 minutes it would take you if you'd 'just' use a string. On the surface, that seems like a valid counter-argument, but perhaps you're forgetting that with a primitive string, you still need to write validation and business logic 'around' the string, and you have to remember to apply that logic consistently across your entire code base. My guess is that you'll spend more than 15 minutes on doing this, and troubleshooting defects that occur when someone forgets to apply one of those rules to a string in some other part of the code base.
Summary #
Primitive values such as strings, integers, decimal numbers, etc. often represent concepts that are constrained in some ways; they're not just any string, any integer, or any decimal number. Ask yourself if extreme values (like the entire APPP manuscript, Int32.MinValue, and so on) are suitable for such variables. If that's not the case, consider introducing a Value Object instead.
If you want to learn more about Encapsulation, you can watch my Encapsulation and SOLID Pluralsight course.
Comments
Mark, I enjoyed your article. I feel that this is the way to go. Too often have I seen people using arrays of bytes for images and strings for email addresses.
An important aspect of DDD is within the language and when there are people talking about user names all the time, that usually indicates that usernames are an important aspect of what is going on and should be modeled explicitly and not use primitives.
The solution doesn't look overengineered to me, although I had to defend similar code before for the same reason. Not only is it now a good place to put further validation, but also allows further distinction using other types and a polymorphism (if carefully used, they can be a delight), each of which covers different invariants.
A UserName can be moved around and used to retrieve objects. But maybe you can't do that with a class of the type InvalidUsername. In another example, maybe there are functions that deal with email addresses - which I have often seen modeled as strings. Some accept InvalidEmailAddress objects, others accept ValidEmailAddress objects. While it doesn't reduce the amount of boilerplate null checks, a SendEmail function wouldn't have to do
... SendEmail(EmailAddress email)
{
if(email.IsValid()) {
email.Send()
}
}
One could simply make it accept only valid email addresses. Their mere existence would guarantee that the email address is valid. One could then write functions dealing with valid email addresses and some dealing with invalid ones. You can log and analyze bad account creation requests with the invalid ones, but not send email to them and prevent everyone else from doing so, depending on what kind of email address is being passed around.
Johannes, thank you for writing. You are right, it's possible to take the idea further in the way you describe. Again, such techniques aren't even limited to OOP; for instance, Scott Wlaschin explains how to make illegal states unrepresentable in F#.
In fact, the Functional approach using Sum Types (Discriminated Unions) is nicer, because it doesn't rely on inheritance :)
Interesting article. I have no qualms about creating domain types around primitives per se, but IMHO there should be a bit more complexity in the invariants than a single condition before resorting to that. In this case, I would think that the username constraint should be implemented in the User class. If other entities shared the constraint (e.g. Company.AccountName), the logic might be refactored into a NameValidator, and then referenced concretely by the entities.
A string holding a strong password would be a good candidate for promoting to something smarter.
Mark,
This is exactly right. I honestly have no comment on the concept because I have changed primitives to value objects a number of times and use something similar.
Quick side-note, and this may just be a preference: In the TryParse, a Trim()'d value is being passed into the constructor. So if I want to create two usernames " Suamere " and "Suamere", the second will fail since, unbeknownst to me, the first had been successfully created with Trim(). It seems to me that "Do not allow pre/post white-space" is a business rule. Therefore, if there is white-space, I consider that to be breaking a business rule.
My main motivation for this comment is that this UserName Object shows a perfect example of a pattern I've seen when using the TryParse pattern (or Monads). That is: The logic in the TryParse is frequently an exact duplicate of logic in another method such as the constructor. Though instead of throwing exceptions, TryParse gracefully informs the consumer of success/failure.
So business logic is duplicated in the CTOR(), IsValid(), and the TryParse(). Instead, I consider this pattern:
public UserName(string value)
{
    CheckIsValid(value, true);
    this.value = value;
}

public static bool TryParse(string candidate, out UserName userName)
{
    userName = null;
    if (!CheckIsValid(candidate, false))
        return false;

    userName = new UserName(candidate);
    return true;
}

public static bool IsValid(string candidate)
{
    return CheckIsValid(candidate, false);
}

private static bool CheckIsValid(string candidate, bool throwExceptions)
{
    //Brevity follows, but can be split up into more granular rules and exceptions
    if (string.IsNullOrWhiteSpace(candidate))
    {
        if (throwExceptions)
            throw new ArgumentNullException("candidate");
        return false;
    }
    if (!string.Equals(candidate, candidate.Trim(), StringComparison.OrdinalIgnoreCase))
    {
        if (throwExceptions)
            throw new ArgumentException("Invalid value.", "candidate");
        return false;
    }
    return true;
}
Let me know what you think!
~Suamere, Steven Fletcher
Mike, thank you for writing. A business rule doesn't have to be complex in order to be important. The point isn't only to encapsulate complex logic, but to make sure that (any) business logic is applied consistently.
In my experience, things often start simple, but evolve (or do they devolve?) into something more complex; when do you draw the line?
Steven, thank you for writing. You exhibit traits of critical thinking, which is always good when dealing with business logic :) In the end, it all boils down to what exactly the business rule is. As you interpret the business rule, you rephrase it as "Do not allow pre/post white-space", but that's not the business rule I had in mind.
Since I'm the author of the blog post, I have the luxury of coming up with the example, and thus, in this case, defining the business rule. The business rule I had in mind is that user names are case-insensitive, and they're also insensitive to leading and trailing white space. Thus, " Suamere " and "Suamere" are considered to be two representations of the same user name, just as "FOO" and "foo" represent the same user name. The canonicalised representations of these two user names are "SUAMERE" and "FOO", respectively.
That's also the reason I chose the name TryParse; it's equivalent to DateTime.TryParse, which should be a well-known .NET idiom. If I parse the two strings "2015-01-21" and " 21. januar 2015 " on my machine (which is running with the da-DK locale), I get the exact same value from both strings. Notice that DateTime.TryParse also blissfully ignores leading and trailing white space. This is all correct according to Postel's law.
If all I'd wanted to do was to convert an already valid string into a UserName instance, I'd have implemented an explicit conversion, which is what Jimmy Bogard did in his article. In fact, such an explicit conversion doesn't conflict with the current TryParse method, so I'd find it quite reasonable to add that as well, if any client had the need for it.
This question might be slightly beside the point, but I'm having trouble with the bit about UserName values being case-insensitive. I take that to mean that you can create a UserName with all uppercase, all lowercase, or any combination thereof, and they'll all be treated equally.
To me, that doesn't mean that you have to pass in an uppercase-only value to the constructor (or TryParse() method, for that matter) in order to successfully instantiate a UserName. Was that your intent?
I created a Gist with some tests to illustrate what I'm talking about: I can instantiate with "JEFF", but not with "jeff", using either the constructor directly or the TryParse method.
Jeff, I attempted to run your unit tests, and the only one that fails is, as expected, if you attempt to use the constructor with "jeff"; TryParse with "jeff" works.
Would it have helped if the constructor was private?
I stand corrected regarding the failing tests; I worked with the tests again, and indeed, only the constructor method failed passing in "jeff" in lowercase.
Making the constructor private would clarify things in that there would only be one way for a client to instantiate UserName, which happens to ensure anything passed into the private constructor is all upper-case.
That seems to better match the stated business rule, that UserName values are case-insensitive, rather than a rule that prohibits anything but upper-case values. In any case, it's not quite central to the point of your article, so I thank you for taking the time to reply anyway.
Jeff, I think we fundamentally agree, and I do understand why you'd find it surprising that the constructor accepts "JEFF", but not "jeff". FWIW, I follow (a lot) of rules (of thumb) when I design APIs, and one of them is that the purpose of constructors is to initialise objects. A constructor may perform validation, but must not perform work in the sense of transforming the input or producing side-effects. In other words, constructors are used to initialise objects with valid values.
Whenever I need to transform data as part of initialisation, I often create some sort of factory method for that explicit purpose. A static TryParse method is one of many options available when such a need arises. The point is exactly that the input may not start out being 'valid', but it can be transformed into a valid representation. Going from "jeff" to UserName("JEFF") is a conversion.
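The difference between the two can be sketched like this. The class below is a stripped-down variation of UserName, reduced to the constructor and TryParse for the sake of the example; it's an illustration of the principle rather than the full class from the article.

```csharp
using System;

// Stripped-down variation of UserName: only the parts needed to show
// the difference between validating and transforming.
public class UserName
{
    private readonly string value;

    // The constructor validates, but never transforms its input.
    public UserName(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (value.Length == 0 || value.Trim().ToUpper() != value)
            throw new ArgumentException("Invalid value.", "value");

        this.value = value;
    }

    // The factory method transforms the candidate into a valid
    // representation before handing it to the constructor.
    public static bool TryParse(string candidate, out UserName userName)
    {
        userName = null;
        if (string.IsNullOrWhiteSpace(candidate))
            return false;

        userName = new UserName(candidate.Trim().ToUpper());
        return true;
    }

    public override string ToString() { return this.value; }
}

public static class Program
{
    public static void Main()
    {
        // "jeff" isn't a valid representation, so the constructor throws.
        try
        {
            new UserName("jeff");
        }
        catch (ArgumentException)
        {
            Console.WriteLine("The constructor rejected \"jeff\".");
        }

        // TryParse converts "jeff" into the valid representation "JEFF".
        UserName parsed;
        if (UserName.TryParse("jeff", out parsed))
            Console.WriteLine(parsed.ToString()); // prints "JEFF"
    }
}
```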
This idea of constructors only accepting valid values may seem a bit foreign at first, but if you follow it as a general principle, you get a code base that's easier to reason about, because it's consistent. It's probably because I'm so used to this principle that I just left the constructor public, following another principle of mine, which is that there's no reason to make anything internal or private if the class or member can properly protect its invariants.
Sorry to rant; you just inspired me to explain some of my underlying design thoughts that caused me to arrive at this particular solution.
Thanks for your post, very informative as usual. I would add a few insights.
As much as it can feel like over-engineering, I totally agree that it isn't. I deal with GPU resource creation (Direct3D11) on a daily basis, and this has helped me so much and saved me headaches. For example, a Buffer must have more than 0 elements; failing to ensure that gives you a cryptic runtime error, whose message you can only see by enabling the Debug Device and looking at the Debug output window!
Using a simple BufferElementCount in that case enforces the fact that I provide a correct value, and throw a meaningful exception early in the process. By passing a BufferElementCount I can (almost) safely expect that my buffer gets created.
On the lines of code, I would expect that the initial 50 lines would largely be overtaken by rewriting the same logic, but scattered across the whole code (plus likely adding try/catch blocks in many places on top of it), so I'm pretty sure doing the 0 lines version would lead to more code in the end. Plus, 50 lines of code to save me a deep debug session on a half-million-line code base is a trade-off I'll happily take!
Also, as a little bonus, for simple cases I wrote a little snippet, so I thought I'd share it here: Domain Primitive. Now it takes even less than 2 minutes in most simple cases.
Your Equals is fragile. Because the class is not sealed, one could descend from it and break the equality contract.
Example of how to break through descending:
    public class DomainUserName : UserName
    {
        private string domain;
        // Equals written to compare both username and domain
    }

    UserName u1 = new DomainUserName("BOB", "GITHUB");
    UserName u2 = new UserName("BOB");
    u1.Equals(u2) // will use DomainUserName's Equals and return false
    u2.Equals(u1) // will use UserName's Equals and return true
But the equality contract must be symmetric. So either seal the class, or else compare the GetType() results, as in the example here: msdn.microsoft.com/en-us/library/vstudio/336aedhh%28v=vs.100%29.aspx
Alan, thank you for writing. You're right; I'd never considered this - most likely because I never use inheritance. Therefore, my default coping strategy would most likely be to make the class sealed by default, but otherwise, explicitly comparing GetType() return values would have to do.
Thank you for teaching me something today!
Great article, and I completely agree on wrapping primitive types in their own classes. I have a few questions about how this would be applied to provide good validation feedback to consuming clients. If we want to provide more details on why the 'user name is invalid', say 'User name should not be small letters' and 'User name should not have trailing spaces', how would we handle this in the class? How would the class communicate the details of the rules that it is abstracting to its consuming clients?
When accepting multiple such 'classes' in an endpoint, what would be a suggested validation approach? For instance, if we look at a similar endpoint that takes in a 'Name' and a 'Phone' number (or even more in the case of a POST endpoint where we are creating a new user), wouldn't the controller soon be overloaded with a lot of such TryParse calls?
From the above comment of constructor only accepting valid values, where would it make sense to use the constructor as opposed to TryParse. Is it at the persistence boundary of the application?
Rahul, thank you for writing. As a general rule, at the boundary of an application, all input must be considered evil until proven otherwise. That means that if you try to pass unvalidated input values into constructors, you should expect exceptions to be thrown on a regular basis. Since exceptions are for exceptional cases, that's most likely not the best way to go forward.
Instead, domain objects could offer Validate methods. These would be a bit like TryParse methods, but instead of returning a primitive boolean value, they'd have to return a list of messages. If there are no messages, the input is good. If there are messages, the input is bad. You can encapsulate such a validation result in an object if you think the rule about messages/no messages is too implicit. Such Validate methods could still take out parameters like TryParse methods if you want to validate and potentially convert into a domain object in one go.
This isn't particularly elegant in languages like C# or Java, but due to the lack of sum types in these languages, that's the best we can do.
In languages with sum types, you can use the Either monad and applicative composition to address this type of problem much more elegantly. A good place to start would be Scott Wlaschin's article on Railway Oriented Programming.
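To give a rough idea of what a sum-typed validation result can look like, here's a minimal F# sketch. It assumes the two rules from the question above; the names UserName, isUpperCase, hasNoTrailingSpaces, rules, and validate are hypothetical and not taken from the article. It only shows the Ok/Error return shape using F#'s built-in Result type; it isn't the applicative composition that Railway Oriented Programming describes.

```fsharp
// Hypothetical domain type: a single-case union wrapping a validated string.
type UserName = UserName of string

// Illustrative rules, assumed from the question above.
let isUpperCase (s : string) = (s = s.ToUpperInvariant())
let hasNoTrailingSpaces (s : string) = (s = s.TrimEnd())

// Each rule pairs a predicate with the message to report when it fails.
let rules =
    [ (isUpperCase, "User name must not contain lower-case letters.")
      (hasNoTrailingSpaces, "User name must not have trailing spaces.") ]

// Returns Ok with the domain object if every rule holds;
// otherwise Error with the messages of all the rules that failed.
let validate (candidate : string) : Result<UserName, string list> =
    let messages =
        rules
        |> List.filter (fun (isValid, _) -> not (isValid candidate))
        |> List.map snd
    match messages with
    | [] -> Ok (UserName candidate)
    | _ -> Error messages
```

With these assumed rules, validate "JEFF" returns Ok (UserName "JEFF"), while validate "jeff " returns an Error carrying both messages, so a client can report every broken rule at once.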
Mark, an implicit conversion operator should never throw an exception. In the example, it can throw a NullReferenceException when the UserName is null. What would you recommend to fix that issue?
Eugene, thank you for writing. That's probably a good rule, but I admit I hadn't thought about it. One option is to convert it to an explicit conversion instead. Another option is to forgo the conversion operators altogether. The conversion ability isn't the important point in this article.
A third option is to leave it as is. In the code bases I control, null is never an appropriate value, so if null appears anywhere, I'd consider it a bug somewhere else.
To be realistic, though, I think I'd prefer one of the two first options, as I tend to agree with the Zen of Python: Explicit is better than implicit.
Insightful post Mark!
I have a question: if we are implementing a class that implements Parse and TryParse methods, and this class requires a dependency, how should that be passed?
Consider the following example: MobileNumber requires a list of valid area codes. I am passing this dependency to the constructor and to the Parse and TryParse methods:
    public class MobileNumber
    {
        public MobileNumber(
            IEnumerable<string> areaCodes,
            string countryCode,
            string phoneNumber)
        { /* Something */ }

        public static MobileNumber Parse(
            IEnumerable<string> areaCodes,
            string countryCode,
            string phoneNumber)
        { /* Something */ }

        public static bool TryParse(
            IEnumerable<string> areaCodes,
            string countryCode,
            string phoneNumber,
            out MobileNumber mobileNumber)
        { /* Something */ }
    }

Is it the right design?
Moiz, thank you for writing. Is it the right design? As usual, It Depends™. Which problem are you trying to solve? What does the input look like? What should the result be of attempting to parse various examples?
It can often be illuminating to write a couple of parametrised tests. What would such tests look like?
A benefit of this technique is that it doesn't prescribe where the business logic should be; only where
- access to it
10 tips for better Pull Requests
Making a good Pull Request involves more than writing good code.
The Pull Request model has turned out to be a great way to build software in teams - particularly for distributed teams; not only for open source development, but also in enterprises. Since some time around 2010, I've been reviewing Pull Requests both for my open source projects and as a team member for some of my customers, doing closed-source software, but still using the Pull Request work flow internally.
During all of that time, I've seen many great Pull Requests, and some that needed some work.
A good Pull Request involves more than just some code. In most cases, there's one or more reviewer(s) involved, who will have to review your Pull Request in order to evaluate whether it's a good fit for inclusion in the code base. Not only must you produce good code, but you must also cater to the person(s) doing the review.
Here's a list of tips to make your Pull Request better. It isn't exhaustive, but I think it addresses some of the more important aspects of creating a good Pull Request.
1. Make it small #
A small, focused Pull Request gives you the best chance of having it accepted.
The first thing I do when I get a notification about a Pull Request is that I look it over to get an idea about its size. It takes time to properly review a Pull Request, and in my experience, the time it takes is exponential to the size; the relationship certainly isn't linear.
If I get a big Pull Request for an open source project, I do realize that the submitter has most likely already put in substantial work in his or her spare time, so I do go to some lengths to review a big Pull Request, even if I think it's too big - particularly when it looks like it's a first-time contributor. Still, if the Pull Request is big, I'll need to schedule time to review it: I can't review a big chunk of code using five minutes here and five minutes there; I need contiguous time to do that. This already introduces a delay into the review process.
If I get a big Pull Request in a professional setting (i.e. where the submitter is being paid to write the code), I often reject the Pull Request simply because of the size of it.
How small is small enough? Obviously, it depends on what the Pull Request is about, but a Pull Request that touches less than a dozen files isn't too bad.
2. Do only one thing #
Just as the Single Responsibility Principle states that a class should have only one responsibility, so should a Pull Request address only a single concern.
Imagine, as a counter-example, that you submit a Pull Request that addresses three independent, separate concerns (let's call them A, B, and C). The reviewer may immediately agree with you that A and C are valid concerns, and that your solution is correct. However, the reviewer has issues with your B concern. Perhaps he or she thinks it's not a concern at all, or she disagrees with the way you've addressed it.
This becomes the start of a lengthy discussion about concern B, and how it's being addressed. This discussion can go on for days (particularly if you're in different time zones), while you attempt to come to agreement; perhaps you'll need to make changes to your Pull Request to address the reviewer's concerns. This all takes time.
It may, in fact, take so much time that other commits have been merged into master in the meantime, and your Pull Request has fallen so much behind that it no longer can be automatically merged. Welcome to Merge Hell.
All that time, your perfectly acceptable solutions to the A and C concerns are sitting idly in your Pull Request, adding absolutely no value to the overall code base.
Instead, submit three independent Pull Requests that address respectively A, B, and C. If you do that, the reviewer who agrees with A and C will immediately accept two of those three Pull Requests. In this way, your non-controversial contributions can immediately add value to the code base.
The more concerns you address in a single Pull Request, the bigger the risk that at least one of them will block acceptance of your contribution. Do only one thing per Pull Request. It also helps you make each Pull Request smaller.
3. Watch your line width #
The reviewer of your Pull Request will most likely be reviewing your contribution using a diff tool. Both GitHub and Stash provide browser-based diff views for reviewing. A reviewer can even configure the diff view to be side-by-side; it makes it much easier to understand what changes are included in the contribution, but it also means that the code must be readable on half a screen.
If you have wide lines, you force the reviewer to scroll horizontally.
There are many reasons to keep line width below 80 characters; making your code easy to review just adds another reason to that list.
4. Avoid re-formatting #
You may feel the urge to change the formatting of the existing code to fit 'your' style. Please abstain.
Every byte you change in the source code shows up in the diff views. Some diff viewers have options to ignore changes of white space, but even with this option on, there are limits to what those diff viewers can ignore. Particularly, they can't ignore if you move code around, so please don't do that.
If you really need to address white space issues, move code around within files, change formatting, or do other stylistic changes to the code, please do so in an isolated pull request that does only that, and state so in your Pull Request comment.
5. Make sure the code builds #
Before submitting a Pull Request, build it on your own machine. True, 'works on my machine' isn't particularly useful, but it's a minimum bar. If it doesn't work on your machine, it's unlikely to work on other machines either.
Watch out for compiler warnings. They may not prevent you from compiling, so you may not notice them if you don't explicitly look for them. However, if your Pull Request causes (more) compiler warnings, a reviewer may reject it; I do.
If the project has a build script, try to run that, and only submit your pull request if the build succeeds. In many of my open source projects, I have a build script that (among other things) treats warnings as errors. Such a build script may automate or implement various rules for that particular code base. Use it before submitting, because the reviewer most likely will use it before merging your branch.
6. Make sure all tests pass #
Assuming that the code base in question has automated tests, make sure all tests pass before submitting a Pull Request.
This should go without saying, but I regularly receive Pull Requests where one or more tests are failing.
7. Add tests #
Again, assuming that the code in question already has automated (unit) tests, do add tests for the code you submit.
It doesn't often happen that I receive a Pull Request without tests, but when I do, I often reject it.
This isn't a hard rule. There are various cases where you may need to add code without test coverage (e.g. when adding a Humble Object), but if it can be tested, it should be tested.
You'll need to follow the testing strategy already established for the code base in question.
8. Document your reasoning #
Self-documenting code rarely is.
Yes, code comments are apologies, and I definitely prefer well-named operations, types, and values over comments. Still, when writing code, you often have to make decisions that aren't self-evident (particularly when dealing with Business 'Logic').
Document why you wrote the code in the way you did; not what it does.
My preferred priority is this:
- Self-documenting code: You can make some decisions about the code self-documenting. Clean Code is literally a book on how to do that.
- Code comments: If you can't make the code sufficiently self-documenting, add a code comment. At least, the comment is co-located with the code, so even in the unlikely event that you decide to change version control system, the comment is still preserved. Here's an example where I found a comment more appropriate than attempting to design my way out of the problem.
- Commit messages: Most version control systems give you the opportunity to write a commit message. Most people don't bother putting anything other than a bare minimum into these, but you can document your reasoning here as well. Sometimes, you'll need to explain why you're doing things in a certain order. This doesn't fit well in code comments, but is a good fit for a commit message. As long as you keep using the same version control system, you preserve these commit messages, but they're once removed from the actual source code, and you may lose the messages if you change to another source control system. Here's an example where I felt the need to write an extensive commit message, but I don't always do that.
- Pull Request comments: Rarely, you may find yourself in a situation where none of the above options are appropriate. In Pull Request management systems such as GitHub or Stash, you can also add custom messages to the Pull Request itself. This message is twice removed from the actual source code, and will only persist as long as you keep using the same host. If you move from e.g. CodePlex to GitHub, you'll lose those Pull Request messages. Still, occasionally, I find that I need to explain myself to the reviewer, but the explanation involves something external to the source code anyway. Here's an example where I found that a reasonable approach.
9. Write well #
Write good code, but also write good prose. This is partly subjective, but there are rules for both code and prose. Code has correctness rules: if you break them, it doesn't compile (or, for interpreted languages, it fails at run-time).
The same goes for the prose you may add: Code comments. Commit messages. Pull Request messages.
Please use correct spelling, grammar, and punctuation. If you don't, your prose is harder to understand, and your reviewer is a human being.
10. Avoid thrashing #
Sometimes, a reviewer will point out various issues with your Pull Request, and you'll agree to address them.
This may cause you to add more commits to your Pull Request branch. There's nothing wrong with that per se. However, this can lead to unwarranted thrashing.
As an example, your pull request may contain five commits: A, B, C, D, and E. The reviewer doesn't like what you did in commits B and C, so she asks you to remove that code. Most people do that by checking out their pull request branch and deleting the offending code, adding yet another commit (F) to the commit list: [A, B, C, D, E, F]
Why should we have to merge a series of commits that first adds unwanted code, and then removes it again? It's just thrashing; it doesn't add any value.
Instead, remove the offending commits, and force push your modified branch: [A, D, E]. While under review, you're the sole owner of that branch, so you can modify and force push it all you want.
Another example of thrashing that I see a lot is when a Pull Request is becoming old (often due to lengthy discussions): in these cases, the author regularly merges his or her branch with master to keep the Pull Request branch up to date.
Again: why do I have to look at all those merge commits? You are the sole owner of that branch. Just rebase your Pull Request branch and force push it. The resulting commit history will be cleaner.
Summary #
One or more persons will review your Pull Request. Don't make your reviewer work.
The more you make your reviewer work, the greater the risk is that your Pull Request will be rejected.
Comments
How do you balance the advice to write small, focused Pull Requests with the practical necessity of sometimes bundling refactoring in with features? Especially given the fact that most workplaces inevitably prioritise merging features.
Sam, thank you for writing. Even without refactoring, it's common that a feature is so large that you can't implement it as a single, focused pull request. The best way to address that issue is to hide the work in progress behind a feature flag. You can do the same with refactoring.
As Kent Beck puts it:
"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
You may need to first refactor to 'make room' for the new feature. I'd often put that in an isolated pull request and send that first. If anyone complains that I'm doing refactoring work instead of feature work, I'd truthfully respond that I'm doing the refactoring in order to be able to implement the feature.
I consider this to be part of being professional. It's how software should be developed, and I think that non-technical stakeholders should have little to say about how things are done. You don't have to tell them every little detail about how you write code. You shouldn't have to ask for permission to do this, and you shouldn't have to inform them that that's what you're doing.
My new book contains a realistic and practical example of a feature developed behind a feature flag.
Diamond kata with FsCheck
This post is a walk-through of doing the Diamond kata with FsCheck.
Recently, Nat Pryce tweeted:
"The diamond kata, TDD'd only with property-based tests. https://github.com/npryce/property-driven-diamond-kata One commit for each step: add test/make test pass/refactor"
This made me curious. First, I'd never heard about the Diamond kata, and second, I find Property-Based Testing quite interesting these days.
Digging a bit led me to a blog post by Seb Rose; the Diamond kata is extremely easy to explain:
Given a letter, print a diamond starting with ‘A’ with the supplied letter at the widest point.
For example: print-diamond ‘C’ prints
      A
     B B
    C   C
     B B
      A
After having thought about it a little, I couldn't even begin to see how one could approach this problem using Property-Based Testing. It struck me as a problem inherently suited for Example-Driven Development, so I decided to do that first.
Example-Driven Development #
The no-brain approach to Example-Driven Development is to start with 'A', then 'B', and so on. Exactly as Seb Rose predicts, when you approach the problem like this, when you reach 'C', it no longer seems reasonable to hard-code the responses, but then the entire complexity of the problem hits you all at once. It's quite hard to do incremental development by going through the 'A', 'B', 'C' progression.
This annoyed me, but I was curious about the implementation, so I spent an hour or so toying with making the 'C' case pass. After this, on the other hand, I had an implementation that works for all letters A-Z.
Property-Driven Development #
On my commute it subsequently struck me that solving the Diamond kata with Example-Driven Development taught me a lot about the problem itself, and I could easily come up with the first 10 properties about it.
Therefore, I decided to give the kata another try, this time with FsCheck. I also wanted to see if it would be possible to make the development more incremental; while I didn't follow the Transformation Priority Premise (TPP) to the letter, I was inspired by it. My third 'rule' was to use Devil's Advocate to force me to write properties that completely describe the problem.
Ice Breaker #
To get started, it can be a good idea to write a simple (ice breaker) test, because there's always a little work involved in getting everything up and running. To meet that goal, I wrote this almost useless property:
    [<Property(QuietOnSuccess = true)>]
    let ``Diamond is non-empty`` (letter : char) =
        let actual = Diamond.make letter
        not (String.IsNullOrWhiteSpace actual)
It only states that the string returned from the Diamond.make function isn't an empty string. Using Devil's Advocate, I created this implementation:
let make letter = "Devil's advocate."
This hard-coded result satisfies the single property. However, the only reason it works is because it ignores the input.
Constraining the input #
The kata only states what should happen for the inputs A-Z, but as currently written, FsCheck will serve all sorts of char values, including white space and funny characters like '<', ']', '?', etc. While I could write a run-time check in the make function, and return None upon invalid input, I am, after all, only doing a kata, so I'd rather want to tell FsCheck to give me only the letters A-Z. Here's one way to do that:
    type Letters =
        static member Char() =
            Arb.Default.Char()
            |> Arb.filter (fun c -> 'A' <= c && c <= 'Z')

    type DiamondPropertyAttribute() =
        inherit PropertyAttribute(
            Arbitrary = [| typeof<Letters> |],
            QuietOnSuccess = true)

    [<DiamondProperty>]
    let ``Diamond is non-empty`` (letter : char) =
        let actual = Diamond.make letter
        not (String.IsNullOrWhiteSpace actual)
The Letters type redefines how char values are generated, using the default generator of char, but then filtering the values so that they only fall in the range [A-Z].
To save myself from a bit of typing, I also defined the custom DiamondPropertyAttribute that uses the Letters type, and used it to adorn the test function instead of FsCheck's built-in PropertyAttribute.
Top and bottom #
Considering the TPP, I wondered which property I should write next, since I wanted to define a property that would force me to change my current implementation in the right direction, but only by a small step.
A good candidate seemed to state something about the top and bottom of the diamond: the first and the last line of the diamond must always contain a single 'A'. Here's how I expressed that in code:
    let split (x : string) =
        x.Split([| Environment.NewLine |], StringSplitOptions.None)

    let trim (x : string) = x.Trim()

    [<DiamondProperty>]
    let ``First row contains A`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        rows |> Seq.head |> trim = "A"

    [<DiamondProperty>]
    let ``Last row contains A`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        rows |> Seq.last |> trim = "A"
Notice that I wrote the test-specific helper functions split and trim in order to make the code a bit more readable, and that I also decided to define the property for the top of the diamond separately from the property for the bottom.
In the degenerate case where the input is 'A', the first and the last rows are identical, but the properties still hold.
Using the Devil's Advocate, this implementation passes all properties defined so far:
    let make letter = "        A       "
This is slightly better, but I purposely placed the 'A' slightly off-centre. In fact, the entire hard-coded string is 16 characters wide, so it can never have a single, centred letter. The next property should address this problem.
Vertical symmetry #
A fairly important property of the diamond is that it must be symmetric. Here's how I defined symmetry over the vertical axis:
    let leadingSpaces (x : string) =
        let indexOfNonSpace = x.IndexOfAny [| 'A' .. 'Z' |]
        x.Substring(0, indexOfNonSpace)

    let trailingSpaces (x : string) =
        let lastIndexOfNonSpace = x.LastIndexOfAny [| 'A' .. 'Z' |]
        x.Substring(lastIndexOfNonSpace + 1)

    [<DiamondProperty>]
    let ``All rows must have a symmetric contour`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        rows |> Array.forall (fun r -> (leadingSpaces r) = (trailingSpaces r))
Using the two new helper functions, this property states that the diamond should have a symmetric contour; that is, that its external shape should be symmetric. The property doesn't define what's inside of the diamond.
Again, using the Devil's Advocate technique, this implementation passes all tests:
let make letter = " A "
At least the string is now symmetric, but it feels like we aren't getting anywhere, so it's time to define a property that will force me to use the input letter.
Letters, in correct order #
When considering the shape of the required diamond, we know that the first line should contain an 'A', the next line should contain a 'B', the third line a 'C', and so on, until the input letter is reached, after which the order is reversed. Here's my way of stating that:
    [<DiamondProperty>]
    let ``Rows must contain the correct letters, in the correct order``
        (letter : char) =

        let actual = Diamond.make letter

        let letters = ['A' .. letter]
        let expectedLetters =
            letters @ (letters |> List.rev |> List.tail) |> List.toArray
        let rows = split actual
        expectedLetters = (rows |> Array.map trim |> Array.map Seq.head)
The expression let letters = ['A' .. letter] produces a list of letters up to, and including, the input letter. As an example, if letter is 'D', then letters will be ['A'; 'B'; 'C'; 'D']. That's only the top and middle parts of the diamond, but we can use letters again: we just have to reverse it (['D'; 'C'; 'B'; 'A']) and throw away the first element ('D') in order to remove the duplicate in the middle.
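As a small worked example of that expression, here's the same computation pulled out into a stand-alone function. The name expectedLetters is only used here for illustration; in the property above, the expression is written inline.

```fsharp
// Builds the expected first letter of each row, top to bottom:
// the letters up to the input letter, followed by the same letters
// in reverse order, minus the duplicated middle letter.
let expectedLetters (letter : char) =
    let letters = ['A' .. letter]
    letters @ (letters |> List.rev |> List.tail)

// Evaluates to ['A'; 'B'; 'C'; 'D'; 'C'; 'B'; 'A']
let forD = expectedLetters 'D'

// The degenerate case: the reversed list has an empty tail,
// so the result is just ['A']
let forA = expectedLetters 'A'
```

Notice that the degenerate 'A' case needs no special handling, because List.tail of a single-element list is the empty list.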
This property is still quite loosely stated, because it only states that each row's first non-white space character should be the expected letter, but it doesn't say anything about subsequent letters. The reason I defined this property so loosely was that I didn't want to force too many changes on the implementation at once. The simplest implementation I could think of was this:
    let make letter =
        let letters = ['A' .. letter]
        let letters = letters @ (letters |> List.rev |> List.tail)
        letters
        |> List.map string
        |> List.reduce (fun x y -> sprintf "%s%s%s" x System.Environment.NewLine y)
It duplicates the test code a bit, because it reuses the algorithm that generates the desired sequence of letters. However, I'm not too concerned about the occasional DRY violation.
For the input 'D', this implementation produces this output:
    A
    B
    C
    D
    C
    B
    A
All properties still hold. Obviously this isn't correct yet, but I was happy that I was able to define a property that led me down a path where I could take a small, controlled step towards a more correct solution.
As wide as it's high #
While I already have various properties that examine the white space around the letters, I've now temporarily arrived at an implementation entirely without white space. This made me consider how I could take advantage of those, and combine them with a new property, to re-introduce the second dimension to the figure.
It's fairly clear that the figure must be as wide as it's high, if we count both width and height in number of letters. This property is easy to define:
    [<DiamondProperty>]
    let ``Diamond is as wide as it's high`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        let expected = rows.Length
        rows |> Array.forall (fun x -> x.Length = expected)
It simply verifies that each row has exactly the same number of letters as there are rows in the figure. My implementation then became this:
    let make letter =
        let makeLine width letter =
            match letter with
            | 'A' ->
                let padding = String(' ', (width - 1) / 2)
                sprintf "%s%c%s" padding letter padding
            | _ -> String(letter, width)

        let letters = ['A' .. letter]
        let letters = letters @ (letters |> List.rev |> List.tail)

        let width = letters.Length

        letters
        |> List.map (makeLine width)
        |> List.reduce (fun x y -> sprintf "%s%s%s" x Environment.NewLine y)
This prompted me to introduce a private makeLine function, which produces the line for a single letter. It has a special case to handle the 'A', since this value is the only value where there's only a single letter on a line. For all other letters, there will be two letters - eventually with spaces between them.
This seemed a reasonable rationale for introducing a branch in the code, but after having completed the kata, I can see that Nat Pryce has a more elegant solution.
If the input is 'D' the output now looks like this:
       A
    BBBBBBB
    CCCCCCC
    DDDDDDD
    CCCCCCC
    BBBBBBB
       A
There's still not much white space in the implementation, but at least we regained the second dimension of the figure.
Inner space #
The next incremental change I wanted to introduce was the space between two letters. It seemed reasonable that this would be a small step for the makeLine function.
    [<DiamondProperty>]
    let ``All rows except top and bottom have two identical letters``
        (letter : char) =

        let actual = Diamond.make letter

        let isTwoIdenticalLetters x =
            let hasIdenticalLetters = x |> Seq.distinct |> Seq.length = 1
            let hasTwoLetters = x |> Seq.length = 2
            hasIdenticalLetters && hasTwoLetters
        let rows = split actual
        rows
        |> Array.filter (fun x -> not (x.Contains("A")))
        |> Array.map (fun x -> x.Replace(" ", ""))
        |> Array.forall isTwoIdenticalLetters
The property itself simply states that each row must consist of exactly two identical letters, and then white space to fill out the shape. The way to verify this is to first replace all spaces with the empty string, and then examine the remaining string. Each remaining string must contain exactly two letters, so its length must be 2, and if you perform a distinct operation on its constituent char values, the resulting sequence of chars should have a length of 1.
This property only applies to the 'internal' rows, but not the top and bottom rows that contain a single 'A', so these rows are filtered out.
The new property itself only states that apart from the 'A' rows, each row must have exactly two identical letters. Because the tests for the 'A' rows, together with the tests for symmetric contours, already imply that each row must have a width of an uneven number, and again because of the symmetric contour requirement, I had to introduce at least a single space between the two characters.
    let make letter =
        let makeLine width letter =
            match letter with
            | 'A' ->
                let padding = String(' ', (width - 1) / 2)
                sprintf "%s%c%s" padding letter padding
            | _ ->
                let innerSpace = String(' ', width - 2)
                sprintf "%c%s%c" letter innerSpace letter

        let letters = ['A' .. letter]
        let letters = letters @ (letters |> List.rev |> List.tail)

        let width = letters.Length

        letters
        |> List.map (makeLine width)
        |> List.reduce (fun x y -> sprintf "%s%s%s" x Environment.NewLine y)
Using the Devil's Advocate technique, it seems that the simplest way of passing all tests is to fill out the inner space completely. Here's an example of calling Diamond.make 'D' with the current implementation:
       A
    B     B
    C     C
    D     D
    C     C
    B     B
       A
Again, I like how this new property enabled me to do an incremental change to the implementation. Visually, we can see that the figure looks 'more correct' than it previously did.
Bottom triangle #
At this point I thought that it was appropriate to begin to address the diamond shape of the figure. After having spent some time considering how to express that without repeating the implementation code, I decided that the easiest step would be to verify that the lower left space forms a triangle.
    [<DiamondProperty>]
    let ``Lower left space is a triangle`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        let lowerLeftSpace =
            rows
            |> Seq.skipWhile (fun x -> not (x.Contains(string letter)))
            |> Seq.map leadingSpaces
        let spaceCounts = lowerLeftSpace |> Seq.map (fun x -> x.Length)
        let expected = Seq.initInfinite id
        spaceCounts
        |> Seq.zip expected
        |> Seq.forall (fun (x, y) -> x = y)
This one is a bit tricky. It examines the shape of the lower left white space. Getting that shape itself is easy enough, using the previously defined leadingSpaces helper function. For each row, spaceCounts contains the number of leading spaces.
The expected value contains an infinite sequence of numbers, {0; 1; 2; 3; 4; ...} because, due to the random nature of Property-Based Testing, I don't know exactly how many numbers to expect.
Zipping an infinite sequence with a finite sequence matches elements in each sequence, until the shortest sequence (that would be the finite sequence) ends. Each resulting element is a tuple, and if the lower left space forms a triangle, the sequence of tuples should look like this: {(0, 0); (1, 1); (2, 2); ...}. The final step in the property is therefore to verify that all of those tuples have identical elements.
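The zipping behaviour is easy to try out on its own. The following sketch uses hand-written space counts (an assumption standing in for what leadingSpaces would report for the bottom half of a 'D' diamond) and shows that zipping with an infinite sequence terminates when the finite sequence ends.

```fsharp
// Hypothetical leading-space counts; not computed from a real diamond.
let spaceCounts = Seq.ofList [0; 1; 2; 3]

// The lazily evaluated, infinite sequence 0, 1, 2, ...
let expected = Seq.initInfinite id

// Zipping stops as soon as the shorter (finite) sequence is exhausted.
let pairs = spaceCounts |> Seq.zip expected |> Seq.toList

// A triangle produces only pairs with identical elements.
let isTriangle = pairs |> List.forall (fun (x, y) -> x = y)

// A column of equal space counts is not a triangle, so the check fails.
let flatIsTriangle =
    Seq.ofList [0; 0; 0; 0]
    |> Seq.zip expected
    |> Seq.forall (fun (x, y) -> x = y)
```

Here, pairs has exactly four elements, one for each element of the finite sequence, even though expected never ends.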
The implementation uses Devil's Advocate, and goes quite a bit out of its way to make the top of the figure wrong. As you'll see shortly, it will actually be a simpler implementation to keep the figure symmetric around the horizontal axis as well, but we should have that as an explicit property.
    let make letter =
        let makeLine width (letter, letterIndex) =
            match letter with
            | 'A' ->
                let padding = String(' ', (width - 1) / 2)
                sprintf "%s%c%s" padding letter padding
            | _ ->
                let innerSpaceWidth = letterIndex * 2 - 1
                let padding = String(' ', (width - 2 - innerSpaceWidth) / 2)
                let innerSpace = String(' ', innerSpaceWidth)
                sprintf "%s%c%s%c%s" padding letter innerSpace letter padding

        let indexedLetters =
            ['A' .. letter] |> Seq.mapi (fun i l -> l, i) |> Seq.toList
        let indexedLetters =
            (indexedLetters
             |> List.map (fun (l, _) -> l, 1)
             |> List.rev
             |> List.tail
             |> List.rev)
            @ (indexedLetters |> List.rev)

        let width = indexedLetters.Length

        indexedLetters
        |> List.map (makeLine width)
        |> List.reduce (fun x y -> sprintf "%s%s%s" x Environment.NewLine y)
The main change here is that now each letter is being indexed, but then I deliberately throw away the indexes for the top part, in order to force myself to add yet another property later. While I could have skipped this step, and gone straight for the correct solution at this point, I was, after all, doing a kata, so I also wanted to write one last property.
The current implementation produces the figure below when Diamond.make is called with 'D':
       A
      B B
      C C
    D     D
     C   C
      B B
       A
The shape is almost there, but obviously, the top is wrong, because I deliberately made it so.
Horizontal symmetry #
Just as the figure must be symmetric over its vertical axis, it must also be symmetric over its horizontal axis:
    [<DiamondProperty>]
    let ``Figure is symmetric around the horizontal axis`` (letter : char) =
        let actual = Diamond.make letter

        let rows = split actual
        let topRows =
            rows
            |> Seq.takeWhile (fun x -> not (x.Contains(string letter)))
            |> Seq.toList
        let bottomRows =
            rows
            |> Seq.skipWhile (fun x -> not (x.Contains(string letter)))
            |> Seq.skip 1
            |> Seq.toList
            |> List.rev
        topRows = bottomRows
This property finally 'allows' me to simplify my implementation:
    let make letter =
        let makeLine width (letter, letterIndex) =
            match letter with
            | 'A' ->
                let padding = String(' ', (width - 1) / 2)
                sprintf "%s%c%s" padding letter padding
            | _ ->
                let innerSpaceWidth = letterIndex * 2 - 1
                let padding = String(' ', (width - 2 - innerSpaceWidth) / 2)
                let innerSpace = String(' ', innerSpaceWidth)
                sprintf "%s%c%s%c%s" padding letter innerSpace letter padding

        let indexedLetters =
            ['A' .. letter] |> Seq.mapi (fun i l -> l, i) |> Seq.toList
        let indexedLetters =
            indexedLetters @ (indexedLetters |> List.rev |> List.tail)

        let width = indexedLetters.Length

        indexedLetters
        |> List.map (makeLine width)
        |> List.reduce (fun x y -> sprintf "%s%s%s" x Environment.NewLine y)
Calling Diamond.make with 'D' now produces:
       A
      B B
     C   C
    D     D
     C   C
      B B
       A
It works with other letters, too.
Summary #
It turned out to be an interesting exercise to do this kata with Property-Based Testing. To me, the most surprising part was that it was much easier to approach the problem in an incremental fashion than it was with Example-Driven Development.
If you're interested in perusing the source code, including my detailed, step-by-step commit remarks, it's on GitHub. If you want to learn more about Property-Based Testing, you can watch my Introduction to Property-based Testing with F# Pluralsight course. There are more examples in some of my other F# Pluralsight courses - particularly Type-Driven Development with F#.
Composition Root Reuse
A Composition Root is application-specific. It makes no sense to reuse it across code bases.
At regular intervals, I get questions about how to reuse Composition Roots. The short answer is: you don't.
Since it appears to be a Frequently Asked Question, it seems that a more detailed answer is warranted.
Composition Roots define applications #
A Composition Root is application-specific; it's what defines a single application. After having written nice, decoupled code throughout your code base, the Composition Root is where you finally couple everything, from data access to (user) interfaces.
Here's a simplified example:
    public class CompositionRoot : IHttpControllerActivator
    {
        public IHttpController Create(
            HttpRequestMessage request,
            HttpControllerDescriptor controllerDescriptor,
            Type controllerType)
        {
            if(controllerType == typeof(ReservationsController))
            {
                var ctx = new ReservationsContext();
                var repository = new SqlReservationRepository(ctx);
                request.RegisterForDispose(ctx);
                request.RegisterForDispose(repository);
                return new ReservationsController(
                    new ApiValidator(),
                    repository,
                    new MaîtreD(10),
                    new ReservationMapper());
            }
            // Handle more controller types here...

            throw new ArgumentException(
                "Unknown Controller type: " + controllerType,
                "controllerType");
        }
    }
This is an excerpt of a Composition Root for an ASP.NET Web API service, but the hosting framework isn't particularly important. What's important is that this piece of code pulls in objects from all over an application's code base in order to compose the application.
The ReservationsContext class derives from DbContext; an instance of it is injected into a SqlReservationRepository object. From this, it's clear that this application uses SQL Server for its data persistence. This decision is hard-coded into the Composition Root. You can't change this unless you recompile the Composition Root.
At another boundary of this hexagonal/onion/whatever architecture is a ReservationsController, which derives from ApiController. From this, it's clear that this application is a REST API that uses ASP.NET Web API for its implementation. This decision, too, is hard-coded into the Composition Root.
From this, it should be clear that the Composition Root is what defines the application. It combines all the loosely coupled components into an application: software that can be executed in an OS process.
It makes no more sense to attempt to reuse a Composition Root than it does to attempt to 'reuse' an application.
Container-based Composition Roots #
The above example uses Pure DI in order to explain how a Composition Root is equivalent to an application definition. What about DI Container-based Composition Roots, then?
It's essentially the same story, but with a twist. Imagine that you want to use Castle Windsor to compose your Web API; I've previously explained how to do that, but here's the IHttpControllerActivator implementation repeated:
    public class WindsorCompositionRoot : IHttpControllerActivator
    {
        private readonly IWindsorContainer container;

        public WindsorCompositionRoot(IWindsorContainer container)
        {
            this.container = container;
        }

        public IHttpController Create(
            HttpRequestMessage request,
            HttpControllerDescriptor controllerDescriptor,
            Type controllerType)
        {
            var controller =
                (IHttpController)this.container.Resolve(controllerType);

            request.RegisterForDispose(
                new Release(
                    () => this.container.Release(controller)));

            return controller;
        }

        private class Release : IDisposable
        {
            private readonly Action release;

            public Release(Action release)
            {
                this.release = release;
            }

            public void Dispose()
            {
                this.release();
            }
        }
    }
Indeed, there's no application-specific code there; this class is completely reusable. However, this reusability is accomplished because the container is injected into the object. The container itself must be configured in order to work, and configuration code is just as application-specific as the above Pure DI example.
Thus, with a DI Container, you may be able to decouple and reuse the part of the Composition Root that performs the run-time composition. However, the design-time composition is where you put the configuration code that selects which concrete types should be used in which cases. This is the part that specifies the application, and that's not reusable.
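The split between run-time and design-time composition can be sketched without any real container. The following is a toy illustration in F#, under loud assumptions: ToyContainer is a hypothetical stand-in and not Castle Windsor's API, and the ReservationsController here is a simplified placeholder with a single constructor argument. The point is only that activate knows nothing about the application, whereas configureRestApi is all application-specific decisions.

```fsharp
open System
open System.Collections.Generic

// Hypothetical, minimal stand-in for a DI Container (not a real library).
type ToyContainer() =
    let registrations = Dictionary<Type, unit -> obj>()
    member this.Register<'T>(create : unit -> 'T) =
        registrations.[typeof<'T>] <- fun () -> box (create ())
    member this.Resolve(t : Type) = registrations.[t] ()

// Run-time composition: no application-specific code, so it's reusable.
let activate (container : ToyContainer) (controllerType : Type) =
    container.Resolve controllerType

// Simplified placeholder for an application's controller.
type ReservationsController(capacity : int) =
    member this.Capacity = capacity

// Design-time composition: this is where the application is defined.
let configureRestApi () =
    let container = ToyContainer()
    container.Register<ReservationsController>(
        fun () -> ReservationsController 10)
    container
```

A batch job would reuse activate (if it needed it at all), but it would have its own configuration function with different registrations.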
Summary #
Composition Roots aren't reusable.
Sometimes, you might be building a suite of applications that share a substantial subset of dependencies. Imagine, for example, that you're building a REST API and a batch job that together solve a business problem. The Domain Model and the Data Access Layer may be shared between these two applications, but the boundary isn't. The batch job isn't going to need any Controllers, and the REST API isn't going to need whatever is going to kick off the batch job (e.g. a scheduler). These two applications may have a lot in common, but they're still different. Thus, they can't share a Composition Root.
What about the DRY principle, then?
Different applications have differing Composition Roots. Even if they start with substantial amounts of duplicated Composition Root code, in my experience, the 'common' code tends to diverge the more the applications are allowed to evolve. That duplication may be accidental, and attempting to eliminate it may increase coupling. It's better to understand why you want to avoid duplication in the first place.
When it comes to Composition Roots, duplication tends to be accidental. Composition Roots aren't reusable.
Comments
In the example where the domain layer and the data access layer are shared between applications, it really depends whether they will have a future need to be wired differently according to the application.
If that's not the case, you could argue that there's no need for two components either.
One solution would be to use a Facade which internally wires them up and exposes it as a single interface. This allows for reuse of the domain and the DAL as a single component, opaque to the client.
Kenneth, thank you for writing. If you create a Facade that internally wires up a Domain Model and DAL, you've created a tightly coupled bundle. What if you later discover that you need to instrument your DAL? You can't do that if you prematurely compose object graphs.
Clearly, there are disadvantages to doing such a thing. What are the advantages?
Dear Mark,
Thank you for another excellent post.
Some of the composition roots I wrote became really large, too, especially the more complex the application got. In these cases I often introduced factories and builders that created parts of the final object graph and the imperative statements in the composition root just wired these subgraphs together. And this leads me to my actual question:
What do you think about pushing these factories and builders to a library so that they can be reused by different composition roots? This would result in much less code that wires the object graph. I know that abstract factories belong to the client, but one could provide Simple Factories / Factory Methods or, even better I think, the Builder Pattern is predestined for these situations, because it provides default dependencies that can be exchanged before the Build method is called.
If I think about automated tests, I can see that the concept I just described is already in use: every test has an arrange phase which in my opinion is the composition root for the test. To share arrange code between tests, one can use e.g. builders or fixture objects (or of course AutoFixture ;-) ). This reduces the code needed for the arrange phase and improves readability.
What is your opinion on pushing Simple Factories / Builders to a reusable library?
With regards,
Kenny
In the production environment, the service is deployed as a Windows Service, but for easier testing and development, we also have a console application that hosts the same service. Both entry points are basically the same composition root and should use the same components. We want to share the composition root, because otherwise when a component needs a new dependency and a constructor parameter is added, we'd have to change the same code in two different places.
Kenny, Daniel, thank you for writing. In an attempt to answer your questions, I wrote a completely new blog post on the subject; I hope it answers some of your questions, although I suppose you aren't going to like the answer.
If you truly have a big Composition Root, it may be one of the cases where you should consider adopting a DI Container with a convention-based configuration.
Another option is to make your 'boundary library' (e.g. your UI, or your RESTful endpoints) a library in itself, and host that library in various contexts. The Composition Root resides in that boundary library, so various hosts can 'reuse' it, as they reuse the entire application. You can see an example of this in my Outside-In Test-Driven Development Pluralsight course.
Dear Mark, thank you for writing the other blog post. Now I finally have time to answer.
I think in our specific situation there is a subtle but important difference: Rather than reusing a composition root between two libraries, we essentially have two different ways of hosting the same library (and thus composition root). One is used primarily during development, while the other is used in production, but essentially they are (and always will be) the same thing. So in my opinion it would even be dangerous to not use the same composition root here, as it would just add a source of errors if you change the dependency graph setup in one project but forget to do so in the other.
But I guess this is what you meant with creating a boundary library, isn't it?
Daniel, if I understand you correctly, that's indeed the same technique that I demonstrate in my Outside-In Test-Driven Development Pluralsight course. Although it may only be me splitting hairs, I don't consider that as reusing the Composition Root as much as it's deploying the same application to two different hosts; one of the hosts just happens to be a test harness :)
Placement of Abstract Factories
Where should you define an Abstract Factory? Where should you implement it? Not where you'd think.
An Abstract Factory is one of the workhorses of Dependency Injection, although it's a somewhat blunt instrument. As I've described in chapter 6 of my book, whenever you need to map a run-time value to an abstraction, you can use an Abstract Factory. However, often there are more sophisticated options available.
Still, it can be a useful pattern, but you have to understand where to define it, and where to implement it. One of my readers asks:
These are good questions that deserve a more thorough treatment than a tweet in reply.
Situation #
Based on the above questions, I imagine that the situation can be depicted like this:
Two or more applications share a common library. These applications may be a web service and a batch job, a web site and a mobile app, or any other combination that makes sense to you.
Defining the Abstract Factory #
Where should an Abstract Factory be defined? In order to answer this question, first you must understand what an Abstract Factory is. Essentially, it's an interface (or Abstract Base Class - it's not important) that looks like this:
public interface IFactory<T> { T Create(object context); }
Sometimes, the Abstract Factory is a non-generic interface; sometimes it takes more than a single parameter; often, the parameter(s) have stronger types than object.
Where do interfaces go?
From Agile Principles, Patterns, and Practices, chapter 11, we know that the Dependency Inversion Principle means that "clients [...] own the abstract interfaces". This makes sense, if you think about it.
Imagine that the (shared) library defines a concrete class Foo that takes three values in its constructor:
public Foo(Guid bar, int baz, string qux)
Why would the library ever need to define an interface like the following?
public interface IFooFactory { Foo Create(Guid bar, int baz, string qux); }
It could, but what would be the purpose? The library itself doesn't need the interface, because the Foo class has a nice constructor it can use.
Often, the Abstract Factory pattern is most useful if you have some of the values available at composition time, but can't fully compose the object because you're waiting for one of the values to materialize at run-time. If you think this sounds abstract, my article series on Role Hints contains some realistic examples - particularly the articles Metadata Role Hint, Role Interface Role Hint, and Partial Type Name Role Hint.
A client may, for example, wait for the bar value before it can fully compose an instance of Foo. Thus, the client can define an interface like this:
public interface IFooFactory { Foo Create(Guid bar); }
This makes sense to the client, but not to the library. From the library's perspective, what's so special about bar? Why should the library define the above Abstract Factory, but not the following?
public partial interface IFooFactory { Foo Create(int baz); }
Such Abstract Factories make no sense to the library; they are meaningful only to their clients. Since the client owns the interfaces, they should be defined together with their clients. A more detailed diagram illustrates these relationships:
As you can see, although both definitions of IFooFactory depend on the shared library (since they both return instances of Foo), they are two different interfaces. In Client 1, apparently, the run-time value to be mapped to Foo is bar (a Guid), whereas in Client 2, the run-time value to be mapped to Foo is baz (an int).
The bottom line is: by default, libraries shouldn't define Abstract Factories for their own concrete types. Clients should define the Abstract Factories they need (if any).
Implementing the Abstract Factory #
While I've previously described how to implement an Abstract Factory, I may then have given short shrift to the topic of where to put the implementation.
Perhaps a bit surprisingly, this (at least partially) depends on the return type of the Abstract Factory's Create method. In the case of both IFooFactory definitions above, the return type of the Create methods is the concrete Foo class, defined in the shared library. This means that both Client 1 and Client 2 already depend on the shared library. In such situations, they can each implement their Abstract Factories as Manually Coded Factories. They don't have to do this, but at least it's an option.
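As a rough illustration of what such Manually Coded Factories could look like, here's a sketch in F# (chosen to match the earlier examples on this page; the C# shape would be equivalent). The Foo type is a stand-in for the shared library's class, and the Client1 and Client2 modules, as well as the FooFactory names, are assumptions made for the example. Each factory receives at composition time the values its client already knows, and only takes the remaining run-time value in Create.

```fsharp
open System

// Stand-in for the concrete class defined in the shared library.
type Foo(bar : Guid, baz : int, qux : string) =
    member this.Bar = bar
    member this.Baz = baz
    member this.Qux = qux

module Client1 =
    // Client 1 owns this interface; bar is its run-time value.
    type IFooFactory =
        abstract Create : Guid -> Foo

    // Manually Coded Factory: baz and qux are known at composition time.
    type FooFactory(baz : int, qux : string) =
        interface IFooFactory with
            member this.Create bar = Foo(bar, baz, qux)

module Client2 =
    // Client 2 owns a different interface; baz is its run-time value.
    type IFooFactory =
        abstract Create : int -> Foo

    // Manually Coded Factory: bar and qux are known at composition time.
    type FooFactory(bar : Guid, qux : string) =
        interface IFooFactory with
            member this.Create baz = Foo(bar, baz, qux)

// Usage in Client 1: compose early, create once bar materializes.
let factory = Client1.FooFactory(42, "ploeh") :> Client1.IFooFactory
let bar = Guid.NewGuid()
let foo = factory.Create bar
```

The two factories duplicate a line or two, but neither client has to know about the other, and the shared library doesn't have to know about either.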
On the other hand, if the return type of an Abstract Factory is another interface defined by the client itself, it's a different story. Imagine, as an alternative scenario, that a client depends on an IPloeh interface that it has defined by itself. It may also define an Abstract Factory like this:
public interface IPloehFactory { IPloeh Create(Guid bar); }
This has nothing to do with the library that defines the Foo class. However, another library, somewhere else, may implement an Adapter of IPloeh over Foo. If this is the case, that implementing third party could also implement IPloehFactory. In such cases, the library that defines IPloeh and IPloehFactory must not implement either interface, because that creates the coupling it works so hard to avoid.
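A sketch of that alternative scenario, again in F# and under stated assumptions: the Describe member on IPloeh, the FooPloehAdapter and FooPloehFactory names, and the Foo stand-in are all invented for illustration. What matters is the placement: the client library declares IPloeh and IPloehFactory, while the Adapter and the factory implementation live with a third party that is allowed to depend on both libraries.

```fsharp
open System

// Client library: owns both abstractions, but implements neither.
type IPloeh =
    abstract Describe : unit -> string   // hypothetical member

type IPloehFactory =
    abstract Create : Guid -> IPloeh

// Shared library: stand-in for the concrete Foo class.
type Foo(bar : Guid, baz : int, qux : string) =
    member this.Bar = bar
    member this.Baz = baz
    member this.Qux = qux

// Third party (often the Composition Root): depends on both libraries.
type FooPloehAdapter(foo : Foo) =
    interface IPloeh with
        member this.Describe () =
            sprintf "%s/%i/%s" (string foo.Bar) foo.Baz foo.Qux

type FooPloehFactory(baz : int, qux : string) =
    interface IPloehFactory with
        member this.Create bar =
            FooPloehAdapter(Foo(bar, baz, qux)) :> IPloeh

// Usage: the client only ever sees IPloehFactory and IPloeh.
let ploehFactory = FooPloehFactory(42, "ploeh") :> IPloehFactory
let ploeh = ploehFactory.Create Guid.Empty
let description = ploeh.Describe ()
```

The client library never mentions Foo, so it stays decoupled from the shared library.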
The third party that ultimately implements these interfaces is often the Composition Root. If this is the case, other implementation options are Container-based Factories or Dynamic Proxies.
Summary #
In order to answer the initial question: my default approach would be to implement the Abstract Factory multiple times, favouring decoupling over DRY. These days I prefer Pure DI, so I'd tend to go with the Manually Coded Factory.
Still, this answer of mine presupposes that Abstract Factory is the correct answer to a design problem. Often, it's not. It can lead to horrible interfaces like IFooManagerFactoryStrategyFactoryFactory, so I consider Abstract Factory as a last resort. Often, the Metadata, Role Interface, or Partial Type Name Role Hints are better options. In the degenerate case where there's no argument to the Abstract Factory's Create method, a Decoraptor is worth considering.
This article explains the matter in terms of relatively simple dependency graphs. In general, dependency graphs should be shallow, but if you want to learn about principles for composing more complex dependency graphs, Agile Principles, Patterns, and Practices contains a chapter on Principles of Package and Component Design (chapter 28) that I recommend.
Comments
[<JsonObject(MemberSerialization=MemberSerialization.OptOut)>] to the type also works