A string is sometimes not a string. Model it accordingly.

Recently, I was reviewing some code that looked like this:

public IHttpActionResult Get(string userName)
{
    if (string.IsNullOrWhiteSpace(userName))
        return this.BadRequest("Invalid user name.");
 
    var user = this.repository.FindUser(userName.ToUpper());
    return this.Ok(user);
}

There was a few things with this that struck me as a bit odd; most notably the use of IsNullOrWhiteSpace. When I review code, IsNullOrWhiteSpace is one of the many things I look for, because most people use it incorrectly.

This made me ask the author of the code why he had chosen to use IsNullOrWhiteSpace, or, more specifically, what was wrong with a string with white space in it?

The answer wasn't what I expected, though. The answer was that it was a business rule that the user name can't be all white space.

You can't really argue about business logic.

In this case, the rule even seems quite reasonable, but I was just so ready to have a discussion about invariants (pre- and postconditions) that I didn't see that one coming. It got me thinking, though.

Where should business rules go?

It seems like a reasonable business rule that a user name can't consist entirely of white space, but is it reasonable to put that business rule in a Controller? Does that mean that everywhere you have a user name string, you must remember to add the business rule to that code, in order to validate it? That sounds like the kind of duplication that actually hurts.

Shouldn't business rules go in a proper Domain Model?

This is where many programmers would start to write extension methods for strings, but that's just putting lipstick on a pig. What if you forget to call the appropriate extension method? What if a new developer on the team doesn't know about the appropriate extension method to use?

The root problem here is Primitive Obsession. Just because you can represent a value as a string, it doesn't mean that you always should.

The string data type literally can represent any text. A user name (in the example domain above) can not be any text - we've already established that.

Make it a type

Instead of string, you can (and should) make user name a type. That type can encapsulate all the business rules in a single place, without violating DRY. This is what is meant by a Domain Model. In Domain-Driven Design terminology, a primitive like a string or a number can be turned into a Value Object. Jimmy Bogard already covered that ground years ago, but here's how I would define a UserName class:

public class UserName
{
    private readonly string value;
 
    public UserName(string value)
    {
        if (value == null)
            throw new ArgumentNullException("value");
        if (!UserName.IsValid(value))
            throw new ArgumentException("Invalid value.""value");
 
        this.value = value;
    }
 
    public static bool IsValid(string candidate)
    {
        if (string.IsNullOrEmpty(candidate))
            return false;
 
        return candidate.Trim().ToUpper() == candidate;
    }
 
    public static bool TryParse(string candidate, out UserName userName)
    {
        userName = null;
        if (string.IsNullOrWhiteSpace(candidate))
            return false;
 
        userName = new UserName(candidate.Trim().ToUpper());
        return true;
    }
 
    public static implicit operator string(UserName userName)
    {
        return userName.value;
    }
 
    public override string ToString()
    {
        return this.value.ToString();
    }
 
    public override bool Equals(object obj)
    {
        var other = obj as UserName;
        if (other == null)
            return base.Equals(obj);
 
        return object.Equals(this.value, other.value);
    }
 
    public override int GetHashCode()
    {
        return this.value.GetHashCode();
    }
}

As you can tell, this class protects its invariants. In case you were wondering about the use of ToUpper, it turns out that there's also another business rule that states that user names are case-insensitive, and one of the ways you can implement that is by converting the value to upper case letters. All business rules pertaining to user names are now nicely encapsulated in this single class, so you don't need to remember where to apply them to strings.

If you want to know the underlying string, you can either invoke ToString, or take advantage of the implicit conversion from UserName to string. You can also compare to UserName instances, because the class overrides Equals. If you have a string, and want to convert it to a UserName, you can use TryParse.

The original code example above can be refactored to use the UserName class instead:

public IHttpActionResult Get(string candidate)
{
    UserName userName;
    if (!UserName.TryParse(candidate, out userName))
        return this.BadRequest("Invalid user name.");
 
    var user = this.repository.FindUser(userName);
    return this.Ok(user);
}

This code has the same complexity as the original example, but now it's much clearer what's going on. You don't have to wonder about what looks like arbitrary rules; they're all nicely encapsulated in the UserName class.

Furthermore, as soon as you've left the not Object-Oriented boundary of your system, you can express the rest of your code in terms of the Domain Model; in this case, the UserName class. Here's the IUserRepository interface's Find method:

User FindUser(UserName userName);

As you can tell, it's expressed in terms of the Domain Model, so you can't accidentally pass it a string. From that point, since you're receiving a UserName instance, you know that it conforms to all business rules encapsulated in the UserName class.

Not only for OOP

While I've used the term encapsulation once or twice here, this way of thinking is in no way limited to Object-Oriented Programming. Scott Wlaschin describes how to wrap primitives in meaningful types in F#. The motivation and the advantages you gain are the same in Functional F# as what I've described here.

Over-engineering?

Isn't this over-engineering? A 56 lines of code class instead of a string? Really? The answer to such a question is always context-dependent, but I rarely find that it is (over-engineering). You can create a Value Object like the above UserName class in less than half an hour - even if you use Test-Driven Development. When I created this example, I wrote 27 test cases distributed over five test methods in order to make sure that I hadn't done something stupid. It took me 15 minutes.

You may argue that 15 minutes is a lot, compared to the 0 minutes it would take you if you'd 'just' use a string. On the surface, that seems like a valid counter-argument, but perhaps you're forgetting that with a primitive string, you still need to write validation and business logic 'around' the string, and you have to remember to apply that logic consistently across your entire code base. My guess is that you'll spend more than 15 minutes on doing this, and troubleshooting defects that occur when someone forgets to apply one of those rules to a string in some other part of the code base.

Summary

Primitive values such as strings, integers, decimal numbers, etc. often represent concepts that are constrained in some ways; they're not just any string, any integer, or any decimal number. Ask yourself if extreme values (like the entire APPP manuscript, Int32.MinValue, and so on) are suitable for such variables. If that's not the case, consider introducing a Value Object instead.

If you want to learn more about Encapsulation, you can watch my Encapsulation and SOLID Pluralsight course.


Comments

Johannes

Mark, I enjoyed your article. I feel that this is the way to go. Too often have I seen people using arrays if bytes for images and strings for email addresses.

An important aspect of DDD is within the language and when there are people talking about user names all the time, that usually indicates that usernames are an important aspect of what is going on and should be modeled explicitly and not use primitives.

The solution doesn't look overengineered to me, although I had to defend similar code before for the same reason. Not only is it now a good place to put further validation, but also allows further distinction using other types and a polymorphism (if carefully used, they can be a delight), each of which covers different invariants.

A UserName can be moved around and used to retrieve objects. But maybe you can't do that with a class of the type InvalidUsername. In another example, maybe there are functions that deal with email eddresses - which I have often seen modeled as strings. Some accept InvalidEmailAddress objects, others accept ValidEmail addresses. While it doesn't reduce the amount of boilerplate null checks, a SendEmail function wouldn't have to do


... SendEmail(EmailAddress email)
{
    if(email.IsValid()) {
        email.Send()
    }
}

One could simply make it to only accept valid email addresses. Their mere existence would guarantee, that the email address is valid. One could then write functions dealing with valid email addresses and some dealing with invalid ones. You can log and analyze bad account creation requests with the invalid ones, but not send email to them and prevent everyone else from doing so, depending on what kind of email address is being passed around.

2015-01-20 08:16 UTC

Johannes, thank you for writing. You are right, it's possible to take the idea further in the way you describe. Again, such techniques aren't even limited to OOP; for instance, Scott Wlaschin explains how to make illegal states unrepresentable in F#.

In fact, the Functional approach using Sum Types (Discriminated Unions) is nicer, because it doesn't rely on inheritance :)

2015-01-20 08:52 UTC

Interesting article. I have no qualms about creating domain types around primitives per se, but IMHO there should be a bit more complexity in the invariants than a single condition before resorting to that. In this case, I would think that the username constraint should be implemented in the User class. If other entities shared the constraint (e.g. Company.AccountName), the logic might be refactored into a NameValidator, and then referenced concretely by the entities.

A string holding a strong password would be a good candidate for promoting to something smarter.

2015-01-20 23:10 UTC
Suamere

Mark,

This is exactly right. I honestly have no comment on the concept because I have changed primitives to value objects a number of times and use something similar.

Quick side-note, and this may just be a preference: In the TryParse, a Trim()'d value is being passed into the constructor. So if I want to create two usernames " Suamere " and "Suamere", the second will fail since, unbeknownst to me, the first had been successfully created with Trim(). It seems to me that "Do not allow pre/post white-space" is a business rule. Therefore, if there is white-space, I consider that to be breaking a business rule.

My main motivation for this comment is that this UserName Object shows a perfect example of a pattern I've seen when using the TryParse pattern (or Monads). That is: The logic in the TryParse is frequently an exact duplicate of logic in another method such as the constructor. Though instead of throwing exceptions, TryParse gracefully informs the consumer of success/failure.

So business logic is duplicated in the CTOR(), IsValid(), and the TryParse(). Instead, I consider this pattern:

    public UserName(string value)
    {
        CheckIsValid(valuetrue);
        this.value = value;
    }

    public static bool TryParse(string candidate, out UserName userName)
    {
        userName = null;
        if (!CheckIsValid(candidate, false)) return false;
        userName = new UserName(candidate);
        return true;
    }

    public static bool IsValid(string candidate)
    {
        return CheckIsValid(candidate, false);
    }

    private static bool CheckIsValid(string candidate, bool throwExceptions)
    {
        //Brevity follows, but can be split up into more granular rules and exceptions
        if (string.IsNullOrWhiteSpace(candidate))
        {
            if (throwExceptions) throw new ArgumentNullException("candidate");
            return false;
        }
        if (!string.Equals(candidate, candidate.Trim(), StringComparison.OrdinalIgnoreCase))
        {
            if (throwExceptions) throw new ArgumentException("Invalid value.", "candidate");
            return false;
        }

        return true;
    }
		

Let me know what you think!
~Suamere, Steven Fletcher

2015-01-20 22:00 UTC

Mike, thank you for writing. A business rule doesn't have to be complex in order to be important. The point isn't only to encapsulate complex logic, but to make sure that (any) business logic is applied consistently.

In my experience, things often start simple, but evolve (or do they devolve?) into something more complex; when do you draw the line?

2015-01-21 12:57 UTC

Steven, thank you for writing. You exhibit traits of critical thinking, which is always good when dealing with business logic :) In the end, it all boils down to what exactly the business rule is. As you interpret the business rule, you rephrase it as "Do not allow pre/post white-space", but that's not the business rule I had in mind.

Since I'm the author of the blog post, I have the luxury of coming up with the example, and thus, in this case, defining the business rule. The business rule I had in mind is that user names are case-insensitive, and they're also insensitive to leading and trailing white space. Thus, " Suamere " and "Suamere" are considered to be two representations of the same user name, just as "FOO" and "foo" represent the same user name. The canonicalised representation of these two user names are "SUAMERE" and "FOO", respectively.

That's also the reason I chose the name TryParse; it's equivalent to DateTime.TryParse, which should be a well-known .NET idiom. If I parse the two strings "2015-01-21" and " 21. januar 2015 " on my machine (which is running with the da-DK locale), I get the exact same value from both strings. Notice that DateTime.TryParse also blissfully ignores leading and trailing white space. This is all correct according to Postel's law.

If all I'd wanted to do was to convert an already valid string into a UserName instance, I'd have implemented an explicit conversion, which is what Jimmy Bogard did in his article. In fact, such an explicit conversion doesn't conflict with the current TryParse method, so I'd find it quite reasonable to add that as well, if any client had the need for it.

2015-01-21 13:35 UTC
Jeff Soper

This question might be slightly besides the point, but I'm having trouble with the bit about UserName values being case-insensitive. I take that to mean that you can create a UserName with all uppercase, all lowercase, or any combination thereof, and they'll all be treated equally.

To me, that doesn't mean that you have to pass in an uppercase-only value to the constructor (or TryParse() method, for that matter) in order to successfully instantiate a UserName. Was that your intent?

I created a Gist with some tests to illustrate what I'm talking about: I can instantiate with "JEFF", but not with "jeff", using either the constructor directly or the TryParse method.

2015-01-23 17:27 UTC

Jeff, I attempted to run your unit tests, and the only one that fails is, as expected, if you attempt to use the constructor with "jeff"; TryParse with "jeff" works.

Would it have helped if the constructor was private?

2015-01-23 19:08 UTC
Jeff Soper

I stand corrected regarding the failing tests; I worked with the tests again, and indeed, only the constructor method failed passing in "jeff" in lowercase.

Making the constructor private would clarify things in that there would only be one way for a client to instantiate UserName, which happens to ensure anything passed into the private constructor is all upper-case.

That seems to better match the stated business rule, that UserName values are case-insensitive, rather than a rule that prohibits anything but upper-case values. In any case, it's not quite central to the point of your article, so I thank you for taking the time to reply anyway.

2015-01-28 19:42 UTC

Jeff, I think we fundamentally agree, and I do understand why you'd find it surprising that the constructor accepts "JEFF", but not "jeff". FWIW, I follow (a lot) of rules (of thumb) when I design APIs, and one of them is that the purpose of constructors is to initialise objects. A constructor may perform validation, but must not perform work in the sense of transforming the input or producing side-effects. In other words, constructors are used to initialise objects with valid values.

Whenever I need to transform data as part of initialisation, I often create some sort of factory method for that explicit purpose. A static TryParse method is one of many options available when such a need arises. The point is exactly that the input may not start out being 'valid', but it can be transformed into a valid representation. Going from "jeff" to UserName("JEFF") is a conversion.

This idea of constructors only accepting valid values may seem a bit foreign at first, but if you follow it as a general principle, you get a code base that's easier to reason about, because it's consistent. It's probably because I'm so used to this principle that I just left the constructor public, following another principle of mine, which is that there's no reason to make anything internal or private if the class or member can properly protect its invariants.

Sorry to rant; you just inspired me to explain some of my underlying design thoughts that caused me to arrive at this particular solution.

2015-01-28 21:19 UTC
Julien Vulliet

Thanks for your post, very informative as usual. I would add a few insights.

As much as it can feel Over engineering, I would totally agree and say it doesn't. I deal with GPU resource creation (Direct3D11) on a daily basis and this has helped me so much and saved me headaches (for example a Buffer size must have more than 0 elements, failing to do so gives you a cryptic runtime error than you can only catch error message by enabling Debug Device and looking at Debug output window!

Using a simple BufferElementCount in that case enforces the fact that I provide a correct value, and throw a meaningful exception early in the process. By passing a BufferElementCount I can (almost) safely expect that my buffer gets created.

On the lines of code, I would expect that the initial 50 lines would largely be overtaken by rewriting the same logic, but scattered across the whole code (plus likely adding try/catch blocks in many places on top of it), so I'm pretty sure doing the 0 lines version would lead to more code at the end. Plus 50 lines of code to save be a deep debug session on half million lines code base is a trade-off I'll happily take!

Also as a little bonus, for simple cases I wrote a little snippet so I guess I would share it here Domain Primitive. Now it takes even less than 2 minutes in most simple cases.

2015-03-18 14:41 UTC
Alan Evans

Your Equals is fragile. Because the class is not sealed, one could descend from it and break the equality contract.

Example of how to break through descending:

    public class Domain​UserName : UserName{
      private string domain;
      //Equals written to compare both username and domain
    }

    UserName u1 = new DomainUserName("BOB", "GITHUB");
    UserName u2 = new UserName("BOB");

    u1.Equals(u2) //will use DomainUserName's Equals and return false
    u2.Equals(u1) //will use UserName's Equals and return true

	

But equality contract must be symmetrical. So either seal the class or else compare GetTypes() as example here: msdn.microsoft.com/en-us/library/vstudio/336aedhh%28v=vs.100%29.aspx

2015-03-18 14:41 UTC

Alan, thank you for writing. You're right; I'd never considered this - most likely because I never use inheritance. Therefore, my default coping strategy would most likely be to make the class sealed by default, but otherwise, explicitly comparing GetType() return values would have to do.

Thank you for teaching me something today!

2015-04-10 16:08 UTC

Great article and completely agree on having primitive types wrapped up into its own class. I had few questions around how this would be applied to provide a good validation feedback to consuming clients. If we want to provide more details on why the 'user name is invalid', like say 'User name should not be small letters' and 'User name should not have trailing spaces' how would we handle this in the class. How would the class communicate out the details of the rules that it is abstracting to its consuming clients.

When accepting multiple such 'classes' in a endpoint, what would be a suggested validation approach. For instance if we are to look at a similar endpoint, which takes in 'Name' and 'Phone' number(or even more in the case of POST endpoint where we are creating a new user), wouldnt the controller soon be overladed with a lot of such TryParse calls.

From the above comment of constructor only accepting valid values, where would it make sense to use the constructor as opposed to TryParse. Is it at the persistence boundary of the application?

2015-11-21 0:51 UTC

Rahul, thank you for writing. As a general rule, at the boundary of an application, all input must be considered evil until proven otherwise. That means that if you try to pass unvalidated input values into constructors, you should expect exceptions to be thrown on a regular basis. Since exceptions are for exceptional cases, that's most likely not the best way to go forward.

Instead, domain objects could offer Validate methods. These would be a bit like TryParse methods, but instead of returning a primitive boolean value, they'd have to return a list of messages. If there are no messages, the input is good. If there are messages, the input is bad. You can encapsulate such a validation result in an object if you think the rule about messages/no messages is too implicit. Such Validate methods could still take out parameters like TryParse methods if you want to validate and potentially convert into a domain object in one go.

This isn't particularly elegant in languages like C# or Java, but due to the lack of sum types in these languages, that's the best we can do.

In languages with sum types, you can use the Either monad and applicative composition to address this type of problem much more elegantly. A good place to start would be Scott Wlaschin's article on Railway Oriented Programming.

2015-11-21 11:44 UTC
Eugene Grebenyuk

Mark, implicit conversion operator should never throw exception. In the example, it can throw NullReferenceException when UserName is null. What would you recommend to fix that issue?

2016-05-25 14:54 UTC

Eugene, thank you for writing. That's probably a good rule, but I admit I hadn't thought about it. One option is to convert it to an explicit conversion instead. Another option is to forgo the conversion operators all together. The conversion ability isn't the important point in this article.

A third option is to leave it as is. In the code bases I control, null is never an appropriate value, so if null appears anywhere, I'd consider it a bug somewhere else.

To be realistic, though, I think I'd prefer one of the two first options, as I tend to agree with the Zen of Python: Explicit is better than implicit.

2016-05-25 16:15 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or Google Plus, or somewhere else with a permalink. Ping me with the link, and I may add it as a comment.

Published

Monday, 19 January 2015 08:15:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!