The IsNullOrWhiteSpace method may seem like a useful utility method, but poisons your design perspective.

The string.IsNullOrWhiteSpace method, together with its older sibling string.IsNullOrEmpty, may seem like useful utility methods. In reality, they aren't. In fact, they trick your mind into thinking that null is equivalent to white space, which it isn't.

Null isn't equivalent to anything; it's the absence of a value.

Various empty and white space strings ("", " ", etc), on the other hand, are values, although, perhaps, not particularly interesting values.

Example: search canonicalization

Imagine that you have to write a simple search canonicalization algorithm for a music search service. The problem you're trying to solve is that when users search for music, the may use variations of upper and lower case letters, as well as type the artist name before the song title, or vice versa. In order to make your system as efficient as possible, you may want to cache popular search results, but it means that you'll need to transform each search term into a canonical form.

In order to keep things simple, let's assume that you only need to convert all letters to upper case, and order words alphabetically.

Here are five test cases, represented as a Parameterized Test:

[Theory]
[InlineData("Seven Lions Polarized"  , "LIONS POLARIZED SEVEN"  )]
[InlineData("seven lions polarized"  , "LIONS POLARIZED SEVEN"  )]
[InlineData("Polarized seven lions"  , "LIONS POLARIZED SEVEN"  )]
[InlineData("Au5 Crystal Mathematics""AU5 CRYSTAL MATHEMATICS")]
[InlineData("crystal mathematics au5""AU5 CRYSTAL MATHEMATICS")]
public void CanonicalizeReturnsCorrectResult(
    string searchTerm,
    string expected)
{
    string actual = SearchTerm.Canonicalize(searchTerm);
    Assert.Equal(expected, actual);
}

Here's one possible implementation that passes all five test cases:

public static string Canonicalize(string searchTerm)
{
    return searchTerm
        .Split(new[] { ' ' })
        .Select(x => x.ToUpper())
        .OrderBy(x => x)
        .Aggregate((x, y) => x + " " + y);
}

This implementation uses the space character to split the string into an array, then converts each sub-string to upper case letters, sorts the sub-strings in ascending order, and finally concatenates them all together to a single string, which is returned.

Continued example: making the implementation more robust

The above implementation is quite naive, because it doesn't properly canonicalize if the user entered extra white space, such as in these extra test cases:

[InlineData("Seven  Lions   Polarized""LIONS POLARIZED SEVEN")]
[InlineData(" Seven  Lions Polarized ""LIONS POLARIZED SEVEN")]

Notice that these new test cases don't pass with the above implementation, because it doesn't properly remove all the white spaces. Here's a more robust implementation that passes all test cases:

public static string Canonicalize(string searchTerm)
{
    return searchTerm
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.ToUpper())
        .OrderBy(x => x)
        .Aggregate((x, y) => x + " " + y);
}

Notice the addition of StringSplitOptions.RemoveEmptyEntries.

Testing for null

If you consider the above implementation, does it have any other problems?

One, fairly obvious, problem is that if searchTerm is null, the method is going to throw a NullReferenceException, because you can't invoke the Split method on null.

Therefore, in order to protect the invariants of the method, you must test for null:

[Fact]
public void CanonicalizeNullThrows()
{
    Assert.Throws<ArgumentNullException>(
        () => SearchTerm.Canonicalize(null));
}

In this case, you've decided that null is simply invalid input, and I agree. Searching for null (the absence of a value) isn't meaningful; it must be a defect in the calling code.

Often, I see programmers implement their null checks like this:

public static string Canonicalize(string searchTerm)
{
    if (string.IsNullOrWhiteSpace(searchTerm))
        throw new ArgumentNullException("searchTerm");
 
    return searchTerm
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.ToUpper())
        .OrderBy(x => x)
        .Aggregate((x, y) => x + " " + y);
}

Notice the use of IsNullOrWhiteSpace. While it passes all tests so far, it's wrong for a number of reasons.

Problems with IsNullOrWhiteSpace

The first problem with this use of IsNullOrWhiteSpace is that it may give client programmers wrong messages. For example, if you pass the empty string ("") as searchTerm, you'll still get an ArgumentNullException. This is misleading, because it gives the wrong message: it states that searchTerm was null when it wasn't (it was "").

You may then argue that you could change the implementation to throw an ArgumentException.

if (string.IsNullOrWhiteSpace(searchTerm))
    throw new ArgumentException("Empty or null.""searchTerm");

This isn't incorrect per se, but not as explicit as it could have been. In other words, it's not as helpful to the client developer as it could have been. While it may not seem like a big deal in a single method like this, it's sloppy code like this that eventually wear client developers down; it's death by a thousand paper cuts.

Moreover, this implementation doesn't follow the Robustness Principle. Is there any rational reason to reject white space strings?

Actually, with a minor tweak, we can make the implementation work with white space as well. Consider these new test cases:

[InlineData("""")]
[InlineData(" """)]
[InlineData("  """)]

These currently fail because of the use of IsNullOrWhiteSpace, but they ought to succeed.

The correct implementation of the Canonicalize method is this:

public static string Canonicalize(string searchTerm)
{
    if (searchTerm == null)
        throw new ArgumentNullException("searchTerm");
 
    return searchTerm
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.ToUpper())
        .OrderBy(x => x)
        .Aggregate("", (x, y) => x + " " + y)
        .Trim();
}

First of all, the correct Guard Clause is to test only for null; null is the only invalid value. Second, the method uses another overload of the Aggregate method where an initial seed (in this case "") is used to initialize the Fold operation. Third, the final call to the Trim method ensures that there's no leading or trailing white space.

The IsNullOrWhiteSpace mental model

The overall problem with IsNullOrWhiteSpace and IsNullOrEmpty is that they give you the impression that null is equivalent to white space strings. This is the wrong mental model: white space strings are proper string values that you can very often manipulate just as well as any other string.

If you insist on the mental model that white space strings are equivalent to null, you'll tend to put them in the same bucket of 'invalid' data. However, if you take a hard look at the preconditions for your classes, methods, or functions, you'll find that often, a white space string is going to be perfectly acceptable input. Why reject input you can understand? That will only make your code more difficult to use.

In testing terms, it's my experience that null rarely falls in the same Equivalence Class as white space strings. Therefore, it's wrong to implicitly treat them as if they do.

The IsNullOrWhiteSpace and IsNullOrEmpty methods imply that null and white space strings are equivalent, and this will often give you the wrong mental model of the boundary cases of your software. Be careful when using these methods.


Comments

I agree if this is used at code library, which will be used by other programmer. However when directly used at application level layer, it is common to use them, at least the IsNullOrEmpty one, and they are quite powerful. And I don't find any problem in using that.

2014-11-28 06:47 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or Google Plus, or somewhere else with a permalink. Ping me with the link, and I may add it as a comment.

Published

Tuesday, 18 November 2014 19:10:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!