This post examines the conflict between Iterators, LINQ, and the Liskov Substitution Principle.

As a reaction to my my previous post on Defensive Coding, one of my readers sent me this question:

"[One] very prominent scenario of defensive coding is used when the input argument is IEnumerable<T>

public class Publisher
{
    public Publisher(IEnumerable<Subscriber> subscribers)
    {
        // defensive copy -> good or bad?
        this.subscribers = subscribers.ToArray();
    }
 
    //  …
}

"You could argue: when the intention is to use IEnumerable<T>, the receiver shall not to believe this sequence is immutable. You could indicate an immutable sequence with ICollection for instance. In the example above, the caller might add a new subscriber silently to its own list and have it automatically injected into the 'publisher' (maybe that's the intention from the perspective of the caller). However, a defensive copy breaks this behavior because the injected sequence is now no longer under control of the caller. This shows how simple an implementation detail can change the behavior on the client.

"The reason you often see code like this is because we like immutable objects and second because of the unknown performance impact of IEnumerable. Once you take a copy of the input you can predict the performance of your class otherwise you can't.

"I tend to say it's 'bad' to take defensive copy (after reading many of your blog posts), but would be happy to hear your opinion on that."

This question warrants an in-depth answer.

Encapsulation #

IEnumerable<T> is one of the most misunderstood interfaces in .NET. It carries very few guarantees, and many method calls on it may actually break the Liskov Substitution Principle (LSP). ToArray is one of them, because it assumes that the sequence produced by the Iterator is finite, which it may not be. Thus, if you invoke ToArray on an infinite Iterator, you will eventually get an exception.

It doesn't really matter whether you call ToArray in the constructor, or in the class method(s) where you use the injected IEnumerable<T>. However, from a Fail First perspective, and in order to protect the class' invariant, if the class requires the sequence to be finite, you could argue that you should invoke ToArray (or ToList) in the constructor. However, this breaks Nikola Malovic's 4th law of IoC: constructors should do no work. This should make you stop and ponder: if you require an array, you should state that requirement up front:

public class Publisher
{
    public Publisher(Subscriber[] subscribers)
    {
        this.subscribers = subscribers;
    }
 
    //  …
}

Notice that instead of requesting IEnumerable<T>, this version requests an array and simply assigns the array to a private field.

However, the problem is that an array isn't quite the same as an Iterator; the most profound difference is that the Publisher class is actually able to mutate the array by replacing elements within the array. This could be a problem if the array is shared with other clients.

Another problem is that if the Publisher doesn't need the ability to mutate the array, it now breaks the Robustness Principle, because a finite Iterator would have been good enough for its needs, but still, it puts an unwarranted demand on its clients.

Asking for ICollection<T>, as my reader suggests, is an even greater violation of the Robustness Principle, because that interface adds seven new methods on top of IEnumerable<T> - three of which are only about mutation. The rest of the methods have been made redundant by LINQ.

LINQ and the LSP #

In a previous post, I've talked about the conflict between IQueryable and the LSP, but even constraining the discussion to LINQ to Objects, it turns out that LINQ has lots of built-in LSP violations.

Recall the essence of the LSP: you should be able to pass any implementation of an interface to a client without changing the correctness of the system. While 'correctness' is application-specific, the lowest common denominator must be that if a method works for one implementation, it mustn't throw exceptions for another implementation. However, consider these two implementations of IEnumerable<string>:

new[] { "foo", "bar", "baz" };

and this one:

public class InfiniteStrings : IEnumerable<string>
{
    public IEnumerator<string> GetEnumerator()
    {
        while (true)
            yield return "foo";
    }
 
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

As I've already discussed, invoking ToArray (or ToList) on these two implementations changes the correctness of the system, because the second implementation (the infinite Iterator) will cause an exception to be thrown. In fact, as far as I can tell, the only LSP-compliant LINQ methods are:

  • Any()
  • AsEnumerable()
  • Concat(IEnumerable<T>)
  • DefaultIfEmpty()
  • DefaultIfEmpty(T)
  • Distinct (maybe...)
  • Distinct(IEqualityComparer<T>) (maybe...)
  • ElementAt(int)
  • ElementAtOrDefault(int)
  • First()
  • FirstOrDefault()
  • OfType<TResult>()
  • Select(Func<TSource, TResult>)
  • Select(Func<TSource, int, TResult>)
  • SelectMany(Func<TSource, IEnumerable<TResult>>)
  • SelectMany(Func<TSource, int, IEnumerable<TResult>>)
  • SelectMany(Func<TSource, IEnumerable<TCollection>>, Func<TSource, TCollection, TResult>)
  • SelectMany(Func<TSource, int, IEnumerable<TCollection>>, Func<TSource, TCollection, TResult>)
  • Single()
  • SingleOrDefault()
  • Skip(int)
  • Take(int)
  • Where(Func<TSource, bool>)
  • Where(Func<TSource, int, bool>)
  • Zip(IEnumerable<TSecond>, Func<TFirst, TSecond, TResult>)
If you can get by with this subset of LINQ, you should be safe. If not, you're back at the choice of either accepting IEnumerable<T>, or requesting an array; break the LSP or break the Robustness Principle.

This actually demonstrates a need for a 'Finite Iterator' interface, and I have to admit that, before researching for this article, I'd never heard about IReadOnlyCollection<T>, but there it is; it's seems to be a new interface in .NET 4.5. I think I'll begin to use this interface some more!

Summary #

The bottom line is that defensive copying of IEnumerable<T> into arrays should be avoided. If you can get by with the LSP-compliant subset of LINQ, then all is good (but consider writing a couple of unit tests that involve infinite Iterators). If you require a finite sequence and you're on .NET 4.5, require IReadOnlyCollection<T> as an argument instead of IEnumerable<T>. If you require a finite sequence and you're not on .NET 4.5, ask for an array as an argument (and consider adding some unit tests that verify that your implementation doesn't mutate the array).



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Saturday, 20 July 2013 21:00:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Saturday, 20 July 2013 21:00:00 UTC