How do you get the value out of a collection? Mu. Which value?

The other day I was looking for documentation on how to write automated tests with a self-hosted ASP.NET Core 3 web application. I've done this numerous times with previous versions of the framework, but ASP.NET Core 3 is new, so I wanted to learn how I'm supposed to do it this year. I found official documentation that helped me figure it out.

One of the code examples in that article displays a motif that I keep encountering. It displays behaviour close enough to unit bias that I consider it reasonable to use that label. Unit bias is the cognitive tendency to consider a unit of something the preferred amount. Our brains don't like fractions, and they don't like multiples either.

Unit bias in action #

The sample code in the ASP.NET Core documentation differs in the type of dependency it looks for, but is otherwise identical to this:

var descriptor = services.SingleOrDefault(
    d => d.ServiceType ==
        typeof(IReservationsRepository));
 
if (descriptor != null)
{
    services.Remove(descriptor);
}

The goal is to enable an automated integration test to run against a Fake database instead of an actual relational database. My production Composition Root registers an implementation of IReservationsRepository that communicates with an actual database. In an automated integration test, I'd like to unregister the existing dependency and replace it with a Fake. Here's the code in context:

public class RestaurantApiFactory : WebApplicationFactory<Startup>
{
    protected override void ConfigureWebHost(IWebHostBuilder builder)
    {
        if (builder is null)
            throw new ArgumentNullException(nameof(builder));
 
        builder.ConfigureServices(services =>
        {
            var descriptor = services.SingleOrDefault(
                d => d.ServiceType ==
                    typeof(IReservationsRepository));
 
            if (descriptor != null)
            {
                services.Remove(descriptor);
            }
 
            services.AddSingleton<IReservationsRepository>(
                new FakeDatabase());
        });
    }
}

It works as intended, so what's the problem?

How do I get the value out of my collection? #

The problem is that it's fragile. What happens if there's more than one registration of IReservationsRepository?

This happens:

System.InvalidOperationException : Sequence contains more than one matching element

This is a completely avoidable error, stemming from unit bias.

A large proportion of programmers I meet seem to be fundamentally uncomfortable with thinking in multitudes. They subconsciously prefer thinking in procedures and algorithms that work on a single object. The programmer who wrote the above call to SingleOrDefault exhibits behaviour putting him or her in that category.

This is nothing but a specific instantiation of a more general programmer bias: How do I get the value out of the monad?

As usual, the answer is mu. You don't. The question borders on the nonsensical. How do I get the value out of my collection? Which value? The first? The last? Some arbitrary value at an unknown position? Which value do you want if the collection is empty? Which value do you want if there's more than one that fits a predicate?

If you can answer such questions, you can get 'the' value out of a collection, but often, you can't. In the current example, the code doesn't handle multiple IReservationsRepository registrations.

It easily could, though.

Inject the behaviour into the collection #

The best answer to the question of how to get the value out of the monad (in this case, the collection) is that you don't. Instead, you inject the desired behaviour into it.

In this case, the desired behaviour is to remove a descriptor. The monad in question is the collection of services. What does that mean in practice?

A first attempt might be something like this:

var descriptors = services
    .Where(d => d.ServiceType == typeof(IReservationsRepository));
foreach (var descriptor in descriptors)
    services.Remove(descriptor);

Unfortunately, this doesn't quite work:

System.InvalidOperationException : Collection was modified; enumeration operation may not execute.

This happens because descriptors is a lazily evaluated Iterator over services, and you're not allowed to remove elements from a collection while you enumerate it. It could lead to bugs if you could.

That problem is easily solved. Just copy the selected descriptors to an array or list:

var descriptors = services
    .Where(d => d.ServiceType == typeof(IReservationsRepository))
    .ToList();
foreach (var descriptor in descriptors)
    services.Remove(descriptor);

This achieves the desired outcome regardless of the number of matches to the predicate. This is a more robust solution, and it requires the same amount of code.

You can stop there, since the code now works, but if you truly want to inject the behaviour into the collection, you're not quite done yet.

But you're close. All you have to do is this:

services
    .Where(d => d.ServiceType == typeof(IReservationsRepository))
    .ToList()
    .ForEach(d => services.Remove(d));

Notice how this statement never produces an output. Instead, you 'inject' the call to services.Remove into the list, using the ForEach method, which then mutates the services collection.

Whether you prefer the version that uses the foreach keyword or the version that uses List<T>.ForEach doesn't matter. What matters is that you don't use the partial SingleOrDefault function.

Conclusion #

It's a common code smell when programmers try to extract a single value from a collection. Sometimes it's appropriate, but there are several edge cases you should be prepared to address. What should happen if the collection is empty? What should happen if the collection contains many elements? What should happen if the collection is infinite? (I didn't address that in this article.)

You can often both simplify your code and make it more robust by staying 'in' the collection, so to speak. Let the desired behaviour apply to all appropriate elements of the collection.

Don't be biased against collections.


Comments

Julius H #

I concur that often the first element of a collection is picked without thinking. Anecdotally, I experienced a system that would not work if set up freshly because in some places there was no consideration for empty collections. (The testing environment always contained some data)

Yet I would reverse the last change (towards .ForEach). For one, because (to my biased eye) it looks side effect free but isn't. And then it does add value compared to a forech loop, also both solutions are needlessy inefficient. If you want to avoid the foreach, go for the RemoveAll() method (also present on List<T>):

services.RemoveAll<IReservationsRepository>();

2020-04-29 8:52 UTC

Julius, thank you for writing. Yes, I agree that in C# it's more idiomatic to use foreach.

How would using RemoveAll work? Isn't that going to remove the entries from the List instead of from services?

2020-04-29 10:49 UTC
Julius H #

Hi Mark,
As you"re calling IServiceCollection.RemoveAll(), it will remove it from the collection. I tried it, and to me it seems to be working. (As long as you are not copying the services into a list beforehand)

But to your main point, I remember when I wrote .Single() once, and years later I got a bug report because of it. I see two approaches there: Fail as fast and hard as possible or check just as much as needed for the moment. Considering TDD in the former approach, one would need to write a lot of test code for scenarios, that should never happen to verify the exceptions happen. Still, in the latter approach, a subtle bug could stay in the system for quite some time... What do you prefer?

2020-05-01 18:07 UTC

Julius, thank you for writing. It turns out that RemoveAll are a couple of extension methods on IServiceCollection. One has to import Microsoft.Extensions.DependencyInjection.Extensions with a using directive before one can use them. I didn't know about these methods, but I agree that they seem to do their job. Thank you for the tip.

As for your question, my preference is for robustness. In my experience, there's rarely such a thing as a scenario that never happens. If the code allows something to happen, it'll likely happen sooner or later. Code changes, so even if you've analysed that some combination of input isn't possible today, a colleague may change the situation tomorrow. It pays to write that extra unit test or two to make sure that encapsulation isn't broken.

This is also one of the reasons I'm fond of property-based testing. You automatically get coverage of all sorts of boundary conditions you'd normally not think about testing.

2020-05-02 9:12 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 20 April 2020 06:27:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 20 April 2020 06:27:00 UTC