Collecting and handling result values

The answer is traverse. It's always traverse.

I recently came across a Stack Overflow question about collecting and handling sum types (AKA discriminated unions or, in this case, result types). While the question was tagged functional-programming, the overall structure of the code was so imperative, with so much interleaved I/O, that it hardly qualified as functional architecture.

Instead, I gave an answer which involved a minimal change to the code. Subsequently, the original poster asked to see a more functional version of the code. That's a bit too large a task for a Stack Overflow answer, I think, so I'll do it here on the blog instead.

Further comments and discussion on the original post reveal that the poster is interested in two alternatives. I'll start with the alternative that's only discussed, but not shown, in the question. The motivation for this ordering is that this variation is easier to implement than the other one, and I consider it pedagogical to start with the simplest case.

I'll do that in this article, and then follow up with another article that covers the short-circuiting case.

Imperative outset #

To begin, consider this mostly imperative code snippet:

var storedItems = new List<ShoppingListItem>();
var failedItems = new List<ShoppingListItem>();
var errors = new List<Error>();
var state = (storedItems, failedItems, errors);
foreach (var item in itemsToUpdate)
{
    OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext);
    state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, List<Error>)>(
        storedItem => { storedItems.Add(storedItem); return state;  },
        notFound => { failedItems.Add(item); return state; },
        error => { errors.Add(error); return state; }
        );
}
 
await dbContext.SaveChangesAsync();
 
return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems], [.. errors]));

There's quite a few things to take in, and one has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.

The most non-obvious detail is that the code uses a library called OneOf, which supplies general-purpose, but rather abstract, sum types. Both the container type OneOf, as well as the two indicator types NotFound and Error are defined in that library.

The Match method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem returns.

One more detail also warrants an explicit description: The itemsToUpdate object is an input argument of the type IEnumerable<ShoppingListItem>.

The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?

Standard traversal #

I'll pretend that we only need to turn the above code snippet into a functional design. Thus, I'm ignoring that the code is most likely part of a larger code base. Because of the implied database interaction, the method isn't a pure function. Unless it's a top-level method (that is, at the boundary of the application), it doesn't exemplify larger-scale functional architecture.

That said, my goal is to refactor the code to an Impureim Sandwich: Impure actions first, then the meat of the functionality as a pure function, and then some more impure actions to complete the functionality. This strongly suggests that the first step should be to map over itemsToUpdate and call UpdateItem for each.

If, however, you do that, you get this:

IEnumerable<Task<OneOf<ShoppingListItem, NotFound, Error>>> results =
    itemsToUpdate.Select(item => UpdateItem(item, dbContext));

The results object is a sequence of tasks. If we consider Task as a surrogate for IO, each task should be considered impure, as it's either non-deterministic, has side effects, or both. This means that we can't pass results to a pure function, and that frustrates the ambition to structure the code as an Impureim Sandwich.

This is one of the most common problems in functional programming, and the answer is usually: Use a traversal.

IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));

Because this first, impure layer of the sandwich awaits the task, results is now an immutable value that can be passed to the pure step. This, by the way, assumes that ShoppingListItem is immutable, too.

Notice that I adjusted one of the cases of the discriminated union to NotFound<ShoppingListItem> rather than just NotFound. While the OneOf library ships with a NotFound type, it doesn't have a generic container of that name, so I defined it myself:

internal sealed record NotFound<T>(T Item);

I added it to make the next step simpler.

Aggregating the results #

The next step is to sort the results into three 'buckets', as it were.

// Pure
var seed =
    (
        Enumerable.Empty<ShoppingListItem>(),
        Enumerable.Empty<ShoppingListItem>(),
        Enumerable.Empty<Error>()
    );
var result = results.Aggregate(
    seed,
    (state, result) =>
        result.Match(
            storedItem => (state.Item1.Append(storedItem), state.Item2, state.Item3),
            notFound => (state.Item1, state.Item2.Append(notFound.Item), state.Item3),
            error => (state.Item1, state.Item2, state.Item3.Append(error))));

It's also possible to inline the seed value, but here I defined it in a separate expression in an attempt at making the code a little more readable. I don't know if I succeeded, because regardless of where it goes, it's hardly idiomatic to break tuple initialization over multiple lines. I had to, though, because otherwise the code would run too far to the right.

The lambda expression handles each result in results and uses Match to append the value to its proper 'bucket'. The outer result is a tuple of the three collections.

Saving the changes and returning the results #

The final, impure step in the sandwich is to save the changes and return the results:

// Impure
await dbContext.SaveChangesAsync();
return new OkResult(
    new BulkUpdateResult([.. result.Item1], [.. result.Item2], [.. result.Item3]));

To be honest, the last line of code is pure, but that's not unusual when it comes to Impureim Sandwiches.

Accumulating the bulk-update result #

So far, I've assumed that the final BulkUpdateResult class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use them to aggregate the result, instead of an anonymous tuple.

internal BulkUpdateResult Store(ShoppingListItem item) =>
    new([.. StoredItems, item], FailedItems, Errors);
 
internal BulkUpdateResult Fail(ShoppingListItem item) =>
    new(StoredItems, [.. FailedItems, item], Errors);
 
internal BulkUpdateResult Error(Error error) =>
    new(StoredItems, FailedItems, [.. Errors, error]);

I would have personally preferred the name NotFound instead of Fail, but I was going with the original post's failedItems terminology, and I thought that it made more sense to call a method Fail when it adds to a collection called FailedItems.

Adding these three instance methods to BulkUpdateResult simplifies the composing code:

// Impure
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));
 
// Pure
var result = results.Aggregate(
    new BulkUpdateResult([], [], []),
    (state, result) =>
        result.Match(
            storedItem => state.Store(storedItem),
            notFound => state.Fail(notFound.Item),
            error => state.Error(error)));
 
// Impure
await dbContext.SaveChangesAsync();
return new OkResult(result);

This variation starts with an empty BulkUpdateResult and then uses Store, Fail, or Error as appropriate to update the state.

Parallel Sequence #

If the tasks you want to traverse are thread-safe, you might consider making the traversal concurrent. You can use Task.WhenAll for that. It has the same type as Sequence, so if you can live with the extra non-determinism that comes with parallel execution, you can use that instead:

internal static async Task<IEnumerable<T>> Sequence<T>(this IEnumerable<Task<T>> tasks)
{
    return await Task.WhenAll(tasks);
}

Since the method signature doesn't change, the rest of the code remains unchanged.

Conclusion #

One of the most common stumbling blocks in functional programming is when you have a collection of values, and you need to perform an impure action (typically I/O) for each. This leaves you with a collection of impure values (Task in C#, Task or Async in F#, IO in Haskell, etc.). What you actually need is a single impure value that contains the collection of results.

The solution to this kind of problem is to traverse the collection, rather than mapping over it (with Select, map, fmap, or similar). Note that computer scientists often talk about traversing a data structure like a tree. This is a less well-defined use of the word, and not directly related. That said, you can also write Traverse and Sequence functions for trees.

This article used a Stack Overflow question as the starting point for an example showing how to refactor imperative code to an Impureim Sandwich.

This completes the first variation requested in the Stack Overflow question.

Next: Short-circuiting an asynchronous traversal.

Published: Monday, 18 November 2024 07:39:00 UTC

Collecting and handling result values by Mark Seemann