Short-circuiting an asynchronous traversal

Another C# example.

This article is a continuation of an earlier post about refactoring a piece of imperative code to a functional architecture. It all started with a Stack Overflow question, but read the previous article, and you'll be up to speed.

Imperative outset #

To begin, consider this mostly imperative code snippet:

var storedItems = new List<ShoppingListItem>();
var failedItems = new List<ShoppingListItem>();
var state = (storedItems, failedItems, hasError: false);
foreach (var item in itemsToUpdate)
{
    OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext);
    state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, bool)>(
        storedItem => { storedItems.Add(storedItem); return state;  },
        notFound => { failedItems.Add(item); return state; },
        error => { state.hasError = true; return state; }
        );
    if (state.hasError)
        return Results.BadRequest();
}
 
await dbContext.SaveChangesAsync();
 
return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems]));

I'll recap a few points from the previous article. Apart from one crucial detail, it's similar to the other post. One has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.

The most non-obvious detail is that the code uses a library called OneOf, which supplies general-purpose, but rather abstract, sum types. Both the container type OneOf, as well as the two indicator types NotFound and Error are defined in that library.

The Match method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem returns.

One more detail also warrants an explicit description: The itemsToUpdate object is an input argument of the type IEnumerable<ShoppingListItem>.

The major difference from before is that now the update process short-circuits on the first Error. If an error occurs, it stops processing the rest of the items. In that case, it now returns Results.BadRequest(), and it doesn't save the changes to dbContext.

The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?

Short-circuiting traversal #

The standard Traverse function isn't lazy, or rather, it does consume the entire input sequence. Even various Haskell data structures I investigated do that. And yes, I even tried to traverse ListT. If there's a data structure that you can traverse with deferred execution of I/O-bound actions, I'm not aware of it.

That said, all is not lost, but you'll need to implement a more specialized traversal. While consuming the input sequence, the function needs to know when to stop. It can't do that on just any IEnumerable<T>, because it has no information about T.

If, on the other hand, you specialize the traversal to a sequence of items with more information, you can stop processing if it encounters a particular condition. You could generalize this to, say, IEnumerable<Either<L, R>>, but since I already have the OneOf library in scope, I'll use that, instead of implementing or pulling in a general-purpose Either data type.

In fact, I'll just use a three-way OneOf type compatible with the one that UpdateItem returns.

internal static async Task<IEnumerable<OneOf<T1, T2, Error>>> Sequence<T1, T2>(
    this IEnumerable<Task<OneOf<T1, T2, Error>>> tasks)
{
    var results = new List<OneOf<T1, T2, Error>>();
    foreach (var task in tasks)
    {
        var result = await task;
        results.Add(result);
        if (result.IsT2)
            break;
    }
    return results;
}

This implementation doesn't care what T1 or T2 is, so they're free to be ShoppingListItem and NotFound. The third type argument, on the other hand, must be Error.

The if conditional looks a bit odd, but as I wrote, the types that ship with the OneOf library have rather abstract APIs. A three-way OneOf value comes with three case tests called IsT0, IsT1, and IsT2. Notice that the library uses a zero-indexed naming convention for its type parameters. IsT2 returns true if the value is the third kind, in this case Error. If a task turns out to produce an Error, the Sequence method adds that one error, but then stops processing any remaining items.

Some readers may complain that the entire implementation of Sequence is imperative. It hardly matters that much, since the mutation doesn't escape the method. The behaviour is as functional as it's possible to make it. It's fundamentally I/O-bound, so we can't consider it a pure function. That said, if we hypothetically imagine that all the tasks are deterministic and have no side effects, the Sequence function does become a pure function when viewed as a black box. From the outside, you can't tell that the implementation is imperative.

It is possible to implement Sequence in a proper functional style, and it might make a good exercise. I think, however, that it'll be difficult in C#. In F# or Haskell I'd use recursion, and while you can do that in C#, I admit that I've lost sight of whether or not tail recursion is supported by the C# compiler.

Be that as it may, the traversal implementation doesn't change.

internal static Task<IEnumerable<OneOf<TResult, T2, Error>>> Traverse<T1, T2, TResult>(
    this IEnumerable<T1> items,
    Func<T1, Task<OneOf<TResult, T2, Error>>> selector)
{
    return items.Select(selector).Sequence();
}

You can now Traverse the itemsToUpdate:

// Impure
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));

As the // Impure comment may suggest, this constitutes the first impure layer of an Impureim Sandwich.

Aggregating the results #

Since the above statement awaits the traversal, the results object is a 'pure' object that can be passed to a pure function. This does, however, assume that ShoppingListItem is an immutable object.

The next step must collect results and NotFound-related failures, but contrary to the previous article, it must short-circuit if it encounters an Error. This again suggests an Either-like data structure, but again I'll repurpose a OneOf container. I'll start by defining a seed for an aggregation (a left fold).

var seed =
    OneOf<(IEnumerable<ShoppingListItem>, IEnumerable<ShoppingListItem>), Error>
        .FromT0(([], []));

This type can be either a tuple or an error. The .NET tendency is often to define an explicit Result<TSuccess, TFailure> type, where TSuccess is defined to the left of TFailure. This, for example, is how F# defines Result types, and other .NET libraries tend to emulate that design. That's also what I've done here, although I admit that I'm regularly confused when going back and forth between F# and Haskell, where the Right case is idiomatically considered to indicate success.

As already discussed, OneOf follows a zero-indexed naming convention for type parameters, so FromT0 indicates the first (or leftmost) case. The seed is thus initialized with a tuple that contains two empty sequences.

As in the previous article, you can now use the Aggregate method to collect the result you want.

OneOf<BulkUpdateResult, Error> result = results
    .Aggregate(
        seed,
        (state, result) =>
            result.Match(
                storedItem => state.MapT0(
                    t => (t.Item1.Append(storedItem), t.Item2)),
                notFound => state.MapT0(
                    t => (t.Item1, t.Item2.Append(notFound.Item))),
                e => e))
    .MapT0(t => new BulkUpdateResult(t.Item1.ToArray(), t.Item2.ToArray()));

This expression is a two-step composition. I'll get back to the concluding MapT0 in a moment, but let's first discuss what happens in the Aggregate step. Since the state is now a discriminated union, the big lambda expression not only has to Match on the result, but it also has to deal with the two mutually exclusive cases in which state can be.

Although it comes third in the code listing, it may be easiest to explain if we start with the error case. Keep in mind that the seed starts with the optimistic assumption that the operation is going to succeed. If, however, we encounter an error e, we now switch the state to the Error case. Once in that state, it stays there.

The two other result cases map over the first (i.e. the success) case, appending the result to the appropriate sequence in the tuple t. Since these expressions map over the first (zero-indexed) case, these updates only run as long as the state is in the success case. If the state is in the error state, these lambda expressions don't run, and the state doesn't change.

After having collected the tuple of sequences, the final step is to map over the success case, turning the tuple t into a BulkUpdateResult. That's what MapT0 does: It maps over the first (zero-indexed) case, which contains the tuple of sequences. It's a standard functor projection.

Saving the changes and returning the results #

The final, impure step in the sandwich is to save the changes and return the results:

// Impure
return await result.Match(
    async bulkUpdateResult =>
    {
        await dbContext.SaveChangesAsync();
        return Results.Ok(bulkUpdateResult);
    },
    _ => Task.FromResult(Results.BadRequest()));

Note that it only calls dbContext.SaveChangesAsync() in case the result is a success.

Accumulating the bulk-update result #

So far, I've assumed that the final BulkUpdateResult class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use that to aggregate the result, instead of an anonymous tuple.

internal BulkUpdateResult Store(ShoppingListItem item) =>
    new([.. StoredItems, item], FailedItems);
 
internal BulkUpdateResult Fail(ShoppingListItem item) =>
    new(StoredItems, [.. FailedItems, item]);

I would have personally preferred the name NotFound instead of Fail, but I was going with the original post's failedItems terminology, and I thought that it made more sense to call a method Fail when it adds to a collection called FailedItems.

Adding these two instance methods to BulkUpdateResult simplifies the composing code:

// Pure
OneOf<BulkUpdateResult, Error> result = results
    .Aggregate(
        OneOf<BulkUpdateResult, Error>.FromT0(new([], [])),
        (state, result) =>
            result.Match(
                storedItem => state.MapT0(bur => bur.Store(storedItem)),
                notFound => state.MapT0(bur => bur.Fail(notFound.Item)),
                e => e));

This variation starts with an empty BulkUpdateResult and then uses Store or Fail as appropriate to update the state. The final, impure step of the sandwich remains the same.

Conclusion #

It's a bit more tricky to implement a short-circuiting traversal than the standard traversal. You can, still, implement a specialized Sequence or Traverse method, but it requires that the input stream carries enough information to decide when to stop processing more items. In this article, I used a specialized three-way union, but you could generalize this to use a standard Either or Result type.

Published: Monday, 02 December 2024 09:32:00 UTC

Short-circuiting an asynchronous traversal by Mark Seemann

Imperative outset #

Short-circuiting traversal #

Aggregating the results #

Saving the changes and returning the results #

Accumulating the bulk-update result #

Conclusion #

Wish to comment?

Published

Tags