Short-circuiting an asynchronous traversal by Mark Seemann
Another C# example.
This article is a continuation of an earlier post about refactoring a piece of imperative code to a functional architecture. It all started with a Stack Overflow question, but read the previous article, and you'll be up to speed.
Imperative outset #
To begin, consider this mostly imperative code snippet:
var storedItems = new List<ShoppingListItem>(); var failedItems = new List<ShoppingListItem>(); var state = (storedItems, failedItems, hasError: false); foreach (var item in itemsToUpdate) { OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext); state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, bool)>( storedItem => { storedItems.Add(storedItem); return state; }, notFound => { failedItems.Add(item); return state; }, error => { state.hasError = true; return state; } ); if (state.hasError) return Results.BadRequest(); } await dbContext.SaveChangesAsync(); return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems]));
I'll recap a few points from the previous article. Apart from one crucial detail, it's similar to the other post. One has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.
The most non-obvious detail is that the code uses a library called OneOf, which supplies general-purpose, but rather abstract, sum types. Both the container type OneOf
, as well as the two indicator types NotFound
and Error
are defined in that library.
The Match
method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem
returns.
One more detail also warrants an explicit description: The itemsToUpdate
object is an input argument of the type IEnumerable<ShoppingListItem>
.
The major difference from before is that now the update process short-circuits on the first Error
. If an error occurs, it stops processing the rest of the items. In that case, it now returns Results.BadRequest()
, and it doesn't save the changes to dbContext
.
The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?
Short-circuiting traversal #
The standard Traverse function isn't lazy, or rather, it does consume the entire input sequence. Even various Haskell data structures I investigated do that. And yes, I even tried to traverse
ListT. If there's a data structure that you can traverse
with deferred execution of I/O-bound actions, I'm not aware of it.
That said, all is not lost, but you'll need to implement a more specialized traversal. While consuming the input sequence, the function needs to know when to stop. It can't do that on just any IEnumerable<T>, because it has no information about T
.
If, on the other hand, you specialize the traversal to a sequence of items with more information, you can stop processing if it encounters a particular condition. You could generalize this to, say, IEnumerable<Either<L, R>>
, but since I already have the OneOf library in scope, I'll use that, instead of implementing or pulling in a general-purpose Either data type.
In fact, I'll just use a three-way OneOf
type compatible with the one that UpdateItem
returns.
internal static async Task<IEnumerable<OneOf<T1, T2, Error>>> Sequence<T1, T2>( this IEnumerable<Task<OneOf<T1, T2, Error>>> tasks) { var results = new List<OneOf<T1, T2, Error>>(); foreach (var task in tasks) { var result = await task; results.Add(result); if (result.IsT2) break; } return results; }
This implementation doesn't care what T1
or T2
is, so they're free to be ShoppingListItem
and NotFound
. The third type argument, on the other hand, must be Error
.
The if
conditional looks a bit odd, but as I wrote, the types that ship with the OneOf library have rather abstract APIs. A three-way OneOf
value comes with three case tests called IsT0
, IsT1
, and IsT2
. Notice that the library uses a zero-indexed naming convention for its type parameters. IsT2
returns true
if the value is the third kind, in this case Error
. If a task
turns out to produce an Error
, the Sequence
method adds that one error, but then stops processing any remaining items.
Some readers may complain that the entire implementation of Sequence
is imperative. It hardly matters that much, since the mutation doesn't escape the method. The behaviour is as functional as it's possible to make it. It's fundamentally I/O-bound, so we can't consider it a pure function. That said, if we hypothetically imagine that all the tasks
are deterministic and have no side effects, the Sequence
function does become a pure function when viewed as a black box. From the outside, you can't tell that the implementation is imperative.
It is possible to implement Sequence
in a proper functional style, and it might make a good exercise. I think, however, that it'll be difficult in C#. In F# or Haskell I'd use recursion, and while you can do that in C#, I admit that I've lost sight of whether or not tail recursion is supported by the C# compiler.
Be that as it may, the traversal implementation doesn't change.
internal static Task<IEnumerable<OneOf<TResult, T2, Error>>> Traverse<T1, T2, TResult>( this IEnumerable<T1> items, Func<T1, Task<OneOf<TResult, T2, Error>>> selector) { return items.Select(selector).Sequence(); }
You can now Traverse
the itemsToUpdate
:
// Impure IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results = await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));
As the // Impure
comment may suggest, this constitutes the first impure layer of an Impureim Sandwich.
Aggregating the results #
Since the above statement awaits the traversal, the results
object is a 'pure' object that can be passed to a pure function. This does, however, assume that ShoppingListItem
is an immutable object.
The next step must collect results and NotFound
-related failures, but contrary to the previous article, it must short-circuit if it encounters an Error
. This again suggests an Either-like data structure, but again I'll repurpose a OneOf
container. I'll start by defining a seed
for an aggregation (a left fold).
var seed = OneOf<(IEnumerable<ShoppingListItem>, IEnumerable<ShoppingListItem>), Error> .FromT0(([], []));
This type can be either a tuple or an error. The .NET tendency is often to define an explicit Result<TSuccess, TFailure>
type, where TSuccess
is defined to the left of TFailure
. This, for example, is how F# defines Result types, and other .NET libraries tend to emulate that design. That's also what I've done here, although I admit that I'm regularly confused when going back and forth between F# and Haskell, where the Right
case is idiomatically considered to indicate success.
As already discussed, OneOf follows a zero-indexed naming convention for type parameters, so FromT0
indicates the first (or leftmost) case. The seed is thus initialized with a tuple that contains two empty sequences.
As in the previous article, you can now use the Aggregate method to collect the result you want.
OneOf<BulkUpdateResult, Error> result = results .Aggregate( seed, (state, result) => result.Match( storedItem => state.MapT0( t => (t.Item1.Append(storedItem), t.Item2)), notFound => state.MapT0( t => (t.Item1, t.Item2.Append(notFound.Item))), e => e)) .MapT0(t => new BulkUpdateResult(t.Item1.ToArray(), t.Item2.ToArray()));
This expression is a two-step composition. I'll get back to the concluding MapT0
in a moment, but let's first discuss what happens in the Aggregate
step. Since the state
is now a discriminated union, the big lambda expression not only has to Match
on the result
, but it also has to deal with the two mutually exclusive cases in which state
can be.
Although it comes third in the code listing, it may be easiest to explain if we start with the error case. Keep in mind that the seed
starts with the optimistic assumption that the operation is going to succeed. If, however, we encounter an error e
, we now switch the state
to the Error
case. Once in that state, it stays there.
The two other result
cases map over the first (i.e. the success) case, appending the result to the appropriate sequence in the tuple t
. Since these expressions map over the first (zero-indexed) case, these updates only run as long as the state
is in the success case. If the state
is in the error state, these lambda expressions don't run, and the state
doesn't change.
After having collected the tuple of sequences, the final step is to map over the success case, turning the tuple t
into a BulkUpdateResult
. That's what MapT0
does: It maps over the first (zero-indexed) case, which contains the tuple of sequences. It's a standard functor projection.
Saving the changes and returning the results #
The final, impure step in the sandwich is to save the changes and return the results:
// Impure return await result.Match( async bulkUpdateResult => { await dbContext.SaveChangesAsync(); return Results.Ok(bulkUpdateResult); }, _ => Task.FromResult(Results.BadRequest()));
Note that it only calls dbContext.SaveChangesAsync()
in case the result
is a success.
Accumulating the bulk-update result #
So far, I've assumed that the final BulkUpdateResult
class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use that to aggregate the result, instead of an anonymous tuple.
internal BulkUpdateResult Store(ShoppingListItem item) => new([.. StoredItems, item], FailedItems); internal BulkUpdateResult Fail(ShoppingListItem item) => new(StoredItems, [.. FailedItems, item]);
I would have personally preferred the name NotFound
instead of Fail
, but I was going with the original post's failedItems
terminology, and I thought that it made more sense to call a method Fail
when it adds to a collection called FailedItems
.
Adding these two instance methods to BulkUpdateResult
simplifies the composing code:
// Pure OneOf<BulkUpdateResult, Error> result = results .Aggregate( OneOf<BulkUpdateResult, Error>.FromT0(new([], [])), (state, result) => result.Match( storedItem => state.MapT0(bur => bur.Store(storedItem)), notFound => state.MapT0(bur => bur.Fail(notFound.Item)), e => e));
This variation starts with an empty BulkUpdateResult
and then uses Store
or Fail
as appropriate to update the state. The final, impure step of the sandwich remains the same.
Conclusion #
It's a bit more tricky to implement a short-circuiting traversal than the standard traversal. You can, still, implement a specialized Sequence
or Traverse
method, but it requires that the input stream carries enough information to decide when to stop processing more items. In this article, I used a specialized three-way union, but you could generalize this to use a standard Either or Result type.