An F# example of reducing a workflow to an impureim sandwich.

The most common reaction to the impureim sandwich architecture (also known as functional core, imperative shell) is one of incredulity. How does this way of organising code generalise to arbitrary complexity?

The short answer is that it doesn't. Given sufficient complexity, you may not be able to 1. gather all data with impure queries, 2. call a pure function, and 3. apply the return value via impure actions. The question is: How much complexity is required before you have to give up on the impureim sandwich?

Axis with sandwich to the left, free monad to the right, and a tentative transition zone in between, but biased to the right.

There's probably a fuzzy transition zone where the sandwich may still apply, but where it begins to be questionable whether it's beneficial. In my experience, this transition seems to lie further to the right than most people think.

Once you have to give up on the impureim sandwich, in functional programming you may resort to using free monads. In object-oriented programming, you may use Dependency Injection. Depending on language and paradigm, still more options may be available.

My experience is mostly with web-based systems, but in that context, I find that a surprisingly large proportion of problems can be rephrased and organised in such a way that the impureim sandwich architecture applies. Actually, I surmise that most problems can be addressed in that way.

I am, however, often looking for good examples. As I wrote in a comment to Dependency rejection:

"I'd welcome a simplified, but still concrete example where the impure/pure/impure sandwich described here isn't going to be possible."

Such examples are, unfortunately, rare. While real production code may seem like an endless supply of examples, production code often contains irrelevant details that obscure the essence of the case. Additionally, production code is often proprietary, so I can't share it.

In 2019 Christer van der Meeren kindly supplied an example problem that I could refactor. Since then, there's been a dearth of more examples. Until now.

I recently ran into another fine example of a decision flow that at first glance seemed a poor fit for the functional core, imperative shell architecture. What follows is, actually, production code, here reproduced with the kind permission of Criipto.

Create a user if it doesn't exist #

As I've previously touched on, I'm helping Criipto integrate with the Fusebit API. This API has a user model where you can create users in the Fusebit services. Once you've created a user, a client can log on as that user and access the resources available to her, him, or it. There's an underlying security model that controls all of that.

Criipto's users may not all be provisioned as users in the Fusebit API. If they need to use the Fusebit system, we'll provision them just in time. On the other hand, there's no reason to create the user if it already exists.

But it gets more complicated than that. To fit our requirements, the user must have an associated issuer. This is another Fusebit resource that we may have to provision if it doesn't already exist.

The desired logic may be easier to follow if visualised as a flowchart.

Flowchart that illustrates how to provision a Fusebit user.

The user must have an issuer, but an appropriate issuer may already exist. If it doesn't, we must create the issuer before we create the user.

At first blush this seems like a workflow that doesn't fit the impureim sandwich architecture. After all, you should only check for the existence of the issuer if you find that the user doesn't exist. There's a decision between the first and second impure query.

Can we resolve this problem and implement the functionality as an impureim sandwich?

Speculative prefetching #

When looking for ways to apply the functional core, imperative shell architecture, it often pays to take a step back and look at a slightly larger picture. Another way to put it is that you should think less procedurally, and more declaratively. A flowchart like the above is essentially procedural. It may prevent you from seeing other opportunities.

One of the reasons I like functional programming is that it forces me to think in a more declarative way. This helps me identify better abstractions than I might otherwise be able to think of.

The above flowchart is representative of the most common counterargument I hear: The impureim sandwich doesn't work if the code has to make a decision about a secondary query based on the result of an initial query. This is also what's at stake here. The result of the user exists query determines whether the program should query about the issuer.

The assumption is that since the user is supposed to have an issuer, if the user exists, the issuer must also exist.

Even so, would it hurt so much to query the Fusebit API up front about the issuer?

Perhaps you react to such a suggestion with distaste. After all, it seems wasteful. Why query a web service if you don't need the result? And what about performance?

Whether or not this is wasteful depends on what kind of waste you measure. If you measure bits transmitted over the network, then yes, you may see this measure increase.

It may not be as bad as you think, though. Perhaps the HTTP GET request you're about to make has a cacheable result. Perhaps the result is already waiting in your proxy server's RAM.

Neither the Fusebit HTTP API's user resources nor its issuer resources, however, come with cache headers, so this last argument doesn't apply here. I still included it above because it's worth taking into account.

Another typical performance consideration is that this kind of potentially redundant traffic will degrade performance. Perhaps. As usual, if that's a concern: measure.

Querying the API whether a user exists is independent of the query to check if an issuer exists. This means that you could perform the two queries in parallel. Depending on the total load on the system, the difference between one HTTP request and two concurrent requests may be negligible. (It could still impact overall system performance if the system is already running close to capacity, so this isn't always a good strategy. Often, however, it's not really an issue.)

A third consideration is the statistical distribution of pathways through the system. If you consider the flowchart above, it indicates a cyclomatic complexity of 3; there are three distinct pathways.

If, however, it turns out that in 95 percent of cases the user doesn't exist, you're going to have to perform the second query (for issuer) anyway, then the difference between prefetching and conditional querying is minimal.

While some would consider this 'cheating', when aiming for the impureim sandwich architecture, these are all relevant questions to ponder. It often turns out that you can fetch all the data before passing them to a pure function. It may entail 'wasting' some electrons on queries that turn out to be unnecessary, but it may still be worth doing.

There's another kind of waste worth considering. This is the waste in developer hours if you write code that's harder to maintain than it has to be. As I recently described in an article titled Favor real dependencies for unit testing, the more you use functional programming, the less test maintenance you'll have.

Keep in mind that the scenario in this article is a server-to-server interaction. How much would a bit of extra bandwidth cost, versus wasted programmer hours?

If you can substantially simplify the code at the cost of a few dollars of hardware or network infrastructure, it's often a good trade-off.

Referential integrity #

The above flowchart implies a more subtle assumption that turns out to not hold in practice. The assumption is that all users in the system have been created the same: that all users are associated with an issuer. Thus, according to this assumption, if the user exists, then so must the issuer.

This turns out to be a false assumption. The Fusebit HTTP API doesn't enforce referential integrity. You can create a user with an issuer that doesn't exist. When creating a user, you supply only the issuer ID (a string), but the API doesn't check that an issuer with that ID exists.

Thus, just because a user exists you can't be sure that its associated issuer exists. To be sure, you'd have to check.

But this means that you'll have to perform two queries after all. The angst from the previous section turns out to be irrelevant. The flowchart is wrong.

Instead, you have two independent, but potentially parallelisable, processes:

Flowchart showing the actual parallel nature of Fusebit user provisioning.

You can't always be that lucky when you consider how to make requirements fit the impureim sandwich mould, but this is beginning to look really promising. Each of these two processes is near-trivial.

Idempotence #

Really, what's actually required is an idempotent create action. As the RESTful Web Services Cookbook describes, all HTTP verbs except POST should be regarded as idempotent in a well-designed API. Alas, creating users and issuers are (naturally) done with POST requests, so these operations aren't naturally idempotent.

In functional programming you often decouple decisions from effects. In order to be able to do that here, I created this discriminated union:

type Idempotent<'a> = UpToDate | Update of 'a

This type is isomorphic to option, but I found it worthwhile to introduce a distinct type for this particular purpose. Usually, if a query returns, say UserData option, you'd interpret the Some case as indicating that the user exists, and the None case as indicating that the user doesn't exist.

Here, I wanted the 'populated' case to indicate that an Update action is required. If I'd used option, then I would have had to map the user-doesn't-exist case to a Some value, and the user-exists case to a None value. I though that this might be confusing to other programmers, since it'd go against the usual idiomatic use of the type.

That's the reason I created a custom type.

The UpToDate case indicates that the value exists and is up to date. The Update case is worded in the imperative to indicate that the value (of type 'a) should be updated.

Establish #

The purpose of this entire exercise is to establish that a user (and issuer) exists. It's okay if the user already exists, but if it doesn't, we should create it.

I mulled over the terminology and liked the verb establish, to the consternation of many Twitter users.

CreateUserIfNotExists is a crude name 🤢

How about EstablishUser instead?

"Establish" can both mean to "set up on a firm or permanent basis" and "show (something) to be true or certain by determining the facts". That seems to say the same in a more succinct way 👌

Just read the comments to see how divisive that little idea is. Still, I decided to define a function called establish to convert a Boolean and an 'a value to an Idempotent<'a> value.

Don't forget the purpose of this entire exercise. The benefit that the impureim sandwich architecture can bring is that it enables you to drain the impure parts of the sandwich of logic. Pure functions are intrinsically testable, so the more you define decisions and algorithms as pure functions, the more testable the code will be.

It's even better when you can make the testable functions generic, because reusable functions has the potential to reduce cognitive load. Once a reader learns and understands an abstraction, it stops being much of a cognitive load.

The functions to create and manipulate Idempotent<'a> values should be covered by automated tests. The behaviour is quite trivial, though, so to increase coverage we can write the tests as properties. The code base in question already uses FsCheck, so I just went with that:

[<Property(QuietOnSuccess = true)>]
let ``Idempotent.establish returns UpToDate`` (x : int) =
    let actual = Idempotent.establish x true
    UpToDate =! actual
[<Property(QuietOnSuccess = true)>]
let ``Idempotent.establish returns Update`` (x : string) =
    let actual = Idempotent.establish x false
    Update x =! actual

These two properties also use Unquote for assertions. The =! operator means should equal, so you can read an expression like UpToDate =! actual as UpToDate should equal actual.

This describes the entire behaviour of the establish function, which is implemented this way:

// 'a -> bool -> Idempotent<'a>
let establish x isUpToDate = if isUpToDate then UpToDate else Update x

About as trivial as it can be. Unsurprising code is good.

Fold #

The establish function affords a way to create Idempotent values. It'll also be useful with a function to get the value out of the container, so to speak. While you can always pattern match on an Idempotent value, that'd introduce decision logic into the code that does that.

The goal is to cover as much decision logic as possible by tests so that we can leave the overall impureim sandwich as an untested declarative composition - a Humble Object, if you will. It'd be appropriate to introduce a reusable function (covered by tests) that can fulfil that role.

We need the so-called case analysis of Idempotent<'a>. In other terminology, this is also known as the catamorphism. Since Idempotent<'a> is isomorphic to option (also known as Maybe), the catamorphism is also isomorphic to the Maybe catamorphism. While we expect no surprises, we can still cover the function with automated tests:

[<Property(QuietOnSuccess = true)>]
let ``Idempotent.fold when up-to-date`` (expected : DateTimeOffset) =
    let actual =
        Idempotent.fold (fun _ -> DateTimeOffset.MinValue) expected UpToDate
    expected =! actual
[<Property(QuietOnSuccess = true)>]
let ``Idempotent.fold when update required`` (x : TimeSpan) =
    let f (ts : TimeSpan) = ts.TotalHours + float ts.Minutes
    let actual = Update x |> Idempotent.fold f 1.1
    f x =! actual

The most common catamorphisms are idiomatically called fold in F#, so that's what I called it as well.

The first property states that when the Idempotent value is already UpToDate, fold simply returns the 'fallback value' (here called expected) and the function doesn't run.

When the Idempotent is an Update value, the function f runs over the contained value x.

The implementation hardly comes as a surprise:

// ('a -> 'b) -> 'b -> Idempotent<'a> -> 'b
let fold f onUpToDate = function
    | UpToDate -> onUpToDate
    | Update x -> f x

Both establish and fold are general-purpose functions. I needed one more specialised function before I could compose a workflow to create a Fusebit user if it doesn't exist.

Checking whether an issuer exists #

As I've previously mentioned, I'd already developed a set of modules to interact with the Fusebit API. One of these was a function to read an issuer. This Issuer.get action returns a Task<Result<IssuerData, HttpResponseMessage>>.

The Result value will only be an Ok value if the issuer exists, but we can't conclude that any Error value indicates a missing resource. An Error may also indicate a genuine HTTP error.

A function to translate a Result<IssuerData, HttpResponseMessage> value to a Result<bool, HttpResponseMessage> by examining the HttpResponseMessage is just complex enough (cyclomatic complexity 3) to warrant unit test coverage. Here I just went with some parametrised tests rather than FsCheck properties.

The first test asserts that when the result is Ok it translates to Ok true:

[<InlineData ("""DN""")>]
[<InlineData ("""lga""")>]
[<InlineData (""null"")>]
let ``Issuer exists`` iid dn jwks =
    let issuer =
            Id = Uri iid
            DisplayName = dn |> Option.ofObj
            PKA = JwksEndpoint (Uri jwks)
    let result = Ok issuer
    let actual = Fusebit.issuerExists result
    Ok true =! actual

All tests here are structured according to the AAA formatting heuristic. This particular test may seem so obvious that you may wonder how there's actually any logic to test. Perhaps the next test throws a little more light on that question:

let ``Issuer doesn't exist`` () =
    use resp = new HttpResponseMessage (HttpStatusCode.NotFound)
    let result = Error resp
    let actual = Fusebit.issuerExists result
    Ok false =! actual

How do we know that the requested issuer doesn't exist? It's not just any Error result that indicates that, but a particular 404 Not Found result. Notice that this particular Error result translates to an Ok result: Ok false.

All other kinds of Error results, on the other hand, should remain Error values:

[<InlineData (HttpStatusCode.BadRequest)>]
[<InlineData (HttpStatusCode.Unauthorized)>]
[<InlineData (HttpStatusCode.Forbidden)>]
[<InlineData (HttpStatusCode.InternalServerError)>]
let ``Issuer error`` statusCode =
    use resp = new HttpResponseMessage (statusCode)
    let expected = Error resp
    let actual = Fusebit.issuerExists expected
    expected =! actual

All together, these tests indicate an implementation like this:

// Result<'a, HttpResponseMessage> -> Result<bool, HttpResponseMessage>
let issuerExists = function
    | Ok _ -> Ok true
    | Error (resp : HttpResponseMessage) ->
        if resp.StatusCode = HttpStatusCode.NotFound
        then Ok false
        else Error resp

Once again, I've managed to write a function more generic than its name implies. This seems to happen to me a lot.

In this context, what matters more is that this is another pure function - which also explains why it was so easy to unit test.

Composition #

It turned out that I'd now managed to extract all complexity to pure, testable functions. What remained was composing them together.

First, a couple of private helper functions:

// Task<Result<'a, 'b>> -> Task<Result<unit, 'b>>
let ignoreOk x = (fun _ -> ()) x
// ('a -> Task<Result<'b, 'c>>) -> Idempotent<'a> -> Task<Result<unit, 'c>>
let whenMissing f = Idempotent.fold (f >> ignoreOk) (task { return Ok () })

These only exist to make the ensuing composition more readable. Since they both have a cyclomatic complexity of 1, I found that it was okay to skip unit testing.

The same is true for the final composition:

let! comp = taskResult {
    let (issuer, identity, user) = gatherData dto
    let! issuerExists =
        Issuer.get client issuer.Id |> Fusebit.issuerExists
    let! userExists =
        User.find client (IdentityCriterion identity)
        |> (not << List.isEmpty)
    do! Idempotent.establish issuer issuerExists
        |> whenMissing (Issuer.create client)
    do! Idempotent.establish user userExists
        |> whenMissing (User.create client) }

The comp composition starts by gathering data from an incoming dto value. This code snippet is part of a slightly larger Controller Action that I'm not showing here. The rest of the surrounding method is irrelevant to the present example, since it only deals with translation of the input Data Transfer Object and from comp back to an IHttpActionResult object.

After a little pure hors d'œuvre the sandwich arrives with the first impure actions: Retrieving the issuerExists and userExists values from the Fusebit API. After that, the sandwich does fall apart a bit, I admit. Perhaps it's more like a piece of smørrebrød...

I could have written this composition with a more explicit sandwich structure, starting by exclusively calling Issuer.get and User.find. That would have been the first impure layer of the sandwich.

As the pure centre, I could then have composed a pure function from Fusebit.issuerExists, not << List.isEmpty and Idempotent.establish.

Finally, I could have completed the sandwich with the second impure layer that'd call whenMissing.

I admit that I didn't actually structure the code exactly like that. I mixed some of the pure functions (Fusebit.issuerExists and not << List.isEmpty) with the initial queries by adding them as continuations with and Likewise, I decided to immediately pipe the results of Idempotent.establish to whenMissing. My motivation was that this made the code more readable, since I wanted to highlight the symmetry of the two actions. That mattered more to me, as a writer, than highlighting any sandwich structure.

I'm not insisting I was right in making that choice. Perhaps I was; perhaps not. I'm only reporting what motivated me.

Could the code be further improved? I wouldn't be surprised, but at this time I felt that it was good enough to submit to a code review, which it survived.

One possible improvement, however, might be to parallelise the two actions, so that they could execute concurrently. I'm not sure it's worth the (small?) effort, though.

Conclusion #

I'm always keen on examples that challenge the notion of the impureim sandwich architecture. Usually, I find that by taking a slightly more holistic view of what has to happen, I can make most problems fit the pattern.

The most common counterargument is that subsequent impure queries may depend on decisions taken earlier. Thus, the argument goes, you can't gather all impure data up front.

I'm sure that such situations genuinely exist, but I also think that they are rarer than most people think. In most cases I've experienced, even when I initially think that I've encountered such a situation, after a bit of reflection I find that I can structure the code to fit the functional core, imperative shell architecture. Not only that, but the code becomes simpler in the process.

This happened in the example I've covered in this article. Initially, I though that ensuring that a Fusebit user exists would involve a process as illustrated in the first of the above flowcharts. Then, after thinking it over, I realised that defining a simple discriminated union would simplify matters and make the code testable.

I thought it was worthwhile sharing that journey of discovery with others.

Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.


Monday, 14 February 2022 07:44:00 UTC


"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 14 February 2022 07:44:00 UTC