A gentle introduction to modelling impure interactions with pure code.

Dependency Injection is a well-described concept in object-oriented programming, but as I've explained earlier, its not functional, because it makes everything impure. In general, you should reject the notion of dependencies by instead designing your application on the concept of an impure/pure/impure sandwich. This is possible more often than you'd think, but there's still a large group of applications where this will not work. If your application needs to interact with the impure world for an extended time, you need a way to model such interactions in a pure way.

This article introduces a way to do that.

Command line API #

Imagine that you have to write a command-line program that can ask a series of questions and print appropriate responses. In the general case, this is a (potentially) long-running series of interactions between the user and the program. To keep it simple, though, in this article we'll start by looking at a degenerate example:

Please enter your name.
Mark
Hello, Mark!

The program is simply going to request that you enter your name. Once you've done that, it prints the greeting. In object-oriented programming, using Dependency Injection, you might introduce an interface. Keeping it simple, you can restrict such an interface to two methods:

public interface ICommandLine
{
    string ReadLine();
    void WriteLine(string text);
}

Please note that this is clearly a toy example. In later articles, you'll see how to expand the example to cover some more complex interactions, but you could also read a more realistic example already. Initially, the example is degenerate, because there's only a single interaction. In this case, an impure/pure/impure sandwich is still possible, but such a design wouldn't scale to more complex interactions.

The problem with defining and injecting an interface is that it isn't functional. What's the functional equivalent, then?

Instruction set #

Instead of defining an interface, you can define a discriminated union that describes a limited instruction set for command-line interactions:

type CommandLineInstruction<'a> =
| ReadLine of (string -> 'a)
| WriteLine of string * 'a

You may notice that it looks a bit like the above C# interface. Instead of defining two methods, it defines two cases, but the names are similar.

The ReadLine case is an instruction that an interpreter can evaluate. The data contained in the case is a continuation function. After evaluating the instruction, an interpreter must invoke this function with a string. It's up to the interpreter to figure out which string to use, but it could, for example, come from reading an input string from the command line. The continuation function is the next step in whatever program you're writing.

The WriteLine case is another instruction for interpreters. The data contained in this case is a tuple. The first element of the tuple is input for the interpreter, which can choose to e.g. print the value on the command line, or ignore it, and so on. The second element of the tuple is a value used to continue whatever program this case is a part of.

This enables you to write a small, specialised Abstract Syntax Tree (AST), but there's currently no way to return from it. One way to do that is to add a third 'stop' case. If you're interested in that option, Scott Wlaschin covers this as one iteration in his excellent explanation of the AST design.

Instead of adding a third 'stop' case to CommandLineInstruction<'a>, another option is to add a new wrapper type around it:

type CommandLineProgram<'a> =
| Free of CommandLineInstruction<CommandLineProgram<'a>>
| Pure of 'a

The Free case contains a CommandLineInstruction that always continues to a new CommandLineProgram value. The only way you can escape the AST is via the Pure case, which simply contains the 'return' value.

Abstract Syntax Trees #

With these two types you can write specialised programs that contain instructions for an interpreter. Notice that the types are pure by intent, although in F# we can't really tell. You can, however, repeat this exercise in Haskell, where the instruction set looks like this:

data CommandLineInstruction next =
    ReadLine (String -> next)
  | WriteLine String next
  deriving (Functor)
 
type CommandLineProgram = Free CommandLineInstruction

Both of these types are pure, because IO is nowhere in sight. In Haskell, functions are pure by default. This also applies to the String -> next function contained in the ReadLine case.

Back in F# land, you can write an AST that implements the command-line interaction shown in the beginning of the article:

// CommandLineProgram<unit>
let program =
    Free (WriteLine (
            "Please enter your name.",
            Free (ReadLine (
                    fun s -> Free (WriteLine (
                                    sprintf "Hello, %s!" s,
                                    Pure ()))))))

This AST defines a little program. The first step is a WriteLine instruction with the input value "Please enter your name.". The WriteLine case constructor takes a tuple as input argument. The first tuple element is that prompt, and the second element is the continuation, which has to be a new CommandLineInstruction<CommandLineProgram<'a>> value.

In this example, the continuation value is a ReadLine case, which takes a continuation function as input. This function should return a new program value, which it does by returning a WriteLine.

This second WriteLine value creates a string from the outer value s. The second tuple element for the WriteLine case must, again, be a new program value, but now the program is done, so you can use the 'stop' value Pure ().

You probably think that I should quit the mushrooms. No one in their right mind will want to write code like this. Neither would I. Fortunately, you can make the coding experience much better, but you'll see how to do that later.

Interpretation #

The above program value is a small CommandLineProgram<unit>. It's a pure value. In itself, it doesn't do anything.

Clearly, we'd like it to do something. In order to make that happen, you can write an interpreter:

// CommandLineProgram<'a> -> 'a
let rec interpret = function
    | Pure x -> x
    | Free (ReadLine  next-> Console.ReadLine () |> next |> interpret
    | Free (WriteLine (s, next)) ->
        Console.WriteLine s
        next |> interpret

This interpreter is a recursive function that pattern-matches all the cases in any CommandLineProgram<'a>. When it encounters a Pure case, it simply returns the contained value.

When it encounters a ReadLine value, it calls Console.ReadLine (), which returns a string value read from the command line. It then pipes that input value to its next continuation function, which produces a new CommandLineInstruction<CommandLineProgram<'a>> value. Finally, it pipes that continuation value recursively to itself.

A similar treatment is given to the WriteLine case. Console.WriteLine s writes s to the command line, where after next is recursively piped to interpret.

When you run interpret program, you get an interaction like this:

Please enter your name.
ploeh
Hello, ploeh!

The program is pure; the interpret function is impure.

Syntactic sugar #

Clearly, you don't want to write programs as ASTs like the above. Fortunately, you don't have to. You can add syntactic sugar in the form of computation expressions. The way to do that is to turn your AST types into a monad. In Haskell, you'd already be done, because Free is a monad. In F#, some code is required.

Source functor #

The first step is to define a map function for the underlying instruction set union type. Conceptually, when you can define a map function for a type, you've created a functor (if it obeys the functor laws, that is). Functors are common, so it often pays off being aware of them.

// ('a -> 'b) -> CommandLineInstruction<'a> -> CommandLineInstruction<'b>
let private mapI f = function
    | ReadLine next -> ReadLine (next >> f)
    | WriteLine (x, next) -> WriteLine (x, next |> f)

The mapI function takes a CommandLineInstruction<'a> value and maps it to a new value by mapping the 'underlying value'. I decided to make the function private because later, I'm also going to define a map function for CommandLineProgram<'a>, and I don't want to confuse users of the API with two different map functions. This is also the reason that the name of the function isn't simply map, but rather mapI, where the I stands for instruction.

mapI pattern-matches on the (implicit) input argument. If it's a ReadLine case, it returns a new ReadLine value, but it uses the mapper function f to translate the next function. Recall that next is a function of the type string -> 'a. When you compose it with f (which is a function of the type 'a -> 'b), you get (string -> 'a) >> ('a -> 'b), or string -> 'b. You've transformed the 'a to a 'b for the ReadLine case. If you can do the same for the WriteLine case, you'll have a functor.

Fortunately, the WriteLine case is similar, although a small tweak is required. This case contains a tuple of data. The first element (x) isn't a generic type (it's a string), so there's nothing to map. You can use it as-is in the new WriteLine value that you'll return. The WriteLine case is degenerate because next isn't a function, but rather a value. It has a type of 'a, and f is a function of the type 'a -> 'b, so piping next to f returns a 'b.

That's it: now you have a functor.

(In order to keep the category theorists happy, I should point out that such functors are really a sub-type of functors called endo-functors. Additionally, functors must obey some simple and intuitive laws in order to be functors, but that's all I'll say about that here.)

Free monad #

There's a reason I spend so much time talking about functors. The goal is syntactic sugar. You can get that with computation expressions. In order to create a computation expression builder, you need a monad.

You need a recipe for creating a monad. Fortunately, there's a type of monad called a free monad. It has the virtue that it enables you to create a monad from any functor.

Just what you need.

In Haskell, this happens automatically when you declare type CommandLineProgram = Free CommandLineInstruction. Thanks to Haskell's type system, Free is automatically a Monad when the underlying type is a Functor. In F#, you have to work for your monads, but the fact that Haskell can automate this means that there's a recipe that you can follow.

Earlier in this article, I mentioned in passing that there are alternative ways in which you can define a 'stop' case for your instruction set. The reason I chose to separate the API into two types (an 'instruction set', and a 'program') is that the instruction set is the underlying functor. The 'program' is part of the free monad. The other part is a bind function (that obeys the monad laws).

// ('a -> CommandLineProgram<'b>) -> CommandLineProgram<'a>
// -> CommandLineProgram<'b>
let rec bind f = function
    | Free instruction -> instruction |> mapI (bind f) |> Free
    | Pure x -> f x

This recursive function pattern-matches on the (implicit) CommandLineProgram<'a> argument. In the Pure case, the 'return' value x has the type 'a, which fits as input for the f function. The result is a value of type CommandLineProgram<'b>.

In the Free case, the instruction is a functor with the map function mapI. The first argument to the mapI function must be a function with the type 'a -> 'b. How can you compose such a function?

If you partially apply the recursive bind function with f (that is: bind f), you get a function of the type CommandLineProgram<'a> -> CommandLineProgram<'b>. This fits with mapI, because instruction has the type CommandLineInstruction<CommandLineProgram<'a>> (refer back to the definition of the Free case if need to convince yourself of that). The result of calling mapI with instruction is a CommandLineInstruction<CommandLineProgram<'b>> value. In order to turn it into a CommandLineProgram<'b> value, you wrap it in a new Free case.

Although this required a bit of explanation, defining a bind function for a free monad is a repeatable process. After all, in Haskell it's automated. In F#, you have to explicitly write the code, but it follows a recipe. Once you get the hang of it, there's not much to it.

Functor #

You'll occasionally need to explicitly use the bind function, but often it'll 'disappear' into a computation expression. There are other building blocks to an API than a bind function, though. You may, for example, need a map function:

// ('a -> 'b) -> CommandLineProgram<'a> -> CommandLineProgram<'b>
let map f = bind (f >> Pure)

This makes CommandLineProgram<'a> a functor as well. This is the reason I made mapI private, because mapI makes the instruction set a functor, but the API is expressed in terms of AST programs, and it should be consistent. Within the same module, map should work on the same data type as bind.

Notice that map can be defined as a composition of bind and Pure. This is part of the recipe. For a free monad, the map function always looks like that. The f function is a function with the type 'a -> 'b, and Pure is a case constructor with the type 'b -> CommandLineProgram<'b>. Notice that I've used 'b for the generic type argument instead of the usual 'a. Hopefully, this makes it clear that when you compose these two functions together (f >> Pure), you get a function of the type ('a -> 'b) >> ('b -> CommandLineProgram<'b>), or 'a -> CommandLineProgram<'b>. That's just the type of function needed for the bind function, so the whole composition turns out to type-check and work as intended.

API #

In order to work with an API, you need the ability to create values of the API's type(s). In this case, you must be able to create CommandLineProgram<'a> values. While you can create them explicitly using the ReadLine, WriteLine, Free, and Pure case constructors, it'll be more convenient if you have some predefined functions and values for that.

// CommandLineProgram<string>
let readLine = Free (ReadLine Pure)
 
// string -> CommandLineProgram<unit>
let writeLine s = Free (WriteLine (s, Pure ()))

In the ReadLine case, there's no input to the instruction, so you can define readLine as a predefined CommandLineProgram<string> value.

The WriteLine case, on the other hand, takes as an input argument a string to write, so you can define writeLine as a function that returns a CommandLineProgram<unit> value.

Computation expression #

The addition of map and supporting API is, to be honest, a bit of digression. You're going to use these functions later, but they aren't required in order to create a computation expression builder. All you need is a bind function and a way to lift a raw value into the monad. All of these are in place, so the builder is a matter of delegation:

type CommandLineBuilder () =
    member this.Bind (x, f) = CommandLine.bind f x
    member this.Return x = Pure x
    member this.ReturnFrom x = x
    member this.Zero () = Pure ()

This is a fairly minimal builder, but in my experience, most of times, this is all you need.

Create an instance of the CommandLineBuilder class, and you can write computation expressions:

let commandLine = CommandLineBuilder ()

I usually put such an object in a module with an [<AutoOpen>] attribute, so that it's available as a global object.

Producing ASTs with pretty code #

Using the commandLine computation expression is like using the built-in async or seq expressions. You can use it to rewrite the above AST as readable code:

// CommandLineProgram<unit>
let program =
    commandLine {
        do!  CommandLine.writeLine "Please enter your name."
        let! name = CommandLine.readLine
        do!  sprintf "Hello, %s!" name |> CommandLine.writeLine }

This produces the same AST as before, but with much more readable syntax. The AST is the same, and you can use the above interpret function to run it. The interaction is the same as before:

Please enter your name.
Free
Hello, Free!

This is, obviously, a toy example, but in coming articles, you'll see how to gradually enhance the code to perform some more complex interactions.

Summary #

Functional programming emphasises pure functions, and a separation of pure and impure code. The simplest way to achieve such a separation is to design your code as an impure/pure/impure sandwich, but sometimes this isn't possible. When it's not possible, an alternative is to define an instruction set for an AST, and turn it into a free monad in order to enable enough syntactic sugar to keep the code readable.

While this may seem complicated, it has the benefit of making impurities explicit in the code. Whenever you see a CommandLineProgram value, you know that, at run-time, something impure is likely to happen. It's not uncontrolled impurity, though. Inside a CommandLineProgram, only reading from, and writing to, the command line will happen. It's not going to generate random values, change global variables, send an email, or any other unpredictable operation - that is, unless the interpreter does that...

Next: A pure command-line wizard.



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Tuesday, 11 July 2017 12:48:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Tuesday, 11 July 2017 12:48:00 UTC