How to refactor from a loop using mutable state to a recursive implementation.

One of the most compelling reasons to adopt Functional Programming (FP) is the emphasis on immutable values. All the dread and angst associated with state that can implicitly change while you're not looking, is gone.

One of the most frustrating aspects of FP for people coming from other paradigms is the emphasis on immutable values. You know that you ought to be able to implement a given algorithm using immutable values, but given your background in Object-Oriented Programming, you can't quite figure out how to do it.

In FP, loops are implemented with recursion, and mutable values are replaced with accumulator arguments. This article describes how to refactor from a procedural, mutable loop implementation to a pure, tail-recursive implementation.

Motivation #

You want your loop implementation to be 'Functional' instead of procedural. There can be many reasons for this. Perhaps you want to learn FP. Perhaps you want to eliminate mutable state in order to make your implementation thread-safe. Perhaps you think that getting rid of mutation will make the code more readable and maintainable. Perhaps you want to port your implementation to a language that doesn't support mutability at all (like Haskell).

Mechanics #

Start with a procedural implementation, using a mutable loop variable. This obviously only works in multi-paradigmatic languages where mutable variables are possible, even if they're not ideal. Examples include F#, Scala, and Clojure, but not Haskell, which isn't multi-paradigmatic.

  1. Instead of your imperative loop, introduce a recursive function.
  2. Replace each mutable loop variable with an argument for the recursive function.
Ultimately, your refactored implementation should be tail-recursive, but you can always address that concern once you've refactored to recursion.

Example: backspace characters #

Imagine that you receive a stream of characters from someone typing on a keyboard. Sometimes, the typist mistypes and uses the backspace key, which sends the character '\b'. Whenever you encounter '\b', you should remove the preceding character, as well as the backspace character itself. This example is based on this Stack Overflow question.

The original F# implementation is procedural, using a for loop and a single mutable variable:

open System
open System.Collections.Generic
 
let handleBackspaces textToProcess : string =
    let stack = Stack<char>()
    for c in textToProcess do
        if c = '\b' then stack.Pop() |> ignore
        else stack.Push c
    stack |> Seq.rev |> Seq.toArray |> String

While this implementation doesn't explicitly use the mutable keyword, the stack variable is mutable because Stack<T> is mutable. Since textToProcess is a string, and string implements IEnumerable<char>, you can loop over each char value, pushing the value on the stack unless it's a backspace character; in that case, the previous value is instead popped and thrown away.

According to the rules of the Recurse refactoring, you should introduce a recursive function instead of the loop, and add an argument that will replace the stack. To make it easy, call the recursive function imp, and the function argument acc. The name acc is popular; it's short for accumulator. This argument is used to accumulate the final value, just like stack in the above example.

let handleBackspaces' textToProcess : string =
    let rec imp acc = function
        | [] -> acc
        | '\b'::cs -> imp (acc |> List.tail) cs
        | c::cs -> imp (c::acc) cs
    textToProcess |> Seq.toList |> imp [] |> List.rev |> List.toArray |> String

The imp function is declared with the rec keyword to mark it as a recursive function. It has the type char list -> char list -> char list. acc is a char list, which is also the case for the second argument implied by the function keyword. The function returns the accumulator if the input list is empty; otherwise, it matches on the head of the list. If the head is explicitly the backspace character, the imp function calls itself recursively, but replaces acc with the tail of acc; it uses List.tail to get the tail. This effectively removes the most recent character from the accumulator. In all other cases, the function also calls itself by consing c on acc.

The imp function is tail-recursive, since all other values are already computed when a recursive call to imp takes place.

You can further refactor this implementation to use a fold instead of a recursive function.

Example: Graham Scan #

This second example is a more complex example that further illustrates how to apply the Recurse refactoring. If you already feel that you understand how to apply this refactoring, you can skip reading the following example.

Some time ago, I was attempting to implement the Graham Scan algorithm to find the convex hull for a set of points. As I've described before, this turned out to be quite difficult for me. One of my problems was that I was trying to implement the algorithm in Haskell, and while I understood the algorithm, I couldn't figure out how to implement it in a functional way - and when you can't do it the functional way, you can't do it at all in Haskell.

In F#, on the other hand, you can implement algorithms using procedural code if you must, so I decided to implement the algorithm in F# first, using a procedural approach, and then subsequently figure out how to refactor to immutable values. Once I had a pure F# implementation, I could always back-port it to Haskell.

This is one of the reasons I think F# is a better language for learning FP if you come from an Object-Oriented background: you can gradually refactor towards more Functional implementations as you become better at FP.

The part that caused me particular difficulty was the scan part, where the algorithm examines all points to identify which points to discard from the hull (because they're in the interior of the hull, and not on the hull).

After sorting all candidates according to special rules, the algorithm must consider each point in turn. If that new point is 'to the right' of the previous two points, the previous point is in the interior of the hull and should be discarded. The previous-to-previous point could also be 'to the right' of the new point, so the algorithm needs to check again, and so on.

My imperative solution looked like this:

let inline hullPoints points =
    let mutable ps = []
    for p in points do
        ps <- ps @ [p]
        let mutable shouldCheck = true
        while shouldCheck do
            let wasDiscarded, newPoints = check ps
            shouldCheck <- wasDiscarded
            if wasDiscarded then ps <- newPoints
    ps

(You can see the full code base on GitHub. The start is at 5290abd3c31c162ee6c4b21b82494ce97ecf7fa5, and the end state that this post describes is at e3efd1b457a46112cff6f06b8cbb100d153f0ef1.)

Due to the inline keyword, the hullPoints function has a complex type, but for practical purposes, think of it as having the type (int * int) seq -> (int * int) list. The points argument is a sequence of coordinates: (int * int) seq.

As you can see, this implementation has a nested loop. The outer loop traverses all points and appends the point in consideration to the mutable list variable ps. At this stage, p is only a candidate. What the algorithm must now determine is whether p is in the interior or might be a hull point.

In order to do that, it calls another function called check. The check function is another inline function, but you can think about it as having the type (int * int) list -> bool * (int * int) list. The return type is peculiar, but the idea is that it returns true in the first tuple element if points were discarded from the input, and false if no points were discarded. The second tuple element contains the points (that may or may not have had points removed compared to the input points). (I later refactored this function to a function called tryDiscard with the type (int * int) list -> (int * int) list option.)

If points were discarded, the algorithm must check again, as there may be more points to discard. Only when no more points were discarded can the outer loop move on to the next candidate.

According to the Recurse refactoring, you need to define a recursive function for each loop. There are two loops here, but do the inner loop first. Each mutable variable should be replaced with a function argument, but fortunately there's only one:

let inline hullPoints points =
    let rec update candidates =
        let wasDiscarded, newCandidates = check candidates
        if wasDiscarded
        then update newCandidates
        else candidates
 
    let mutable candidates = []
    for p in points do
        candidates <- candidates @ [p]
        candidates <- update candidates
    candidates

The new update function calls the check function. If wasDiscarded is true, it calls itself recursively with the new candidates; otherwise, it returns the input candidates.

The update function is now a recursive function without mutable variables, but the containing hullPoints function still has a mutable candidates variable. You'll need to apply the Recursive refactoring again:

let hullPoints points =
    let rec update candidates =
        let wasDiscarded, newCandidates = check candidates
        if wasDiscarded
        then update newCandidates
        else candidates
 
    let rec hpImp candidates = function
        | [] -> candidates
        | p :: tail ->
            let cs = candidates @ [p]
            let updatedCandidates = update cs
            hpImp updatedCandidates tail
        
    hpImp [] points

The hpImp function replaces the remaining loop, and candidates is now a function argument instead of mutable variable.

As long as candidates has contents, the head of the list p is appended to candidates, and update is invoked. Subsequently, hpImp is invoked recursively with the updated candidates and the tail of the list.

The hullPoints function returns the value of calling hpImp with empty hull candidates, and the points argument. This implementation has no mutable variables.

You can refactor this implementation to make it more readable, but that's not the point of this article. You can see what I then did in the GitHub repository.

(Now that I had a pure implementation, I could also port it to Haskell, which met my original goal.)

Summary #

For programmers used to imperative programming, it can be difficult to apply pure Functional techniques. When you have loops that update a mutable variable in each step of the loop, you can refactor to use a recursive function; the mutable variable is replaced with a function argument. When you've tried it a couple of times, you'll get the hang of it.

Once you have a recursive function with an accumulator argument, you can often further refactor it to use a fold, instead of a recursive function.

(This post is the December 1st entry in the 2015 F# Advent Calendar.)



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Tuesday, 01 December 2015 09:12:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Tuesday, 01 December 2015 09:12:00 UTC