Tail Recurse

Tips on refactoring recursive functions to tail-recursive functions.

In a recent article, I described how to refactor an imperative loop to a recursive function. If you're coming from C# or Java, however, you may have learned to avoid recursion, since it leads to stack overflows when the recursion is too deep.

In some Functional languages, like F# and Haskell, such stack overflows can be prevented with tail recursion. If the last function call being made is a recursive call to the function itself, the current stack frame can be eliminated because execution is effectively completed. This allows a tail-recursive function to keep recursing without ever overflowing.

When you have a recursive function that's not tail recursive, however, it can sometimes be difficult to figure out how to refactor it so that it becomes tail recursive. In this article, I'll outline some techniques I've found to be effective.

Introduce an accumulator #

It seems as though the universal trick related to recursion is to introduce an accumulator argument. This is how you refactor an imperative loop to a recursive function, but it can also be used to make a non-tail-recursive function tail recursive. This will also require you to introduce an 'implementation function' that does the actual work.

Example: discarding candidate points when finding a convex hull #

In previous articles I've described various problems I had when implementing the Graham Scan algorithm to find the convex hull of a set of points.

One of my functions examined the three last points in a sequence of points in order to determine if the next-to-last point is in the interior of the set, or if that point could potentially be on the hull. If the point is positively known to be in the interior of the set, it should be discarded. At one stage, the function was implemented this way:

let tryDiscard points =
    let rec tryDiscardImp = function
        | [p1; p2; p3] when turn p1 p2 p3 = Direction.Right -> [p1; p3]
        | [p1; p2; p3] -> [p1; p2; p3]
        | p :: ps -> p :: tryDiscardImp ps
        | [] -> []
    let newPoints = tryDiscardImp points
    if newPoints.Length <> points.Length
    then Some newPoints
    else None

This function was earlier called check, which is the name used in the article about refactoring to recursion. The tryDiscard function is actually an inner function in a more complex function that's defined with the inline keyword, so the type of tryDiscard is somewhat complicated, but think of it as having the type (int * int) list -> (int * int) list option. If a point was discarded, the new, reduced list of points is returned in a Some case; otherwise, None is returned.

The tryDiscard function already has an 'implementation function' called tryDiscardImp, but while tryDiscardImp is recursive, it isn't tail recursive. The problem is that in the p :: ps case, the recursive call to tryDiscardImp isn't the tail call. Rather, the stack frame has to wait for the recursive call tryDiscardImp ps to complete, because only then can it cons p onto its return value.

Since an 'implementation function' already exists, you can make it tail recursive by adding an accumulator argument to tryDiscardImp:

let tryDiscard points =
    let rec tryDiscardImp acc = function
        | [p1; p2; p3] when turn p1 p2 p3 = Direction.Right ->
            acc @ [p1; p3]
        | [p1; p2; p3] -> acc @ [p1; p2; p3]
        | p :: ps -> tryDiscardImp (acc @ [p]) ps
        | [] -> acc
    let newPoints = tryDiscardImp [] points
    if newPoints.Length <> points.Length
    then Some newPoints
    else None

As you can see, I added the acc argument to tryDiscardImp; it has the type (int * int) list (again: not really, but close enough). Instead of returning from each case, the tryDiscardImp function now appends points to acc until it reaches the end of the list, which is when it returns the accumulator. The p :: ps case now first appends the point in consideration to the accumulator (acc @ [p]), and only then recursively calls tryDiscardImp. This puts the call to tryDiscardImp in the tail position.

The repeated use of the append operator (@) is terribly inefficient, though, but I'll return to this issue later in this article. For now, let's take a step back.

Example: implementing map with recursion #

A common exercise for people new to Functional Programming is to implement a map function (C# developers will know it as Select) using recursion. This function already exists in the List module, but it can be enlightening to do the exercise.

An easy, but naive implementation is only two lines of code, using pattern matching:

let rec mapNaive f = function
    | [] -> []
    | h::t -> f h :: mapNaive f t

Is mapNaive tail recursive? No, it isn't. The last operation happening is that f h is consed unto the return value of mapNaive f t. While mapNaive f t is a recursive call, it's not in the tail position. For long lists, this will create a stack overflow.

How can you create a tail-recursive map implementation?

Example: inefficient tail-recursive map implementation #

According to my introduction, adding an accumulator and an 'implementation' function should do the trick. Here's the straightforward application of that technique:

let mapTailRecursiveUsingAppend f xs =
    let rec mapImp f acc = function
        | [] -> acc
        | h::t -> mapImp f (acc @ [f h]) t
    mapImp f [] xs

The mapImp function does the actual work, and it's tail recursive. It appends the result of mapping the head of the list unto the accumulator: acc @ [f h]. Only then does it recursively call itself with the new accumulator and the tail of the list.

While this version is tail recursive, it's horribly inefficient, because appending to the tail of a (linked) list is inefficient. In theory, this implementation would never result in a stack overflow, but the question is whether anyone has the patience to wait for that to happen?

> mapTailRecursiveUsingAppend ((*) 2) [1..100000];;
Real: 00:02:46.104, CPU: 00:02:44.750, GC gen0: 13068, gen1: 6380, gen2: 1
val it : int list =
  [2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; ...]

Doubling 100,000 integers this way takes nearly 3 minutes on my (admittedly mediocre) laptop. A better approach is required.

Example: efficient tail-recursive map implementation #

The problem with mapTailRecursiveUsingAppend is that appending to a list is slow when the left-hand list is long. This is because lists are linked lists, so the append operation has to traverse the entire list and copy all the element to link to the right-hand list.

Consing a single item unto an existing list, on the other hand, is efficient:

let mapTailRecursiveUsingRev f xs =
    let rec mapImp f acc = function
        | [] -> acc
        | h::t -> mapImp f (f h :: acc) t
    mapImp f [] xs |> List.rev

This function conses unto the accumulator (f h :: acc) instead of appending to the accumulator. The only problem with this is that acc is in reverse order compared to the input, so the final step must be to reverse the output of mapImp. While there's a cost to reversing a list, you pay it only once. In practice, it turns out to be efficient:

> mapTailRecursiveUsingRev ((*) 2) [1..100000];;
Real: 00:00:00.017, CPU: 00:00:00.015, GC gen0: 1, gen1: 0, gen2: 0
val it : int list =
  [2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; ...]

From nearly three minutes to 17 milliseconds! That's a nice performance improvement.

The only problem, from a point of view where learning is in focus, is that it feels a bit like cheating: we've delegated an important step in the algorithm to List.rev. If we think it's OK to use the library functions, we could have directly used List.map. The whole point of this exercise, however, is to learn something about how to write tail-recursive functions.

At this point, we have two options:

Learn how to write an efficient, tail-recursive implementation of rev.
Consider alternatives.

Ultimately, it turns out that List.rev is implemented in a fashion similar to above, using an 'implementation function' and an accumulator. Let us, therefore, turn our attention to another alternative.

Example: efficient tail-recursive map using a difference list #

The mapTailRecursiveUsingAppend function is attractive because of its simplicity. If only there was an efficient way to append a single item to the tail of a (long) list, like acc appendSingle (f h)! (appendSingle is a hypothetical function that we wish existed.)

So far, we've treated data as data, and functions as functions, but in Functional Programming, functions are data as well!

What if we could partially apply a cons operation?

Unfortunately, the cons operator (::) can't be used as a function, so you'll have to introduce a little helper function:

// 'a -> 'a list -> 'a list
let cons x xs = x :: xs

This enables you to partially apply a cons operation:

> cons 1;;
val it : (int list -> int list) = <fun:it@4-5>

The expression cons 1 is a function that awaits an int list argument. You can, for example, call it with the empty list, or another list:

> (cons 1) [];;
val it : int list = [1]
> (cons 1) [2];;
val it : int list = [1; 2]

That hardly seems useful, but what happens if you start composing such partially applied functions?

> cons 1 >> cons 2;;
val it : (int list -> int list) = <fun:it@7-8>

Notice that the result is another function with the same signature as cons 1. A way to read it is: cons 1 is a function that takes a list as input, appends the list after 1, and returns that new list. The return value of cons 1 is passed to cons 2, which takes that input, appends that list after 2, and returns that list. Got it? Try it out:

> (cons 1 >> cons 2) [];;
val it : int list = [2; 1]

Not what you expected? Try going through the data flow again. The input is the empty list ([]), which, when applied to cons 1 produces [1]. That value is then passed to cons 2, which puts 2 at the head of [1], yielding the final result of [2; 1].

This still doesn't seem to help, because it still reverses the list. True, but you can reverse the composition:

> (cons 2 >> cons 1) [];;
val it : int list = [1; 2]
> (cons 1 << cons 2) [];;
val it : int list = [1; 2]

Notice that in the first line, I reversed the composition by changing the order of partially applied functions. This, however, is equivalent to keeping the order, but using the reverse composition operator (<<).

You can repeat this composition:

> (cons 1 << cons 2 << cons 3 << cons 4) [];;
val it : int list = [1; 2; 3; 4]

That's exactly what you need, enabling you to write acc << (cons (f h)) in order to efficiently append a single element to the tail of a list!

let mapTailRecursiveUsingDifferenceList f xs =
    let cons x xs = x :: xs
    let rec mapImp f acc = function
        | [] -> acc []
        | h::t -> mapImp f (acc << (cons (f h))) t
    mapImp f id xs

This implementation of map uses this technique, introduced by John Hughes in 1985. Real World Haskell calls the technique a difference list, without explaining why.

This mapImp function's accumulator is no longer a list, but a function. For every item, it composes a new accumulator function from the old one, effectively appending the mapped item to the tail of the accumulated list. Yet, because acc isn't a list, but rather a function, the 'append' operation doesn't trigger a list traversal.

When the recursive function finally reaches the end of the list (the [] case), it invokes the acc function with the empty list ([]) as the initial input.

This implementation is also tail recursive, because the accumulator is being completely composed (acc << (cons (f h))) before mapImp is recursively invoked.

Is it efficient, then?

> mapTailRecursiveUsingDifferenceList ((*) 2) [1..100000];;
Real: 00:00:00.024, CPU: 00:00:00.031, GC gen0: 1, gen1: 0, gen2: 0
val it : int list =
  [2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; ...]

24 milliseconds is decent. It's not as good as mapTailRecursiveUsingRev (17 milliseconds), but it's close.

In practice, you'll probably find that mapTailRecursiveUsingRev is not only more efficient, but also easier to understand. The advantage of using the difference list technique, however, is that now mapImp has a shape that almost screams to be refactored to a fold.

Example: implementing map with fold #

The mapImp function in mapTailRecursiveUsingDifferenceList almost has the shape required by the accumulator function in List.fold. This enables you to rewrite mapImp using fold:

let mapUsingFold f xs =
    let cons x xs = x :: xs
    let mapImp = List.fold (fun acc h -> acc << (cons (f h))) id
    mapImp xs []

As usual in Functional Programming, the ultimate goal seems to be to avoid writing recursive functions after all!

The mapUsingFold function is as efficient as mapTailRecursiveUsingDifferenceList:

> mapUsingFold ((*) 2) [1..100000];;
Real: 00:00:00.025, CPU: 00:00:00.031, GC gen0: 2, gen1: 1, gen2: 1
val it : int list =
  [2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; ...]

Not only does 25 milliseconds seem fast, but it's also comparable with the performance of the built-in map function:

> List.map ((*) 2) [1..100000];;
Real: 00:00:00.011, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val it : int list =
  [2; 4; 6; 8; 10; 12; 14; 16; 18; 20; 22; 24; 26; 28; 30; 32; 34; 36; ...]

Granted, List.map seems to be twice as fast, but it's also been the subject of more scrutiny than the above fun exercise.

Summary #

In Functional Programming, recursive functions take the place of imperative loops. In order to be efficient, they must be tail recursive.

You can make a function tail recursive by introducing an accumulator argument to an 'implementation function'. This also tends to put you in a position where you can ultimately refactor the implementation to use a fold instead of explicit recursion.

Comments

Tyson Williams #

I verified the behavior you showed for mapTailRecursiveUsingDifferenceList and mapUsingFold: when executed in the F# Interactive window, both functions have the correct behavior. I further verified that both functions have the same correct behavior when executed as a unit test compiled in RELEASE mode. However, when executed as a unit test compiled in DEBUG mode, both tests fail (technically aborted when using XUnit) because each overflows the stack. In contrast, List.map has the correct behavior in all three situations.

I haven't tested the same in Haskell. I don't even know if Haskell has complication modes like DEBUG and RELEASE.

Do you know how to implement map for list in F# so that it does not stack overflow the stack even when executed in DEBUG mode? Of course your function mapTailRecursiveUsingRev achieves that goal. The problem that I actually want to solve is figure out how to implement map for the type 'a -> 'b as a (covariant) functor in 'b in such a way that the stack does not overflow. See this issue for more details. I am thinking that seeing such a map function implemented for list without using List.rev will help me figure out how to correctly implement this for the type 'a -> 'b as a (covariant) functor in 'b.

2021-01-24 17:21 UTC

Tyson Williams #

Ultimately, it turns out that List.rev is implemented in a fashion similar to above, using an 'implementation function' and an accumulator.

Oh, that code partially answers my question. First, that is legacy code now. The current version is hosted in the dotnet/fsharp repository. Then List.map delegates to Microsoft.FSharp.Primitives.Basics.List.map, which, by way of mapToFreshConsTail calls the impure function setFreshConsTail that is allowed to mutate the tail pointer of a cons cell.

Not only does 25 milliseconds seem fast [for mapUsingFold], but it's also comparable with the performance of the built-in map function:

[...]

Granted, List.map seems to be twice as fast, but it's also been the subject of more scrutiny than the above fun exercise.

List.map is "twice as fast" because it does "half as much work". mapUsingFold calls List.fold, which executes in linear time, to essentially compute the elements that will be in the final list. Then it calls mapImp, which also executes in linear time, to construct that final list. In contrast, List.map does both of those in one pass with its ability to mutate the tail pointer of a cons cell.

I wonder if I can use mutability in a similarly pure way to achieve my goal. **Goes away to think**

2021-01-24 19:20 UTC

Tyson Williams #

To avoid overflowing the stack, it is a necessary condition that all recursive calls are tail calls. However, this is not sufficient. In mapTailRecursiveUsingDifferenceList, it is crucial (at least in F#) that the function composition acc << (cons (f h)) (which I prefer to write as (cons (f h)) >> acc) happened in that order. If John Hughes had designed the code differently so that the composition was acc >> (cons (f h)) (which, as you demonstrated, would be the correct output if only this change were made except with the order reversed), then the stack would still overflow even though all recursive fucntions only involve tail calls. See this comment for more information.

2021-01-25 04:43 UTC

Tyson Williams #

I figured out how to use the difference list technique to avoid overflowing the stack when composing on the right. See this comment. I wonder if my implementation can be simplified. This problem of avoiding stack overflows when composing on the right has to be a standard problem with a well-known answer.

Thanks for being a great place for me to share this partial stream of consciousness that I have had over the last couple days.

P.S. You can simplify mapTailRecursiveUsingDifferenceList a bit. The local function mapImp is always given the same value of f. Therefore, this arguemnt can be removed (since the same f is also in scope). The resulting code does not overflow the stack (which, again, is a stronger statement than staying that all recursive calls in mapImp are tail calls). You essentially made this simplification when you implemented mapUsingFold.

2021-01-26 05:34 UTC

Mark Seemann #

Tyson, thank you for writing. I have to admit that I've now lost track... Did you manage to unravel all you wanted to know by yourself, or is there still a question left?

2021-01-26 16:41 UTC

Published: Tuesday, 22 December 2015 09:11:00 UTC

Tail Recurse by Mark Seemann

Introduce an accumulator #

Example: discarding candidate points when finding a convex hull #

Example: implementing map with recursion #

Example: inefficient tail-recursive map implementation #

Example: efficient tail-recursive map implementation #

Example: efficient tail-recursive map using a difference list #

Example: implementing map with fold #

Summary #

Comments

Wish to comment?

Published

Tags