Collatz sequences by function composition

Mostly in C#, with a few lines of Haskell code.

A recent article elicited more comments than usual, and I've been so unusually buried in work that only now do I have a little time to respond to some of them. In one comment Struan Judd offers a refactored version of my Collatz sequence in order to shed light on the relationship between cyclomatic complexity and test case coverage.

Struan Judd's agenda is different from what I have in mind in this article, but the comment inspired me to refactor my own code. I wanted to see what it would look like with this constraint: It should be possible to test odd input numbers without exercising the code branches related to even numbers.

The problem with more naive implementations of Collatz sequence generators is that (apart from when the input is 1) the sequence ends with a tail of even numbers halving down to 1. I'll start with a simple example to show what I mean.

Standard recursion #

At first I thought that my confusion originated from the imperative structure of the original example. For more than a decade, I've preferred functional programming (FP), and even when I write object-oriented code, I tend to use concepts and patterns from FP. Thus I, naively, rewrote my Collatz generator as a recursive function:

public static IReadOnlyCollection<int> Sequence(int n)
{
    if (n < 1)
        throw new ArgumentOutOfRangeException(
            nameof(n),
            $"Only natural numbers allowed, but given {n}.");
 
    if (n == 1)
        return new[] { n };
    else
        if (n % 2 == 0)
            return new[] { n }.Concat(Sequence(n / 2)).ToArray();
        else
            return new[] { n }.Concat(Sequence(n * 3 + 1)).ToArray();
}

Recursion is usually not recommended in C#, because a sufficiently long sequence could blow the call stack. I wouldn't write production C# code like this, but you could do something like this in F# or Haskell where the languages offer solutions to that problem. In other words, the above example is only for educational purposes.

It doesn't, however, solve the problem that confused me: If you want to test the branch that deals with odd numbers, you can't avoid also exercising the branch that deals with even numbers.

Calculating the next value #

In functional programming, you solve most problems by decomposing them into smaller problems and then compose the smaller Lego bricks with standard combinators. It seemed like a natural refactoring step to first pull the calculation of the next value into an independent function:

public static int Next(int n)
{
    if ((n % 2) == 0)
        return n / 2;
    else
        return n * 3 + 1;
}

This function has a cyclomatic complexity of 2 and no loops or recursion. Test cases that exercise the even branch never touch the odd branch, and vice versa.

A parametrised test might look like this:

[Theory]
[InlineData( 2,  1)]
[InlineData( 3, 10)]
[InlineData( 4,  2)]
[InlineData( 5, 16)]
[InlineData( 6,  3)]
[InlineData( 7, 22)]
[InlineData( 8,  4)]
[InlineData( 9, 28)]
[InlineData(10,  5)]
public void NextExamples(int n, int expected)
{
    int actual = Collatz.Next(n);
    Assert.Equal(expected, actual);
}

The NextExamples test obviously defines more than the two test cases that are required to cover the Next function, but since code coverage shouldn't be used as a target measure, I felt that more than two test cases were warranted. This often happens, and should be considered normal.

A Haskell proof of concept #

While I had a general idea about the direction in which I wanted to go, I felt that I lacked some standard functional building blocks in C#: Most notably an infinite, lazy sequence generator. Before moving on with the C# code, I threw together a proof of concept in Haskell.

The next function is just a one-liner (if you ignore the optional type declaration):

next :: Integral a => a -> a
next n = if even n then n `div` 2 else n * 3 + 1

A few examples in GHCi suggest that it works as intended:

ghci> next 2
1
ghci> next 3
10
ghci> next 4
2
ghci> next 5
16

Haskell comes with enough built-in functions that that was all I needed to implement a Colaltz-sequence generator:

collatz :: Integral a => a -> [a]
collatz n = (takeWhile (1 <) $ iterate next n) ++ [1]

Again, a few examples suggest that it works as intended:

ghci> collatz 1
[1]
ghci> collatz 2
[2,1]
ghci> collatz 3
[3,10,5,16,8,4,2,1]
ghci> collatz 4
[4,2,1]
ghci> collatz 5
[5,16,8,4,2,1]

I should point out, for good measure, that since this is a proof of concept I didn't add a Guard Clause against zero or negative numbers. I'll keep that in the C# code.

Generator #

While C# does come with a TakeWhile function, there's no direct equivalent to Haskell's iterate function. It's not difficult to implement, though:

public static IEnumerable<T> Iterate<T>(Func<T, T> f, T x)
{
    var current = x;
    while (true)
    {
        yield return current;
        current = f(current);
    }
}

While this Iterate implementation has a cyclomatic complexity of only 2, it exhibits the same kind of problem as the previous attempts at a Collatz-sequence generator: You can't test one branch without testing the other. Here, it even seems as though it's impossible to test the branch that skips the loop.

In Haskell the iterate function is simply a lazily-evaluated recursive function, but that's not going to solve the problem in the C# case. On the other hand, it helps to know that the yield keyword in C# is just syntactic sugar over a compiler-generated Iterator.

Just for the exercise, then, I decided to write an explicit Iterator instead.

Iterator #

For the sole purpose of demonstrating that it's possible to refactor the code so that branches are independent of each other, I rewrote the Iterate function to return an explicit IEnumerable<T>:

public static IEnumerable<T> Iterate<T>(Func<T, T> f, T x)
{
    return new Iterable<T>(f, x);
}

The Iterable<T> class is a private helper class, and only exists to return an IEnumerator<T>:

private sealed class Iterable<T> : IEnumerable<T>
{
    private readonly Func<T, T> f;
    private readonly T x;
 
    public Iterable(Func<T, T> f, T x)
    {
        this.f = f;
        this.x = x;
    }
 
    public IEnumerator<T> GetEnumerator()
    {
        return new Iterator<T>(f, x);
    }
 
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

The Iterator<T> class does the heavy lifting:

private sealed class Iterator<T> : IEnumerator<T>
{
    private readonly Func<T, T> f;
    private readonly T original;
    private bool iterating;
 
    internal Iterator(Func<T, T> f, T x)
    {
        this.f = f;
        original = x;
        Current = x;
    }
 
    public T Current { get; private set; }
 
    [MaybeNull]
    object IEnumerator.Current => Current;
 
    public void Dispose()
    {
    }
 
    public bool MoveNext()
    {
        if (iterating)
            Current = f(Current);
        else
            iterating = true;
        return true;
    }
 
    public void Reset()
    {
        Current = original;
        iterating = false;
    }
}

I can't think of a situation where I would write code like this in a real production code base. Again, I want to stress that this is only an exploration of what's possible. What this does show is that all members have low cyclomatic complexity, and none of them involve looping or recursion. Only one method, MoveNext, has a cyclomatic complexity greater than one, and its branches are independent.

Composition #

All Lego bricks are now in place, enabling me to compose the Sequence like this:

public static IReadOnlyCollection<int> Sequence(int n)
{
    if (n < 1)
        throw new ArgumentOutOfRangeException(
            nameof(n),
            $"Only natural numbers allowed, but given {n}.");
 
    return Generator.Iterate(Next, n).TakeWhile(i => 1 < i).Append(1).ToList();
}

This function has a cyclomatic complexity of 2, and each branch can be exercised independently of the other.

Which is what I wanted to accomplish.

Conclusion #

I'm still re-orienting myself when it comes to understanding the relationship between cyclomatic complexity and test coverage. As part of that work, I wanted to refactor the Collatz code I originally showed. This article shows one way to decompose and reassemble the function in such a way that all branches are independent of each other, so that each can be covered by test cases without exercising the other branch.

I don't know if this is useful to anyone else, but I found the hours well-spent.

Comments

Tyson Williams #

I really like this article. So much so that I tried to implement this approach for a recursive function at my work. However, I realized that there are some required conditions.

First, the recusrive funciton must be tail recursive. Second, the recursive function must be closed (i.e. the output is a subset/subtype of the input). Neither of those were true for my function at work. An example of a function that doesn't satisfy either of these conditions is the function that computes the depth of a tree.

A less serious issue is that your code, as currently implemented, requires that there only be one base case value. The issue is that you have duplicated code: the unique base case value appears both in the call to TakeWhile and in the subsequent call to Append. Instead of repeating yourself, I recommend defining an extension method on Enumerable called TakeUntil that works like TakeWhile but also returns the first value on which the predicate returned false. Here is an implementation of that extension method.

2023-06-22 13:45 UTC

Mark Seemann #

Tyson, thank you for writing. I suppose that you can't share the function that you mention, so I'll have to discuss it in general terms.

As far as I can tell you can always(?) refactor non-tail-recursive functions to tail-recursive implementations. In practice, however, there's rarely need for that, since you can usually separate the problem into a general-purpose library function on the one hand, and your special function on the other. Examples of general-purpose functions are the various maps and folds. If none of the standard functions do the trick, the type's associated catamorphism ought to.

One example of that is computing the depth of a tree, which we've already discussed.

I don't insist that any of this is universally true, so if you have another counter-example, I'd be keen to see it.

You are, of course, right about using a TakeUntil extension instead. I was, however, trying to use as many built-in components as possible, so as to not unduly confuse casual readers.

2023-06-27 12:35 UTC

Published: Monday, 12 June 2023 05:27:00 UTC

Collatz sequences by function composition by Mark Seemann