The catamorphism for a full binary tree is a pair of functions.

This article is part of an article series about catamorphisms. A catamorphism is a universal abstraction that describes how to digest a data structure into a potentially more compact value.

This article presents the catamorphism for a full binary tree, as well as how to identify it. The beginning of this article presents the catamorphism in C#, with examples. The rest of the article describes how to deduce the catamorphism. This part of the article presents my work in Haskell. Readers not comfortable with Haskell can just read the first part, and consider the rest of the article as an optional appendix.

A full binary tree (also known as a proper or plane binary tree) is a tree in which each node has either two or no branches.

A full binary tree example diagram, with each node containing integers.

The diagram shows an example of a tree of integers. The left branch contains two children, of which the right branch again contains two sub-branches. The rest of the nodes are leaf-nodes with no sub-branches.

C# catamorphism #

As a C# representation of a full binary tree, I'll start with the IBinaryTree<T> API from A Visitor functor. The catamorphism is the Accept method:

TResult Accept<TResult>(IBinaryTreeVisitor<TTResult> visitor);

So far in this article series, you've mostly seen Church-encoded catamorphisms, so a catamorphism represented as a Visitor may be too big of a cognitive leap. We know, however, from Visitor as a sum type that a Visitor representation is isomorphic to a Church encoding. Since these are isomorphic, it's possible to refactor IBinaryTree<T> to a Church encoding. The GitHub repository contains a series of commits that demonstrates how that refactoring works. Once you're done, you arrive at this Match method, which is the refactored Accept method:

TResult Match<TResult>(Func<TResultTTResultTResult> node, Func<TTResult> leaf);

This method takes a pair of functions as arguments. The node function deals with an internal node in the tree (the blue nodes in the above diagram), whereas the leaf function deals with the leaf nodes (the green nodes in the diagram).

The leaf function may be the easiest one to understand. A leaf node only contains a value of the type T, so the only operation the function has to support is translating the T value to a TResult value. This is also the premise of the Leaf class' implementation of the method:

private readonly T item;
 
public TResult Match<TResult>(Func<TResultTTResultTResult> node, Func<TTResult> leaf)
{
    return leaf(item);
}

The node function is more tricky. It takes three input arguments, of the types TResult, T, and TResult. The roles of these are respectively left, item, and right. This is a typical representation of a binary node. Since there's always a left and a right branch, you put the node's value in the middle. As was the case with the tree catamorphism, the catamorphism function receives the branches as already-translated values; that is, both the left and right branch have already been translated to TResult when node is called. While it looks like magic, as always it's just the result of recursion:

private readonly IBinaryTree<T> left;
private readonly T item;
private readonly IBinaryTree<T> right;
 
public TResult Match<TResult>(Func<TResultTTResultTResult> node, Func<TTResult> leaf)
{
    return node(left.Match(node, leaf), item, right.Match(node, leaf));
}

This is the Node<T> class implementation of the Match method. It calls node and returns whatever it returns, but notice that as the left and right arguments, if first, recursively, calls left.Match and right.Match. This is how it can call node with the translated branches, as well as with the basic item.

The recursion stops and unwinds on left and right whenever one of those are Leaf instances.

Examples #

You can use Match to implement most other behaviour you'd like IBinaryTree<T> to have. In the original article on the full binary tree functor you saw how to implement Select with a Visitor, but now that the API is Church-encoded, you can derive Select from Match:

public static IBinaryTree<TResult> Select<TResultT>(
    this IBinaryTree<T> tree,
    Func<TTResult> selector)
{
    if (tree == null)
        throw new ArgumentNullException(nameof(tree));
    if (selector == null)
        throw new ArgumentNullException(nameof(selector));
 
    return tree.Match(
        node: (l, x, r) => Create(l, selector(x), r),
        leaf: x => Leaf(selector(x)));
}

In the leaf case, the Select method simply calls selector with the x value it receives, and puts the resulting TResult object into a new Leaf object.

In the node case, the lambda expression receives three arguments: l and r are the already-translated left and right branches, so you only need to call selector on x and call the Create helper method to produce a new Node object.

You can also implement more specialised functionality, like calculating the sum of nodes, measuring the depth of the tree, and similar functions. You saw equivalent examples in the previous article.

For the examples in this article, I'll use the tree shown in the above diagram. Using static helper methods, you can write it like this:

var tree =
    BinaryTree.Create(
        BinaryTree.Create(
            BinaryTree.Leaf(42),
            1337,
            BinaryTree.Create(
                BinaryTree.Leaf(2112),
                5040,
                BinaryTree.Leaf(1984))),
        2,
        BinaryTree.Leaf(90125));

To calculate the sum of all nodes, you can write a function like this:

public static int Sum(this IBinaryTree<int> tree)
{
    return tree.Match((l, x, r) => l + x + r, x => x);
}

The leaf function just returns the value of the node, while the node function adds the numbers together. It works for the above tree:

> tree.Sum()
100642

To find the maximum value, you can write another extension method:

public static int Max(this IBinaryTree<int> tree)
{
    return tree.Match((l, x, r) => Math.Max(Math.Max(l, r), x), x => x);
}

Again, the leaf function just returns the value of the node. The node function receives the value of the current node x, as well as the already-found maximum value of the left branch and the right branch; it then returns the maximum of these three values:

> tree.Max()
90125

As was also the case for trees, both of these operations are part of the standard repertoire available via a data structure's fold. That's not the case for the next two functions, which can't be implemented using a fold, but which can be defined with the catamorphism. The first is a function to count the leaves of a tree:

public static int CountLeaves<T>(this IBinaryTree<T> tree)
{
    return tree.Match((l, _, r) => l + r, _ => 1);
}

Since the leaf function handles a leaf node, the number of leaf nodes in a leaf node is, by definition, one. Thus, that function can ignore the value of the node and always return 1. The node function, on the other hand, receives the number of leaf nodes on the left-hand side (l), the value of the current node, and the number of leaf nodes on the right-hand side (r). Notice that since an internal node is never a leaf node, it doesn't count; instead, just add l and r together. Notice that, again, the value of the node itself is irrelevant.

How many leaf nodes does the above tree have?

> tree.CountLeaves()
4

You can also measure the maximum depth of a tree:

public static int MeasureDepth<T>(this IBinaryTree<T> tree)
{
    return tree.Match((l, _, r) => 1 + Math.Max(l, r), _ => 0);
}

Like in the previous article, I've arbitrarily decided that the depth of a leaf node is zero; therefore, the leaf function always returns 0. The node function receives the depth of the left and right branches, and returns the maximum of those two values, plus one, since the current node adds one level of depth.

> tree.MeasureDepth()
3

You may not have much need for working with full binary trees in your normal, day-to-day C# work, but I found it worthwhile to include this example for a couple of reasons. First, because the original of the API shows that a catamorphism may be hiding in a Visitor. Second, because binary trees are interesting, in that they're foldable functors, but not monads.

Where does the catamorphism come from, though? How can you trust that the Match method is the catamorphism?

Binary tree F-Algebra #

As in the previous article, I'll use Fix and cata as explained in Bartosz Milewski's excellent article on F-Algebras.

As always, start with the underlying endofunctor. You can think of this one as a specialisation of the rose tree from the previous article:

data FullBinaryTreeF a c = LeafF a | NodeF c a c deriving (ShowEqRead)
 
instance Functor (FullBinaryTreeF a) where
  fmap _     (LeafF x) = LeafF x
  fmap f (NodeF l x r) = NodeF (f l) x (f r)

As usual, I've called the 'data' type a and the carrier type c (for carrier). The Functor instance as usual translates the carrier type; the fmap function has the type (c -> c1) -> FullBinaryTreeF a c -> FullBinaryTreeF a c1.

As was the case when deducing the recent catamorphisms, Haskell isn't too happy about defining instances for a type like Fix (FullBinaryTreeF a). To address that problem, you can introduce a newtype wrapper:

newtype FullBinaryTreeFix a =
  FullBinaryTreeFix { unFullBinaryTreeFix :: Fix (FullBinaryTreeF a) }
  deriving (ShowEqRead)

You can define Functor, Foldable, and Traversable instances (but not Monad) for this type without resorting to any funky GHC extensions. Keep in mind that ultimately, the purpose of all this code is just to figure out what the catamorphism looks like. This code isn't intended for actual use.

A pair of helper functions make it easier to define FullBinaryTreeFix values:

fbtLeafF :: a -> FullBinaryTreeFix a
fbtLeafF = FullBinaryTreeFix . Fix . LeafF
 
fbtNodeF :: FullBinaryTreeFix a -> a -> FullBinaryTreeFix a -> FullBinaryTreeFix a
fbtNodeF (FullBinaryTreeFix l) x (FullBinaryTreeFix r) = FullBinaryTreeFix $ Fix $ NodeF l x r

In order to distinguish these helper functions from the ones that create TreeFix a values, I prefixed them with fbt (for Full Binary Tree). fbtLeafF creates a leaf node:

Prelude Fix FullBinaryTree> fbtLeafF "fnaah"
FullBinaryTreeFix {unFullBinaryTreeFix = Fix (LeafF "fnaah")}

fbtNodeF is a helper function to create an internal node:

Prelude Fix FullBinaryTree> fbtNodeF (fbtLeafF 1337) 42 (fbtLeafF 2112)
FullBinaryTreeFix {unFullBinaryTreeFix = Fix (NodeF (Fix (LeafF 1337)) 42 (Fix (LeafF 2112)))}

The FullBinaryTreeFix type, or rather the underlying FullBinaryTreeF a functor, is all you need to identify the catamorphism.

Haskell catamorphism #

At this point, you have two out of three elements of an F-Algebra. You have an endofunctor (FullBinaryTreeF a), and an object c, but you still need to find a morphism FullBinaryTreeF a c -> c. Notice that the algebra you have to find is the function that reduces the functor to its carrier type c, not the 'data type' a. This takes some time to get used to, but that's how catamorphisms work. This doesn't mean, however, that you get to ignore a, as you'll see.

As in the previous articles, start by writing a function that will become the catamorphism, based on cata:

fullBinaryTreeF = cata alg . unFullBinaryTreeFix
  where alg     (LeafF x) = undefined
        alg (NodeF l x r) = undefined

While this compiles, with its undefined implementation of alg, it obviously doesn't do anything useful. I find, however, that it helps me think. How can you return a value of the type c from alg? You could pass a function argument to the fullBinaryTreeF function and use it with x:

fullBinaryTreeF fl = cata alg . unFullBinaryTreeFix
  where alg     (LeafF x) = fl x
        alg (NodeF l x r) = undefined

I called the function fl for function, leaf, because we're also going to need a function for the NodeF case:

fullBinaryTreeF :: (c -> a -> c -> c) -> (a -> c) -> FullBinaryTreeFix a -> c
fullBinaryTreeF fn fl = cata alg . unFullBinaryTreeFix
  where alg     (LeafF x) = fl x
        alg (NodeF l x r) = fn l x r

This works. Since cata has the type Functor f => (f a -> a) -> Fix f -> a, that means that alg has the type f a -> a. In the case of FullBinaryTreeF, the compiler infers that the alg function has the type FullBinaryTreeF a c -> c, which is just what you need!

You can now see what the carrier type c is for. It's the type that the algebra extracts, and thus the type that the catamorphism returns.

This, then, is the catamorphism for a full binary tree. As always, it's not the only possible catamorphism, since you can easily reorder the arguments to both fullBinaryTreeF, fn, and fl. These would all be isomorphic, though.

Basis #

You can implement most other useful functionality with treeF. Here's the Functor instance:

instance Functor FullBinaryTreeFix where
  fmap f = fullBinaryTreeF (\l x r -> fbtNodeF l (f x) r) (fbtLeafF . f)

The fl function first invokes f, followed by fbtLeafF. The fn function uses the fbtNodeF helper function to create a new internal node. l and r are already-translated branches, so you just need to call f with the node value x.

There's no Monad instance for binary trees, because you can't flatten a binary tree of binary trees. You can, on the other hand, define a Foldable instance:

instance Foldable FullBinaryTreeFix where
  foldMap f = fullBinaryTreeF (\l x r -> l <> f x <> r) f

The f function passed to foldMap has the type Monoid m => (a -> m), so the fl function that handles leaf nodes simply calls f with the contents of the node. The fn function receives two branches already translated to m, so it just has to call f with x and combine all the m values using the <> operator.

The Traversable instance follows right on the heels of Foldable:

instance Traversable FullBinaryTreeFix where
  sequenceA = fullBinaryTreeF (liftA3 fbtNodeF) (fmap fbtLeafF)

There are operations on binary trees that you can implement with a fold, but some that you can't. Consider the tree shown in the diagram at the beginning of the article. This is also the tree that the above C# examples use. In Haskell, using FullBinaryTreeFix, you can define that tree like this:

tree = 
  fbtNodeF
    (fbtNodeF
      (fbtLeafF 42)
      1337
      (fbtNodeF
        (fbtLeafF 2112)
        5040
        (fbtLeafF 1984)))
    2
    (fbtLeafF 90125)

Since FullBinaryTreeFix is Foldable, and that type class already comes with sum and maximum functions, no further work is required to repeat the first two of the above C# examples:

Prelude Fix FullBinaryTree> sum tree
100642
Prelude Fix FullBinaryTree> maximum tree
90125

Counting leaves, or measuring the depth of a tree, on the other hand, is impossible with the Foldable instance, but can be implemented using the catamorphism:

countLeaves :: Num n => FullBinaryTreeFix a -> n
countLeaves = fullBinaryTreeF (\l _ r -> l + r) (const 1)
 
treeDepth :: (Ord n, Num n) => FullBinaryTreeFix a -> n
treeDepth = fullBinaryTreeF (\l _ r -> 1 + max l r) (const 0)

The reasoning is the same as already explained in the above C# examples. The functions also produce the same results:

Prelude Fix FullBinaryTree> countLeaves tree
4
Prelude Fix FullBinaryTree> treeDepth tree
3

This, hopefully, illustrates that the catamorphism is more capable, and that the fold is just a (list-biased) specialisation.

Summary #

The catamorphism for a full binary tree is a pair of functions. One function handles internal nodes, while the other function handles leaf nodes.

I thought it was interesting to show this example for two reasons: First, the original example was a Visitor implementation, and I think it's worth realising that a Visitor's Accept method can also be viewed as a catamorphism. Second, a binary tree is an example of a data structure that has a fold, but isn't a monad.

All articles in the article series have, so far, covered data structures well-known from computer science. The next example will, on the other hand, demonstrate that even completely ad-hoc domain-specific data structures have catamorphisms.

Next: Payment types catamorphism.



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 24 June 2019 06:00:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 24 June 2019 06:00:00 UTC