Threading context through a catamorphism

A problem solved after 1½ years.

You've probably noticed that it's easier to learn something new if it looks or sounds like something you already know. As a native Dane, I've found it easier to learn English and German than Russian and Japanese. If you originally were a Java or C# developer, you probably find JavaScript more approachable than Clojure or APL.

I believe that this extends to design patterns and universal abstractions as well. If code new to you follows well-known abstractions, it may be easier to learn than if it's structured in an entirely ad-hoc manner. This is my motivation for learning such universal abstractions as monoids, functors, and catamorphisms.

I particularly enjoy when it's possible to apply such abstractions to a proper problem. This occasionally happens. One example is my small article series on a functional file system.

A fly in the ointment #

In those articles, I described how you could base most of the code on the rose tree catamorphism. There was just one snag. There was one function, calculateMoves, that I was unable to implement with the catamorphism. In the article, I acknowledged my failure:

"Earlier, I wrote that you can implement desired Tree functionality with the foldTree function, but that was a simplification. If you can implement the functionality of calculateMoves with foldTree, I don't know how."

This was true for both the Haskell proof of concept as well as the F# port.

Tyson Williams and I discussed this wart without getting closer to a solution.

As the idiom goes, perfect is the enemy of good, so I decided to move on, although it nagged me.

Problem, condensed #

The problem with the calculateMoves function was that it needed to thread a 'context' recursively through the entire data structure. In this case, the context was a file path.

When calculateMoves runs over the input tree, it needs to thread a relative path through the function, building it up as it traverses the data structure.

For example, if a leaf node named 1 is in a directory named b, which itself is a subdirectory of a, the relative path should be a/b/1. This example is straight from the test cases shown in both articles. You can also find the tests in the GitHub repository.

Each time calculateMoves visits a Node or Leaf it needs to know the parent path to calculate the destination path. As the articles show, this isn't too hard to do with regular pattern matching and recursion.

I couldn't figure out, however, how to thread the path through the function when I tried to implement it with the catamorphism.

Breakthrough #

While I'm ready to walk away from problems when I'm stuck, I tend to remember them. Sometimes, I run into a solution much later.

This happened to me yesterday. I was trying to answer a Stack Overflow question which was explicitly about the application of universal abstractions. Once more, I was stuck by being unable to thread a 'context' through a catamorphism. This time, instead of a path, the context was an indentation depth. Basically, the question was how to render a tree with proper indentation.

Again, this isn't hard if you resort to explicit pattern matching and recursion, but I couldn't figure out how to do it via the data structure's catamorphism.

Fortunately, the user danidiaz posted an awesome answer while I was struggling. The answer uses a trick that I hadn't noticed before: It threads the indentation depth through the data structure by using the catamorphism to produce a structure map with a function as the carrier type. Specifically, danidiaz defines the algebra Todo' (Int -> String) -> Int -> String to reduce a Todo' (Int -> String) to an Int -> String function. This function then gets initialised with the depth 0.

While I've been doing functional programming for years, I sometimes still tend to forget that functions are first-class values...

This trick, though, seems to be universally applicable. If you need to thread a context through a catamorphism, define the algebra to work on functions that take the context as an argument.

If this is a universally applicable trick, it also ought to work with the calculateMoves function.

Haskell re-implementation #

In my Haskell proof of concept, the calculateMoves function originally looked like this:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves = imp ""
  where imp path    (Leaf x) = Leaf $ Move x $ replaceDirectory x path
        imp path (Node x xs) = Node (path </> x) $ imp (path </> x) <$> xs

It uses an imp (for implementation) function to explicitly recurse over a Tree FilePath FilePath. Until yesterday, I couldn't come up with a better solution to thread the path through the data structure.

The new trick suggests that it'd be possible to implement the function on foldTree (the catamorphism) by using a function as the carrier type. Since the context to be threaded through the catamorphism is a String (the path), the catamorphism should produce a function that takes a String as argument. In other words, the carrier type of the Tree should be String -> Tree FilePath Move.

Let's expand on this: The type of foldTree is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c. Usually, I tend to think of the type parameter c as the type of some value, but since it's unconstrained, it can also be a function. That's what we need here: c should be String -> Tree FilePath Move.

That's not too hard to do, because of currying. Just write functions that take an extra String argument and pass them to foldTree:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves t = foldTree fNode fLeaf t ""
  where
    fLeaf :: FilePath -> String -> Tree FilePath Move
    fLeaf x    path = Leaf $ Move x $ replaceDirectory x path
    fNode :: FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move
    fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs

Here I've used type annotations for the local functions, but that's entirely optional:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves t = foldTree fNode fLeaf t ""
  where
    fLeaf x    path = Leaf $ Move x $ replaceDirectory x path
    fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs

I included the type annotations to make it a little clearer what's going on. Recall that the type of foldTree is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c. First consider the second of the two function arguments, the one I call fLeaf in the above code. It's the simplest of the two, so it makes sense to start with that one.

The generic type of fLeaf is b -> c. How does that map to the type of fLeaf, which is FilePath -> String -> Tree FilePath Move?

Well, the Tree that the catamorphism runs on is a Tree FilePath FilePath. Mapped to the parametrically polymorphic type of foldTree that's Tree a b. In other words, b maps to FilePath. Thus, in order to fit the type of b -> c, the type corresponding to b in fLeaf must be FilePath. What's left? String -> Tree FilePath Move is what's left. The function takes a FilePath as input and returns a String -> Tree FilePath Move. In other words, c ~ String -> Tree FilePath Move.

How does that fit with fNode?

Generically, this function must have the type a -> [c] -> c. We've already established that c must be String -> Tree FilePath Move. Since the catamorphism runs on a Tree FilePath FilePath, we also know that a must be FilePath. Thus, plugging in all the types, fNode must have the type FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move. Note, particularly, that the second argument is a list of functions. That's why I decided to name the parameter fs, for functions.

The entire expression foldTree fNode fLeaf t, then, has the type String -> Tree FilePath Move, since c is String -> Tree FilePath Move and the return type of foldTree is c.

The final trick is to apply this function to the initial relative path "", which returns a Tree FilePath Move.

This compiles and passes all tests. calculateMoves is now implemented using the Tree catamorphism, a goal that eluded me for more than one and a half years.

F# re-implementation #

With the Haskell proof of concept in place, it's fairly trivial to port the new implementation to the F# code base.

The calculateMoves function originally looked like this:

// Tree<string,FileInfo> -> Tree<string,Move>
let calculateMoves =
    let replaceDirectory (f : FileInfo) d =
        FileInfo (Path.Combine (d, f.Name))
    let rec imp path = function
        | Leaf x ->
            Leaf { Source = x; Destination = replaceDirectory x path }
        | Node (x, xs) ->
            let newNPath = Path.Combine (path, x)
            Tree.node newNPath (List.map (imp newNPath) xs)
    imp ""

In the F# code base, the catamorphism is called Tree.cata, but otherwise looks like the Haskell foldTree function. The refactoring is also similar:

// Tree<string, FileInfo> -> Tree<string, Move>
let calculateMoves t =
    // FileInfo -> string -> FileInfo
    let replaceDirectory (f : FileInfo) d = FileInfo (Path.Combine (d, f.Name))
    // FileInfo -> string -> Tree<'a, Move>
    let fLeaf x path = Leaf { Source = x; Destination = replaceDirectory x path }
    // string -> (string -> Tree<string, 'a>) list -> string -> Tree<string, 'a>
    let fNode x fs path =
        let newNPath = Path.Combine (path, x)
        Tree.node newNPath (List.map (fun f -> f newNPath) fs)
    Tree.cata fNode fLeaf t ""

Again, the expression Tree.cata fNode fLeaf t has the type string -> Tree<string, Move>, so applying it to "" produces a Tree<string, Move> return value.

Conclusion #

I don't recall where I read the following, but I was under the impression that a data structure's catamorphism was its 'universal API', upon which you could implement any other functionality. I'd love it if it was true, but after my 2019 failure to implement calculateMoves via the Tree catamorphism, I wasn't sure if such a conjecture would hold.

I still don't know if that assertion holds universally, but at least one reason to doubt it has now been removed.

Comments

Tyson Williams #

Excellent work Mark! I too had not forgotten about this, and it nagged me as well.

To some extent, I feel like your explanation of how to implement calculateMoves via Tree.cata is top-down. By top-down, I mean it might depend on discovering the key idea of having Tree.cata return a function and then figuring out the correct type for that function. A good thing about such top-down approaches is being immediately aware that a better solution likely exists even if it takes some time and effort to find it.

I was curious if a bottom-up approach would work. By bottom-up, I mean applying small refacorings to the code that are motivated by the principles, conventions, or style of functional programming. I do think I found such an approach. Of course it is a bit contradictory of me to only be able to find this approach after I read your presentation of the top-down approach. However, I am thinking of it like a kata. I now know such a bottom-up approach should be possible, and I want to find it.

My bottom-up approach is in this branch. Here is a brief summary of how I want myself to think of making those commits in that order.

Each case of the discriminated union could be extracted to its own function. This is easy to do in the Leaf case (so do it now), but it is not as easy to do in the Node case because of recursion, so delay that change for a bit. If we did extract both functions though, both functions would include the argument that I called pathToParent. Since it is passed in everywhere, it should be passed in nowhere (by eta reducing). To do that, we need it to be the last parameter to imp. After switching this order, we now deal with the recursion by doing it as soon as possible. Then the remaining code in that case can be extracted, and imp is essentially Tree.cata.

In this approach, I never thought about the possibility of Tree.cata returning a function. It just sort of fell out as a consequence of my other changes.

2021-04-12 17:49 UTC

Gonzalo Waszczuk #

Very nice!

In Haskell there is a library called recursion-schemes that showcases these types of recursion with catamorphisms, but also with many others recursion schemes. You can check it out and see if it gives you any new ideas.

Regarding this use of catamorphism, the library itself I believe shows a very similar example here, using the Reader type (which is isomorphic to the function you used in your example):

>>> :{
let pprint2 :: Tree Int -> String
    pprint2 = flip runReader 0 . cataA go
      where
	go :: TreeF Int (Reader Int String)
	   -> Reader Int String
	go (NodeF i rss) = do
	  -- rss :: [Reader Int String]
	  -- ss  :: [String]
	  ss <- local (+ 2) $ sequence rss
	  indent <- ask
	  let s = replicate indent ' ' ++ "* " ++ show i
	  pure $ intercalate "\n" (s : ss)
:}

>>> putStrLn $ pprint2 myTree
* 0
  * 1
  * 2
  * 3
    * 31
      * 311
	* 3111
	* 3112

2021-04-14 02:27 UTC

Mark Seemann #

Gonzalo, thank you for reminding me of the recursion-schemes library. It's one of those tomes of knowledge of which I'm aware, but never really have gotten around to look at...

2021-04-16 6:29 UTC

Published: Monday, 12 April 2021 11:09:00 UTC

Threading context through a catamorphism by Mark Seemann