Threading context through a catamorphism by Mark Seemann
A problem solved after 1½ years.
You've probably noticed that it's easier to learn something new if it looks or sounds like something you already know. As a native Dane, I've found it easier to learn English and German than Russian and Japanese. If you originally were a Java or C# developer, you probably find JavaScript more approachable than Clojure or APL.
I believe that this extends to design patterns and universal abstractions as well. If code new to you follows well-known abstractions, it may be easier to learn than if it's structured in an entirely ad-hoc manner. This is my motivation for learning such universal abstractions as monoids, functors, and catamorphisms.
I particularly enjoy when it's possible to apply such abstractions to a proper problem. This occasionally happens. One example is my small article series on a functional file system.
A fly in the ointment #
In those articles, I described how you could base most of the code on the rose tree catamorphism. There was just one snag. There was one function, calculateMoves
, that I was unable to implement with the catamorphism. In the article, I acknowledged my failure:
"Earlier, I wrote that you can implement desiredThis was true for both the Haskell proof of concept as well as the F# port.Tree
functionality with thefoldTree
function, but that was a simplification. If you can implement the functionality ofcalculateMoves
withfoldTree
, I don't know how."
Tyson Williams and I discussed this wart without getting closer to a solution.
As the idiom goes, perfect is the enemy of good, so I decided to move on, although it nagged me.
Problem, condensed #
The problem with the calculateMoves
function was that it needed to thread a 'context' recursively through the entire data structure. In this case, the context was a file path.
When calculateMoves
runs over the input tree, it needs to thread a relative path
through the function, building it up as it traverses the data structure.
For example, if a leaf node named 1
is in a directory named b
, which itself is a subdirectory of a
, the relative path should be a/b/1
. This example is straight from the test cases shown in both articles. You can also find the tests in the GitHub repository.
Each time calculateMoves
visits a Node
or Leaf
it needs to know the parent path
to calculate the destination path. As the articles show, this isn't too hard to do with regular pattern matching and recursion.
I couldn't figure out, however, how to thread the path
through the function when I tried to implement it with the catamorphism.
Breakthrough #
While I'm ready to walk away from problems when I'm stuck, I tend to remember them. Sometimes, I run into a solution much later.
This happened to me yesterday. I was trying to answer a Stack Overflow question which was explicitly about the application of universal abstractions. Once more, I was stuck by being unable to thread a 'context' through a catamorphism. This time, instead of a path
, the context was an indentation depth. Basically, the question was how to render a tree with proper indentation.
Again, this isn't hard if you resort to explicit pattern matching and recursion, but I couldn't figure out how to do it via the data structure's catamorphism.
Fortunately, the user danidiaz posted an awesome answer while I was struggling. The answer uses a trick that I hadn't noticed before: It threads the indentation depth through the data structure by using the catamorphism to produce a structure map with a function as the carrier type. Specifically, danidiaz defines the algebra Todo' (Int -> String) -> Int -> String
to reduce a Todo' (Int -> String)
to an Int -> String
function. This function then gets initialised with the depth 0
.
While I've been doing functional programming for years, I sometimes still tend to forget that functions are first-class values...
This trick, though, seems to be universally applicable. If you need to thread a context through a catamorphism, define the algebra to work on functions that take the context as an argument.
If this is a universally applicable trick, it also ought to work with the calculateMoves
function.
Haskell re-implementation #
In my Haskell proof of concept, the calculateMoves
function originally looked like this:
calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move calculateMoves = imp "" where imp path (Leaf x) = Leaf $ Move x $ replaceDirectory x path imp path (Node x xs) = Node (path </> x) $ imp (path </> x) <$> xs
It uses an imp
(for implementation) function to explicitly recurse over a Tree FilePath FilePath
. Until yesterday, I couldn't come up with a better solution to thread the path
through the data structure.
The new trick suggests that it'd be possible to implement the function on foldTree
(the catamorphism) by using a function as the carrier type. Since the context to be threaded through the catamorphism is a String
(the path
), the catamorphism should produce a function that takes a String
as argument. In other words, the carrier type of the Tree
should be String -> Tree FilePath Move
.
Let's expand on this: The type of foldTree
is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c
. Usually, I tend to think of the type parameter c
as the type of some value, but since it's unconstrained, it can also be a function. That's what we need here: c
should be String -> Tree FilePath Move
.
That's not too hard to do, because of currying. Just write functions that take an extra String
argument and pass them to foldTree
:
calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move calculateMoves t = foldTree fNode fLeaf t "" where fLeaf :: FilePath -> String -> Tree FilePath Move fLeaf x path = Leaf $ Move x $ replaceDirectory x path fNode :: FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs
Here I've used type annotations for the local functions, but that's entirely optional:
calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move calculateMoves t = foldTree fNode fLeaf t "" where fLeaf x path = Leaf $ Move x $ replaceDirectory x path fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs
I included the type annotations to make it a little clearer what's going on. Recall that the type of foldTree
is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c
. First consider the second of the two function arguments, the one I call fLeaf
in the above code. It's the simplest of the two, so it makes sense to start with that one.
The generic type of fLeaf
is b -> c
. How does that map to the type of fLeaf
, which is FilePath -> String -> Tree FilePath Move
?
Well, the Tree
that the catamorphism runs on is a Tree FilePath FilePath
. Mapped to the parametrically polymorphic type of foldTree
that's Tree a b
. In other words, b
maps to FilePath
. Thus, in order to fit the type of b -> c
, the type corresponding to b
in fLeaf
must be FilePath
. What's left? String -> Tree FilePath Move
is what's left. The function takes a FilePath
as input and returns a String -> Tree FilePath Move
. In other words, c ~ String -> Tree FilePath Move
.
How does that fit with fNode
?
Generically, this function must have the type a -> [c] -> c
. We've already established that c
must be String -> Tree FilePath Move
. Since the catamorphism runs on a Tree FilePath FilePath
, we also know that a
must be FilePath
. Thus, plugging in all the types, fNode
must have the type FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move
. Note, particularly, that the second argument is a list of functions. That's why I decided to name the parameter fs
, for functions.
The entire expression foldTree fNode fLeaf t
, then, has the type String -> Tree FilePath Move
, since c
is String -> Tree FilePath Move
and the return type of foldTree
is c
.
The final trick is to apply this function to the initial relative path ""
, which returns a Tree FilePath Move
.
This compiles and passes all tests. calculateMoves
is now implemented using the Tree
catamorphism, a goal that eluded me for more than one and a half years.
F# re-implementation #
With the Haskell proof of concept in place, it's fairly trivial to port the new implementation to the F# code base.
The calculateMoves
function originally looked like this:
// Tree<string,FileInfo> -> Tree<string,Move> let calculateMoves = let replaceDirectory (f : FileInfo) d = FileInfo (Path.Combine (d, f.Name)) let rec imp path = function | Leaf x -> Leaf { Source = x; Destination = replaceDirectory x path } | Node (x, xs) -> let newNPath = Path.Combine (path, x) Tree.node newNPath (List.map (imp newNPath) xs) imp ""
In the F# code base, the catamorphism is called Tree.cata
, but otherwise looks like the Haskell foldTree
function. The refactoring is also similar:
// Tree<string, FileInfo> -> Tree<string, Move> let calculateMoves t = // FileInfo -> string -> FileInfo let replaceDirectory (f : FileInfo) d = FileInfo (Path.Combine (d, f.Name)) // FileInfo -> string -> Tree<'a, Move> let fLeaf x path = Leaf { Source = x; Destination = replaceDirectory x path } // string -> (string -> Tree<string, 'a>) list -> string -> Tree<string, 'a> let fNode x fs path = let newNPath = Path.Combine (path, x) Tree.node newNPath (List.map (fun f -> f newNPath) fs) Tree.cata fNode fLeaf t ""
Again, the expression Tree.cata fNode fLeaf t
has the type string -> Tree<string, Move>
, so applying it to ""
produces a Tree<string, Move>
return value.
Conclusion #
I don't recall where I read the following, but I was under the impression that a data structure's catamorphism was its 'universal API', upon which you could implement any other functionality. I'd love it if it was true, but after my 2019 failure to implement calculateMoves
via the Tree
catamorphism, I wasn't sure if such a conjecture would hold.
I still don't know if that assertion holds universally, but at least one reason to doubt it has now been removed.
Comments
Excellent work Mark! I too had not forgotten about this, and it nagged me as well.
To some extent, I feel like your explanation of how to implement
calculateMoves
viaTree.cata
is top-down. By top-down, I mean it might depend on discovering the key idea of havingTree.cata
return a function and then figuring out the correct type for that function. A good thing about such top-down approaches is being immediately aware that a better solution likely exists even if it takes some time and effort to find it.I was curious if a bottom-up approach would work. By bottom-up, I mean applying small refacorings to the code that are motivated by the principles, conventions, or style of functional programming. I do think I found such an approach. Of course it is a bit contradictory of me to only be able to find this approach after I read your presentation of the top-down approach. However, I am thinking of it like a kata. I now know such a bottom-up approach should be possible, and I want to find it.
My bottom-up approach is in this branch. Here is a brief summary of how I want myself to think of making those commits in that order.
Each case of the discriminated union could be extracted to its own function. This is easy to do in the
Leaf
case (so do it now), but it is not as easy to do in theNode
case because of recursion, so delay that change for a bit. If we did extract both functions though, both functions would include the argument that I calledpathToParent
. Since it is passed in everywhere, it should be passed in nowhere (by eta reducing). To do that, we need it to be the last parameter toimp
. After switching this order, we now deal with the recursion by doing it as soon as possible. Then the remaining code in that case can be extracted, andimp
is essentiallyTree.cata
.In this approach, I never thought about the possibility of
Tree.cata
returning a function. It just sort of fell out as a consequence of my other changes.Very nice!
In Haskell there is a library called recursion-schemes that showcases these types of recursion with catamorphisms, but also with many others recursion schemes. You can check it out and see if it gives you any new ideas.
Regarding this use of catamorphism, the library itself I believe shows a very similar example here, using the Reader type (which is isomorphic to the function you used in your example):
Gonzalo, thank you for reminding me of the recursion-schemes library. It's one of those tomes of knowledge of which I'm aware, but never really have gotten around to look at...