List catamorphism by Mark Seemann
The catamorphism for a list is the same as its fold.
This article is part of an article series about catamorphisms. A catamorphism is a universal abstraction that describes how to digest a data structure into a potentially more compact value.
This article presents the catamorphism for (linked) lists, and other collections in general. It also shows how to identify it. The beginning of this article presents the catamorphism in C#, with an example. The rest of the article describes how to deduce the catamorphism. This part of the article presents my work in Haskell. Readers not comfortable with Haskell can just read the first part, and consider the rest of the article as an optional appendix.
The C# part of the article will discuss IEnumerable<T>
, while the Haskell part will deal specifically with linked lists. Since C# is a less strict language anyway, we have to make some concessions when we consider how concepts translate. In my experience, the functionality of IEnumerable<T>
closely mirrors that of Haskell lists.
C# catamorphism #
The .NET base class library defines this Aggregate overload:
public static TAccumulate Aggregate<TSource, TAccumulate>( this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func);
This is the catamorphism for linked lists (and, I'll conjecture, for IEnumerable<T>
in general). The introductory article already used it to show several motivating examples, of which I'll only repeat the last:
> new[] { 42, 1337, 2112, 90125, 5040, 7, 1984 } . .Aggregate(Angle.Identity, (a, i) => a.Add(Angle.FromDegrees(i))) [{ Angle = 207° }]
In short, the catamorphism is, similar to the previous catamorphisms covered in this article series, a pair made from an initial value and a function. This has been true for both the Peano catamorphism and the Maybe catamorphism. An initial value is just a value in all three cases, but you may notice that the function in question becomes increasingly elaborate. For IEnumerable<T>
, it's a function that takes two values. One of the values are of the type of the input list, i.e. for IEnumerable<TSource>
it would be TSource
. By elimination you can deduce that this value must come from the input list. The other value is of the type TAccumulate
, which implies that it could be the seed
, or the result from a previous call to func
.
List F-Algebra #
As in the previous article, I'll use Fix
and cata
as explained in Bartosz Milewski's excellent article on F-Algebras. The ListF
type comes from his article as well, although I've renamed the type arguments:
data ListF a c = NilF | ConsF a c deriving (Show, Eq, Read) instance Functor (ListF a) where fmap _ NilF = NilF fmap f (ConsF a c) = ConsF a $ f c
Like I did with MaybeF
, I've named the 'data' type argument a
, and the carrier type c
(for carrier). Once again, notice that the Functor
instance maps over the carrier type c
; not over the 'data' type a
.
As was also the case when deducing the Maybe catamorphism, Haskell isn't too happy about defining instances for a type like Fix (ListF a)
. To address that problem, you can introduce a newtype
wrapper:
newtype ListFix a = ListFix { unListFix :: Fix (ListF a) } deriving (Show, Eq, Read)
You can define Functor
, Applicative
, Monad
, etc. instances for this type without resorting to any funky GHC extensions. Keep in mind that ultimately, the purpose of all this code is just to figure out what the catamorphism looks like. This code isn't intended for actual use.
A few helper functions make it easier to define ListFix
values:
nilF :: ListFix a nilF = ListFix $ Fix NilF consF :: a -> ListFix a -> ListFix a consF x = ListFix . Fix . ConsF x . unListFix
With those functions, you can create ListFix
linked lists:
Prelude Fix List> nilF ListFix {unListFix = Fix NilF} Prelude Fix List> consF 42 $ consF 1337 $ consF 2112 nilF ListFix {unListFix = Fix (ConsF 42 (Fix (ConsF 1337 (Fix (ConsF 2112 (Fix NilF))))))}
The first example creates an empty list, whereas the second creates a list of three integers, corresponding to [42,1337,2112]
.
That's all you need to identify the catamorphism.
Haskell catamorphism #
At this point, you have two out of three elements of an F-Algebra. You have an endofunctor (ListF
), and an object a
, but you still need to find a morphism ListF a c -> c
. Notice that the algebra you have to find is the function that reduces the functor to its carrier type c
, not the 'data type' a
. This takes some time to get used to, but that's how catamorphisms work. This doesn't mean, however, that you get to ignore a
, as you'll see.
As in the previous article, start by writing a function that will become the catamorphism, based on cata
:
listF = cata alg . unListFix where alg NilF = undefined alg (ConsF h t) = undefined
While this compiles, with its undefined
implementations, it obviously doesn't do anything useful. I find, however, that it helps me think. How can you return a value of the type c
from the NilF
case? You could pass an argument to the listF
function:
listF n = cata alg . unListFix where alg NilF = n alg (ConsF h t) = undefined
The ConsF
case, contrary to NilF
, contains both a head h
(of type a
) and a tail t
(of type c
). While you could make the code compile by simply returning t
, it'd be incorrect to ignore h
. In order to deal with both, you'll need a function that turns both h
and t
into a value of the type c
. You can do this by passing a function to listF
and using it:
listF :: (a -> c -> c) -> c -> ListFix a -> c listF f n = cata alg . unListFix where alg NilF = n alg (ConsF h t) = f h t
This works. Since cata
has the type Functor f => (f a -> a) -> Fix f -> a
, that means that alg
has the type f a -> a
. In the case of ListF
, the compiler infers that the alg
function has the type ListF a c -> c
, which is just what you need!
You can now see what the carrier type c
is for. It's the type that the algebra extracts, and thus the type that the catamorphism returns.
This, then, is the catamorphism for lists. As has been consistent so far, it's a pair made from an initial value and a function. Once more, this isn't the only possible catamorphism, since you can, for example, trivially flip the arguments to listF
:
listF :: c -> (a -> c -> c) -> ListFix a -> c listF n f = cata alg . unListFix where alg NilF = n alg (ConsF h t) = f h t
You can also flip the arguments of f
:
listF :: c -> (c -> a -> c) -> ListFix a -> c listF n f = cata alg . unListFix where alg NilF = n alg (ConsF h t) = f t h
These representations are all isomorphic to each other, but notice that the last variation is similar to the above C# Aggregate
overload. The initial n
value is the seed
, and the function f
has the same shape as func
. Thus, I consider it reasonable to conjecture that that Aggregate
overload is the catamorphism for IEnumerable<T>
.
Basis #
You can implement most other useful functionality with listF
. The rest of this article uses the first of the variations shown above, with the type (a -> c -> c) -> c -> ListFix a -> c
. Here's the Semigroup
instance:
instance Semigroup (ListFix a) where xs <> ys = listF consF ys xs
The initial value passed to listF
is ys
, and the function to apply is simply the consF
function, thus 'consing' the two lists together. Here's an example of the operation in action:
Prelude Fix List> consF 42 $ consF 1337 nilF <> (consF 2112 $ consF 1 nilF) ListFix {unListFix = Fix (ConsF 42 (Fix (ConsF 1337 (Fix (ConsF 2112 (Fix (ConsF 1 (Fix NilF))))))))}
With a Semigroup
instance, it's trivial to also implement the Monoid
instance:
instance Monoid (ListFix a) where mempty = nilF
While you could implement mempty
with listF
(mempty = listF (const id) nilF nilF
), that'd be overcomplicated. Just because you can implement all functionality using listF
, it doesn't mean that you should, if a simpler alternative exists.
You can, on the other hand, use listF
for the Functor
instance:
instance Functor ListFix where fmap f = listF (\h l -> consF (f h) l) nilF
You could write the function you pass to listF
in a point-free fashion as consF . f
, but I thought it'd be easier to follow what happens when written as an explicit lambda expression. The function receives a 'current value' h
, as well as the part of the list which has already been translated l
. Use f
to translate h
, and consF
the result unto l
.
You can add Applicative
and Monad
instances in a similar fashion:
instance Applicative ListFix where pure x = consF x nilF fs <*> xs = listF (\f acc -> (f <$> xs) <> acc) nilF fs instance Monad ListFix where xs >>= f = listF (\x acc -> f x <> acc) nilF xs
What may be more interesting, however, is the Foldable
instance:
instance Foldable ListFix where foldr = listF
The demonstrates that listF
and foldr
is the same.
Next, you can also add a Traversable
instance:
instance Traversable ListFix where sequenceA = listF (\x acc -> consF <$> x <*> acc) (pure nilF)
Finally, you can implement conversions to and from the standard list []
type, using ana
as the dual of cata
:
toList :: ListFix a -> [a] toList = listF (:) [] fromList :: [a] -> ListFix a fromList = ListFix . ana coalg where coalg [] = NilF coalg (h:t) = ConsF h t
This demonstrates that ListFix
is isomorphic to []
, which again establishes that listF
and foldr
are equivalent.
Summary #
The catamorphism for lists is a pair made from an initial value and a function. One variation is equal to foldr
. Like Maybe, the catamorphism is the same as the fold.
In C#, this function corresponds to the Aggregate
extension method identified above.
You've now seen two examples where the catamorphism coincides with the fold. You've also seen examples (Boolean catamorphism and Peano catamorphism) where there's a catamorphism, but no fold at all. In a subsequent article, you'll see an example of a container that has both catamorphism and fold, but where the catamorphism is more general than the fold.
Next: NonEmpty catamorphism.