Another functional model of a file system, with code examples in C#.

This article is part of a series about Zippers. In this one, I port the FSZipper data structure from the Learn You a Haskell for Great Good! article Zippers.

A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better that it's done there. Additionally, I'll make heavy use of certain standard constructs to port Haskell code, most notably Church encoding to model sum types in languages that don't natively have them. Such as C#. In some cases, I'll implement the Church encoding using the data structure's catamorphism. Since the cyclomatic complexity of the resulting code is quite low, you may be able to follow what's going on even if you don't know what Church encoding or catamorphisms are, but if you want to understand the background and motivation for that style of programming, you can consult the cited resources.

The code shown in this article is available on GitHub.

File system item initialization and structure #

If you haven't already noticed, Haskell (and other statically typed functional programming languages like F#) makes heavy use of sum types, and the FSZipper example is no exception. It starts with a one-liner to define a file system item, which may be either a file or a folder. In C# we must instead use a class:

public sealed class FSItem

Contrary to the two previous examples, the FSItem class has no generic type parameter. This is because I'm following the Haskell example code as closely as possible, but as I've previously shown, you can model a file hierarchy with a general-purpose rose tree.

Staying consistent with the two previous articles, I'll use Church encoding to model a sum type, and as discussed in the previous article I use a private implementation for that.

private readonly IFSItem imp;
 
private FSItem(IFSItem imp)
{
    this.imp = imp;
}
 
public static FSItem CreateFile(string namestring data)
{
    return new(new File(namedata));
}
 
public static FSItem CreateFolder(string nameIReadOnlyCollection<FSItemitems)
{
    return new(new Folder(nameitems));
}

Two static creation methods enable client developers to create a single FSItem object, or an entire tree, like the example from the Haskell code, here ported to C#:

private static readonly FSItem myDisk =
    FSItem.CreateFolder("root",
    [
        FSItem.CreateFile("goat_yelling_like_man.wmv""baaaaaa"),
        FSItem.CreateFile("pope_time.avi""god bless"),
        FSItem.CreateFolder("pics",
        [
            FSItem.CreateFile("ape_throwing_up.jpg""bleargh"),
            FSItem.CreateFile("watermelon_smash.gif""smash!!"),
            FSItem.CreateFile("skull_man(scary).bmp""Yikes!")
        ]),
        FSItem.CreateFile("dijon_poupon.doc""best mustard"),
        FSItem.CreateFolder("programs",
        [
            FSItem.CreateFile("fartwizard.exe""10gotofart"),
            FSItem.CreateFile("owl_bandit.dmg""mov eax, h00t"),
            FSItem.CreateFile("not_a_virus.exe""really not a virus"),
            FSItem.CreateFolder("source code",
            [
                FSItem.CreateFile("best_hs_prog.hs""main = print (fix error)"),
                FSItem.CreateFile("random.hs""main = print 4")
            ])
        ])
    ]);

Since the imp class field is just a private implementation detail, a client developer needs a way to query an FSItem object about its contents.

File system item catamorphism #

Just like the previous article, I'll start with the catamorphism. This is essentially the rose tree catamorphism, just less generic, since FSItem doesn't have a generic type parameter.

public TResult Aggregate<TResult>(
    Func<stringstringTResultwhenFile,
    Func<stringIReadOnlyCollection<TResult>, TResultwhenFolder)
{
    return imp.Aggregate(whenFilewhenFolder);
}

The Aggregate method delegates to its internal implementation class field, which is defined as the private nested interface IFSItem:

private interface IFSItem
{
    TResult Aggregate<TResult>(
        Func<stringstringTResultwhenFile,
        Func<stringIReadOnlyCollection<TResult>, TResultwhenFolder);
}

As discussed in the previous article, the interface is hidden away because it's only a vehicle for polymorphism. It's not intended for client developers to be used (although that would be benign) or implemented (which could break encapsulation). There are only, and should ever only be, two implementations. The one that represents a file is the simplest:

private sealed record File(string Namestring Data) : IFSItem
{
    public TResult Aggregate<TResult>(
        Func<stringstringTResultwhenFile,
        Func<stringIReadOnlyCollection<TResult>, TResultwhenFolder)
    {
        return whenFile(Name, Data);
    }
}

The File record's Aggregate method unconditionally calls the supplied whenFile function argument with the Name and Data that was originally supplied via its constructor.

The Folder implementation is a bit trickier, mostly due to its recursive nature, but also because I wanted it to have structural equality.

private sealed class Folder : IFSItem
{
    private readonly string name;
    private readonly IReadOnlyCollection<FSItem> items;
 
    public Folder(string NameIReadOnlyCollection<FSItemItems)
    {
        name = Name;
        items = Items;
    }
 
    public TResult Aggregate<TResult>(
        Func<stringstringTResultwhenFile,
        Func<stringIReadOnlyCollection<TResult>, TResultwhenFolder)
    {
        return whenFolder(
            name,
            items.Select(i => i.Aggregate(whenFilewhenFolder)).ToList());
    }
 
    public override bool Equals(objectobj)
    {
        return obj is Folder folder &&
               name == folder.name &&
               items.SequenceEqual(folder.items);
    }
 
    public override int GetHashCode()
    {
        return HashCode.Combine(name, items);
    }
}

It, too, unconditionally calls one of the two functions passed to its Aggregate method, but this time whenFolder. It does that, however, by first recursively calling Aggregate within a Select expression. It needs to do that because the whenFolder function expects the subtree to have been already converted to values of the TResult return type. This is a common pattern with catamorphisms, and takes a bit of time getting used to. You can see similar examples in the articles Tree catamorphism, Rose tree catamorphism, Full binary tree catamorphism, as well as the previous one in this series.

I also had to make Folder a class rather than a record, because I wanted the type to have structural equality, and you can't override Equals on records (and if the base class library has any collection type with structural equality, I'm not aware of it).

File system item Church encoding #

True to the structure of the previous article, the catamorphism doesn't look quite like a Church encoding, but it's possible to define the latter from the former.

public TResult Match<TResult>(
    Func<stringstringTResultwhenFile,
    Func<stringIReadOnlyCollection<FSItem>, TResultwhenFolder)
{
    return Aggregate(
        whenFile: (namedata) =>
            (item: CreateFile(namedata), result: whenFile(namedata)),
        whenFolder: (namepairs) =>
        {
            var items = pairs.Select(i => i.item).ToList();
            return (CreateFolder(nameitems), whenFolder(nameitems));
        }).result;
}

The trick is the same as in the previous article: Build up an intermediate tuple that contains both the current item as well as the result being accumulated. Once the Aggregate method returns, the Match method returns only the result part of the resulting tuple.

I implemented the whenFolder expression as a code block, because both tuple elements needed the items collection. You can inline the Select expression, but that would cause it to run twice. That's probably a premature optimization, but it also made the code a bit shorter, and, one may hope, a bit more readable.

Fily system breadcrumb #

Finally, things seem to be becoming a little easier. The port of FSCrumb is straightforward.

public sealed class FSCrumb
{
    public FSCrumb(
        string name,
        IReadOnlyCollection<FSItemleft,
        IReadOnlyCollection<FSItemright)
    {
        Name = name;
        Left = left;
        Right = right;
    }
 
    public string Name { get; }
    public IReadOnlyCollection<FSItem> Left { get; }
    public IReadOnlyCollection<FSItem> Right { get; }
 
    public override bool Equals(objectobj)
    {
        return obj is FSCrumb crumb &&
               Name == crumb.Name &&
               Left.SequenceEqual(crumb.Left) &&
               Right.SequenceEqual(crumb.Right);
    }
 
    public override int GetHashCode()
    {
        return HashCode.Combine(Name, Left, Right);
    }
}

The only reason this isn't a record is, once again, that I want to override Equals so that the type can have structural equality. Visual Studio wants me to convert to a primary constructor. That would simplify the code a bit, but actually not that much.

(I'm still somewhat conservative in my choice of new C# language features. Not that I have anything against primary constructors which, after all, F# has had forever. The reason I'm holding back is for didactic reasons. Not every reader is on the latest language version, and some readers may be using another programming language entirely. On the other hand, primary constructors seem natural and intuitive, so I may start using them here on the blog as well. I don't think that they're going to be much of a barrier to understanding.)

Now that we have both the data type we want to zip, as well as the breadcrumb type we need, we can proceed to add the Zipper.

File system Zipper #

The FSZipper C# class fills the position of the eponymous Haskell type alias. Data structure and initialization is straightforward.

public sealed class FSZipper
{
    private FSZipper(FSItem fSItemIReadOnlyCollection<FSCrumbbreadcrumbs)
    {
        FSItem = fSItem;
        Breadcrumbs = breadcrumbs;
    }
 
    public FSZipper(FSItem fSItem) : this(fSItem, [])
    {
    }
 
    public FSItem FSItem { get; }
    public IReadOnlyCollection<FSCrumb> Breadcrumbs { get; }
 
    // Methods follow here...

True to the style I've already established, I've made the master constructor private in order to highlight that the Breadcrumbs are the responsibility of the FSZipper class itself. It's not something client code need worry about.

Going down #

The Haskell Zippers article introduces fsUp before fsTo, but if we want to see some example code, we need to navigate to somewhere before we can navigate up. Thus, I'll instead start with the function that navigates to a child node.

public FSZipperGoTo(string name)
{
    return FSItem.Match(
        (__) => null,
        (folderNameitems) =>
        {
            FSItemitem = null;
            var ls = new List<FSItem>();
            var rs = new List<FSItem>();
            foreach (var i in items)
            {
                if (item is null && i.IsNamed(name))
                    item = i;
                else if (item is null)
                    ls.Add(i);
                else
                    rs.Add(i);
            }
 
            if (item is null)
                return null;
 
            return new FSZipper(
                item,
                Breadcrumbs.Prepend(new FSCrumb(folderNamelsrs)).ToList());
        });
}

This is by far the most complicated navigation we've seen so far, and I've even taken the liberty of writing an imperative implementation. It's not that I don't know how I could implement it in a purely functional fashion, but I've chosen this implementation for a couple of reasons. The first of which is that, frankly, it was easier this way.

This stems from the second reason: That the .NET base class library, as far as I know, offers no functionality like Haskell's break function. I could have written such a function myself, but felt that it was too much of a digression, even for me. Maybe I'll do that another day. It might make for a nice little exercise.

The third reason is that C# doesn't afford pattern matching on sequences, in the shape of destructuring the head and the tail of a list. (Not that I know of, anyway, but that language changes rapidly at the moment, and it does have some pattern-matching features now.) This means that I have to check item for null anyway.

In any case, while the implementation is imperative, an external caller can't tell. The GoTo method is still referentially transparent. Which means that it fits in your head.

You may have noticed that the implementation calls IsNamed, which is also new.

public bool IsNamed(string name)
{
    return Match((n_) => n == name, (n_) => n == name);
}

This is an instance method I added to FSItem.

In summary, the GoTo method enables client code to navigate down in the file hierarchy, as this unit test demonstrates:

[Fact]
public void GoToSkullMan()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.GoTo("skull_man(scary).bmp");
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFile("skull_man(scary).bmp""Yikes!"),
        actual.FSItem);
}

The example is elementary. First go to the pics folder, and from there to the skull_man(scary).bmp.

Going up #

Going back up the hierarchy isn't as complicated.

public FSZipperGoUp()
{
    if (Breadcrumbs.Count == 0)
        return null;
 
    var head = Breadcrumbs.First();
    var tail = Breadcrumbs.Skip(1);
 
    return new FSZipper(
        FSItem.CreateFolder(head.Name, [.. head.Left, FSItem, .. head.Right]),
        tail.ToList());
}

If the Breadcrumbs collection is empty, we're already at the root, in which case we can't go further up. In that case, the GoUp method returns null, as does the GoTo method if it can't find an item with the desired name. This possibility is explicitly indicated by the FSZipper? return type; notice the question mark, which indicates that the value may be null. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the Maybe monad (which is also what you'd idiomatically do in Haskell).

If Breadcrumbs is not empty, it means that there's a place to go up to. It also implies that the previous operation navigated down, and the only way that's possible is if the previous node was a folder. Thus, the GoUp method knows that it needs to reconstitute a folder, and from the head breadcrumb, it knows that folder's name, and what was originally to the Left and Right of the Zipper's FSItem property.

This unit test demonstrates how client code may use the GoUp method:

[Fact]
public void GoUpFromSkullMan()
{
    var sut = new FSZipper(myDisk);
    // This is the same as the GoToSkullMan test
    var newFocus = sut.GoTo("pics")?.GoTo("skull_man(scary).bmp");
 
    var actual = newFocus?.GoUp()?.GoTo("watermelon_smash.gif");
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFile("watermelon_smash.gif""smash!!"),
        actual.FSItem);
}

This test first repeats the navigation also performed by the other test, then uses GoUp to go one level up, which finally enables it to navigate to the watermelon_smash.gif file.

Renaming a file or folder #

A Zipper enables you to navigate a data structure, but you can also use it to modify the element in focus. One option is to rename a file or folder.

public FSZipper Rename(string newName)
{
    return new FSZipper(
        FSItem.Match(
            (_dat) => FSItem.CreateFile(newNamedat),
            (_items) => FSItem.CreateFolder(newNameitems)),
        Breadcrumbs);
}

The Rename method 'pattern-matches' on the 'current' FSItem and in both cases creates a new file or folder with the new name. Since it doesn't need the old name for anything, it uses the wildcard pattern to ignore that value. This operation is always possible, so the return type is FSZipper, without a question mark, indicating that the method never returns null.

The following unit test replicates the Haskell article's example by renaming the pics folder to cspi.

[Fact]
public void RenamePics()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.Rename("cspi").GoUp();
 
    Assert.NotNull(actual);
    Assert.Empty(actual.Breadcrumbs);
    Assert.Equal(
        FSItem.CreateFolder("root",
        [
            FSItem.CreateFile("goat_yelling_like_man.wmv""baaaaaa"),
            FSItem.CreateFile("pope_time.avi""god bless"),
            FSItem.CreateFolder("cspi",
            [
                FSItem.CreateFile("ape_throwing_up.jpg""bleargh"),
                FSItem.CreateFile("watermelon_smash.gif""smash!!"),
                FSItem.CreateFile("skull_man(scary).bmp""Yikes!")
            ]),
            FSItem.CreateFile("dijon_poupon.doc""best mustard"),
            FSItem.CreateFolder("programs",
            [
                FSItem.CreateFile("fartwizard.exe""10gotofart"),
                FSItem.CreateFile("owl_bandit.dmg""mov eax, h00t"),
                FSItem.CreateFile("not_a_virus.exe""really not a virus"),
                FSItem.CreateFolder("source code",
                [
                    FSItem.CreateFile("best_hs_prog.hs""main = print (fix error)"),
                    FSItem.CreateFile("random.hs""main = print 4")
                ])
            ])
        ]),
        actual.FSItem);
}

Since the test uses GoUp after Rename, the actual value contains the entire tree, while the Breadcrumbs collection is empty.

Adding a new file #

Finally, we can add a new file to a folder.

public FSZipperAdd(FSItem item)
{
    return FSItem.Match<FSZipper?>(
        whenFile: (__) => null,
        whenFolder: (nameitems) => new FSZipper(
            FSItem.CreateFolder(nameitems.Prepend(item).ToList()),
            Breadcrumbs));
}

This operation may fail, since we can't add a file to a file. This is, again, clearly indicated by the return type, which allows null.

This implementation adds the file to the start of the folder, but it would also be possible to add it at the end. I would consider that slightly more idiomatic in C#, but here I've followed the Haskell example code, which conses the new item to the beginning of the list. As is idiomatic in Haskell.

The following unit test reproduces the Haskell article's example.

[Fact]
public void AddPic()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.Add(FSItem.CreateFile("heh.jpg""lol"))?.GoUp();
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFolder("root",
        [
            FSItem.CreateFile("goat_yelling_like_man.wmv""baaaaaa"),
            FSItem.CreateFile("pope_time.avi""god bless"),
            FSItem.CreateFolder("pics",
            [
                FSItem.CreateFile("heh.jpg""lol"),
                FSItem.CreateFile("ape_throwing_up.jpg""bleargh"),
                FSItem.CreateFile("watermelon_smash.gif""smash!!"),
                FSItem.CreateFile("skull_man(scary).bmp""Yikes!")
            ]),
            FSItem.CreateFile("dijon_poupon.doc""best mustard"),
            FSItem.CreateFolder("programs",
            [
                FSItem.CreateFile("fartwizard.exe""10gotofart"),
                FSItem.CreateFile("owl_bandit.dmg""mov eax, h00t"),
                FSItem.CreateFile("not_a_virus.exe""really not a virus"),
                FSItem.CreateFolder("source code",
                [
                    FSItem.CreateFile("best_hs_prog.hs""main = print (fix error)"),
                    FSItem.CreateFile("random.hs""main = print 4")
                ])
            ])
        ]),
        actual.FSItem);
    Assert.Empty(actual.Breadcrumbs);
}

This example also follows the edit with a GoUp call, with the effect that the Zipper is once more focused on the entire tree. The assertion verifies that the new heh.jpg file is the first file in the pics folder.

Conclusion #

The code for FSZipper is actually a bit simpler than for the binary tree. This, I think, is mostly attributable to the FSZipper having fewer constituent sum types. While sum types are trivial, and extraordinarily useful in languages that natively support them, they require a lot of boilerplate in a language like C#.

Do you need something like FSZipper in C#? Probably not. As I've already discussed, this article series mostly exists as a programming exercise.



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 23 September 2024 06:13:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 23 September 2024 06:13:00 UTC