# Strings, lists, and sequences as a monoid by Mark Seemann

*Strings, lists, and sequences are essentially the same monoid. An introduction for object-oriented programmers.*

This article is part of a series about monoids. In short, a *monoid* is an associative binary operation with a neutral element (also known as *identity*).

**Sequences**

C# models a lazily evaluated sequence of values as `IEnumerable<T>`

. You can combine two sequences by appending one to the other:

xs.Concat(ys);

Here, `xs`

and `ys`

are instances of `IEnumerable<T>`

. The Concat extension method concatenates two sequences together. It has the signature `IEnumerable<T> Concat<T>(IEnumerable<T>, IEnumerable<T>)`

, so it's a binary operation. If it's also associative and has identity, then it's a monoid.

Sequences are associative, because the order of evaluation doesn't change the outcome. Associativity is a *property* of a monoid, so one way to demonstrate this is with property-based testing:

[Property(QuietOnSuccess = true)] public void ConcatIsAssociative(int[] xs, int[] ys, int[] zs) { Assert.Equal( xs.Concat(ys).Concat(zs), xs.Concat(ys.Concat(zs))); }

This automated test uses FsCheck (yes, it also works from C#!) to demonstrate that `Concat`

is associative. For simplicity's sake, the test declares `xs`

, `ys`

, and `zs`

as *arrays*. This is because FsCheck natively knows how to create arrays, whereas it doesn't have built-in support for `IEnumerable<T>`

. While you can use FsCheck's API to define how `IEnumerable<T>`

objects should be created, I didn't want to add this extra complexity to the example. The associativity property holds for other pure implementations of `IEnumerable<T>`

as well. Try it, if you need to convince yourself.

The `Concat`

operation also has identity. The identity element is the empty sequence, as this FsCheck-based test demonstrates:

[Property(QuietOnSuccess = true)] public void ConcatHasIdentity(int[] xs) { Assert.Equal( Enumerable.Empty<int>().Concat(xs), xs.Concat(Enumerable.Empty<int>())); Assert.Equal( xs, xs.Concat(Enumerable.Empty<int>())); }

Appending an empty sequence before or after another sequence doesn't change the other sequence.

Since `Concat`

is an associative binary operation with identity, it's a monoid.

**Linked lists and other collections**

The above FsCheck-based tests demonstrate that `Concat`

is a monoid for arrays. The properties hold for all pure implementations of `IEnumerable<T>`

.

In Haskell, lazily evaluated sequences are modelled as linked lists. These are lazy because all Haskell expressions are lazily evaluated by default. The monoid laws hold for Haskell lists as well:

λ> ([1,2,3] ++ [4,5,6]) ++ [7,8,9] [1,2,3,4,5,6,7,8,9] λ> [1,2,3] ++ ([4,5,6] ++ [7,8,9]) [1,2,3,4,5,6,7,8,9] λ> [] ++ [1,2,3] [1,2,3] λ> [1,2,3] ++ [] [1,2,3]

In Haskell, `++`

is the operator that corresponds to `Concat`

in C#, but the operation is normally called *append* instead of *concat*.

In F#, linked lists are eagerly evaluated, because all F# expressions are eagerly evaluated by default. Lists are still monoids, though, because the monoid laws still hold:

> ([1; 2; 3] @ [4; 5; 6]) @ [7; 8; 9];; val it : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9] > [1; 2; 3] @ ([4; 5; 6] @ [7; 8; 9]);; val it : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9] > [] @ [1; 2; 3];; val it : int list = [1; 2; 3] > [1; 2; 3] @ [];; val it : int list = [1; 2; 3]

In F#, the list concatenation operator is `@`

, instead of `++`

, but the behaviour is the same.

**Strings**

Have you ever wondered why text values are called *strings* in most programming languages? After all, for most people, a string is a long flexible structure made from fibres. What does that have to do with text?

In programming, text is often arranged in memory as a consecutive block of characters, one after the other. Thus, you could think of text as characters like pearls on a string. A program often reads such a consecutive block of memory until it reaches a terminator of some kind. Thus, strings of characters have an order to them. They are similar to sequences and lists.

In fact, in Haskell, the type `String`

is nothing but a synonym for `[Char]`

(meaning: a list of `Char`

values). Thus, anything you can do with lists of other values, you can do with `String`

values:

λ> "foo" ++ [] "foo" λ> [] ++ "foo" "foo" λ> ("foo" ++ "bar") ++ "baz" "foobarbaz" λ> "foo" ++ ("bar" ++ "baz") "foobarbaz"

Clearly, `++`

over `String`

is a monoid in Haskell.

Likewise, in .NET, `System.String`

implements `IEnumerable<char>`

, so you'd expect it to be a monoid here as well - and it *almost* is. It's certainly associative:

[Property(QuietOnSuccess = true)] public void PlusIsAssociative(string x, string y, string z) { Assert.Equal( (x + y) + z, x + (y + z)); }

In C#, the `+`

operator is actually defined for `string`

, and as the FsCheck test demonstrates, it's associative. It almost also has identity. What's the equivalent of an empty list for strings? The empty string:

[Property(QuietOnSuccess = true)] public void PlusHasIdentity(NonNull<string> x) { Assert.Equal("" + x.Get, x.Get + ""); Assert.Equal(x.Get, x.Get + ""); }

Here, I had to tell FsCheck to avoid `null`

values, because, as usual, `null`

throws a big wrench into our attempts at being able to reason about the code.

The problem here is that `"" + null`

and `null + ""`

both return `""`

, which is not equal to the input value (`null`

). In other words, `""`

is not a true identity element for `+`

, because of this single special case. (And by the way, `null`

isn't the identity element either, because `null + null`

returns... `""`

! Of course it does.) This is, however, an implementation detail. As an exercise, consider writing an (extension) method in C# that makes `string`

a proper monoid, even for `null`

values. If you can do that, you'll have demonstrated that string concatenation is a monoid in .NET, just as it is in Haskell.

**Free monoid**

Recall that in the previous article, you learned how both addition and multiplication of numbers form monoids. There's at least one more monoid for numbers, and that's a sequence. If you have a generic sequence (`IEnumerable<T>`

), it can contain anything, including numbers.

Imagine that you have two numbers, 3 and 4, and you want to combine them, but you haven't yet made up your mind about *how* you want to combine them. In order to postpone the decision, you can put both numbers in a singleton array (that is, an array with a single element, not to be confused with the Singleton design pattern):

var three = new[] { 3 }; var four = new[] { 4 };

Since sequences are monoids, you can combine them:

```
var combination = three.Concat(four);
```

This gives you a new sequence that contains both numbers. At this point, you haven't lost any information, so once you've decided how to combine the numbers, you can *evaluate* the data that you've collected so far. This is called the free monoid.

If you need the sum of the numbers, you can add them together:

```
var sum = combination.Aggregate(0, (x, y) => x + y);
```

(Yes, I'm aware that the Sum method exists, but I want you to see the details.) This Aggregate overloads takes a `seed`

value as the first argument, and a function to combine two values as the second.

Here's how to get the product:

```
var product = combination.Aggregate(1, (x, y) => x * y);
```

Notice how in both cases, the `seed`

value is the identity for the monoidal operation: 0 for addition, and 1 for multiplication. Likewise, the aggregator function uses the binary operation associated with that particular monoid.

I think it's interesting that this is called the free monoid, similar to free monads. In both cases, you collect data without initially interpreting it, and then later you can submit the collected data to one of several evaluators.

**Summary**

Various collection types, like .NET sequences, arrays, or Haskell and F# lists, are monoids over concatenation. In Haskell, strings *are* lists, so string concatenation is a monoid as well. In .NET, the `+`

operator for strings is a monoid if you pretend that `null`

strings don't exist. Still, all of these are essentially variations of the same monoid.

It makes sense that C# uses `+`

for string concatenation, because, as the previous article described, addition is the most intuitive and 'natural' of all monoids. Because you know first-grade arithmetic, you can immediately grasp the concept of addition as a metaphor. A monoid, however, is more than a metaphor; it's an abstraction that describes well-behaved binary operations, where one of those operations just happen to be addition. It's a *generalisation* of the concept. It's an abstraction that you already understand.

**Next: ** Money monoid.