# Strings, lists, and sequences as a monoid by Mark Seemann

*Strings, lists, and sequences are essentially the same monoid. An introduction for object-oriented programmers.*

This article is part of a series about monoids. In short, a *monoid* is an associative binary operation with a neutral element (also known as *identity*).

### Sequences #

C# models a lazily evaluated sequence of values as `IEnumerable<T>`

. You can combine two sequences by appending one to the other:

xs.Concat(ys);

Here, `xs`

and `ys`

are instances of `IEnumerable<T>`

. The Concat extension method concatenates two sequences together. It has the signature `IEnumerable<T> Concat<T>(IEnumerable<T>, IEnumerable<T>)`

, so it's a binary operation. If it's also associative and has identity, then it's a monoid.

Sequences are associative, because the order of evaluation doesn't change the outcome. Associativity is a *property* of a monoid, so one way to demonstrate this is with property-based testing:

[Property(QuietOnSuccess = true)] public void ConcatIsAssociative(int[] xs, int[] ys, int[] zs) { Assert.Equal( xs.Concat(ys).Concat(zs), xs.Concat(ys.Concat(zs))); }

This automated test uses FsCheck (yes, it also works from C#!) to demonstrate that `Concat`

is associative. For simplicity's sake, the test declares `xs`

, `ys`

, and `zs`

as *arrays*. This is because FsCheck natively knows how to create arrays, whereas it doesn't have built-in support for `IEnumerable<T>`

. While you can use FsCheck's API to define how `IEnumerable<T>`

objects should be created, I didn't want to add this extra complexity to the example. The associativity property holds for other pure implementations of `IEnumerable<T>`

as well. Try it, if you need to convince yourself.

The `Concat`

operation also has identity. The identity element is the empty sequence, as this FsCheck-based test demonstrates:

[Property(QuietOnSuccess = true)] public void ConcatHasIdentity(int[] xs) { Assert.Equal( Enumerable.Empty<int>().Concat(xs), xs.Concat(Enumerable.Empty<int>())); Assert.Equal( xs, xs.Concat(Enumerable.Empty<int>())); }

Appending an empty sequence before or after another sequence doesn't change the other sequence.

Since `Concat`

is an associative binary operation with identity, it's a monoid.

### Linked lists and other collections #

The above FsCheck-based tests demonstrate that `Concat`

is a monoid for arrays. The properties hold for all pure implementations of `IEnumerable<T>`

.

In Haskell, lazily evaluated sequences are modelled as linked lists. These are lazy because all Haskell expressions are lazily evaluated by default. The monoid laws hold for Haskell lists as well:

λ> ([1,2,3] ++ [4,5,6]) ++ [7,8,9] [1,2,3,4,5,6,7,8,9] λ> [1,2,3] ++ ([4,5,6] ++ [7,8,9]) [1,2,3,4,5,6,7,8,9] λ> [] ++ [1,2,3] [1,2,3] λ> [1,2,3] ++ [] [1,2,3]

In Haskell, `++`

is the operator that corresponds to `Concat`

in C#, but the operation is normally called *append* instead of *concat*.

In F#, linked lists are eagerly evaluated, because all F# expressions are eagerly evaluated by default. Lists are still monoids, though, because the monoid laws still hold:

> ([1; 2; 3] @ [4; 5; 6]) @ [7; 8; 9];; val it : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9] > [1; 2; 3] @ ([4; 5; 6] @ [7; 8; 9]);; val it : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9] > [] @ [1; 2; 3];; val it : int list = [1; 2; 3] > [1; 2; 3] @ [];; val it : int list = [1; 2; 3]

In F#, the list concatenation operator is `@`

, instead of `++`

, but the behaviour is the same.

### Strings #

Have you ever wondered why text values are called *strings* in most programming languages? After all, for most people, a string is a long flexible structure made from fibres. What does that have to do with text?

In programming, text is often arranged in memory as a consecutive block of characters, one after the other. Thus, you could think of text as characters like pearls on a string. A program often reads such a consecutive block of memory until it reaches a terminator of some kind. Thus, strings of characters have an order to them. They are similar to sequences and lists.

In fact, in Haskell, the type `String`

is nothing but a synonym for `[Char]`

(meaning: a list of `Char`

values). Thus, anything you can do with lists of other values, you can do with `String`

values:

λ> "foo" ++ [] "foo" λ> [] ++ "foo" "foo" λ> ("foo" ++ "bar") ++ "baz" "foobarbaz" λ> "foo" ++ ("bar" ++ "baz") "foobarbaz"

Clearly, `++`

over `String`

is a monoid in Haskell.

Likewise, in .NET, `System.String`

implements `IEnumerable<char>`

, so you'd expect it to be a monoid here as well - and it *almost* is. It's certainly associative:

[Property(QuietOnSuccess = true)] public void PlusIsAssociative(string x, string y, string z) { Assert.Equal( (x + y) + z, x + (y + z)); }

In C#, the `+`

operator is actually defined for `string`

, and as the FsCheck test demonstrates, it's associative. It almost also has identity. What's the equivalent of an empty list for strings? The empty string:

[Property(QuietOnSuccess = true)] public void PlusHasIdentity(NonNull<string> x) { Assert.Equal("" + x.Get, x.Get + ""); Assert.Equal(x.Get, x.Get + ""); }

Here, I had to tell FsCheck to avoid `null`

values, because, as usual, `null`

throws a big wrench into our attempts at being able to reason about the code.

The problem here is that `"" + null`

and `null + ""`

both return `""`

, which is not equal to the input value (`null`

). In other words, `""`

is not a true identity element for `+`

, because of this single special case. (And by the way, `null`

isn't the identity element either, because `null + null`

returns... `""`

! Of course it does.) This is, however, an implementation detail. As an exercise, consider writing an (extension) method in C# that makes `string`

a proper monoid, even for `null`

values. If you can do that, you'll have demonstrated that string concatenation is a monoid in .NET, just as it is in Haskell.

### Free monoid #

Recall that in the previous article, you learned how both addition and multiplication of numbers form monoids. There's at least one more monoid for numbers, and that's a sequence. If you have a generic sequence (`IEnumerable<T>`

), it can contain anything, including numbers.

Imagine that you have two numbers, 3 and 4, and you want to combine them, but you haven't yet made up your mind about *how* you want to combine them. In order to postpone the decision, you can put both numbers in a singleton array (that is, an array with a single element, not to be confused with the Singleton design pattern):

var three = new[] { 3 }; var four = new[] { 4 };

Since sequences are monoids, you can combine them:

```
var combination = three.Concat(four);
```

This gives you a new sequence that contains both numbers. At this point, you haven't lost any information, so once you've decided how to combine the numbers, you can *evaluate* the data that you've collected so far. This is called the free monoid.

If you need the sum of the numbers, you can add them together:

```
var sum = combination.Aggregate(0, (x, y) => x + y);
```

(Yes, I'm aware that the Sum method exists, but I want you to see the details.) This Aggregate overloads takes a `seed`

value as the first argument, and a function to combine two values as the second.

Here's how to get the product:

```
var product = combination.Aggregate(1, (x, y) => x * y);
```

Notice how in both cases, the `seed`

value is the identity for the monoidal operation: 0 for addition, and 1 for multiplication. Likewise, the aggregator function uses the binary operation associated with that particular monoid.

I think it's interesting that this is called the free monoid, similar to free monads. In both cases, you collect data without initially interpreting it, and then later you can submit the collected data to one of several evaluators.

### Summary #

Various collection types, like .NET sequences, arrays, or Haskell and F# lists, are monoids over concatenation. In Haskell, strings *are* lists, so string concatenation is a monoid as well. In .NET, the `+`

operator for strings is a monoid if you pretend that `null`

strings don't exist. Still, all of these are essentially variations of the same monoid.

It makes sense that C# uses `+`

for string concatenation, because, as the previous article described, addition is the most intuitive and 'natural' of all monoids. Because you know first-grade arithmetic, you can immediately grasp the concept of addition as a metaphor. A monoid, however, is more than a metaphor; it's an abstraction that describes well-behaved binary operations, where one of those operations just happen to be addition. It's a *generalisation* of the concept. It's an abstraction that you already understand.

**Next: ** Money monoid.

## Comments

Thanks for this article series! Best regards, Manuel

Manuel, thank you for writing. The confusion is entirely caused by my sloppy writing. A monoid is an associative binary operation with identity. Since the free monoid essentially elevates each number to a singleton list, the binary operation in question is

list concatenation.The

`Aggregate`

method is a built-in BCL method that aggregates values. I'll have more to say about that in later articles, but aggregation in itself is not a monoid; it follows from monoids.I've yet to find a source that explains the etymology of the 'free' terminology, but as far as I can tell, free monoids, as well as free monads, are instances of a particular abstraction that you 'get for free', so to speak. You can

alwaysput values into singleton lists, just like you canalwayscreate a free monad from any functor. These instances are lossless in the sense that performing operations on them never erase data. For the free monoid, you just keep on concatenating more values to your list of values.This decouples the collection of data from evaluation. Data collection is lossless. Only when you want to evaluate the result must you decide on a particular type of evaluation. For integers, for example, you could choose between addition and multiplication. Once you perform the evaluation, the result is lossy.

In Haskell, the

`Data.Monoid`

module defines an`<>`

infix operator that you can use as the binary operation associated with a particular type. For lists, you can use it like this:Notice how the operation isn't lossy. This means you can defer the decision on how to evaluate it until later:

Notice how you can choose to evaluate

`xs`

to calculate the sum, or the product.I think the word

freeis used in algebraic structures to suggest that all possible interpretations are left open. This is because they are not constrained by additional specific laws which would allow to further evaluate (reduce, simplify) expressions.For example,

can be simplified to due to Monoid laws (identity) while can be reduced to due to specific arithmetic laws.Freedom from further constraints also mean that we can always devise automatically (hence free as in term algebra; its values are essentially the syntactic structures (AST) of the expressions allowed by the signature and the sole simplifications permitted are those specified by the general laws.

) an instance from a signature. This construction is calledIn the case of a Monoid, thanks to associativity (which is a Monoid law, not specific to any particular instance), if we consider complex expressions like

we can flatten their AST to a list without losing information and still without committing yet to any specific interpretation. And for atomic expressions like the single node AST becomes a singleton list.