# Types as sets by Mark Seemann

*To a certain degree, you can think of static types as descriptions of sets.*

If you've ever worked with C#, Java, F#, Haskell, or other compiled languages, you've encountered *static types* in programming. In school, you've probably encountered basic set theory. The two relate to each other in illuminating ways.

To be clear, I'm neither a mathematician nor a computer scientist, so I'm only basing this article on my layman's understanding of these topics. Still, I find some of the correspondences to be useful when thinking about certain programming topics.

### Two elements #

What's the simplest possible set? For various definitions of *simple*, probably the empty set. After that? The singleton set. We'll skip these, and instead start with a set with two elements:

If you had to represent such a set in code, how would you do it?

First, you'd have to figure out how to distinguish the two elements from each other. Giving each a label seems appropriate. What do you call them? Yin and yang? Church and state? Alice and Bob?

And then, how do you represent the labels in code, keeping in mind that they somehow must 'belong to the same set'? Perhaps as an `enum`

?

public enum Dualism { Yin, Yang }

As a data definition, though, an `enum`

is a poor choise because the underlying data type is an integer (`int`

by default). Thus, a method like this compiles and executes just fine:

public Dualism YinOrYang() { return (Dualism)42; }

The `Dualism`

returned by the function is neither `Yin`

nor `Yang`

.

So, how do you represent a set with two elements in code? One option would be to Church encode it (try it! It's a good exercise), but perhaps you find something like the following simpler:

public sealed class Dualism { public static readonly Dualism Yin = new Dualism(false); public static readonly Dualism Yang = new Dualism( true); public Dualism(bool isYang) { IsYang = isYang; } public bool IsYang { get; } public bool IsYin => !IsYang; }

With this design, a method like `YinOrYang`

can't cheat, but must return either `Yin`

or `Yang`

:

public Dualism YinOrYang() { return Dualism.Yin; }

Notice that this variation is based entirely off a single member: `IsYang`

, which is a Boolean value.

In fact, this implementation is isomorphic to `bool`

. You can create a `Dualism`

instance from a `bool`

, and you can convert that instance back to a Boolean value via `IsYang`

, without loss of information.

This holds for any two-element set: it's isomorphic to `bool`

. You could say that the data type `bool`

is equivalent to 'the' two-element set.

### More, but still few, elements #

Is there a data type that corresponds to a three-element set? Again, you can always use Church encoding to describe a data type with three cases, but in C#, the easiest backing type would probably be `bool?`

(`Nullable<bool>`

). When viewed as a set, it's a set inhabited by the three values `false`

, `true`

, and `null`

.

How about a set with four elements? A pair of Boolean values seems appropriate:

public sealed class Direction { public readonly static Direction North = new Direction(false, false); public readonly static Direction South = new Direction(false, true); public readonly static Direction East = new Direction( true, false); public readonly static Direction West = new Direction( true, true); public Direction(bool isEastOrWest, bool isLatter) { IsEastOrWest = isEastOrWest; IsLatter = isLatter; } public bool IsEastOrWest { get; } public bool IsLatter { get; } }

The `Direction`

class is backed by two Boolean values. We can say that a four-element set is isomorphic to a pair of Boolean values.

How about a five-element set? If a four-element set corresponds to a pair of Boolean values, then perhaps a pair of one Boolean value and one `Nullable<bool>`

?

Alas, that doesn't work. When combining types, the number of possible combinations is the product, not the sum, of individual types. So, a pair of `bool`

and `bool?`

would support 2 × 3 = 6 combinations: `(false, null)`

, `(false, false)`

, `(false, true)`

, `(true, null)`

, `(true, false)`

, and `(true, true)`

.

Again, a Church encoding (try it!) is an option, but you could also do something like this:

public sealed class Spice { public readonly static Spice Posh = new Spice(0); public readonly static Spice Scary = new Spice(1); public readonly static Spice Baby = new Spice(2); public readonly static Spice Sporty = new Spice(3); public readonly static Spice Ginger = new Spice(4); private readonly int id; private Spice(int id) { this.id = id; } public override bool Equals(object obj) { return obj is Spice spice && id == spice.id; } public override int GetHashCode() { return HashCode.Combine(id); } }

This seems like cheating, since it uses a much larger underlying data type (`int`

), but it still captures the essence of a set with five elements. You can't add more elements, or initialise the class with an out-of-range `id`

since the constructor is private.

The point isn't so much how to implement particular classes, but rather that if you can enumerate all possible values of a type, you can map a type to a set, and vice versa.

### More elements #

Which type corresponds to this set?

I hope that by now, it's clear that this set could correspond to infinitely many types. It's really only a matter of what we call the elements.

If we call the first element `0`

, the next one `1`

, and then `2`

, `3`

, and so on up to `255`

, the set corresponds to an unsigned byte. If we call the elements `-128`

, `-127`

, `-126`

up to `127`

, the set corresponds to a signed byte. They are, however, isomorphic.

Likewise, a set with 65,536 elements corresponds to a 16-bit integer, and so on. This also holds for 32-bit integers, 64-bit integers, and even floating point types like `float`

and `double`

. These sets are just too large to draw.

### Infinite sets #

While most languages have built-in number types based on a fixed number of bytes, some languages also come with number types that support arbitrarily large numbers. .NET has BigInteger, Haskell comes with `Integer`

, and so on. These numbers aren't truly infinite, but are limited by machine capacity rather than the data structure used to represent them in memory.

Another example of a type with arbitrary size is the ubiquitous `string`

. There's no strict upper limit to how large strings you can create, although, again, your machine will ultimately run out of memory or disk space.

Theoretically, there are infinitely many strings, so, like `BigInteger`

, `string`

corresponds to an infinite set. This also implies that `string`

and `BigInteger`

are isomorphic, but that shouldn't really be that surprising, since everything that's happening on a computer is already encoded as (binary) numbers - including strings.

Any class that contains a `string`

field is therefore also isomorphic to an (or *the?*) infinite set. Two `string`

fields also correspond to infinity, as does a `string`

field paired with a `bool`

field, and so on. As soon as you have just one 'infinite type', the corresponding set is infinite.

### Constrained types #

How about static types (classes) that use one or more built-in types as backing fields, but on top of that impose 'business rules'?

This one, for example, uses a `byte`

as a backing field, but prohibits some values:

public sealed class DesByte { private readonly byte b; public DesByte(byte b) { if (12 <= b && b <= 19) throw new ArgumentOutOfRangeException(nameof(b), "[12, 19] not allowed."); this.b = b; } }

While this class doesn't correspond to the above 256-element set, you can still enumerate all possible values:

But then what about a class like this one?

public sealed class Gadsby { private readonly string manuscript; public Gadsby(string manuscript) { if (manuscript.Contains('e', StringComparison.OrdinalIgnoreCase)) throw new ArgumentException( "The manuscript may not contain the letter 'e'.", nameof(manuscript)); this.manuscript = manuscript; } }

While the constructor prohibits any string that contains the letter *e*, you can still create infinitely many `string`

values even with that constraint.

You can, however, still conceivably enumerate all possible `Gadsby`

values, although the corresponding set would be infinitely large.

Obviously, this isn't practical, but the point isn't one of practicality. The point is that you can think of types as sets.

### Function types #

So far we've only covered 'values', even though it's trivial to create types that correspond to infinitely large sets.

In type systems, functions also have types. In Haskell, for example, the *type* `Int -> Bool`

indicates a function that takes an `Int`

as input and return a `Bool`

. This might for example be a function that checks whether the number is even.

Likewise, we can write the type of `not`

(Boolean negation) as `Bool -> Bool`

. In a set diagram, we can illustrate it like this:

Each element points to the other one: *true* points to *false*, and vice versa. Together, the two arrows completely describe the `not`

function (the `!`

operator in C#).

If we remove the original elements (*true* and *false*) of the set, it becomes clearer that these two arrows also form a set.

In general, we can think of functions as sets of arrows, but still sets. Many of these sets are infinite.

### Conclusion #

Set theory is a branch of mathematics, and so is type theory. Having no formal education in either, I don't claim that types and sets are the same. A quick web search implies that while there are many similarities between types and sets, there are also differences. In this article, I've highlighted some similarities.

Thinking about types as sets can be helpful in everyday programming. In test-driven development, for example, equivalence partitioning provides insights into which test inputs to use. Being able to consider a *system under test's* inputs as sets (rather than types) makes this easier.

As future articles will cover, it also becomes easier to think about Postel's law and the Liskov substitution principle.

## Comments

It is isomorphic to

aninfinite set. More generally, any class that contains a field of type`T`

is isomorphic to a set with cardinality at least at big as the cardinality of the inhabitants of`T`

. There are infinite sets with different cardinalities, and`string`

is isomorphic to the smallest of these, which is known as countably infinite or Aleph-nought. Any class that contains a field of type`string -> bool`

is isomorphic to a set with cardinality at least Aleph-one since that function type is isomorphic to a set with that cardinality.Tyson, thank you for writing. I wasn't kidding when I wrote that I'm not a mathematician. Given, however, that I've read and more or less understood The Annotated Turing, I should have known better. That book begins with a lucid explanation of Cantor's theorem.

Does this practically impact the substance of the present article?

Nope. I think the article is good. I think everything you said is correct. I just wanted to elaborate a bit at the one point where you conveyed some hesitation.