Can types replace validation?

With some examples in C#.

In a comment to my article on ASP.NET validation revisited Maurice Johnson asks:

"I was just wondering, is it possible to use the type system to do the validation instead ?

"What I mean is, for example, to make all the ReservationDto's field a type with validation in the constructor (like a class name, a class email, and so on). Normally, when the framework will build ReservationDto, it will try to construct the fields using the type constructor, and if there is an explicit error thrown during the construction, the framework will send us back the error with the provided message.

"Plus, I think types like "email", "name" and "at" are reusable. And I feel like we have more possibilities for validation with that way of doing than with the validation attributes.

"What do you think ?"

Maurice Johnson

I started writing a response below the question, but it grew and grew so I decided to turn it into a separate article. I think the question is of general interest.

The halting problem #

I'm all in favour of using the type system for encapsulation, but there are limits to what it can do. We know this because it follows from the halting problem.

I'm basing my understanding of the halting problem on my reading of The Annotated Turing. In short, given an arbitrary computer program in a Turing-complete language, there's no general algorithm that will determine whether or not the program will finish running.

A compiler that performs type-checking is a program, but typical type systems aren't Turing-complete. It's possible to write type checkers that always finish, because the 'programming language' they are running on - the type system - isn't Turing-complete.

Normal type systems (like C#'s) aren't Turing-complete. You expect the C# compiler to always arrive at a result (either compiled code or error) in finite time. As a counter-example, consider Haskell's type system. By default it, too, isn't Turing-complete, but with sufficient language extensions, you can make it Turing-complete. Here's a fun example: Typing the technical interview by Kyle Kingsbury (Aphyr). When you make the type system Turing-complete, however, termination is no longer guaranteed. A program may now compile forever or, practically, until it times out or runs out of memory. That's what happened to me when I tried to compile Kyle Kingsbury's code example.

How is this relevant?

This matters because understanding that a normal type system is not Turing-complete means that there are truths it can't express. Thus, we shouldn't be surprised if we run into rules or policies that we can't express with the type system we're given. What exactly is inexpressible depends on the type system. There are policies you can express in Haskell that are impossible to express in C#, and so on. Let's stick with C#, though. Here are some examples of rules that are practically inexpressible:

An integer must be positive.
A string must be at most 100 characters long.
A maximum value must be greater than a minimum value.
A value must be a valid email address.

Hillel Wayne provides more compelling examples in the article Making Illegal States Unrepresentable.

Encapsulation #

Depending on how many times you've been around the block, you may find the above list naive. You may, for example, say that it's possible to express that an integer is positive like this:

public struct NaturalNumber : IEquatable<NaturalNumber>
{
    private readonly int i;
 
    public NaturalNumber(int candidate)
    {
        if (candidate < 1)
            throw new ArgumentOutOfRangeException(
                nameof(candidate),
                $"The value must be a positive (non-zero) number, but was: {candidate}.");
 
        this.i = candidate;
    }
 
    // Various other members follow...

I like introducing wrapper types like this. To the inexperienced developer this may seem redundant, but using a wrapper like this has several advantages. For one, it makes preconditions explicit. Consider a constructor like this:

public Reservation(
    Guid id,
    DateTime at,
    Email email,
    Name name,
    NaturalNumber quantity)

What are the preconditions that you, as a client developer, has to fulfil before you can create a valid Reservation object? First, you must supply five arguments: id, at, email, name, and quantity. There is, however, more information than that.

Consider, as an alternative, a constructor like this:

public Reservation(
    Guid id,
    DateTime at,
    Email email,
    Name name,
    int quantity)

This constructor requires you to supply the same five arguments. There is, however, less explicit information available. If that was the only available constructor, you might be wondering: Can I pass zero as quantity? Can I pass -1?

When the only constructor available is the first of these two alternatives, you already have the answer: No, the quantity must be a natural number.

Another advantage of creating wrapper types like NaturalNumber is that you centralise run-time checks in one place. Instead of sprinkling defensive code all over the code base, you have it in one place. Any code that receives a NaturalNumber object knows that the check has already been performed.

There's a word for this: Encapsulation.

You gather a coherent set of invariants and collect it in a single type, making sure that the type always guarantees its invariants. Note that this is an important design technique in functional programming too. While you may not have to worry about state mutation preserving invariants, it's still important to guarantee that all values of a type are valid.

Predicative and constructive data #

It's debatable whether the above NaturalNumber class really uses the type system to model what constitutes valid data. Since it relies on a run-time predicate, it falls in the category of types Hillel Wayne calls predicative. Such types are easy to create and compose well, but on the other hand fail to take full advantage of the type system.

It's often worthwhile considering if a constructive design is possible and practical. In other words, is it possible to make illegal states unrepresentable (MISU)?

What's wrong with NaturalNumber? Doesn't it do that? No, it doesn't, because this compiles:

new NaturalNumber(-1)

Surely it will fail at run time, but it compiles. Thus, it's representable.

The compiler gives you feedback faster than tests. Considering MISU is worthwhile.

Can we model natural numbers in a constructive way? Yes, with Peano numbers. This is even possible in C#, but I wouldn't consider it practical. On the other hand, while it's possible to represent any natural number, there is no way to express -1 as a Peano number.

As Hillel Wayne describes, constructive data types are much harder and requires a considerable measure of creativity. Often, a constructive model can seem impossible until you get a good idea.

"a list can only be of even length. Most languages will not be able to express such a thing in a reasonable way in the data type."

Nils Eriksson

Such a requirement may look difficult until inspiration hits. Then one day you may realise that it'd be as simple as a list of pairs (two-tuples). In Haskell, it could be as simple as this:

newtype EvenList a = EvenList [(a,a)] deriving (Eq, Show)

With such a constructive data model, lists of uneven length are unrepresentable. This is a simple example of the kind of creative thinking you may need to engage in with constructive data modelling.

If you feel the need to object that Haskell isn't 'most languages', then here's the same idea expressed in C#:

public sealed class EvenCollection<T> : IEnumerable<T>
{
    private readonly IEnumerable<Tuple<T, T>> values;
 
    public EvenCollection(IEnumerable<Tuple<T, T>> values)
    {
        this.values = values;
    }
 
    public IEnumerator<T> GetEnumerator()
    {
        foreach (var x in values)
        {
            yield return x.Item1;
            yield return x.Item2;
        }
    }
 
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

You can create such a list like this:

var list = new EvenCollection<string>(new[]
{
    Tuple.Create("foo", "bar"),
    Tuple.Create("baz", "qux")
});

On the other hand, this doesn't compile:

var list = new EvenCollection<string>(new[]
{
    Tuple.Create("foo", "bar"),
    Tuple.Create("baz", "qux", "quux")
});

Despite this digression, the point remains: Constructive data modelling may be impossible, unimagined, or impractical.

Often, in languages like C# we resort to predicative data modelling. That's also what I did in the article ASP.NET validation revisited.

Validation as functions #

That was a long rambling detour inspired by a simple question: Is it possible to use types instead of validation?

In order to address that question, it's only proper to explicitly state assumptions and definitions. What's the definition of validation?

I'm not aware of a ubiquitous definition. While I could draw from the Wikipedia article on the topic, at the time of writing it doesn't cite any sources when it sets out to define what it is. So I may as well paraphrase. It seems fair, though, to consider the stem of the word: Valid.

Validation is the process of examining input to determine whether or not it's valid. I consider this a (mostly) self-contained operation: Given the data, is it well-formed and according to specification? If you have to query a database before making a decision, you're not validating the input. In that case, you're applying a business rule. As a rule of thumb I expect validations to be pure functions.

Validation, then, seems to imply a process. Before you execute the process, you don't know if data is valid. After executing the process, you do know.

Data types, whether predicative like NaturalNumber or constructive like EvenCollection<T>, aren't processes or functions. They are results.

An arrow labelled 'validation' pointing from a document to the left labelled 'Data' to a box to the right labelled 'Type'.

Sometimes an algorithm can use a type to infer the validation function. This is common in statically typed languages, from C# over F# to Haskell (which are the languages with which I'm most familiar).

Data Transfer Object as a validation DSL #

In a way you can think of the type system as a domain-specific language (DSL) for defining validation functions. It's not perfectly suited for that task, but often good enough that many developers reach for it.

Consider the ReservationDto class from the ASP.NET validation revisited article where I eventually gave up on it:

public sealed class ReservationDto
{
    public LinkDto[]? Links { get; set; }
    public Guid? Id { get; set; }
    [Required, NotNull]
    public DateTime? At { get; set; }
    [Required, NotNull]
    public string? Email { get; set; }
    public string? Name { get; set; }
    [NaturalNumber]
    public int Quantity { get; set; }
}

It actually tries to do what Maurice Johnson suggests. Particularly, it defines At as a DateTime? value.

> var json = "{ \"At\": \"2022-10-11T19:30\", \"Email\": \"z@example.com\", \"Quantity\": 1}";
> JsonSerializer.Deserialize<ReservationDto>(json)
ReservationDto { At=[11.10.2022 19:30:00], Email="z@example.com", Id=null, Name=null, Quantity=1 }

A JSON deserializer like this one uses run-time reflection to examine the type in question and then maps the incoming data onto an instance. Many XML deserializers work the same way.

What happens if you supply malformed input?

> var json = "{ \"At\": \"foo\", \"Email\": \"z@example.com\", \"Quantity\": 1}";
> JsonSerializer.Deserialize<ReservationDto>(json)
System.Text.Json.JsonException:↩
The JSON value could not be converted to System.Nullable`1[System.DateTime].↩
Path: $.At | LineNumber: 0 | BytePositionInLine: 26.↩
[...]

(I've wrapped the result over multiple lines for readability. The ↩ symbol indicates where I've wrapped the text. I've also omitted a stack trace, indicated by [...]. I'll do that repeatedly throughout this article.)

What happens if we try to define ReservationDto.Quantity with NaturalNumber?

> var json = "{ \"At\": \"2022-10-11T19:30\", \"Email\": \"z@example.com\", \"Quantity\": 1}";
> JsonSerializer.Deserialize<ReservationDto>(json)
System.Text.Json.JsonException:↩
The JSON value could not be converted to NaturalNumber.↩
Path: $.Quantity | LineNumber: 0 | BytePositionInLine: 67.↩
[...]

While JsonSerializer is a sophisticated piece of software, it's not so sophisticated that it can automatically map 1 to a NaturalNumber value.

I'm sure that you can configure the behaviour with one or more JsonConverter objects, but this is exactly the kind of framework Whack-a-mole that I consider costly. It also suggests a wider problem.

Error handling #

What happens if input to a validation function is malformed? You may want to report the errors to the caller, and you may want to report all errors in one go. Consider the user experience if you don't: A user types in a big form and submits it. The system informs him or her that there's an error in the third field. Okay, correct the error and submit again. Now there's an error in the fifth field, and so on.

It's often better to return all errors as one collection.

The problem is that type-based validation doesn't compose well. What do I mean by that?

It's fairly clear that if you take a simple (i.e. non-complex) type like NaturalNumber, if you fail to initialize a value it's because the input is at fault:

> new NaturalNumber(-1)
System.ArgumentOutOfRangeException: The value must be a positive (non-zero) number, but was: -1.↩
(Parameter 'candidate')
  + NaturalNumber..ctor(int)

The problem is that for complex types (i.e. types made from other types), exceptions short-circuit. As soon as one exception is thrown, further data validation stops. The ASP.NET validation revisited article shows examples of that particular problem.

This happens when validation functions have no composable way to communicate errors. When throwing exceptions, you can return an exception message, but exceptions short-circuit rather than compose. The same is true for the Either monad: It short-circuits. Once you're on the failure track you stay there and no further processing takes place. Errors don't compose.

Monoidal versus applicative validation #

The naive take on validation is to answer the question: Is that data valid or invalid? Notice the binary nature of the question. It's either-or.

This is true for both predicative data and constructive data.

For constructive data, the question is: Is a candidate value representable? For example, can you represent -1 as a Peano number? The answer is either yes or no; true or false.

This is even clearer for predicative data, which is defined by a predicate. (Here's another example of a natural number specification.) A predicate is a function that returns a Boolean value: True or false.

It's possible to compose Boolean values. The composition that we need in this case is Boolean and, which is also known as the all monoid: If all values are true, the composed value is true; if just one value is false, the composed value is false.

The problem is that during composition, we lose information. While a single false value causes the entire aggregated value to be false, we don't know why. And we don't know if there was only a single false value, or if there were more than one. Boolean all short-circuits on the first false value it encounters, and stops processing subsequent predicates.

In logic, that's all you need, but in data validation you often want to know what's wrong with the data.

Fortunately, this is a solved problem. Use applicative validation, an example of which I supplied in the article An applicative reservation validation example in C#.

This changes focus on validation. No longer is validation a true/false question. Validation is a function from less-structured data to more-structured data. Parse, don't validate.

Conclusion #

Can types replace validation?

In some cases they can, but I think that the general answer is no. Granted, this answer is partially based on capabilities of current deserialisers. JsonSerializer.Deserialize short-circuits on the first error it encounters, and the same does aeson's eitherDecode.

While that's the current state of affairs, it may not have to stay like that forever. One might be able to derive an applicative parser from a desired destination type, but I haven't seen that done yet.

It sounds like a worthwhile research project.

Comments

Lloyd Atkinson #

This slightly reminds me of Zod which is described as "TypeScript-first schema validation with static type inference".

The library automatically infers a type that matches the validation - in a way it blurs this line between types and validation by making them become one.

Of course, once you have that infered type there is nothing stopping you using it without the library, but that's something code reviews could catch. It's quite interesting though.

				
					import { z } from 'zod';

					const User = z.object({
					  username: z.string(),
					  age: z.number().positive({
					    message: 'Your age must be positive!',
					  }),
					});

					User.parse({ username: 'Ludwig', age: -1 });

					// extract the inferred type
					type User = z.infer<typeof User>;
					// { username: string, age: number }

2022-08-28 00:53 UTC

Published: Monday, 22 August 2022 05:57:00 UTC

Can types replace validation? by Mark Seemann