Non-exceptional averages

How do you code without exceptions? Here's one example.

Encouraging object-oriented programmers to avoid throwing exceptions is as fun as telling them to renounce null references. To be fair, exception-throwing is such an ingrained feature of C#, Java, C++, etcetera that it can be hard to see how to do without it.

To be clear, I don't insist that you pretend that exceptions don't exist in languages that have them. I'm also not advocating that you catch all exceptions in order to resurface them as railway-oriented programming. On the other hand, I do endorse the generally good advice that you shouldn't use exceptions for control flow.

What can you do instead? Despite all the warnings against railway-oriented programming, Either is still a good choice for a certain kind of control flow. Exceptions are for exceptional situations, such as network partitions, running out of memory, disk failures, and so on. Many run-time errors are both foreseeable and preventable. Prefer code that prevents errors.

There's a few ways you can do that. One of them is to protect invariants by enforcing pre-conditions. If you have a static type system, you can use the type system to prevent errors.

Average duration #

How would you calculate the average of a set of durations? You might, for example, need to calculate average duration of message handling for a polling consumer. C# offers many built-in overloads of the Average extension method, but none that calculates the average of TimeSpan values.

How would you write that method yourself?

It's not a trick question.

Based on my experience coaching development teams, this is a representative example:

public static TimeSpan Average(this IEnumerable<TimeSpan> timeSpans)
{
    var sum = TimeSpan.Zero;
    var count = 0;
    foreach (var ts in timeSpans)
    {
        sum += ts;
        count++;
    }
    return sum / count;
}

This gets the job done in most situations, but it has two error modes. It doesn't work if timeSpans is empty, and it doesn't work if it's infinite.

When the input collection is empty, you'll be trying to divide by zero, which isn't allowed. How do you deal with that? Most programmers I've met just shrug and say: don't call the method with an empty collection. Apparently, it's your responsibility as the caller. You have to memorise that this particular Average method has that particular precondition.

I don't think that's a professional position. This puts the burden on client developers. In a world like that, you have to learn by rote the preconditions of thousands of APIs.

What can you do? You could add a Guard Clause to the method.

Guard Clause #

Adding a Guard Clause doesn't really make the method much easier to reason about for client developers, but at least it protects an invariant.

public static TimeSpan Average(this IEnumerable<TimeSpan> timeSpans)
{
    if (!timeSpans.Any())
        throw new ArgumentOutOfRangeException(
            nameof(timeSpans),
            "Can't calculate the average of an empty collection.");
 
    var sum = TimeSpan.Zero;
    var count = 0;
    foreach (var ts in timeSpans)
    {
        sum += ts;
        count++;
    }
    return sum / count;
}

Don't get me wrong. I often write code like this because it makes it easier for me as a library developer to reason about the rest of the method body. On the other hand, it basically just replaces one run-time exception with another. Before I added the Guard Clause, calling Average with an empty collection would cause it to throw an OverflowException; now it throws an ArgumentOutOfRangeException.

From client developers' perspective, this is only a marginal improvement. You're still getting no help from the type system, but at least the run-time error is a bit more informative. Sometimes, that's the best you can do.

Finite collections #

The Average method has two preconditions, but we've only addressed one. The other precondition is that the input timeSpans must be finite. Unfortunately, this compiles:

static IEnumerable<T> InfinitelyRepeat<T>(T x)
{
    while (true) yield return x;
}
var ts = new TimeSpan(1, 2, 3, 4);
var tss = InfinitelyRepeat(ts);
 
var avg = tss.Average();

Since tss infinitely repeats ts, the Average method call (theoretically) loops forever; in fact it quickly overflows because it keeps adding TimeSpan values together.

Infinite collections aren't allowed. Can you make that precondition explicit?

I don't know of a way to test that timeSpans is finite at run time, but I can change the input type:

public static TimeSpan Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (!timeSpans.Any())
        throw new ArgumentOutOfRangeException(
            nameof(timeSpans),
            "Can't calculate the average of an empty collection.");
 
    var sum = TimeSpan.Zero;
    foreach (var ts in timeSpans)
        sum += ts;
    return sum / timeSpans.Count;
}

Instead of accepting any IEnumerable<TimeSpan> as an input argument, I've now constrained timeSpans to an IReadOnlyCollection<TimeSpan>. This interface has been in .NET since .NET 4.5 (I think), but it lives a quiet existence. Few people know of it.

It's just IEnumerable<T> with an extra constraint:

public interface IReadOnlyCollection<T> : IEnumerable<T>
{
    int Count { get; }
}

The Count property strongly implies that the IEnumerable<T> is finite. Also, that the value is an int implies that the maximum size of the collection is 2,147,483,647. That's probably going to be enough for most day-to-day use.

You can no longer pass an infinite stream of values to the Average method. It's simply not going to compile. That both communicates and protects the invariant that infinite collections aren't allowed. It also makes the implementation code simpler, since the method doesn't have to count the elements. That information is already available from timeSpans.Count.

If a type can address one invariant, can it also protect the other?

Non-empty collection #

You can change the input type again. Here I've used this NotEmptyCollection<T> implementation:

public static TimeSpan Average(this NotEmptyCollection<TimeSpan> timeSpans)
{
    var sum = timeSpans.Head;
    foreach (var ts in timeSpans.Tail)
        sum += ts;
    return sum / timeSpans.Count;
}

Now client code can no longer call the Average method with an empty collection. That's also not going to compile.

You've replaced a run-time check with a compile-time check. It's now clear to client developers who want to call the method that they must supply a NotEmptyCollection<TimeSpan>, instead of just any IReadOnlyCollection<TimeSpan>.

You can also simplify the implementation code:

public static TimeSpan Average(this NotEmptyCollection<TimeSpan> timeSpans)
{
    var sum = timeSpans.Aggregate((x, y) => x + y);
    return sum / timeSpans.Count;
}

How do we know that NotEmptyCollection<T> contains at least one element? The constructor enforces that constraint:

public NotEmptyCollection(T head, params T[] tail)
{
    if (head == null)
        throw new ArgumentNullException(nameof(head));
 
    this.Head = head;
    this.Tail = tail;
}

But wait, there's a Guard Clause and a throw there! Have we even accomplished anything, or did we just move the throw around?

Parse, don't validate #

A Guard Clause is a kind of validation. It validates that input fulfils preconditions. The problem with validation is that you have to repeat it in various different places. Every time you receive some data as an input argument, it may or may not have been validated. A receiving method can't tell. There's no flag on a string, or a number, or a collection, which is set when data has been validated.

Every method that receives such an input will have to perform validation, just to be sure that the preconditions hold. This leads to validation code being duplicated over a code base. When you duplicate code, you later update it in most of the places it appears, but forget to update it in a few places. Even if you're meticulous, a colleague may not know about the proper way of validating a piece of data. This leads to bugs.

As Alexis King explains in her Parse, don’t validate article, 'parsing' is the process of validating input of weaker type into a value of a stronger type. The stronger type indicates that validation has happened. It's like a Boolean flag that indicates that, yes, the data contained in the type has been through validation, and found to hold.

This is also the case of NotEmptyCollection<T>. If you have an object of that type, you know that it has already been validated. You know that the collection isn't empty. Even if you think that it looks like we've just replaced one exception with another, that's not the point. The point is that we've replaced scattered and unsystematic validation code with a single verification step.

You may still be left with the nagging doubt that I didn't really avoid throwing an exception. I think that the NotEmptyCollection<T> constructor strikes a pragmatic balance. If you look only at the information revealed by the type (i.e. what an IDE would display), you'll see this when you program against the class:

public NotEmptyCollection(T head, params T[] tail)

While you could, technically, pass null as the head parameter, it should be clear to you that you're trying to do something you're not supposed to do: head is not an optional argument. Had it been optional, the API designer should have provided an overload that you could call without any value. Such a constructor overload isn't available here, so if you try to cheat the compiler by passing null, don't be surprised to get a run-time exception.

For what it's worth, I believe that you can only be pragmatic if you know how to be dogmatic. Is it possible to protect NotEmptyCollection<T>'s invariants without throwing exceptions?

Yes, you could do that by making the constructor private and instead afford a static factory method that returns a Maybe or Either value. In Haskell, this is typically called a smart constructor. It's only a few lines of code, so I could easily show it here. I chose not to, though, because I'm concerned that readers will interpret this article the wrong way. I like Maybe and Either a lot, but I agree with the above critics that it may not be idiomatic in object-oriented languages.

Summary #

Encapsulation is central to object-oriented design. It's the notion that it's an object's own responsibility to protect its invariants. In statically typed object-oriented programming languages, objects are instances of classes. Classes are types. Types encapsulate invariants; they carry with them guarantees.

You can sometimes model invariants by using types. Instead of performing a run-time check on input arguments, you can declare constructors and methods in such a way that they only take arguments that are already guaranteed to be valid.

That's one way to reduce the amount of exceptions that your code throws.

Comments

Tyson Williams #

Great post. I too prefer to avoid exceptions by strengthening preconditions using types.

Since tss infinitely repeats ts, the Average method call (theoretically) loops forever; in fact it quickly overflows because it keeps adding TimeSpan values together.

I am not sure what you mean here. My best guess is that you are saying that this code would execute forever except that it will overflow, which will halt the execution. However, I think the situation is ambiguous. This code is impure because, as the Checked and Unchecked documentation says, its behavior depends on whether or not the -checked compiler option is given. This dependency on the compiler option can be removed by wrapping this code in a checked or unchecked block, which would either result in a thrown exception or an infinite loop respectively.

This gets the job done in most situations, but it has two error modes. It doesn't work if timeSpans is empty, and it doesn't work if it's infinite.

There is a third error mode, and it exists in every implementation you gave. The issue of overflow is not restricted to the case of infinitely many TimeSpans. It only takes two. I know of or remember this bug as "the last binary search bug". That article shows how to correctly compute the average of two integers without overflowing. A correct implementation for computing the average of more than two integers is to map each element to a mixed fraction with the count as the divisor and then appropriately aggregate those values. The implementation given in this Quora answer seems correct to me.

I know all this is unrelated to the topic of your post, but I also know how much you prefer to use examples that avoid this kind of accidental complexity. Me too! However, I still like your example and can't think of a better one at the moment.

2020-02-05 14:13 UTC

Mark Seemann #

Tyson, thank you for writing. Given an infinite stream of values, the method throws an OverflowException. This is because TimeSpan addition explicitly does that:

> TimeSpan.MaxValue + new TimeSpan(1)
System.OverflowException: TimeSpan overflowed because the duration is too long.
  + System.TimeSpan.Add(System.TimeSpan)
  + System.TimeSpan.op_Addition(System.TimeSpan, System.TimeSpan)

This little snippet from C# Interactive also illustrates the third error mode that I hadn't considered. Good point, that.

2020-02-06 6:47 UTC

Tyson Williams #

Ah, yes. You are correct. Thanks for pointing out my mistake. Another way to verify this is inspecting TimeSpan.Add in Mircosoft's reference source. I should have done those checks before posting. Thanks again!

2020-02-06 13:33 UTC

Published: Monday, 03 February 2020 06:38:00 UTC

Non-exceptional averages by Mark Seemann