Put cyclomatic complexity to good use

Monday, 09 December 2019 14:37:00 UTC

An actually useful software metric.

In Humane Code I argue that software development suffers from a lack of useful measurements. While I stand by that general assertion, a few code metrics can be useful. Cyclomatic complexity, while no silver bullet, can be put to good use.

Recap #

I think of cyclomatic complexity as a measure of the number of pathways through a piece of code. Even the simplest body of code affords a single pathway, so the minimum cyclomatic complexity is 1. You can easily 'calculate' the cyclomatic complexity of a method or function. You start at one, and then you count how many times if and for occur. For each of these keywords you find, you increment the number (which started at 1).

The specifics are language-dependent. The idea is to count branching and looping instructions. In C#, for example, you'd also have to include foreach, while, do, and each case in a switch block. In other languages, the keywords to count will differ.

What's the cyclomatic complexity of this TryParse method?

public static bool TryParse(string candidate, out UserNamePassworCredentials credentials)
{
    credentials = null;
 
    var arr = candidate.Split(',');
    if (arr.Length < 2)
        return false;
 
    credentials = new UserNamePassworCredentials(arr[0], arr[1]);
    return true;
}

The cyclomatic complexity of this method is 2. You start with the number 1 and then increment it every time you find one of the branching keywords. In this case, there's only a single if, so increment 1 to 2. That's it. If you're in doubt, Visual Studio can calculate metrics for you. (It calculates other metrics as well, but I don't find those useful.)

Guide for unit testing #

I find cyclomatic complexity useful because it measures the number of pathways through a method. As such, it indicates the minimum number of test cases you ought to furnish. (Edit July 14, 2023: While this tends to work well in practice, it turns out that it's not strictly true in general. See the article Is cyclomatic complexity really related to branch coverage? and the comments for more details.) This is useful when reviewing code and tests.

Sometimes I'm presented with code that other people wrote. When I look through the production code, I consider its cyclomatic complexity. If, for example, a method has a cyclomatic complexity of 5, I'd expect to find at least five test cases to cover that method.

At other times, I start by reading the tests. The number of test cases gives me a rough indication of what degree of complexity to expect. If I see four distinct tests for the same method, I expect it to have a cyclomatic complexity of about 4.

I don't demand 100% coverage. Sometimes, people don't write tests for guard clauses, and I usually accept such omissions. On the other hand, I think that proper decision logic should be covered by tests. If I were to stick unwaveringly to cyclomatic complexity, that would make my reviews more objective, but not necessarily better. I could insist on 100% code coverage, but I don't consider that a good idea.

Presented with the above TryParse method, I'd expect to see at least two unit tests, since the cyclomatic complexity is 2.

The need for more test cases #

Two unit tests aren't enough, though. You could write these two tests:

[Fact]
public void TryParseSucceeds()
{
    var couldParse = UserNamePassworCredentials.TryParse("foo,bar"out var actual);
 
    Assert.True(couldParse);
    var expected = new UserNamePassworCredentials("foo""bar");
    Assert.Equal(expectedactual);
}
 
[Fact]
public void TryParseFails()
{
    var couldParse = UserNamePassworCredentials.TryParse("foo"out var actual);
 
    Assert.False(couldParse);
    Assert.Null(actual);
}

Using the Devil's advocate technique, however, this implementation of TryParse passes both tests:

public static bool TryParse(string candidate, out UserNamePassworCredentials credentials)
{
    credentials = null;
    if (candidate != "foo,bar")
        return false;
 
    credentials = new UserNamePassworCredentials("foo""bar");
    return true;
}

This is clearly not the correct implementation, but it has 100% code coverage. It also still has cyclomatic complexity of 2. The metric suggests a minimum number of tests - not a sufficient number.

More test cases #

It often makes sense to cover each branch with a single parametrised test:

[Theory]
[InlineData("foo,bar""foo""bar")]
[InlineData("baz,qux""baz""qux")]
[InlineData("ploeh,fnaah""ploeh""fnaah")]
[InlineData("foo,bar,baz""foo""bar")]
public void TryParseSucceeds(string candidatestring userNamestring password)
{
    var couldParse = UserNamePassworCredentials.TryParse(candidate, out var actual);
 
    Assert.True(couldParse);
    var expected = new UserNamePassworCredentials(userName, password);
    Assert.Equal(expectedactual);
}
 
[Theory]
[InlineData("")]
[InlineData("foobar")]
[InlineData("foo;bar")]
[InlineData("foo")]
public void TryParseFails(string candidate)
{
    var couldParse = UserNamePassworCredentials.TryParse(candidate, out var actual);
 
    Assert.False(couldParse);
    Assert.Null(actual);
}

Is a total of eight test cases the correct number? Cyclomatic complexity can't help you here. You'll have to rely on other heuristics, such as test-driven development, the transformation priority premise, and the Devil's Advocate.

Humane Code #

I also find cyclomatic complexity useful for another reason. I keep an eye on complexity because I care about code maintainability. In my Humane Code video, I discuss the magic number seven, plus or minus two.

When you read code, you essentially run a little emulator in your brain. You have to maintain state in order to interpret the code you look at. Will this conditional evaluate to true or false? Is the code going to exit that loop now? Is that array index out of bounds? You can only follow the code by keeping track of variables' contents, and your brain can keep track of approximately seven things.

Cyclomatic complexity is a measure of pathways - not how many things you need to keep track of. Still, in my experience, there seems to be a useful correlation. Code with high cyclomatic complexity tends to have many moving parts. There's too much to keep track of. With low cyclomatic complexity, on the other hand, the code involves few moving parts.

I use cyclomatic complexity 7 as an approximate maximum for that reason. It's only a rule of thumb, since I'm painfully aware that I'm transplanting experimental psychology to a context where no conclusions can be scientifically drawn. But like the 80/24 rule I find that it works well in practice.

Complexity of a method call #

Consider the above parametrised tests. Some of the test cases provide enough triangulation to defeat the Devil's attempt at hard-coding return values. This explains test values like "foo,bar", "baz,qux", and "ploeh,fnaah", but why did I include the "foo,bar,baz" test case? And why did I include the empty string as one of the test cases for TryParseFails?

When I write tests, I aspire to compose tests that verify the behaviour rather than the implementation of the System Under Test. The desired behaviour, I decided, is that any extra entries in the comma-separated input should be ignored. Likewise, if there are fewer than two entries, parsing should fail. There must be both a user name and a password.

Fortunately, this happens to be how Split already works. If you consider all the behaviour that Split exhibits, it encapsulates moderate complexity. It can split on multiple alternative delimiters, it can throw away empty entries, and so on. What would happen if you inline some of that functionality?

public static bool TryParse(string candidate, out UserNamePassworCredentials credentials)
{
    credentials = null;
 
    var l = new List<string>();
    var element = "";
    foreach (var c in candidate)
    {
        if (c == ',')
        {
            l.Add(element);
            element = "";
        }
        else
            element += c;
 
    }
    l.Add(element);
 
    if (l.Count < 2)
        return false;
 
    credentials = new UserNamePassworCredentials(l[0], l[1]);
    return true;
}

This isn't as sophisticated as the Split method it replaces, but it passes all eight test cases. Why did I do this? To illustrate the following point.

What's the cyclomatic complexity now?

Keep in mind that the externally observable behaviour (as defined by eight test cases) hasn't changed. The cyclomatic complexity, however, has. It's now 4 - double the previous metric.

A method call (like a call to Split) can hide significant cyclomatic complexity. That's a desirable situation. This is the benefit that encapsulation offers: that you don't have to worry about implementation details as long as both caller and callee fulfil the contract.

When you calculate cyclomatic complexity, a method call doesn't increment the complexity, regardless of the degree of complexity that it encapsulates.

Summary #

Cyclomatic complexity is one of the rare programming metrics that I find useful. It measures the number of pathways through a body of code.

You can use it to guide your testing efforts. The number is the minimum number of tests you must write in order to cover all branches. You'll likely need more test cases than that.

You can also use the number as a threshold. I suggest that 7 ought to be the maximum cyclomatic complexity of a method or function. You're welcome to pick another number, but keeping an eye on cyclomatic complexity is useful. It tells you when it's time to refactor a complex method.

Cyclomatic complexity considers only the code that directly implements a method or function. That code can call other code, but what happens behind a method call doesn't impact the metric.


Comments

Ghillie Dhu #

Do you know of a tool to calculate cyclomatic complexity for F#? It appears that the Visual Studio feature doesn't support it.

2019-12-09 19:20 UTC

Ghillie, thank you for writing. I'm not aware of any such tool.

FWIW, it's not difficult to manually calculate cyclomatic complexity for F#, but clearly that doesn't help if you'd like to automate the process.

It might be a fine project for anyone looking for a way to contribute to the F# ecosystem.

2019-12-09 20:06 UTC
Carlos Schults #

Hi, Mark. Thanks for your article. I'm commenting because I'd like to learn more about your thoughts on mutation testing. I ask this because I know you're not the biggest fan of code coverage as a useful metric. I'm not either, or at least I wasn't, until I learned about mutation testing.

My current view is that code coverage is only (mostly) meaningless if you don't have a way of measuring the quality of the tests. Since mutation testing's goal is exactly that (to test the tests, if you will) my opinion is that, if you use a mutation testing tool, then code coverage becomes really useful and you should try to get to 100%. I've even written a post about this subject.

So, in short: what are your thoughts on mutation testing and how it affects the meaning of code coverage, if at all? Looking forward to reading your answer. A whole post on this would be even better!

Thanks!

2019-12-14 12:32 UTC

Carlos, thank you for writing. I'm sympathetic to the idea of mutation testing, but apart from that, I have no opinion of it. I don't think that I ought to have an opinion about something with which I've no experience.

I first heard about mutation testing decades ago, but I've never come across a mutation testing tool for C# (or F#, for that matter). Can you recommend one?

2019-12-14 13:51 UTC
Carlos Schults #

Unfortunately, tooling is indeed one of the main Achilles heels of mutation testing, at least when it comes to .NET.

In the Java world, they have PIT, which is considered state of the art. For C#, I have tried a few tools, with no success. The most promising solution I've found so far, for C#, is Stryker.net, which is a port of the Stryker mutation testing framework, originally designed for JavaScript. The C# version is still in its early phases but it's already usable and it looks very promising.

2019-12-14 16:16 UTC

Is mutation testing the automated version of what Mark has called the Devil's Advocate technique?

2019-12-15 02:26 UTC

Tyson, I actually discuss the relationship with mutation testing in that article.

2019-12-15 9:28 UTC

Refactoring registration flow to functional architecture

Monday, 02 December 2019 08:19:00 UTC

An example showing a refactoring from F# partial application 'dependency injection' to an impure/pure/impure sandwich.

In a comment to Dependency rejection, I wrote:

"I'd welcome a simplified, but still concrete example where the impure/pure/impure sandwich described here isn't going to be possible."
Christer van der Meeren kindly replied with a suggestion.

The code in question relates to verification of user accounts. You can read the complete description in the linked comment, but I'll try to summarise it here. I'll then show a refactoring to a functional architecture - specifically, to an impure/pure/impure sandwich.

The code is available on GitHub.

Registration flow #

The system in question uses two-factor authentication with mobile phones. When you sign up for the service, you give your phone number. You then receive an SMS, and must use whatever is in that SMS to prove ownership of the phone number. Christer van der Meeren illustrates the flow like this:

A flowchart describing the workflow for completing a registration.

He also supplies sample code:

let completeRegistrationWorkflow
    (createProof: Mobile -> Async<ProofId>)
    (verifyProof: Mobile -> ProofId -> Async<bool>)
    (completeRegistration: Registration -> Async<unit>)
    (proofId: ProofId option)
    (registration: Registration)
    : Async<CompleteRegistrationResult> =
    async {
        match proofId with
        | None ->
            let! proofId = createProof registration.Mobile
            return ProofRequired proofId
        | Some proofId ->
            let! isValid = verifyProof registration.Mobile proofId
            if isValid then
                do! completeRegistration registration
                return RegistrationCompleted
            else
                let! proofId = createProof registration.Mobile
                return ProofRequired proofId
    }

While this is F#, it's not functional, since it uses partial application for dependency injection. From the description, I find it safe to assume that we can consider Async as a surrogate for IO.

The code implies the existence of other types. I decided to define them like this:

type Mobile = Mobile of int
type ProofId = ProofId of Guid
type Registration = { Mobile : Mobile }
type CompleteRegistrationResult = ProofRequired of ProofId | RegistrationCompleted

In reality, they're probably more complicated, but this is enough to make the code compile.

Is it possible to refactor completeRegistrationWorkflow to an impure/pure/impure sandwich?

Applicability #

It is possible to refactor completeRegistrationWorkflow to an impure/pure/impure sandwich. You'll see how to do that soon. Before we start that work, however, I'd like to warn against jumping to conclusions. It's possible that the problem statement doesn't capture some subtleties that one would have to deal with in the real world. It's also possible that I've misunderstood the essence of Christer van der Meeren's problem description.

It's (relatively) easy to teach the basics of programming. You teach a beginner about keywords, programming constructs, how to compile or interpret a program, and so on.

On the other hand, it's hard to write about dealing with complicated code. There are ways to make legacy code better, but the moves you have to make depend on myriad details. Complicated code is, by definition, something that's hard to learn. This means that truly complicated legacy code is rarely suitable for instructive examples. One has to strike a delicate balance and produce an example that looks complicated enough to warrant improvement, but on the other hand still be simple enough to be understood.

I think that Christer van der Meeren has struck that balance. With three dependencies, the sample code looks just complicated enough to warrant refactoring. On the other hand, you can understand what it's supposed to do in a few minutes. There's a risk, however, that the example is too simplified. That could weaken the result of the refactoring that follows. Could you still apply that refactoring if the problem was more complicated?

It's my experience that it's conspicuously often possible to implement an impure/pure/impure sandwich.

Fakes #

In the rest of this article, I want to show how to refactor completeRegistrationWorkflow to an impure/pure/impure sandwich. As Refactoring admonishes:

"to refactor, the essential precondition is [...] solid tests"

Martin Fowler
Right now, however, there are no tests, so I'm going to add some.

The tests will need some Test Doubles to stand in for the three dependency functions. If possible, I prefer state-based testing over interaction-based testing. First, then, we need some Fakes.

While completeRegistrationWorkflow takes three dependency functions, it looks as though there's only two architectural dependencies:

  • A two-factor authentication service
  • A registration database (or service)
Defining a Fake two-factor authentication object is the more complicated of the two, but still manageable:

type Fake2FA () =
    let mutable proofs = Map.empty
 
    member _.CreateProof mobile =
        match Map.tryFind mobile proofs with
        | Some (proofId, _) -> proofId
        | None ->
            let proofId = ProofId (Guid.NewGuid ())
            proofs <- Map.add mobile (proofId, false) proofs
            proofId
        |> fun proofId -> async { return proofId }
 
    member _.VerifyProof mobile proofId =
        match Map.tryFind mobile proofs with
        | Some (_, true) -> true
        | _ -> false
        |> fun b -> async { return b }
 
    member _.VerifyMobile mobile =
        match Map.tryFind mobile proofs with
        | Some (proofId, _) ->
            proofs <- Map.add mobile (proofId, true) proofs
        | _ -> ()

In F#, I find that the easiest way to model a mutable resource is to use an object. This one just keeps track of a collection of proofs. The CreateProof method fits the function signature of completeRegistrationWorkflow's createProof function argument. It looks for an existing proof for the mobile number so that it can reuse the same proof multiple times. If there's no proof for mobile, it creates a new Guid and returns it after having first added it to the collection.

Likewise, the VerifyProof method fits the type of the verifyProof function argument. Proofs are actually tuples of IDs and a flag that keeps track of whether or not they've been verified. The method returns the flag if it's there, and false otherwise.

The third VerifyMobile method is a test-specific functionality that enables a test to mark a proof as having been verified via two-factor authentication.

Compared to Fake2FA, the Fake registration database is simple:

type FakeRegistrationDB () =
    inherit Collection<Registration> ()
 
    member this.CompleteRegistration r = async { this.Add r }

Again, the CompleteRegistration method fits the completeRegistration function argument to completeRegistrationWorkflow. It just makes the inherited Add method Async.

Fixture creation #

My plan is to add Characterisation Tests so that I can refactor. I do, however, plan to change the API of the System Under Test (SUT). This could break the tests, which would defeat their purpose. To protect against this, I'll test against a Facade. Initially, this Facade will be equivalent to the completeRegistrationWorkflow function, but this will change as I refactor.

In addition to the SUT Facade, the tests will also need access to the 'injected' dependencies. You can address this by creating a Fixture Object:

let createFixture () =
    let twoFA = Fake2FA ()
    let db = FakeRegistrationDB ()
    let sut =
        completeRegistrationWorkflow
            twoFA.CreateProof
            twoFA.VerifyProof
            db.CompleteRegistration
    sut, twoFA, db

This function returns a triple of values: the SUT Facade and the two Fakes.

The SUT Facade is a partially applied function of the type ProofId option -> Registration -> Async<CompleteRegistrationResult>. In other words, it abstracts away the specifics about how impure actions are executed. It seems reasonable to imagine that the two remaining input arguments, ProofId option and Registration, are run-time values. Regardless of refactoring, the resulting function should be able to receive those arguments and produce the desired outcome.

Characterising the missing proof ID case #

It looks like the cyclomatic complexity of completeRegistrationWorkflow is 3, so you're going to need three Characterisation Tests. You can add them in any order you like, but in this case I found it natural to follow the order in which the branches are laid out in the SUT.

This test case verifies what happens if the proof ID is missing:

[<Theory>]
[<InlineData 123>]
[<InlineData 432>]
let ``Missing proof ID`` mobile = async {
    let sut, twoFA, db = createFixture ()
    let r = { Mobile = Mobile mobile }
 
    let! actual = sut None r
 
    let! expectedProofId = twoFA.CreateProof r.Mobile
    let expected = ProofRequired expectedProofId
    expected =! actual
    test <@ Seq.isEmpty db @> }

All the tests in this article use xUnit.net 2.4.0 with Unquote 5.0.0.

This test calls the sut Facade with a None proof ID and an arbitrary Registration r. Had I used a property-based testing framework such as FsCheck or Hedgehog, I could have made the Registration value itself an arbitrary test argument, but I thought that this was overkill for this situation.

In order to figure out the expectedProofId, the test relies on the behaviour of the Fake2FA class. The CreateProof method is idempotent, so calling it several times with the same number should return the same proof. In this test case, we expect the sut to have done so already, so calling the method once more from the test should return the same value that the SUT received. The test then wraps the proof ID in the ProofRequired case and uses Unquote's =! (must equal) operator to verify that expected is equal to actual.
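
To make the idempotence explicit, here's a small snippet - not part of the test suite, only a sketch - that yields true:

async {
    let twoFA = Fake2FA ()
    // Calling CreateProof twice with the same number reuses the stored proof.
    let! p1 = twoFA.CreateProof (Mobile 112)
    let! p2 = twoFA.CreateProof (Mobile 112)
    return p1 = p2 } // An Async<bool> that returns true when run.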

Finally, the test also verifies that the registration database remains empty.

Since this is a Characterisation Test it already passes, which makes it untrustworthy. How do I know that I didn't write a Tautological Assertion?

When I write Characterisation Tests, I always try to change the SUT to verify that the test fails for the appropriate reason. In order to fail the first assertion, I can make this change to the None branch of the SUT:

match proofId with
| None ->
    //let! proofId = createProof registration.Mobile
    let proofId = ProofId (Guid.NewGuid ())
    return ProofRequired proofId

This fails the expected =! actual assertion, as expected.

Likewise, you can fail the second assertion with this change:

match proofId with
| None ->
    do! completeRegistration registration
    let! proofId = createProof registration.Mobile
    return ProofRequired proofId

The addition of the completeRegistration statement causes the test <@ Seq.isEmpty db @> assertion to fail, again as expected.

Now I trust that test.

Characterising the valid proof ID case #

Next, you have the case where all is good. The proof ID is present and valid. You can characterise the behaviour with this test:

[<Theory>]
[<InlineData 987>]
[<InlineData 247>]
let ``Valid proof ID`` mobile = async {
    let sut, twoFA, db = createFixture ()
    let r = { Mobile = Mobile mobile }
    let! p = twoFA.CreateProof r.Mobile
    twoFA.VerifyMobile r.Mobile
 
    let! actual = sut (Some p) r
 
    RegistrationCompleted =! actual
    test <@ Seq.contains r db @> }

This test uses CreateProof to create a proof before the sut is exercised. It also uses the test-specific VerifyMobile method to mark the mobile number (and thereby the proof) as valid.

Again, there are two assertions: one against the return value actual, and one that verifies that the registration database db now contains the registration r.

As before, you can't trust a Characterisation Test before you've seen it fail, so first edit the isValid branch of the SUT like this:

if isValid then
    do! completeRegistration registration
    //return RegistrationCompleted
    return ProofRequired proofId

This fails the RegistrationCompleted =! actual assertion, as expected.

Now make this change:

if isValid then
    //do! completeRegistration registration
    return RegistrationCompleted

Now the test <@ Seq.contains r db @> assertion fails, as expected.

This test also seems trustworthy.

Characterising the invalid proof ID case #

The final test case is when a proof ID exists, but it's invalid:

[<Theory>]
[<InlineData 327>]
[<InlineData 666>]
let ``Invalid proof ID`` mobile = async {
    let sut, twoFA, db = createFixture ()
    let r = { Mobile = Mobile mobile }
    let! p = twoFA.CreateProof r.Mobile
 
    let! actual = sut (Some p) r
 
    let! expectedProofId = twoFA.CreateProof r.Mobile
    let expected = ProofRequired expectedProofId
    expected =! actual
    test <@ Seq.isEmpty db @> }

The arrange phase of the test is comparable to the previous test case. The only difference is that the new test doesn't invoke twoFA.VerifyMobile r.Mobile. This leaves the generated proof ID p invalid.

The assertions, on the other hand, are identical to those of the Missing proof ID test case, which means that you can make the same edits to the else branch as you can to the None branch, as described above. If you do that, the assertions fail as they're supposed to. You can also trust this Characterisation Test.

Eta expansion #

While I want to keep the SUT Facade's type unchanged, I do want to change the way I compose it. The goal is an impure/pure/impure sandwich: Do something impure first, then call a pure function with the data obtained, and finally do something impure with the output of the pure function.

This means that the composition is going to manipulate the input values to the SUT Facade. To make that easier, I perform an eta conversion on the sut:

let createFixture () =
    let twoFA = Fake2FA ()
    let db = FakeRegistrationDB ()
    let sut pid r =
        completeRegistrationWorkflow
            twoFA.CreateProof
            twoFA.VerifyProof
            db.CompleteRegistration
            pid
            r
    sut, twoFA, db

This doesn't change the behaviour or how the SUT is composed. It only makes the pid and r arguments explicitly visible.

Move proof verification #

When you consider the current implementation of completeRegistrationWorkflow, it seems that the impure actions are interleaved with the decision-making code. How to separate them?

The first opportunity that I identified was that it always calls verifyProof in the Some case. Whenever you want to call a method only in the Some case, but not in the None case, it suggests Option.map.
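
As a quick reminder of why Option.map fits, it applies a function in the Some case only, and leaves None untouched:

Option.map (fun x -> x * 2) (Some 3) // Some 6
Option.map (fun x -> x * 2) None     // None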

It should be possible to run Option.map (twoFA.VerifyProof r.Mobile) pid as the initial impure action of the impure/pure/impure sandwich. If that's possible, we could pass the output of that impure action as an argument to completeRegistrationWorkflow. That would already make it simpler:

let completeRegistrationWorkflow
    (createProof: Mobile -> Async<ProofId>)
    (completeRegistration: Registration -> Async<unit>)
    (proof: bool option)
    (registration: Registration)
    : Async<CompleteRegistrationResult> =
    async {
        match proof with
        | None ->
            let! proofId = createProof registration.Mobile
            return ProofRequired proofId
        | Some isValid ->
            if isValid then
                do! completeRegistration registration
                return RegistrationCompleted
            else
                let! proofId = createProof registration.Mobile
                return ProofRequired proofId
    }

Notice that by changing the proof argument to a bool option, you no longer need to call verifyProof, so you can remove it.

There's just one problem. The result of Option.map (twoFA.VerifyProof r.Mobile) pid is an Option<Async<bool>>, but you need an Option<bool>.

You can compose the SUT Facade in an asynchronous workflow, and use a let! binding, but that's not going to solve the problem. A let! binding only works when the outer container is Async. Here, the outermost container is Option. You're going to need to flip the containers around so that you get an Async<Option<bool>> that you can let!-bind:

let sut pid r = async {
    let! p =
        match Option.map (twoFA.VerifyProof r.Mobile) pid with
        | Some b -> async {
            let! b' = b
            return Some b' }
        | None -> async { return None }
    return! completeRegistrationWorkflow
        twoFA.CreateProof
        db.CompleteRegistration
        p
        r
    }

By pattern-matching on Option.map (twoFA.VerifyProof r.Mobile) pid, you can return one of two alternative asynchronous workflows.

Due to the let! binding, p is a bool option that you can pass to completeRegistrationWorkflow.

Traversal #

I know what you're going to say. You'll protest that I just moved complex behaviour out of completeRegistrationWorkflow. The implied assumption here is that completeRegistrationWorkflow is the top-level behaviour that you'd compose in a Composition Root. The createFixture function plays that role in this refactoring exercise.

You'd normally view the Composition Root as a Humble Object - an object that we accept isn't covered by tests because it has a cyclomatic complexity of one. This is no longer the case.

The conversion of Option<Async<bool>> to Async<Option<bool>> is, however, a well-known operation. In Haskell this is known as a traversal, and it's a completely generic operation:

// ('a -> Async<'b>) -> 'a option -> Async<'b option>
let traverse f = function
    | Some x -> async {
        let! x' = f x
        return Some x' }
    | None -> async { return None }

You can put this function in a general-purpose module called AsyncOption and cover it by unit tests if you will. You can even put this module in a separate library; it's perfectly decoupled from the specifics of the registration flow domain.

If you do that, completeRegistrationWorkflow doesn't change, but the composition does:

let sut pid r = async {
    let! p = AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    return! completeRegistrationWorkflow
        twoFA.CreateProof
        db.CompleteRegistration
        p
        r
    }

You're now back where you'd like to be: One impure action produces a value that you can pass to another function. There's no explicit branching in the code. The cyclomatic complexity remains one.

Change return type #

That first refactoring takes care of one out of three impure dependencies. Next, you can get rid of createProof. This one seems to be more difficult to get rid of. It doesn't seem to be required only in the Some case, so a map or traverse can't work. In both cases, however, the result of calling createProof is handled in exactly the same way.

Here's another common trick in functional programming: Decouple decisions from effects. Return a value that indicates the decision that the function reaches, and then let the second impure action of the impure/pure/impure sandwich act on the decision.

In this case, you can model your decision as a Mobile option. You might want to consider a more explicit type, in order to better communicate intent, but it's best to keep each refactoring step small:

let completeRegistrationWorkflow
    (completeRegistration: Registration -> Async<unit>)
    (proof: bool option)
    (registration: Registration)
    : Async<Mobile option> =
    async {
        match proof with
        | None -> return Some registration.Mobile
        | Some isValid ->
            if isValid then
                do! completeRegistration registration
                return None
            else
                return Some registration.Mobile
    }

Notice that the createProof dependency is no longer required. I've removed it from the argument list of completeRegistrationWorkflow.
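
As for the more explicit type mentioned above, a hypothetical sketch could look like the following. The names are invented here for illustration; the article sticks with Mobile option to keep the step small.

type RegistrationDecision =
    | RequestProofFor of Mobile
    | AcceptRegistration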

The composition now looks like this:

let createFixture () =
    let twoFA = Fake2FA ()
    let db = FakeRegistrationDB ()
    let sut pid r = async {
        let! p = AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
        let! res = completeRegistrationWorkflow db.CompleteRegistration p r
        let! pidr = AsyncOption.traverse twoFA.CreateProof res
        return pidr
            |> Option.map ProofRequired
            |> Option.defaultValue RegistrationCompleted }
    sut, twoFA, db

Thanks to the let! binding, the result res is a Mobile option. You can now let the twoFA.CreateProof method traverse over res. This produces an Async<Option<ProofId>> that you can let!-bind to pidr - a ProofId option.

You can use Option.map to wrap the ProofId value in a ProofRequired case, if it's there. This step of the final pipeline produces a CompleteRegistrationResult option.

Finally, you can use Option.defaultValue to fold the option into a CompleteRegistrationResult. The default value is RegistrationCompleted. This is the case value that'll be used if the option is None.
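
In isolation, the final two steps of that pipeline behave like this (the bindings exist only for illustration):

let someProofId = ProofId (Guid.NewGuid ())

let whenPresent =   // ProofRequired someProofId
    Some someProofId
    |> Option.map ProofRequired
    |> Option.defaultValue RegistrationCompleted

let whenAbsent =    // RegistrationCompleted
    None
    |> Option.map ProofRequired
    |> Option.defaultValue RegistrationCompleted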

Again, the composition has a cyclomatic complexity of one, and the type of the sut remains ProofId option -> Registration -> Async<CompleteRegistrationResult>. This is a true refactoring. The type of the SUT remains the same, and no behaviour changes. The tests still pass, even though I haven't had to edit them.

Change return type to Result #

Consider the intent of completeRegistrationWorkflow. The purpose of the operation is to complete a registration workflow. The name is quite explicit. Thus, the happy path is when the proof ID is valid and the function can call completeRegistration.

Usually, when you call a function that returns an option, the implied contract is that the Some case represents the happy path. That's not the case here. The Some case carries information about the error paths. This isn't idiomatic.

It'd be more appropriate to use a Result return value:

let completeRegistrationWorkflow
    (completeRegistration: Registration -> Async<unit>)
    (proof: bool option)
    (registration: Registration)
    : Async<Result<unit, Mobile>> =
    async {
        match proof with
        | None -> return Error registration.Mobile
        | Some isValid ->
            if isValid then
                do! completeRegistration registration
                return Ok ()
            else
                return Error registration.Mobile
    }

This change is in itself small, but it does require some changes to the composition. Just as you had to add an AsyncOption.traverse function when the return type was an option, you'll now have to add similar functionality to Result. Result is also known as Either. Not only is it a bifunctor, you can also traverse both axes. Haskell calls this a bitraversable functor.

// ('a -> Async<'b>) -> ('c -> Async<'d>) -> Result<'a,'c> -> Async<Result<'b,'d>>
let traverseBoth f g = function
    | Ok x -> async {
        let! x' = f x
        return Ok x' }
    | Error e -> async {
        let! e' = g e
        return Error e' }

Here I just decided to call the function traverseBoth and the module AsyncResult.

You're also going to need the equivalent of Option.defaultValue for Result. Something that translates both dimensions of Result into the same type. That's the Either catamorphism, so you could, for example, introduce another general-purpose function called cata:

// ('a -> 'b) -> ('c -> 'b) -> Result<'a,'c> -> 'b
let cata f g = function
    | Ok x -> f x
    | Error e -> g e

This is another entirely general-purpose function that you can put in a general-purpose module called Result, in a general-purpose library. You can also cover it by unit tests, if you like.

These two general-purpose functions enable you to compose the workflow:

let createFixture () =
    let twoFA = Fake2FA ()
    let db = FakeRegistrationDB ()
    let sut pid r = async {
        let! p = AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
        let! res = completeRegistrationWorkflow db.CompleteRegistration p r
        let! pidr =
            AsyncResult.traverseBoth
                (fun () -> async { return () })
                twoFA.CreateProof
                res
        return pidr
            |> Result.cata (fun () -> RegistrationCompleted) ProofRequired }
    sut, twoFA, db

This looks more confused than previous iterations. From here, though, it'll get better again. The first two lines of code are the same as before, but now res is a Result<unit, Mobile>. You still need to let twoFA.CreateProof traverse the 'error path', but now you also need to take care of the happy path.

In the Ok case you have a unit value (()), but traverseBoth expects its f and g functions to return Async values. I could have fixed that with a more specialised traverseError function, but we'll soon move on from here, so it's hardly worthwhile.

In Haskell, you can 'elevate' a value simply with the pure function, but in F#, you need the more cumbersome (fun () -> async { return () }) to achieve the same effect.
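
If the lambda bothers you, one option is to add yet another general-purpose function to the Async module. This is only a sketch, and the name retn is my own invention, chosen to mirror Haskell's pure:

// 'a -> Async<'a>
let retn x = async { return x }

With that in place, the first argument to traverseBoth could simply be Async.retn.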

The traversal produces pidr (for Proof ID Result) - a Result<unit, ProofId> value.

Finally, it uses Result.cata to turn both the Ok and Error dimensions into a single CompleteRegistrationResult that can be returned.

Removing the last dependency #

There's still one dependency left: the completeRegistration function, but it's now trivial to remove. Instead of calling the dependency function from within completeRegistrationWorkflow you can use the same trick as before. Decouple the decision from the effect.

Return information about the decision the function made. In the above incarnation of the code, the Ok dimension is currently empty, since it only returns unit. You can use that 'channel' to communicate that you decided to complete a registration:

let completeRegistrationWorkflow
    (proof: bool option)
    (registration: Registration)
    : Async<Result<Registration, Mobile>> =
    async {
        match proof with
        | None -> return Error registration.Mobile
        | Some isValid ->
            if isValid then
                return Ok registration
            else
                return Error registration.Mobile
    }

This is another small change. When isValid is true, the function no longer calls completeRegistration. Instead, it returns Ok registration. This means that the return type is now Async<Result<Registration, Mobile>>. It also means that you can remove the completeRegistration function argument.

In order to compose this variation, you need one new general-purpose function. Perhaps you find this barrage of general-purpose functions exhausting, but it's an artefact of a design philosophy of the F# language. The F# base library contains only a few general-purpose functions. Contrast this with GHC's base library, which comes with all of these functions built in.

The new function is like Result.cata, but over Async<Result<_>>.

// ('a -> 'b) -> ('c -> 'b) -> Async<Result<'a,'c>> -> Async<'b>
let cata f g r = async {
    let! r' = r
    return Result.cata f g r' }

Since this function does conceptually the same as Result.cata I decided to retain the name cata and just put it in the AsyncResult module. (This may not be strictly correct, as I haven't really given a lot of thought to what a catamorphism for Async would look like, if one exists. I'm open to suggestions about better naming. After all, cata is hardly an idiomatic F# name.)

With AsyncResult.cata you can now compose the system:

let sut pid r = async {
    let! p = AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    let! res = completeRegistrationWorkflow p r
    return!
        res
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

Not only did the call to completeRegistrationWorkflow get even simpler, but you also now avoid the awkwardly named pidr value. Thanks to the let! binding, res has the type Result<Registration, Mobile>.

Note that you can now let both impure actions (db.CompleteRegistration and twoFA.CreateProof) traverse the result. This step produces an Async<Result<unit, ProofId>> that's immediately piped to AsyncResult.cata. This reduces the two alternative dimensions of the Result to a single Async<CompleteRegistrationResult> value.

The completeRegistrationWorkflow function now begs to be further simplified.

Pure registration workflow #

Once you remove all dependencies, your domain logic doesn't have to be asynchronous. Nothing asynchronous happens in completeRegistrationWorkflow, so simplify it:

let completeRegistrationWorkflow
    (proof: bool option)
    (registration: Registration)
    : Result<Registration, Mobile> =
    match proof with
    | None -> Error registration.Mobile
    | Some isValid ->
        if isValid then Ok registration
        else Error registration.Mobile

Gone is the async computation expression, including the return keyword. This is now a pure function.

You'll have to adjust the composition once more, but it's only a minor change:

let sut pid r = async {
    let! p = AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    return!
        completeRegistrationWorkflow p r
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

The result of invoking completeRegistrationWorkflow is no longer an Async value, so there's no reason to let!-bind it. Instead, you can call it and immediately pipe its output to AsyncResult.traverseBoth.

DRY #

Consider completeRegistrationWorkflow. Can you make it simpler?

At this point it should be evident that two of the branches contain duplicate code. Applying the DRY principle you can simplify it:

let completeRegistrationWorkflow
    (proof: bool option)
    (registration: Registration)
    : Result<Registration, Mobile> =
    match proof with
    | Some true -> Ok registration
    | _ -> Error registration.Mobile

I'm not too fond of this style of type annotation for simple functions like this, so I'd like to remove it:

let completeRegistrationWorkflow proof registration =
    match proof with
    | Some true -> Ok registration
    | _ -> Error registration.Mobile

These two steps are pure refactorings: they only reorganise the code that implements completeRegistrationWorkflow, so the composition doesn't change.

Essential complexity #

While reading this article, you may have felt frustration gather. This is cheating! You took out all of the complexity. Now there's nothing left! You're likely to feel that I've moved a lot of behaviour into untestable code. I've done nothing of the sort.

I'll remind you that while functions like AsyncOption.traverse and AsyncResult.cata do contain branching behaviour, they can be tested. In fact, since they're pure functions, they're intrinsically testable.
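
As an example, tests of AsyncOption.traverse could look like this. This is only a sketch, following the same xUnit.net and Unquote pattern as the other tests in this article; the test names are mine.

[<Fact>]
let ``traverse maps the Some case`` () = async {
    let! actual = AsyncOption.traverse (fun x -> async { return x * 10 }) (Some 4)
    Some 40 =! actual }

[<Fact>]
let ``traverse preserves None`` () = async {
    let! actual = AsyncOption.traverse (fun (x : int) -> async { return x * 10 }) None
    None =! actual }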

It's true that a composition of a pure function with its impure dependencies may not be (unit) testable, but that's also true for a Dependency Injection-based object graph composed in a Composition Root.

Compositions of functions may look non-trivial, but to a degree, the type system will assist you. If your composition compiles, it's likely that you've composed the impure/pure/impure sandwich correctly.

Did I take out all the complexity? I didn't. There's a bit left; the function now has a cyclomatic complexity of two. If you look at the original function, you'll see that the duplication was there all along. Once you remove all the accidental complexity, you uncover the essential complexity. This happens to me so often when I apply functional programming principles that I fancy that functional programming is a silver bullet.

Pipeline composition #

We're mostly done now. The problem now appears in all its simplicity, and you have an impure/pure/impure sandwich.

You can still improve the code, though.

If you consider the current composition, you may find that p isn't the best variable name. I admit that I struggled with naming that variable. Sometimes, variable names are in the way and the code might be clearer if you could elide them by composing a pipeline of functions.

That's always worth an attempt. This time, ultimately I find that it doesn't improve things, but even an attempt can be illustrative.

If you want to eliminate a named value, you can often do so by piping the output of the function that produced the variable directly to the next function. This does, however, require that the function argument is the right-most. Currently, that's not the case. registration is right-most, and proof is to the left.

There's no compelling reason that the arguments should come in that order, so flip them:

let completeRegistrationWorkflow registration proof =
    match proof with
    | Some true -> Ok registration
    | _ -> Error registration.Mobile

This enables you to write the entire composition as a single pipeline:

let sut pid r = async {
    return!
        AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
        |> Async.map (completeRegistrationWorkflow r)
        |> Async.bind (
            AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
            >> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired)
    }

This does, however, call for two new general-purpose functions: Async.map and Async.bind:

// ('a -> 'b) -> Async<'a> -> Async<'b>
let map f x = async {
    let! x' = x
    return f x' }
 
// ('a -> Async<'b>) -> Async<'a> -> Async<'b>
let bind f x = async {
    let! x' = x
    return! f x' }

In my opinion, these functions ought to belong to F#'s Async module, but for reasons that aren't clear to me, they don't. As you can see, though, they're easy to add.

While this change gets rid of the p variable, I don't think it makes the overall composition easier to understand. The act of swapping the function arguments does, however, enable another simplification.

Eta reduction #

Now that proof is completeRegistrationWorkflow's last function argument, you can perform an eta reduction:

let completeRegistrationWorkflow registration = function
    | Some true -> Ok registration
    | _ -> Error registration.Mobile

Not everyone is a fan of the point-free style, but I like it. YMMV.

Sandwich #

Regardless of whether you prefer completeRegistrationWorkflow in point-free or pointed style, I think that the composition needs improvement. It should explicitly communicate that it's an impure/pure/impure sandwich. This makes it necessary to reintroduce some variables, so I'm also going to bite the bullet and devise some better names.

let sut pid r = async {
    let! validityOfProof = 
        AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    let decision = completeRegistrationWorkflow r validityOfProof
    return!
        decision
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

Instead of p, I decided to call the first value validityOfProof. This is the result of the first impure action in the sandwich (the upper slice of bread).

While validityOfProof is the result of an impure action, the value itself is pure and can be used as input to completeRegistrationWorkflow. This is the pure part of the sandwich. I called the output decision because the workflow makes a decision based on its input, and it's up to the caller to act on that decision.

Notice that decision is bound with a let binding (instead of a let! binding), despite taking place inside an async workflow. This is because completeRegistrationWorkflow is pure. It doesn't return an Async value.

The second impure action acts on decision through a pipeline of AsyncResult.traverseBoth and AsyncResult.cata, as previously explained.

I think that the impure/pure/impure sandwich is more visible like this, so that was my final edit. I'm happy with how it looks now.

Conclusion #

I don't claim that you can always refactor code to an impure/pure/impure sandwich. In fact, I can easily envision categories of software where such an architecture seems impossible.

Still, I find it intriguing that when I find myself in the realm of web services or message-based applications, I can't recall a case where a sandwich has been impossible. Surely, there must be cases where it is so. That's the reason that I solicit examples. This article was a response to such an example. I found it fruitful, because it enabled me to discuss several useful techniques for composing behaviour in a functional architecture. On the other hand, it failed to be a counter-example.

I'm sure that some readers are left with a nagging doubt. That's all very impressive, but would you actually write code like that in a piece of production software?

If it was up to me, then: yes. I find that when I can keep code pure, it's trivial to unit test and there's no test-induced damage. Functions also compose in a way objects don't easily do, so there are many advantages to functional programming. I'll take them when they're available.

As always, context matters. I've been in team settings where other team members would embrace this style of programming, and in other environments where team members wouldn't understand what was going on. In the latter case, I'd adjust my approach to challenge, not alienate, other team members.

My intention with this article was to show what's possible, not to dictate what you should do. That's up to you.

This article is the December 2 entry in the F# Advent Calendar in English 2019.


Comments

Thank you so much for the comprehensive reply to my comment. It was very instructive to see the refactoring process, from thought to code. The post is an excellent reply to the question I asked.

A slight modification

In my original comment, I made one simplification that, in hindsight, I perhaps should not have made. It is not critical, but it complicates things slightly. In reality, the completeRegistration function does not return Async<unit>, but Async<Result<unit, CompleteRegistrationError>> (where, currently, CompleteRegistrationError has the single case UserExists, returned if the DB throws a unique constraint error).

As I see it, the impact of this to your refactoring is two-fold:

  • You can't easily use AsyncResult.traverseBoth, since the signatures between the two cases aren't compatible (unless you want to mess around with nested Result values). You could write a custom traverse function just for the needed signature, but then we’ve traveled well into the lands of “generic does not imply general”.
  • It might be better to model the registration result (completed vs. proof required) as its own DU, with Result being reserved for actual errors.

Evaluating the refactoring

My original comment ended in the following question (emphasis added):

Is it possible to refactor this to direct input/output, in a way that actually reduces complexity where it matters?

With this (vague) question and the above modifications in mind, let's look at the relevant code before/after. In both cases, there are two functions: The workflow/logic, and the composition.

Before

Before refactoring, we have a slightly complex impure workflow (which still is fairly easily testable using state-based testing, as you so aptly demonstrated) – note the asyncResult CE (I’m using the excellent FsToolkit.ErrorHandling, if anyone wonders) and the updated signatures; otherwise it’s the same:

let completeRegistrationWorkflow
    (createProof: Mobile -> Async<ProofId>)
    (verifyProof: Mobile -> ProofId -> Async<bool>)
    (completeRegistration: Registration -> Async<Result<unit, CompleteRegistrationError>>)
    (proofId: ProofId option)
    (registration: Registration)
    : Async<Result<CompleteRegistrationResult, CompleteRegistrationError>> =
  asyncResult {
    match proofId with
    | None ->
        let! proofId = createProof registration.Mobile
        return ProofRequired proofId
    | Some proofId ->
        let! isValid = verifyProof registration.Mobile proofId
        if isValid then
          do! completeRegistration registration
          return RegistrationCompleted
        else
          let! proofId = createProof registration.Mobile
          return ProofRequired proofId
  }

Secondly, we have the trivial "humble object" composition, which looks like this:

let complete proofId validReg =
  Workflows.Registration.complete
    Http.createMobileClaimProof
    Http.verifyMobileClaimProof
    Db.completeRegistration
    proofId
    validReg

The composition is, indeed, humble – the only thing it does is call the higher-order workflow function with the correct parameters. It has no cyclomatic complexity and is trivial to read, and I don't think anyone would consider it necessary to test.

After

After refactoring, we have the almost trivial pure function we extracted (for simplicity I let it return Result here, as you proposed):

let completePure reg proofValidity =
  match proofValidity with
  | Some true -> Ok reg
  | Some false | None -> Error reg.Mobile

Secondly, we have the composition function. Now, with the modification to completeRegistration (returning Async<Result<_,_>>), it can't as easily be written in point-free style. You might certainly be able to improve it, but here is my quick initial take.

let complete proofId reg : Async<Result<CompleteRegistrationResult, CompleteRegistrationError>> =
  asyncResult {
    let! proofValidity =
      proofId |> Option.traverseAsync (Http.verifyMobileClaimProof reg.Mobile)

    match completePure reg proofValidity with
    | Ok reg ->
        do! Db.completeRegistration reg
        return RegistrationCompleted
    | Error mobile ->
        let! proofId = Http.createMobileClaimProof mobile
        return ProofRequired proofId
  }

Evaluation

Now that we have presented the code before/after, let us take stock of what we have gained and lost by the refactoring.

Pros:

  • We have gotten rid of the "DI workflow" entirely
  • More of the logic is pure

Cons:

  • The logic we extracted to a pure function is almost trivial. This is not in itself bad, but one can wonder whether it was worth it (apart from the purely instructive aspects).
  • If the extracted logic is pure, where then did the rest of the complexity go? The only place it could – it ended up in the "composition", i.e. the "humble object". The composition function isn't just calling a higher-order function with the correct function arguments any more; it has higher cyclomatic complexity and is much harder to read, and can't be easily tested (since it's a composition function). The new composition is, so to say, quite a bit less humble than the original composition. This is particularly evident in my updated version, but personally I also have to look at your simpler(?), point-free version a couple of times to convince myself that it is, really, not doing anything wrong. (Though regardless of whether a function is written point-free or not, it does the exact same thing and has the same complexity.)
  • To the point above: The composition function needs many "complex" helper functions that would likely confuse, if not outright alienate beginner F# devs (which could, for example, lead to worse onboarding). This is particularly relevant for non-standard functions like AsyncOption.traverse, AsyncResult.traverseBoth, AsyncResult.cata, etc.

Returning to my initial question: Does the refactoring “reduce complexity where it matters?“ I’m not sure. This is (at least partly) “personal opinions” territory, of course, and my vague question doesn’t help. But personally I find the result of the refactoring more complex to understand than the original, DI workflow-based version.

Based on Scott Wlaschin’s book Domain Modelling Made Functional, it’s possible he might agree. He seems very fond of the “DI workflow” approach there. I personally prefer a bit more dependency rejection than that, because I find “DR”/sandwiches often leads to simpler code, but in this particular case, I may prefer the impure DI workflow, tested using state-based testing. At least for the more complex code I described, but perhaps also for your original example.

Still, I truly appreciate your taking the time to respond in this manner. It was very instructive, as always, which was after all the point. And you’re welcome to share any insights regarding this comment, too.

2019-12-03 13:46 UTC

Christer, thank you for writing. This is great! One of your comments inspires me to compose another article that I've long wanted to write. If I manage to produce it in time, I'll publish it Monday. Once that's done, I'll respond here in a more thorough manner.

When I do that, however, I don't plan to reproduce your updated example, or address it in detail. I see nothing in it that invalidates what I've already written. As far as I can tell, you don't need to explicitly pattern-match on completePure reg proofValidity. You should be able to map or traverse over it, as already shown. If you want my help with the details, I'll be happy to do so, but then please prepare a minimal working example like I did for this article. You can either fork my example or make a new repository.

2019-12-04 8:35 UTC

This is a fantastic post, Mark! Thank you very much for going step by step while explaining how you refactored this code.

I find it intriguing that when I find myself in the realm of web services or message-based applications, I can't recall a case where an [impure/pure/impure] sandwich has been impossible. Surely, there must be cases where it is so. That's the reason that I solicit examples.

I would like to suggest an example in the realm of web services or message-based applications that cannot be expressed as an impure/pure/impure sandwich.

Let's call an "impure/pure/impure sandwich" an impure-pure-impure composition. More generally, any impure function can be expressed as a composition of the form [pure-]impure(-pure-impure)*[-pure]. That is, (1) it might begin with a pure step, then (2) there is an impure step, then (3) there is a sequence of length zero or more containing a pure step followed by an impure step, and lastly (4) it might end with another pure step. One reason an impure function might intentionally be expressed by a composition that ends with a pure step is to erase sensitive information from the memory hierarchy. For simplicity though, let's assume that any impure function can be refactored so that the corresponding composition ends with an impure step. Let the length of a composition be one plus the number of dashes (-) that it contains.

Suppose f is a function with an impure-pure-impure composition such that f cannot be refactored to a function with a composition of a smaller length. Then there exists a function f' with a pure-impure-pure-impure composition. The construction uses public-key cryptography. I think this is a natural and practical example.

Here is the definition of f' in words. The user sends to the server ciphertext encrypted using the server's public key. The user's request is received by a process that already has the server's private key loaded into memory. This process decrypts the user's ciphertext using its private key to obtain some plaintext p. This step is pure. Then the process passes p into f.

Using symmetric-key cryptography, it is possible to construct a function with a composition of an arbitrarily high length. The following construction reminds me of how onion routing works (though each decryption in that case is intended to happen in a different process on a different server). I admit that this example is not very natural or practical.

Suppose f is a function with a composition of length n. Then there exists a function f' with a composition of length greater than n. Specifically, if the original composition starts with a pure step, then the length is larger by one; if the original composition starts with an impure step, then the length is larger by two.

Here is the definition of f' in words. The user sends to the server an ID and ciphertext encrypted using a symmetric key that corresponds to the ID. The user's request is received by a process that does not have any keys loaded into memory. First, this process obtains from disk the appropriate symmetric key using the ID. This step is impure. Then this process decrypts the user's ciphertext using this key to obtain some plaintext p. This step is pure. Then the process passes p into f.

2019-12-06 17:21 UTC

Tyson, thank you for writing. Unfortunately, I don't follow your chain of reasoning. Cryptography strikes me as fitting the impure/pure/impure sandwich architecture quite well. There's definitely an initial impure step because you have to initialise a random number generator, as well as load keys, salts, and whatnot from storage. From there, though, the cryptographic algorithms are, as far as I'm aware, pure calculation. I don't see how asymmetric cryptography changes that.

The reason that I'm soliciting examples that defy the impure/pure/impure sandwich architecture, however, is that I'm looking for a compelling example. What to do when the sandwich architecture is impossible is a frequently asked question. To be clear, I know what to do in that situation, but I'd like to write an article that answers the question in a compelling way. For that, I need an example that an uninitiated reader can follow.

2019-12-07 10:24 UTC

Sorry that my explanation was unclear. I should have included an example.

Cryptography strikes me as fitting the impure/pure/impure sandwich architecture quite well. There's definitely an initial impure step because you have to initialise a random number generator, as well as load keys, salts, and whatnot from storage. From there, though, the cryptographic algorithms are, as far as I'm aware, pure calculation. I don't see how asymmetric cryptography changes that.

I agree that cryptographic algorithms are pure. From your quote that I included above, I get the impression that you have neglected to consider what computation is to be done with the output of the cryptographic algorithm.

Here is a specific example of my first construction, which uses public-key cryptography. Consider the function sut that concluded your post. I repeat it for clarity.

let sut pid r = async {
    let! validityOfProof = 
        AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    let decision = completeRegistrationWorkflow r validityOfProof
    return!
        decision
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

Let sut be the function f in that construction. In particular, sut is an impure/pure/impure sandwich, or equivalently an impure-pure-impure composition that of course has length 3. Furthermore, I think it is clear that this behavior cannot be expressed as a pure-impure-pure composition, a pure-impure composition, an impure-pure composition, or an impure composition. You worked very hard to simplify that code, and I believe an implicit claim of yours is that it cannot be simplified any further.

In this case, f' would be the following function.

let privateKey = ...
let sut' ciphertext = async {
    let (pid, r) = decrypt privateKey ciphertext
    let! validityOfProof = 
        AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    let decision = completeRegistrationWorkflow r validityOfProof
    return!
        decision
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

As I defined it, this function is a pure-impure-pure-impure composition, which has length 4. Maybe in your jargon you would call this a pure/impure/pure/impure sandwich. My claim is that this function cannot be refactored into an impure/pure/impure sandwich.

Do you think that my claim is correct?

2019-12-07 14:48 UTC

Tyson, thank you for your patience with me. Now I get it. As stated, your composition looks like a pure-impure-pure-impure composition, but unless you hard-code privateKey, you'll have to load that value, which is an impure operation. That would make it an impure-pure-impure-pure-impure composition.

The decryption step itself is an impure-pure composition, assuming that we need to load keys, salts, etc. from persistent storage. You might also want to think of it as a 'mostly' pure function, since you could probably load decryption keys once when the application process starts, and keep them around for its entire lifetime.

It's a correct example of a more involved interaction model. Thank you for supplying it. Unfortunately, it's not one I can use for an article. Like other cross-cutting concerns, such as caching, logging, retry mechanisms, etcetera, security can be abstracted away as middleware. This implies that you'd have a middleware action that's implemented as an impure-pure-impure sandwich, and an application feature that's implemented as another impure-pure-impure sandwich. These two sandwiches are unrelated. A change to one of them is unlikely to trigger a change in the other. Thus, we can still base our application architecture on the notion of the impure-pure-impure sandwich.

I hope I've explained my demurral in a sensible way.

2019-12-08 13:24 UTC
This implies that you'd have a middleware action that's implemented as an impure-pure-impure sandwich, and an application feature that's implemented as another impure-pure-impure sandwich. These two sandwiches are unrelated. A change to one of them is unlikely to trigger a change in the other.

They are unrelated semantically. Syntactically, the whole application sandwich is the last piece of impure bread on the middleware sandwich. This reminds me of a thought I have had and also heard recently, which is that the structure of code is like a fractal.

Anyway, I am hearing you say that you want functions to have "one responsibility", to do "one thing", to change for "one reason". With that constraint satisfied, you are requesting an example of a function that is not an impure/pure/impure sandwich. I am up to that challenge. Here is another attempt.

Suppose our job is to implement a man-in-the-middle attack in the style of Schneier's Chess Grandmaster Problem in which Alice and Bob know that they are communicating with Malory while Malory simply repeats what she hears to the other person. Specifically, Alice is a client and Bob is a server. Malory acts like a server to Alice and like a client to Bob. The function would look something like this.

let malroyInTheMiddle aliceToMalory = async {
    let maloryToBob = convertIncoming aliceToMalory
    let! bobToMalory = service maloryToBob
    let maloryToAlice = convertOutgoing bobToMalory
    return maloryToAlice
}

This is a pure-impure-pure composition, which is different from an impure-pure-impure composition.

2019-12-09 14:28 UTC

Tyson, thank you for writing. The way I understand it, we are to assume that both convertIncoming and convertOutgoing are complicated functions that require substantial testing to get right. Under that assumption, I think that you're right. This doesn't directly fit the impure-pure-impure sandwich architecture.

It does, however, fit a simple function composition. As far as I can see, it's equivalent to something like this:

let malroyInTheMiddle  =
    Async.fromResult
    >> Async.map convertIncoming
    >> Async.bind service
    >> Async.map convertOutgoing

I haven't tested it, but I'd imagine it to be something like that.
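Note that Async.fromResult, Async.map, and Async.bind aren't in FSharp.Core either; treat them as assumed helpers along these lines:

module Async =
    // Assumed helpers; FSharp.Core doesn't ship these, but they're easy to define.
    let fromResult (x: 'a) : Async<'a> =
        async { return x }

    let map (f: 'a -> 'b) (m: Async<'a>) : Async<'b> =
        async {
            let! x = m
            return f x
        }

    let bind (f: 'a -> Async<'b>) (m: Async<'a>) : Async<'b> =
        async {
            let! x = m
            return! f x
        }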

To nitpick, this isn't a pure-impure-pure composition, but rather an impure-pure-impure-pure-impure composition. The entry point of a system is always impure, as is the output.

2019-12-10 11:29 UTC

Christer, I'd hoped that I'd already addressed some of your concerns in the article itself, but I may not have done a good enough job of it. Overall, given a question like

"Is it possible to refactor this to direct input/output, in a way that actually reduces complexity where it matters?"
I tend to put emphasis on is it possible. Not that the rest of the question is unimportant, but it's more subjective. Perhaps you find that my article didn't answer your question, but I hope at least that I managed to establish that, yes, it's possible to refactor to an impure-pure-impure sandwich.

Does it matter? I think it does, but that's subjective. I do think, though, that I can objectively say that my refactoring is functional, whereas passing impure functions as arguments isn't. Whether or not an architecture ought to be functional is, again, subjective. No-one says that it has to be. That's up to you.

As I wrote in my preliminary response, I'm not going to address your modification. I don't see that it matters. Even when you return an Async<Result<_,_>> you can map, bind, or traverse over it. You may not be able to use AsyncResult.traverseBoth, but you can derive specialisations like AsyncResult.traverseOk and AsyncResult.traverseError.
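As a rough sketch (the names come from the paragraph above, but these particular implementations are only one way to write them), such specialisations could look like this:

module AsyncResult =
    // Traverse the Ok case only: run an Async-returning function over the Ok
    // value, leaving any Error untouched.
    let traverseOk (f: 'a -> Async<'b>) (r: Result<'a, 'e>) : Async<Result<'b, 'e>> =
        match r with
        | Error e -> async { return Error e }
        | Ok x ->
            async {
                let! y = f x
                return Ok y
            }

    // Traverse the Error case only: run an Async-returning function over the
    // Error value, leaving any Ok untouched.
    let traverseError (f: 'e -> Async<'f>) (r: Result<'a, 'e>) : Async<Result<'a, 'f>> =
        match r with
        | Ok x -> async { return Ok x }
        | Error e ->
            async {
                let! y = f e
                return Error y
            }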

First, like you did, I find it illustrative to juxtapose the alternatives. I'm going to use the original example first:

let completeRegistrationWorkflow
    (createProof: Mobile -> Async<ProofId>)
    (verifyProof: Mobile -> ProofId -> Async<bool>)
    (completeRegistration: Registration -> Async<unit>)
    (proofId: ProofId option)
    (registration: Registration)
    : Async<CompleteRegistrationResult> =
    async {
        match proofId with
        | None ->
            let! proofId = createProof registration.Mobile
            return ProofRequired proofId
        | Some proofId ->
            let! isValid = verifyProof registration.Mobile proofId
            if isValid then
                do! completeRegistration registration
                return RegistrationCompleted
            else
                let! proofId = createProof registration.Mobile
                return ProofRequired proofId
    }

let sut =
    completeRegistrationWorkflow
        twoFA.CreateProof
        twoFA.VerifyProof
        db.CompleteRegistration

In contrast, here's my refactoring:

let completeRegistrationWorkflow registration = function
    | Some true -> Ok registration
    | _ -> Error registration.Mobile

let sut pid r = async {
    let! validityOfProof = 
        AsyncOption.traverse (twoFA.VerifyProof r.Mobile) pid
    let decision = completeRegistrationWorkflow r validityOfProof
    return!
        decision
        |> AsyncResult.traverseBoth db.CompleteRegistration twoFA.CreateProof
        |> AsyncResult.cata (fun () -> RegistrationCompleted) ProofRequired
    }

It's true that my composition (sut) seems more involved than yours, but the overall trade-off looks good to me. In total, the code is simpler.

In your evaluation, you make some claims that I'd like to specifically address. Most are reasonable, but I think a few require special attention:

"The logic we extracted to a pure function is almost trivial."
Indeed. This should be listed as a pro, not a con. Whether or not it's worth it is a different discussion.

As I wrote in the article:

"If you look at the original function, you'll see that the duplication was there all along. Once you remove all the accidental complexity, you uncover the essential complexity."
(Emphasis from the original.) Isn't it always worth it to take away accidental complexity?
"The composition function isn't just calling a higher-order function with the correct function arguments any more; it has higher cyclomatic complexity"
No, it doesn't. It has a cyclomatic complexity of 1, exactly like the original humble object.
"can't be easily tested"
True, but neither can the original humble object.
"The new composition is, so to say, quite a bit less humble than the original composition."
According to which criterion? It has the same cyclomatic complexity, but I admit that more characters went into typing it. On the other hand, the composition juxtaposed with the actual function has far fewer characters than the original example.

You also write that

"The composition function needs many "complex" helper functions [...]. This is particularly relevant for non-standard functions like AsyncOption.traverse, AsyncResult.traverseBoth, AsyncResult.cata, etc."
I don't like the epithet non-standard. It's true that these functions aren't in FSharp.Core, for reasons that aren't clear to me. In comparison, they're part of the standard base library in Haskell.

There's nothing non-standard about functions like these. Like map and bind, traversals and catamorphisms are universal abstractions. They exist independently of particular programming languages or library packages.

I think that it's fair criticism that they may not be friendly to absolute beginners, but they're still fairly basic ideas that address real needs. The same can be said for the asyncResult computation expression that you've decided to use. It's also 'non-standard' only in the sense that it's not part of FSharp.Core, but otherwise standard in that it's just a stack of monads, and plenty of libraries supply that functionality. You can also write that computation expression yourself in a dozen lines of code.
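As an illustration, here's a minimal sketch of such a builder. Real implementations add more members and overloads (for example to bind plain Async or Result values directly), so consider this an approximation rather than a definitive version:

type AsyncResultBuilder() =
    // The essential members: Return lifts a value, Bind sequences two
    // Async<Result<_,_>> computations, short-circuiting on Error.
    member _.Return(x: 'a) : Async<Result<'a, 'e>> =
        async { return Ok x }

    member _.ReturnFrom(m: Async<Result<'a, 'e>>) : Async<Result<'a, 'e>> =
        m

    member _.Zero() : Async<Result<unit, 'e>> =
        async { return Ok () }

    member _.Bind(m: Async<Result<'a, 'e>>, f: 'a -> Async<Result<'b, 'e>>) : Async<Result<'b, 'e>> =
        async {
            let! r = m
            match r with
            | Ok x -> return! f x
            | Error e -> return Error e
        }

let asyncResult = AsyncResultBuilder()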

In the end, all of this is subjective. As I also wrote in my conclusion:

"I've been in team settings where [...] team members wouldn't understand what was going on. In the latter case, I'd adjust my approach to challenge, not alienate, other team members.

"My intention with this article was to show what's possible, not to dictate what you should do."

What I do, however, think is important to realise is that what I suggest is to learn a set of concepts once. Once you understand functors, monads, traversals etcetera, that's knowledge that applies to F#, Haskell, C#, JavaScript (I suppose) and so on.

Personally, I find it a better investment of my time to learn a general concept once, and then work with trivial code, rather than having to learn, over and over again, how to deal with each new code base's accidental complexity.

2019-12-12 2:56 UTC

Mark, thank you for getting back to me with a detailed response.

First, a general remark. I see that my comment might have been a bit "sharp around the edges" and phrased somewhat carelessly, giving the impression that I was not happy with your treatment of my example. I’d just like to clearly state that I am. You replied in your usual clear manner to exactly the question I posed, and seeing your process and solution was instructive for me.

We are all learning, all the time, and if I use a strong voice, that is primarily because strong opinions, weakly held often seems to be a fruitful way to drive discussion and learning.

With that in mind, allow me to address some of your remarks and possibly soften my previous comment.

Perhaps you find that my article didn't answer your question, but I hope at least that I managed to establish that, yes, it's possible to refactor to an impure-pure-impure sandwich.

Your article did indeed answer my question. My takeaway (then, not necessarily now after reading the rest of your comment) was that you managed to refactor to impure-pure-impure at the "expense" of making the non-pure part harder to understand. But as you say, that’s subjective, and your remarks on that later in your comment were a good point of reflection for me. I’ll get to that later.

Does it matter? I think it does, but that's subjective. I do think, though, that I can objectively say that my refactoring is functional, whereas passing impure functions as arguments isn't. Whether or not an architecture ought to be functional is, again, subjective. No-one says that it has to be. That's up to you.

I agree on all points.

First, like you did, I find it illustrative to juxtapose the alternatives.

I don’t agree 100% that it’s a completely fair comparison, since you’re leaving out the implementations of AsyncOption.traverse, AsyncResult.traverseBoth, and AsyncResult.cata. However, I get why you are doing it. These are generic utility functions for universal concepts that, as you say later in your comment, you “learn once”. In that respect, it’s fair to leave them out. My only issue with it is that since F# doesn’t have higher-kinded types, these utility functions have to be specific to the monads and monad stacks in use. I originally thought this made such functions less understandable and less useful, but after reading the rest of your comment, I’m not sure they are. More on that below.

In your evaluation

(Emphasis yours.) Just in case: “Evaluation” might have been a poor choice of words. I hope you did not take it to mean that I was a teacher grading a student’s test. This was not in any way intended personally (e.g. evaluating "your solution"). I was merely looking to sum up my subjective opinions about the refactoring.

Isn't it always worth it to take away accidental complexity?

I find it hard to say an unequivocal "yes" to such general statements. Ultimately it depends on the specific context and tradeoffs involved. If the context is “a codebase to be used for onboarding new F# devs” and the tradeoffs are “use generic helper functions to traverse bifunctors in a stack of monads”, then I’m not sure. (It may still be, but it’s certainly not a given.)

But generally, though I haven’t reflected deeply on this, I’m sure you’re right that it’s worthwhile to always take away accidental complexity.

No, it doesn't. It has a cyclomatic complexity of 1, exactly like the original humble object.

You’re right. Thank you for the clarifying article on cyclomatic complexity.

can't be easily tested

True, but neither can the original humble object.

That’s correct, but my point was that the original composition was trivial (just calling a “DI function” with the correct arguments/dependencies) and didn’t need to be tested, whereas the refactored composition does more and might warrant testing (at least to a larger degree than the original).

This raises an interesting point. It seems (subjectively to me, based on what I’ve read) to be a general consensus that a function can be left untested (is “humble”, so to speak) as long as it consists of just generic helpers, like the refactored composition. That is, "if it compiles, it works". This is not a general truth, since for some signatures there may exist several transformations from the input type to the output type, where the output value is different for the different transformations. I have come across such cases, and even had bugs because I used the wrong transformation. Which is why I said:

The new composition is, so to say, quite a bit less humble than the original composition.

According to which criterion?

It is more complex in the sense that it doesn’t just call a function with the correct dependencies. The original composition is more or less immediately recognizable as correct. The refactored composition, as I said, required me to look at it more carefully to convince myself that it was correct. (I will grant that this is to some extent subjective, though.)

I don't like the epithet non-standard. It's true that these functions aren't in FSharp.Core, for reasons that aren't clear to me. In comparison, they're part of the standard base library in Haskell.

There's nothing non-standard about functions like these. Like map and bind, traversals and catamorphisms are universal abstractions. They exist independently of particular programming languages or library packages.

I think that it's fair criticism that they may not be friendly to absolute beginners, but they're still fairly basic ideas that address real needs.

What I do, however, think is important to realise is that what I suggest is to learn a set of concepts once. Once you understand functors, monads, traversals etcetera, that's knowledge that applies to F#, Haskell, C#, JavaScript (I suppose) and so on.

Personally, I find it a better investment of my time to learn a general concept once, and then work with trivial code, rather than having to learn, over and over again, how to deal with each new code base's accidental complexity.

This is the primary point of reflection for me in your comment. While I frequently use monads and monad stacks (particularly Async<Result<_,_>>) and often write utility code to transform when needed (e.g. List.traverseResult), I try to limit the number of such custom utility functions. Why? I’m not sure, actually. It may very well have to do with my work environment, where for a long time I have been the only F# dev and I don’t want to alienate the other .NET devs before they even get started with F#.

In light of your comment, perhaps we F# devs are doing others a disservice if we limit our use of important, general concepts like functors, monads, traversals etc.? Then again, there’s certainly a balance to be struck. I got started (and thrilled) with F# by reading F# for fun and profit and learning about algebraic types, the concise syntax, "railway-oriented programming" etc. If my first glimpse of F# had instead been AsyncSeq.traverseAsyncResultOption, then I might never have left the warm embrace of C#.

I might check out FSharpPlus, which seems to make this kind of programming easier. I have previously steered away from that library because I deemed it “too complex” (c.f. my remarks about alienating coworkers), but it might be time to reconsider. If you have tried it, I would love to hear your thoughts on it in some form or another, though admittedly that isn’t directly related to the topic at hand.

2019-12-12 9:04 UTC

Christer, don't worry about the tone of the debate. I'm not in the least offended or vexed. On the contrary, I find this a valuable discussion, and I'm glad that we're having it in a medium where it's also visible to other people.

I think that we're gravitating towards consensus. I definitely agree that the changes I suggest aren't beginner-friendly.

People sometimes ask me for advice on how to get started with functional programming, and I always tell .NET developers to start with F#. It's a friendly language that enables everyone to learn gradually. If you already know C# (or Visual Basic .NET) the only thing you need to learn about F# is some syntax. Then you can write object-oriented F#. As you learn new functional concepts, you can gradually change the way you write F# code. That's what I did.

I agree with your reservations about onboarding and beginner-friendliness. When that's a concern, I wouldn't write the F# code like I suggested either.

For a more sophisticated team, however, I feel that my suggestions are improvements that matter. I grant you that the composition seems more convoluted, but I consider the overall trade-off beneficial. In the terminology suggested by Rich Hickey, it may not be easier, but it's simpler.

I have no experience with FSharpPlus or any similar libraries. I usually just add the monad stacks and functions to my code base on an as-needed basis. As we've seen here, such functions are mostly useful to compose other functions, so they rarely need to be exported as part of a code base's surface area.

2019-12-12 15:02 UTC

TimeSpan configuration values in .NET Core

Monday, 25 November 2019 07:04:00 UTC

You can use a standard string format for TimeSpan values in configuration files.

Sometimes you need to make TimeSpan values configurable. I often see configuration files that look like this:

{
  "SeatingDurationInSeconds""9000"
}

Code can read such values from configuration files like this:

var seatingDuration = TimeSpan.FromSeconds(Configuration.GetValue<int>("SeatingDurationInSeconds"));

This works, but is abstruse. How long is 9000 seconds?

The idiomatic configuration file format for .NET Core is JSON, which even prevents you from adding comments. Had the configuration been in XML, at least you could have added a comment:

<!--9000 seconds = 2½ hours-->
<SeatingDurationInSeconds>9000</SeatingDurationInSeconds>

In this case, however, it doesn't matter. Use the standard TimeSpan string representation instead:

{
  "SeatingDuration""2:30:00"
}

Code can read the value like this:

var seatingDuration = Configuration.GetValue<TimeSpan>("SeatingDuration");

I find a configuration value like "2:30:00" much easier to understand than 9000, and the end result is the same.

I haven't found this documented anywhere, but from experience I know that this capability is present in the .NET Framework, so I wondered if it was also available in .NET Core. It is.


Comments

Rex Ng #

There is actually an ISO 8601 standard for durations.

In your example you can use

{ "SeatingDuration": "P9000S" }
or
{ "SeatingDuration": "P2H30M" }
which is also quite readable in my opinion.

Not every JSON serializer supports it though.

2019-11-26 01:02 UTC

Rex, thank you for writing. I was aware of the ISO standard, although I didn't know that you can use it in a .NET Core configuration file. This is clearly subjective, but I don't find that format as readable as 2:30:00.

2019-11-26 6:59 UTC

Small methods are easy to troubleshoot

Monday, 18 November 2019 06:48:00 UTC

Write small methods. How small? Small enough that any unhandled exception is easy to troubleshoot.

Imagine that you receive a bug report. This one includes a logged exception:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Ploeh.Samples.BookingApi.Validator.Validate(ReservationDto dto)
   at Ploeh.Samples.BookingApi.ReservationsController.Post(ReservationDto dto)
   at lambda_method(Closure , Object , Object[] )
   at Microsoft.Extensions.Internal.ObjectMethodExecutor.Execute(Object target, Object[] parameters)
   at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.SyncActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
   at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
   at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware.Invoke(HttpContext context)
		

Oh, no, you think, not a NullReferenceException.

If you find it hard to troubleshoot NullReferenceExceptions, you're not alone. It doesn't have to be difficult, though. Open the method at the top of the stack trace, Validate:

public static class Validator
{
    public static string Validate(ReservationDto dto)
    {
        if (!DateTime.TryParse(dto.Date, out var _))
            return $"Invalid date: {dto.Date}.";
        return "";
    }
}

Take a moment to consider this method in the light of the logged exception. What do you think went wrong? Which object was null?

Failed hypothesis #

You may form one of a few hypotheses about which object was null. Could it be dto? dto.Date? Those are the only options I can see.

When you encounter a bug in a production system, if at all possible, reproduce it as a unit test.

If you think that the problem is that dto.Date is null, test your hypothesis in a unit test:

[Fact]
public void ValidateNullDate()
{
    var dto = new ReservationDto { Date = null };
    string actual = Validator.Validate(dto);
    Assert.NotEmpty(actual);
}

If you think that a null dto.Date should reproduce the above exception, you'd expect the test to fail. When you run it, however, it passes. It passes because DateTime.TryParse(null, out var _) returns false. It doesn't throw an exception.

That's not the problem, then.

That's okay, this sometimes happens. You form a hypothesis and fail to validate it. Reject it and move on.

Validated hypothesis #

If the problem isn't with dto.Date, it must be with dto itself. Write a unit test to test that hypothesis:

[Fact]
public void ValidateNullDto()
{
    string actual = Validator.Validate(null);
    Assert.NotEmpty(actual);
}

When you run this test, it does indeed fail:

Ploeh.Samples.BookingApi.UnitTests.ValidatorTests.ValidateNullDto
  Duration: 6 ms

  Message: 
    System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace: 
    Validator.Validate(ReservationDto dto) line 12
    ValidatorTests.ValidateNullDto() line 36

This looks like the exception included in the bug report. You can consider this as validation of your hypothesis. This test reproduces the defect.

It's easy to address the issue:

public static string Validate(ReservationDto dto)
{
    if (dto is null)
        return "No reservation data supplied.";
    if (!DateTime.TryParse(dto.Date, out var _))
        return $"Invalid date: {dto.Date}.";
    return "";
}

The point of this article, however, is to show that small methods are easy to troubleshoot. How you resolve the problem, once you've identified it, is up to you.

Ambiguity #

Methods are usually more complex than the above example. Imagine, then, that you receive another bug report with this logged exception:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Ploeh.Samples.BookingApi.MaîtreD.CanAccept(IEnumerable`1 reservations, Reservation reservation)
   at Ploeh.Samples.BookingApi.ReservationsController.Post(ReservationDto dto)
   at lambda_method(Closure , Object , Object[] )
   at Microsoft.Extensions.Internal.ObjectMethodExecutor.Execute(Object target, Object[] parameters)
   at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.SyncActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
   at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
   at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware.Invoke(HttpContext context)

When you open the code file for the MaîtreD class, this is what you see:

public class MaîtreD
{
    public MaîtreD(int capacity)
    {
        Capacity = capacity;
    }
 
    public int Capacity { get; }
 
    public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
    {
        int reservedSeats = 0;
        foreach (var r in reservations)
            reservedSeats += r.Quantity;
 
        if (Capacity < reservedSeats + reservation.Quantity)
            return false;
 
        return true;
    }
}

This code throws a NullReferenceException, but which object is null? Can you identify it from the code?

I don't think that you can. It could be reservations or reservation (or both). Without more details, you can't tell. The exception is ambiguous.

The key to making troubleshooting easy is to increase your chances that, when exceptions are thrown, they're unambiguous. The problem with a NullReferenceException is that you can't tell which object was null.

Remove ambiguity by protecting invariants #

Consider the CanAccept method. Clearly, it requires both reservations and reservation in order to work. This requirement is, however, currently implicit. You can make it more explicit by letting the method protect its invariants. (This is also known as encapsulation. For more details, watch my Pluralsight course Encapsulation and SOLID.)

A simple improvement is to add a Guard Clause for each argument:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    if (reservations is null)
        throw new ArgumentNullException(nameof(reservations));
    if (reservation is null)
        throw new ArgumentNullException(nameof(reservation));
 
    int reservedSeats = 0;
    foreach (var r in reservations)
        reservedSeats += r.Quantity;
 
    if (Capacity < reservedSeats + reservation.Quantity)
        return false;
 
    return true;
}

If you'd done that from the start, the logged exception would instead have been this:

System.ArgumentNullException: Value cannot be null.
Parameter name: reservations
   at Ploeh.Samples.BookingApi.MaîtreD.CanAccept(IEnumerable`1 reservations, Reservation reservation)
   at Ploeh.Samples.BookingApi.ReservationsController.Post(ReservationDto dto)
   at lambda_method(Closure , Object , Object[] )
   at Microsoft.Extensions.Internal.ObjectMethodExecutor.Execute(Object target, Object[] parameters)
   at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.SyncActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeActionMethodAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeNextActionFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.InvokeInnerFilterAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeNextResourceFilter()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Rethrow(ResourceExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeFilterPipelineAsync()
   at Microsoft.AspNetCore.Mvc.Internal.ResourceInvoker.InvokeAsync()
   at Microsoft.AspNetCore.Builder.RouterMiddleware.Invoke(HttpContext httpContext)
   at Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware.Invoke(HttpContext context)

It's now clear that it was the reservations argument that was null. Now you can fix that issue. Why does CanAccept receive a null argument?

You may now consider examining the next frame in the stack trace: ReservationsController.Post:

public ActionResult Post(ReservationDto dto)
{
    var validationMsg = Validator.Validate(dto);
    if (validationMsg != "")
        return BadRequest(validationMsg);
 
    var reservation = Mapper.Map(dto);
    var reservations = Repository.ReadReservations(reservation.Date);
 
    var accepted = maîtreD.CanAccept(reservations, reservation);
    if (!accepted)
        return StatusCode(
            StatusCodes.Status500InternalServerError,
            "Couldn't accept.");
    var id = Repository.Create(reservation);
    return Ok(id);
}

The Post code only calls CanAccept once, so again, you can unambiguously deduce where the null reference comes from. It comes from the Repository.ReadReservations method call. Now go there and figure out why it returns null.

Notice how troubleshooting is trivial when the methods are small.

Size matters #

How small is a small method? How large does a method have to be before it's no longer small?

A method is small when you can easily troubleshoot it based exclusively on a logged exception.

Until you get the hang of it, it can be hard to predict if a method is going to be easy to troubleshoot. I therefore also follow another rule of thumb:

For languages like C#, methods should have a maximum size of 80x24 characters.

What if you already have bigger methods? Those are much harder to troubleshoot. If you have a method that spans hundreds of lines of code, a logged exception is rarely sufficient for troubleshooting. In order to save space, I'm not going to show an example of such a method, but I'm sure that you can easily find one in the code base that you work with professionally. Such a method is likely to have dozens of objects that could be null, so a NullReferenceException and a stack trace will contain too little information to assist troubleshooting.

I often see developers add tracing or logging to such code in order to facilitate troubleshooting. This makes the code even more bloated, and harder to troubleshoot.

Instead, cut the big method into smaller helper methods. You can keep the helper methods private so that you don't change the API. Even so, a logged exception will now report the helper method in its stack trace. Keep those helper methods small, and troubleshooting becomes trivial.

Conclusion #

Small methods come with many benefits. One of these is that it's easy to troubleshoot a small method from a logged exception. Small methods typically take few arguments, and use few variables. A logged exception will often be all that you need to pinpoint where the problem occurred.

A large method body, on the other hand, is difficult to troubleshoot. If all you have is a logged exception, you'll be able to find multiple places in the code where that exception could have been thrown.

Refactor big methods into smaller helper methods. If one of these helper methods throws an exception, the method name will be included in the stack trace of a logged exception. When the helper method is small, you can easily troubleshoot it.

Keep methods small. Size does matter.


Comments

How did you create those exception stack traces?

Compared to your examples, I am used to seeing each stack frame line end with the source code line number. (Although, sometimes I have seen all line numbers being 0, which is equivalent to having no line numbers at all.) The exception stack trace in your unit test includes line numbers, but your other exception stack traces do not. This Stack Overflow answer contains an exception stack trace with line numbers, like I am used to seeing.

The line containing "lambda_method" also seems odd to me. As I recall, invoking a lambda expression also contributes more information to the stack trace than your examples include. Although, that information is cryptic enough that I don't remember off the top of my head what it means. (After a quick search, I didn't find a good example of how I am used to seeing lambda expressions displayed in a stack trace.)

With this additional information, the methods can be longer while remaining unambiguous (in the sense you described).

2019-11-18 14:57 UTC

Tyson, thank you for writing. IIRC, you only see line numbers for code compiled with debug information. That's why you normally don't see line numbers for .NET base class library methods, ASP.NET Core, etc.

While you can deploy and run an application in Debug mode, I wouldn't. When you compile in Release mode, the compiler makes optimisations (such as in-lining) that it can't do when it needs to retain a map to a specific source. In Debug mode, you squander those optimisations.

I've always deployed systems in Release mode. To make the examples realistic, I didn't include the line numbers in the stack trace, because in Release mode, they aren't going to be there.

When you're troubleshooting by reproducing a logged exception as a unit test, on the other hand, it's natural to do so in Debug mode.

2019-11-18 16:39 UTC

I just tried to verify this. I didn't find any difference between compiling in Debug and Release mode, nor any fundamental issue with in-lining. In all four cases (Debug vs Release, and in-lining vs no in-lining), exception stack traces still included line numbers. When in-lining, the only difference was the absence of the in-lined method from the stack trace.

Instead, I found that line numbers are included in exception stack traces if and only if the corresponding .pdb file is next to the .exe file when executed.

2019-11-18 19:07 UTC

Tyson, that sounds correct. To be honest, I stopped putting effort into learning advanced debugging skills a long time ago. Early in my career we debugged by writing debug statements directly to output! Then I learned test-driven development, and with that disappeared most of the need to debug full systems. At the time I began working with systems that logged unhandled exceptions, the habit of writing small methods was already so entrenched that I never felt that I needed the line numbers.

FWIW, more than five years ago, I decided to publish symbols with AutoFixture, but I never found it useful.

2019-11-19 7:18 UTC

I recommend setting DebugType to Embedded in every project file. This puts the PDB file inside the DLL. Having separate files is harmful in the same way that NuGet package restore is harmful. I don't think of this as an advanced debugging skill. It is trivial for one person to add the required line to each project. Then every future stack trace seen by any developer will include line numbers. Seems like an easy win to me.
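In a csproj file, the required line is a DebugType property, something like this:

<PropertyGroup>
  <DebugType>embedded</DebugType>
</PropertyGroup>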

2022-09-17 18:07 UTC

Diamond rock

Monday, 11 November 2019 09:15:00 UTC

A diamond kata implementation written in Rockstar

I've previously written about the diamond kata, which has become one of my favourite programming exercises. I wanted to take the Rockstar language out for a spin, and it seemed a good candidate problem.

Rockstar #

If you're not aware of the Rockstar programming language, it started with a tweet from Paul Stovell:

"To really confuse recruiters, someone should make a programming language called Rockstar."

This inspired Dylan Beattie to create the Rockstar programming language. The language's landing page already sports an idiomatic implementation of the FizzBuzz kata, so I had to pick another exercise. The diamond kata was a natural choice for me.

Lyrics #

I'll start with the final result. What follows here are the final lyrics to Diamond rock. As you'll see, it starts with a short prologue after which it settles into a repetitive pattern. I imagine that this might make it useful for audience participation, in the anthem rock style of e.g. We Will Rock You.

After 25 repetitions the lyrics change. I haven't written music to any of them, but I imagine that this essentially transitions into another song, just like We Will Rock You traditionally moves into We Are the Champions. The style of the remaining lyrics does, however, suggest something musically more complex than a rock anthem.

Your memory says  
The way says A

Despair takes your hope
The key is nothing
Your courage is a tightrope
Let it be without it
If your hope is your courage
Let the key be the way

Build your courage up
If your hope is your courage
The key says B

Build your courage up
If your hope is your courage
The key says C

Build your courage up
If your hope is your courage
The key says D

Build your courage up
If your hope is your courage
The key says E

Build your courage up
If your hope is your courage
The key says F

Build your courage up
If your hope is your courage
The key says G

Build your courage up
If your hope is your courage
The key says H

Build your courage up
If your hope is your courage
The key says I

Build your courage up
If your hope is your courage
The key says J

Build your courage up
If your hope is your courage
The key says K

Build your courage up
If your hope is your courage
The key says L

Build your courage up
If your hope is your courage
The key says M

Build your courage up
If your hope is your courage
The key says N

Build your courage up
If your hope is your courage
The key says O

Build your courage up
If your hope is your courage
The key says P

Build your courage up
If your hope is your courage
The key says Q

Build your courage up
If your hope is your courage
The key says R

Build your courage up
If your hope is your courage
The key says S

Build your courage up
If your hope is your courage
The key says T

Build your courage up
If your hope is your courage
The key says U

Build your courage up
If your hope is your courage
The key says V

Build your courage up
If your hope is your courage
The key says W

Build your courage up
If your hope is your courage
The key says X

Build your courage up
If your hope is your courage
The key says Y

Build your courage up
If your hope is your courage
The key says Z

Give back the key

Dream takes a tightrope
Your hope's start'n'stop
While Despair taking your hope ain't a tightrope
Build your hope up

Give back your hope

Raindrop is a wintercrop
Light is a misanthrope
Let it be without raindrop
Darkness is a kaleidoscope
Let it be without raindrop

Diamond takes your hour, love, and your day
If love is the way
Let your sorrow be your hour without light
Put your sorrow over darkness into flight
Put your memory of flight into motion
Put it with love, and motion into the ocean
Whisper the ocean

If love ain't the way
Let ray be your day of darkness without light
Let your courage be your hour without darkness, and ray
Put your courage over darkness into the night
Put your memory of ray into action
Let satisfaction be your memory of the night
Let alright be satisfaction with love, action, love, and satisfaction
Shout alright


Listen to the wind
Let flight be the wind
Let your heart be Dream taking flight
Let your breath be your heart of darkness with light
Your hope's stop'n'start
While your hope is lower than your heart
Let away be Despair taking your hope
Diamond taking your breath, away, and your hope
Build your hope up

Renown is southbound
While your hope is as high as renown
Let away be Despair taking your hope
Diamond taking your breath, away, and your hope
Knock your hope down

Not only do these look like lyrics, but they're also an executable program!

Execution #

If you want to run the program, you can copy the text and paste it into the web-based Rockstar interpreter; that's what I did.

When you Rock! the lyrics, the interpreter will prompt you for an input. There's no instructions or input validation, but the only valid input is the letters A to Z, and only in upper case. If you type anything else, I don't know what'll happen, but most likely it'll just enter an infinite loop, and you'll have to reboot your computer.

If you input, say, E, the output will be the expected diamond figure:

    A    
   B B   
  C   C  
 D     D 
E       E
 D     D 
  C   C  
   B B   
    A    

When you paste the code, be sure to include everything. There's significant whitespace in those lyrics; I'll explain later.

Readable code #

As the Rockstar documentation strongly implies, singability is more important than readability. You can, however, write more readable Rockstar code, and that's what I started with:

Space says  
LetterA says A

GetLetter takes index
If index is 0
Give back LetterA

If index is 1
retVal says B
Give back retVal

If index is 2
retVal says C
Give back retVal


GetIndex takes letter
Index is 0
While GetLetter taking Index ain't letter
Build Index up

Give back Index

PrintLine takes width, l, lidx
If l is LetterA
Let completeSpaceCount be width minus 1
Let paddingCount be completeSpaceCount over 2
Let padding be Space times paddingCount
Let line be padding plus l plus padding
Say line
Else
Let internalSpaceSize be lidx times 2 minus 1
Let filler be Space times internalSpaceSize
Let totalOuterPaddingSize be width minus 2, internalSpaceSize
Let paddingSize be totalOuterPaddingSize over 2
Let padding be Space times paddingSize
Let line be padding plus l, filler, l, padding
Say line

Listen to input
Let idx be GetIndex taking input
Let width be idx times 2 plus 1
Let counter be 0
While counter is lower than idx
Let l be GetLetter taking counter
PrintLine taking width, l, counter
Build counter up

While counter is as high as 0
Let l be GetLetter taking counter
PrintLine taking width, l, counter
Knock counter down

This prototype only handled the input letters A, B, and C, but it was enough to verify that the algorithm worked. I've done the diamond kata several times before, so I only had to find the most imperative implementation on my hard drive. It wasn't too hard to translate to Rockstar.

Although Rockstar supports mainstream quoted strings like "A", "B", and so on, you can see that I went straight for poetic string literals. Before I started persisting Rockstar code to a file, I experimented with the language using the online interpreter. I wanted the program to look as much like rock lyrics as I could, so I didn't want to have too many statements like Shout "FizzBuzz!" in my code.

Obscuring space #

My first concern was whether I could obscure the space character. Using a poetic string literal, I could:

Space says  

The rule for poetic string literals is that everything between says  and the newline character becomes the value of the string variable. So there's an extra space after says !

After I renamed all the variables and functions, that line became:

Your memory says  

Perhaps it isn't an unprintable character, but it is unsingable.

No else #

The keyword Else looks conspicuously like a programming construct, so I wanted to get rid of that as well. That was easy, because I could just invert the initial if condition:

If l ain't LetterA

This effectively switches between the two alternative code blocks.

Obscuring letter indices #

I also wanted to obscure the incrementing index values 1, 2, 3, etcetera. Since the indices are monotonically increasing, I realised that I could use a counter and increment it:

number is 0
If index is number
Let retVal be LetterA

Build number up
If index is number
retVal says B

Build number up
If index is number
retVal says C

The function initialises number to 0 and assigns a value to retVal if the input index is also 0.

If not, it increments the number (so that it's now 1) and again compares it to index. This sufficiently obscures the indices, but if there's a way to hide the letters of the alphabet, I'm not aware of it.

After I renamed the variables, the code became:

Your courage is a tightrope
Let it be without it
If your hope is your courage
Let the key be the way

Build your courage up
If your hope is your courage
The key says B

Build your courage up
If your hope is your courage
The key says C

There's one more line of code in the final lyrics, compared to the above snippet. The line Let it be without it has no corresponding line of code in the readable version. What's going on?

Obscuring numbers #

Like poetic string literals, Rockstar also supports poetic number literals. Due to its modulo-ten-based system, however, I found it difficult to come up with a good ten-letter word that fit the song's lyrical theme. I could have done something like this to produce the number 0:

Your courage is barbershop

or some other ten-letter word. My problem was that regardless of what I chose, it didn't sound good. Some article like a or the would sound better, but that would change the value of the poetic number literal. a tightrope is the number 19, because a has one letter, and tightrope has nine.

There's a simple way to produce 0 from any number: just subtract the number from itself. That's what Let it be without it does. I could also have written it as Let your courage be without your courage, but I chose to take advantage of Rockstar's pronoun feature instead. I'd been looking for an opportunity to include the phrase Let It Be ever since I learned about the Let x be y syntax.

The following code snippet initialises the variable Your courage to 19, but on the next line subtracts 19 from 19 and updates the variable so that its value is now 0.

Your courage is a tightrope
Let it be without it

I had the same problem with initialising the numbers 1 and 2, so further down I resorted to similar tricks:

Raindrop is a wintercrop
Light is a misanthrope
Let it be without raindrop
Darkness is a kaleidoscope
Let it be without raindrop 

Here I had the additional constraint that I wanted the words to rhyme. The rhymes are a continuation of the previous lines' up and hope, so I struggled to come up with a ten-letter word that rhymes with up; wintercrop was the best I could do. a wintercrop is 10, and the strategy is to define Light and Darkness as 11 and 12, and then subtract 10 from both. At the first occurrence of Let it be without raindrop, it refers to Light, whereas the second time it refers to Darkness.

Lyrical theme #

Once I had figured out how to obscure strings and numbers, it was time to rename all the readable variables and function names into idiomatic Rockstar.

At first, I thought that I'd pattern my lyrics after Shine On You Crazy Diamond, but I soon ran into problems with the keyword taking. I found it difficult to find words that would naturally succeed taking. Some options I thought of were:

  • taking the bus
  • taking a chance
  • taking hold
  • taking flight
  • taking time

Some of these didn't work for various reasons. In Rockstar, times is a keyword, and apparently time is reserved as well. At least, the online interpreter choked on it.

Taking a chance sounded too much like ABBA. Taking hold created the derived problem that I had to initialise and use a variable called hold, and I couldn't make that work.

Taking flight, on the other hand, turned out to provide a fertile opening.

I soon realised, though, that my choice of words pulled the lyrical theme away from idiomatic Rockstar vocabulary. While I do get the Tommy and Gina references, I didn't feel at home in that poetic universe.

On the other hand, I thought that the words started to sound like Yes. I've listened to a lot of Yes. The lyrics are the same kind of lame and vapid as what was taking form in my editor. I decided to go in that direction.

Granted, this is no longer idiomatic Rockstar, since it's more prog rock than hair metal. I invoke creative license.

Soon I also conceived of the extra ambition that I wanted the verses to rhyme. Here, it proved fortunate that the form let x be y is interchangeable with the form put y into x. Some words, like darkness, are difficult to rhyme with, so it helps that you can hide them within a put y into x form.

Over the hours(!) I worked on this, a theme started to emerge. I'm particularly fond of the repeated motifs like:

Your hope's start'n'stop

which rhymes with up, but then later it appears again as

Your hope's stop'n'start

which rhymes with heart. Both words, by the way, represent the number 0, since there are ten letters when you ignore the single quotes.

Conclusion #

I spent more time on this than I suppose I ought to, but once I got started, it was hard to stop. I found the translation from readable code into 'idiomatic' Rockstar at least as difficult as writing working software. There's a lesson there, I believe.

Rockstar is still a budding language, so I did miss a few features, chief among which would be arrays, but I'm not sure how one would make arrays sufficiently rock'n'roll.

A unit testing framework would also be nice.

If you liked this article, please endorse my Rockstar skills on LinkedIn so that we can "confuse recruiters."


The 80/24 rule

Monday, 04 November 2019 06:51:00 UTC

Write small blocks of code. How small? Here's how small.

One of the most common questions I get is this:

If you could give just one advice to programmers, what would it be?

That's easy:

Write small blocks of code.

Small methods. Small functions. Small procedures.

How small?

Few lines of code #

You can't give a universally good answer to that question. Among other things, it depends on the programming language in question. Some languages are much denser than others. The densest language I've ever encountered is APL.

Most mainstream languages, however, seem to be verbose to approximately the same order of magnitude. My experience is mostly with C#, so I'll use that (and similar languages like Java) as a starting point.

When I write C# code, I become uncomfortable when my method size approaches fifteen or twenty lines of code. C# is, however, a fairly wordy language, so it sometimes happens that I have to allow a method to grow larger. My limit is probably somewhere around 25 lines of code.

That's an arbitrary number, but if I have to quote a number, it would be around that size. Since it's arbitrary anyway, let's make it 24, for reasons that I'll explain later.

The maximum line count of a C# (or Java, or JavaScript, etc.) method, then, should be 24.

To repeat the point from before, this depends on the language. I'd consider a 24-line Haskell or F# function to be so huge that if I received it as a pull request, I'd reject it on the grounds of size alone.

Narrow line width #

Most languages allow for flexibility in layout. For example, C-based languages use the ; character as a delimiter. This enables you to write more than one statement per line:

var foo = 32; var bar = foo + 10; Console.WriteLine(bar);

You could attempt to avoid the 24-line-height rule by writing wide lines. That would, however, be to defeat the purpose.

The purpose of writing small methods is to nudge yourself towards writing readable code; code that fits in your brain. The smaller, the better.

For completeness' sake, let's institute a maximum line width as well. If there's any accepted industry standard for maximum line width, it's 80 characters. I've used that maximum for years, and it's a good maximum.

Like all other programmers, I find other people's code annoying. The most common annoyance is that people write code that's too wide.

This is probably because most programmers have drunk the Kool-Aid that bigger screens make you more productive. When you code on a big screen, you don't notice how wide your lines become.

There are many scenarios where wide code is problematic:

  • When you're comparing changes to a file side-by-side. This often happens when you review pull requests. Now you have only half of your normal screen width.
  • When you're looking at code on a smaller device.
  • When you're getting old, or are otherwise visually impaired. After I turned 40, I discovered that I found it increasingly difficult to see small things. I still use a 10-point font for programming, but I foresee that this will not last much longer.
  • When you're mob programming, you're limited to the size of the shared screen.
  • When you're sharing your screen via the web, for remote pair programming or similar.
  • When you're presenting code at meetups, user groups, conferences, etc.

What most programmers need, I think, is just a nudge. In Visual Studio, for example, you can install the Editor Guidelines extension, which will display one or more vertical guidelines. You can configure it as you'd like, but I have mine set to 80 characters, and bright red:

Screen shot of editor with code, showing red vertical line at 80 characters.

Notice the red dotted vertical line that cuts through universe. It tells me where the 80 character limit is.

Terminal box #

The 80-character limit has a long and venerable history, but what about the 24-line limit? While both are, ultimately, arbitrary, both fit the size of the popular VT100 terminal, which had a display resolution of 80x24 characters.

A box of 80x24 characters thus reproduces the size of an old terminal. Does this mean that I suggest that you should write programs on terminals? No; people always misunderstand this. The 80x24 box is the maximum size of a method. On larger screens, you'd be able to see multiple small methods at once. For example, you could view a unit test and its target in a split-screen configuration.

The exact sizes are arbitrary, but I think that there's something fundamentally right about such continuity with the past.

I've been using the 80-character mark as a soft limit for years. I tend to stay within it, but I occasionally decide to make my code a little wider. I haven't paid quite as much attention to the number of lines of my methods, but only for the reason that I know that I tend to write methods shorter than that. Both limits have served me well for years.

Example #

Consider this example:

public ActionResult Post(ReservationDto dto)
{
    var validationMsg = Validator.Validate(dto);
    if (validationMsg != "")
        return BadRequest(validationMsg);
 
    var reservation = Mapper.Map(dto);
    var reservations = Repository.ReadReservations(reservation.Date);
 
    var accepted = maîtreD.CanAccept(reservations, reservation);
    if (!accepted)
        return StatusCode(
            StatusCodes.Status500InternalServerError,
            "Couldn't accept.");
 
    var id = Repository.Create(reservation);
    return Ok(id);
}

This method is 18 lines long, which includes the method declaration, curly brackets and blank lines. It easily stays within the 80-character limit. Note that I've deliberately formatted the code so that it stays within the limits. You can see it in this fragment:

return StatusCode(
    StatusCodes.Status500InternalServerError,
    "Couldn't accept.");

Most people write it like this:

return StatusCode(StatusCodes.Status500InternalServerError, "Couldn't accept.");

That doesn't look bad, but I've seen much worse examples.

Another key to writing small methods is to call other methods. The above Post method doesn't look like much, but significant functionality could be hiding behind Validator.Validate, Repository.ReadReservations, or maîtreD.CanAccept. I hope that you agree that each of these objects and methods are named well enough to give you an idea about their purpose.
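
To illustrate the kind of small helper that could hide behind such a call, here's a hypothetical sketch of what the validator might look like. The article doesn't show the real implementation, so both the validation rules and the shape of ReservationDto below are assumptions:

public class ReservationDto
{
    // Minimal stand-in for the article's DTO; the real class may differ.
    public DateTime Date { get; set; }
    public int Quantity { get; set; }
}
 
public class Validator
{
    // Hypothetical sketch: returns an empty string when the DTO is valid,
    // otherwise a message describing the problem.
    public string Validate(ReservationDto dto)
    {
        if (dto == null)
            return "No reservation.";
        if (dto.Quantity < 1)
            return "Quantity must be a positive number.";
        return "";
    }
}

The point isn't the specific rules, but that the helper is itself a small block of code that easily fits within an 80x24 box, so the caller only needs to know its name and contract.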

Code that fits in your brain #

As I describe in my Humane Code video, the human brain can only keep track of about seven things. I think that this rule of thumb applies to the way we read and interpret code. If you need to understand and keep track of more than seven separate things at the same time, the code becomes harder to understand.

This could explain why small methods are good. They're only good, however, if they're self-contained. When you look at a method like the above Post method, you'll be most effective if you don't need to have a deep understanding of how each of the dependencies work. If this is true, the method only juggles about five dependencies: Validator, Mapper, Repository, maîtreD, and its own base class (which provides the methods BadRequest, StatusCode, and Ok). Five dependencies is fewer than seven.

Another way to evaluate the cognitive load of a method is to measure its cyclomatic complexity. The Post method's cyclomatic complexity is 3 (one, plus one for each of its two if statements), so that should be easily within the brain's capacity.

These are all heuristics, so read this for inspiration, not as law. They've served me well for years, though.

Conclusion #

You've probably heard about the 80/20 rule, also known as the Pareto principle. Perhaps the title led you to believe that this article was a misunderstanding. I admit that I went for an arresting title; perhaps a more proper name is the 80x24 rule.

The exact numbers can vary, but I've found a maximum method size of 80x24 characters to work well for C#.


Comments

Jiehong #

As a matter of fact, terminals had 80-character lines because IBM punch cards, representing only 1 line, had 80 symbols (even though only the first 72 were used at first). However, I don't know why terminals settled for 24 lines! In Java, which is similar to C# in terms of verbosity, Clean Code tends to push towards functions of 20 lines or less. One of the dangers of making functions even smaller is that many more functions can create many indirections, and that becomes harder to keep track of within our brains.

2019-11-04 11:13 UTC
Terrell #

Some additional terminal sizing history in Mike Hoye's recent similarly-named post: http://exple.tive.org/blarg/2019/10/23/80x25/ Spoiler - Banknotes!

2019-11-04 13:25 UTC

A basic Haskell solution to the robot journeys coding exercise

Monday, 28 October 2019 04:34:00 UTC

This article shows an idiomatic, yet beginner-friendly Haskell solution to a coding exercise.

Mike Hadlow tweeted a coding exercise that involves parsing and evaluating instruction sets. Haskell excels at such problems, so I decided to give it a go. Since this was only an exercise for the fun of it, I didn't want to set up a complete Haskell project. Rather, I wanted to write one or two .hs files that I could interact with via GHCi. This means no lenses, monad transformers, or other fancy libraries.

Hopefully, this makes the code friendly to Haskell beginners. It shows what I consider idiomatic, but basic Haskell, solving a problem of moderate difficulty.

The problem #

Mike Hadlow has a detailed description of the exercise, but in short, you're given a file with a set of instructions that look like this:

1 1 E
RFRFRFRF
1 1 E

The first and last lines describe the position and orientation of a robot. The first line, for example, describes a robot at position (1, 1) facing east. A robot can face in one of the four normal directions of the map: north, east, south, and west.

The first line gives the robot's start position, and the last line the expected end position.

The middle line is a set of instructions to the robot. It can turn left or right, or move forward.

The exercise is to evaluate whether journeys are valid; that is, whether the robot's end position matches the expected end position if it follows the commands.
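
For the example above, the robot starts at (1, 1) facing east. The instruction string RFRFRFRF alternates right turns and forward moves: it turns south and moves to (1, 0), turns west and moves to (0, 0), turns north and moves to (0, 1), and finally turns east and moves back to (1, 1), facing east. Since that matches the expected end position on the last line, this journey is valid.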

Imports #

I managed to solve the exercise with a single Main.hs file. Here's the module declaration and the required imports:

module Main where
 
import Data.Foldable
import Data.Ord
import Text.Read (readPrec)
import Text.ParserCombinators.ReadP
import Text.ParserCombinators.ReadPrec (readPrec_to_P, minPrec)

These imports are only required to support parsing of input. Once parsed, you can evaluate each journey using nothing but the functions available in the standard Prelude.

Types #

Haskell is a statically typed language, so it often pays to define some types. Granted, the exercise hardly warrants all of these types, but as an example of idiomatic Haskell, I think that this is still good practice. After all, Haskell types are easy to declare. Often, they are one-liners:

data Direction = North | East | South | West deriving (Eq, Show, Read)

The Direction type enumerates the four corners of the world.

data Robot = Robot {
    robotPosition :: (Integer, Integer)
  , robotDirection :: Direction }
  deriving (Eq, Show, Read)

The Robot record type represents the state of a robot: its position and the direction it faces.

You'll also need to enumerate the commands that you can give a robot:

data Command = TurnLeft | TurnRight | MoveForward deriving (Eq, Show, Read)

Finally, you can also define a type for a Journey:

data Journey = Journey {
    journeyStart :: Robot
  , journeyCommands :: [Command]
  , journeyEnd :: Robot }
  deriving (Eq, Show, Read)

These are all the types required for solving the exercise.

Parsing #

The format of the input file is simple enough that it could be done in an ad-hoc fashion using lines, words, read, and a few other low-level functions. While the format barely warrants the use of parser combinators, I'll still use some to showcase the power of that approach.

Since one of my goals is to implement the functionality using a single .hs file, I can't pull in external parser combinator libraries. Instead, I'll use the built-in ReadP module, which I've often found sufficient to parse files like the present exercise input file.

First, you're going to have to be able to parse numbers, which can be done using the Read type class. You'll need, however, to be able to compose Integer parsers with other ReadP parsers.

parseRead :: Read a => ReadP a
parseRead = readPrec_to_P readPrec minPrec

This turns every Read instance value into a ReadP value. (I admit that I wasn't sure which precedence number to use, but minPrec seems to work.)

Next, you need a parser for Direction values:

parseDirection :: ReadP Direction
parseDirection =
  choice [
    char 'N' >> return North,
    char 'E' >> return East,
    char 'S' >> return South,
    char 'W' >> return West ]

Notice how declarative this looks. The choice function combines a list of other parsers. When an individual parser in that list encounters the 'N' character, it'll parse it as North, 'E' as East, and so on.

You can now parse an entire Robot using the Applicative <*> and <* operators.

parseRobot :: ReadP Robot
parseRobot =
  (\x y d -> Robot (x, y) d) <$>
  (parseRead <* char ' ') <*>
  (parseRead <* char ' ') <*>
   parseDirection

The <*> operator combines two parsers by using the output of both of them, whereas the <* combines two parsers by running both of them, but discarding the output of the right-hand parser. A good mnemonic is that the operator points to the parser that produces an output. Here, the parseRobot function uses the <* operator to require that each number is followed by a space. The space, however, is just a delimiter, so you throw it away.

parseRead parses any Read instance. Here, the parseRobot function uses it to parse each Integer in a robot's position. It also uses parseDirection to parse the robot's direction.

Similar to how you can parse directions, you can also parse the commands:

parseCommand :: ReadP Command
parseCommand =
  choice [
    char 'L' >> return TurnLeft,
    char 'R' >> return TurnRight,
    char 'F' >> return MoveForward]

Likewise, similar to how you parse a single robot, you can now parse a journey:

parseJourney :: ReadP Journey
parseJourney =
  Journey <$>
  (parseRobot <* string "\n") <*>
  (many parseCommand <* string "\n") <*>
   parseRobot

The only new element compared to parseRobot is the use of the many parser combinator, which looks for zero, one, or many Command values.

This gives you a way to parse a complete journey, but the input file contains many of those, separated by newlines and other whitespace:

parseJourneys :: ReadP [Journey]
parseJourneys = parseJourney `sepBy` skipSpaces

Finally, you can parse a multi-line string into a list of journeys:

parseInput :: String -> [Journey]
parseInput = fst . minimumBy (comparing snd) . readP_to_S parseJourneys

When you run readP_to_S, it'll produce a list of alternatives, as there's more than one way to interpret the file according to parseJourneys. Each alternative is presented as a tuple of the parse result and the remaining (or unconsumed) string. I'm after the alternative that consumes as much of the input file as possible (which turns out to be all of it), so I use minimumBy to find the tuple that has the smallest second element. Then I return the first element of that tuple.

Play around with readP_to_S parseJourneys in GHCi if you want all the details.

Evaluation #

Haskell beginners may still find operators like <*> cryptic, but they're essential to parser combinators. Evaluation of the journeys is, in comparison, simple.

You can start by defining a function to turn right:

turnRight :: Robot -> Robot
turnRight r@(Robot _ North) = r { robotDirection = East }
turnRight r@(Robot _  East) = r { robotDirection = South }
turnRight r@(Robot _ South) = r { robotDirection = West }
turnRight r@(Robot _  West) = r { robotDirection = North }

There's more than one way to write a function that rotates one direction to the right, but I chose one that I found most readable. It trades brevity for clarity by relying on simple pattern matching. I hope that it's easy to understand for Haskell beginners, and perhaps even for people who haven't seen Haskell code before.

The function to turn left uses the same structure:

turnLeft :: Robot -> Robot
turnLeft r@(Robot _ North) = r { robotDirection = West }
turnLeft r@(Robot _  West) = r { robotDirection = South }
turnLeft r@(Robot _ South) = r { robotDirection = East }
turnLeft r@(Robot _  East) = r { robotDirection = North }

The last command you need to implement is moving forward:

moveForward :: Robot -> Robot
moveForward (Robot (x, y) North) = Robot (x, y + 1) North
moveForward (Robot (x, y)  East) = Robot (x + 1, y) East
moveForward (Robot (x, y) South) = Robot (x, y - 1) South
moveForward (Robot (x, y)  West) = Robot (x - 1, y) West

The moveForward function also pattern-matches on the direction the robot is facing, this time to increment or decrement the x or y coordinate as appropriate.

You can now evaluate all three commands:

evalCommand :: Command -> Robot -> Robot
evalCommand   TurnRight = turnRight
evalCommand    TurnLeft = turnLeft
evalCommand MoveForward = moveForward

The evalCommand pattern-matches on all three Command cases and returns the appropriate function for each.

You can now evaluate whether a Journey is valid:

isJourneyValid :: Journey -> Bool
isJourneyValid (Journey s cs e) = foldl (flip evalCommand) s cs == e

The isJourneyValid function pattern-matches the constituent values out of Journey. I named the journeyStart value s (for start), the journeyCommands value cs (for commands), and the journeyEnd value e (for end).

The evalCommand function evaluates a single Command, but a Journey contains many commands. You'll need to evaluate the first command to find the position from which you evaluate the second command, and so on. Imperative programmers would use a for loop for something like that, but in functional programming, a fold, in this case from the left, is how it's done.

foldl requires you to supply an initial state s as well as the list of commands cs. The entire foldl expression produces a final Robot state that you can compare against the expected end state e.
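
For example, given the command list [TurnRight, MoveForward], the fold first applies turnRight to the start state s, then applies moveForward to the robot that turnRight returned, and the resulting Robot value is what gets compared to e.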

Execution #

Load the input file, parse it, and evaluate each journey in the main function:

main :: IO ()
main = do
  input <- parseInput <$> readFile "input.txt"
  mapM_ print $ isJourneyValid <$> input

I just load the Main.hs file in GHCi and run the main function:

Prelude> :load Main.hs
[1 of 1] Compiling Main             ( Main.hs, interpreted )
Ok, one module loaded.
*Main> main
True
True
True

I used the same input file as Mike Hadlow, and it turns out that all journeys are valid. That's not what I'd expected from an exercise like this, so I cloned and ran Mike's solution as well, and it seems that it arrives at the same result.

Conclusion #

Haskell is a great language for small coding exercises that require parsing and interpretation. In this article, I demonstrated one solution to the robot journeys coding exercise. My goal was to show some beginner-friendly, but still idiomatic Haskell code.

Granted, the use of parser combinators is on the verge of being overkill, but I wanted to show an example; Haskell examples are scarce, so I hope it's helpful.


A red-green-refactor checklist

Monday, 21 October 2019 06:49:00 UTC

A simple read-do checklist for test-driven development.

I recently read The Checklist Manifesto, a book about the power of checklists. That may sound off-putting and tedious, but I actually found it inspiring. It explains how checklists empower skilled professionals to focus on difficult problems, while preventing avoidable mistakes.

Since I read the book with the intent to see if there were ideas that we could apply in software development, I thought about checklists one might create for software development. Possibly the simplest checklist is one that describes the red-green-refactor cycle of test-driven development.

Types of checklists #

As the book describes, there are basically two types of checklists:

  • Do-confirm. With such a checklist, you perform a set of tasks, and then subsequently, at a sufficient pause point, go through the checklist to verify that you remembered to perform all the tasks on the list.
  • Read-do. With this type of checklist, you read each item for instructions and then perform the task. Only when you've performed the task do you move on to the next item on the list.

I find it most intuitive to describe the red-green-refactor cycle as a read-do list. I did, however, find it expedient to include a do-confirm sub-list for one of the overall steps.

This list is, I think, mostly useful if you're still learning test-driven development. It can be easily internalised. As such, I offer this for inspiration, and as a learning aid.

Red-green-refactor checklist #

Read each of the steps in the list and perform the task.

  1. Write a failing test.
    • Did you run the test?
    • Did it fail?
    • Did it fail because of an assertion?
    • Did it fail because of the last assertion?
  2. Make all tests pass by doing the simplest thing that could possibly work.
  3. Consider the resulting code. Can it be improved? If so, do it, but make sure that all tests still pass.
  4. Repeat

Perhaps the most value this checklist provides isn't so much the overall read-do list, but rather the subordinate do-confirm list associated with the first step.

I regularly see people write failing tests as an initial step. The reason the test fails, however, is because the implementation throws an exception.

Improperly failing tests #

Consider, as an example, the first test you might write when doing the FizzBuzz kata.

[Fact]
public void One()
{
    string actual = FizzBuzz.Convert(1);
    Assert.Equal("1", actual);
}

I wrote this test first (i.e. before the 'production' code) and used Visual Studio's refactoring tools to generate the implied type and method.

When I run the test, it fails.

Further investigation, however, reveals that the test fails when Convert is called:

Ploeh.Katas.FizzBuzzKata.FizzBuzzTests.One
   Source: FizzBuzzTests.cs line: 11
   Duration: 8 ms

  Message: 
    System.NotImplementedException : The method or operation is not implemented.
  Stack Trace: 
    at FizzBuzz.Convert(Int32 i) in FizzBuzz.cs line: 9
    at FizzBuzzTests.One() in FizzBuzzTests.cs line: 13

This is hardly surprising, since this is the current 'implementation':

public static string Convert(int i)
{
    throw new NotImplementedException();
}

This is what the subordinate do-confirm checklist is for. Did the test fail because of an assertion? In this case, the answer is no.

This means that you're not yet done with the read phase.

Properly failing tests #

You can address the issue by changing the Convert method:

public static string Convert(int i)
{
    return "";
}

This causes the test to fail because of an assertion:

 Ploeh.Katas.FizzBuzzKata.FizzBuzzTests.One
   Source: FizzBuzzTests.cs line: 11
   Duration: 13 ms

  Message: 
    Assert.Equal() Failure
              ↓ (pos 0)
    Expected: 1
    Actual:   
              ↑ (pos 0)
  Stack Trace: 
    at FizzBuzzTests.One() in FizzBuzzTests.cs line: 14

Not only does the test fail because of an assertion - it fails because of the last assertion (since there's only one assertion). This completes the do-confirm checklist, and you're now ready to make the simplest change that could possibly work:

public static string Convert(int i)
{
    return "1";
}

This passes the test suite.

Conclusion #

It's important to see tests fail. Particularly, it's important to see tests fail for the reason you expect them to fail. You'd be surprised how often you inadvertently write an assertion that can never fail.

Once you've seen the test fail for the proper reason, make it pass.

Finally, refactor the code if necessary.


Comments

I remember the first time that I realized that I did the red step wrong because my test didn't fail for the intended reason (i.e. it didn't fail because of an assertion). Before that, I didn't realize that I needed to check why a failing test fails. This is a nice programming checklist. Thanks for sharing it :)

3. Consider the resulting code. Can it be improved? If so, do it, but make sure that all tests still pass.
Finally, refactor the code if necessary.

If I can be a Devil's advocate for a moment, then I would say that code can always be improved and few things are necessary. In all honesty though, I think the refactoring step is the most interesting. All three steps include aspects of science and art, but I think the refactor step includes the most of both. On the one hand, it is extremely creative and full of judgement calls about what code should be refactored and what properties the resulting code should have. On the other hand, much of the work of how to (properly) refactor is laid out in books like Martin Fowler's Refactoring and is akin to algebraic manipulations of an algebraic formula.

In other words, I feel like there is room to expand on this checklist in the refactor step. Do you have any thoughts about how you might expand it?

2019-10-25 00:33 UTC

Tyson, thank you for writing. I agree that the refactoring step is both important and compelling. I can't, however, imagine how a checklist would be useful.

The point of The Checklist Manifesto is that checklists help identify avoidable mistakes. A checklist isn't intended to describe an algorithm, but rather to make sure that crucial steps aren't forgotten.

Another important point from The Checklist Manifesto is that a checklist is only effective if it's not too big. A checklist that tries to cover every eventuality isn't useful, because then people don't follow it.

As you write, refactoring is a big topic, covered by several books. All the creativity and experience that goes into refactoring doesn't seem like something that can easily be expressed as an effective checklist.

I don't mind being proven wrong, though, so by all means give it a go.

2019-10-25 21:51 UTC

Tautological assertion

Monday, 14 October 2019 18:39:00 UTC

It's surprisingly easy to write a unit test assertion that never fails.

Recently I was mob programming with a pair of IDQ's programmers. We were starting a new code base, using test-driven development (TDD). This was the first test we wrote:

[Fact]
public async Task HandleObserveUnitStatusStartsSaga()
{
    var subscribers = new List<Guid> { Guid.Parse("{4D093799-9CCC-4135-8CB3-8661985A5853}") };
    var sut = new StatusPolicy
    {
        Data = new StatusPolicyData
        {
            UnitId = 123,
            Subscribers = subscribers
        }
    };
 
    var subscriber = Guid.Parse("{003C5527-7747-4C7A-980E-67040DB738C3}");
    var message = new ObserveUnitStatus(123, subscriber);
    var context = new TestableMessageHandlerContext();
    await sut.Handle(message, context);
 
    Assert.Contains(subscriber, sut.Data.Subscribers);
}

This unit test uses xUnit.net 2.4.0 and NServiceBus 7.1.10 on .NET Core 2.2. The System Under Test (SUT) is intended to be an NServiceBus Saga that monitors a resource for status changes. If a unit changes status, the Saga will alert its subscribers.

The test verifies that when a new subscriber wishes to observe a unit, then its ID is added to the policy's list of subscribers.

The test induced us to implement Handle like this:

public Task Handle(ObserveUnitStatus message, IMessageHandlerContext context)
{
    Data.Subscribers.Add(message.SubscriberId);
    return Task.CompletedTask;
}

Following the red-green-refactor cycle of TDD, this seemed an appropriate implementation.

Enter the Devil #

I often use the Devil's advocate technique to figure out what to do next, so I made this change to the Handle method:

public Task Handle(ObserveUnitStatus message, IMessageHandlerContext context)
{
    Data.Subscribers.Clear();
    Data.Subscribers.Add(message.SubscriberId);
    return Task.CompletedTask;
}

The change is that the method first deletes all existing subscribers. This is obviously wrong, but it passes all tests. That's no surprise, since I intentionally introduced the change to make us improve the test.

False negative #

We had to write a new test, or improve the existing test, so that the defect I just introduced would be caught. I suggested an improvement to the existing test:

[Fact]
public async Task HandleObserveUnitStatusStartsSaga()
{
    var subscribers = new List<Guid> { Guid.Parse("{4D093799-9CCC-4135-8CB3-8661985A5853}") };
    var sut = new StatusPolicy
    {
        Data = new StatusPolicyData
        {
            UnitId = 123,
            Subscribers = subscribers
        }
    };
 
    var subscriber = Guid.Parse("{003C5527-7747-4C7A-980E-67040DB738C3}");
    var message = new ObserveUnitStatus(123, subscriber);
    var context = new TestableMessageHandlerContext();
    await sut.Handle(message, context);
 
    Assert.Contains(subscriber, sut.Data.Subscribers);
    Assert.Superset(
        expectedSubset: new HashSet<Guid>(subscribers),
        actual: new HashSet<Guid>(sut.Data.Subscribers));
}

The only change is the addition of the last assertion.

Smugly I asked the keyboard driver to run the tests, anticipating that it would now fail.

It passed.

We'd just managed to write a false negative. Even though there's a defect in the code, the test still passes. I was nonplussed. None of us expected the test to pass, yet it does.

It took us a minute to figure out what was wrong. Before you read on, try to figure it out for yourself. Perhaps it's immediately clear to you, but it took three people with decades of programming experience a few minutes to spot the problem.

Aliasing #

The problem is aliasing. While named differently, subscribers and sut.Data.Subscribers are the same object. Of course one is a subset of the other, since a set is considered to be a subset of itself.

The assertion is tautological. It can never fail.
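
To see the mechanism in isolation, here's a small, self-contained sketch. It's not the original test code; it only demonstrates that when two variables alias the same list, a superset assertion between them can never fail:

using System;
using System.Collections.Generic;
 
var subscribers = new List<Guid> { Guid.NewGuid() };
var dataSubscribers = subscribers; // aliasing: two names, same object
 
dataSubscribers.Clear();           // simulate the defect in Handle
dataSubscribers.Add(Guid.NewGuid());
 
// Because dataSubscribers is subscribers, the 'expected' set changes along
// with the 'actual' set, so this check is trivially true:
var isSuperset = new HashSet<Guid>(dataSubscribers)
    .IsSupersetOf(new HashSet<Guid>(subscribers));
Console.WriteLine(isSuperset);     // always prints True

No matter what the production code does to the shared list, the two sets built from it at assertion time are equal, and every set is a superset of itself.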

Fixing the problem #

It's surprisingly easy to write tautological assertions when working with mutable state. This regularly happens to me, perhaps a few times a month. Once you've realised that this has happened, however, it's easy to address.

subscribers shouldn't change during the test, so make it immutable.

[Fact]
public async Task HandleObserveUnitStatusStartsSaga()
{
    IEnumerable<Guid> subscribers =
        new[] { Guid.Parse("{4D093799-9CCC-4135-8CB3-8661985A5853}") };
    var sut = new StatusPolicy
    {
        Data = new StatusPolicyData
        {
            UnitId = 123,
            Subscribers = subscribers.ToList()
        }
    };
 
    var subscriber = Guid.Parse("{003C5527-7747-4C7A-980E-67040DB738C3}");
    var message = new ObserveUnitStatus(123, subscriber);
    var context = new TestableMessageHandlerContext();
    await sut.Handle(message, context);
 
    Assert.Contains(subscriber, sut.Data.Subscribers);
    Assert.Superset(
        expectedSubset: new HashSet<Guid>(subscribers),
        actual: new HashSet<Guid>(sut.Data.Subscribers));
}

An array strictly isn't immutable, but declaring it as IEnumerable<Guid> hides the mutation capabilities. The test now has to copy subscribers to a list before assigning it to the policy's data. This anti-aliases subscribers from sut.Data.Subscribers, and causes the test to fail. After all, there's a defect in the Handle method.

You now have to remove the offending line:

public Task Handle(ObserveUnitStatus message, IMessageHandlerContext context)
{
    Data.Subscribers.Add(message.SubscriberId);
    return Task.CompletedTask;
}

This makes the test pass.

Summary #

This article shows an example where I was surprised by aliasing. An assertion that I thought would fail turned out to be a false negative.

You can easily make the mistake of writing a test that always passes. If you haven't tried it, you may think that you're too smart to do that, but it regularly happens to me. Other TDD practitioners have told me that it also happens to them.

This is the reason that the red-green-refactor process encourages you to run each new test and see it fail. If you haven't seen it fail, you don't know if you've avoided a tautology.


Devil's advocate

Monday, 07 October 2019 15:00:00 UTC

How do you know when you have enough test cases? The Devil's Advocate technique can help you decide.

When I review unit tests, I often utilise a technique I call Devil's Advocate. I do the same whenever I consider if I have a sufficient number of test cases. The first time I explicitly named the technique was, I think, in my Outside-in TDD Pluralsight course, in which I also discuss the so-called Gollum style variation. I don't think, however, that I've ever written an article explicitly about this topic. The current text attempts to rectify that omission.

Coverage #

Programmers new to unit testing often struggle with identifying useful test cases. I sometimes see people writing redundant unit tests, while, on the other hand, forgetting to add important test cases. How do you know which test cases to add, and how do you know when you've added enough?

I may return to the first question in another article, but in this, I wish to address the second question. How do you know that you have a sufficient set of test cases?

You may think that this is a question of turning on code coverage. Surely, if you have 100% code coverage, that's sufficient?

It's not. Consider this simple class:

public class MaîtreD
{
    public MaîtreD(int capacity)
    {
        Capacity = capacity;
    }
 
    public int Capacity { get; }
 
    public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
    {
        var reservedSeats = reservations.Sum(r => r.Quantity);
 
        if (Capacity < reservedSeats + reservation.Quantity)
            return false;
 
        return true;
    }
}

This class implements the (simplified) decision logic for an online restaurant reservation system. The CanAccept method has a cyclomatic complexity of 2, so it should be easy to cover with a pair of unit tests:

[Fact]
public void CanAcceptWithNoPriorReservations()
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 4
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(new Reservation[0], reservation);
 
    Assert.True(actual);
}
 
[Fact]
public void CanAcceptOnInsufficientCapacity()
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 4
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = 7 } },
        reservation);
 
    Assert.False(actual);
}

These two tests together completely cover the CanAccept method:

Screen shot showing that the CanAccept method is 100% covered.

You'd think that this is a sufficient number of test cases of the method, then.

As the Devil reads the Bible #

In Scandinavia we have an idiom that Kent Beck (who's worked with Norwegian companies) has also encountered:

"TIL: "like the devil reads the Bible"--meaning someone who carefully reads a book to subvert its intent"

We have the same saying in Danish, and the Swedes also use it.

If you think of a unit test suite as an executable specification, you may consider if you can follow the specification to the letter while intentionally introducing a defect. You can easily do that with the above CanAccept method:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Sum(r => r.Quantity);
 
    if (Capacity <= reservedSeats + reservation.Quantity)
        return false;
 
    return true;
}

This still passes both tests, and still has a code coverage of 100%, yet it's 'obviously' wrong.

Can you spot the difference?

Instead of a less-than comparison, it now uses a less-than-or-equal comparison. You could easily, inadvertently, make such a mistake while programming. It belongs in the category of off-by-one errors, which is one of the most common types of bugs.

This is, in a nutshell, the Devil's Advocate technique. The intent isn't to break the software by sneaking in defects, but to explore how effectively the test suite detects bugs. In the current (simplified) example, the effectiveness of the test suite isn't impressive.

Add test cases #

The problem introduced by the Devil's Advocate is an edge case. If the reservation under consideration fits the restaurant's remaining capacity, but entirely consumes it, the MaîtreD class should still accept it. Currently, however, it doesn't.

It'd seem that the obvious solution is to 'fix' the unit test:

[Fact]
public void CanAcceptWithNoPriorReservations()
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 10
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(new Reservation[0], reservation);
 
    Assert.True(actual);
}

Changing the requested Quantity to 10 does, indeed, cause the test to fail.

Beyond mutation testing #

Until this point, you may think that the Devil's Advocate just looks like an ad-hoc, informally-specified, error-prone, manual version of half of mutation testing. So far, the change I made above could also have been made during mutation testing.

What I sometimes do with the Devil's Advocate technique is to experiment with other, less heuristically driven changes. For instance, based on my knowledge of the existing test cases, it's not too difficult to come up with this change:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Sum(r => r.Quantity);
 
    if (reservation.Quantity != 10)
        return false;
 
    return true;
}

That's an even simpler implementation than the original, but obviously wrong.

This should prompt you to add at least one other test case:

[Theory]
[InlineData( 4)]
[InlineData(10)]
public void CanAcceptWithNoPriorReservations(int quantity)
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(new Reservation[0], reservation);
 
    Assert.True(actual);
}

Notice that I converted the test to a parametrised test. This breaks the Devil's latest attempt, while the original implementation passes all tests.

The Devil, not to be outdone, now switches tactics and goes after the reservations instead:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    return !reservations.Any();
}

This still passes all tests, including the new test case. This indicates that you'll need to add at least one test case with existing reservations, but where there's still enough capacity to accept another reservation:

[Fact]
public void CanAcceptWithOnePriorReservation()
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 4
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = 4 } },
        reservation);
 
    Assert.True(actual);
}

This new test fails, prompting you to correct the implementation of CanAccept. The Devil, however, can do this:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Sum(r => r.Quantity);
    return reservedSeats != 7;
}

This is still not correct, but passes all tests. It does, however, look like you're getting closer to a proper implementation.

Reverse Transformation Priority Premise #

If you find this process oddly familiar, it's because it resembles the Transformation Priority Premise (TPP), just reversed.

“As the tests get more specific, the code gets more generic.”

When I test-drive code, I often try to follow the TPP, but when I review code with tests, the code and the tests are already in place, and it's my task to assess both.

Applying the Devil's Advocate review technique to CanAccept, it seems as though I'm getting closer to a proper implementation. It does, however, require more tests. As your next move you may, for instance, consider parametrising the test case that verifies what happens when capacity is insufficient:

[Theory]
[InlineData(7)]
[InlineData(8)]
public void CanAcceptOnInsufficientCapacity(int reservedSeats)
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 4
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = reservedSeats } },
        reservation);
 
    Assert.False(actual);
}

That doesn't help much, though, because this passes all tests:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Sum(r => r.Quantity);
    return reservedSeats < 7;
}

Compared to the initial, 'desired' implementation, there are at least two issues with this code:

  • It doesn't consider reservation.Quantity
  • It doesn't take into account the Capacity of the restaurant

This indicates that you're going to have to add more test cases, varying both reservation.Quantity and Capacity. The happy-path test cases already vary reservation.Quantity a bit, but CanAcceptOnInsufficientCapacity does not, so perhaps you can follow the TPP by varying reservation.Quantity in that method as well:

[Theory]
[InlineData( 1, 10)]
[InlineData( 2,  9)]
[InlineData( 3,  8)]
[InlineData( 4,  7)]
[InlineData( 4,  8)]
[InlineData( 5,  6)]
[InlineData( 6,  5)]
[InlineData(10,  1)]
public void CanAcceptOnInsufficientCapacity(int quantity, int reservedSeats)
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = reservedSeats } },
        reservation);
 
    Assert.False(actual);
}

This makes it harder for the Devil to come up with a malevolent implementation. Harder, but not impossible.

It seems clear that since all test cases still use a hard-coded capacity, it ought to be possible to write an implementation that ignores the Capacity, but at this point I don't see a simple way to avoid looking at reservation.Quantity:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Sum(r => r.Quantity);
    return reservedSeats + reservation.Quantity < 11;
}

This implementation passes all the tests. The last batch of test cases forced the Devil to consider reservation.Quantity. This strongly implies that if you vary Capacity as well, the proper implementation ought to emerge.

Diminishing returns #

What happens, then, if you add just one test case with a different Capacity?

[Theory]
[InlineData( 1, 10, 10)]
[InlineData( 2,  9, 10)]
[InlineData( 3,  8, 10)]
[InlineData( 4,  7, 10)]
[InlineData( 4,  8, 10)]
[InlineData( 5,  6, 10)]
[InlineData( 6,  5, 10)]
[InlineData(10,  1, 10)]
[InlineData( 1,  1,  1)]
public void CanAcceptOnInsufficientCapacity(
    int quantity,
    int reservedSeats,
    int capacity)
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = reservedSeats } },
        reservation);
 
    Assert.False(actual);
}

Notice that I just added one test case with a Capacity of 1.

You may think that this is about where the Devil ought to capitulate, but not so. This passes all tests:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = 0;
    foreach (var r in reservations)
    {
        reservedSeats = r.Quantity;
        break;
    }
    return reservedSeats + reservation.Quantity <= Capacity;
}

Here you may feel the urge to protest. So far, all the Devil's Advocate implementations have been objectively simpler than the 'desired' implementation, because they have involved fewer elements and have had a lower or equivalent cyclomatic complexity. This new attempt to circumvent the specification seems more complex.

It also seems clearly ill-intentioned. Recall that the intent of the Devil's Advocate technique isn't to 'cheat' the unit tests, but rather to explore how well the tests describe the desired behaviour of the system. The motivation is that it's easy to make off-by-one errors, like inadvertently using <= instead of <. It doesn't seem quite as reasonable that a well-intentioned programmer would accidentally leave behind an implementation like the above.

You can, however, make it look less complicated:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    var reservedSeats = reservations.Select(r => r.Quantity).FirstOrDefault();
    return reservedSeats + reservation.Quantity <= Capacity;
}

You could argue that this still looks intentionally wrong, but I've seen much code that looks like this. It seems to me that there's a kind of programmer who seems generally uncomfortable thinking in collections; they seem to subconsciously gravitate towards code that deals with singular objects. Code that attempts to get 'the' value out of a collection is, unfortunately, not that uncommon.

Still, you might think that at this point, you've added enough test cases. That's reasonable.

The Devil's Advocate technique isn't an algorithm; it has no deterministic exit criterion. It's just a heuristic that I use to explore the quality of tests. There comes a point where subjectively, I judge that the test cases sufficiently describe the desired behaviour.

You may find that we've reached that point now. You could, for example, argue that in order to calculate reservedSeats, reservations.Sum(r => r.Quantity) is simpler than reservations.Select(r => r.Quantity).FirstOrDefault(). I'd be inclined to agree.

There's diminishing returns to the Devil's Advocate technique. Once you find that the gains from insisting on intentionally pernicious implementations are smaller than the effort required to add more test cases, it's time to stop and commit to the test cases now in place.

Test case variability #

Tests specify desired behaviour. If the tests contain less variability than the code they cover, then how can you be certain that the implementation code is correct?

The discussion now moves into territory where I usually exercise a great deal of judgement. Read the following for inspiration, not as rigid instructions. My intent with the following is not to imply that you must always go to like extremes, but simply to demonstrate what you can do. Depending on circumstances (such as the cost of a defect in production), I may choose to do the following, and sometimes I may choose to skip it.

If you consider the original implementation of CanAccept at the top of the article, notice that it works with reservations of indefinite size. If you think of reservations as a finite collection, it can contain zero, one, two, ten, or hundreds of elements. Yet, no test case goes beyond a single existing reservation. This is, I think, a disconnect. The tests come not even close to the degree of variability that the method can handle. If this is a piece of mission-critical software, that could be a cause for concern.

You should add some test cases where there's two, three, or more existing reservations. People often don't do that because it seems that you'd now have to write a test method that exercises one or more test cases with two existing reservations:

[Fact]
public void CanAcceptWithTwoPriorReservations()
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = 4
    };
    var sut = new MaîtreD(capacity: 10);
 
    var actual = sut.CanAccept(
        new[] { new Reservation { Quantity = 4 }, new Reservation { Quantity = 1 } },
        reservation);
 
    Assert.True(actual);
}

While this method now covers the two-existing-reservations test case, you need one to cover the three-existing-reservations test case, and so on. This seems repetitive, and probably bothers you at more than one level:

  • It's just plain tedious to have to add that kind of variability
  • It seems to violate the DRY principle

I don't hold the DRY principle as an absolute that must always be followed, but a violation of it often indicates a maintainability problem. I think this is the case here, because the new CanAcceptWithTwoPriorReservations test method looks a lot like the previous CanAcceptWithOnePriorReservation method. If someone makes changes to the MaîtreD class, they would have to go and revisit all those test methods.

What you can do instead is to parametrise the key values of the collection(s) in question. While you can't put collections of objects in [InlineData] attributes, you can put arrays of constants. For existing reservations, the key values are the quantities, so supply an array of integers as a test argument:

[Theory]
[InlineData( 4, new int[0])]
[InlineData(10, new int[0])]
[InlineData( 4, new[] { 4 })]
[InlineData( 4, new[] { 4, 1 })]
[InlineData( 2, new[] { 2, 1, 3, 2 })]
public void CanAcceptWhenCapacityIsSufficient(int quantity, int[] reservationQantities)
{
    var reservation = new Reservation
    {
        Date = new DateTime(2018, 8, 30),
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity: 10);
 
    var reservations = reservationQantities.Select(q => new Reservation { Quantity = q });
    var actual = sut.CanAccept(reservations, reservation);
 
    Assert.True(actual);
}

This single test method replaces the previous three 'happy path' test methods. The first four [InlineData] annotations reproduce the previous test cases, whereas the fifth [InlineData] annotation adds a new test case with four existing reservations.

I gave the method a new name to better reflect the more general nature of it.

Notice that the CanAcceptWhenCapacityIsSufficient method uses Select to turn the array of integers into a collection of Reservation objects.

You may think that I cheated, since I didn't supply any other values, such as the Date property, to the existing reservations. This is easily addressed:

[Theory]
[InlineData( 4, new int[0])]
[InlineData(10, new int[0])]
[InlineData( 4, new[] { 4 })]
[InlineData( 4, new[] { 4, 1 })]
[InlineData( 2, new[] { 2, 1, 3, 2 })]
public void CanAcceptWhenCapacityIsSufficient(int quantity, int[] reservationQantities)
{
    var date = new DateTime(2018, 8, 30);
    var reservation = new Reservation
    {
        Date = date,
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity: 10);
 
    var reservations =
        reservationQantities.Select(q => new Reservation { Quantity = q, Date = date });
    var actual = sut.CanAccept(reservations, reservation);
 
    Assert.True(actual);
}

The only change compared to before is that date is now a variable assigned not only to reservation, but also to all the Reservation objects in reservations.

Towards property-based testing #

Looking at a test method like CanAcceptWhenCapacityIsSufficient it should bother you that the capacity is still hard-coded. Why don't you make that a test argument as well?

[Theory]
[InlineData(10,  4, new int[0])]
[InlineData(10, 10, new int[0])]
[InlineData(10,  4, new[] { 4 })]
[InlineData(10,  4, new[] { 4, 1 })]
[InlineData(10,  2, new[] { 2, 1, 3, 2 })]
[InlineData(20, 10, new[] { 2, 2, 2, 2 })]
[InlineData(20,  4, new[] { 2, 2, 4, 1, 3, 3 })]
public void CanAcceptWhenCapacityIsSufficient(
    int capacity,
    int quantity,
    int[] reservationQantities)
{
    var date = new DateTime(2018, 8, 30);
    var reservation = new Reservation
    {
        Date = date,
        Quantity = quantity
    };
    var sut = new MaîtreD(capacity);
 
    var reservations =
        reservationQantities.Select(q => new Reservation { Quantity = q, Date = date });
    var actual = sut.CanAccept(reservations, reservation);
 
    Assert.True(actual);
}

The first five [InlineData] annotations just reproduce the test cases that were already present, whereas the bottom two annotations are new test cases with another capacity.

How do I come up with new test cases? It's easy: In the happy-path case, the sum of existing reservation quantities, plus the requested quantity, must be less than or equal to the capacity.
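
For instance, in the last of the new test cases, the existing reservations sum to 2 + 2 + 4 + 1 + 3 + 3 = 15; adding the requested quantity of 4 gives 19, which is less than or equal to the capacity of 20, so CanAccept should return true.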

It sometimes helps to slightly reframe the test method. If you allow the collection of existing reservations to be the most variable element in the test method, you can express the other values relative to that input. For example, instead of supplying the capacity as an absolute number, you can express a test case's capacity in relation to the existing reservations:

[Theory]
[InlineData(6,  4, new int[0])]
[InlineData(0, 10, new int[0])]
[InlineData(2,  4, new[] { 4 })]
[InlineData(1,  4, new[] { 4, 1 })]
[InlineData(0,  2, new[] { 2, 1, 3, 2 })]
[InlineData(2, 10, new[] { 2, 2, 2, 2 })]
[InlineData(1,  4, new[] { 2, 2, 4, 1, 3, 3 })]
public void CanAcceptWhenCapacityIsSufficient(
    int capacitySurplus,
    int quantity,
    int[] reservationQantities)
{
    var date = new DateTime(2018, 8, 30);
    var reservation = new Reservation
    {
        Date = date,
        Quantity = quantity
    };
    var reservedSeats = reservationQantities.Sum();
    var capacity = reservedSeats + quantity + capacitySurplus;
    var sut = new MaîtreD(capacity);
 
    var reservations =
        reservationQantities.Select(q => new Reservation { Quantity = q, Date = date });
    var actual = sut.CanAccept(reservations, reservation);
 
    Assert.True(actual);
}

Notice that the value supplied as a test argument is now named capacitySurplus. This represents the surplus capacity for each test case. For example, in the first test case, the capacity was previously supplied as the absolute number 10. The requested quantity is 4, and since there are no prior reservations in that test case, the capacity surplus, after accepting the reservation, is 6.

Likewise, in the second test case, the requested quantity is 10, and since the absolute capacity is also 10, when you reframe the test case, the surplus capacity, after accepting the reservation, is 0.
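In general, the absolute capacity can be recovered from the relation capacity = reservedSeats + quantity + capacitySurplus, which is exactly how the test method computes it. For the last test case, for example, 15 + 4 + 1 = 20, which is the absolute capacity that the previous version of the test supplied.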

This seems odd if you aren't used to it. You'd probably intuitively think of a restaurant's Capacity as 'the most absolute' number, in that it's often a number that originates from physical constraints.

When you're looking for test cases, however, you aren't looking for test cases for a particular restaurant. You're looking for test cases for an arbitrary restaurant. In other words, you're looking for test inputs that belong to the same equivalence class.

Property-based testing #

I haven't explicitly stated this yet, but both the capacity and each reservation Quantity should be a positive number. This should really have been captured as a proper domain object, but I chose to keep these values as primitive integers in order to not complicate the example too much.
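If you did want to capture the constraint in the type system, a minimal sketch could look like the following. The type name NaturalNumber is only an illustration; it's not part of the code base discussed in this article.

// A sketch of a domain object that guarantees a positive number.
// The type name NaturalNumber is hypothetical, not part of the example code base.
public struct NaturalNumber
{
    public int Value { get; }
 
    public NaturalNumber(int value)
    {
        if (value < 1)
            throw new ArgumentOutOfRangeException(
                nameof(value),
                "A natural number must be positive.");
        Value = value;
    }
}

Notice that default(NaturalNumber) would still produce the value 0, so a production-ready implementation would have to deal with that wrinkle as well.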

If you look at the test parameters for the latest incarnation of CanAcceptWhenCapacityIsSufficient, you may now observe the following:

  • capacitySurplus can be an arbitrary non-negative number
  • quantity can be an arbitrary positive number
  • reservationQantities can be an arbitrary array of positive numbers, including the empty array

This isn't too hard to express with, say, FsCheck (2.14.0):

[Property]
public void CanAcceptWhenCapacityIsSufficient(
    NonNegativeInt capacitySurplus,
    PositiveInt quantity,
    PositiveInt[] reservationQantities)
{
    var date = new DateTime(2018, 8, 30);
    var reservation = new Reservation
    {
        Date = date,
        Quantity = quantity.Item
    };
    var reservedSeats = reservationQantities.Sum(x => x.Item);
    var capacity = reservedSeats + quantity.Item + capacitySurplus.Item;
    var sut = new MaîtreD(capacity);
 
    var reservations =
        reservationQantities.Select(q => new Reservation { Quantity = q.Item, Date = date });
    var actual = sut.CanAccept(reservations, reservation);
 
    Assert.True(actual);
}

This refactoring takes advantage of FsCheck's built-in wrapper types NonNegativeInt and PositiveInt. If you'd like an introduction to FsCheck, you could watch my Introduction to Property-based Testing with F# Pluralsight course.

By default, FsCheck runs each property 100 times, so instead of seven test cases, you now have 100.
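If you want even more, you can turn that number up. With the FsCheck.Xunit adapter used here you can, if I remember the API correctly, do that by setting the MaxTest property of the [Property] attribute:

[Property(MaxTest = 1000)]
public void CanAcceptWhenCapacityIsSufficient(
    NonNegativeInt capacitySurplus,
    PositiveInt quantity,
    PositiveInt[] reservationQantities)
{
    // Same body as above.
}

As the next section demonstrates, though, more iterations only get you so far.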

Limits to the Devil's Advocate technique #

There's a limit to the Devil's Advocate technique. Unless you're working with a problem where you can exhaust the entire domain of possible test cases, your testing strategy is always going to be a sampling strategy. You run your automated tests with either hard-coded values or randomly generated values, but regardless, a test run isn't going to cover all possible input combinations.
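To appreciate the contrast, consider a hypothetical function that negates a Boolean. Its entire domain contains only two values, so two test cases exhaust it completely:

public static bool Negate(bool b)
{
    return !b;
}
 
// A Boolean input has only two possible values, so these two cases
// cover the entire domain; no sampling is involved.
[Theory]
[InlineData(true, false)]
[InlineData(false, true)]
public void NegateReturnsTheOppositeValue(bool input, bool expected)
{
    Assert.Equal(expected, Negate(input));
}

Most real code, however, takes input from effectively unbounded domains, so sampling is the best you can do.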

For example, a truly hostile Devil could make this change to the CanAccept method:

public bool CanAccept(IEnumerable<Reservation> reservations, Reservation reservation)
{
    if (reservation.Quantity == 3953911)
        return true;
 
    var reservedSeats = reservations.Sum(r => r.Quantity);
    return reservedSeats + reservation.Quantity <= Capacity;
}

Even if you increase the number of test cases that FsCheck generates to, say, 100,000, it's unlikely to find the poisonous branch. The chance of randomly generating a quantity of exactly 3953911 isn't that great.

The Devil's Advocate technique doesn't guarantee that you'll have enough test cases to protect yourself against all sorts of odd defects. It does, however, still work well as an analysis tool to figure out whether there are 'enough' test cases.

Conclusion #

The Devil's Advocate technique is a heuristic you can use to evaluate whether more test cases would improve confidence in the test suite. You can use it to review existing (test) code, but you can also use it as inspiration for new test cases that you should consider adding.

The technique is to deliberately implement the system under test incorrectly. The more incorrect you can make it, the more test cases you're likely to have to add.

When there's only a few test cases, you can probably get away with a decidedly unsound implementation that still passes all tests. These are often simpler than the 'intended' implementation. In this phase of applying the heuristic, this clearly demonstrates the need for more test cases.

At a later stage, you'll have to go deliberately out of your way to produce a wrong implementation that still passes all tests. When that happens, it may be time to stop.

The intent of the technique is to uncover how many test cases you need to protect against common defects in the future. Thus, it's not a measure of current code coverage.


Comments

When there's only a few test cases, you can probably get away with a decidedly unsound implementation that still passes all tests. These are often simpler than the 'intended' implementation. In this phase of applying the heuristic, this clearly demonstrates the need for more test cases.

At a later stage, you'll have to go deliberately out of your way to produce a wrong implementation that still passes all tests. When that happens, it may be time to stop.

I like to think of this behavior as a phase transition.

Unless you're working with a problem where you can exhaust the entire domain of possible test cases, your testing strategy is always going to be a sampling strategy.

I agree with this in practice, but it is not always true in theory. A counterexample is polynomial interpolation.

Normally we think of a polynomial in an indeterminate x of degree n as being specified by a list of n + 1 coefficients, where the ith coefficient is the coefficient of x^i. Evaluating this polynomial given a value for x is easy; it just involves exponentiation, multiplication, and addition. Polynomial evaluation has a conceptual inverse called polynomial interpolation. In this direction, the input is evaluations at n + 1 points in "general position" and the output is the n + 1 coefficients. For example, a line is a polynomial of degree 1, and two points are in general position if they are not the same point. This is commonly expressed by the phrase "Any two (distinct) points define a line." Three points are in general position if they are not co-linear, where co-linear means that all three points are on the same line. In general, n + 1 points are in general position if they are not all on the same polynomial of degree n - 1.

Anyway, here is the point. If a pure function is known to implement some polynomial of degree (at most) n, then even if the domain is infinite, there exists n + 1 inputs such that it is sufficient to test this function for correctness on those inputs.
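As a small, hypothetical illustration in C#: if a function is known to implement a polynomial of degree at most 1, that is, some a * x + b, then its values at just two points determine both coefficients.

// Hypothetical example, assumed to be of the form a * x + b.
public static int Line(int x)
{
    return 3 * x + 2;
}
 
// Line(0) = b and Line(1) = a + b recover both coefficients, so these two
// assertions verify Line for every input, as long as the implementation
// really is restricted to polynomials of degree at most 1.
[Fact]
public void TwoPointsDetermineTheLine()
{
    Assert.Equal(2, Line(0));
    Assert.Equal(5, Line(1));
}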

This is why I think the phase transition in the Devil's advocate testing is critical. There is some objective measure of complexity of the function under test (such as cyclomatic complexity), and we have an intuitive sense that a certain number of tests is sufficient for testing functions with that complexity. If the Devil is allowed to add monomials to the polynomial (or, heaven forbid, modify the implementation so that it is not a polynomial), then any finite number of tests can be circumvented. If instead the Devil is only allowed to modify the coefficients of the polynomial, then we have a winning strategy.

Here you may feel the urge to protest. So far, all the Devil's Advocate implementations have been objectively simpler than the 'desired' implementation because it has involved fewer elements and has had a lower or equivalent cyclomatic complexity. This new attempt to circumvent the specification seems more complex.

I think it would be exceedingly interesting if you can formally define what you mean here by "objectively". In the case of a polynomial (and speaking slightly roughly), changing the "first" nonzero coefficient to 0 decreases the complexity (i.e. the degree of the polynomial), while any other change to that coefficient or any change to any other coefficient maintains the complexity.

2019-10-25 01:32 UTC

Tyson, thank you for writing. What I meant by objectively simpler I partially explain in the same paragraph. I consider cyclomatic complexity one of the few useful measurements in software development. As I also imply in the article, I consider Robert C. Martin's Transformation Priority Premise to include a good ranking of code constructs, e.g. that using a constant is simpler than using a variable, and so on.

I don't think you need to reach for polynomial interpolation in order to make your point. Just consider a function that returns a constant value, like this one:

public static string Foo(int i)
{
    return "foo";
}

You can make a similar argument about this function: You only need a single test value in order to demonstrate that it works as intended. I suppose you could view that as a zero-degree polynomial.

Beyond what you think of as the phase transition, I sometimes try to see what happens if I slightly increase the complexity of a function. For the Foo function, it could be a change like this:

public static string Foo(int i)
{
    if (i < -1000)
        return "bar";
    return "foo";
}

Unless you just happened to pick a number less than -1000 for your test value, your test will not discover such a change.

Your argument attempts to guard against that sort of change by assuming that we can somehow 'forbid' a change from a polynomial to something irregular. Real code doesn't work that way. Real code is rarely a continuous function, but rather discrete. That's the reason we have a concept such as edge case, because code branches at discrete values.

A polynomial is a single function, regardless of degree. Implemented in code, it'll have a cyclomatic complexity of 1. That may not even be worth testing, because you'd essentially only be reproducing the implementation code in your test.

The purpose of the Devil's Advocate technique isn't to demonstrate correctness; that's what unit tests are for. The purpose of the Devil's Advocate technique is to critique the tests.

In reality, I never imagine that some malicious developer gains access to the source code. On the other hand, we all make mistakes, and I try to imagine what a likely mistake might look like.

2019-10-26 3:57 UTC
