Constrained Non-Determinism

This may turn out to be the most controversial of my Zero-Friction TDD posts so far, as it supposedly goes against conventional wisdom. However, I have found this approach to be really powerful since I began using it about a year ago.

In my previous post, I explained how Derived Values help ensure that tests act as Executable Specification. In short, a test should clearly specify the relationship between input and outcome, as this test does:

[TestMethod]
public void InvertWillReverseText()
{
    // Fixture setup
    string anonymousText = "ploeh";
    string expectedResult =
        new string(anonymousText.Reverse().ToArray());
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(anonymousText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "DoWork");
    // Teardown
}

However, it is very tempting to just hardcode the expected value. Consistently using Derived Values to establish the relationship between input and outcome requires discipline.

To help myself enforce this discipline, I use well-defined, but essentially random, input, because when the input is random, I don't know the value at design time, and hence, it is impossible for me to accidentally hard-code any assertions.

[TestMethod]
public void InvertWillReverseText_Cnd()
{
    // Fixture setup
    string anonymousText = Guid.NewGuid().ToString();
    string expectedResult =
        new string(anonymousText.Reverse().ToArray());
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(anonymousText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "DoWork");
    // Teardown
}

For strings, I prefer Guids, as the above example demonstrates. For numbers, I often just use the sequence of natural numbers (i.e. 1, 2, 3, 4, 5...). For booleans, I often use an alternating sequence (i.e. true, false, true, false...).

While this technique causes input to become non-deterministic, I always pick the non-deterministic value-generating algorithm in such a way that it creates 'nice' values; I call this principle Constrained Non-Determinism. Values are carefully generated to stay far away from any boundary conditions that may cause the SUT to behave differently in each test run.

Conventional unit testing wisdom dictates that unit tests should be deterministic, so how can I possibly endorse this technique?

To understand this, it's important to know why the rule about deterministic unit tests exist. It exists because we want to be certain that each time we execute a test suite, we verify the exact same behavior as we did the last time (given that no tests changed). Since we also use test suites as regression tests, it's important that we can be confident that each and every test run verifies the exact same specification.

Constrained Non-Determinism doesn't invalidate that goal, because the algorithm that generates the values must be carefully picked to always create values that stay within the input's Equivalence Class.

In a surprisingly large set of APIs, strings, for example, are treated as opaque values that don't influence behavior in themselves. Many enterprise applications mostly store and read data from persistent data stores, and the value of a string in itself is often inconsequential from the point of view of the code's execution path. Data stores may have constraints on the length of strings, so Constrained Non-Determinism dictates that you should pick the generating algorithm so that the string length always stays within (or exceeds, if that's what you want to test) the constraint. Guid.ToString always returns a string with the length of 36, which is fine for a large number of scenarios.

Note that Constrained Non-Determinism is only relevant for Anonymous Variables. For input where the value holds a particular meaning in the context of the SUT, you will still need to hand-pick values as always. E.g. if the input is expected to be an XML string conforming to a particular schema, a Guid string makes no sense.

A secondary benefit of Constrained Non-Determinism is that you don't have to pause to come up with values for Anonymous Variables when you are writing the test.

While this advice may be controversial, I can only recommend it - I've been using this technique for about a year now, and have only become more fond of it as I have gained more experience with it.

Comments

Carl J #

Hey Phoeh, this is @CarlJ (if you didn't guess).

One suggestion/tip when using random values to test, is to output the random value when the test fails. With randomness, you may come across that 1 in 100 value that breaks the test, but you won't have any idea of what it was, which makes it a bit hard to replicate and fix. In my Unit Test project, I have a common method that you pass the value to and it will print it to the output screen when it fails.

As for my question on Twitter. I'm dealing with a really large database (700Gbs), which returns an infinite number of combinations of data through stored procs. There are no INSERT/UPDATE/DELETEs (that's done in a totally different project). What I want to test is how the code handles the data that is returned from the procs, which is based on a user's selection from multiple radio buttons, and drop down boxes (single and multiple selection).

This data that they select from, comes from the database too, which we have little control over if it's valid or not.

So my question on Twitter was, should I create a method(s) that generate random parameters (which come from the DB) that the proc accepts? What I'm testing is not if the data itself is valid, but whether the code handles some weird anomalies? Or should I just use parameters that I know return valid data?

I've already experimented with creating a proc that generates random parameters from the data within the database, and using those values in my test. Amazingly, it's found a lot of issues that I was able to go back and fix.

The reason why I ask, is because I've heard that this goes beyond "Unit Testing".

Thanks,
Carl J

2010-10-08 14:52 UTC

Mark Seemann #

It sounds to me like Pex would fit your scenario very well. Basically, that would allow you to write a Pex test that creates deterministic test cases for each and every code path through the SUT. I do realize that data comes from the database, but it's still input. In testing terminology, we call that Indirect Input.

If you can't inject a Test Double that provides all the different data combinations, you should be able to use Moles for that part of the task.

2010-10-08 15:57 UTC

Published: Thursday, 05 March 2009 20:23:05 UTC

Constrained Non-Determinism by Mark Seemann

Comments

Wish to comment?

Published

Tags