Constrained Non-Determinism

Thursday, 05 March 2009 20:23:05 UTC

This may turn out to be the most controversial of my Zero-Friction TDD posts so far, as it supposedly goes against conventional wisdom. However, I have found this approach to be really powerful since I began using it about a year ago.

In my previous post, I explained how Derived Values help ensure that tests act as Executable Specification. In short, a test should clearly specify the relationship between input and outcome, as this test does:

[TestMethod]
public void InvertWillReverseText()
{
    // Fixture setup
    string anonymousText = "ploeh";
    string expectedResult =
        new string(anonymousText.Reverse().ToArray());
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(anonymousText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "Invert");
    // Teardown
}

However, it is very tempting to just hardcode the expected value. Consistently using Derived Values to establish the relationship between input and outcome requires discipline.

To help myself enforce this discipline, I use well-defined, but essentially random, input: when the input is random, I don't know the value at design time, and so it is impossible for me to accidentally hard-code any assertions.

[TestMethod]
public void InvertWillReverseText_Cnd()
{
    // Fixture setup
    string anonymousText = Guid.NewGuid().ToString();
    string expectedResult =
        new string(anonymousText.Reverse().ToArray());
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(anonymousText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "Invert");
    // Teardown
}

For strings, I prefer Guids, as the above example demonstrates. For numbers, I often just use the sequence of natural numbers (i.e. 1, 2, 3, 4, 5...). For booleans, I often use an alternating sequence (i.e. true, false, true, false...).
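
A minimal sketch of what such generators might look like (the Anonymous class and its member names are my own, for illustration only; they aren't part of any framework):

internal static class Anonymous
{
    private static int number;
    private static bool boolean;

    // Strings: a Guid is well-defined, but unknowable at design time.
    internal static string String()
    {
        return Guid.NewGuid().ToString();
    }

    // Numbers: the sequence of natural numbers (1, 2, 3, 4, 5...).
    internal static int Number()
    {
        return ++number;
    }

    // Booleans: an alternating sequence (true, false, true, false...).
    internal static bool Boolean()
    {
        boolean = !boolean;
        return boolean;
    }
}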

While this technique causes input to become non-deterministic, I always pick the non-deterministic value-generating algorithm in such a way that it creates 'nice' values; I call this principle Constrained Non-Determinism. Values are carefully generated to stay far away from any boundary conditions that may cause the SUT to behave differently in each test run.

Conventional unit testing wisdom dictates that unit tests should be deterministic, so how can I possibly endorse this technique?

To understand this, it's important to know why the rule about deterministic unit tests exists. It exists because we want to be certain that each time we execute a test suite, we verify the exact same behavior as we did the last time (given that no tests changed). Since we also use test suites as regression tests, it's important that we can be confident that each and every test run verifies the exact same specification.

Constrained Non-Determinism doesn't invalidate that goal, because the algorithm that generates the values must be carefully picked to always create values that stay within the input's Equivalence Class.

In a surprisingly large set of APIs, strings, for example, are treated as opaque values that don't influence behavior in themselves. Many enterprise applications mostly store and read data from persistent data stores, and the value of a string is often inconsequential from the point of view of the code's execution path. Data stores may have constraints on the length of strings, so Constrained Non-Determinism dictates that you pick the generating algorithm so that the string length always stays within (or exceeds, if that's what you want to test) the constraint. Guid.ToString always returns a string of length 36, which is fine for a large number of scenarios.
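
For example, if a hypothetical data store constrains a string column to 20 characters, the generating algorithm can simply be adjusted to always stay within that boundary:

// Truncate the Guid string so the anonymous value never
// exceeds the (hypothetical) 20-character constraint.
string anonymousText =
    Guid.NewGuid().ToString().Substring(0, 20);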

Note that Constrained Non-Determinism is only relevant for Anonymous Variables. For input where the value holds a particular meaning in the context of the SUT, you will still need to hand-pick values as always. E.g. if the input is expected to be an XML string conforming to a particular schema, a Guid string makes no sense.

A secondary benefit of Constrained Non-Determinism is that you don't have to pause to come up with values for Anonymous Variables when you are writing the test.

While this advice may be controversial, I can only recommend it - I've been using this technique for about a year now, and have only become more fond of it as I have gained more experience with it.


Comments

Hey Ploeh, this is @CarlJ (if you didn't guess).

One suggestion/tip when using random values to test is to output the random value when the test fails. With randomness, you may come across that 1-in-100 value that breaks the test, but you won't have any idea what it was, which makes it hard to replicate and fix. In my unit test project, I have a common method that you pass the value to, and it prints the value to the output window when the test fails.
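
Such a helper might look something like this (a sketch; the method name is made up):

internal static void AssertAreEqualWithInput<T>(
    T expected, T actual, object input)
{
    // Include the randomly generated input in the failure message,
    // so a failing case can be replicated and fixed.
    Assert.AreEqual<T>(expected, actual,
        "Failed for input: {0}", input);
}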

As for my question on Twitter: I'm dealing with a really large database (700 GB), which returns a practically infinite number of combinations of data through stored procs. There are no INSERT/UPDATE/DELETEs (that's done in a totally different project). What I want to test is how the code handles the data that is returned from the procs, which is based on a user's selection from multiple radio buttons and drop-down boxes (single and multiple selection).

This data that they select from, comes from the database too, which we have little control over if it's valid or not.

So my question on Twitter was: should I create a method (or methods) that generates random parameters (which come from the DB) that the proc accepts? What I'm testing is not whether the data itself is valid, but whether the code handles some weird anomalies. Or should I just use parameters that I know return valid data?

I've already experimented with creating a proc that generates random parameters from the data within the database, and using those values in my test. Amazingly, it's found a lot of issues that I was able to go back and fix.

The reason why I ask, is because I've heard that this goes beyond "Unit Testing".

Thanks,
Carl J
2010-10-08 14:52 UTC
It sounds to me like Pex would fit your scenario very well. Basically, that would allow you to write a Pex test that creates deterministic test cases for each and every code path through the SUT. I do realize that data comes from the database, but it's still input. In testing terminology, we call that Indirect Input.

If you can't inject a Test Double that provides all the different data combinations, you should be able to use Moles for that part of the task.
2010-10-08 15:57 UTC

Derived Values Ensure Executable Specification

Tuesday, 03 March 2009 20:01:29 UTC

In this Zero-Friction TDD post, I'd like to take a detour around the concept of tests as Executable Specification.

An important aspect of test maintainability is readability. Tests should act both as Executable Specification and as documentation, which puts a lot of responsibility on the test.

One facet of test readability is to make the relationship between the Fixture, the SUT and the verification as easy to understand as possible. In other words, it should be clear to the Test Reader what is being asserted, and why.

Consider a test like this one:

[TestMethod]
public void InvertWillReverseText_Naïve()
{
    // Fixture setup
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert("ploeh");
    // Verify outcome
    Assert.AreEqual<string>("heolp", result, "DoWork");
    // Teardown
}

Since this test is so simple, I expect that you can easily figure out that it implies that the Invert method should simply reverse its input argument, but one of the reasons this seems evident is the proximity of the two strings, as well as the test's name.

In a test of a more complex API, this may not be quite as evident.

[TestMethod]
public void DoItWillReturnCorrectResult_Naïve()
{
    // Fixture setup
    MyClass sut = new MyClass();
    // Exercise system
    int result = sut.DoIt("ploeh");
    // Verify outcome
    Assert.AreEqual<int>(42, result, "DoIt");
    // Teardown
}

In this test, there's no apparent relationship between the input (ploeh) and the output (42). Whatever the algorithm is behind the DoIt method, it's completely opaque to the Test Reader, and the test fails in its role as specification and documentation.

Returning to the first example, it would be better if the relationship between input and output was explicitly described:

[TestMethod]
public void InvertWillReverseText()
{
    // Fixture setup
    string anonymousText = "ploeh";
    string expectedResult =
        new string(anonymousText.Reverse().ToArray());
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(anonymousText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "Invert");
    // Teardown
}

In this case, the input and expected outcome are clearly related, and we call the expectedResult variable a Derived Value, since we explicitly derive the expected result from the input.

Note that I'm not asking you to re-implement the whole algorithm in the test, but only to establish a relationship. One of the main rules of thumb of unit testing is that a test should never contain conditional branches, so there must be at least one test case per path through the SUT.

In the example, the Invert method actually looks like this:

public string Invert(string message)
{
    double d;
    if (double.TryParse(message, out d))
    {
        return (1d / d).ToString();
    }
 
    return new string(message.Reverse().ToArray());
}

Note that the above test only reproduces the part of the algorithm that corresponds to the Equivalence Class defined by the input, whereas the branch that is triggered by a number string can be tested by another test case that doesn't specify string reversal.

[TestMethod]
public void InvertWillInvertNumber()
{
    // Fixture setup
    double anonymousNumber = 10;
    string numberText = anonymousNumber.ToString();
    string expectedResult = 
        (1d / anonymousNumber).ToString();
    MyClass sut = new MyClass();
    // Exercise system
    string result = sut.Invert(numberText);
    // Verify outcome
    Assert.AreEqual<string>(expectedResult, result,
        "Invert");
    // Teardown
}

In this way, we can break down the test cases to individual Executable Specifications that define the expected behavior for each Equivalence Class.

While such tests more clearly provide both specification and documentation, it requires discipline to write tests in this way. Particularly when the algorithm is as simple as the one here, it's very tempting to just hard-code the values directly into the assertion.

In a future post, I'll explain how we can force ourselves to do the right thing per default.


Updating Detached Entities With LINQ To Entities

Sunday, 22 February 2009 20:45:36 UTC

When working with the ObjectContext in LINQ To Entities, a lot of operations are easily performed as long as you work with the same ObjectContext instance: you can retrieve entities from storage by selecting them, update or delete these entities, and create new entities, and the ObjectContext will keep track of all this for you, so the changes are correctly applied to the store when you call SaveChanges.

This is all well and good, but not particularly useful when you start working with layered applications. In this case, LINQ To Entities is just a persistence technology that you (or someone else) decided to use to implement the Data Access Layer. A few years ago, I tended to implement my Data Access Components in straight ADO.NET, and a lot of people prefer NHibernate or similar tools - but I digress…

When LINQ To Entities is just an implementation detail of a service, lifetime management becomes important, so it is commonly recommended that any ObjectContext instance is instantiated when needed and disposed immediately after use.

This means that you will have a lot of detached entities in your system. Entities are likely to be returned to the calling code as interfaces, and when updating, a client will simply pass a reference to some implementation of that interface.

public void CompleteAtSource(IRecord record)

Since we should always follow the Liskov Substitution Principle, we should not even try to cast the interface to an entity. Instead, we must populate a new instance of the entity in question with the correct data and save it.

That's not hard, but since we are creating a new instance of an entity that represents data that is already in the database, we must attach it to the ObjectContext so that it can start tracking it again.

Now we are getting to the heart of the matter, because this is done with the AttachTo method, which is woefully inadequately documented.

At first, I couldn't get it to work, and it wasn't very apparent to me what I did wrong, so although the answer is very simple, this post might save you a bit of time.

This was my first attempt:

using (MessageEntities store = 
    new MessageEntities(this.connectionString))
{
    Message m = new Message();
    m.Id = record.Id;
    m.InputReference = record.InputReference;
    m.State = 2;
    m.Text = record.Text;
 
    store.AttachTo("Messages", m);
 
    store.SaveChanges();
}

I find this approach very intuitive: Build the entity from the input parameter's data, attach it to the store and save the changes. Unfortunately, this approach is wrong.

What happens is that when you invoke AttachTo, the entity's state becomes Unchanged, and it is thus never updated.
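
You can observe this by asking the ObjectStateManager for the entity's state right after the call to AttachTo (a minimal sketch, using the same Message entity as above):

store.AttachTo("Messages", m);

// The entity is now tracked, but in the Unchanged state,
// so SaveChanges will not issue an UPDATE for it.
ObjectStateEntry entry =
    store.ObjectStateManager.GetObjectStateEntry(m);
Console.WriteLine(entry.State); // Unchanged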

The solution is so simple that I'm surprised it took me so long to arrive at it: Simply call AttachTo right after setting the Id property:

using (MessageEntities store = 
    new MessageEntities(this.connectionString))
{
    Message m = new Message();
    m.Id = record.Id;
 
    store.AttachTo("Messages", m);
 
    m.InputReference = record.InputReference;
    m.State = 2;
    m.Text = record.Text;
 
    store.SaveChanges();
}

You can't invoke AttachTo before setting the Id property, since the method requires that the entity has a populated EntityKey before it can be attached; but as soon as you begin updating properties after the call to AttachTo, the entity's state changes to Modified, and SaveChanges now updates the data in the database.

That you have to follow this specific sequence when re-attaching data to the ObjectContext is poorly documented and not enforced by the API, so I thought I'd share this in case it would save someone else a bit of time.


Comments

Luckily, the ADO.NET team implemented the attach logic correctly in Entity Framework, as opposed to LINQ to SQL.
2009-03-10 12:39 UTC

SUT Factory

Friday, 13 February 2009 07:56:21 UTC

In my Zero-Friction TDD series, I focus on establishing a set of good habits that can potentially make you more productive while writing tests TDD style. While being able to quickly write good tests is important, this is not the only quality on which you should focus.

Maintainability, not only of your production code, but also of your test code, is important, and the DRY principle is just as applicable here.

Consider a test like this:

[TestMethod]
public void SomeTestUsingConstructorToCreateSut()
{
    // Fixture setup
    MyClass sut = new MyClass();
    // Exercise system
    // ...
    // Verify outcome
    // ...
    // Teardown
}

Such a test represents an anti-pattern you can easily fall victim to. The main item of interest here is that I create the SUT using its constructor. You could say that I have hard-coded this particular constructor usage into my test.

This is not a problem if there's only one test of MyClass, but once you have many, this starts to become a drag on your ability to refactor your code.

Imagine that you want to change the constructor of MyClass from the default constructor to one that takes a dependency, like this:

public MyClass(IMyInterface dependency)

If you have many tests that use the default constructor (not just the one shown here, but dozens), this simple change will force you to visit all these tests and modify them to make the code compile again.

If, instead, we use a factory to create the SUT in each test, there's a single place where we can go and update the creation logic.

[TestMethod]
public void SomeTestUsingFactoryToCreateSut()
{
    // Fixture setup
    MyClass sut = MyClassFactory.Create();
    // Exercise system
    // ...
    // Verify outcome
    // ...
    // Teardown
}

The MyClassFactory class is a test-specific helper class (more formally, it's part of our SUT API Encapsulation) that is part of the unit test project. Using this factory, we only need to modify the Create method to implement the constructor change.

internal static MyClass Create()
{
    IMyInterface fake = new FakeMyInterface();
    return new MyClass(fake);
}

Instead of having to modify many individual tests to support the signature change of the constructor, there's now one central place where we can go and do that. This pattern supports refactoring much better, so consider making this a habit of yours.

One exception to this rule concerns tests that explicitly deal with the constructor, such as this one:

[ExpectedException(typeof(ArgumentNullException))]
[TestMethod]
public void CreateWithNullMyInterfaceWillThrow()
{
    // Fixture setup
    IMyInterface nullMyInterface = null;
    // Exercise system
    new MyClass(nullMyInterface);
    // Verify outcome (expected exception)
    // Teardown
}

In a case like this, where you explicitly want to deal with the constructor in an anomalous way, I consider it reasonable to deviate from the rule of using a factory to create the SUT. Although this may mean fixing the SUT creation logic in more than one place, instead of only in the factory itself, it's likely to be constrained to a few places rather than dozens, since you will normally only have a handful of these explicit constructor tests.

Compared to my Zero-Friction TDD tips and tricks, this particular advice has the potential to marginally slow you down. However, this investment pays off when you want to refactor your SUT's constructor, and remember that you can always just write the call to the factory and move on without implementing it right away.


Comments

Another approach is to let the test fixture hold the SUT in a field and then instantiate it in the test initialize method. This way all tests will have access to a default instance of the SUT, without cluttering the test itself with details of how it was created. Since dependencies will also be created in test init, your tests will also be able to access the stubs or mocks for verification.
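
In MSTest, that approach might look like this (a sketch, using the same types as the post):

[TestClass]
public class MyClassTest
{
    private FakeMyInterface fake;
    private MyClass sut;

    [TestInitialize]
    public void SetUp()
    {
        this.fake = new FakeMyInterface();
        this.sut = new MyClass(this.fake);
    }

    // Each test can now use this.sut directly and
    // verify expectations against this.fake.
}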

2009-02-16 19:20 UTC
Hi Martin

Thank you for your comment.

While you are right that technically, this is another option, I don't like to use Implicit Setup, since it doesn't clearly communicate intent. If you have dozens of test cases in a single Test Class, the Setup may not be very apparent; in essence, it's clouding the state of the Fixture, since it's not readily visible (it may be in a completely different section of the file).

Another reason I don't like this approach is that it tightly couples the Test Class to the Fixture, and it makes it harder to vary the Fixture within the same Test Class.

Explicitly setting up the Fixture provides a greater degree of flexibility, since you can always overload the SUT Factory to create the SUT in different ways.
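
For instance, the factory might be overloaded like this (a sketch):

internal static MyClass Create()
{
    return Create(new FakeMyInterface());
}

internal static MyClass Create(IMyInterface dependency)
{
    return new MyClass(dependency);
}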
2009-02-17 13:09 UTC
I like this pattern. What's your opinion on doing the same thing within the method decorated with the [TestInitialize] attribute (MS Test), or even in the constructor (i.e. xUnit)? It achieves the same result (or maybe even a better one) IMO.
2011-05-24 10:54 UTC
Raj, the problem with this approach is that in order to be maintainable, you would need to adopt the Testcase Class per Fixture pattern, because if you don't, the test class will eventually suffer from low cohesion. However, most people (myself included) tend to find this pattern counter-intuitive and rather prefer Testcase Class per Class.
2011-05-24 19:07 UTC

Zero-Friction TDD

Wednesday, 28 January 2009 14:46:44 UTC

In my original post on Zero-Friction TDD, I continually updated the list of posts in the series, so that there would always be a central 'table of contents' for this topic.

Since I'll lose the ability to keep editing any of my previous postings on the old ploeh blog, the present post now contains the most updated list of Zero-Friction TDD articles:


Living In Interesting Times

Wednesday, 28 January 2009 09:03:29 UTC

As readers of my old MSDN blog will know, ploeh blog is moving to this new site.

Responding to the current financial crisis, Microsoft is cutting costs and laying off 1400 employees. During that process, the entire Microsoft Dynamics Mobile Team is being disbanded, which is very sad, since it was a very nice place to work. The team spirit was great, and we were really committed to agile development methodologies, but all good things must end...

Currently, I can't even begin to guess what the future looks like for me, although to regular readers of my blog I can state that I sincerely intend to keep writing as I always have. If you subscribed to my old blog, then please subscribe here instead. The blog is moving because the old blog belongs to Microsoft, and only employees can post, so I'll soon be writing my last post on the old blog.

Until now, it's always been Microsoft's policy to retain the old blogs, even when the original authors leave the company, so while I will not be able to post to the old blog, I expect the old posts to be around for a long time yet.

Professionally, I don't know what I will do now. If I can find new employment in these times, I may simply decide to take on new challenges with a new employer. However, I'm also considering freelancing for a while: coding, mentoring, lecturing, writing...

If you are in the position where you think you can use my services, whether for full-time employment or just a few days, please let me know. Keep in mind that I'm based in Copenhagen, Denmark, and while I can certainly travel after work, I cannot permanently move due to family obligations.


Comments

I'm very sorry to hear that, but you are a very smart person, so I'm sure you'll have no problem finding a new job.

I wish you the best

2009-02-11 22:42 UTC

We're currently looking for senior developers with MS competencies. So have a look at our website and send me your CV, if you're interested.

Regards

Simon B. Jensen
Head of Development
IT Practice

2009-02-12 13:25 UTC

Mark,

You're a good man. A software genius. The fact that you got laid off is a crime. I would hire you any day, and I wish you the best. Friend me on Facebook. brianlambert@gmail.com.

All my best,

Brian

2009-02-24 05:34 UTC
