# Property-based testing is not the same as partition testing by Mark Seemann

*Including an example of property-based testing without much partitioning.*

A tweet from Brian Marick induced me to read a paper by Dick Hamlet and Ross Taylor called *Partition Testing Does Not Inspire Confidence*. In general, I find the conclusion fairly intuitive, but on the other hand hardly an argument against property-based testing.

I'll later return to why I find the conclusion intuitive, but first, I'd like to address the implied connection between partition testing and property-based testing. I'll also show a detailed example.

The source code used in this article is available on GitHub.

### Not the same #

The Hamlet & Taylor paper is exclusively concerned with partition testing, which makes sense, since it's from 1990. As far as I'm aware, property-based testing wasn't invented until later.

Brian Marick extends its conclusions to property-based testing:

This seems to imply that property-based testing isn't effective, because (if you accept the paper's conclusions) partition testing isn't effective."I've been a grump about property-based testing because I bought into the conclusion of Hamlet&Taylor's 1990 "Partition testing does not inspire confidence""

There's certainly overlap between partition testing and property-based testing, but it's not complete. Some property-based testing isn't partition testing, or the other way around:

To be fair, the overlap may easily be larger than the figure implies, but you can certainly describes properties without having to partition a function's domain.

In fact, the canonical example of property-based testing (that reversing a list twice yields the original list: `reverse (reverse xs) == xs`

) does *not* rely on partitioning. It works for all finite lists.

You may think that this is only because the case is so simple, but that's not the case. You can also avoid partitioning on the slightly more complex problem presented by the Diamond kata. In fact, the domain for that problem is so small that you don't need a property-based framework.

You may argue that the Diamond kata is another toy problem, but I've also solved a realistic, complex business problem with property-based testing without relying on partitioning. Granted, the property shown in that article doesn't sample uniformly from the entire domain of the System Under Test, but the property (there's only one) doesn't rely on partitioning. Instead, it relies on incremental tightening of preconditions to tease out the desired behaviour.

I'll show another example next.

### FizzBuzz via partitioning #

When introducing equivalence classes and property-based testing in workshops, I sometimes use the FizzBuzz kata as an example. When I do this, I first introduce the concept of equivalence classes and then proceed to explain how instead of manually picking values from each partition, you can randomly sample from them:

[<Property(QuietOnSuccess = true)>] let ``FizzBuzz.transform returns Buzz`` (number : int) = (number % 5 = 0 && number % 3 <> 0) ==> lazy let actual = FizzBuzz.transform number let expected = "Buzz" expected = actual

(That's F# code, but the rest of the code in this article is going to be Haskell.)

While this gently introduces the concept of testing based on randomly sampling inputs, it relies heavily on partitioning. The above example filters inputs so that it only runs with numbers that are divisible by five, but not by three.

As at least one workshop attendee objected, it's getting perilously close to reproducing the implementation logic in the test. It always hurts when someone directs valid criticism at you, but he was right. That's not a problem with property-based testing, though, but rather with the way I presented it.

We can do better.

### FizzBuzz with proper properties #

The trick to 'true' property-based testing is identifying proper properties for the problem being solved. Granted, this can be difficult and often requires some creative thinking (which is also why I find it so enjoyable). Fortunately, certain patterns tend to recur; for example, Scott Wlaschin has a small collection of property-based testing patterns.

As the FizzBuzz kata is described, the domain for a `fizzBuzz`

function is only the numbers from one to 100. Let's be generous, however, and expand it to all integers, since it makes no practical difference.

In Haskell, for example, we might aim for a function with this API:

fizzBuzz :: (Integral a, Show a) => a -> String

Is it possible to test-drive the correct implementation with QuickCheck without relying on partitioning?

I must admit that I can't figure out how to entirely avoid partitioning, but it's possible to bootstrap the process using only a single partition. If you know of a way to entirely do without partitioning, leave a comment.

### FizzBuzz #

In order to anchor the behaviour, we have to describe how at least a single value translates to a string, for example that all multiples of both three and five translate to "FizzBuzz". It might be enough to simply state that a small number like `0`

or `15`

translates to `"FizzBuzz"`

, but we might as well exercise that entire partition:

testProperty "Divisible by both 3 and 5" $ \ (seed :: Int) -> let i = seed * 3 * 5 actual = fizzBuzz i in "FizzBuzz" === actual

Here I take *any* integer `seed`

and use it to produce an integer `i`

which is guaranteed to belong to the partition that always produces the output `"FizzBuzz"`

.

Certainly, this tests only a single partition, but as Johannes Link points out, property-based testing still performs randomised testing *within* the partition.

The simplest implementation that passes the test is this:

fizzBuzz :: Integral a => a -> String fizzBuzz _ = "FizzBuzz"

From here, however, it's possible to describe the rest of the problem without relying on partition testing.

### At least one number in three consecutive values #

How to proceed from there long eluded me. Then it dawned on me that while it's hard to test *a single call* to the `fizzBuzz`

function without relying on partitioning, you can examine the output from projecting a small range of inputs to outputs.

What happens if we pick a random number, use it as an origin to enumerate three numbers in total (that is: two more numbers), and then call `fizzBuzz`

with each of them?

Imagine what happens if we randomly pick *10*. In that case, we're going to enumerate three numbers, starting with *10: 10, 11, 12*. What's the expected output of applying these three numbers to `fizzBuzz`

? It's *Buzz, 11, Fizz*. Let's try a few more, and make a table of it:

i |
i+1 |
i+2 |
---|---|---|

10 → Buzz |
11 → 11 |
12 → Fizz |

11 → 11 |
12 → Fizz |
13 → 13 |

12 → Fizz |
13 → 13 |
14 → 14 |

13 → 13 |
14 → 14 |
15 → FizzBuzz |

14 → 14 |
15 → FizzBuzz |
16 → 16 |

Do you notice a pattern?

There's more than a single pattern, but one is that there's *always at least one number* among the three results. Even if you have both a *Fizz* and a *Buzz*, as is the case with *10, 11, 12*, at least one of the results (*11*) remains a number. Think about it some more to convince yourself that this should always be the case for three consecutive numbers.

That's a *property* of `fizzBuzz`

, and it holds universally (also for negative integers).

You can turn it into a QuickCheck property like this:

testProperty "At least one number in 3 consecutive values" $ \ (i :: Int) -> let range = [i..i+2] actual = fizzBuzz <$> range in counterexample (show range ++ "->" ++ show actual) $ any (\x -> isJust (readMaybe x :: Maybe Int)) actual

This test doesn't rely on input partitioning. It works for *all* integers.

In this test I used QuickCheck's `counterexample`

function to provide a helpful message in case of failure. Running the test suite against the above version of `fizzBuzz`

yields a failure like this:

```
At least one number in 3 consecutive values: [Failed]
*** Failed! Falsified (after 1 test):
0
[0,1,2]->["FizzBuzz","FizzBuzz","FizzBuzz"]
(used seed -6204080625786338123)
```

Here we see that the sequence `[0,1,2]`

produces the output `["FizzBuzz","FizzBuzz","FizzBuzz"]`

, which is not only wrong, but is specifically incorrect in the sense that none of the values can be parsed as an integer.

Given the current implementation, that's hardly surprising.

Using the Devil's Advocate technique, I chose to pass both tests with this implementation:

fizzBuzz :: Integral a => a -> String fizzBuzz i = if i `mod` 15 == 0 then "FizzBuzz" else "2112"

The property I just added doesn't check whether the number is one of the input numbers, so the implementation can get away with returning the hard-coded string `"2112"`

.

### At least one Fizz in three consecutive values #

Take a look at the above table. Do you notice any other patterns?

Of each set of three results, there's always a string that *starts with Fizz*. Sometimes, as we see with the input *15*, the output is *FizzBuzz*, so it's not always just *Fizz*, but there's always a string that starts with *Fizz*.

This is another universal property of the `fizzBuzz`

function, which we can express as a test:

testProperty "At least one Fizz in 3 consecutive values" $ \ (i :: Int) -> let range = [i..i+2] actual = fizzBuzz <$> range in counterexample (show range ++ "->" ++ show actual) $ any ("Fizz" `isPrefixOf`) actual

Again, no partitioning is required to express this property. The arbitrary parameter `i`

is unconstrained.

To pass all tests, I implemented `fizzBuzz`

like this:

fizzBuzz :: Integral a => a -> String fizzBuzz i = if i `mod` 3 == 0 then "FizzBuzz" else "2112"

It doesn't look as though much changed, but notice that the modulo check changed from *modulo 15* to *modulo 3*. While incorrect, it passes all tests.

### Only one Buzz in five consecutive values #

Using the same reasoning as above, another property emerges. If, instead of looking at sequences of three input arguments, you create five values, only one of them should result in a *Buzz* result; e.g. *8, 9, 10, 11, 12* should result in *8, Fizz, Buzz, 11, Fizz*. Sometimes, however, the *Buzz* value is *FizzBuzz*, for example when the origin is *11: 11, 12, 13, 14, 15* should produce *11, Fizz, 13, 14, FizzBuzz*.

Like the above property, there's only one *Buzz*, but sometimes it's part of a compound word. What's clear, though, is that there should be only one result that ends with *Buzz*.

Not only is the idea similar to the one above, so is the test:

testProperty "Only one Buzz in 5 consecutive values" $ \ (i :: Int) -> let range = [i..i+4] actual = fizzBuzz <$> range in counterexample (show range ++ "->" ++ show actual) $ 1 == length (filter ("Buzz" `isSuffixOf`) actual)

Again, no partitioning is required to express this property.

This version of `fizzBuzz`

passes all tests:

fizzBuzz :: Integral a => a -> String fizzBuzz i | i `mod` 5 == 0 = "FizzBuzz" fizzBuzz i | i `mod` 3 == 0 = "Fizz" fizzBuzz _ = "2112"

We're not quite there yet, but we're getting closer.

### At least one literal Buzz in ten consecutive values #

What's wrong with the above implementation?

It never returns *Buzz*. How can we express a property that forces it to do that?

We can keep going in the same vein. We know that if we sample a sufficiently large sequence of numbers, it might produce a *FizzBuzz* value, but even if it does, there's going to be a *Buzz* value five positions before and after. For example, if the input sequence contains *30* (producing *FizzBuzz*) then both *25* and *35* should produce *Buzz*.

How big a range should we sample to be certain that there's at least one *Buzz*?

If it's not immediately clear, try setting up a table similar to the one above:

i |
i+1 |
i+2 |
i+3 |
i+4 |
i+5 |
i+6 |
i+7 |
i+8 |
i+9 |
---|---|---|---|---|---|---|---|---|---|

10 →Buzz |
11 →11 |
12 →Fizz |
13 →13 |
14 →14 |
15 →FizzBuzz |
16 →16 |
17 →17 |
18 →Fizz |
19 →19 |

11 →11 |
12 →Fizz |
13 →13 |
14 →14 |
15 →FizzBuzz |
16 →16 |
17 →17 |
18 →Fizz |
19 →19 |
20 →Buzz |

17 →17 |
18 →Fizz |
19 →19 |
20 →Buzz |
21 →Fizz |
22 →22 |
23 →23 |
24 →Fizz |
25 →Buzz |
26 →26 |

Notice that as one *Buzz* drops off to the left, a new one appears to the right. Additionally, there may be more than one literal *Buzz*, but there's always at least one (that is, one that's exactly *Buzz*, and not just ending in *Buzz*).

That's another universal property: for any consecutive sequence of numbers of length ten, there's at least one exact *Buzz*. Here's how to express that as a QuickCheck property:

testProperty "At least one literal Buzz in 10 values" $ \ (i :: Int) -> let range = [i..i+9] actual = fizzBuzz <$> range in counterexample (show range ++ "->" ++ show actual) $ elem "Buzz" actual

Again, no partitioning is required to express this property.

This version of `fizzBuzz`

passes all tests:

fizzBuzz :: Integral a => a -> String fizzBuzz i | i `mod` 15 == 0 = "FizzBuzz" fizzBuzz i | i `mod` 3 == 0 = "Fizz" fizzBuzz i | i `mod` 5 == 0 = "Buzz" fizzBuzz _ = "2112"

What's left? Only that the number is still hard-coded.

### Numbers round-trip #

How to get rid of the hard-coded number?

From one of the above properties, we know that if we pick an arbitrary consecutive sequence of three numbers, at least one of the results will be a string representation of the input number.

It's not guaranteed to be the origin, though. If the origin is, say, *3*, the input sequence is *3, 4, 5*, which should yield the resulting sequence *Fizz, 4, Buzz*.

Since we don't know which number(s) will remain, how can we check that it translates correctly? We can use a variation of a common property-based testing pattern - the one that Scott Wlaschin calls *There and back again*.

We can take any sequence of three outputs and try to parse them back to integers. All successfully parsed numbers must belong to the input sequence.

That's another universal property. Here's how to express that as a QuickCheck property:

testProperty "Numbers round-trip" $ \ (i :: Int) -> let range = [i..i+2] actual = fizzBuzz <$> range numbers = catMaybes $ readMaybe <$> actual in counterexample (show range ++ "->" ++ show actual) $ all (`elem` range) numbers

The parsed `numbers`

may contain one or two elements, but in both cases, all of them must be an element of `range`

.

Again, no partitioning is required to express this property.

This version of `fizzBuzz`

passes all tests:

fizzBuzz :: (Integral a, Show a) => a -> String fizzBuzz i | i `mod` 15 == 0 = "FizzBuzz" fizzBuzz i | i `mod` 3 == 0 = "Fizz" fizzBuzz i | i `mod` 5 == 0 = "Buzz" fizzBuzz i = show i

This looks good. Let's call it a day.

Not so fast, though.

### Redundant property? #

With the new round-trip property, isn't the property titled *At least one number in 3 consecutive values* redundant?

You might think so, but it's not. What happens if we remove it?

If you remove the *At least one number in 3 consecutive values* property, the Devil's Advocate can corrupt `fizzBuzz`

like this:

fizzBuzz :: (Integral a, Show a) => a -> String fizzBuzz i | i `mod` 15 == 0 = "FizzBuzz" fizzBuzz i | i `mod` 3 == 0 = "Fizz" fizzBuzz i | i `mod` 5 == 0 = "Buzz" fizzBuzz _ = "Pop"

This passes all tests if you remove *At least one number in 3 consecutive values*. Why doesn't the *Numbers round-trip* property fail?

It doesn't fail because with the above implementation of `fizzBuzz`

its *numbers* list is always empty. This property doesn't require *numbers* to be non-empty. It doesn't have to, because that's the job of the *At least one number in 3 consecutive values* property. Thus, that property isn't redundant. Leave it in.

### Intuition behind the paper #

What about the results from the Hamlet & Taylor paper? Are the conclusions in the paper wrong?

They may be, but that's not my take. Rather, the way I understand the paper, it says that partition testing isn't much more efficient at detecting errors than pure random sampling.

I've been using the rather schizophrenic version of the Devil's Advocate technique (the one that I call Gollum style) for so long that this conclusion rings true for me.

Consider a truly adversarial developer from Hell. He or she could subvert `fizzBuzz`

like this:

fizzBuzz :: (Integral a, Show a) => a -> String fizzBuzz 18641 = "Pwnd" fizzBuzz i | i `mod` 15 == 0 = "FizzBuzz" fizzBuzz i | i `mod` 3 == 0 = "Fizz" fizzBuzz i | i `mod` 5 == 0 = "Buzz" fizzBuzz i = show i

The test suite is *very unlikely* to detect the error - even if you ask it to run each property a million times:

$ stack test --ta "-a 1000000" FizzBuzzHaskellPropertyBased> test (suite: FizzBuzz-test, args: -a 1000000) Divisible by both 3 and 5: [OK, passed 1000000 tests] At least one number in 3 consecutive values: [OK, passed 1000000 tests] At least one Fizz in 3 consecutive values: [OK, passed 1000000 tests] Only one Buzz in 5 consecutive values: [OK, passed 1000000 tests] At least one literal Buzz in 10 values: [OK, passed 1000000 tests] Numbers round-trip: [OK, passed 1000000 tests] Properties Total Passed 6 6 Failed 0 0 Total 6 6 FizzBuzzHaskellPropertyBased> Test suite FizzBuzz-test passed

How many times should we run these properties before we'd expect the *At least one number in 3 consecutive values* property to detect the error?

In Haskell, `Int`

is an integer type with at least the range [-2^29 .. 2^29-1] - that is, from -536,870,912 to 536,870,911, for a total range of 1,073,741,823 numbers.

In order to detect the error, the *At least one number in 3 consecutive values* property needs to hit *18,641*, which it can only do if QuickCheck supplies an `i`

value of *18,639*, *18,640*, or * 18,641*. That's three values out of 1,073,741,823.

If we assume a uniform distribution, the chance of detecting the error is *3 / 1,073,741,823*, or approximately one in 333 million.

Neither property-based testing nor randomised testing is likely to detect this kind of error. That's basically the intuition that makes sense to me when reading the Hamlet & Taylor paper. If you don't know where to look, partition testing isn't going to help you detect errors like the above.

I can live with that. After all, the value I get out of property-based testing is as a variation on test-driven development, rather than only quality assurance. It enables me to incrementally flesh out a problem in a way that example-based testing sometimes can't.

### Conclusion #

There's a big overlap between partition testing and property-based testing. Often, identifying equivalence classes is the first step to expressing a property. A conspicuous example can be seen in my article series Types + Properties = Software, which shows a detailed walk-through of the Tennis kata done with FsCheck. For a hybrid approach, see Roman numerals via property-based TDD.

In my experience, it's much easier to partition a domain into equivalence classes than it is to describe universal properties of a system. Thus, many properties I write tend to be a kind of partition testing. On the other hand, it's more satisfying when you can express universal properties. I'm not sure that it's always possible, but I find that when it is, it better decouples the tests from implementation details.

Based on the FizzBuzz example shown here, you may find it unappealing that there's more test code than 'production code'. Clearly, for a problem like FizzBuzz, this is unwarranted. That, however, wasn't the point with the example. The point was to show an easily digestible example of universal properties. For a more realistic example, I'll refer you to the scheduling problem I also linked to earlier. While the production code ultimately turned out to be compact, it's far from trivial.