A Range kata implementation in Haskell

A first crack at the exercise.

This article is an instalment in a short series of articles on the Range kata. Here I describe my first attempt at the exercise. As I usually advise people on doing katas, the first time you try your hand at a kata, use the language with which you're most comfortable. To be honest, I may be most habituated to C#, having programmed in it since 2002, but on the other hand, I currently 'think in Haskell', and am often frustrated with C#'s lack of structural equality, higher-order abstractions, and support for functional expressions.

Thus, I usually start with Haskell even though I always find myself struggling with the ecosystem. If you do, too, the source code is available on GitHub.

I took my own advice by setting out with the explicit intent to follow the Range kata description as closely as possible. This kata doesn't beat about the bush, but instead just dumps a set of test cases on you. It wasn't clear if this is the most useful set of tests, or whether the order in which they're represented is the one most conducive to a good experience of test-driven development, but there was only one way to find out.

I quickly learned, however, that the suggested test cases were insufficient in describing the behaviour in enough details.

Containment #

I started by adding the first two test cases as inlined HUnit test lists:

"integer range contains" ~: do
  (r, candidate, expected) <-
    [
      ((Closed 2, Open 6), [2,4], True),
      ((Closed 2, Open 6), [-1,1,6,10], False)
    ]
  let actual = r `contains` candidate
  return $ expected ~=? actual

I wasn't particularly keen on going full Devil's Advocate on the exercise. I could, on the other hand, trivially pass both tests with this obviously degenerate implementation:

import Data.List
 
data Endpoint a = Open a | Closed a deriving (Eq, Show)
 
contains _ candidate = [2] `isPrefixOf` candidate

Reluctantly, I had to invent some additional test cases:

"integer range contains" ~: do
  (r, candidate, expected) <-
    [
      ((Closed   2 ,   Open  6),       [2,4],  True),
      ((Closed   2 ,   Open  6), [-1,1,6,10], False),
      ((Closed (-1), Closed 10), [-1,1,6,10],  True),
      ((Closed (-1),   Open 10), [-1,1,6,10], False),
      ((Closed (-1),   Open 10),  [-1,1,6,9],  True),
      ((  Open   2,  Closed  6),     [3,5,6],  True),
      ((  Open   2,    Open  6),       [2,5], False),
      ((  Open   2,    Open  6),          [],  True),
      ((Closed   2,  Closed  6),     [3,7,4], False)
    ]
  let actual = r `contains` candidate
  return $ expected ~=? actual

This was when I began to wonder whether it would have been easier to use property-based testing. That would entail, however, a departure from the kata's suggested test cases, so I decided to stick to the plan and then perhaps return to property-based testing when repeating the exercise.

Ultimately I implemented the contains function this way:

contains :: (Foldable t, Ord a) => (Endpoint a, Endpoint a) -> t a -> Bool
contains (lowerBound, upperBound) =
  let isHighEnough = case lowerBound of
        Closed x -> (x <=)
        Open   x -> (x <)
      isLowEnough = case upperBound of
        Closed y -> (<= y)
        Open   y ->  (< y)
      isContained x = isHighEnough x && isLowEnough x
  in all isContained

In some ways it seems a bit verbose to me, but I couldn't easily think of a simpler implementation.

One of the features I find so fascinating about Haskell is how general it enables me to be. While the tests use integers for concision, the contains function works with any Ord instance; not only Integer, but also Double, Word, Day, TimeOfDay, or some new type I can't even predict.

All points #

The next function suggested by the kata is a function to enumerate all points in a range. There's only a single test case, so again I added some more:

"getAllPoints" ~: do
  (r, expected) <-
    [
      ((Closed 2,   Open 6), [2..5]),
      ((Closed 4,   Open 8), [4..7]),
      ((Closed 2, Closed 6), [2..6]),
      ((Closed 4, Closed 8), [4..8]),
      ((  Open 2, Closed 6), [3..6]),
      ((  Open 4, Closed 8), [5..8]),
      ((  Open 2,   Open 6), [3..5]),
      ((  Open 4,   Open 8), [5..7])
    ]
  let actual = allPoints r
  return $ expected ~=? actual

Ultimately, after I'd implemented the next feature, I refactored the allPoints function to make use of it, and it became a simple one-liner:

allPoints :: (Enum a, Num a) => (Endpoint a, Endpoint a) -> [a]
allPoints = uncurry enumFromTo . endpoints

The allPoints function also enabled me to express the kata's ContainsRange test cases without introducing a new API:

"ContainsRange" ~: do
  (r, candidate, expected) <-
    [
      ((Closed 2,   Open  5), allPoints (Closed 7, Open   10), False),
      ((Closed 2,   Open  5), allPoints (Closed 3, Open   10), False),
      ((Closed 3,   Open  5), allPoints (Closed 2, Open   10), False),
      ((Closed 2,   Open 10), allPoints (Closed 3, Closed  5),  True),
      ((Closed 3, Closed  5), allPoints (Closed 3, Open    5),  True)
    ]
  let actual = r `contains` candidate
  return $ expected ~=? actual

As I've already mentioned, the above implementation of allPoints is based on the next feature, endpoints.

Endpoints #

The kata also suggests a function to return the two endpoints of a range, as well as some test cases to describe it. Once more, I had to add more test cases to adequately describe the desired functionality:

"endPoints" ~: do
  (r, expected) <-
    [
      ((Closed 2,   Open 6), (2, 5)),
      ((Closed 1,   Open 7), (1, 6)),
      ((Closed 2, Closed 6), (2, 6)),
      ((Closed 1, Closed 7), (1, 7)),
      ((  Open 2,   Open 6), (3, 5)),
      ((  Open 1,   Open 7), (2, 6)),
      ((  Open 2, Closed 6), (3, 6)),
      ((  Open 1, Closed 7), (2, 7))
    ]
  let actual = endpoints r
  return $ expected ~=? actual

The implementation is fairly trivial:

endpoints :: (Num a1, Num a2) => (Endpoint a2, Endpoint a1) -> (a2, a1)
endpoints (Closed x, Closed y) = (x  , y)
endpoints (Closed x,   Open y) = (x  , y-1)
endpoints (  Open x, Closed y) = (x+1, y)
endpoints (  Open x,   Open y) = (x+1, y-1)

One attractive quality of algebraic data types is that the 'algebra' of the type(s) tell you how many cases you need to pattern-match against. Since I'm treating a range as a pair of Endpoint values, and since each Endpoint can be one of two cases (Open or Closed), there's exactly 2 * 2 = 4 possible combinations (since a tuple is a product type).

That fits with the number of pattern-matches required to implement the function.

Overlapping ranges #

The final interesting feature is a predicate to determine whether one range overlaps another. As has become a refrain by now, I didn't find the suggested test cases sufficient to describe the desired behaviour, so I had to add a few more:

"overlapsRange" ~: do
  (r, candidate, expected) <-
    [
      ((Closed 2, Open  5), (Closed 7, Open 10), False),
      ((Closed 2, Open 10), (Closed 3, Open  5),  True),
      ((Closed 3, Open  5), (Closed 3, Open  5),  True),
      ((Closed 2, Open  5), (Closed 3, Open 10),  True),
      ((Closed 3, Open  5), (Closed 2, Open 10),  True),
      ((Closed 3, Open  5), (Closed 1, Open  3), False),
      ((Closed 3, Open  5), (Closed 5, Open  7), False)
    ]
  let actual = r `overlaps` candidate
  return $ expected ~=? actual

I'm not entirely happy with the implementation:

overlaps :: (Ord a1, Ord a2) =>
            (Endpoint a1, Endpoint a2) -> (Endpoint a2, Endpoint a1) -> Bool
overlaps (l1, h1) (l2, h2) =
  let less (Closed x) (Closed y) = x <= y
      less (Closed x)   (Open y) = x <  y
      less   (Open x) (Closed y) = x <  y
      less   (Open x)   (Open y) = x <  y
  in l1 `less` h2 && l2 `less` h1

Noth that the code presented here is problematic in isolation, but if you compare it to the above contains function, there seems to be some repetition going on. Still, it's not quite the same, but the code looks similar enough that it bothers me. I feel that some kind of abstraction is sitting there, right before my nose, mocking me because I can't see it. Still, the code isn't completely duplicated, and even if it was, I can always invoke the rule of three and let it remain as it is.

Which is ultimately what I did.

Equality #

The kata also suggests some test cases to verify that it's possible to compare two ranges for equality. Dutifully I added those test cases to the code base, even though I knew that they'd automatically pass.

"Equals" ~: do
  (x, y, expected) <-
    [
      ((Closed 3, Open  5), (Closed 3, Open  5),  True),
      ((Closed 2, Open 10), (Closed 3, Open  5), False),
      ((Closed 2, Open  5), (Closed 3, Open 10), False),
      ((Closed 3, Open  5), (Closed 2, Open 10), False)
    ]
  let actual = x == y
  return $ expected ~=? actual

In the beginning of this article, I called attention to C#'s regrettable lack of structural equality. Here's an example of what I mean. In Haskell, these tests automatically pass because Endpoint is an Eq instance (by declaration), and all pairs of Eq instances are themselves Eq instances. Simple, elegant, powerful.

Conclusion #

As a first pass at the (admittedly uncomplicated) Range kata, I tried to follow the 'plan' implied by the kata description's test cases. I quickly became frustrated with their lack of completion. They were adequate in indicating to a human (me) what the desired behaviour should be, but insufficient to satisfactorily describe the desired behaviour.

I could, of course, have stuck with only those test cases, and instead of employing the Devil's Advocate technique (which I actively tried to avoid) made an honest effort to implement the functionality.

The things is, however, that I don't trust myself. At its essence, the Range kata is all about edge cases, which are where most bugs tend to lurk. Thus, these are exactly the cases that should be covered by tests.

Having made enough 'dumb' programming mistakes during my career, I didn't trust myself to be able to write correct implementations without more test coverage than originally suggested. That's the reason I added more tests.

On the other hand, I more than once speculated whether property-based testing would make this work easier. I decided to pursue that idea during my second pass at the kata.

Next: A Range kata implementation in F#.

Comments

Petr Kadlec #

I’d have another test case for the Equality function: ((Open 2, Open 6), (Closed 3, Closed 5), True). While it is nice Haskell provides (automatic) structural equality, I don’t think we want to say that the (2, 6) range (on integers!) is something else than the [3, 5] range.

But yes, this opens a can of worms: While (2, 6) = [3, 5] on integers, (2.0, 6.0) is obviously different than [3.0, 5.0] (on reals/Doubles/…). I have no idea: In Haskell, could you write an implementation of a function which would behave differently depending on whether the type argument belongs to a typeclass or not?

2024-01-09 13:38 UTC

Mark Seemann #

Petr, thank you for writing. I don't think I'd add that (or similar) test cases, but it's a judgment call, and it's partly language-specific. What you're suggesting is to consider things that are equivalent equal. I agree that for integers this would be the case, but it wouldn't be for rational numbers, or floating points (or real numbers, if we had those in programming).

In Haskell it wouldn't really be idiomatic, because equality is defined by the Eq type class, and most types just go with the default implementation. What you suggest requires writing an explicit Eq instance for Endpoint. It'd be possible, but then you'd have to deal explicitly with the various integer representations separately from other representations that use floating points.

The distinction between equivalence and equality is largely artificial or a convenient hand wave. To explain what I mean, consider mathematical expressions. Obviously, 3 + 1 is equal to 2 + 2 when evaluated, but they're different expressions. Thus, on an expression level, we don't consider those two expressions equal. I think of the integer ranges (2, 6) and [3, 6] the same way. They evaluate to the same, but there aren't equal.

I don't think that this is a strong argument, mind. In other programming languages, I might arrive at a different decision. It also matters what client code needs to do with the API. In any case, the decision to not consider equivalence the same as equality is congruent with how Haskell works.

The existence of floating points and rational numbers, however, opens another can of worms that I happily glossed over, since I had a completely different goal with the kata than producing a reusable library.

Haskell actually supports rational numbers with the % operator:

ghci> 1%2
1 % 2

This value represents ½, to be explicit.

Unfortunately, according to the specification (or, at least, the documentation) of the Enum type class, the two 'movement' operations succ and pred jump by increments of 1:

ghci> succ $ 1%2
3 % 2
ghci> succ $ succ $ 1%2
5 % 2

The same is the case with floating points:

ghci> succ 1.5
2.5
ghci> succ $ succ 1.5
3.5

This is unfortunate when it comes to floating points, since it would be possible to enumerate all floating points in a range. (For example, if a single-precision floating point occupies 32 bits, there's a finite number of them, and you can enumerate them.)

As Sonat Süer points out, this means that the allPoints function is fundamentally broken for floating points and rational numbers (and possibly other types as well).

One way around that in Haskell would be to introduce a new type class for the purpose of truly enumerating ranges, and either implement it correctly for floating points, or explicitly avoid making Float and Double instances of that new type class. This, on the other hand, would have the downside that all of a sudden, the allPoints function wouldn't support any custom type of which I, as the implementer, is unaware.

If this was a library that I'd actually have to ship as a reusable API, I think I'd start by not including the allPoints function, and then see if anyone asks for it. If or when that happens, I'd begin a process to chart why people need it, and what could be done to serve those needs in a useful and mathematically consistent manner.

2024-01-13 19:51 UTC

Published: Monday, 08 January 2024 07:06:00 UTC

A Range kata implementation in Haskell by Mark Seemann