ploeh blog

A parser and interpreter for a very small language

2025-07-07T06:39:00+00:00

A single Haskell script file.

I recently took the final exam in a course on programming language design. One of the questions was about a tiny language, and since this was a take-home exam running over many days, I had time to spare. Although it wasn't part of any questions, I decided to write an interpreter to back up some claims I made in my answers.

This article documents my prototype parser and interpreter.

Language description #

To be clear, the exam question was not to implement an interpreter, but rather some questions about attributes of the language. The description here is reprinted with kind permission from Torben Ægidius Mogensen.

Consider a functional language where values can be Booleans and pairs. A syntax for the language is given below:

Program → Function⁺

Function → Fid Pattern⁺ = Exp

Pattern → Vid | true | false | (Pattern, Pattern)

Exp → Vid | true | false | Fid Exp⁺ | (Exp)

where Fid denotes function identifiers (which are lower case) and Vid denotes variable identifiers (which are upper case). There can be multiple rules for each function, but rules must have disjoint patterns. All function calls must be fully applied (no partial applications, so no higher-order functions). A program is executed by calling any function with any argument constructed by pairs and Booleans. An example program is
and true X = X
and false X = false

alltrue true = true
alltrue false = false
alltrue (X, Y) = and (alltrue X) (alltrue Y)
Calling alltrue (true, (false, true)) will return false, but alltrue ((true, true), (true, true)) will return true.

The exam goes on to ask some questions about termination as a property of the language, and whether or not it's Turing complete, but that's not the scope of this article. Rather, I'd like to describe a prototype parser and interpreter I wrote as a single throwaway script file in Haskell.

Declarations and imports #

The code is a single Haskell module that I interacted with via GHCi (the GHC REPL). It starts with a single pragma, a module declaration, and imports.

{-# LANGUAGE FlexibleContexts #-}
module Bopa where
 
import Control.Monad.Identity (Identity)
import Data.Bifunctor (first)
import Data.Foldable (find)
import Data.List.NonEmpty (NonEmpty((:|)))
import qualified Data.List.NonEmpty as NE
import Data.Set (Set)
import qualified Data.Set as Set
import Text.Parsec

The parsec API requires the FlexibleContexts language pragma. The name, Bopa, I simply derived from Bools and Pairs, although I'm aware that this little combination of letters has quite an alternative connotation for many Danes, including myself.

Apart from the base library, the packages parsec and containers are required. I didn't use an explicit build system, but if they're not already present on your system, you can ask GHCi to load them.

AST #

The above language description is a context-free grammar, which translates easily into Haskell type declarations.

type Program = NonEmpty Function
data Function =
  Function { fid :: String, fpats :: NonEmpty Pattern, fbody :: Exp }
  deriving (Eq, Show)
data Pattern =
  VarPat String | TPat | FPat | PairPat Pattern Pattern deriving (Eq, Show)
data Exp =
  VarExp String | TExp | FExp | CallExp String (NonEmpty Exp)
  deriving (Eq, Show)

The description doesn't explicitly state how to interpret the superscript +, but I've interpreted it as meaning one or more. Therefore, a Program is a NonEmpty list of Function values. The same line of reasoning applies to the other places where the + sign appears.

Notice that there's more than one representation of Boolean values; TPat is the true pattern, while TExp is the true expression, and likewise for false.

These types describe the entire language, and you can, in principle, create programs directly using this API. While I didn't do that (because I wrote a parser instead), here's what the above and function looks like as an abstract syntax tree (AST):

Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]

Recall that the :| operator is the NonEmpty data constructor.
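For a slightly larger example of how the NonEmpty constructors nest, here's a sketch of the alltrue function from the language description written directly as an AST. The type declarations are repeated so that the block stands alone, and the name alltrueAst is made up for this illustration; it isn't part of the original script.

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))

-- AST types, repeated from above so that this block compiles on its own.
data Function =
  Function { fid :: String, fpats :: NonEmpty Pattern, fbody :: Exp }
  deriving (Eq, Show)
data Pattern =
  VarPat String | TPat | FPat | PairPat Pattern Pattern deriving (Eq, Show)
data Exp =
  VarExp String | TExp | FExp | CallExp String (NonEmpty Exp)
  deriving (Eq, Show)

-- Hypothetical name. Corresponds to:
--   alltrue true = true
--   alltrue false = false
--   alltrue (X, Y) = and (alltrue X) (alltrue Y)
alltrueAst :: NonEmpty Function
alltrueAst =
  Function "alltrue" (TPat :| []) TExp :|
  [ Function "alltrue" (FPat :| []) FExp
  , Function "alltrue" (PairPat (VarPat "X") (VarPat "Y") :| [])
      (CallExp "and"
        (CallExp "alltrue" (VarExp "X" :| []) :|
         [CallExp "alltrue" (VarExp "Y" :| [])]))
  ]
```

Each rule has exactly one pattern, so every fpats value is a singleton NonEmpty list, while the call to and takes two argument expressions.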

Source code parsers #

I wanted to be able to write programs directly in the Bopa language, and not just as ASTs, so the next step was writing parsers for each of the data types defined above. As strongly implied by the above imports, I used the parsec package for that.

The Program type is only an alias, and once I have a parser for Function, a parser for Program should be straightforward. The parser of Function values is, however, more involved.

functionParser :: Stream s m Char => ParsecT s () m Function
functionParser = do
  fnid <- many1 lower
  let pp = many1 (char ' ') >> patternParser
  -- Next line based on https://stackoverflow.com/a/65570028/126014
  patterns <- (:|) <$> pp <*> pp `manyTill` try (many1 (char ' ') >> char '=')
  skipMany1 (char ' ')
  Function fnid patterns <$> expParser

I readily admit that I don't have much experience with parsec, so it's possible that this could be done more elegantly. As the comment indicates, I struggled somewhat with a detail or two. I had trouble making it consume patterns until it meets the '=' character.
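One alternative to manyTill, shown here only as a sketch and not as the code the script uses, is to wrap each "spaces, then a pattern" step in try and repeat it with many1. When the spaces before '=' have been consumed but no pattern follows, try rewinds the input, so many1 stops cleanly and the '=' can be parsed afterwards. To keep the block self-contained, varPat and ruleHead are hypothetical simplifications that only accept variable patterns.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

-- Hypothetical, simplified pattern parser: upper-case variable names only.
varPat :: Stream s m Char => ParsecT s () m String
varPat = many1 upper

-- Parse a rule head such as "and X Y =" into the function name and its
-- patterns. Each repetition is wrapped in `try`, so that a failed attempt
-- after the trailing spaces backs out without consuming input.
ruleHead :: Stream s m Char => ParsecT s () m (String, [String])
ruleHead = do
  name <- many1 lower
  pats <- many1 (try (skipMany1 (char ' ') >> varPat))
  _ <- skipMany1 (char ' ') >> char '='
  return (name, pats)
```

Since the language requires at least one pattern per rule, many1 (rather than many) also rejects a head like "and =".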

The functionParser depends on another parser named patternParser, which in turn is composed from smaller parsers that handle each case of the Pattern sum type.

varPatParser :: Stream s m Char => ParsecT s () m Pattern
varPatParser = VarPat <$> many1 upper
 
tPatParser :: Stream s m Char => ParsecT s () m Pattern
tPatParser = string "true" >> return TPat
 
fPatParser :: Stream s m Char => ParsecT s () m Pattern
fPatParser = string "false" >> return FPat
 
pairPatParser :: Stream s m Char => ParsecT s () m Pattern
pairPatParser = do
  _ <- char '('
  p1 <- patternParser
  _ <- char ','
  _ <- skipMany $ char ' '
  p2 <- patternParser
  _ <- char ')'
  return $ PairPat p1 p2
 
patternParser :: Stream s m Char => ParsecT s () m Pattern
patternParser =
  try varPatParser <|> try tPatParser <|> try fPatParser <|> try pairPatParser

You might argue that the first three are so simple that they may not really qualify for the status of top-level values, but being a parsec newbie, I found that it helped me to structure the code that way. The only one of those values more complicated than a one-liner is, obviously, pairPatParser. I later discovered between and sepBy1, so it's possible I could also have defined pairPatParser as a composition of such combinators. I didn't try, however, since this is, after all, throwaway prototype code, and what's there already works as intended.

As an aside, I would usually keenly attempt such refactorings, but I was working without automated tests. Yes, shocking, I know, but setting up unit tests for Haskell is, unfortunately, a bit of a hassle, and given the nature of the work, I considered doing without tests a reasonable trade-off.

This takes care of parsing Pattern values, but notice that functionParser also depends on expParser, which, not surprisingly, parses Exp values. Like patternParser, it does that by defining a helper parser for each sum type case, and then combining them into one larger parser.

varExpParser :: Stream s m Char => ParsecT s () m Exp
varExpParser = VarExp <$> many1 upper
 
tExpParser :: Stream s m Char => ParsecT s () m Exp
tExpParser = string "true" >> return TExp
 
fExpParser :: Stream s m Char => ParsecT s () m Exp
fExpParser = string "false" >> return FExp
 
callExpParser :: Stream s m Char => ParsecT s () m Exp
callExpParser = do
  fnid <- many1 lower
  skipMany1 (char ' ')
  exps <- NE.fromList <$> expParser `sepBy1` many1 (char ' ')
  return $ CallExp fnid exps
 
expParser :: Stream s m Char => ParsecT s () m Exp
expParser =
  try varExpParser <|>
  try tExpParser <|>
  try fExpParser <|>
  try callExpParser <|>
  between (char '(') (char ')') expParser

Even though I generally favoured implementing each sum type case in a separate, named parser, I inlined parsing of the parenthesized expression; partly because it's so simple, and partly because I didn't know what to call it.

You can see that at this point, I'd discovered the between and sepBy1 combinators.

Finally, it's possible to compose all these smaller parsers together to a parser of Bopa programs.

programParser :: Stream s m Char => ParsecT s () m (NonEmpty Function)
programParser = NE.fromList <$> functionParser `sepEndBy1` many1 endOfLine

This, however, is a parser. How do you run it?

Here's a way:

parseProgram :: Stream s Identity Char
             => s -> Either ParseError (NonEmpty Function)
parseProgram = parse programParser ""

You may, for example, try to parse the above and function:

ghci> parseProgram "and true X = X\nand false X = false"
Right (Function {fid = "and", fpats = TPat :| [VarPat "X"], fbody = VarExp "X"} :|
      [Function {fid = "and", fpats = FPat :| [VarPat "X"], fbody = FExp}])

(Output manually formatted to improve readability.)

In practice, however, I didn't much do that. Instead, I created source code files and loaded them with the basic file-reading APIs included in the base package. You'll see examples of this later.

Arguments #

As described, running a program requires an argument constructed from Boolean values and pairs, something the language itself does not allow, since no expression can build a pair. None of the types shown so far can represent such an argument, so that's what I'll model next.

data Arg = TArg | FArg | PairArg Arg Arg deriving (Eq, Ord, Show)

Notice that true and false get yet another representation, as either TArg or FArg.

If I want to be able to run programs by typing alltrue (true, (false, true)), instead of painstakingly creating ASTs, I need a parser for this data type as well. That's not going to be a source code parser, but rather part of a command-line parser.

tArgParser :: Stream s m Char => ParsecT s () m Arg
tArgParser = string "true" >> return TArg
 
fArgParser :: Stream s m Char => ParsecT s () m Arg
fArgParser = string "false" >> return FArg
 
pairArgParser :: Stream s m Char => ParsecT s () m Arg
pairArgParser = do
  _ <- char '('
  p1 <- argParser
  _ <- char ','
  _ <- skipMany $ char ' '
  p2 <- argParser
  _ <- char ')'
  return $ PairArg p1 p2
 
argParser :: Stream s m Char => ParsecT s () m Arg
argParser = tArgParser <|> fArgParser <|> pairArgParser

To be honest, I think that I just copied and pasted pairPatParser and changed a few things. It looks that way, doesn't it?
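The duplication between pairPatParser and pairArgParser could be factored out with a small combinator built on between. The following is a sketch of that idea, not the original code: pairOf is a hypothetical helper that takes the pair constructor and the element parser, and the Arg and Pattern types are repeated so that the block compiles on its own.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec

data Arg = TArg | FArg | PairArg Arg Arg deriving (Eq, Ord, Show)
data Pattern = VarPat String | TPat | FPat | PairPat Pattern Pattern
  deriving (Eq, Show)

-- Hypothetical generic helper: parse "(x, y)" with any element parser and
-- combine the two elements with the supplied constructor.
pairOf :: Stream s m Char
       => (a -> a -> a) -> ParsecT s () m a -> ParsecT s () m a
pairOf mkPair element =
  between (char '(') (char ')') $
    mkPair <$> element <* char ',' <* skipMany (char ' ') <*> element

argParser :: Stream s m Char => ParsecT s () m Arg
argParser =
  (string "true" >> return TArg) <|>
  (string "false" >> return FArg) <|>
  pairOf PairArg argParser

-- The same helper covers pair patterns.
patternParser :: Stream s m Char => ParsecT s () m Pattern
patternParser =
  (VarPat <$> many1 upper) <|>
  (string "true" >> return TPat) <|>
  (string "false" >> return FPat) <|>
  pairOf PairPat patternParser
```

As in the original, the alternatives can be tried without try here because each one fails on its first character without consuming input.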

Entry points #

In order to execute a program, you need more than arguments. You need to define which function to call. I decided that this was close enough to defining a program entry point that it gave name to the next type.

data Entry = Entry String (NonEmpty Arg) deriving (Eq, Show)

The String value identifies the desired function by name, and the NonEmpty list supplies the arguments.

Since I wish to be able to run a program by writing e.g. alltrue ((true, true), (true, true)), I need a parser for that, too.

entryParser :: Stream s m Char => ParsecT s () m Entry
entryParser = do
  fnid <- many1 lower
  skipMany1 (char ' ')
  args <- NE.fromList <$> argParser `sepBy1` many1 (char ' ')
  return $ Entry fnid args

This, again, is a parser; it's convenient to also define a function to run it against input.

parseEntry :: Stream s Identity Char => s -> Either ParseError Entry
parseEntry = parse entryParser ""

Let's see if it works:

ghci> parseEntry "alltrue ((true, true), (true, true))"
Right (Entry "alltrue" (PairArg (PairArg TArg TArg) (PairArg TArg TArg) :| []))

That seems promising.

Parameter binding #

Armed with the ability to parse programs as well as entry points, 'all' that remains is to execute the program. To that end, I wrote an interpreter. It works with a few helper functions, the first of which attempts to bind patterns to arguments.

For example, if we have a variable-name pattern such as X and an argument such as (true, false), we can bind X to that value. Some examples will help, but I'll show the function first, and then talk you through it.

-- Attempt pattern matching and, if possible, bind variables to arguments.
-- Returns an association list of bound variables (an 'environment'), if any.
-- Returns Left with an error message if no match.
tryBind :: NonEmpty Pattern -> NonEmpty Arg -> Either String [(String, Arg)]
tryBind (VarPat p :| []) (arg :| []) = Right [(p, arg)]
tryBind (TPat :| []) (TArg :| []) = Right []
tryBind (FPat :| []) (FArg :| []) = Right []
tryBind (PairPat p1 p2 :| []) ((PairArg a1 a2) :| []) =
  let b1 = tryBind (NE.singleton p1) (NE.singleton a1)
      b2 = tryBind (NE.singleton p2) (NE.singleton a2)
  in (++) <$> b1 <*> b2
tryBind (pat :| (p:ps)) (arg :| (a:as)) =
  let b  = tryBind (NE.singleton pat) (NE.singleton arg)
      bs = tryBind (p :| ps) (a :| as)
  in (++) <$> b <*> bs
tryBind _ args = Left ("Could not match " ++ show args ++ ".")

Notice the type declaration: The function takes a NonEmpty list of Pattern values, and another NonEmpty list of Arg values. The first precondition in order to achieve a successful result is that these two lists need to have the same length. If we have more arguments than patterns, we run out of patterns. If we have more patterns than arguments, we can't bind all the parameters in the patterns, and partial application is not allowed.

The first four rules of the tryBind function attempt to match a single Pattern value to a single Arg value; notice the use of the :| NonEmpty data constructor: In all four cases, the tail of the NonEmpty lists only matches the empty list [].

The first rule, for example, has a single variable pattern, where p is the variable name, and a single argument arg, so that pattern matching succeeds and the variable name is bound to the argument. Here's an example:

ghci> tryBind (VarPat "X" :| []) (PairArg TArg FArg :| [])
Right [("X",PairArg TArg FArg)]

The result is a variable environment in which the variable name X is bound to the value PairArg TArg FArg (that is, (true, false)).

Sometimes, when matching literals, no variables are bound, in which case the environment is empty:

ghci> tryBind (TPat :| []) (TArg :| [])
Right []

While the environment itself is empty, the result is still a Right case, indicating that the pattern matched the argument. This, of course, need not be the case:

ghci> tryBind (TPat :| []) (FArg :| [])
Left "Could not match FArg :| []."

The rule that attempts to match a pair with a pair argument recursively calls tryBind for the left and the right element, and then uses the Applicative nature of Either to compose those two results.

ghci> tryBind (PairPat TPat (VarPat "Y") :| []) (PairArg TArg FArg :| [])
Right [("Y",FArg)]

In this example, you see how a pair pattern composed of (true, Y) matches the argument (true, false), resulting in the variable environment where Y is bound to false.

The final Right-valued match is when there's more than a single pattern, and more than a single argument. In that case, the function recursively calls itself with the heads of each NonEmpty list, as well as the tails of each NonEmpty list.

ghci> tryBind (PairPat TPat (VarPat "Y") :| [VarPat "Z"]) (PairArg TArg FArg :| [PairArg FArg TArg])
Right [("Y",FArg),("Z",PairArg FArg TArg)]

In this example, we try to bind the variables in the patterns (true, Y) Z with the arguments (true, false) (false, true), producing the variable environment where Y is bound to false, and Z is bound to (false, true).

This exhausts all the legal bindings, so the final, wildcard pattern in tryBind returns a Left value indicating the failure. You've already seen an example of that, above.
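For comparison, here's a sketch of an alternative formulation, not the one used in the script: check the arity first, then match patterns and arguments position by position with traverse, and concatenate the resulting environments. bindOne and tryBind' are hypothetical names, and the error messages differ from the original's.

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE

data Pattern = VarPat String | TPat | FPat | PairPat Pattern Pattern
  deriving (Eq, Show)
data Arg = TArg | FArg | PairArg Arg Arg deriving (Eq, Ord, Show)

-- Match a single pattern against a single argument.
bindOne :: Pattern -> Arg -> Either String [(String, Arg)]
bindOne (VarPat v) a = Right [(v, a)]
bindOne TPat TArg = Right []
bindOne FPat FArg = Right []
bindOne (PairPat p1 p2) (PairArg a1 a2) =
  (++) <$> bindOne p1 a1 <*> bindOne p2 a2
bindOne _ a = Left ("Could not match " ++ show a ++ ".")

-- Reject a wrong number of arguments up front, then match each
-- pattern/argument pair and concatenate the environments.
tryBind' :: NonEmpty Pattern -> NonEmpty Arg -> Either String [(String, Arg)]
tryBind' pats args
  | length pats /= length args =
      Left ("Expected " ++ show (length pats) ++ " arguments, got "
            ++ show (length args) ++ ".")
  | otherwise = concat <$> traverse (uncurry bindOne) (NE.zip pats args)
```

The explicit length check makes the "same number of patterns and arguments" precondition visible, at the cost of traversing both lists twice.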

That function is a bit of a mouthful, but fortunately, we've now covered a major part of the interpreter.

Pattern matching #

The tryBind function attempts to bind a single list of patterns to a list of arguments. A function may, however, list several (non-overlapping) rules, so if the first pattern list doesn't match, the interpreter must try the second, the third, and so on, until there are no more patterns to try. While tryBind does the heavy lifting, another function goes through the list of rules.

-- Goes through one or more function rules, looking for a match.
-- All the functions in the function list are assumed to have the same name, so
-- that they are all rules of the same function.
-- This precondition is not checked here, but handled by the caller. This isn't
-- the best implementation decision, but this is, after all, a prototype.
tryMatch :: NonEmpty Function
         -> NonEmpty Arg
         -> Either [Char] ([(String, Arg)], Exp)
tryMatch (Function _ pats body :| []) args = (, body) <$> tryBind pats args
tryMatch (Function _ pats body :| (f : fs)) args =
  case tryBind pats args of
    Right b -> Right (b, body)
    Left _ -> tryMatch (f :| fs) args

There are two (Haskell) rules for tryMatch: One where there's only one Function rule, and one where there's more than one.

In the first case, tryMatch delegates to tryBind, but if the binding attempt succeeds, also returns the body.

ghci> tryMatch (Function "and" (FPat :| [VarPat "X"]) FExp :| []) (FArg :| [TArg])
Right ([("X",TArg)],FExp)

This example attempts to bind the second rule of the above and function. Compare the input to the AST for and shown above. The result is a tuple where the first, or left, element is the variable environment, and the second, or right, element is the expression that matched.

It's important to return the matching expression, since tryMatch doesn't in itself evaluate the body. In case of multiple rules, the interpreter needs to know which body is associated with the matching pattern.

ghci> tryMatch (Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :|
               [Function "and" (FPat :| [VarPat "X"]) FExp])
               (TArg :| [TArg])
Right ([("X",TArg)],VarExp "X")
ghci> tryMatch (Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :|
               [Function "and" (FPat :| [VarPat "X"]) FExp])
               (FArg :| [TArg])
Right ([("X",TArg)],FExp)

(Inputs manually formatted for improved readability.)

These two examples try to pattern match the above and function. In the first example, the input is true true, which matches the first rule and true X = X. Therefore, the return value is Right ([("X",TArg)],VarExp "X"), indicating a new variable environment in which X is bound to true, and the matching body is VarExp "X", indicating that the variable X is returned.

In the second example, the input is false true, which now matches the second rule and false X = false. The returned tuple now indicates that X is still bound to true, but the returned body is now FExp, indicating the constant return value false.

In both cases, tryMatch starts in the second (Haskell) rule, since the and function has two (Bopa) rules. In the first example, the first call to tryBind immediately returns a Right result, which is then returned. In the second example, on the other hand, the first call to tryBind returns a Left-value result, which causes tryMatch to recurse back on itself with the remaining (Bopa) rules.

Evaluation #

Given a variable environment and an expression, it's now possible to evaluate the expression to a value.

-- Evaluate an expression, given a program (AST) and an environment.
-- Also required as input is a set used for cycle detection. Set elements are
-- tuples, each consisting of a function identifier (name) and arguments to that
-- function. If the evaluator recursively sees that tuple again, it has detected
-- a cycle, and stops further evaluation.
eval :: Foldable t
     => Set (String, NonEmpty Arg)
     -> t (NonEmpty Function)
     -> [(String, Arg)]
     -> Exp
     -> Either String Arg
eval _ _ env (VarExp name) =
  maybe (Left ("Could not find variable " ++ name ++ ".")) Right $
  lookup name env
eval _ _ _ TExp = Right TArg
eval _ _ _ FExp = Right FArg
eval observedCalls prog env (CallExp fnid exps) = do
  rules <-
    maybe (Left ("Could not find function " ++ fnid ++ ".")) Right $
    find ((fnid ==) . fid . NE.head) prog
  args <- traverse (eval observedCalls prog env) exps
  (env', body) <- tryMatch rules args
  if Set.member (fnid, args) observedCalls
  then Left "Cycle detected."
  else eval (Set.insert (fnid, args) observedCalls) prog env' body

This looks like quite a mouthful, but notice that almost half of this code listing is a comment and a type declaration.

As the comment indicates, this function includes cycle detection, which was prompted by the exam questions related to the property of termination. You'll see an example of this later.

The eval function pattern matches the four different cases of the Exp sum type. In the first case, if the expression is a variable expression, it tries to look up the variable in the environment. If found, it's returned; otherwise, an error message is returned.

The next two (Haskell) rules simply translate the Boolean representations from expressions to argument values.

Finally, if the expression is a function call, more work needs to be done. First, eval tries to find the function in the program. The eval function expects the program prog to be grouped in function rules. For example, it'd expect the above and function to be a NonEmpty list of Function values, and it'd expect, say, alltrue to be another NonEmpty list containing three Function values.

If eval finds the named function, it proceeds to evaluate all the expressions (exps) that make up the arguments. It traverses exps and calls itself recursively for each argument.

Armed with both rules and args, it calls tryMatch to get a new variable environment and the body that matched. If it gets past the cycle detection, it proceeds to call itself recursively with the new environment and the body that matched.
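The cycle detection idea can be isolated in a simplified model, sketched below under clear assumptions: it only follows a chain of tail calls (eval also evaluates nested argument expressions, which this model ignores), and Step, followCalls, and ticTacToe are hypothetical names. As in eval, the set holds the calls seen on the current path, and seeing a call twice means a cycle.

```haskell
import qualified Data.Set as Set

-- One evaluation step either finishes with a result or makes another call.
data Step call result = Done result | Call call

-- Follow a chain of calls, remembering every call seen so far on this path.
followCalls :: Ord call => (call -> Step call result) -> call -> Either String result
followCalls step = go Set.empty
  where
    go seen c
      | Set.member c seen = Left "Cycle detected."
      | otherwise = case step c of
          Done r -> Right r
          Call next -> go (Set.insert c seen) next

-- Hypothetical call graph loosely mirroring tictactoe.bopa, where a call is
-- a (function name, argument) pair.
ticTacToe :: (String, Bool) -> Step (String, Bool) Bool
ticTacToe ("tic", x) = Call ("tac", x)
ticTacToe ("tac", x) = Call ("toe", x)
ticTacToe ("toe", x) = Call ("tic", x)
ticTacToe (_, False) = Done False
ticTacToe (_, True) = Call ("tic", True)
```

Here, followCalls ticTacToe ("foo", False) returns Right False, while starting from ("foo", True) eventually revisits ("tic", True) and returns Left "Cycle detected.".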

Supplying a direct example of calling this function is becoming awkward, as it requires balancing quite a few parentheses, but it can be done.

ghci> eval
        Set.empty
        [Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :|
            [Function "and" (FPat :| [VarPat "X"]) FExp]]
        [("X",TArg)]
        TExp
Right TArg

(Input manually formatted for improved readability.)

This example starts with an empty cycle-detection set, the rules group for and, a variable environment in which X is already bound to true, and evaluates the expression TExp (i.e. true). The result is TArg (i.e. true) wrapped in Right, indicating that evaluation was successful.

Interpretation #

All building blocks for an interpreter are now in place.

-- Interpret a program (AST), given an entry point and its arguments.
interpret :: Foldable f => f Function -> Entry -> Either String Arg
interpret prog (Entry fnid args) = do
  let functions =  NE.groupWith fid prog -- Group function rules together
  -- The rules that make up `fnid`:
  rules <-
    maybe (Left ("Could not find function " ++ fnid ++ ".")) Right $
    find ((fnid ==) . fid . NE.head) functions
  (env, body) <- tryMatch rules args
  eval Set.empty functions env body

This function expects that the program (prog) supplied to it is the raw result of parsing a program. The parser doesn't group identically-named function rules together, so that's the first thing that interpret does.

It then proceeds to look through functions to find the function indicated by the entry point. If it succeeds, it calls tryMatch to identify the environment and the body to be evaluated. Finally, it calls eval with these values.
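Note that NE.groupWith only groups elements that are next to each other. If a program listed the rules of a function non-adjacently, interpret would, as far as I can tell, get two separate groups for that name, and find would return only the first. The sketch below uses made-up (name, rule number) pairs in place of Function values to show the difference; NE.groupAllWith, which sorts by the key first and therefore needs an Ord key, is one possible way to group regardless of position.

```haskell
import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE

-- Hypothetical stand-ins for parsed rules: (function name, rule number).
rules :: [(String, Int)]
rules = [("and", 1), ("or", 2), ("and", 3)]

-- groupWith only groups adjacent elements with equal keys:
-- [[("and",1)], [("or",2)], [("and",3)]]
adjacentGroups :: [NonEmpty (String, Int)]
adjacentGroups = NE.groupWith fst rules

-- groupAllWith sorts by the key before grouping:
-- [[("and",1), ("and",3)], [("or",2)]]
allGroups :: [NonEmpty (String, Int)]
allGroups = NE.groupAllWith fst rules
```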

ghci> interpret
        [Function "and" (TPat :| [VarPat "X"]) (VarExp "X"),
         Function "and" (FPat :| [VarPat "X"]) FExp]
        (Entry "and" (TArg :| [TArg]))
Right TArg

(Input manually formatted for improved readability.)

Like all the above examples, this example processes the and function, calling it with the input values true true, which returns a value representing true, just as we'd expect.

The interpreter seems to be working as intended, but it works on the AST. It's time to connect the parsers with the interpreter.

Formatting results #

It'd be more convenient if we could feed some source code and a function call into a function and have it spit out the result. In order to make the result prettier, I first added a little formatter for Arg:

formatArg :: Arg -> String
formatArg TArg = "true"
formatArg FArg = "false"
formatArg (PairArg a1 a2) = "(" ++ formatArg a1 ++ ", " ++ formatArg a2 ++ ")"

Not surprisingly, formatArg calls itself recursively in order to deal with pairs, and nested pairs.

ghci> formatArg (PairArg TArg (PairArg FArg TArg))
"(true, (false, true))"

It's not really required in order to parse and run a program, but I think that such a function should produce output that looks like the input fed into it.

Running programs #

All building blocks are now in place to compose a function that parses and runs a program.

-- Run a given program source and a command that identifies entry point and
-- arguments.
-- Despite the generalized type, it can be called as
-- String -> String -> Either String String
run :: (Stream s1 Identity Char, Stream s2 Identity Char)
    => s1 -> s2 -> Either String String
run source cmd = do
  prog <- first show $ parseProgram source
  exec <- first show $ parseEntry cmd
  formatArg <$> interpret prog exec

As the comment suggests, you can call it by feeding it two string literals:

ghci> run "and true X = X\nand false X = false" "and true true"
Right "true"

Having to supply entire programs from the REPL gets old fast, however, so instead you can save source code as files. I saved the original examples (containing and and alltrue) in a file named ex.bopa. This enabled me to load the file and call functions in it:

ghci> run <$> readFile "ex.bopa" <*> pure "alltrue (true, (false, true))"
Right "false"
ghci> run <$> readFile "ex.bopa" <*> pure "alltrue ((true, true), (true, true))"
Right "true"

Those are the two examples originally included in the exam set, and fortunately the results are correct.

A few more examples #

I wanted to subject my code to a bit more testing, so I wrote a few more example programs. This one I saved in a file called evenodd.bopa:

and  true X = X
and false X = false

or  true X = true
or false X = X

not  true = false
not false =  true

odd  true = true
odd false = true
odd (X, Y) = or (and (odd X) (even Y)) (and (even X) (odd Y))

even X = not (odd X)

The idea with odd is that it indicates whether the input contains an odd number of Boolean values; of course, even is then the negation of odd.

ghci> run <$> readFile "evenodd.bopa" <*> pure "odd true"
Right "true"
ghci> run <$> readFile "evenodd.bopa" <*> pure "even true"
Right "false"
ghci> run <$> readFile "evenodd.bopa" <*> pure "odd (true, false)"
Right "false"
ghci> run <$> readFile "evenodd.bopa" <*> pure "even (true, false)"
Right "true"
ghci> run <$> readFile "evenodd.bopa" <*> pure "odd (true, (false, true))"
Right "true"

Ad hoc tests like these gave me confidence that things aren't completely wrong.

Cycle detection #

Finally, you may be curious to see whether the cycle detection works. The simplest example I could come up with was this:

ghci> run "forever X = forever X" "forever false"
Left "Cycle detected."

Even so, I also wanted to test that it works for a small cycle that involves more than one function, so I saved the following in a file called tictactoe.bopa:

tic X = tac X

tac X = toe X

toe X = tic X

foo (false, Y) = Y
foo (true, Y) = tic Y

These functions may cause an infinite cycle, depending on input.

ghci> run <$> readFile "tictactoe.bopa" <*> pure "foo (false, (true, false))"
Right "(true, false)"
ghci> run <$> readFile "tictactoe.bopa" <*> pure "foo (true, (true, false))"
Left "Cycle detected."

The run function implements an algorithm that is always able to determine, in finite time, whether a program terminates or not. Thus, in case you're wondering: The language isn't Turing complete.
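Why can the search always finish? Because no expression can build a pair, every value that occurs during evaluation is true, false, or a sub-value of the entry point's arguments. That set is finite, so there are only finitely many distinct (function, arguments) calls, and a call path that never repeats a call must eventually end. The sketch below computes that set of values; reachableValues is a hypothetical helper, not part of the script.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

data Arg = TArg | FArg | PairArg Arg Arg deriving (Eq, Ord, Show)

-- Every value that can occur while evaluating a call with the given
-- arguments: the two Boolean constants plus all sub-values of the arguments.
reachableValues :: [Arg] -> Set Arg
reachableValues args = Set.fromList (TArg : FArg : concatMap subArgs args)
  where
    subArgs a@(PairArg l r) = a : subArgs l ++ subArgs r
    subArgs a = [a]
```

For the argument (true, (false, true)), the set has four elements: true, false, (false, true), and the argument itself. With n such values, a function with k parameters can be called in at most n^k distinct ways.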

Conclusion #

Implementing a parser and interpreter for the Bopa language wasn't part of the exam question, but I had some time to spare, and also found that I had trouble describing, in unambiguous terms, how to detect termination. I decided to write the interpreter to show a code example, and then took on the parser as an extra exercise.

It took me a long day of intense coding to produce the prototype shown here, including the various example Bopa programs. No AI was involved. It was fun.

Song recommendations from Haskell combinators

2025-06-30T05:54:00+00:00

Traversing lists of IO. A refactoring.

This article is part of a series named Alternative ways to design with functional programming. In the previous article, you saw how to refactor the example code base to a composition of standard F# combinators. It's a pragmatic solution to the problem of dealing with lots of data in a piecemeal fashion, but although it uses concepts and programming constructs from functional programming, I don't consider it a proper functional architecture.

You'd expect the Haskell version to be the most idiomatic of the three language variations, but ironically, I had more trouble making the code in this article look nice than I had with the F# variation. You'll see what the problem is later, but it boils down to a combination of Haskell's right-to-left default composition order, and precedence rules of some of the operators.

Please consult the previous articles for context about the example code base. The code shown in this article is from the combinators Git branch. It refactors the code shown in the article Porting song recommendations to Haskell.

The goal is to extract pure functions from the overall recommendations algorithm and compose them using standard combinators, such as =<<, <$>, and traverse.

Getting rid of local mutation #

My first goal was to get rid of the IORef-based local mutation shown in the 'baseline' code base. That wasn't too difficult. If you're interested in the micro-commits I made to get to that milestone, you can consult the Git repository. The interim result looked like this:

getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles srvc un
 
  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles
 
  -- Impure
  recommendations <-
    join <$>
    traverse (\scrobble ->
      fmap join $
      traverse (\otherListener ->
        fmap scrobbledSong .
        take 10 .
        sortOn (Down . songRating . scrobbledSong) .
        filter (songHasVerifiedArtist . scrobbledSong) <$>
        getTopScrobbles srvc (userName otherListener)) .
      take 20 <$>
      sortOn (Down . userScrobbleCount) .
      filter ((10_000 <=) . userScrobbleCount) =<<
      getTopListeners srvc (songId $ scrobbledSong scrobble))
    scrobblesSnapshot
 
  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations

Granted, it's not the most readable way to present the algorithm, but it is, after all, only an intermediate step. As usual, I'll remind the reader that Haskell code should, by default, be read from right to left. When split over multiple lines, this also means that an expression should be read from the bottom to the top. Armed with that knowledge (and general knowledge of Haskell), combined with some helpful indentation, it's not altogether unreadable, but not something I'd like to come back to after half a year. And definitely not something I would foist upon (hypothetical) colleagues.

The careful reader may notice that I've decided to use the reverse bind operator =<<, rather than the standard >>= operator. I usually do that with Haskell, because most of Haskell is composed from right to left, and =<< is consistent with that direction. The standard >>= operator, on the other hand, composes monadic actions from left to right. You could argue that that's more natural (to Western audiences), but since everything else stays right-to-left biased, using >>= confuses the reading direction.

As a Westerner, I prefer left-to-right reading order, but in general I've found it hard to fight Haskell's bias in the other direction.
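To make the difference concrete, here's a minimal sketch in the Maybe monad rather than IO. The functions parseAge and checkAdult are hypothetical stand-ins, not part of the article's code base. All three definitions compute the same result. They differ only in the direction in which you read them.

```haskell
import Control.Monad ((<=<))
import Text.Read (readMaybe)

-- Hypothetical stand-ins for two monadic steps.
parseAge :: String -> Maybe Int
parseAge = readMaybe

checkAdult :: Int -> Maybe Int
checkAdult n = if n >= 18 then Just n else Nothing

-- Left to right: start with the input and bind each step in turn.
viaBind :: String -> Maybe Int
viaBind s = parseAge s >>= checkAdult

-- Right to left: the last step is written first, as with (.).
viaReverseBind :: String -> Maybe Int
viaReverseBind s = checkAdult =<< parseAge s

-- Kleisli composition keeps the right-to-left direction point-free.
viaKleisli :: String -> Maybe Int
viaKleisli = checkAdult <=< parseAge
```

Notice how =<< and <=< line up with (.), which is why they blend in with otherwise right-to-left Haskell code.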

As the -- Pure and -- Impure comments indicate, interleaving the pure functions with impure actions makes the entire expression impure. The more I do that, the less pure code remains.

Single expression #

Going from the above snapshot to a single impure expression doesn't require many more steps.

getRecommendations srvc un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  take 200 . sortOn (Down . songRating) <$>
  ((\scrobbles ->
    join <$>
    traverse (\scrobble ->
      fmap join $
      traverse (\otherListener ->
        fmap scrobbledSong .
        take 10 .
        sortOn (Down . songRating . scrobbledSong) .
        filter (songHasVerifiedArtist . scrobbledSong) <$>
        getTopScrobbles srvc (userName otherListener)) .
      take 20 <$>
      sortOn (Down . userScrobbleCount) .
      filter ((10_000 <=) . userScrobbleCount) =<<
      getTopListeners srvc (songId $ scrobbledSong scrobble))
    (take 100 $ sortOn (Down . scrobbleCount) scrobbles)) =<<
  getTopScrobbles srvc un)

Neither did it improve readability.

Helper functions #

As in previous incarnations of this exercise, it helps if you extract some well-named helper functions, like this one:

getUsersOwnTopScrobbles :: [Scrobble] -> [Scrobble]
getUsersOwnTopScrobbles = take 100 . sortOn (Down . scrobbleCount)

As a one-liner, that one perhaps isn't that impressive, but none of them are particularly complicated. The biggest function is this:

getTopScrobblesOfOtherUsers :: [Scrobble] -> [Song]
getTopScrobblesOfOtherUsers =
  fmap scrobbledSong .
  take 10 .
  sortOn (Down . songRating . scrobbledSong) .
  filter (songHasVerifiedArtist . scrobbledSong)

You can see the rest in the Git repository. None of them are exported by the module, which makes them implementation details that you may decide to change or remove at a later date.
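Because such helpers are pure, you can exercise them with plain values. Here's a minimal, self-contained sketch. The Song and Scrobble records are hypothetical simplifications that contain only the fields the two helpers use, so the actual types in the repository may well differ.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical, simplified data model: only the fields used below.
data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Int
  } deriving (Eq, Show)

data Scrobble = Scrobble
  { scrobbledSong :: Song
  , scrobbleCount :: Int
  } deriving (Eq, Show)

-- The user's 100 most-played scrobbles, highest count first.
getUsersOwnTopScrobbles :: [Scrobble] -> [Scrobble]
getUsersOwnTopScrobbles = take 100 . sortOn (Down . scrobbleCount)

-- Only songs by verified artists; the ten best-rated, best first.
getTopScrobblesOfOtherUsers :: [Scrobble] -> [Song]
getTopScrobblesOfOtherUsers =
  fmap scrobbledSong .
  take 10 .
  sortOn (Down . songRating . scrobbledSong) .
  filter (songHasVerifiedArtist . scrobbledSong)
```

No service, no IO, and no test doubles are required to check their behaviour.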

You can now compose the overall action.

getRecommendations srvc un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  (aggregateTheSongsIntoRecommendations . getTopScrobblesOfOtherUsers) . join <$>
  ((traverse (getTopScrobbles srvc . userName) .
      getOtherUsersWhoListenedToTheSameSongs) . join =<<
    (traverse (getTopListeners srvc . (songId . scrobbledSong)) .
      getUsersOwnTopScrobbles =<< getTopScrobbles srvc un))

Some of the parentheses break over multiple lines in a non-conventional way. This is my best effort to format the code in a way that emphasises the four steps comprising the algorithm, while still staying within the bounds of the language, and keeping hlint silent.

I could try to argue that if you squint a bit, the operators and other glue like join should fade into the background, but in this case, I don't even buy that argument myself.

It bothers me that it's so hard to compose the code in a way that approaches being self-documenting. I find that the F# composition in the previous article does a better job of that.

Syntactic sugar #

The stated goal in this article is to demonstrate how it's possible to use standard combinators to glue the algorithm together. I've been complaining throughout this article that, while possible, it leaves the code less readable than desired.

That one reader who actually knows Haskell is likely frustrated with me. After all, the language does offer a way out. Using the syntactic sugar of do notation, you can instead write the composition like this:

getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  userTops <- getTopScrobbles srvc un <&> getUsersOwnTopScrobbles
  otherListeners <- 
    traverse (getTopListeners srvc . (songId . scrobbledSong)) userTops <&>
    getOtherUsersWhoListenedToTheSameSongs . join
  songs <-
    traverse (getTopScrobbles srvc . userName) otherListeners <&>
    getTopScrobblesOfOtherUsers . join
  return $ aggregateTheSongsIntoRecommendations songs

By splitting the process up into steps with named variables, you can achieve the much-yearned-for top-to-bottom reading order. Taking advantage of the <&> operator from Data.Functor, we also get left-to-right reading order on each line.
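If <&> is unfamiliar, it's simply fmap with its arguments flipped. The following sketch contrasts the two reading directions. fetchNumbers is a hypothetical action standing in for a song-service query.

```haskell
import Data.Functor ((<&>))

-- Hypothetical impure action standing in for a song-service query.
fetchNumbers :: IO [Int]
fetchNumbers = return [3, 1, 2]

-- Right to left: read from the action at the end back to the start.
rightToLeft :: IO Int
rightToLeft = sum . filter odd <$> fetchNumbers

-- Left to right: start with the action and pipe its result onwards.
-- (<&>) is fmap with its arguments flipped.
leftToRight :: IO Int
leftToRight = fetchNumbers <&> filter odd <&> sum

-- do notation plus (<&>): top to bottom, and left to right on each line.
inDoNotation :: IO Int
inDoNotation = do
  odds <- fetchNumbers <&> filter odd
  return (sum odds)
```

All three actions produce the same value; only the reading order changes.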

That's the best I've been able to achieve under the constraint that the IO-bound operations stay interleaved with pure functions.

Conclusion #

Mixing pure functions with impure actions like this is necessary when composing whole programs (usually at the entry point; i.e. main), but shouldn't be considered good functional-programming style in general. The entire getRecommendations action is impure, being non-deterministic.

Still, even Haskell programs eventually need to compose code in this way. Therefore, it's relevant to cover how this may be done. Even so, alternative architectures exist.

Next: Song recommendations with pipes and filters.

This blog is totally free, but if you like it, please consider supporting it.

Song recommendations from F# combinators

2025-06-23T05:49:00+00:00

Traversing sequences of tasks. A refactoring.

This article is part of a series named Alternative ways to design with functional programming. In the previous article, you saw how to refactor the example code base to a composition of standard combinators. It's a pragmatic solution to the problem of dealing with lots of data in a piecemeal fashion, but although it uses concepts and programming constructs from functional programming, I don't consider it a proper functional architecture.

Porting the C# code to F# doesn't change that part, but most F# developers will probably agree that this style of programming is more idiomatic in F# than in C#.

Please consult the previous articles for context about the example code base. The code shown in this article is from the fsharp-combinators Git branch. It refactors the code shown in the article Porting song recommendations to F#.

The goal is to extract pure functions from the overall recommendations algorithm and compose them using standard combinators, such as bind, map, and traverse.

Composition from combinators #

Let's start with the completed composition, and subsequently look at the most interesting parts.

type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync =
        // 1. Get user's own top scrobbles
        // 2. Get other users who listened to the same songs
        // 3. Get top scrobbles of those users
        // 4. Aggregate the songs into recommendations
        songService.GetTopScrobblesAsync
        >> Task.bind (
            getOwnTopScrobbles
            >> TaskSeq.traverse (
                _.Song.Id
                >> songService.GetTopListenersAsync
                >> Task.bind (
                    getTopScrobbles
                    >> TaskSeq.traverse (
                        _.UserName
                        >> songService.GetTopScrobblesAsync
                        >> Task.map aggregateRecommendations)))
            >> Task.map (Seq.flatten >> Seq.flatten >> takeTopRecommendations))

This is a single expression with nested subexpressions, and you may notice that it's completely point-free. This may be a little hardcore even for most F# programmers, since F# idiomatically favours explicit lambda expressions and the pipeline operator |>.

Although I'm personally fascinated by point-free programming, I might consider a more conventional alternative if working in a team. You can see such a variation in the Git repository in an intermediary commit. The reason that I've pulled so heavily in this direction here is that it more clearly demonstrates why we call such functions combinators: They provide the glue that enables us to compose functions together.

If you're wondering, >> is also a combinator. In Haskell it's more common with 'unpronounceable' operators such as >>=, ., &, etc., and I'd argue that such terse operators can make code more readable.

The functions getOwnTopScrobbles, getTopScrobbles, aggregateRecommendations, and takeTopRecommendations are helper functions. Here's one of them:

let private getOwnTopScrobbles scrobbles =
    scrobbles |> Seq.sortByDescending (fun s -> s.ScrobbleCount) |> Seq.truncate 100

The other helpers are also simple, single-expression functions like this one.

As Oleksii Holub implies, you could make each of these small functions public if you wished to test them individually.

Let's now look at the various building blocks that enable this composition.

Combinators #

The F# base library comes with more standard combinators than are generally available for C#, not only for lists, but also for Option and Result values. On the other hand, when it comes to asynchronous monads, the F# base library offers task and async computation expressions, but no Task module. You'll need to add Task.bind and Task.map yourself, or import a library that exports those combinators. The article Asynchronous monads shows the implementation used here.

The traverse implementation shouldn't be too surprising, either, but here I implemented it directly, instead of via sequence.

let traverse f xs =
    let go acc x = task {
        let! x' = x
        let! acc' = acc
        return Seq.append acc' [x'] }
    xs |> Seq.map f |> Seq.fold go (task { return [] })
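For comparison, here's the same fold-based traversal sketched in Haskell. traverseViaFold is a hypothetical name, written only to mirror the F# code. In real Haskell code you'd use the built-in traverse. One difference is worth noting: .NET tasks are typically already running when created, whereas Haskell actions only run when bound, so this version binds the accumulator first to keep the effects in left-to-right order.

```haskell
-- Map f over the input, then fold the resulting actions into a single
-- action that collects the results in order.
traverseViaFold :: Monad m => (a -> m b) -> [a] -> m [b]
traverseViaFold f xs = foldl go (return []) (map f xs)
  where
    go acc mx = do
      results <- acc          -- run the previously accumulated actions first...
      x <- mx                 -- ...then the next action
      return (results ++ [x]) -- repeated (++) is quadratic; fine for a sketch
```

With Maybe or Either, the first failure short-circuits the whole traversal, exactly as it does with the built-in traverse.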

Finally, the flatten function is the standard implementation that goes via monadic bind. In F#'s Seq module, bind is called collect.

let flatten xs = Seq.collect id xs
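The same relationship holds in Haskell for any monad: flattening is bind with the identity function, and the standard library calls that function join. flatten and flattenAgreesWithJoin below are hypothetical names that mirror the F# helper.

```haskell
import Control.Monad (join)

-- Flattening is monadic bind with the identity function.
flatten :: Monad m => m (m a) -> m a
flatten xs = xs >>= id

-- For lists, this coincides with both join and concat.
flattenAgreesWithJoin :: [[Int]] -> Bool
flattenAgreesWithJoin xss = flatten xss == join xss && flatten xss == concat xss
```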

That's all there is to it.

Conclusion #

In this article, you saw how to port the C# code from the previous article to F#. Since this style of programming is more idiomatic in F#, more building blocks and language features are available, and hence this kind of refactoring is better suited to F#.

I still don't consider this proper functional architecture, but it's pragmatic and I could see myself writing code like this in a professional setting.

Next: Song recommendations from Haskell combinators.


Song recommendations from C# combinators

2025-06-16T07:41:00+00:00

LINQ-style composition, including SelectMany and Traverse.

This article is part of a larger series titled Alternative ways to design with functional programming. In the previous article, I described, in general terms, a pragmatic small-scale architecture that may look functional, although it really isn't.

Please consult the previous articles for context about the example code base. The code shown in this article is from the combinators Git branch.

The goal is to extract pure functions from the overall recommendations algorithm and compose them using standard combinators, such as SelectMany (monadic bind), Select, and Traverse.

Composition from combinators #

Let's start with the completed composition, and subsequently look at the most interesting parts.

public Task<IReadOnlyList<Song>> GetRecommendationsAsync(string userName)
{
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    return _songService.GetTopScrobblesAsync(userName)
        .SelectMany(scrobbles => UserTopScrobbles(scrobbles)
            .Traverse(scrobble => _songService
                .GetTopListenersAsync(scrobble.Song.Id)
                .Select(TopListeners)
                .SelectMany(users => users
                    .Traverse(user => _songService
                        .GetTopScrobblesAsync(user.UserName)
                        .Select(TopScrobbles))
                    .Select(Songs)))
        .Select(TakeTopRecommendations));
}

This is a single expression with nested subexpressions.

The functions UserTopScrobbles, TopListeners, TopScrobbles, Songs, and TakeTopRecommendations are private helper functions. Here's one of them:

private static IEnumerable<Scrobble> UserTopScrobbles(IEnumerable<Scrobble> scrobbles)
{
    return scrobbles.OrderByDescending(scrobble => scrobble.ScrobbleCount).Take(100);
}

The other helpers are also simple, single-expression functions like this one.

As Oleksii Holub implies, you could make each of these small functions public if you wished to test them individually.

Let's now look at the various building blocks that enable this composition.

Asynchronous monad #

C# (or .NET) in general only comes with standard combinators for IEnumerable<T>, so whenever you need them for other monads, you have to define them yourself (or pull in a reusable library that defines them). For the above composition, you'll need SelectMany and Select for Task computations. You can see implementations in the article Asynchronous monads, so I'll not repeat the code here.

One exception is this extension method, which is a variant of monadic return, and which I'm not sure I've published before:

internal static Task<T> AsTask<T>(this T source)
{
    return Task.FromResult(source);
}

Nothing much is going on here, since it's just a wrapper around Task.FromResult. The this keyword, however, makes AsTask an extension method, which makes usage marginally prettier. It's not used in the above composition, but rather in the implementation of Traverse, as you'll see below.

Traversal #

The traversal could be implemented from a hypothetical Sequence action, but you can also implement it directly, which is what I chose to do here.

internal static Task<IEnumerable<TResult>> Traverse<T, TResult>(
    this IEnumerable<T> source,
    Func<T, Task<TResult>> selector)
{
    return source
        .Select(selector)
        .Aggregate(
            Enumerable.Empty<TResult>().AsTask(),
            async (acc, x) => (await acc).Append(await x));
}

Mapping selector over source produces a sequence of tasks. The Aggregate expression subsequently inverts the containers to a single task that contains a sequence of result values.
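Here's a small Haskell illustration of what 'inverting the containers' means, using Maybe instead of Task. safeRecip is a hypothetical example function. Mapping it over a list produces a list of Maybe values. Traversing instead produces a single Maybe containing a list, which is Nothing if any element fails.

```haskell
-- Hypothetical effectful function: fails on zero.
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

-- Mapping gives a list of containers: [Maybe Double].
mapped :: [Double] -> [Maybe Double]
mapped = map safeRecip

-- Traversing inverts them into one container of a list: Maybe [Double].
traversed :: [Double] -> Maybe [Double]
traversed = traverse safeRecip

-- traverse is equivalent to mapping followed by sequenceA.
viaSequence :: [Double] -> Maybe [Double]
viaSequence = sequenceA . map safeRecip
```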

That's really all there is to it.

Conclusion #

In the previous article, I made no secret of my position on this refactoring. For the example at hand, the benefit is at best marginal. The purpose of this article isn't to insist that you must write code like this. Rather, it's a demonstration of what's possible.

If you have a problem which is similar, but more complicated, refactoring to standard combinators may be a good idea. After all, a standard combinator like SelectMany, Traverse, etc. is well-understood and lawful. You should expect combinators to be defect-free, so using them instead of ad-hoc code constructs like nested loops with conditionals could help eliminate some trivial bugs.

Additionally, if you're working with a team comfortable with these few abstractions, code assembled from standard combinators may actually turn out to be more readable than code buried in ad-hoc imperative control flow. And if not everyone on the team is on board with this style, perhaps it's an opportunity to push the envelope a bit.

Of course, if you use a language where such constructs are already idiomatic, colleagues should already be used to this style of programming.

Next: Song recommendations from F# combinators.


Song recommendations from combinators

2025-06-09T14:02:00+00:00

Interleaving impure actions with pure functions. Not really functional programming.

This article is part of a larger article series about alternative ways to design with functional programming, particularly when faced with massive data loads. In the previous few articles, you saw functional architecture at its apparent limit. With sufficiently large data sizes, the Impureim Sandwich pattern starts to buckle. That's really not an indictment of that pattern; only an observation that no design pattern applies universally.

In this and the next few articles, we'll instead look at a more pragmatic option. In this article I'll discuss the general idea, and follow up in other articles with examples in three different languages.

In this overall article series, I'm using Oleksii Holub's inspiring article Pure-Impure Segregation Principle as an outset for the code example. Previous articles in this article series have already covered the basics, but the gist of it is a song recommendation service that uses past play information ('scrobbles') to suggest new songs to a user.

Separating pure functions from impure composition #

In the original article, Oleksii Holub suggests a way to separate pure functions from impure actions: We may extract as much pure code from the overall algorithm as possible, but we're still left with pure functions and impure actions mixed together.

Here's my reproduction of that suggestion, with trivial modifications:

// Pure
public static IReadOnlyList<int> HandleOwnScrobbles(IReadOnlyCollection<Scrobble> scrobbles) =>
    scrobbles
        .OrderByDescending(s => s.ScrobbleCount)
        .Take(100)
        .Select(s => s.Song.Id)
        .ToArray();
 
// Pure
public static IReadOnlyList<string> HandleOtherListeners(IReadOnlyCollection<User> users) =>
    users
        .Where(u => u.TotalScrobbleCount >= 10_000)
        .OrderByDescending(u => u.TotalScrobbleCount)
        .Take(20)
        .Select(u => u.UserName)
        .ToArray();
 
// Pure
public static IReadOnlyList<Song> HandleOtherScrobbles(IReadOnlyCollection<Scrobble> scrobbles) =>
    scrobbles
        .Where(s => s.Song.IsVerifiedArtist)
        .OrderByDescending(s => s.Song.Rating)
        .Take(10)
        .Select(s => s.Song)
        .ToArray();
 
// Pure
public static IReadOnlyList<Song> FinalizeRecommendations(IReadOnlyList<Song> songs) =>
    songs
        .OrderByDescending(s => s.Rating)
        .Take(200)
        .ToArray();
 
public async Task<IReadOnlyList<Song>> GetRecommendationsAsync(string userName)
{
    // Impure
    var scrobbles = await _songService.GetTopScrobblesAsync(userName);
 
    // Pure
    var songIds = HandleOwnScrobbles(scrobbles);
 
    var recommendationCandidates = new List<Song>();
    foreach (var songId in songIds)
    {
        // Impure
        var otherListeners = await _songService
            .GetTopListenersAsync(songId);
 
        // Pure
        var otherUserNames = HandleOtherListeners(otherListeners);
 
        foreach (var otherUserName in otherUserNames)
        {
            // Impure
            var otherScrobbles = await _songService
                .GetTopScrobblesAsync(otherUserName);
 
            // Pure
            var songsToRecommend = HandleOtherScrobbles(otherScrobbles);
 
            recommendationCandidates.AddRange(songsToRecommend);
        }
    }
 
    // Pure
    return FinalizeRecommendations(recommendationCandidates);
}

As Oleksii Holub writes,

"However, instead of having one cohesive element to reason about, we ended up with multiple fragments, each having no meaning or value of their own. While unit testing of individual parts may have become easier, the benefit is very questionable, as it provides no confidence in the correctness of the algorithm as a whole."

I agree with that assessment, but still find it warranted to pursue the idea a little further. After all, my goal with this overall article series isn't to be prescriptive, but rather descriptive. By presenting and comparing alternatives, we become aware of more options. This, hopefully, helps us choose the 'right tool for the job'.

Triple-decker sandwich? #

If we look closer at this alternative, however, we find that we only need to deal with three impure actions. We might, then, postulate that this is an expanded sandwich - a triple-decker sandwich, if you will.

To be clear, I don't find this a reasonable argument. Even if you accept expanding the sandwich metaphor to add a pure validation step, the number of layers and the structure of the sandwich would still be known at compile time. You may start at the impure boundary, then phase into pure validation, return to another impure step to gather data, call your 'main' pure function, and finally write the result to some kind of output. To borrow a figure from the What's a sandwich? article:

On the other hand, this isn't what the above code suggestion does. The problem with the song recommendation algorithm is that the impure actions cascade. While we start with a single impure out-of-process query, we then use the result of that to loop over, and perform n more queries. This, in fact, happens again, nested in the outer loop, so in terms of network calls, we're looking at an O(n²) algorithm.

We can actually be more precise than that, because the 'outer' queries actually limit their result sets. The first query only considers the top 100 results, so we know that GetTopListenersAsync is going to be called at most 100 times. The result of this call is again limited to the top 20, so that the inner calls to GetTopScrobblesAsync run at most 20 * 100 = 2,000 times. In all, the upper limit is 1 + 100 + 2,000 = 2,101 network calls. (Okay, so really, this is just an O(1) algorithm, although 1 ~ 2,101.)

Not that that isn't going to take a bit of time.
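The arithmetic is easy to double-check with fakes that count their own invocations. In this Haskell sketch, fakeQuery, countCalls, and upperBound are hypothetical names. A single counting fake stands in for the song-service queries, and it always returns 500 results so that the limits of 100 songs and 20 listeners are always reached.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The bound from the text: one initial query, one query per kept song,
-- and one query per kept listener of each kept song.
upperBound :: Int -> Int -> Int
upperBound songs listenersPerSong = 1 + songs + songs * listenersPerSong

-- Hypothetical fake query: counts the call, then returns n dummy results.
fakeQuery :: IORef Int -> Int -> IO [Int]
fakeQuery counter n = do
  modifyIORef' counter (+ 1)
  return [1 .. n]

-- Mirrors the nesting of the algorithm, with fakes that always return
-- more results (500) than the algorithm keeps, so both limits are hit.
countCalls :: IO Int
countCalls = do
  counter <- newIORef 0
  let query = fakeQuery counter 500
      -- Steps 2 and 3 for one song: its listeners, then their scrobbles.
      perSong _song = do
        listeners <- query
        mapM_ (const query) (take 20 listeners)
  ownScrobbles <- query                   -- step 1: the user's own scrobbles
  mapM_ perSong (take 100 ownScrobbles)
  readIORef counter
```

Running countCalls should agree with upperBound 100 20, which is the 2,101 calls computed above.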

All that said, it's not execution time that concerns me in this context. Assume that the algorithm is already as optimal as possible, and that those 2,101 network calls are necessary. What rather concerns me here is how to organize the code in a way that's as maintainable as possible. As usual, when that's the main concern, I'll remind the reader to consider the example problem as a stand-in for a more complicated problem. Even Oleksii Holub's original code example is only some fifty-odd lines of code, which in itself hardly warrants all the hand-wringing we're currently subjecting it to.

Rather, what I'd like to address is the dynamic back-and-forth between pure function and impure action. Each of these thousands of out-of-process calls is non-deterministic. If you're tasked with maintaining or editing this algorithm, your brain will be taxed by all that unpredictable behaviour. Many subtle bugs lurk there.

The more we can pull the code towards pure functions the better, because referential transparency fits in your head.

So, to be explicit, I don't consider this kind of composition as an expanded Impureim Sandwich.

Standard combinators #

Is it possible to somehow improve, even just a little, on the above suggestion? Can we somehow make it look a little 'more functional'?

We could use some standard combinators, like monadic bind, traversals, and so on.

To be honest, for the specific song-recommendations example, the benefit is marginal at best, but doing it would still demonstrate a particular technique. We'd be able to get rid of the local mutation of recommendationCandidates, but that's about it.

Even so, refactoring to self-contained expressions makes other refactoring easier. As a counter-example, imagine that you'd like to extract the inner foreach loop in the above code example to a helper method.

private async Task CollectOtherUserTopScrobbles(
    List<Song> recommendationCandidates,
    IReadOnlyList<string> otherUserNames)
{
    foreach (var otherUserName in otherUserNames)
    {
        // Impure
        var otherScrobbles = await _songService
            .GetTopScrobblesAsync(otherUserName);
 
        // Pure
        var songsToRecommend = HandleOtherScrobbles(otherScrobbles);
 
        recommendationCandidates.AddRange(songsToRecommend);
    }
}

The call site would then look like this:

// Pure
var otherUserNames = HandleOtherListeners(otherListeners);
 
// Impure
await CollectOtherUserTopScrobbles(recommendationCandidates, otherUserNames);

In this specific example, such a refactoring isn't too difficult, but it's more complicated than it could be. Because of state mutation, we have to pass the object to be modified, in this case recommendationCandidates, along as a method argument. Here, there's only one, but if you have code where you change the state of two objects, you'd have to pass two extra parameters, and so on.

You've most likely worked in a real code base where you have tried to extract a helper method, only to discover that it's so incredibly tangled with the objects that it modifies that you need a long parameter list. What should have been a simplification is in danger of making everything worse.

On the other hand, self-contained expressions, even if, as in this case, they're non-deterministic, don't mutate state. In general, this tends to make it easier to extract subexpressions as helper methods, if only because they are less coupled to the rest of the code. They may require inputs as parameters, but at least you don't have to pass around objects to be modified.
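The difference is easiest to see side by side. This Haskell sketch contrasts a helper that appends to a caller-supplied mutable accumulator (an IORef, playing the role of the List<Song> in the C# example) with one that simply returns its results. fetchTopSongs, collectInto, collect, and viaMutation are all hypothetical names.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical impure query standing in for GetTopScrobblesAsync.
fetchTopSongs :: String -> IO [String]
fetchTopSongs user = return [user ++ "-song1", user ++ "-song2"]

-- Mutating style: the caller must supply the accumulator to modify,
-- so every piece of mutated state becomes an extra parameter.
collectInto :: IORef [String] -> [String] -> IO ()
collectInto acc users =
  mapM_ (\user -> fetchTopSongs user >>= \songs -> modifyIORef' acc (++ songs)) users

-- Expression style: still impure, but nothing is mutated; the result
-- is simply returned, so the helper needs only its real inputs.
collect :: [String] -> IO [String]
collect users = concat <$> traverse fetchTopSongs users

-- The call site of the mutating version has to set up and read the state.
viaMutation :: [String] -> IO [String]
viaMutation users = do
  acc <- newIORef []
  collectInto acc users
  readIORef acc
```

Both produce the same songs in the same order, but only the expression-based collect can be moved or reused without dragging shared state along.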

Thus, I find it worthwhile to include articles about this kind of refactoring: Since they demonstrate how to refactor to a more expression-based style, you may be able to extrapolate to your own context. And who knows, you may encounter a context where more substantial improvements can be made by moving in this direction.

As usual in this article series, you'll see how to apply this technique in three different languages.

Song recommendations from C# combinators
Song recommendations from F# combinators
Song recommendations from Haskell combinators

All that said, it's important to underscore that I don't consider this proper functional architecture. Even the Haskell example is too non-deterministic for my tastes.

Conclusion #

Perhaps the most pragmatic approach to a problem like the song-recommendations example is to allow the impure actions and pure functions to interleave. I don't mean to insist that functional programming is the only way to make code maintainable. You can organize code according to other principles, and some of them may also leave you with a code base that can serve its mission well, now and in the future.

Another factor to take into account is the skill level of the team tasked with maintaining a code base. What are they comfortable with?

Not that I think you should settle for the status quo. Progress can only be made if you push the envelope a little, but you can also come up with a code base so alien to your colleagues that they can't work with it at all.

I could easily imagine a team where the solution in the next three articles is the only style they'd be able to maintain.

Next: Song recommendations from C# combinators.


Testing races with a slow Decorator

2025-06-02T08:03:00+00:00

Delaying database interactions for test purposes.

In chapter 12 of Code That Fits in Your Head, I cover a typical race condition and how to test for it. The book comes with a pedagogical explanation of the problem, including a diagram in the style of Designing Data-Intensive Applications. In short, the problem occurs when two or more clients are competing for the last remaining seats in a particular time slot.

In my two-day workshop based on the book, I also cover this scenario. The goal is to show how to write automated tests for this kind of non-deterministic behaviour. In the book, and in the workshop, my approach is to rely on the law of large numbers. An automated test attempts to trigger the race condition by trying 'enough' times. A timeout on the test assumes that if the test does not trigger the condition in the allotted time window, then the bug is addressed.

At one of my workshops, one participant told me of a more efficient and elegant way to test for this. I wish I could remember exactly at which workshop it was, and who the gentleman was, but alas, it escapes me.

Reproducing the condition #

How do you deterministically reproduce non-deterministic behaviour? The default answer is almost tautological. You can't, since it's non-deterministic.

The irony, however, is that in the workshop, I deterministically demonstrate the problem. The problem, in short, is that in order to decide whether or not to accept a reservation request, the system first reads data from its database, runs a fairly complex piece of decision logic, and finally writes the reservation to the database - if it decides to accept it, based on what it read. When competing processes vie for the last remaining seats, a race may occur where both (or all) base their decision on the same data, so they all come to the conclusion that they still have enough remaining capacity. Again, refer to the book, and its accompanying code base, for the details.

How do I demonstrate this condition in the workshop? I go into the Controller code and insert a temporary, human-scale delay after reading from the database, but before making the decision:

var reservations = await Repository.ReadReservations(r.At);
 
await Task.Delay(TimeSpan.FromSeconds(10));
 
if (!MaitreD.WillAccept(DateTime.Now, reservations, r))
    return NoTables500InternalServerError();

await Repository.Create(restaurant.Id, reservation);

Then I open two windows, from which I, within a couple of seconds of each other, try to make competing reservations. When the bug is present, both reservations are accepted, although, according to business rules, only one should be.

So that's how to deterministically demonstrate the problem. Just insert a long enough delay.
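The same trick also works outside a web application. This Haskell sketch is hypothetical and only mirrors the shape of the problem: an IORef plays the database, a capacity of ten plays the restaurant, and threadDelay plays the inserted Task.Delay. Because both requests read before either writes, both decide that there's room.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical restaurant capacity, as in the example.
capacity :: Int
capacity = 10

-- Read, pause, decide, write. Nothing makes this sequence atomic,
-- and the pause keeps the race window open long enough to hit it.
reserve :: IORef Int -> Int -> IO Bool
reserve db quantity = do
  reserved <- readIORef db          -- read the current bookings
  threadDelay 100000                -- 100 ms, standing in for Task.Delay
  if reserved + quantity <= capacity
    then do
      modifyIORef' db (+ quantity)  -- write, based on a stale read
      return True
    else return False

-- Fire two ten-guest requests concurrently and collect both outcomes,
-- together with the total number of guests finally booked.
demonstrateRace :: IO ([Bool], Int)
demonstrateRace = do
  db <- newIORef 0
  vars <- mapM (const newEmptyMVar) [1, 2 :: Int]
  mapM_ (\var -> forkIO (reserve db 10 >>= putMVar var)) vars
  accepted <- mapM takeMVar vars
  total <- readIORef db
  return (accepted, total)
```

With the delay in place, you should typically observe both requests accepted and twenty guests booked. Remove the threadDelay, and the race becomes rare rather than impossible.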

We can't, however, leave such delays in the production code, so I never even considered that this simple technique could be used for automated testing.

Slowing things down with a Decorator #

That was until my workshop participant told me his trick: Why don't you slow down the database interactions for test purposes only? At first, I thought he had in mind some nasty compiler pragmas or environment hacks, but no. Why don't you use a Decorator to slow things down?

Indeed, why not?

Fortunately, all database interaction already takes place behind an IReservationsRepository interface. Adding a test-only, delaying Decorator is straightforward.

public sealed class SlowReservationsRepository : IReservationsRepository
{
    private readonly TimeSpan halfDelay;
 
    public SlowReservationsRepository(
        TimeSpan delay,
        IReservationsRepository inner)
    {
        Delay = delay;
        halfDelay = delay / 2;
        Inner = inner;
    }
 
    public TimeSpan Delay { get; }
    public IReservationsRepository Inner { get; }
 
    public async Task Create(int restaurantId, Reservation reservation)
    {
        await Task.Delay(halfDelay);
        await Inner.Create(restaurantId, reservation);
        await Task.Delay(halfDelay);
    }
 
    public async Task Delete(int restaurantId, Guid id)
    {
        await Task.Delay(halfDelay);
        await Inner.Delete(restaurantId, id);
        await Task.Delay(halfDelay);
    }
 
    public async Task<Reservation?> ReadReservation(
        int restaurantId,
        Guid id)
    {
        await Task.Delay(halfDelay);
        var result = await Inner.ReadReservation(restaurantId, id);
        await Task.Delay(halfDelay);
        return result;
    }
 
    public async Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId,
        DateTime min,
        DateTime max)
    {
        await Task.Delay(halfDelay);
        var result = await Inner.ReadReservations(restaurantId, min, max);
        await Task.Delay(halfDelay);
        return result;
    }
 
    public async Task Update(int restaurantId, Reservation reservation)
    {
        await Task.Delay(halfDelay);
        await Inner.Update(restaurantId, reservation);
        await Task.Delay(halfDelay);
    }
}

This one uniformly slows down all operations. I arbitrarily decided to split the Delay in half, in order to apply half of it before each action, and the other half after. Honestly, I didn't mull this over too much; I just thought that if I did it that way, I wouldn't have to speculate whether it would make a difference if the delay happened before or after the action in question.
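In a functional language, the Decorator translates naturally to a record of functions. This Haskell sketch is hypothetical and trimmed down: The Repository record has only two operations with simplified types, and it isn't the book's IReservationsRepository. As in the C# class, slowRepository splits the delay in half around each delegated operation.

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical, trimmed-down repository as a record of functions.
data Repository = Repository
  { create :: Int -> IO ()   -- add a reservation with this quantity
  , readAll :: IO [Int]      -- read all reserved quantities
  }

-- Wrap an action with half the delay before it and half after it.
slowly :: Int -> IO a -> IO a
slowly delayMicros action = do
  let half = delayMicros `div` 2
  threadDelay half
  result <- action
  threadDelay half
  return result

-- The Decorator: same interface, every operation delegated but slowed down.
slowRepository :: Int -> Repository -> Repository
slowRepository delayMicros inner = Repository
  { create = slowly delayMicros . create inner
  , readAll = slowly delayMicros (readAll inner)
  }

-- A simple in-memory implementation to decorate.
inMemoryRepository :: IO Repository
inMemoryRepository = do
  ref <- newIORef []
  return Repository
    { create = \quantity -> modifyIORef' ref (quantity :)
    , readAll = readIORef ref
    }
```

Since the decorated value has the same type as the inner one, a test can substitute slowRepository 100000 repo wherever a Repository is expected.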

Slowing down tests #

I added a few helper methods to the RestaurantService class that inherits from WebApplicationFactory<Startup>, mainly to enable decoration of the injected Repository. With those changes, I could now rewrite my test like this:

[Fact]
public async Task NoOverbookingRace()
{
    var date = DateTime.Now.Date.AddDays(1).AddHours(18.5);
    using var service = RestaurantService.CreateWith(repo =>
        new SlowReservationsRepository(
            TimeSpan.FromMilliseconds(100), repo));
 
    var task1 = service.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var task2 = service.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var actual = await Task.WhenAll(task1, task2);
 
    Assert.Single(
        actual,
        msg => msg.StatusCode == HttpStatusCode.InternalServerError);
    var ok = Assert.Single(actual, msg => msg.IsSuccessStatusCode);
    // Check that the reservation was actually created:
    var resp = await service.GetReservation(ok.Headers.Location);
    resp.EnsureSuccessStatusCode();
    var reservation = await resp.ParseJsonContent<ReservationDto>();
    Assert.Equal(10, reservation.Quantity);
}

The restaurant being tested has a maximum capacity of ten guests, so while it can accommodate either of the two requests, it can't make room for both.

For this example, I arbitrarily chose to configure the Decorator with a 100-millisecond delay, so every interaction with the database caused by that test gets a built-in 100-millisecond delay: 50 ms before each action, and 50 ms after.

The test starts both tasks, task1 and task2, without awaiting them. This allows them to run concurrently. After starting both tasks, the test awaits both of them with Task.WhenAll.
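Here's a small Haskell sketch of the same race that makes the timing argument concrete. It rests on several assumptions. An in-memory list of reservation quantities stands in for the database. A 50 ms threadDelay between the read and the write stands in for the slow Repository. forkIO plus MVars play the roles of the un-awaited tasks and Task.WhenAll. The names tryBook and bookConcurrently are hypothetical.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical booking handler with no transaction control: read the
-- existing reservations, pause (standing in for a slow database), then
-- decide. Capacity and quantity are numbers of guests.
tryBook :: IORef [Int] -> Int -> Int -> IO Bool
tryBook db capacity quantity = do
  booked <- readIORef db
  threadDelay 50000 -- 50 ms between the read and the write
  if sum booked + quantity <= capacity
    then do
      atomicModifyIORef' db (\qs -> (quantity : qs, ()))
      pure True
    else pure False

-- Start both requests before waiting for either, like starting task1 and
-- task2 and then awaiting Task.WhenAll.
bookConcurrently :: IORef [Int] -> Int -> Int -> IO [Bool]
bookConcurrently db capacity quantity = do
  box1 <- newEmptyMVar
  box2 <- newEmptyMVar
  _ <- forkIO (tryBook db capacity quantity >>= putMVar box1)
  _ <- forkIO (tryBook db capacity quantity >>= putMVar box2)
  mapM takeMVar [box1, box2]
```

With the pause in place, both threads will almost always read before either one writes. Both then accept their reservation, and the restaurant is overbooked. That outcome is likely, not guaranteed, for the reasons discussed under near-deterministic tests.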

The assertion phase of the test is more involved than you may be used to seeing. The reason is that it deals with more than one possible failure scenario.

The first two assertions (Assert.Single) deal with the complete absence of transaction control in the application. In that case, both POST requests succeed, which they aren't supposed to. If the system works properly, it should accept one request and reject the other.
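An in-memory Haskell analogue can illustrate what 'works properly' means. It uses GHC's software transactional memory instead of a database transaction. In this sketch, tryBook and bookConcurrently are hypothetical names, and a TVar holding reservation quantities stands in for the database. The read, the capacity check, and the write all run in one atomically block. As a result, two concurrent requests for ten guests at a ten-seat restaurant always end with exactly one accepted.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, writeTVar)

-- Read, check, and write inside one STM transaction, so no other
-- transaction can add a reservation in between. An in-memory analogue of
-- transaction control, not a database.
tryBook :: TVar [Int] -> Int -> Int -> STM Bool
tryBook db capacity quantity = do
  booked <- readTVar db
  if sum booked + quantity <= capacity
    then do
      writeTVar db (quantity : booked)
      pure True
    else pure False

-- Run two bookings on separate threads and collect both results.
bookConcurrently :: TVar [Int] -> Int -> Int -> IO [Bool]
bookConcurrently db capacity quantity = do
  box1 <- newEmptyMVar
  box2 <- newEmptyMVar
  _ <- forkIO (atomically (tryBook db capacity quantity) >>= putMVar box1)
  _ <- forkIO (atomically (tryBook db capacity quantity) >>= putMVar box2)
  mapM takeMVar [box1, box2]
```

STM is optimistic. If two transactions conflict, the one that fails validation is re-run rather than blocked. On its second attempt it sees the other request's reservation and rejects its own.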

The rest of the assertions check that the successful reservation was actually created. That's another failure scenario.

The way I chose to deal with the race condition is standard in .NET: I used a TransactionScope. This is a peculiar and, in my opinion, questionable API that enables you to start a transaction anywhere in your code, and then complete it when you're done. In the code base that accompanies Code That Fits in Your Head, it looks like this:

private async Task<ActionResult> TryCreate(Restaurant restaurant, Reservation reservation)
{
    using var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
 
    var reservations = await Repository
        .ReadReservations(restaurant.Id, reservation.At)
        .ConfigureAwait(false);
    var now = Clock.GetCurrentDateTime();
    if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        return NoTables500InternalServerError();
 
    await Repository.Create(restaurant.Id, reservation).ConfigureAwait(false);
 
    scope.Complete();
 
    return Reservation201Created(restaurant.Id, reservation);
}

Notice the scope.Complete() statement towards the end.

What happens if someone forgets to call scope.Complete()?

In that case, the request that wins the race returns 201 Created, but when the scope variable goes out of scope, the TransactionScope is disposed of. If Complete() wasn't called, the transaction is rolled back, but the HTTP response code remains 201. Thus, the two assertions that inspect the response codes aren't enough to catch this particular kind of defect.

Instead, the test subsequently queries the System Under Test to verify that the resource was, indeed, created.
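A much-simplified Haskell model of a transaction scope can illustrate the consequence of a forgotten Complete(). The assumptions are significant. Scope, withScope, complete, modifyInScope, and tryCreate are hypothetical names, and the 'store' is an IORef. The model also ignores isolation between concurrent scopes entirely. It only captures the rule 'commit if completed, otherwise roll back'.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Writes go to a private working copy of the store. They're committed only
-- if 'complete' was called before the scope ends; otherwise they're
-- discarded, i.e. rolled back.
data Scope s = Scope (IORef s) (IORef Bool)

withScope :: IORef s -> (Scope s -> IO a) -> IO a
withScope store body = do
  working <- readIORef store >>= newIORef
  completed <- newIORef False
  let commitIfCompleted = do
        isCompleted <- readIORef completed
        if isCompleted then readIORef working >>= writeIORef store else pure ()
  body (Scope working completed) `finally` commitIfCompleted

modifyInScope :: Scope s -> (s -> s) -> IO ()
modifyInScope (Scope working _) = modifyIORef' working

complete :: Scope s -> IO ()
complete (Scope _ completed) = writeIORef completed True

-- Loosely mirrors TryCreate: the status code is decided regardless of
-- whether the transaction is later committed or rolled back.
tryCreate :: IORef [Int] -> Bool -> Int -> IO Int
tryCreate store callComplete quantity = withScope store $ \scope -> do
  modifyInScope scope (quantity :)
  if callComplete then complete scope else pure ()
  pure 201
```

Calling tryCreate with False still returns 201, yet the store is unchanged afterwards. That's why the test reads the reservation back rather than trusting the status code.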

Wait time #

The original test shown in the book times out after 30 seconds if it can't produce the race condition. Compared to that, the refactored test shown here is fast. Even so, we may fear that it spends too much time doing nothing. How much time might that be?

The TryCreate helper method shown above is the only part of a POST request that interacts with the Repository. As you can see, it calls it twice: Once to read, and once to write, if it decides to do that. With a 100 ms delay, that's 200 ms.

And while the test issues two POST requests, they run in parallel. That's the whole point. It means that they'll still run in approximately 200 ms.

The test then issues a GET request to verify that the resource was created. That triggers another database read, which takes another 100 ms.

That's 300 ms in all. Given that these tests are part of a second-level test suite, and not your default developer test suite, that may be good enough.

Still, that's the POST scenario. I also wrote a test that checks for a race condition when doing PUT requests, and it performs more work.

[Fact]
public async Task NoOverbookingPutRace()
{
    var date = DateTime.Now.Date.AddDays(1).AddHours(18.5);
    using var service = RestaurantService.CreateWith(repo =>
        new SlowReservationsRepository(
            TimeSpan.FromMilliseconds(100), repo));
    var (address1, dto1) = await service.PostReservation(date, 4);
    var (address2, dto2) = await service.PostReservation(date, 4);
 
    dto1.Quantity += 2;
    dto2.Quantity += 2;
    var task1 = service.PutReservation(address1, dto1);
    var task2 = service.PutReservation(address2, dto2);
    var actual = await Task.WhenAll(task1, task2);
 
    Assert.Single(
        actual,
        msg => msg.StatusCode == HttpStatusCode.InternalServerError);
    var ok = Assert.Single(actual, msg => msg.IsSuccessStatusCode);
    // Check that the reservations now have consistent values:
    var client = service.CreateClient();
    var resp1 = await client.GetAsync(address1);
    var resp2 = await client.GetAsync(address2);
    resp1.EnsureSuccessStatusCode();
    resp2.EnsureSuccessStatusCode();
    var body1 = await resp1.ParseJsonContent<ReservationDto>();
    var body2 = await resp2.ParseJsonContent<ReservationDto>();
    Assert.Single(new[] { body1.Quantity, body2.Quantity }, 6);
    Assert.Single(new[] { body1.Quantity, body2.Quantity }, 4);
}

This test first has to create two reservations in a nice, sequential manner. Then it attempts to perform two concurrent updates, and finally it tests that all is as it should be: That both reservations still exist, but only one had its Quantity increased to 6.

The two initial POST requests are serialized so as to avoid a race condition. At 200 ms each, that's 400 ms.

Each PUT request triggers three Repository actions, for a total of 300 ms (since they run in parallel).

Finally, the test issues two GET requests for verification, for another 2 × 100 ms = 200 ms. Now that I'm writing this, I realize that I could also have parallelized these two calls, but as you read on, you'll see why that's not necessary.

In all, this test waits for 900 ms. That's almost a second.
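A tiny Haskell model can capture this wait-time arithmetic. It assumes, as above, that each Repository call costs the full configured delay and that concurrent requests overlap completely. Phase, waitTimeMs, postScenario, and putScenario are hypothetical names.

```haskell
-- Each phase lists the requests that run concurrently, given as the number
-- of Repository calls each request makes. Concurrent requests overlap, so a
-- phase lasts as long as its slowest request. Phases run one after another.
type Phase = [Int]

waitTimeMs :: Int -> [Phase] -> Int
waitTimeMs delayMs phases =
  sum [delayMs * maximum phase | phase <- phases, not (null phase)]

-- NoOverbookingRace: two parallel POSTs (read + write), then one GET.
postScenario :: [Phase]
postScenario = [[2, 2], [1]]

-- NoOverbookingPutRace: two sequential POSTs, two parallel PUTs
-- (three calls each), then two sequential GETs.
putScenario :: [Phase]
putScenario = [[2], [2], [3, 3], [1], [1]]
```

With a 100 ms delay, the model gives 300 ms for the POST test and 900 ms for the PUT test. If only the concurrent phase is slowed down, the lists shrink to [[2, 2]] and [[3, 3]], giving 200 ms and 300 ms.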

Can we improve on that?

Decreasing unnecessary wait time #

In the latter example, the 300 ms wait time for the parallel PUT requests is necessary to trigger the race condition, but the rest of the test's actions don't need slowing down. We can remove the unwarranted wait time by setting up two services: One slow, and one normal.

To be honest, I could have modelled this by just instantiating two service objects, but why do something as pedestrian as that when you can turn RestaurantService into a monomorphic functor?

internal RestaurantService Select(Func<IReservationsRepository, IReservationsRepository> selector)
{
    if (selector is null)
        throw new ArgumentNullException(nameof(selector));
 
    return new RestaurantService(selector(repository));
}
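In Haskell terms, Select is a map over the one repository that a service contains. This sketch assumes a hypothetical data type in which Decorators are plain constructors, so that the functor laws can be checked with ==. Repo, Slow, Logging, Service, and mapService are illustrative names.

```haskell
-- Hypothetical: Decorators modelled as data, so results can be compared.
data Repo = SqlRepo | Slow Int Repo | Logging Repo
  deriving (Eq, Show)

-- A service that contains exactly one repository.
newtype Service = Service Repo
  deriving (Eq, Show)

-- The analogue of RestaurantService.Select. It's monomorphic: the selector
-- must return the same type (Repo) that it receives, so unlike fmap it
-- can't change the type of the contained value.
mapService :: (Repo -> Repo) -> Service -> Service
mapService selector (Service repo) = Service (selector repo)
```

In C#, the compiler translates a query expression such as from repo in service select ... into a call to service.Select(repo => ...). That's why defining Select is enough to enable the query syntax used in the next test.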

Granted, this is verging on the frivolous, but when writing code for a blog post, I think I'm allowed a little fun.

In any case, this now enables me to rewrite the test like this:

[Fact]
public async Task NoOverbookingRace()
{
    var date = DateTime.Now.Date.AddDays(1).AddHours(18.5);
    using var service = new RestaurantService();
    using var slowService =
        from repo in service
        select new SlowReservationsRepository(TimeSpan.FromMilliseconds(100), repo);
 
    var task1 = slowService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var task2 = slowService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var actual = await Task.WhenAll(task1, task2);
 
    Assert.Single(
        actual,
        msg => msg.StatusCode == HttpStatusCode.InternalServerError);
    var ok = Assert.Single(actual, msg => msg.IsSuccessStatusCode);
    // Check that the reservation was actually created:
    var resp = await service.GetReservation(ok.Headers.Location);
    resp.EnsureSuccessStatusCode();
    var reservation = await resp.ParseJsonContent<ReservationDto>();
    Assert.Equal(10, reservation.Quantity);
}

Notice how only the parallel execution of task1 and task2 runs on the slow system. The rest runs as fast as it can. It's as if the client was hitting two different servers that just happen to connect to the same database. Now the test only waits for the 200 ms described above. The PUT test, likewise, only idles for 300 ms instead of 900 ms.

Near-deterministic tests #

Does this deterministically reproduce the race condition? In practice, it may move us close enough, but theoretically the race is still on. With the increased wait time, it's now much less likely that the race condition fails to occur, but it still could.

Imagine that task1 queries the Repository. Just as it's received a response, but before task2 starts its query, execution is paused, perhaps because of garbage collection. Once the program resumes, task1 runs to completion before task2 reads from the database. In that case, task2 ends up making the right decision, rejecting the reservation. Even if no transaction control were in place.

This may not be a particularly realistic scenario, but I suppose it could happen if the computer is stressed in general. Even so, you might decide to make such false-negative scenarios even more unlikely by increasing the delay time. Of course, the downside is that tests take even longer to run.

Another potential problem is that there's no guarantee that task1 and task2 run in parallel. Although the test starts both tasks before awaiting either, there's an (unlikely) chance that task1 completes before task2 starts. Again, I don't consider this likely, but I suppose it could happen because of thread starvation, generation 2 garbage collection, the disk running full, etc. The point is that the test shown here is still playing the odds, even if the odds are really good.

Conclusion #

Instead of running a scenario 'enough' times that reproducing a race condition is likely, you can increase the odds to near-certainty by slowing down the race. In this example, the race involves a database, but you might also encounter race conditions internally in multi-threaded code. I'm not insisting that the technique described in this article applies universally, but if you can slow down certain interactions in the right way, you may be able to reproduce problems as automated tests.

If you've ever troubleshot a race condition, you've probably tried inserting sleeps into the code in various places to understand the problem. As described above, a single, strategically-placed Task.Delay may be all you need to reproduce a problem. What escaped me for a long time, however, was that I could cleanly insert such pauses into production code. That didn't occur to me until my workshop participant suggested using a Decorator.

A delaying Decorator slows interactions with the database down sufficiently to reproduce the race condition as an automated test.

Comments

Nick #

Unfortunately I haven't read your book, so perhaps this question is already answered there: how would a transaction prevent a race condition? I would have expected to solve it using optimistic concurrency control.

My other point is regarding artificial delays. How is that deterministic? If you're injecting decorators to take advantage of seams in the code, and you know the number of operations, then you don't need delays at all. To highlight the race condition you need to ensure two requests have read the same data concurrently. After that you can allow the writes, expecting one to succeed and the other to fail. You could use a synchronization primitive, such as CountdownEvent. Initialize it with a count of 2 to represent your concurrent requests. Repository.ReadReservations() would call Signal() to decrement it after fetching from the database. Repository.Create() would call Wait() to ensure both reads had been made before writing to the database.

2025-06-15 17:58 UTC
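Nick's CountdownEvent idea can be sketched in Haskell with a hand-rolled latch, because base has no CountdownEvent. This is an illustration under assumptions. Latch, newLatch, signal, wait, tryBook, and bookConcurrently are hypothetical names, and an IORef of reservation quantities stands in for the database.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- A minimal countdown latch built from MVars, loosely modelled on .NET's
-- CountdownEvent (this is not that API).
data Latch = Latch (MVar Int) (MVar ())

newLatch :: Int -> IO Latch
newLatch n = Latch <$> newMVar n <*> newEmptyMVar

-- Like Signal: decrement the count, and open the gate when it hits zero.
signal :: Latch -> IO ()
signal (Latch count gate) = modifyMVar_ count $ \c -> do
  let c' = c - 1
  if c' == 0 then putMVar gate () else pure ()
  pure c'

-- Like Wait: block until the gate is open. readMVar leaves the () in the
-- MVar, so every waiter is released.
wait :: Latch -> IO ()
wait (Latch _ gate) = readMVar gate

-- Hypothetical booking handler without transaction control. The read
-- signals; the write waits, so both reads happen before either write.
tryBook :: Latch -> IORef [Int] -> Int -> Int -> IO Bool
tryBook latch db capacity quantity = do
  booked <- readIORef db
  signal latch
  if sum booked + quantity <= capacity
    then do
      wait latch
      atomicModifyIORef' db (\qs -> (quantity : qs, ()))
      pure True
    else pure False

bookConcurrently :: IORef [Int] -> Int -> Int -> IO [Bool]
bookConcurrently db capacity quantity = do
  latch <- newLatch 2
  box1 <- newEmptyMVar
  box2 <- newEmptyMVar
  _ <- forkIO (tryBook latch db capacity quantity >>= putMVar box1)
  _ <- forkIO (tryBook latch db capacity quantity >>= putMVar box2)
  mapM takeMVar [box1, box2]
```

Both requests read the empty store before either writes, so both are accepted every time. This reproduces the race without any delay. The trade-off is that the test must know exactly which operations to synchronize.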

Mark Seemann #

Nick, thank you for writing. Regarding the transaction, as the above TryCreate snippet shows, it wraps both the read and the write operation, and since the default isolation level is Serializable,

"no new data can be added during the transaction."

As to the second point, I don't think I've claimed that inducing an artificial delay as a Decorator is deterministic. To the contrary, I explicitly discussed how this is only near-deterministic.

That said, you're correct that there's an even better way, and I have another article queued up that takes an approach similar to the one that you describe.

2025-06-18 07:01 UTC

This blog is totally free, but if you like it, please consider supporting it.

Song recommendations as a Haskell Impureim Sandwich

2025-05-26T07:15:00+00:00

A pure function on potentially massive data.

This article is part of a larger article series called Alternative ways to design with functional programming. As the title suggests, these articles discuss various ways to apply functional-programming principles to a particular problem. All the articles engage with the same problem. In short, the task is to calculate song recommendations for a user, based on massive data sets. Earlier articles in this series give you a detailed explanation of the problem.

In the previous article, you saw how to refactor the 'base' F# code base to a pure function. In this article, you'll see the same refactoring applied to the 'base' Haskell code base shown in Porting song recommendations to Haskell.

The Git branch discussed in this article is the pure-function branch in the Haskell Git repository.

Collecting all the data #

Like in the previous articles, we may start by adding two more methods to the SongService type class, which will enable us to enumerate all songs and all users. The full type class, with all four methods, then looks like this:

class SongService a where
  getAllSongs :: a -> IO [Song]
  getAllUsers :: a -> IO [User]
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

If you compare with the type class definition shown in the article Porting song recommendations to Haskell, you'll see that getAllSongs and getAllUsers are the new methods.

They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this collectAllTopListeners action:

collectAllTopListeners :: SongService a => a -> IO (Map Int [User])
collectAllTopListeners srvc = do
  songs <- getAllSongs srvc
  listeners <- newIORef Map.empty
  forM_ songs $ \song -> do
    topListeners <- getTopListeners srvc $ songId song
    modifyIORef listeners (Map.insert (songId song) topListeners)
  readIORef listeners

Likewise, you can amass all the top scrobbles with a similar action:

collectAllTopScrobbles :: SongService a => a -> IO (Map String [Scrobble])
collectAllTopScrobbles srvc = do
  users <- getAllUsers srvc
  scrobbles <- newIORef Map.empty
  forM_ users $ \user -> do
    topScrobbles <- getTopScrobbles srvc $ userName user
    modifyIORef scrobbles (Map.insert (userName user) topScrobbles)
  readIORef scrobbles

As you may have noticed, they're so similar that, had there been more than two, we might consider extracting the similar parts to a reusable operation.
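In Haskell, such a reusable operation might look like the following, written with traverse instead of an IORef. This is a sketch, not the code in the repository. collectAll is a hypothetical name, and Song, User, and Scrobble are trimmed-down stand-ins with only the fields the collectors need.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical, minimal stand-ins for the article's types.
newtype Song = Song { songId :: Int } deriving (Eq, Show)
newtype User = User { userName :: String } deriving (Eq, Show)
newtype Scrobble = Scrobble { scrobbledSong :: Song } deriving (Eq, Show)

class SongService a where
  getAllSongs :: a -> IO [Song]
  getAllUsers :: a -> IO [User]
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

-- The shared shape: enumerate all resources, query each one by its key,
-- and gather the results in a Map.
collectAll :: Ord k => IO [r] -> (r -> k) -> (k -> IO v) -> IO (Map k v)
collectAll getAll key query = do
  resources <- getAll
  pairs <- traverse (\r -> (,) (key r) <$> query (key r)) resources
  pure (Map.fromList pairs)

collectAllTopListeners :: SongService a => a -> IO (Map Int [User])
collectAllTopListeners srvc =
  collectAll (getAllSongs srvc) songId (getTopListeners srvc)

collectAllTopScrobbles :: SongService a => a -> IO (Map String [Scrobble])
collectAllTopScrobbles srvc =
  collectAll (getAllUsers srvc) userName (getTopScrobbles srvc)
```

Like repeated Map.insert, Map.fromList keeps the last value if a key occurs more than once. The behaviour therefore matches the IORef-based actions.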

In both cases, we start with the action that enables us to enumerate all the resources (songs or users) that we're interested in. For each of these, we then invoke the action to get the 'top' resources for that song or user. There's a massive n+1 problem here, but you could conceivably parallelize all these queries, as they're independent. Still, it's bound to take a long time, possibly hours.
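Because the queries are independent, one way to parallelize them with only base is to fork a thread per query and collect the results from MVars. This is a minimal sketch, and forkResult and parTraverse are hypothetical names. It starts every query at once with no limit on concurrency, which could overwhelm a real service. In practice, mapConcurrently from the async package is the usual choice.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- Run an action on its own thread; the MVar receives its outcome.
forkResult :: IO a -> IO (MVar (Either SomeException a))
forkResult action = do
  box <- newEmptyMVar
  _ <- forkIO (try action >>= putMVar box)
  pure box

-- One thread per element. Results come back in input order, and the first
-- failure (in list order) is rethrown to the caller.
parTraverse :: (a -> IO b) -> [a] -> IO [b]
parTraverse f xs = do
  boxes <- mapM (forkResult . f) xs
  mapM (\box -> takeMVar box >>= either throwIO pure) boxes
```

Replacing a sequential loop over all songs with parTraverse would issue all the getTopListeners calls concurrently.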

You may be wondering why I chose to use IORef values for both actions, instead of more idiomatic combinator-based expressions. Indeed, that is what I would usually do, but in this case, these two actions are heavily IO-bound already, and it makes the Haskell code more similar to the F# code. Not that that is normally a goal, but here I thought it might help you, the reader, to compare the different code bases.

All the data is kept in a Map per action, so two massive maps in all. Once these two actions return, we're done with the read phase of the Recawr Sandwich.

Referentially transparent function with local mutation #

As a first step, we may wish to turn the getRecommendations action into a referentially transparent function. If you look through the commits in the Git repository, you can see that I actually did this through a series of micro-commits, but here I only present a more coarse-grained version of the changes I made.

In this version, I've removed the srvc (SongService) parameter, and instead added the two maps topScrobbles and topListeners.

getRecommendations :: Map String [Scrobble] -> Map Int [User] -> String -> IO [Song]
getRecommendations topScrobbles topListeners un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  let scrobbles = Map.findWithDefault [] un topScrobbles
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles
 
  recommendationCandidates <- newIORef []
  forM_ scrobblesSnapshot $ \scrobble -> do
    let otherListeners =
          Map.findWithDefault [] (songId $ scrobbledSong scrobble) topListeners
    let otherListenersSnapshot =
          take 20 $
          sortOn (Down . userScrobbleCount) $
          filter ((10_000 <=) . userScrobbleCount) otherListeners
 
    forM_ otherListenersSnapshot $ \otherListener -> do
      let otherScrobbles =
            Map.findWithDefault [] (userName otherListener) topScrobbles
      let otherScrobblesSnapshot =
            take 10 $
            sortOn (Down . songRating . scrobbledSong) $
            filter (songHasVerifiedArtist . scrobbledSong) otherScrobbles
 
      forM_ otherScrobblesSnapshot $ \otherScrobble -> do
        let song = scrobbledSong otherScrobble
        modifyIORef recommendationCandidates (song :)
 
  recommendations <- readIORef recommendationCandidates
  return $ take 200 $ sortOn (Down . songRating) recommendations

You've probably noticed that this action still looks impure, since it returns IO [Song]. Even so, it's referentially transparent, since it's fully deterministic and without side effects.
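Haskell can also express 'local mutation, pure result' directly in the type, using the ST monad. runST runs a computation that uses mutable STRefs and returns a plain value, and its type prevents the references from escaping. The article takes a different route, a single expression, so the following only sketches that alternative. It uses a simplified, hypothetical problem (collectCandidates) that mirrors the nested loops above.

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- For each user, look up their songs and accumulate candidates in a mutable
-- reference, then return the three highest. The mutation is local to runST,
-- so the function's type is pure: no IO in the result.
collectCandidates :: (String -> [Int]) -> [String] -> [Int]
collectCandidates songsOf users = runST $ do
  candidates <- newSTRef []
  forM_ users $ \user ->
    forM_ (songsOf user) $ \song ->
      modifySTRef' candidates (song :)
  take 3 . sortOn Down <$> readSTRef candidates
```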

When I refactored the action, this order of changes made the most sense to me. Getting rid of the SongService parameter was more important to me than getting rid of the IO wrapper.

In any case, this is only an interim state towards a more idiomatic pure Haskell function.

A single expression #

A curious property of expression-based languages is that you can conceivably write functions in 'one line of code'. Granted, it would often be a terribly wide line, not at all readable, a beast to maintain, and often with poor performance, so not something you'd want to always do.

In this case, however, we can do that, although in order to stay within an 80x24 box, we break the expression over multiple lines.

getRecommendations :: Map String [Scrobble] -> Map Int [User] -> String -> [Song]
getRecommendations topScrobbles topListeners un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  take 200 $
  sortOn (Down . songRating) $
  fmap scrobbledSong $
  (\otherListener -> take 10 $
      sortOn (Down . songRating . scrobbledSong) $
      filter (songHasVerifiedArtist . scrobbledSong) $
      Map.findWithDefault [] (userName otherListener) topScrobbles) =<<
  (\scrobble -> take 20 $
    sortOn (Down . userScrobbleCount) $
    filter ((10_000 <=) . userScrobbleCount) $
    Map.findWithDefault [] (songId $ scrobbledSong scrobble) topListeners) =<<
  take 100
  (sortOn (Down . scrobbleCount) $ Map.findWithDefault [] un topScrobbles)

This snapshot also got rid of the IORef value, and IO in general. The function is still referentially transparent, but now Haskell can also see that.
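In the single expression, each =<< runs in the list monad, where f =<< xs is the same as concatMap f xs: it applies f to every element and concatenates the resulting lists. That's what replaces the nested forM_ loops. Here's a tiny illustration with hypothetical toy data (songsOf, tagsOf, allTags):

```haskell
-- Hypothetical toy data: each user has some songs, each song some tags.
songsOf :: String -> [Int]
songsOf "ann" = [1, 2]
songsOf _ = [3]

tagsOf :: Int -> [String]
tagsOf n = ["tag" ++ show n, "genre" ++ show (n `mod` 2)]

-- Read bottom-up: start with users, expand each into songs, then into tags.
allTags :: [String] -> [String]
allTags users =
  tagsOf =<<
  songsOf =<<
  users

-- The same thing written with explicit concatMap.
allTags' :: [String] -> [String]
allTags' users = concatMap tagsOf (concatMap songsOf users)
```

For the users ["ann", "bob"], both versions return ["tag1", "genre1", "tag2", "genre0", "tag3", "genre1"].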

Even so, this looks dense and confusing. It doesn't help that Haskell should usually be read right-to-left, and bottom-to-top. Is it possible to improve the readability of this function?

Composition from smaller functions #

To improve readability and maintainability, we may now extract helper functions. The first one easily suggests itself.

getUsersOwnTopScrobbles :: Ord k => Map k [Scrobble] -> k -> [Scrobble]
getUsersOwnTopScrobbles topScrobbles un =
  take 100 $
  sortOn (Down . scrobbleCount) $ Map.findWithDefault [] un topScrobbles

Each of the subexpressions in the above code listing is a candidate for the same kind of treatment, like this one:

getOtherUsersWhoListenedToTheSameSongs :: Map Int [User] -> Scrobble -> [User]
getOtherUsersWhoListenedToTheSameSongs topListeners scrobble =
  take 20 $
  sortOn (Down . userScrobbleCount) $
  filter ((10_000 <=) . userScrobbleCount) $
  Map.findWithDefault [] (songId $ scrobbledSong scrobble) topListeners

You can't see it from the code listings themselves, but the module doesn't export these functions. They remain implementation details.

With a few more helper functions, you can now implement the getRecommendations function by composing the helpers.
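The article doesn't list the remaining two helpers, but they can be reconstructed from the single-expression version above. The following is such a reconstruction, so the author's actual code may differ. The record types are hypothetical, minimal stand-ins, and songRating, for instance, is assumed to be an Int.

```haskell
{-# LANGUAGE NumericUnderscores #-}
import Data.List (sortOn)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Ord (Down (..))

-- Hypothetical minimal record types; field names follow the article's code.
data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Int
  } deriving (Eq, Show)
data Scrobble = Scrobble { scrobbledSong :: Song, scrobbleCount :: Int }
  deriving (Eq, Show)
data User = User { userName :: String, userScrobbleCount :: Int }
  deriving (Eq, Show)

getUsersOwnTopScrobbles :: Ord k => Map k [Scrobble] -> k -> [Scrobble]
getUsersOwnTopScrobbles topScrobbles un =
  take 100 $
  sortOn (Down . scrobbleCount) $ Map.findWithDefault [] un topScrobbles

getOtherUsersWhoListenedToTheSameSongs :: Map Int [User] -> Scrobble -> [User]
getOtherUsersWhoListenedToTheSameSongs topListeners scrobble =
  take 20 $
  sortOn (Down . userScrobbleCount) $
  filter ((10_000 <=) . userScrobbleCount) $
  Map.findWithDefault [] (songId $ scrobbledSong scrobble) topListeners

-- Reconstructed from the first lambda in the single-expression version.
getTopSongsOfOtherUser :: Map String [Scrobble] -> User -> [Scrobble]
getTopSongsOfOtherUser topScrobbles otherListener =
  take 10 $
  sortOn (Down . songRating . scrobbledSong) $
  filter (songHasVerifiedArtist . scrobbledSong) $
  Map.findWithDefault [] (userName otherListener) topScrobbles

-- Reconstructed from the top three lines of the single-expression version.
aggregateSongsIntoRecommendations :: [Scrobble] -> [Song]
aggregateSongsIntoRecommendations =
  take 200 . sortOn (Down . songRating) . fmap scrobbledSong

getRecommendations :: Map String [Scrobble] -> Map Int [User] -> String -> [Song]
getRecommendations topScrobbles topListeners un =
  aggregateSongsIntoRecommendations $
  getTopSongsOfOtherUser topScrobbles =<<
  getOtherUsersWhoListenedToTheSameSongs topListeners =<<
  getUsersOwnTopScrobbles topScrobbles un
```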

getRecommendations :: Map String [Scrobble] -> Map Int [User] -> String -> [Song]
getRecommendations topScrobbles topListeners un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  aggregateSongsIntoRecommendations $
  getTopSongsOfOtherUser topScrobbles =<<
  getOtherUsersWhoListenedToTheSameSongs topListeners =<<
  getUsersOwnTopScrobbles topScrobbles un

Since Haskell by default composes from right to left, when you break such a composition over multiple lines (in order to stay within an 80x24 box), it should be read bottom-up.

You can remedy this situation by replacing =<< with >>= and importing the & operator from Data.Function for the final step:

getRecommendations :: Map String [Scrobble] -> Map Int [User] -> String -> [Song]
getRecommendations topScrobbles topListeners un = 
  getUsersOwnTopScrobbles topScrobbles un >>=
  getOtherUsersWhoListenedToTheSameSongs topListeners >>=
  getTopSongsOfOtherUser topScrobbles &
  aggregateSongsIntoRecommendations

Notice that I've named each of the helper functions after the code comments that accompanied the previous incarnations of this function. If we consider code comments apologies for not properly organizing the code, we've now managed to structure it in such a way that those apologies are no longer required.
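Why do the two pipelines agree? Both >>= and & are declared infixl 1, so a >>= f >>= g & h parses as ((a >>= f) >>= g) & h. Here's a toy check with hypothetical stages (start, expand, finish):

```haskell
import Data.Function ((&))

-- Hypothetical toy stages standing in for the helper functions.
start :: Int -> [Int]
start n = [1 .. n]

expand :: Int -> [Int]
expand x = [x, x * 10]

finish :: [Int] -> Int
finish = sum

-- Right-to-left: read bottom-up.
bottomUp :: Int -> Int
bottomUp n =
  finish $
  expand =<<
  start n

-- Left-to-right: parses as (start n >>= expand) & finish, read top-down.
topDown :: Int -> Int
topDown n =
  start n >>=
  expand &
  finish
```

For n = 3, both functions return 66, the sum of [1, 10, 2, 20, 3, 30].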

Conclusion #

If you accept the (perhaps preposterous) assumption that it's possible to fit the required data in persistent data structures, refactoring the recommendation algorithm to a pure function isn't that difficult. That's the pure part of a Recawr Sandwich. While I haven't shown the actual sandwich here, it's quite straightforward. You can find it in the tests in the Haskell Git repository. Also, once you've moved all the data collection to the boundary of the application, you may no longer need the SongService type class.
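For completeness, the overall shape of a Recawr Sandwich can be written as a higher-order function. This is a generic sketch (recawr is a hypothetical name), not the code from the repository's tests. The commented usage is likewise illustrative.

```haskell
-- The general shape of a Recawr Sandwich: impure read, pure calculation,
-- impure write.
recawr :: IO a -> (a -> b) -> (b -> IO ()) -> IO ()
recawr readPhase calculate writePhase = readPhase >>= writePhase . calculate

-- With the article's functions, the sandwich could look something like this
-- (hypothetical; srvc, un, and the use of print are placeholders):
--
-- recawr
--   ((,) <$> collectAllTopScrobbles srvc <*> collectAllTopListeners srvc)
--   (\(scrobbles, listeners) -> getRecommendations scrobbles listeners un)
--   print
```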

I find the final incarnation of the code shown here to be quite attractive. While I've kept the helper functions private to the module, it's always an option to export them if you find that warranted. This could improve testability of the overall code base, albeit at the risk of increasing the surface area of the API that you have to maintain and secure.

There are always trade-offs to be considered. Even if you, eventually, find that for this particular example, the input data size is just too big to make this alternative viable, there are, in my experience, many other situations when this kind of architecture is a good solution. Even if the input size is a decent amount of megabytes, the simplification offered by an Impureim Sandwich may trump the larger memory footprint. As always, if you're concerned about performance, measure it.

This article concludes the overview of using a Recawr Sandwich to address the problem. Since it's, admittedly, a bit of a stretch to imagine running a program that uses terabytes (or more) of memory, we now turn to alternative architectures.

Next: Song recommendations from combinators.


Song recommendations as an F# Impureim Sandwich

2025-05-19T08:06:00+00:00

A pure function on potentially massive data.

This article is part of a larger article series titled Alternative ways to design with functional programming. In the previous article, you saw an example of applying the Impureim Sandwich pattern to the problem at hand: A song recommendation engine that sifts through much historical data.

As already covered in Song recommendations as an Impureim Sandwich, the drawback, if you will, of a Recawr Sandwich is that you need to collect all data from impure sources before you can pass it to a pure function. It may happen that you need so much data that this strategy becomes untenable. This may be the case here, but surprisingly often, what strikes us humans as being vast amounts are peanuts for computers.

So even if you don't find this particular example realistic, I'll forge ahead and show how to apply the Recawr Sandwich pattern to this problem. This is essentially a port to F# of the C# code from the previous article. If you'd rather see some more realistic architectures to deal with the overall problem, you can always go back to the table of contents in the first article of the series.

In this article, I'm working with the fsharp-pure-function branch of the Git repository.

Collecting all the data #

Like in the previous article, we may start by adding two more members to the SongService interface, which will enable us to enumerate all songs and all users. The full interface, with all four methods, then looks like this:

type SongService =
    abstract GetAllSongs : unit -> Task<IEnumerable<Song>>
    abstract GetAllUsers : unit -> Task<IEnumerable<User>>
    abstract GetTopListenersAsync : songId : int -> Task<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Task<IReadOnlyCollection<Scrobble>>

If you compare with the interface definition shown in the article Porting song recommendations to F#, you'll see that GetAllSongs and GetAllUsers are the new methods.

They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this collectAllTopListeners action:

let collectAllTopListeners (songService : SongService) = task {
    let d = Dictionary<int, IReadOnlyCollection<User>> ()
    let! songs = songService.GetAllSongs ()
    do! songs |> TaskSeq.iter (fun s -> task {
        let! topListeners = songService.GetTopListenersAsync s.Id
        d.Add (s.Id, topListeners) } )
    return d :> IReadOnlyDictionary<_, _> }

Likewise, you can amass all the top scrobbles with a similar action:

let collectAllTopScrobbles (songService : SongService) = task {
    let d = Dictionary<string, IReadOnlyCollection<Scrobble>> ()
    let! users = songService.GetAllUsers ()
    do! users |> TaskSeq.iter (fun u -> task {
        let! topScrobbles = songService.GetTopScrobblesAsync u.UserName
        d.Add (u.UserName, topScrobbles) } )
    return d :> IReadOnlyDictionary<_, _> }

As you may have noticed, they're so similar that, had there been more than two, we might consider extracting the similar parts to a reusable operation.

All the data is kept in a dictionary per action, so two massive dictionaries in all. Once these two actions return, we're done with the read phase of the Recawr Sandwich.

Traversals #

You may have wondered about the above TaskSeq.iter action. That's not part of the standard library. What is it, and where does it come from?

It's a specialized traversal, designed to make asynchronous Commands more streamlined.

let iter f xs = task {
    let! units = traverse f xs
    Seq.iter id units }

If you've ever wondered why the identity function (id) is useful, here's an example. In the first line of code, units is a unit seq value; i.e. a sequence of unit values. To make TaskSeq.iter as easy to use as possible, it should turn that multitude of unit values into a single unit value. There's more than one way to do that, but I found that using Seq.iter was about the most succinct option I could think of. Be that as it may, Seq.iter requires as an argument a function that returns unit, and since we already have unit values, id does the job.

The iter action uses the TaskSeq module's traverse function, which is defined like this:

let traverse f xs =
    let go acc x = task {
        let! x' = x
        let! acc' = acc
        return Seq.append acc' [x'] }
    xs |> Seq.map f |> Seq.fold go (task { return [] })

The type of traverse is ('a -> #Task<'c>) -> 'a seq -> Task<'c seq>; that is, it applies an asynchronous action to each of a sequence of 'a values, and returns an asynchronous workflow that contains a sequence of 'c values.
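For comparison with the Haskell articles in this series, here's a Haskell transliteration of the F# traverse and TaskSeq.iter, generalized to any Applicative. It's only a sketch, and traverseL and iterL are hypothetical names. In Haskell, base already provides traverse and traverse_ for this purpose.

```haskell
-- A left fold that combines each effect in order and appends its result,
-- mirroring the shape of the F# traverse above.
traverseL :: Applicative f => (a -> f b) -> [a] -> f [b]
traverseL f = foldl go (pure [])
  where
    go acc x = (\bs b -> bs ++ [b]) <$> acc <*> f x

-- The counterpart of TaskSeq.iter: traverse, then collapse the list of
-- unit values into a single unit.
iterL :: Applicative f => (a -> f ()) -> [a] -> f ()
iterL f xs = () <$ traverseL f xs
```

In this Haskell version, effects run left to right and the results keep their order. That's visible when the Applicative is a pair with a log as its first component.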

Dictionary lookups #

In .NET, queries that may fail are idiomatically modelled with methods that take out parameters. This is also true for dictionary lookups. Since that kind of design doesn't compose well, it's useful to add a little helper function that instead may return an empty value. While you'd generally do that by returning an option value, in this case, an empty collection is more appropriate.

let findOrEmpty key (d : IReadOnlyDictionary<_, IReadOnlyCollection<_>>) =
   match d.TryGetValue key with
   | true, v -> v
   | _       -> List.empty

You may have noticed that I also added a similar helper function in the C# example, although there I called it GetOrEmpty.

Pure function with local mutation #

As a first step, we may wish to turn the GetRecommendationsAsync method into a pure function. If you look through the commits in the Git repository, you can see that I actually did this through a series of micro-commits, but here I only present a more coarse-grained version of the changes I made.

Instead of a method on a class, we now have a self-contained function that takes, as arguments, two dictionaries, but no SongService dependency.

let getRecommendations topScrobbles topListeners userName =
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    let scrobbles = topScrobbles |> Dict.findOrEmpty userName
    let scrobblesSnapshot =
        scrobbles
        |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
        |> Seq.truncate 100
        |> Seq.toList
 
    let recommendationCandidates = ResizeArray ()
    for scrobble in scrobblesSnapshot do
        let otherListeners =
            topListeners |> Dict.findOrEmpty scrobble.Song.Id
        let otherListenersSnapshot =
            otherListeners
            |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
            |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
            |> Seq.truncate 20
            |> Seq.toList
 
        for otherListener in otherListenersSnapshot do
            let otherScrobbles =
                topScrobbles |> Dict.findOrEmpty otherListener.UserName
            let otherScrobblesSnapshot =
                otherScrobbles
                |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
                |> Seq.sortByDescending (fun s -> s.Song.Rating)
                |> Seq.truncate 10
                |> Seq.toList
 
            otherScrobblesSnapshot
            |> List.map (fun s -> s.Song)
            |> recommendationCandidates.AddRange
 
    let recommendations =
        recommendationCandidates
        |> Seq.sortByDescending (fun s -> s.Rating)
        |> Seq.truncate 200
        |> Seq.toList
        :> IReadOnlyCollection<_>
 
    recommendations

Since this is now a pure function, there's no need to run as an asynchronous workflow. The function no longer returns a Task, and I've also dispensed with the Async suffix.

The implementation still has imperative remnants. It initializes an empty ResizeArray (AKA List<T>), and uses nested loops to repeatedly call AddRange.

Even though the function contains local state mutation, none of it escapes the function's scope. The function is referentially transparent because it always returns the same result when given the same input, and it has no side effects.

You might still wish that it was 'more functional', which is certainly possible.

A single expression #

In this case, we can rewrite the entire function as a single expression, although, in order to stay within an 80x24 box, we have to break the expression over multiple lines.

let getRecommendations topScrobbles topListeners userName =
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    topScrobbles
    |> Dict.findOrEmpty userName
    |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
    |> Seq.truncate 100
    |> Seq.collect (fun scrobble ->
        topListeners
        |> Dict.findOrEmpty scrobble.Song.Id
        |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
        |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
        |> Seq.truncate 20
        |> Seq.collect (fun otherListener ->
            topScrobbles
            |> Dict.findOrEmpty otherListener.UserName
            |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
            |> Seq.sortByDescending (fun s -> s.Song.Rating)
            |> Seq.truncate 10
            |> Seq.map (fun s -> s.Song)))
    |> Seq.sortByDescending (fun s -> s.Rating)
    |> Seq.truncate 200
    |> Seq.toList
    :> IReadOnlyCollection<_>

To be honest, the four lines of comments push the function definition over the edge of 24 lines of code, but without them, this variation actually does fit an 80x24 box. Even so, I'm not arguing that this is the best possible way to organize and lay out this function.

You may rightly complain that it's too dense. Perhaps you're also concerned about the arrow code tendency.

I'm not disagreeing, but at least this represents a milestone where the function is not only referentially transparent, but also implemented without local mutation. Not that that really should be the most important criterion, but once you have an entirely expression-based implementation, it's usually easier to break it up into smaller building blocks.

Composition from smaller functions #

To improve readability and maintainability, we may now extract helper functions. The first one easily suggests itself.

let private getUsersOwnTopScrobbles topScrobbles userName =
    topScrobbles
    |> Dict.findOrEmpty userName
    |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
    |> Seq.truncate 100

Each of the subexpressions in the above code listing is a candidate for the same kind of treatment, like this one:

let private getOtherUsersWhoListenedToTheSameSongs topListeners scrobble =
    topListeners
    |> Dict.findOrEmpty scrobble.Song.Id
    |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
    |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
    |> Seq.truncate 20

Notice that these helper functions are marked private so that they remain implementation details within the module that exports the getRecommendations function.

With a few more helper functions, you can now implement the getRecommendations function by composing the helpers.

let getRecommendations topScrobbles topListeners =
    getUsersOwnTopScrobbles topScrobbles
    >> Seq.collect (
        getOtherUsersWhoListenedToTheSameSongs topListeners
        >> Seq.collect (getTopSongsOfOtherUser topScrobbles))
    >> aggregateSongsIntoRecommendations
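C# has no built-in >> operator, but the same style of composition can be sketched with a small extension method on Func. Here's a minimal, hypothetical sketch: Compose and Pipeline are illustrative names of my own, and Pipeline only mirrors the shape of getRecommendations (first >> collect (second >> collect third) >> aggregate), with SelectMany in place of Seq.collect:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FuncComposition
{
    // Forward composition: f.Compose(g) is x => g(f(x)), like F#'s f >> g.
    public static Func<TA, TC> Compose<TA, TB, TC>(
        this Func<TA, TB> f, Func<TB, TC> g) =>
        x => g(f(x));

    // Hypothetical helper with the same shape as the F# getRecommendations:
    // first >> Seq.collect (second >> Seq.collect third) >> aggregate
    public static Func<TA, TR> Pipeline<TA, TB, TC, TD, TR>(
        Func<TA, IEnumerable<TB>> first,
        Func<TB, IEnumerable<TC>> second,
        Func<TC, IEnumerable<TD>> third,
        Func<IEnumerable<TD>, TR> aggregate) =>
        first
            .Compose(bs => bs.SelectMany(
                second.Compose(cs => cs.SelectMany(third))))
            .Compose(aggregate);
}
```

Because C# can't infer the input type from untyped lambdas, a caller has to supply the type arguments explicitly, for example FuncComposition.Pipeline<string, int, string, int, int[]>(...). That's one reason the point-free style reads more naturally in F#.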

Conclusion #

If you accept the (perhaps preposterous) assumption that it's possible to fit the required data in persistent data structures, refactoring the recommendation algorithm to a pure function isn't that difficult. That's the pure part of a Recawr Sandwich. While I haven't shown the actual sandwich here, it's identical to the example shown in Song recommendations as a C# Impureim Sandwich.

I find the final incarnation of the code shown here to be quite attractive. While I've kept the helper functions private, it's always an option to promote them to public functions if you find that warranted. This could improve testability of the overall code base, albeit at the risk of increasing the surface area of the API that you have to maintain and secure.

Before we turn to alternative architectures, we'll survey how this variation looks in Haskell. As is generally the case in this article series, if you don't care about Haskell, you can always go back to the table of contents in the first article in the series and instead navigate to the next article that interests you.

Next: Song recommendations as a Haskell Impureim Sandwich.

This blog is totally free, but if you like it, please consider supporting it.

Song recommendations proof-of-concept memory measurements

2025-05-12T07:52:00+00:00

An attempt at measurement, and some results.

This is an article in a larger series about functional programming design alternatives, and a direct continuation of the previous article. The question lingering after the Impureim Sandwich proof of concept is: What are the memory requirements of front-loading all users, songs, and scrobbles?

One can guess, as I've already done, but it's safer to measure. In this article, you'll find a description of the experiment, as well as some results.

Test program #

Since I don't measure application memory profiles that often, I searched the web to learn how, and found this answer by Jon Skeet. That's a reputable source, so I'm assuming that the described approach is appropriate.

I added a new command-line executable to the source code and made this the entry point:

const int size = 100_000;
 
static async Task Main()
{
    var before = GC.GetTotalMemory(true);
 
    var (listeners, scrobbles) = await Populate();
 
    var after = GC.GetTotalMemory(true);
 
    var diff = after - before;
 
    Console.WriteLine("Total memory: {0:N0}B.", diff);
 
    GC.KeepAlive(listeners);
    GC.KeepAlive(scrobbles);
}

listeners and scrobbles are two dictionaries of data, as described in the previous article. Together, they contain the data that we measure. Both are populated by this method:

private static async Task<(
    IReadOnlyDictionary<int, IReadOnlyCollection<User>>,
    IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>>)> Populate()
{
    var service = PopulateService();
    var listeners = await service.CollectAllTopListeners();
    var scrobbles = await service.CollectAllTopScrobbles();
    return (listeners, scrobbles);
}

The service variable is a FakeSongService object populated with randomly generated data. The CollectAllTopListeners and CollectAllTopScrobbles methods are the same as described in the previous article. When the method returns the two dictionaries, the service object goes out of scope and can be garbage-collected. When the program measures the memory load, it measures the size of the two dictionaries, but not service.

I've reused the FsCheck generators for random data generation:

private static SongService PopulateService()
{
    var users = RecommendationsProviderTests.Gen.UserName.Sample(size);
    var songs = RecommendationsProviderTests.Gen.Song.Sample(size);
    var scrobbleGen =
        from user in Gen.Elements(users)
        from song in Gen.Elements(songs)
        from scrobbleCount in Gen.Choose(1, 10)
        select (user, song, scrobbleCount);
 
    var service = new FakeSongService();
    foreach (var (user, song, scrobbleCount) in scrobbleGen.Sample(size))
        service.Scrobble(user, song, scrobbleCount);
 
    return service;
}

A Gen<T> object comes with a Sample method you can use to request a specified number of randomly generated values.

In order to keep the code simple, I used the size value for the number of songs, the number of users, and the number of scrobbles. This probably creates too few scrobbles; a topic that requires further discussion later.
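If you want to try the same before/after GC.GetTotalMemory(true) approach without FsCheck or the FakeSongService, a minimal self-contained sketch might look like this. MemoryProbe, Measure, and BuildSample are hypothetical names, and the sample data (a dictionary of small int arrays) is only a stand-in for the real users, songs, and scrobbles:

```csharp
using System;
using System.Collections.Generic;

public static class MemoryProbe
{
    // Approximates how many bytes the object produced by 'build' keeps alive,
    // by comparing the managed heap size before and after building it.
    public static long Measure<T>(Func<T> build)
    {
        // Passing true forces a full collection before reading the counter.
        var before = GC.GetTotalMemory(true);
        var result = build();
        var after = GC.GetTotalMemory(true);
        // Keep 'result' reachable until after the second measurement.
        GC.KeepAlive(result);
        return after - before;
    }

    // A stand-in data set: 'size' entries, each mapping a key to a small array.
    public static Dictionary<int, int[]> BuildSample(int size)
    {
        var dict = new Dictionary<int, int[]>();
        for (var i = 0; i < size; i++)
            dict.Add(i, new[] { i, i + 1 });
        return dict;
    }
}
```

Calling MemoryProbe.Measure(() => MemoryProbe.BuildSample(100_000)) reports an approximate byte count; the exact figure depends on the runtime version and on whether the process is 32- or 64-bit.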

Measurements #

I ran the above program with various size values: 100,000 up to 1,000,000 in 100,000 increments, and from there up to 10,000,000 (ten million) in 500,000 increments. At the higher values, it took a good ten minutes to run the program.

As the chart indicates, I ran the program with various data representations (more on that below). While there are four distinct data series, they overlap pairwise so closely that the chart can't show the difference: the record and struct record series are visually indistinguishable, and so are the bitmasked class and bitmasked struct series, which only go to size 500,000.

There are small but noticeable jumps from 4,500,000 to 5,000,000 and again from 8,500,000 to 9,000,000, but the overall impression is that the relationship is linear. It seems safe to conclude that the solution scales linearly with the data size.

The number of bytes per size unit is almost constant, averaging 178 bytes. How does that compare to my previous memory size estimates? There, I estimated a song and a scrobble to require 8 bytes each, and a user less than 32 bytes. The way the above simulation runs, it generates one song, one user, and one scrobble per size unit. Therefore, I'd expect the average memory cost per size unit to be around 8 + 8 + 32 = 48 bytes, plus some overhead from the dictionaries.

Given that the number I measure is 178, that's 130 bytes of overhead. Honestly, that's more than I expected. I expect a dictionary to maintain an array of keys, perhaps hashed with a bucket per hash value. Had I picked another data structure than a plain old Dictionary, the overhead might have been different. Or perhaps I just don't understand .NET's memory model, when push comes to shove.

I then tried to split the single size parameter into three that would control the number of users, songs, and scrobbles independently. Setting both the number of users and songs to ten million, I then ran a series of simulations with increasing scrobble counts.

The relationship still looks linear, and at a hundred million scrobbles (and ten million users and ten million songs), the simulation uses 8.3 GB of memory.

I admit that I'm still a bit puzzled by the measurements, compared to my previous estimates. I would have expected those sizes to require about 1.2 GB, plus overhead, so the actual measurements are off by a factor of about 7. Not quite an order of magnitude, but close.
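Spelled out with the per-item estimates from the previous article (at most 32 bytes per user, and 8 bytes per song and per scrobble), the expected figure and the discrepancy work out like this:

$$
10^{7} \cdot 32\,\mathrm{B} + 10^{7} \cdot 8\,\mathrm{B} + 10^{8} \cdot 8\,\mathrm{B}
= 0.32\,\mathrm{GB} + 0.08\,\mathrm{GB} + 0.8\,\mathrm{GB} = 1.2\,\mathrm{GB},
\qquad
\frac{8.3\,\mathrm{GB}}{1.2\,\mathrm{GB}} \approx 6.9
$$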

Realism #

How useful are these measurements? How realistic are the experiments' parameters? Most streaming audio services report having catalogues with around 100 million songs, which is ten times more than what I've measured here. Such services may also have significantly more users than ten million, but what is going to make or break this architecture option (keeping all data in memory) is how many scrobbles users have, and how many times they listen to each song.

Even if we naively still believe that a scrobble only takes up 8 bytes, it doesn't follow automatically that 100 scrobbles take up 800 bytes. It depends on how many repeats there are. Recall how we may model a scrobble:

public sealed record Scrobble(Song Song, int ScrobbleCount);

If a user listens to the same song ten times, we don't have to create ten Scrobble objects; we can create one and set the ScrobbleCount to 10.

The memory required to store users' scrobbles depends on the average listening pattern. Even with millions of users, we may be able to store scrobbles in memory if users listen to relatively few distinct songs, and play those repeatedly. On the other hand, if they only listen to each song once, it's probably not going to fit in memory.
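To illustrate how repeats collapse, here's a minimal sketch, assuming that raw plays arrive as a flat sequence of (user name, song) pairs. That input format, as well as the ScrobbleAggregation class and its Aggregate method, are hypothetical; the Song and Scrobble records are the ones shown in this article. The code groups plays so that each distinct combination of user and song yields a single Scrobble:

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed record Song(int Id, bool IsVerifiedArtist, byte Rating);
public sealed record Scrobble(Song Song, int ScrobbleCount);

public static class ScrobbleAggregation
{
    // Groups raw plays by user, then by song, so that repeated plays of
    // the same song by the same user become one Scrobble with a count.
    public static IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> Aggregate(
        IEnumerable<(string UserName, Song Song)> plays)
    {
        return plays
            .GroupBy(p => p.UserName)
            .ToDictionary(
                g => g.Key,
                g => (IReadOnlyCollection<Scrobble>)g
                    .GroupBy(p => p.Song)
                    .Select(sg => new Scrobble(sg.Key, sg.Count()))
                    .ToList());
    }
}
```

Grouping by Song works because records have value-based equality, so two Song values with the same Id, IsVerifiedArtist, and Rating land in the same group.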

Still, we're dangerously close to the edge of what we can fit in memory. Shouldn't I just declare bankruptcy on that idea and move on?

The purpose of this overall article series is to demonstrate alternatives to the Impureim Sandwich pattern, so I'm ultimately going to do exactly that: Move on.

But not yet.

Sharding #

Some applications are truly global in nature, and when that's the case, keeping everything in memory may not be 'web scale'.

Still, I've seen more than one international company treat geographic areas as separate entities. This may be for legal reasons, or other business concerns that are unrelated to technology constraints.

As a programmer, you may think that a song recommendations service ought to be truly global. After all, more data produces more accurate results, right?

Your business owners may not think so. They may be concerned that regional music tastes may 'bleed over' market boundaries, and that this could ultimately scare customers away.

Even if you can technically prove that this isn't a relevant concern, because you can write an algorithm that takes this into account, you may get a direct order that, say, Southeast Asian scrobbles may not be used in North America, or vice versa.

It's worth investigating whether such business or legal constraints are already in place, because if they are, this may mean that you can shard the data, and that each shard still fits in memory.

You may still think that I'm trying to salvage a bad idea, but that's not my agenda. I discuss these topics because in my experience, many programmers don't consider them. Understanding the wider context of a problem may suggest solutions that you might otherwise dismiss.

But what if the business constraints change in the future? Shouldn't we be ready for that?

Yes and no. You should consider how such changes would impact the architecture. Then you discuss the advantages and disadvantages with other stakeholders.

Keep in mind that the reason to consider an Impureim Sandwich is that it's simple and easy to implement and maintain. Other alternatives may be more 'scalable', but also riskier to implement. You should involve other stakeholders in such decisions.

Song representations #

The first of the above charts draws graphs for four data series:

struct record
record
bitmasked struct
bitmasked class

These measure four different ways to model data; here more specifically a song.

My initial model of a song was a record:

public sealed record Song(int Id, bool IsVerifiedArtist, byte Rating);

Then I thought that perhaps, since the type only contains value types, it might be better to turn the above record into a record struct:

public record struct Song(int Id, bool IsVerifiedArtist, byte Rating);

It turns out that it makes no visible difference. In the chart, the two data series are so close to each other that you can't distinguish them.

Then I thought that instead of an int, a bool, and a byte, I could use a single bitmask to model all three values.

After all, I was only guessing when it came to data types anyway. It's likely that Rating is only a five-point or ten-point scale, but I still used a byte to model it. This suggests that I'm not using 96% of the data type's range. Perhaps I could use one of those redundant bits for IsVerifiedArtist, instead of an entire bool.

Taking this further, modelling the Id as an int suggests that you may have 4,294,967,296 unique songs. That's 4.3 billion songs, at least an order of magnitude more than the 100 million songs that we hear about. In reality though, most systems that use int for IDs only do so because int is CLS-compliant, and uint is not. In other words, most systems that use int for IDs most likely only use the positive half, which means that the sign bit is free for other purposes. And if we only need a little more than 100 million IDs, 27 bits suffice, which leaves five bits to spare. Enter the bitmasked song:

public readonly struct Song
{
    private const uint idMask =
        0b0000_0111_1111_1111_1111_1111_1111_1111;
    private const uint isVerifiedArtistMask =
        0b1000_0000_0000_0000_0000_0000_0000_0000;
    private const uint ratingMask =
        0b0111_1000_0000_0000_0000_0000_0000_0000;
    private readonly uint bits;
 
    public Song(int id, bool isVerifiedArtist, byte rating)
    {
        var idBits = (uint)id & idMask;
        var isVerifiedArtistBits = isVerifiedArtist ? isVerifiedArtistMask : 0u;
        var ratingBits = ((uint)rating << 27) & ratingMask;
        bits = idBits | isVerifiedArtistBits | ratingBits;
    }
 
    public int Id => (int)(bits & idMask);
 
    public bool IsVerifiedArtist =>
        (bits & isVerifiedArtistMask) == isVerifiedArtistMask;
 
    public byte Rating => (byte)((bits & ratingMask) >> 27);
}

In this representation, I've set aside the lower 27 bits for the ID, enabling IDs to range between 0 and 134,217,727. The top bit is used for IsVerifiedArtist, and the remaining four bits for Rating.

This data structure only holds a single uint, and since I made it a struct, I thought it'd have minimal overhead.

As you can see in the above chart, that's not the case. When I run the experiment, this representation requires more memory.

Just to make sure I wasn't missing anything obvious, I tried making the bitmasked Song a class instead. No visible difference.

If you're wondering why the bitmasked data series only go to 500,000, it's because this change made the experiments glacial. It took somewhere between 12 and 24 hours to run the experiment with a size of 500,000.

For what it's worth, I don't think the slowdown is directly related to the data representation, but rather to the change I had to make to the FsCheck-based data generator:

let songParams = gen {
    let maxId = 0b0111_1111_1111_1111_1111_1111_1111
    let! songId = Gen.choose (1, maxId)
    let! isVerified = ArbMap.generate ArbMap.defaults
    let! rating = Gen.choose (0, 10) |> Gen.map byte
    return songId, isVerified, rating }
 
[<CompiledName "Song">]
let song = gen {
    let! (id, isVerifiedArtist, rating) = songParams
    return Song (id, isVerifiedArtist, rating) }

I can't explain why the bitmasked representation requires more memory, but I'm increasingly having a nagging feeling that I've made a mistake somewhere. If you can spot a mistake, please let me know by leaving a comment.
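One thing that can be checked in isolation is the inline size of a single Song value in each struct representation. Here's a minimal sketch using Unsafe.SizeOf<T> from System.Runtime.CompilerServices. RecordStructSong and BitmaskedSong are renamed copies of the two struct variants above (so that both fit in one file), and SongSizes is a hypothetical helper. Keep in mind that this only measures one value as stored in an array element or a field; it says nothing about dictionary, list, or object-header overhead, so it doesn't by itself explain the measurements:

```csharp
using System.Runtime.CompilerServices;

// The record struct variant, renamed to avoid a name clash.
public record struct RecordStructSong(int Id, bool IsVerifiedArtist, byte Rating);

// A compact copy of the bitmasked variant: 27 bits of ID, 1 bit for the
// verified flag (the top bit), and 4 bits of rating (bits 27 to 30).
public readonly struct BitmaskedSong
{
    private const uint idMask = 0b0000_0111_1111_1111_1111_1111_1111_1111;
    private const uint isVerifiedArtistMask = 0b1000_0000_0000_0000_0000_0000_0000_0000;
    private const uint ratingMask = 0b0111_1000_0000_0000_0000_0000_0000_0000;
    private readonly uint bits;

    public BitmaskedSong(int id, bool isVerifiedArtist, byte rating) =>
        bits = ((uint)id & idMask)
             | (isVerifiedArtist ? isVerifiedArtistMask : 0u)
             | (((uint)rating << 27) & ratingMask);

    public int Id => (int)(bits & idMask);
    public bool IsVerifiedArtist => (bits & isVerifiedArtistMask) != 0;
    public byte Rating => (byte)((bits & ratingMask) >> 27);
}

public static class SongSizes
{
    // Unsafe.SizeOf<T> reports the size of a T value when stored inline.
    public static int RecordStruct => Unsafe.SizeOf<RecordStructSong>();
    public static int Bitmasked => Unsafe.SizeOf<BitmaskedSong>();
}
```

I'd expect the bitmasked struct to report 4 bytes and the record struct at least 6 (most likely 8, after padding). If that holds on your machine, the extra memory in the experiment would have to come from somewhere other than the size of each Song value.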

Other data representations #

I also considered whether it'd make sense to represent the entire data set as a huge matrix. One could, for example, let rows represent users, and columns songs, and let each element represent the number of times a user has listened to a particular song:

User	Song 1	Song 2	Song 3	...
123	0	0	4	...
456	2	0	4	...
...

Let's say that you may expect some users to listen to a song more than 255 times, but probably not more than 65,535 times. Thus, you could store each play count as a ushort. Still, you would need users x songs values, so if you have 100 million songs and 10 million users, that implies 2 PB of memory. That doesn't sound useful.
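Spelled out, with two bytes per ushort element:

$$
10^{8}\ \text{songs} \times 10^{7}\ \text{users} \times 2\,\mathrm{B} = 2 \times 10^{15}\,\mathrm{B} = 2\,\mathrm{PB}
$$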

On the other hand, most of those elements are going to be 0, so perhaps one could use an adjacency list instead. That is, however, essentially what an IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> is, so we're now back where we started.

Conclusion #

This article reports on some measurements I've made of memory requirements, assuming that we keep all scrobble data in memory. While I suspect that I've made a mistake, it still seems reasonable to conclude that the song recommendations scenario is on the edge of what's possible with the Impureim Sandwich pattern.

That's okay. I'm interested in understanding the limitations of solutions.

I do think, however, that it's worth noting just how massive the amounts of data have to be before the Impureim Sandwich pattern becomes untenable.

When I describe the pattern, the most common reaction is that it doesn't scale. And, as this article has explored, it's true that it doesn't scale. But it scales much, much better than you think. You may have billions of entities in your system, and they may still fit in a few gigabytes. Don't dismiss the Impureim Sandwich before you've made a real effort to understand the memory implications of it. Your intuition is likely to work against you.

I'll round off this part of the article series by showing how the Impureim Sandwich looks in other, more functional languages.

Next: Song recommendations as an F# Impureim Sandwich.


Song recommendations as a C# Impureim Sandwich

2025-05-05T06:23:00+00:00

A refactoring example.

This is an article in a larger series about functional programming design alternatives. I'm assuming that you've read the previous articles, but briefly told, I'm considering an example presented by Oleksii Holub in the article Pure-Impure Segregation Principle. The example gathers song recommendations for a user in a long-running process.

In the previous article I argued that while the memory requirements for this problem seem so vast that an Impureim Sandwich appears out of the question, it's nonetheless worthwhile to at least try it out. The refactoring isn't that difficult, and it turns out that it does simplify the code.

Enumeration API #

The data access API is a web service:

"I don't own the database, those are requests to an external API service (think Spotify API) that provides the data."

Tweet, Oleksii Holub, 2022

In order to read all data, we'll have to assume that there's a way to enumerate all songs and all users. With that assumption, I add the GetAllSongs and GetAllUsers methods to the SongService interface:

public interface SongService
{
    Task<IEnumerable<Song>> GetAllSongs();
    Task<IEnumerable<User>> GetAllUsers();
    Task<IReadOnlyCollection<User>> GetTopListenersAsync(int songId);
    Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(string userName);
}

It is, of course, a crucial assumption, and it's possible that no such API exists. On the other hand, a REST API could expose such functionality as a paged feed. Leafing through potentially hundreds (or thousands) of such pages is bound to take some time, so it's good to know that this is a background process. As I briefly mentioned in the previous article, we could imagine that we have a dedicated indexing server for this kind of purpose. While we may rightly expect the initial data load to take some time (hours, even), once it's in memory, we should be able to reuse it to calculate song recommendations for all users, instead of just one user.
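To make the paged-feed idea more concrete, here's a minimal sketch, assuming a hypothetical cursor-based API where each page carries a link (here, a cursor string) to the next one. Page<T>, PagedFeed, and ReadAllAsync are illustrative names, not part of the example code base, and fetchPage stands in for whatever HTTP call a real implementation would make:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// A hypothetical page shape; a real API would define its own.
public sealed record Page<T>(IReadOnlyList<T> Items, string? NextCursor);

public static class PagedFeed
{
    // Follows NextCursor links until the feed reports no further page,
    // accumulating every item. A null cursor requests the first page.
    public static async Task<IReadOnlyList<T>> ReadAllAsync<T>(
        Func<string?, Task<Page<T>>> fetchPage)
    {
        var all = new List<T>();
        string? cursor = null;
        do
        {
            var page = await fetchPage(cursor);
            all.AddRange(page.Items);
            cursor = page.NextCursor;
        } while (cursor is not null);
        return all;
    }
}
```

Against a real service, each iteration would be a network round trip, which is why reading all songs and users this way belongs in a background process.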

In the previous article I estimated that it should be possible to keep all songs in memory with less than a gigabyte. Users, without scrobbles, also take up surprisingly little space, to the degree that a million users fit in a few dozen megabytes. Even if, eventually, we may be concerned about memory, we don't have to be concerned about this part.

In any case, the addition of these two new methods doesn't break the existing example code, although I did have to implement the methods in the FakeSongService class that I introduced in the article Characterising song recommendations:

public Task<IEnumerable<Song>> GetAllSongs()
{
    return Task.FromResult<IEnumerable<Song>>(songs.Values);
}
 
public Task<IEnumerable<User>> GetAllUsers()
{
    return Task.FromResult(users.Select(kvp => new User(kvp.Key, kvp.Value.Values.Sum())));
}

With those additions, we can load all data as the first layer (phase, really) of the sandwich.

Front-loading the data #

Loading all the data is the responsibility of this DataCollector module:

public static class DataCollector
{
    public static async Task<IReadOnlyDictionary<int, IReadOnlyCollection<User>>>
        CollectAllTopListeners(this SongService songService)
    {
        var dict = new Dictionary<int, IReadOnlyCollection<User>>();
        foreach (var song in await songService.GetAllSongs())
        {
            var topListeners = await songService.GetTopListenersAsync(song.Id);
            dict.Add(song.Id, topListeners);
        }
        return dict;
    }
 
    public static async Task<IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>>>
        CollectAllTopScrobbles(this SongService songService)
    {
        var dict = new Dictionary<string, IReadOnlyCollection<Scrobble>>();
        foreach (var user in await songService.GetAllUsers())
        {
            var topScrobbles = await songService.GetTopScrobblesAsync(user.UserName);
            dict.Add(user.UserName, topScrobbles);
        }
        return dict;
    }
}

These two methods work with any SongService implementation, so while the code base will work with FakeSongService, real 'production code' might as well use an HTTP-based implementation that pages through the implied web API.

The dictionaries returned by the methods are likely to be huge. That's a major point of this exercise. Once the change is implemented and Characterisation Tests show that it still works, it makes sense to generate data to get a sense of the memory footprint.

Table-driven methods #

Perhaps you wonder why the above CollectAllTopListeners and CollectAllTopScrobbles methods return dictionaries of exactly that shape.

Code Complete describes a programming technique called table-driven methods. The idea is to replace branching instructions such as if, else, and switch with a lookup table. The overall point, however, is that you can replace function calls with table lookups.
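As a small illustration of the technique (my own hypothetical example, not one from Code Complete), here's a switch statement that maps a rating to a label, next to a table-driven equivalent where the branches become dictionary entries:

```csharp
using System.Collections.Generic;

public static class RatingLabels
{
    // Branching version: the mapping is encoded as instructions.
    public static string WithSwitch(byte rating)
    {
        switch (rating)
        {
            case 0: return "Unrated";
            case 1: return "Poor";
            case 2: return "Fair";
            case 3: return "Good";
            default: return "Excellent";
        }
    }

    // Table-driven version: the same mapping expressed as data.
    private static readonly IReadOnlyDictionary<byte, string> table =
        new Dictionary<byte, string>
        {
            [0] = "Unrated",
            [1] = "Poor",
            [2] = "Fair",
            [3] = "Good",
        };

    public static string WithTable(byte rating) =>
        table.TryGetValue(rating, out var label) ? label : "Excellent";
}
```

Both methods return the same label for every rating. The difference is that in the table-driven version the mapping is data, which is the property that lets a precomputed dictionary stand in for a function call.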

Consider the GetTopListenersAsync method. It takes an int as input, and returns a Task<IReadOnlyCollection<User>> as output. If you ignore the Task, that's an IReadOnlyCollection<User>. In other words, you can exchange an int for an IReadOnlyCollection<User>.

If you have an IReadOnlyDictionary<int, IReadOnlyCollection<User>> you can also exchange an int for an IReadOnlyCollection<User>. These two APIs are functionally equivalent - although, of course, they have very different memory and run-time profiles.

The same goes for the GetTopScrobblesAsync method: It takes a string as input and returns an IReadOnlyCollection<Scrobble> as output (if you ignore the Task). An IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> is equivalent.

To make it practical, it turns out that we also need a little helper method to deal with the case where the dictionary has no entry for a given key:

internal static IReadOnlyCollection<T> GetOrEmpty<T, TKey>(
    this IReadOnlyDictionary<TKey, IReadOnlyCollection<T>> dict,
    TKey key)
{
    if (dict.TryGetValue(key, out var result))
        return result;
    return Array.Empty<T>();
}

If there's no entry for a key, this function instead returns an empty array.

That should make it as easy as possible to replace calls to GetTopListenersAsync and GetTopScrobblesAsync with dictionary lookups.
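The equivalence also works in the other direction. Here's a minimal sketch that wraps a lookup table in a delegate with the same shape as GetTopListenersAsync (an int in, a Task<IReadOnlyCollection<User>> out). TableAsFunction and AsQuery are hypothetical names, and the User record is reconstructed from how the article uses it:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed record User(string UserName, int TotalScrobbleCount);

public static class TableAsFunction
{
    // Wraps a dictionary in a function. Missing keys map to an empty
    // collection, and each result is returned as an already-completed Task.
    public static Func<int, Task<IReadOnlyCollection<User>>> AsQuery(
        IReadOnlyDictionary<int, IReadOnlyCollection<User>> table) =>
        songId => Task.FromResult(
            table.TryGetValue(songId, out var users)
                ? users
                : Array.Empty<User>());
}
```

Since Task.FromResult produces a completed task, awaiting the result involves no I/O; the asynchrony is only there to match the original signature.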

Adding method parameters #

When refactoring, it's a good idea to proceed in small, controlled steps. You can see each of my micro-commits in the Git repository's refactor-to-function branch. Here, I'll give an overview.

First, I added two dictionaries as parameters to the GetRecommendationsAsync method. You may recall that the method used to look like this:

public async Task<IReadOnlyList<Song>> GetRecommendationsAsync(string userName)

After I added the two dictionaries, it looks like this:

public async Task<IReadOnlyList<Song>> GetRecommendationsAsync(
    string userName,
    IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> topScrobbles,
    IReadOnlyDictionary<int, IReadOnlyCollection<User>> topListeners)

At this point, the GetRecommendationsAsync method uses neither the topScrobbles nor the topListeners parameter. Still, I consider this a distinct checkpoint that I commit to Git. As I've outlined in my book Code That Fits in Your Head, it's safest to either refactor production code while keeping test code untouched, or refactor test code without editing the production code. An API change like the current one is an example of a situation where that separation is impossible. This is the reason I want to keep it small and isolated. While the change does touch both production code and test code, I'm not editing the behaviour of the System Under Test.

Tests now look like this:

[<Property>]
let ``No data`` () =
    Gen.userName |> Arb.fromGen |> Prop.forAll <| fun userName ->
    task {
        let srvc = FakeSongService ()
        let sut = RecommendationsProvider srvc
 
        let! topScrobbles = DataCollector.CollectAllTopScrobbles srvc
        let! topListeners = DataCollector.CollectAllTopListeners srvc
        let! actual =
            sut.GetRecommendationsAsync (userName, topScrobbles, topListeners)
 
        Assert.Empty actual } :> Task

The test now uses the DataCollector to front-load data into dictionaries that it then passes to GetRecommendationsAsync. Keep in mind that GetRecommendationsAsync doesn't yet use that data, but it's available to it once I make a change to that effect.

You may wish to compare this version of the No data test with the previous version shown in the article Characterising song recommendations.

Refactoring to function #

The code is now ready for refactoring from dependency injection to dependency rejection. It's even possible to do it one method call at a time, because the data in the FakeSongService is the same as the data in the two dictionaries. They're just two different representations of the same data.

Since, as described above, the dictionaries are equivalent to the SongService queries, each is easily replaced with the other. The first impure action in GetRecommendationsAsync, for example, is this one:

var scrobbles = await _songService.GetTopScrobblesAsync(userName);

The equivalent dictionary lookup enables us to change that line of code to this:

var scrobbles = topScrobbles.GetOrEmpty(userName);

Notice that the dictionary lookup is a pure function that the method need not await.

Even though the rest of GetRecommendationsAsync still queries the injected SongService, all tests pass, and I can commit this small, isolated change to Git.

Proceeding in a similar fashion enables us to eliminate the SongService queries one by one. There are only three method calls, so this can be done in three controlled steps. Once the last impure query has been replaced, the C# compiler complains about the async keyword in the declaration of the GetRecommendationsAsync method.

Not only is the async keyword no longer required, the method is no longer asynchronous. There's no reason to return a Task, and the Async method name suffix is also misleading.

public IReadOnlyList<Song> GetRecommendations(
    string userName,
    IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> topScrobbles,
    IReadOnlyDictionary<int, IReadOnlyCollection<User>> topListeners)

The GetRecommendations method no longer uses the injected SongService, and since it's the only method of the RecommendationsProvider class, we can now (r)eject the dependency.

This furthermore means that the class no longer has any class fields; we might as well make it (and the GetRecommendations function) static. Here's the final function in its entirety:

public static IReadOnlyList<Song> GetRecommendations(
    string userName,
    IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> topScrobbles,
    IReadOnlyDictionary<int, IReadOnlyCollection<User>> topListeners)
{
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    var scrobbles = topScrobbles.GetOrEmpty(userName);
    var scrobblesSnapshot = scrobbles
        .OrderByDescending(s => s.ScrobbleCount)
        .Take(100)
        .ToArray();
 
    var recommendationCandidates = new List<Song>();
    foreach (var scrobble in scrobblesSnapshot)
    {
        var otherListeners = topListeners.GetOrEmpty(scrobble.Song.Id);
        var otherListenersSnapshot = otherListeners
            .Where(u => u.TotalScrobbleCount >= 10_000)
            .OrderByDescending(u => u.TotalScrobbleCount)
            .Take(20)
            .ToArray();
 
        foreach (var otherListener in otherListenersSnapshot)
        {
            var otherScrobbles =
                topScrobbles.GetOrEmpty(otherListener.UserName);
            var otherScrobblesSnapshot = otherScrobbles
                .Where(s => s.Song.IsVerifiedArtist)
                .OrderByDescending(s => s.Song.Rating)
                .Take(10)
                .ToArray();
 
            recommendationCandidates.AddRange(
                otherScrobblesSnapshot.Select(s => s.Song)
            );
        }
    }
 
    var recommendations = recommendationCandidates
        .OrderByDescending(s => s.Rating)
        .Take(200)
        .ToArray();
 
    return recommendations;
}

The overall structure is similar to the original version. Now that the code is simpler (because there's no longer any asynchrony) you could keep refactoring. With this C# code example, I'm going to stop here, but when I port it to F# I'm going to refactor more aggressively.
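For comparison, the same pure function might be sketched in Haskell roughly as follows. It's a hypothetical translation, not code from the accompanying repository: it assumes records shaped like the C# ones and uses Data.Map in place of the two dictionaries. A list comprehension takes the place of the nested foreach loops and the mutable list.

```haskell
import Data.List (sortOn)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Ord (Down (..))
import Data.Word (Word8)

data Song = Song { songId :: Int, songHasVerifiedArtist :: Bool, songRating :: Word8 }
  deriving (Show, Eq)

data Scrobble = Scrobble { scrobbledSong :: Song, scrobbleCount :: Int }
  deriving (Show, Eq)

data User = User { userName :: String, userScrobbleCount :: Int }
  deriving (Show, Eq)

-- A pure sketch mirroring the C# GetRecommendations: all data arrives as
-- arguments, so no IO is involved.
getRecommendations
  :: String                -- the user to recommend songs to
  -> Map String [Scrobble] -- top scrobbles, keyed on user name
  -> Map Int [User]        -- top listeners, keyed on song ID
  -> [Song]
getRecommendations un topScrobbles topListeners =
  take 200 $ sortOn (Down . songRating) candidates
  where
    -- 1. The user's own top scrobbles
    scrobblesSnapshot =
      take 100 $ sortOn (Down . scrobbleCount) $
        Map.findWithDefault [] un topScrobbles
    -- 2.-4. Other listeners of those songs, and their top-rated verified songs
    candidates =
      [ scrobbledSong otherScrobble
      | s <- scrobblesSnapshot
      , otherListener <- topOtherListeners (songId (scrobbledSong s))
      , otherScrobble <- topOtherScrobbles (userName otherListener)
      ]
    topOtherListeners sid =
      take 20 $ sortOn (Down . userScrobbleCount) $
        filter ((10000 <=) . userScrobbleCount) $
        Map.findWithDefault [] sid topListeners
    topOtherScrobbles other =
      take 10 $ sortOn (Down . songRating . scrobbledSong) $
        filter (songHasVerifiedArtist . scrobbledSong) $
        Map.findWithDefault [] other topScrobbles
```

Since the comprehension visits candidates in the same order as the C# loops append them, and sortOn is a stable sort, the sketch should order ties the same way as the C# version.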

Sandwich #

One point of the whole exercise is to demonstrate how to refactor to an Impureim Sandwich. The GetRecommendations method shown above constitutes the pure filling of the sandwich, but what does the entire sandwich look like?

In this code base, the sandwiches only exist as unit tests, the simplest of which is still the No data test:

[<Property>]
let ``No data`` () =
    Gen.userName |> Arb.fromGen |> Prop.forAll <| fun user ->
    task {
        let srvc = FakeSongService ()
 
        let! topScrobbles = DataCollector.CollectAllTopScrobbles srvc
        let! topListeners = DataCollector.CollectAllTopListeners srvc
        let actual = RecommendationsProvider.GetRecommendations (user, topScrobbles, topListeners)
 
        Assert.Empty actual } :> Task

In the above code snippet, I've coloured in the relevant part of the test. I admit that it's a stretch to colour the last line red, since Assert.Empty is, at least, deterministic. One could argue that since it throws an exception on failure, it's not strictly free of side effects, but that's really a weak argument. It would be easy to refactor assertions to pure functions.

Instead, you may consider the bottom layer of the sandwich as a placeholder where something impure might happen. The background service that updates the song recommendations may, for example, save the result as a (CQRS-style) materialised view.

The above test snippet, then, is more of a sketch of how the Impureim Sandwich may look: First, front-load data using the DataCollector methods; second, call GetRecommendations; third, do something with the result.
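Stripped of the specifics, the shape of such a sandwich can be sketched in Haskell as a tiny combinator. The name impureimSandwich and its parameters are illustrative assumptions, not part of the code base.

```haskell
-- A generic Impureim Sandwich: an impure action that front-loads data,
-- a pure function in the middle, and an impure action that handles the result.
impureimSandwich :: IO a -> (a -> b) -> (b -> IO c) -> IO c
impureimSandwich load pureStep handle = do
  input <- load                -- impure: e.g. collect top scrobbles and listeners
  let output = pureStep input  -- pure: e.g. compute recommendations
  handle output                -- impure: e.g. save a materialised view
```

The same thing could be written more tersely as fmap pureStep load >>= handle; the do notation just makes the three layers explicit.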

Conclusion #

The changes demonstrated in this article serve two purposes. One is to show how to refactor an impure action to a pure function, pursuing the notion of an Impureim Sandwich. The second is to evaluate a proof-of-concept: If we do, indeed, front-load all of the data, is it realistic that all data fits in RAM?

We have yet to address that question, but since the present article is already substantial, I'll address that in a separate article.

Next: Song recommendations proof-of-concept memory measurements.

This blog is totally free, but if you like it, please consider supporting it.

Song recommendations as an Impureim Sandwich

2025-04-28T07:16:00+00:00

Does your data fit in RAM?

This article is part of a series on functional programming design alternatives. In a previous article you saw how to add enough Characterisation Tests to capture the intended behaviour of the example song recommendations system originally presented by Oleksii Holub in the article Pure-Impure Segregation Principle.

Problem statement #

After showing how one problem can be refactored to pure functions, Oleksii Holub writes:

"Although very useful, the type of "lossless" refactoring shown earlier only works if the data required by the function can be easily encapsulated within its input parameters. Unfortunately, this is not always the case.

"Often a function may need to dynamically resolve data from an external API or a database, with no way of knowing about it beforehand. This typically results in an implementation where pure and impure concerns are interleaved with each other, creating a tightly coupled cohesive structure."

Oleksii Holub, Pure-Impure Segregation Principle

The article then proceeds to present the song recommendations example. It's a single C# method that queries a data store or third-party service to recommend songs. I'm imagining that it queries a third-party web service that contains usage data for a system like Spotify.

"The above algorithm works by retrieving the user's most listened songs, finding other people who have also listened to the same titles, and then extracting their top songs as well. Those songs are then aggregated into a list of recommendations and returned to the caller.

"It's quite clear that this function would benefit greatly from being pure, seeing how much business logic is encapsulated within it. Unfortunately, the technique we relied upon earlier won't work here.

"In order to fully isolate GetRecommendationsAsync(...) from its impure dependencies, we would have to somehow supply the function with an entire list of songs, users, and their scrobbles upfront. If we assume that we're dealing with data on millions of users, it's obvious that this would be completely impractical and likely even impossible."

Oleksii Holub, Pure-Impure Segregation Principle

It does, indeed, sound impractical.

Data sizes #

Can you, however, trust your intuition? Research suggests that the human brain is ill-equipped to think about randomness and probabilities, and I've observed something similar when it comes to data sizes.

In the real world, a million of anything countable is an almost incomprehensible amount, so it's no wonder if our intuition fails us. A million records sounds like a lot, but if it's only a few integers, is it really that bad?

Many systems use 32-bit integers for various IDs. A million IDs, then, is 32 million bits, or approximately 4 MB. As I'm writing this, the smallest Azure instance (Free F1) has 1 GB of memory, and while the OS takes a big bite out of that, 4 MB is nothing.
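The arithmetic is easy to check, for example in GHCi. The helper below is only an illustration, and it uses decimal megabytes (10^6 bytes).

```haskell
-- Bytes needed to store n values of the given bit width.
bytesFor :: Integer -> Integer -> Integer
bytesFor bitsPerValue n = bitsPerValue * n `div` 8

-- A million 32-bit IDs: 32,000,000 bits, which is 4,000,000 bytes, or 4 MB.
millionIdsInMB :: Double
millionIdsInMB = fromIntegral (bytesFor 32 1000000) / 1e6
```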

The song recommendations problem implies larger memory pressure. It may not fit on every machine, but it's worth considering whether, after all, it might fit in RAM.

My real-life experience with developing streaming services #

It just so happens that I have professional experience developing REST APIs for a white-label audio streaming service. Back in the early 2010s I helped design and implement the company's online music catalogue, user system, and a few other services. The catalogue is particularly interesting in this regard, since it only changed nightly, and we were planning on relying on HTTP for caching.

I vividly recall a meeting we had with the IT operations specialist responsible for the new catalogue service. We explained that we'd set HTTP cache timeouts to 6 hours, and asked if he'd be able to set up a reverse proxy so that we didn't have to implement caching in our code base.

He asked how much cache space we needed.

We knew the size of a typical HTTP response, and the number of tracks, artists, and albums in the system, so after a back-of-the-envelope calculation, we told him: 18 GB.

He just shrugged and said "OK".

In 2012 I thought that 18 GB was a fair amount of data (I actually still think that). Even so, the operations team had plenty of servers with that capacity.

Later, I did more work for that company, but most of it is less relevant to the song recommendations example. What does turn out to be relevant to the topic is something I learned during the first few days of my engagement.

Early on, before I was involved, the company needed a recommendation system, but hadn't been able to find any off-the-shelf component. This was in the early 2000s and before Solr, but after Lucene. I'm not aware of all the forces that affected my then future client, but in the end, they decided to write their own search and recommendations engine.

Essentially, during the night a beefy server would read all relevant data from the database, crunch it, create data structures, and keep all data in memory. Like the reverse proxy, it required a server with more RAM than a normal pizza box, but not prohibitively so.

Costs #

Consider the cost of hardware, compared to developer time. A few specialised servers may set your organisation back a few thousand dollars/pounds/euros. That's an amount you can easily burn through in salary if the code is too complicated, or has too many bugs.

You may argue that if you already have programmers on staff, they don't cost extra, but a too-complicated code base is still going to slow them down. Thus, the wrong software design could incur an opportunity cost greater than the cost of a server.

One of many reasons I'm interested in functional programming (FP) is its potential to make code bases simpler. The Impureim Sandwich is a wonderfully simple design, so it's worth pursuing; not only for some FP ideal, but because of its simplifying potential.

Intuition may tell us that the song recommendations scenario is prohibitively big, and therefore, an Impureim Sandwich is out of the question. As this overall article series explores, it's not the only alternative, but given its advantages, it's worth giving it a second chance.

Analysis #

The GetRecommendationsAsync method from the example makes a lot of external calls, with its nested loops. The method uses the first call to GetTopScrobblesAsync to produce the scrobblesSnapshot variable, which is capped at 100 objects. If we assume that this method call returns at least 100 objects, the outer foreach loop will make 100 calls to GetTopListenersAsync.

If we again assume that each of these returns enough data, the inner foreach loop will make 20 calls to GetTopScrobblesAsync for each object in the outer loop. That's 2,000 external calls, plus the 100 calls in the outer loop, plus the initial call to GetTopScrobblesAsync, for a total of 2,101.
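The count follows directly from the two Take caps. Here's a small sketch of that calculation; externalCalls is a hypothetical helper, and it assumes, as the text does, that every query returns at least as many items as each cap allows (otherwise it's an upper bound).

```haskell
-- External calls made by GetRecommendationsAsync, assuming every query
-- returns enough data to fill each Take cap.
externalCalls :: Int -> Int -> Int
externalCalls topScrobbleCap topListenerCap =
  1                                  -- the initial GetTopScrobblesAsync
    + topScrobbleCap                 -- one GetTopListenersAsync per scrobble
    + topScrobbleCap * topListenerCap  -- one GetTopScrobblesAsync per listener
```

With the caps from the example, externalCalls 100 20 evaluates to 2101.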

When I first saw the example, I didn't know much about the overall context. I didn't know if these impure actions were database queries or web service calls, so I asked Oleksii Holub.

It turns out that it's all web service calls, and as I interpret the response, GetRecommendationsAsync is being invoked from a background maintenance process.

"It takes around 10 min in total while maintaining it."

Tweet, Oleksii Holub, 2022

That's good to know, because if we're going to consider an Impureim Sandwich, it implies reading gigabytes of data in the first phase. That's going to take some time, but if this is a background process, we do have time.

Memory estimates #

One thing is to load an entire song catalogue into memory. That's what required 18 GB in 2012. Another thing is to load all scrobbles; i.e. statistics about plays. Fortunately, in order to produce song recommendations, we only need IDs. Consider again the data structures from the previous article:

public sealed record Song(int Id, bool IsVerifiedArtist, byte Rating);

public sealed record Scrobble(Song Song, int ScrobbleCount);

public sealed record User(string UserName, int TotalScrobbleCount);

Apart from UserName, all values are small, predictable values: int, byte, and bool, and while a string may be arbitrarily long, we can make a guess at the average size of a user name. In the previous article, I assumed that the user name would be an alphanumeric string between one and twenty characters.

How many songs might a system contain? Some numbers thrown around for a system like Spotify suggest a number on the order of 100 million. With an int, a bool, and a byte, we can estimate that a song requires 6 bytes, plus some overhead. Let's guess 8 bytes. A hundred million songs would then require 800 million bytes, or around 800 MB. That eliminates the smallest cloud instances, but is in itself easily within reach for all modern computers. Your phone has more memory than that.

How about scrobbles? While I don't use Spotify, I do scrobble plays to Last.fm. At the moment I have around 114,000 scrobbles, and while I don't listen to music as much as I used to when I was younger, I have, on the other hand, been at it for a long time: Since 2007. If we assume that each user has 200,000 scrobbles, and a scrobble requires 8 bytes, that's 1,600,000 bytes, or 1.6 MB. Practically nothing.

The size of a User object depends on how long the user name is, but will probably, on average, be less than 32 bytes. Compared to the user's scrobbles, we can ignore the memory pressure of the user object itself.

As the number of users grows, it will dominate the memory requirements for the catalogue. How many users should we assume?

A million is probably too few, but for a frame of reference, that would require 1.6 TB. This is where it starts to sound unfeasible to keep all data in RAM. Even though servers with that much RAM exist, they're so expensive (still) that the above cost consideration no longer applies.
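These back-of-the-envelope figures can be reproduced with a few lines of Haskell. The per-item sizes are the guesses from the text (8 bytes per song and per scrobble), not measurements, and all results are in bytes using decimal units.

```haskell
-- Guessed sizes from the text, in bytes.
bytesPerSong, bytesPerScrobble :: Integer
bytesPerSong = 8      -- int + bool + byte, rounded up for overhead
bytesPerScrobble = 8  -- assumed, as in the text

-- 100 million songs: 800,000,000 bytes, about 800 MB.
songCatalogue :: Integer
songCatalogue = 100000000 * bytesPerSong

-- 200,000 scrobbles for one user: 1,600,000 bytes, about 1.6 MB.
perUserScrobbles :: Integer
perUserScrobbles = 200000 * bytesPerScrobble

-- Total for a given number of users; a million users gives about 1.6 TB.
allUsers :: Integer -> Integer
allUsers userCount = userCount * perUserScrobbles
```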

Still, there are some naive assumptions above. Instead of storing each scrobble in a separate Scrobble object, you could store repeated plays as a single object with the appropriate ScrobbleCount value. If you've listened to the same song 50 times, it doesn't require 400 bytes of storage, but only 8 bytes. That is, after all, orders of magnitude less.
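Collapsing repeated plays into counts is essentially a one-liner with Data.Map. This is a hedged sketch: it assumes that plays arrive as a list of song IDs, and, as the text assumes, that each stored entry takes 8 bytes. Both compactPlays and storageBytes are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- Collapse a raw play log (one entry per play, identified by song ID)
-- into one (song ID, play count) pair per distinct song.
compactPlays :: [Int] -> [(Int, Int)]
compactPlays plays = Map.toList (Map.fromListWith (+) [(sid, 1) | sid <- plays])

-- Assumed storage cost: 8 bytes per stored entry.
storageBytes :: [a] -> Int
storageBytes entries = 8 * length entries
```

Fifty plays of the same song take 400 bytes as a raw log, but compactPlays turns them into a single pair that takes 8 bytes.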

In the end, back-of-the-envelope calculations are fine, but measurements are better. It might be worthwhile to develop a proof of concept and measure how much memory it requires.

In three articles, I'll explore how a song recommendations Impureim Sandwich looks in various constellations:

Song recommendations as a C# Impureim Sandwich
Song recommendations as an F# Impureim Sandwich
Song recommendations as a Haskell Impureim Sandwich

In the end, it may turn out that for this particular system, an Impureim Sandwich truly is unfeasible. Keep in mind, though, that the purpose of this article series is to demonstrate alternative designs. The song recommendations problem is just a placeholder example. Perhaps you have another system where, intuitively, an Impureim Sandwich sounds impossible, but once you run the numbers, it might actually be not only possible, but desirable.

Conclusion #

Modern computers have such vast storage capacities that intuition often fails us. We may think that billions of data points sound like something that can't possibly fit in RAM. When you run the numbers, however, it may turn out that the required data fits on a normal server.

If so, an Impureim Sandwich may still be an option. Load data into memory, pass it as an argument to a pure function, and handle the return value.

The song recommendations scenario is interesting because an Impureim Sandwich seems to be pushing the envelope. It probably is impractical, but still worth a proof of concept. On the other hand, if it's impractical, it's worthwhile to also explore alternatives. Later articles will do that, but first, if you're interested, the next articles look at the proofs of concept in three languages.

Next: Song recommendations as a C# Impureim Sandwich.


Porting song recommendations to Haskell

2025-04-21T10:19:00+00:00

An F# code base translated to Haskell.

This article is part of a larger article series that examines variations of how to take on a non-trivial problem using functional architecture. In a previous article we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like F# or Haskell. In this article, you'll see how to port the baseline to Haskell. To be honest, I first ported the C# code to F#, and then used the F# code as a guide to implement equivalent Haskell code.

If you're following along in the Git repositories, this is a repository separate from the .NET repositories. The code shown here is from its master branch.

If you don't care about Haskell, you can always go back to the table of contents in the 'root' article and proceed to the next topic that interests you.

Data structures #

When working with statically typed functional languages like Haskell, it often makes most sense to start by declaring data structures.

data User = User
  { userName :: String
  , userScrobbleCount :: Int }
  deriving (Show, Eq)

This is much like an F# or C# record declaration, and this one echoes the corresponding types in F# and C#. The most significant difference is that here, a user's total count of scrobbles is called userScrobbleCount rather than TotalScrobbleCount. The motivation behind that variation is that Haskell data 'getters' are actually top-level functions, so it's usually a good idea to prefix them with the name of the data structure they work on. Since the data structure is called User, both 'getter' functions get the user prefix.
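To illustrate the point about 'getters', here's a small, self-contained example. The User type mirrors the one above, while namesByScrobbleCount is a hypothetical helper, not part of the code base.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

data User = User
  { userName :: String
  , userScrobbleCount :: Int }
  deriving (Show, Eq)

-- Each field name is also an ordinary top-level function, e.g.
-- userName :: User -> String, so it can be passed to map, sortOn, and so on.
-- Because these functions share one namespace, a prefix such as 'user'
-- helps avoid clashes with fields of other types.
namesByScrobbleCount :: [User] -> [String]
namesByScrobbleCount = map userName . sortOn (Down . userScrobbleCount)
```

For example, namesByScrobbleCount [User "ana" 10, User "cat" 20] returns ["cat", "ana"].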

I found userTotalScrobbleCount a bit too verbose for my taste, so I dropped the Total part. Whether or not that's appropriate remains to be seen. Naming in programming is always hard, and there's a risk that you don't get it right the first time around. Unless you're publishing a reusable library, however, the option to rename it later remains.

The other two data structures are quite similar:

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)
 
data Scrobble = Scrobble
  { scrobbledSong :: Song
  , scrobbleCount :: Int }
  deriving (Show, Eq)

I thought that scrobbledSong was more descriptive than scrobbleSong, so I allowed myself that little deviation from the idiomatic naming convention. It didn't cause any problems, but I'm still not sure if that was a good decision.

How does one translate a C# interface to Haskell? Although type classes aren't quite the same as C# or Java interfaces, this language feature is close enough that I can use it in that role. I don't consider such a type class idiomatic in Haskell, but as a counterpart to the C# interface, it works well enough.

class SongService a where
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

Any instance of the SongService class supports queries for top listeners of a particular song, as well as for top scrobbles for a user.
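As a sketch of how client code might depend on the type class, much as C# code depends on an interface, consider the following. The data types and class repeat the declarations above so that the block compiles on its own; topListenerNames and the SingleListener instance are hypothetical illustrations.

```haskell
import Data.Word (Word8)

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)

data Scrobble = Scrobble { scrobbledSong :: Song, scrobbleCount :: Int }
  deriving (Show, Eq)

data User = User { userName :: String, userScrobbleCount :: Int }
  deriving (Show, Eq)

class SongService a where
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

-- Hypothetical client code: it works with any SongService instance.
topListenerNames :: SongService a => a -> Int -> IO [String]
topListenerNames srvc sid = map userName <$> getTopListeners srvc sid

-- A hypothetical, trivial instance that reports one listener for every song.
newtype SingleListener = SingleListener String

instance SongService SingleListener where
  getTopListeners (SingleListener name) _ = pure [User name 1]
  getTopScrobbles _ _ = pure []
```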

To reiterate, I don't intend to keep this type class around if I can help it, but for didactic reasons, it'll remain in some of the future refactorings, so that you can contrast and compare the Haskell code to its C# and F# peers.

Test Double #

To support tests, I needed a Test Double, so I defined the following Fake service, which is nothing but a deterministic in-memory instance. The type itself is just a wrapper of two maps.

data FakeSongService = FakeSongService
  { fakeSongs :: Map Int Song
  , fakeUsers :: Map String (Map Int Int) }
  deriving (Show, Eq)

Like the equivalent C# class, fakeSongs is a map from song ID to Song, while fakeUsers is a bit more complex. It's a map keyed on user name, but the value is another map. The keys of that inner map are song IDs, while the values are the number of times each song was scrobbled by that user.

The FakeSongService data structure is a SongService instance by explicit implementation:

instance SongService FakeSongService where
  getTopListeners srvc sid = do
    return $
      uncurry User <$>
      Map.toList (sum <$> Map.filter (Map.member sid) (fakeUsers srvc))
  getTopScrobbles srvc userName = do
    return $
      fmap (\(sid, c) -> Scrobble (fakeSongs srvc ! sid) c) $
      Map.toList $
      Map.findWithDefault Map.empty userName (fakeUsers srvc)

In order to find all the top listeners of a song, it finds all the fakeUsers who have the song ID (sid) in their inner map, sums each of those users' scrobble counts, and creates User values from that data.

To find the top scrobbles of a user, the instance finds the user in the fakeUsers map, looks up each of that user's scrobbled songs in fakeSongs, and creates Scrobble values from that information.

Finally, test code needs a way to add data to a FakeSongService value, which this test-specific helper function accomplishes:

scrobble userName s c (FakeSongService ss us) =
  let sid = songId s
      ss' = Map.insertWith (\_ _ -> s) sid s ss
      us' = Map.insertWith (Map.unionWith (+)) userName (Map.singleton sid c) us
  in FakeSongService ss' us'

Given a user name, a song, a scrobble count, and a FakeSongService, this function returns a new FakeSongService value with the new data added to the data already there.
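To see how the helper accumulates data, here's a self-contained sketch that repeats the relevant declarations. The type signature on scrobble and the definition of emptyService (a Fake with two empty maps) are assumptions; the article uses emptyService without showing it.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Word (Word8)

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)

data FakeSongService = FakeSongService
  { fakeSongs :: Map Int Song
  , fakeUsers :: Map String (Map Int Int) }
  deriving (Show, Eq)

-- Assumed definition: a Fake with no data at all.
emptyService :: FakeSongService
emptyService = FakeSongService Map.empty Map.empty

-- The helper from above, with a type signature added. Scrobbling the same
-- song more than once for a user sums the counts via Map.unionWith (+).
scrobble :: String -> Song -> Int -> FakeSongService -> FakeSongService
scrobble userName s c (FakeSongService ss us) =
  let sid = songId s
      ss' = Map.insertWith (\_ _ -> s) sid s ss
      us' = Map.insertWith (Map.unionWith (+)) userName (Map.singleton sid c) us
  in FakeSongService ss' us'
```

Scrobbling song 1 seven times and then three times for "ana" leaves a single inner entry for that song, with a count of 10.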

QuickCheck Arbitraries #

In the F# test code I used FsCheck to get good coverage of the code. For Haskell, I'll use QuickCheck.

Porting the ideas from the F# tests, I define a QuickCheck generator for user names:

alphaNum :: Gen Char
alphaNum = elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'])
 
userName :: Gen String
userName = do
  len <- choose (1, 19)
  first <- elements $ ['a'..'z'] ++ ['A'..'Z']
  rest <- vectorOf len alphaNum
  return $ first : rest

It's not that the algorithm only works if usernames are alphanumeric strings that start with a letter and are no longer than twenty characters, but whenever a property is falsified, I'd rather look at a user name like "Yvj0D1I" or "tyD9P1eOqwMMa1Q6u" (which are already bad enough), than something with line breaks and unprintable characters.

Working with QuickCheck, it's often useful to wrap types from the System Under Test in test-specific Arbitrary wrappers:

newtype ValidUserName = ValidUserName { getUserName :: String } deriving (Show, Eq)
 
instance Arbitrary ValidUserName where
  arbitrary = ValidUserName <$> userName

I also defined a (simpler) Arbitrary instance for Song called AnySong.

A few properties #

With FakeSongService in place, I proceeded to add the test code, starting from the top of the F# test code, and translating each as faithfully as possible. The first one is an Ice Breaker Test that only verifies that the System Under Test exists and doesn't crash when called.

testProperty "No data" $ \ (ValidUserName un) -> ioProperty $ do
  actual <- getRecommendations emptyService un
  return $ null actual

As I've done since at least 2019, it seems, I've inlined test cases as anonymous functions; this time as QuickCheck properties. This one just creates a FakeSongService that contains no data, and asks for recommendations. The expected result is that actual is empty (null), since there's nothing to recommend.

A slightly more involved property adds some data to the service before requesting recommendations:

testProperty "One user, some songs" $ \
  (ValidUserName user)
  (fmap getSong -> songs)
  -> monadicIO $ do
  scrobbleCounts <- pick $ vectorOf (length songs) $ choose (1, 100)
  let scrobbles = zip songs scrobbleCounts
  let srvc = foldr (uncurry (scrobble user)) emptyService scrobbles
 
  actual <- run $ getRecommendations srvc user
 
  assertWith (null actual) "Should be empty"

A couple of things are worthy of note. First, the property uses a view pattern to project a list of songs from a list of Arbitraries, where getSong is the 'getter' that belongs to the AnySong newtype wrapper.

I find view patterns quite useful as a declarative way to 'promote' a single Arbitrary instance to a list. In a third property, I take it a step further:

(fmap getUserName -> NonEmpty users)

This not only turns the singular ValidUserName wrapper into a list, but by projecting it into NonEmpty, the test declares that users is a non-empty list. QuickCheck picks all that up and generates values accordingly.
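If view patterns are new to you, here's a minimal example outside of QuickCheck. A view pattern (f -> p) applies f to the argument and matches the result against the pattern p. The Wrapped newtype is a hypothetical stand-in for a wrapper like AnySong, and the cons pattern plays a role similar to NonEmpty.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- A hypothetical newtype wrapper, similar in spirit to AnySong.
newtype Wrapped = Wrapped { getWrapped :: Int } deriving Show

-- The view pattern unwraps every element before the body sees the list,
-- so the body works on plain Ints.
total :: [Wrapped] -> Int
total (fmap getWrapped -> xs) = sum xs

-- The projected list can also be matched against a further pattern;
-- here the first clause only matches if it's non-empty.
firstOrZero :: [Wrapped] -> Int
firstOrZero (fmap getWrapped -> (x : _)) = x
firstOrZero _ = 0
```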

If you're interested in seeing this more advanced view pattern in context, you may consult the Git repository.

Secondly, the "One user, some songs" test runs in monadicIO, which I didn't know existed before I wrote these tests. Together with pick, run, and assertWith, monadicIO is defined in Test.QuickCheck.Monadic. It enables you to write properties that run in IO, which these properties need to do, because getRecommendations is IO-bound.

There's one more QuickCheck property in the code base, but it mostly repeats techniques already shown here. See the Git repository for all the details, if necessary.

Examples #

In addition to the properties, I also ported the F# examples; that is, 'normal' unit tests. Here's one of them:

"One verified recommendation" ~: do
  let srvc =
        scrobble "ana" (Song 2 True 5) 9_9990 $
        scrobble "ana" (Song 1 False 5) 10 $
        scrobble "cat" (Song 1 False 6) 10 emptyService
 
  actual <- getRecommendations srvc "cat"
 
  [Song 2 True 5] @=? actual

This one is straightforward, but as I already discussed when characterizing the original code, some of the examples essentially document quirks in the implementation. Here's the relevant test, translated to Haskell:

"Only top-rated songs" ~: do
  -- Scale ratings to keep them less than or equal to 10.
  let srvc =
        foldr (\i -> scrobble "hyle" (Song i True (toEnum i `div` 2)) 500) emptyService [1..20]
 
  actual <- getRecommendations srvc "hyle"
 
  assertBool "Should not be empty" (not $ null actual)
  -- Since there's only one user, but with 20 songs, the implementation
  -- loops over the same user 20 times. Each time, it picks that user's ten
  -- top-rated songs, all rated 5 or higher, so 200 songs in total (with
  -- duplicates). Note that this is a Characterization Test, so not
  -- necessarily reflective of how a real recommendation system should work.
  assertBool "Should have 5+ rating" (all ((>= 5) . songRating) actual)

This test creates twenty scrobbles for one user: One with a zero rating, two with rating 1, two with rating 2, and so on, up to a single song with rating 10.

The implementation of GetRecommendationsAsync uses these twenty songs to find 'other users' who have these top songs as well. In this case, there's only one user, so for each of those twenty songs, you get that same user's ten top-rated songs, for a total of 200.

There are more unit tests than these. You can see them in the Git repository.

Implementation #

The most direct translation of the C# and F# 'reference implementation' that I could think of was this:

getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles srvc un
 
  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles
 
  recommendationCandidates <- newIORef []
  forM_ scrobblesSnapshot $ \scrobble -> do
    -- Impure
    otherListeners <- getTopListeners srvc $ songId $ scrobbledSong scrobble
 
    -- Pure
    let otherListenersSnapshot =
          take 20 $
          sortOn (Down . userScrobbleCount) $
          filter ((10_000 <=) . userScrobbleCount) otherListeners
 
    forM_ otherListenersSnapshot $ \otherListener -> do
      -- Impure
      otherScrobbles <- getTopScrobbles srvc $ userName otherListener
 
      -- Pure
      let otherScrobblesSnapshot =
            take 10 $
            sortOn (Down . songRating . scrobbledSong) $
            filter (songHasVerifiedArtist . scrobbledSong) otherScrobbles
 
      forM_ otherScrobblesSnapshot $ \otherScrobble -> do
        let song = scrobbledSong otherScrobble
        modifyIORef recommendationCandidates (song :)
 
  recommendations <- readIORef recommendationCandidates
  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations

In order to mirror the original implementation as closely as possible, I declare recommendationCandidates as an IORef so that I can incrementally add to it as the action goes through its nested loops. Notice the modifyIORef towards the end of the code listing, which adds a single song to the list.

Once all the looping is done, the action uses readIORef to pull the recommendations out of the IORef.
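To see the IORef technique in isolation, here's a reduced sketch that works on plain Ints rather than songs. The names collectWithIORef and collectPurely are hypothetical. Because modifyIORef prepends with (:), the accumulated list ends up in reverse order of discovery, whereas a pure concat, like appending with the C# AddRange, keeps the original order.

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Accumulating with an IORef, as in the direct port: each element is
-- prepended, so the result is in reverse order of discovery.
collectWithIORef :: [[Int]] -> IO [Int]
collectWithIORef groups = do
  ref <- newIORef []
  forM_ groups $ \grp ->
    forM_ grp $ \x ->
      modifyIORef ref (x :)
  readIORef ref

-- A pure alternative: no IO, and the order of discovery is preserved.
collectPurely :: [[Int]] -> [Int]
collectPurely = concat
```

Given [[1, 2], [3]], collectWithIORef returns [3, 2, 1] while collectPurely returns [1, 2, 3]. Since the port sorts the candidates afterwards, this mostly doesn't matter, although songs with equal ratings may end up in a different relative order.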

As you can see, I also ported the comments from the original C# code.

I don't consider this idiomatic Haskell code, but the goal in this article was to mirror the C# code as closely as possible. Once I start refactoring, you'll see some more idiomatic implementations.

Conclusion #

Together with the previous two articles in this article series, this establishes a baseline from which I can refactor the code. While we might consider the original C# code idiomatic, this port to Haskell isn't. It is, on the other hand, similar enough to both its C# and F# peers that we can compare and contrast all three.

Two design choices in particular make this Haskell implementation less than idiomatic. One is the use of IORef to update a list of songs. The other is using a type class to model an external dependency.

As I cover various alternative architectures in this article series, you'll see how to get rid of both.

Next: Song recommendations as an Impureim Sandwich.


Porting song recommendations to F#

2025-04-14T08:54:00+00:00

A C# code base translated to F#.

This article is part of a larger article series that examines variations of how to take on a non-trivial problem using functional architecture. In the previous article we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like F# or Haskell. In this article, you'll see how to port the C# baseline to F#.

The code shown in this article is from the fsharp-port branch of the accompanying Git repository.

Data structures #

We may start by defining the required data structures. All are going to be records.

type User = { UserName : string; TotalScrobbleCount : int }

As in the equivalent C# code, a User is just a string and an int.

When creating new values, record syntax can sometimes be awkward, so I also define a curried function to create User values:

let user userName totalScrobbleCount =
    { UserName = userName; TotalScrobbleCount = totalScrobbleCount }

Likewise, I define Song and Scrobble in the same way:

type Song = { Id : int; IsVerifiedArtist : bool; Rating : byte }
let song id isVerifiedArtist rating =
    { Id = id; IsVerifiedArtist = isVerifiedArtist; Rating = rating }
 
type Scrobble = { Song : Song; ScrobbleCount : int }
let scrobble song scrobbleCount = { Song = song; ScrobbleCount = scrobbleCount }

To be honest, I only use those curried functions sparingly, so they're somewhat redundant. Perhaps I should consider getting rid of them. For now, however, they stay.
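By way of comparison with the Haskell port, Haskell data constructors are already curried functions, so no separate helper is needed there. A minimal sketch, where ana and scoresFor are hypothetical names:

```haskell
data User = User { userName :: String, userScrobbleCount :: Int }
  deriving (Show, Eq)

-- Record syntax is optional when constructing values: the constructor
-- User is itself a curried function of type String -> Int -> User.
ana :: User
ana = User "ana" 10

-- Partial application works directly, with no extra helper function.
scoresFor :: String -> [Int] -> [User]
scoresFor name = map (User name)
```

Here, scoresFor "ana" [1, 2] produces [User "ana" 1, User "ana" 2].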

Since I'm moving all the code to F#, I also have to translate the interface.

type SongService =
    abstract GetTopListenersAsync : songId : int -> Task<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Task<IReadOnlyCollection<Scrobble>>

The syntax is different from C#, but otherwise, this is the same interface.
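Since SongService is an ordinary .NET interface, F# code can also implement it inline with an object expression, without declaring a class. Here's a minimal sketch that isn't taken from the repository. It repeats the types from above so that it runs on its own, and emptyService is a hypothetical name for a service without any data.

```fsharp
open System.Collections.Generic
open System.Threading.Tasks

// Copies of the record types and the interface shown above.
type User = { UserName : string; TotalScrobbleCount : int }
type Song = { Id : int; IsVerifiedArtist : bool; Rating : byte }
type Scrobble = { Song : Song; ScrobbleCount : int }

type SongService =
    abstract GetTopListenersAsync : songId : int -> Task<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Task<IReadOnlyCollection<Scrobble>>

// Hypothetical: a service without any data, implemented with an object expression.
let emptyService =
    { new SongService with
        member _.GetTopListenersAsync _ =
            Task.FromResult (List.empty<User> :> IReadOnlyCollection<User>)
        member _.GetTopScrobblesAsync _ =
            Task.FromResult (List.empty<Scrobble> :> IReadOnlyCollection<Scrobble>) }

let listeners = emptyService.GetTopListenersAsync(42).Result
printfn "%d" listeners.Count // prints 0
```

For a one-off implementation this is more compact than a class, although the Fake shown later needs state, which a class accommodates more naturally.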

Implementation #

Those are all the supporting types required to implement the RecommendationsProvider. This is the most direct translation of the C# code that I could think of:

type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync userName = task {
        // 1. Get user's own top scrobbles
        // 2. Get other users who listened to the same songs
        // 3. Get top scrobbles of those users
        // 4. Aggregate the songs into recommendations
 
        // Impure
        let! scrobbles = songService.GetTopScrobblesAsync userName
 
        // Pure
        let scrobblesSnapshot =
            scrobbles
            |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
            |> Seq.truncate 100
            |> Seq.toList
 
        let recommendationCandidates = ResizeArray ()
        for scrobble in scrobblesSnapshot do
            // Impure
            let! otherListeners =
                songService.GetTopListenersAsync scrobble.Song.Id
 
            // Pure
            let otherListenersSnapshot =
                otherListeners
                |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
                |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
                |> Seq.truncate 20
                |> Seq.toList
 
            for otherListener in otherListenersSnapshot do
                // Impure
                let! otherScrobbles =
                    songService.GetTopScrobblesAsync otherListener.UserName
 
                // Pure
                let otherScrobblesSnapshot =
                    otherScrobbles
                    |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
                    |> Seq.sortByDescending (fun s -> s.Song.Rating)
                    |> Seq.truncate 10
                    |> Seq.toList
 
                otherScrobblesSnapshot
                |> List.map (fun s -> s.Song)
                |> recommendationCandidates.AddRange
 
        // Pure
        let recommendations =
            recommendationCandidates
            |> Seq.sortByDescending (fun s -> s.Rating)
            |> Seq.truncate 200
            |> Seq.toList
            :> IReadOnlyCollection<_>
 
        return recommendations }

As you can tell, I've kept the comments from the original, too.
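One detail of the translation is worth pointing out. Where the C# code calls LINQ's Take, the F# code uses Seq.truncate rather than Seq.take. The two behave differently when there are fewer elements than requested. Seq.truncate returns whatever is there, like Take, whereas Seq.take fails. Here's a small demonstration with made-up numbers:

```fsharp
// Made-up ratings: only three of them, but the code asks for up to 200.
let ratings = [ 7; 3; 9 ]

// Seq.truncate behaves like LINQ's Take: it yields at most n elements.
let top = ratings |> Seq.sortDescending |> Seq.truncate 200 |> Seq.toList
printfn "%A" top // [9; 7; 3]

// Seq.take, by contrast, raises an exception when there are too few elements.
let takeFails =
    try
        ratings |> Seq.take 200 |> Seq.toList |> ignore
        false
    with _ -> true
printfn "Seq.take failed: %b" takeFails // Seq.take failed: true
```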

Test Double #

In the previous article, I wrote the Fake SongService in C#. Since, in this article, I'm translating everything to F#, I need to translate the Fake, too.

type FakeSongService () =
    let songs = ConcurrentDictionary<int, Song> ()
    let users = ConcurrentDictionary<string, ConcurrentDictionary<int, int>> ()
 
    interface SongService with
        member _.GetTopListenersAsync songId =
            let listeners =
                users
                |> Seq.filter (fun kvp -> kvp.Value.ContainsKey songId)
                |> Seq.map (fun kvp -> user kvp.Key (Seq.sum kvp.Value.Values))
                |> Seq.toList
 
            Task.FromResult listeners
 
        member _.GetTopScrobblesAsync userName =
            let scrobbles =
                users.GetOrAdd(userName, ConcurrentDictionary<_, _> ())
                |> Seq.map (fun kvp -> scrobble songs[kvp.Key] kvp.Value)
                |> Seq.toList
 
            Task.FromResult scrobbles
 
    member _.Scrobble (userName, song : Song, scrobbleCount) =
        let addScrobbles (scrobbles : ConcurrentDictionary<_, _>) =
            scrobbles.AddOrUpdate (
                song.Id,
                scrobbleCount,
                fun _ oldCount -> oldCount + scrobbleCount)
            |> ignore
            scrobbles
 
        users.AddOrUpdate (
            userName,
            ConcurrentDictionary<_, _>
                [ KeyValuePair.Create (song.Id, scrobbleCount) ],
            fun _ scrobbles -> addScrobbles scrobbles)
        |> ignore
        
        songs.AddOrUpdate (song.Id, song, fun _ _ -> song) |> ignore

Apart from the code shown here, only minor changes were required for the tests, such as using those curried creation functions instead of constructors, a cast to SongService, and a few other non-behavioural things like that. All tests still pass, so I consider this a faithful translation of the C# code base.
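The Scrobble back door relies on ConcurrentDictionary.AddOrUpdate. The first call for a key adds the supplied value, and any later call for the same key runs the update function instead. The following stripped-down sketch covers just one user's song-to-count map, with made-up numbers, and shows why scrobbling the same song twice sums the counts:

```fsharp
open System.Collections.Concurrent

// A single user's scrobbles: song ID -> scrobble count.
let counts = ConcurrentDictionary<int, int> ()

// Add the count for a new song, or add to the existing count for a known song.
let addScrobbles (songId : int) (scrobbleCount : int) =
    counts.AddOrUpdate (songId, scrobbleCount, fun _ oldCount -> oldCount + scrobbleCount)
    |> ignore

addScrobbles 1 10
addScrobbles 1 5 // The same song again, so the update function runs.
addScrobbles 2 7

let song1 = counts[1]
let song2 = counts[2]
printfn "%d %d" song1 song2 // 15 7
```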

Conclusion #

This article does more groundwork. Since it may be illuminating to see one problem represented in more than one programming language, I present it in C#, F#, and Haskell. The next article does exactly that: it translates this F# code to Haskell. Once all three code bases are established, we can start introducing solution variations.

If you don't care about the Haskell examples, you can always go back to the first article in this article series and use the table of contents to jump to the next C# example.

Next: Porting song recommendations to Haskell.


Characterising song recommendations

2025-04-10T08:05:00+00:00

Using characterisation tests and mutation testing to describe existing behaviour. An example.

This article is part of an article series that presents multiple design alternatives for a given problem that I call the song recommendations problem. In short, the problem is to recommend songs to a user based on a vast repository of scrobbles. The problem was originally proposed by Oleksii Holub as an example of a problem that may not be a good fit for functional programming (FP).

As I've outlined in the introductory article, I'd like to use the opportunity to demonstrate alternative FP designs. Before I can do that, however, I need a working example of Oleksii Holub's code example, as well as a trustworthy test suite. That's what this article is about.

The code in this article mostly comes from the master branch of the .NET repository that accompanies this article series, although some of the code is taken from intermediate commits on that branch.

Inferred details #

The original article only shows code, but doesn't link to an existing code base. While I suppose I could have asked Oleksii Holub if he had a copy he would share, the existence of such a code base isn't a given. In any case, inferring an entire code base from a comprehensive snippet is an interesting exercise in its own right.

The first step was to copy the example code into a code base. Initially it didn't compile because of some missing dependencies that I had to infer. It was only three Value Objects and an interface:

public sealed record Song(int Id, bool IsVerifiedArtist, byte Rating);

public sealed record Scrobble(Song Song, int ScrobbleCount);

public sealed record User(string UserName, int TotalScrobbleCount);

public interface SongService
{
    Task<IReadOnlyCollection<User>> GetTopListenersAsync(int songId);
    Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(string userName);
}

These type declarations are straightforward, but still warrant a few comments. First, Song, Scrobble, and User are C# records, which are a relatively recent addition to the language. If you're reading along here, but using another C-based language, or an older version of C#, you can implement such immutable Value Objects with normal language constructs; it just takes more code, instead of the one-liner syntax. F# developers will, of course, be familiar with the concept of records, and Haskell also has them.
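For comparison, here's a small sketch of what a record gives you in F#, using the Song shape with made-up values. Values with equal fields are equal, and a 'modification' produces a new value. The C# records provide the same value-based equality, which is what lets a test compare expected and actual songs with Assert.Equal.

```fsharp
type Song = { Id : int; IsVerifiedArtist : bool; Rating : byte }

let a = { Id = 1; IsVerifiedArtist = true; Rating = 5uy }
let b = { Id = 1; IsVerifiedArtist = true; Rating = 5uy }

// Structural equality: separately created values with equal fields are equal.
printfn "%b" (a = b) // true

// Immutability: 'with' creates a modified copy and leaves the original as it was.
let c = { a with Rating = 9uy }
printfn "%b %d %d" (a = c) a.Rating c.Rating // false 5 9
```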

Another remark about the above type declarations is that while SongService is an interface, it has no I prefix. This is syntactically legal, but not idiomatic in C#. I've used the name from the original code sample verbatim, so that's the immediate explanation. It's possible that Oleksii Holub intended the type to be a base class, but for various reasons I prefer interfaces, although in this particular example I don't think it would have made much of a difference. I'm only pointing it out because there's a risk that it might confuse some readers who are used to the C# naming convention. Java programmers, on the other hand, should feel at home.

As far as I remember, the only other change I had to make to the code in order to make it compile was to give the RecommendationsProvider class a constructor:

public RecommendationsProvider(SongService songService)
{
    _songService = songService;
}

This is just basic Constructor Injection, and while I find the underscore prefix redundant, I've kept it in order to stay as faithful to the original example as possible.

At this point the code compiles.

Test Double #

The goal of this article series is to present several alternative designs that implement the same behaviour. This means that as I refactor the code, I need to know that I didn't break existing functionality.

"to refactor, the essential precondition is [...] solid tests"

Refactoring, Martin Fowler, 1999

Currently, I have no tests, so I'll have to add some. Since RecommendationsProvider makes heavy use of its injected SongService, tests must supply that dependency in order to do meaningful work. Since Stubs and Mocks break encapsulation, I instead favour state-based testing with Fake Objects.

After some experimentation, I arrived at this FakeSongService:

public sealed class FakeSongService : SongService
{
    private readonly ConcurrentDictionary<int, Song> songs;
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<int, int>> users;
 
    public FakeSongService()
    {
        songs = new ConcurrentDictionary<int, Song>();
        users = new ConcurrentDictionary<string, ConcurrentDictionary<int, int>>();
    }
 
    public Task<IReadOnlyCollection<User>> GetTopListenersAsync(int songId)
    {
        var listeners =
            from kvp in users
            where kvp.Value.ContainsKey(songId)
            select new User(kvp.Key, kvp.Value.Values.Sum());
 
        return Task.FromResult<IReadOnlyCollection<User>>(listeners.ToList());
    }
 
    public Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(
        string userName)
    {
        var scrobbles = users
            .GetOrAdd(userName, new ConcurrentDictionary<int, int>())
            .Select(kvp => new Scrobble(songs[kvp.Key], kvp.Value));
 
        return Task.FromResult<IReadOnlyCollection<Scrobble>>(scrobbles.ToList());
    }
 
    public void Scrobble(string userName, Song song, int scrobbleCount)
    {
        users.AddOrUpdate(
            userName,
            new ConcurrentDictionary<int, int>(
                new[] { KeyValuePair.Create(song.Id, scrobbleCount) }),
            (_, scrobbles) => AddScrobbles(scrobbles, song, scrobbleCount));
 
        songs.AddOrUpdate(song.Id, song, (_, _) => song);
    }
 
    private static ConcurrentDictionary<int, int> AddScrobbles(
        ConcurrentDictionary<int, int> scrobbles,
        Song song,
        int scrobbleCount)
    {
        scrobbles.AddOrUpdate(
            song.Id,
            scrobbleCount,
            (_, oldCount) => oldCount + scrobbleCount);
        return scrobbles;
    }
}

If you're wondering about the use of concurrent dictionaries, I use them because it made it easier to write the implementation, and not because I need the implementation to be thread-safe. In fact, I'm fairly sure that it's not thread-safe. That's not an issue. The tests aren't going to use shared mutable state.

The GetTopListenersAsync and GetTopScrobblesAsync methods implement the interface, and the Scrobble method (here, scrobble is a verb: to scrobble) is a back door that enables tests to populate the FakeSongService.

Icebreaker Test #

While the 'production code' is in C#, I decided to write the tests in F# for two reasons.

The first reason was that I wanted to be able to present the various FP designs in both C# and F#. Writing the tests in F# would make it easier to replace the C# code base with an F# alternative.

The second reason was that I wanted to leverage a property-based testing framework's ability to produce many randomly-generated test cases. I considered this important to build confidence that my tests weren't just a few specific examples that wouldn't catch errors when I made changes. Since the RecommendationsProvider API is asynchronous, the only .NET property-based framework I knew of that can run Task-valued properties is FsCheck. While it's possible to use FsCheck from C#, the F# API is more powerful.

In order to get started, however, I first wrote an Icebreaker Test without FsCheck:

[<Fact>]
let ``No data`` () = task {
    let srvc = FakeSongService ()
    let sut = RecommendationsProvider srvc
    let! actual = sut.GetRecommendationsAsync "foo"
    Assert.Empty actual }

This is both a trivial case and an edge case, but clearly, if there's no data in the SongService, the RecommendationsProvider can't recommend any songs.

As I usually do with Characterisation Tests, I temporarily sabotage the System Under Test so that the test fails. This is to ensure that I didn't write a tautological assertion. Once I've seen the test fail for the appropriate reason, I undo the sabotage and check in the code.

Refactor to property #

In the above No data test, the specific input value "foo" is irrelevant. It might as well have been any other string, so why not make it a property?

In this particular case, the userName could be any string, but it might be appropriate to write a custom generator that produces 'realistic' user names. Just to make things simple, I'm going to assume that user names are between one and twenty characters and assembled from alphanumeric characters, and that the first character must be a letter:

module Gen =
    let alphaNumeric = Gen.elements (['a'..'z'] @ ['A'..'Z'] @ ['0'..'9'])
 
    let userName = gen {
        let! length = Gen.choose (1, 19)
        let! firstLetter = Gen.elements <| ['a'..'z'] @ ['A'..'Z']
        let! rest = alphaNumeric |> Gen.listOfLength length
        return firstLetter :: rest |> List.toArray |> String }

Strictly speaking, as long as user names are distinct, the code ought to work, so this generator may be more conservative than necessary. Why am I constraining the generator? For two reasons: First, when FsCheck finds a counter-example, it displays the values that caused the property to fail. A twenty-character alphanumeric string is easier to relate to than some arbitrary string with line breaks and unprintable characters. The second reason is that I'm later going to measure memory load for some of the alternatives, and I wanted data to have realistic size. If user names are chosen by humans, they're unlikely to be longer than twenty characters on average (I've decided).
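The constraint is easy to state as a predicate, and the shape of the generator doesn't depend on FsCheck. The following sketch is a plain System.Random stand-in, not FsCheck, and genUserName and isRealistic are hypothetical names. It produces names of one to twenty alphanumeric characters that start with a letter, and then checks a thousand of them against the predicate:

```fsharp
open System

let letters = Array.append [| 'a'..'z' |] [| 'A'..'Z' |]
let alphaNumeric = Array.append letters [| '0'..'9' |]

// Hypothetical stand-in for Gen.userName: one letter followed by 0-19 alphanumerics.
let genUserName (rnd : Random) =
    let first = letters[rnd.Next(letters.Length)]
    // Random.Next (min, max) excludes max, so this yields 0..19.
    // (FsCheck's Gen.choose, by contrast, includes both bounds.)
    let restLength = rnd.Next(0, 20)
    let rest = Array.init restLength (fun _ -> alphaNumeric[rnd.Next(alphaNumeric.Length)])
    String (Array.append [| first |] rest)

// The constraint the generator is meant to satisfy.
let isRealistic (name : string) =
    name.Length >= 1
    && name.Length <= 20
    && Char.IsLetter(name[0])
    && name |> Seq.forall (fun (c : char) -> Char.IsLetterOrDigit c)

let rnd = Random 42
let samples = List.init 1000 (fun _ -> genUserName rnd)
printfn "%b" (samples |> List.forall isRealistic) // true
```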

I can now rewrite the above No data test as an FsCheck property:

[<Property>]
let ``No data`` () =
    Gen.userName |> Arb.fromGen |> Prop.forAll <| fun userName ->
    task {
        let srvc = FakeSongService ()
        let sut = RecommendationsProvider srvc
 
        let! actual = sut.GetRecommendationsAsync userName
 
        Assert.Empty actual } :> Task

You may think that this is overkill just to be able to supply random user names to the GetRecommendationsAsync method. In isolation, I'd be inclined to agree, but this edit was an occasion to get my FsCheck infrastructure in place. I can now use that to add more properties.

Full coverage #

The cyclomatic complexity of the GetRecommendationsAsync method is only 3, so it doesn't require many tests to attain full code coverage. Not that 100% code coverage should be a goal in itself, but when adding tests to an untested code base, it can be useful as an indicator of confidence. Despite its low cyclomatic complexity, the method, with all of its filtering and sorting, is actually quite involved. 100% coverage strikes me as a low bar.

The above No data test exercises one of the three branches. At most two more tests are required to attain full coverage. I'll just show the simplest of them here.

The next test case requires a new FsCheck generator, in addition to Gen.userName already shown.

let song = ArbMap.generate ArbMap.defaults |> Gen.map Song

As a fairly simple one-liner, this seems close to the Fairbairn threshold, but I think that giving this generator a name makes the test easier to read.

[<Property>]
let ``One user, some songs`` () =
    gen {
        let! user = Gen.userName
        let! songs = Gen.arrayOf Gen.song
        let! scrobbleCounts =
            Gen.choose (1, 100) |> Gen.arrayOfLength songs.Length
        return (user, Array.zip songs scrobbleCounts) }
    |> Arb.fromGen |> Prop.forAll <| fun (user, scrobbles) ->
    task {
        let srvc = FakeSongService ()
        scrobbles |> Array.iter (fun (s, c) -> srvc.Scrobble (user, s, c))
        let sut = RecommendationsProvider srvc
 
        let! actual = sut.GetRecommendationsAsync user
 
        Assert.Empty actual } :> Task

This test creates scrobbles for a single user and adds them to the Fake data store. It uses TIE-fighter syntax to connect the generators to the test body.

Since all the scrobble counts are generated between 1 and 100, none of them are greater than or equal to 10,000, and thus the test expects no recommendations.
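The threshold in question is the filter on other listeners' total scrobble counts. In this test, the only listener is the user himself, so his total is the sum of his scrobble counts. Here's a sketch of just that predicate with made-up users. The name qualifies is hypothetical.

```fsharp
type User = { UserName : string; TotalScrobbleCount : int }

// The same condition as the Seq.filter on other listeners in GetRecommendationsAsync.
let qualifies (u : User) = u.TotalScrobbleCount >= 10_000

// Made-up users: a few counts between 1 and 100 add up to far less than 10,000.
let casual = { UserName = "foo"; TotalScrobbleCount = List.sum [ 12; 100; 57 ] }
let heavy = { UserName = "bar"; TotalScrobbleCount = 10_000 }

printfn "%b %b" (qualifies casual) (qualifies heavy) // false true
```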

You may think that I'm cheating - after all, why didn't I choose another range for the scrobble count? To be honest, I was still in an exploratory phase, trying to figure out how to express the tests, and as a first step, I was aiming for full code coverage. Even though this test's assertion is weak, it does exercise another branch of the GetRecommendationsAsync method.

I had to write only one more test to fully cover the System Under Test. That test is more complicated, so I'll spare you the details. If you're interested, you may consider consulting the example source code repository.

Mutation testing #

While I don't think that code coverage is useful as a target measure, it can be illuminating as a tool. In this case, knowing that I've now attained full coverage tells me that I need to resort to other techniques if I want another goal to aim for.

I chose mutation testing as my new technique. The GetRecommendationsAsync method makes heavy use of LINQ methods such as OrderByDescending, Take, and Where. Stryker for .NET knows about LINQ, so among all the automated sabotage it does, it tries to see what happens if it removes e.g. Where or Take.

Although I find the Stryker jargon childish, I set myself the goal to see if I could 'kill mutants' to a degree that I'd at least get a green rating.
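To make the idea concrete, here's a hand-written sketch of a 'mutant', not actual Stryker output. It shows one step of the pipeline and a copy of it with the filter (Where in C#, a filter function in F#) removed. A test only kills the mutant if some input makes the two versions disagree, which requires test data that includes an unverified song. The names original and mutant are hypothetical.

```fsharp
type Song = { Id : int; IsVerifiedArtist : bool; Rating : byte }

// The original step: verified songs only, highest rating first, at most 10.
let original (songs : Song list) =
    songs
    |> List.filter (fun s -> s.IsVerifiedArtist)
    |> List.sortByDescending (fun s -> s.Rating)
    |> List.truncate 10

// A hand-written 'mutant' with the filter removed.
let mutant (songs : Song list) =
    songs
    |> List.sortByDescending (fun s -> s.Rating)
    |> List.truncate 10

let verified = [ { Id = 1; IsVerifiedArtist = true; Rating = 5uy } ]
let mixed = { Id = 2; IsVerifiedArtist = false; Rating = 6uy } :: verified

// Only verified data: both agree, so the mutant survives.
printfn "%b" (original verified = mutant verified) // true
// An unverified song makes them disagree, so a test with this data kills the mutant.
printfn "%b" (original mixed = mutant mixed) // false
```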

I found that I could edge closer to that goal by a combination of appending assertions (thus strengthening postconditions) and adding tests. While I sometimes find it easier to define properties than examples, at other times, it's the other way around. In this case, I found it easier to add single examples, like this one:

[<Fact>]
let ``One verified recommendation`` () = task {
    let srvc = FakeSongService ()
    srvc.Scrobble ("cat", Song (1, false, 6uy),    10)
    srvc.Scrobble ("ana", Song (1, false, 5uy),    10)
    srvc.Scrobble ("ana", Song (2,  true, 5uy), 9_990)
    let sut = RecommendationsProvider srvc
 
    let! actual = sut.GetRecommendationsAsync "cat"
 
    Assert.Equal<Song> ([ Song (2, true, 5uy) ], actual) } :> Task

It adds three scrobbles to the data store, but only one of them is verified (which is what the true value indicates), so this is the only recommendation the test expects to see.

Notice that although song number 2 'only' has 9,990 plays, the user ana has exactly 10,000 plays in all, and so barely makes the cut. By carefully adding five examples like this one, I was able to 'kill all mutants'.
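Why does the total need to be exactly 10,000? Mutation tools typically also try boundary mutations, such as replacing >= with >. The sketch below uses hand-written functions with hypothetical names, not Stryker output. It shows that only a total right at the threshold tells the original comparison and such a mutant apart:

```fsharp
// The two scrobble counts for 'ana': 10 + 9,990.
let anaTotal = List.sum [ 10; 9_990 ]

let original total = total >= 10_000
// A boundary mutation of the same comparison.
let boundaryMutant total = total > 10_000

printfn "%d" anaTotal // 10000
// Exactly at the threshold, the two disagree, so the mutant is killed.
printfn "%b %b" (original anaTotal) (boundaryMutant anaTotal) // true false
// Well above the threshold, they agree, so the mutant would survive.
printfn "%b %b" (original 100_000) (boundaryMutant 100_000) // true true
```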

In all, I have eight tests: three FsCheck properties and five normal xUnit.net facts.

All tests work exclusively by supplying direct and indirect input to the System Under Test (SUT), and verify the return value of GetRecommendationsAsync. No Mocks or Stubs have opinions about how the SUT interacts with the injected SongService. This gives me confidence that the tests constitute a trustworthy regression test suite, and that they're still sufficiently decoupled from implementation details to enable me to completely rewrite the SUT.

Quirks #

When you add tests to an existing code base, you may discover edge cases that the original programmer overlooked. The GetRecommendationsAsync method is only a code example, so I don't blame Oleksii Holub for some casual coding, but it turns out that the code has some quirks.

For example, there's no deduplication, so I had to apologise in my test code:

[<Fact>]
let ``Only top-rated songs`` () = task {
    let srvc = FakeSongService ()
    // Scale ratings to keep them less than or equal to 10.
    [1..20] |> List.iter (fun i ->
        srvc.Scrobble ("hyle", Song (i, true, byte i / 2uy), 500))
    let sut = RecommendationsProvider srvc
 
    let! actual = sut.GetRecommendationsAsync "hyle"
 
    Assert.NotEmpty actual
    // Since there's only one user, but with 20 songs, the implementation loops
    // over the same user's scrobbles 20 times. Each time, it picks that user's
    // 10 top-rated verified songs, all rated 5-10, so there are 200 songs in
    // total (with duplicates), and all of them survive the final cut of 200.
    // Note that this is a Characterization Test, so not necessarily
    // reflective of how a real recommendation system should work.
    Assert.All (actual, fun s -> Assert.True (5uy <= s.Rating)) } :> Task

This test creates twenty scrobbles for one user: One with a zero rating, two with rating 1, two with rating 2, and so on, up to a single song with rating 10.

You might protest that this is because my FakeSongService implementation is too unsophisticated. Obviously, it should not return the 'original' user's songs! Do, however, consider the implied signature of the GetTopListenersAsync method:

Task<IReadOnlyCollection<User>> GetTopListenersAsync(int songId);

The method only accepts a songId as input, and if we assume that the service is stateless, it doesn't know who the 'original' user is.
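It's worth checking the numbers in that test against the pipeline. Each pass through the inner loop keeps only the other user's 10 top-rated verified songs, so the candidate list ends up with 20 × 10 = 200 songs. The sketch below simulates only the ratings. It's a simplification, not the actual RecommendationsProvider, and it confirms that every surviving rating is at least 5:

```fsharp
// Ratings as in the test: song i (1..20) gets rating i / 2, i.e. 0, 1, 1, ..., 9, 9, 10.
let ratings = [ for i in 1..20 -> byte i / 2uy ]

// Each of the 20 iterations (one per song of the single user) looks up the
// same user again and keeps that user's 10 top-rated songs.
let perIteration = ratings |> List.sortDescending |> List.truncate 10
let candidates = [ for _ in 1..20 do yield! perIteration ]

// The final step keeps at most 200 candidates, ordered by rating.
let recommendations = candidates |> List.sortDescending |> List.truncate 200

printfn "%d %d" recommendations.Length (List.min recommendations) // 200 5
```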

Should I fix the quirks? In a real system, it might be appropriate, but in this context I find it better to keep them. Real systems often have quirks in the shape of legacy business rules and the like, so I only find it realistic that the system may exhibit some weird behaviour. The goal of this set of articles isn't to refactor this particular system, but rather to showcase alternative designs for a system sufficiently complicated to warrant refactorings. Simplifying the code might defeat that purpose.

As shown, I have an automated test that requires me to keep that behaviour. I think I'm in a good position to make sweeping changes to the code.

Conclusion #

As Martin Fowler writes, an essential precondition for refactoring is a trustworthy test suite. On a daily basis, millions of developers prove him wrong by deploying untested changes to production. There are other ways to make changes, including manual testing, A/B testing, testing in production, etc. Some of them may even work in some contexts.

In contrast to such real-world concerns, I don't have a production system with real users. Neither do I have a product owner or a department of manual testers. The best I can do is to add enough Characterisation Tests that I feel confident that I've described the behaviour, rather than the implementation, in enough detail to hold it in place. A Software Vise, as Michael Feathers calls it in Working Effectively with Legacy Code.

Most systems in 'the real world' have too few automated tests. Adding tests to a legacy code base is a difficult discipline, so I found it worthwhile to document this work before embarking on the actual design changes promised by this article series. Now that this is out of the way, we can proceed.

The next two articles do more groundwork to establish equivalent code bases in F# and Haskell. If you only care about the C# examples, you can go back to the first article in this article series and use the table of contents to jump to the next C# example.

Next: Porting song recommendations to F#.


Alternative ways to design with functional programming

2025-04-07T18:27:00+00:00

A catalogue of FP solutions to a sample problem.

If you're past the novice stage of learning programming, you will have noticed that there's usually more than one way to solve a particular problem. Sometimes one way is better than alternatives, but often, there's no single clearly superior option.

It's a cliche that you should use the right tool for the job. For programmers, our most important tools are the techniques, patterns, algorithms, and data structures we know, rather than the IDEs we use. You can only choose the right tool for a job if you have more than one to choose from. Again, for programmers this implies knowing of more than one way to solve a problem. This is the reason I find doing katas so valuable.

Instead of a kata, in the series that this article commences I'll take a look at an example problem and in turn propose multiple alternative solutions. The problem I'll tackle is bigger than a typical kata, but you can think of this article series as a spirit companion to Thirteen ways of looking at a turtle by Scott Wlaschin.

Recommendations #

The problem I'll tackle was described by Oleksii Holub in 2020, and I've been considering how to engage with it ever since.

Oleksii Holub presents it as the second of two examples in an article named Pure-Impure Segregation Principle. In a nutshell, the problem is to identify song recommendations for a user, sourced from a vast repository of scrobbles.

The first code example in the article is fine as well, but it's not as rich a source of problems, so I don't plan to talk about it in this context.

Oleksii Holub's article does mention my article Dependency rejection as well as the Impureim Sandwich pattern.

It's my experience that the Impureim Sandwich is surprisingly often applicable, despite its seemingly obvious limitations. More than once, people have responded that it doesn't work in general.

I've never claimed that the Impureim Sandwich is a general-purpose solution to everything, only that it surprisingly often fits, once you massage the problem a bit.

I have, however, solicited examples that challenge the pattern, and occasionally readers supply examples, for which I'm thankful. I'm trying to get a sense for just how widely applicable the Impureim Sandwich pattern is, and finding its limitations is an important part of that work.

The song recommendations problem is the most elusive one I've seen so far, so I'm grateful to Oleksii Holub for the example.

In the articles in this series, I'll present various alternative solutions to that problem. To make things as clear as I can, I don't do this because I think that the code shown in the original article is bad. Quite the contrary, I'd probably write it like that myself.

I offer the alternatives to teach. Only by knowing of more than one way of solving the problem can you pick the right tool for the job. It may turn out that the right design is the one already suggested by Oleksii Holub, but if you change circumstances, perhaps another design is better. Ultimately, I hope that the reader can extrapolate from this problem to other problems that he or she may encounter.

The way much online discourse is conducted today, I wish that I didn't have to emphasise the following: Someone may choose to read Oleksii Holub's article as a rebuttal of my ideas about functional architecture and the Impureim Sandwich. I don't read it that way, but rather as honest pursuit of intellectual inquiry. I engage with it in the same spirit, grateful for the rich example that it offers.

Code examples #

I'll show code in the languages with which I'm most comfortable: C#, F#, and Haskell. I'll attempt to write the C# code in such a way that programmers of Java, TypeScript, or similar languages can also read along. On the other hand, I'm not going to explain F# or Haskell, but I'll write the articles so that you can skip the F# or Haskell articles and still learn from the C# articles.

While I don't expect the majority of my readers to know Haskell, I find it an indispensable tool when evaluating whether or not a design is functional. F# is a good didactic bridge between C# and Haskell.

The code is available upon request against a small support donation of 10 USD (or more). If you're one of my regular supporters, you have my gratitude and can get the code without further donation. Also, on his blog, Oleksii Holub asks you to support Ukraine against the aggressor. If you can document that you've donated at least 10 USD to one of the charities listed there, on or after the publication of this article, I'll be happy to send you the code as well. In both cases, please write to me.

I've used Git branches for the various alternatives. In each article, I'll do my best to remember to write which branch corresponds to the article.

Articles #

This article series will present multiple alternatives in more than one programming language. I find it most natural to group the articles according to design first, and language second.

While you can view this list as a catalogue of functional programming designs, I'm under no illusion that it's complete.

Characterising song recommendations
Porting song recommendations to F#
Porting song recommendations to Haskell
Song recommendations as an Impureim Sandwich
Song recommendations from combinators
Song recommendations with pipes and filters
- Song recommendations with Reactive Extensions for .NET
- Song recommendations with F# agents
Song recommendations with free monads
- Song recommendations with Haskell free monads
- Song recommendations with F# free monads
- Song recommendations with C# free monads

Some of the design alternatives will require detours to other interesting topics. While I'll do my best to write to enable you to skip the F# and Haskell content, few articles on this blog are self-contained. I do expect the reader to follow links when necessary, but if I've failed to explain anything to your satisfaction, please leave a comment.

Conclusion #

This article series examines multiple alternative designs to the song recommendations example presented by Oleksii Holub. The original example has interleaved impurities and is therefore not really functional, even though it looks 'functional' on the surface, due to its heavy use of filters and projections.

That example may leave some readers with the impression that there are some problems that, due to size or other 'real-world' constraints, are impossible to solve with functional programming. The present catalogue of design alternatives is my attempt to disabuse readers of that notion.

Next: Characterising song recommendations.


Ports and fat adapters

2025-04-01T13:16:00+00:00

Is it worth it having a separate use-case layer?

When I occasionally post something about Ports and Adapters (also known as hexagonal architecture), a few reactions seem to indicate that I'm 'doing it wrong'. I apologize for the use of weasel words, but I don't intend to put particular people on the spot. Everyone has been nice and polite about it, and it's possible that I've misunderstood the criticism. Even so, a few comments have left me with the impression that there's an elephant in the room that I should address.

In short, I usually don't abstract application behaviour from frameworks. I don't create 'application layers', 'use-case classes', 'mediators', or similar. This is a deliberate architecture decision.

In this article, I'll use a motivating example to describe the reasoning behind such a decision.

Humble Objects #

A software architect should consider how the choice of particular technologies impacts the development and future sustainability of a solution. It's often a good idea to consider whether it makes sense to decouple application code from frameworks and particular external dependencies. For example, should you hide database access behind an abstraction? Should you decouple the Domain Model from the web framework you use?

This isn't always the right decision, but in the following, I'll assume that this is the case.

When you apply the Dependency Inversion Principle (DIP) you let the application code define the abstractions it needs. If it needs to persist data, it may define a Repository interface. If it needs to send notifications, it may define a 'notification gateway' abstraction. Actual code that, say, communicates with a relational database is an Adapter. It translates the application interface into database SDK code.

I've been over this ground already, but to take an example from the sample code that accompanies Code That Fits in Your Head, here's a single method from the SqlReservationsRepository Adapter:

public async Task Create(int restaurantId, Reservation reservation)
{
    if (reservation is null)
        throw new ArgumentNullException(nameof(reservation));
 
    using var conn = new SqlConnection(ConnectionString);
    using var cmd = new SqlCommand(createReservationSql, conn);
    cmd.Parameters.AddWithValue("@Id", reservation.Id);
    cmd.Parameters.AddWithValue("@RestaurantId", restaurantId);
    cmd.Parameters.AddWithValue("@At", reservation.At);
    cmd.Parameters.AddWithValue("@Name", reservation.Name.ToString());
    cmd.Parameters.AddWithValue("@Email", reservation.Email.ToString());
    cmd.Parameters.AddWithValue("@Quantity", reservation.Quantity);
 
    await conn.OpenAsync().ConfigureAwait(false);
    await cmd.ExecuteNonQueryAsync().ConfigureAwait(false);
}

This is one method of a class named SqlReservationsRepository, which is an Adapter that makes ADO.NET code look like the application-specific IReservationsRepository interface.

Such Adapters are often as 'thin' as possible. One dimension of measurement is to look at the cyclomatic complexity, where the ideal is 1, the lowest possible score. The code shown here has a complexity measure of 2 because of the null guard, which exists because of a static analysis rule.

In test parlance, we call such thin Adapters Humble Objects. Or, to paraphrase what Kris Jenkins said at the GOTO Copenhagen 2024 conference, separate code into parts that are

hard to test, but easy to get right
hard to get right, but easy to test.

You can do the same when sending email, querying a weather service, raising events on pub-sub infrastructure, getting the current date, etc. This isolates your application code from implementation details, such as particular database servers, SDKs, network protocols, and so on.
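To make the two categories concrete, here's a minimal sketch of the 'getting the current date' case. The IClock interface and its GetCurrentDateTime method appear in the code later in this article. SystemClock, FixedClock, and ReservationRules.IsInFuture are hypothetical names chosen for illustration.

```csharp
using System;

// The application defines the abstraction it needs (DIP).
public interface IClock
{
    DateTime GetCurrentDateTime();
}

// Humble Object with cyclomatic complexity 1. It's hard to test, because the
// value changes on every call, but there's nothing in it to get wrong.
public sealed class SystemClock : IClock
{
    public DateTime GetCurrentDateTime() => DateTime.Now;
}

// Hypothetical Test Double that makes 'now' deterministic.
public sealed class FixedClock : IClock
{
    private readonly DateTime now;

    public FixedClock(DateTime now) => this.now = now;

    public DateTime GetCurrentDateTime() => now;
}

// Hypothetical decision logic. This is the part that's harder to get right,
// but it's easy to test, because it's a pure function of its inputs.
public static class ReservationRules
{
    public static bool IsInFuture(DateTime now, DateTime at) => now < at;
}
```

Only SystemClock touches the real clock, and it contains no logic worth testing. The rules that matter receive the current time as an ordinary argument.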

Shouldn't you be doing the same on the receiving end?

Fat Adapters #

In his article on Hexagonal Architecture, Alistair Cockburn acknowledges a certain asymmetry. Some ports are activated by the application. Those are the ones already examined. An application reads from and writes to a database. An application sends emails. An application gets the current date.

Other ports, on the other hand, drive the application. According to Tomas Petricek's distinction between frameworks and libraries (that I also use), this kind of behaviour characterizes a framework. Examples include web frameworks such as ASP.NET, Express.js, Django, or UI frameworks like Angular, WPF, and so on.

While I usually do shield my Domain Model from framework code, I tend to write 'fat' Adapters. As far as I can tell, this is what some people have taken issue with.

Here's an example:

[HttpPost("restaurants/{restaurantId}/reservations")]
public async Task<ActionResult> Post(int restaurantId, ReservationDto dto)
{
    if (dto is null)
        throw new ArgumentNullException(nameof(dto));
 
    Reservation? candidate1 = dto.TryParse();
    dto.Id = Guid.NewGuid().ToString("N");
    Reservation? candidate2 = dto.TryParse();
    Reservation? reservation = candidate1 ?? candidate2;
    if (reservation is null)
        return new BadRequestResult(/* Describe the errors here */);
 
    var restaurant = await RestaurantDatabase.GetRestaurant(restaurantId)
        .ConfigureAwait(false);
    if (restaurant is null)
        return new NotFoundResult();
 
    using var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
 
    var reservations = await Repository
        .ReadReservations(restaurant.Id, reservation.At)
        .ConfigureAwait(false);
    var now = Clock.GetCurrentDateTime();
    if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        return NoTables500InternalServerError();
 
    await Repository.Create(restaurant.Id, reservation).ConfigureAwait(false);
 
    scope.Complete();
 
    return Reservation201Created(restaurant.Id, reservation);
}

This is (still) code originating from the example code base that accompanies Code That Fits in Your Head, although I've here used the variation from Coalescing DTOs. I've also inlined the TryCreate helper method, so that the entire use-case flow is visible as a single method.

In a sense, we may consider this an Adapter, too. This Post action method is part of a Controller class that handles incoming HTTP requests. It is, however, not that class that deals with the HTTP protocol. Neither does it parse the request body or check headers, etc. The ASP.NET framework takes care of that.

By following certain naming conventions, adorning the method with an [HttpPost] attribute, and returning ActionResult, this method plays by the rules of the ASP.NET framework. Even if it doesn't implement any particular interface or inherit from some ASP.NET base class, it clearly 'adapts' to the ASP.NET framework.

It does that by attempting to parse and validate input, looking up data in data sources, and, in general, checking preconditions before delegating work to the Domain Model, which happens in the call to MaitreD.WillAccept.

This is where some people seem to get uncomfortable. If this is an Adapter, it's a 'fat' one. In this particular example, the cyclomatic complexity is 6. Not really a Humble Object.

Shouldn't there be some kind of 'use-case' model?

Use-case Model API #

I deliberately avoid 'use-case' models, 'mediators', or whatever other name people tend to use. I'll try to explain why by going through the exercise of actually extracting such a model. My point, in short, is that I find it not worth the effort.

In the following, I'll call such a model a 'use-case', since this is one piece of terminology that I tend to run into. You may also call it an 'Application Model' or something else.

The 'problem' that I apparently am meant to solve is that most of the code in the above Post method is tightly coupled to ASP.NET. If we want to decouple this code, how do we go about it?

It's possible that my imagination fails me, but the best I can think of is some kind of 'use-case model' that models the 'make a reservation' use case. Perhaps we should name it MakeReservationUseCase. Should it be some kind of Command object? It could be, but I think that this is awkward, because it also needs to communicate with various dependencies, such as RestaurantDatabase, Repository, and Clock. A long-lived service object that can wrap around those dependencies seems like a better option, but then we need a method on that object.

public sealed class MakeReservationUseCase
{
    // What to call this method? What to return? I hate this already.
    public object MakeReservation(/* What to receive here? */)
    {
        throw new NotImplementedException();
    }
}

What do we call such a method? Had this been a true Command object, the single parameterless method would be called Execute, but since I'm planning to work with a stateless service, the method should take arguments. I played with options such as Try, Do, or Go, so that you'd have MakeReservationUseCase.Try and so on. Still, I thought this bordered on 'cute' or 'clever' code, and at the very least not particularly idiomatic C#. So I settled for MakeReservation, but now we have MakeReservationUseCase.MakeReservation, which is clearly redundant. I don't like the direction this design is going.

The next question is: what parameters should this method take?

Considering the above Post method, couldn't we pass the dto object on to the use-case model? Technically, we could, but consider this: The ReservationDto object's raison d'être is to support reception and transmission of JSON objects. As I've already covered in an earlier article series, serialization formats are inextricably coupled to the boundaries of a system.

Imagine that we wanted to decompose the code base into smaller constituent projects. If so, the use-case model should be independent of the ASP.NET-based code. Does it seem reasonable, then, to define the use-case API in terms of a serialization format?

I don't think that's the right design. Perhaps, instead, we should 'explode' the Data Transfer Object (DTO) into its primitive constituents?

public object MakeReservation(
    int restaurantId,
    string? id,
    string? at,
    string? email,
    string? name,
    int quantity)

I'm not too happy about this, either. Six parameters is pushing it, and this is only an example. What if you need to pass more data than that? What if you need to pass a collection? What if each element in that collection contains another collection?

Introduce Parameter Object, you say?

Given that this is the way we want to go (in this demonstration), this seems as though it's the only good option, but that means that we'd have to define another reservation object. Not the (JSON) DTO that arrives at the boundary. Not the Reservation Domain Model, because the data has yet to be validated. A third reservation class. I don't even know what to call such a class...

So I'll leave those six parameters as above, while pointing out that no matter what we do, there seems to be a problem.
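For comparison, here's roughly what Introduce Parameter Object would produce. The class name MakeReservationRequest is hypothetical. It's neither the JSON DTO nor the validated Reservation Domain Model, but a third representation of the same data.

```csharp
#nullable enable
using System;

// Hypothetical Parameter Object: decoupled from JSON serialization,
// but still unvalidated, so it can't be a Domain Model object.
public sealed record MakeReservationRequest(
    int RestaurantId,
    string? Id,
    string? At,
    string? Email,
    string? Name,
    int Quantity);

public sealed class MakeReservationUseCase
{
    // One argument instead of six, but the return-type question remains open.
    public object MakeReservation(MakeReservationRequest request)
    {
        if (request is null)
            throw new ArgumentNullException(nameof(request));

        throw new NotImplementedException();
    }
}
```

The parameter list is shorter, but the number of reservation-shaped types has grown from two to three.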

Return type woes #

What should the MakeReservation method return?

The code in the above Post method returns various ActionResult objects that indicate success or various failures. This isn't an option if we want to decouple MakeReservationUseCase from ASP.NET. How may we instead communicate one of four results?

Many object-oriented programmers might suggest throwing custom exceptions, and that's a real option. If nothing else, it'd be idiomatic in a language like C#. This would enable us to declare the return type as Reservation, but we would also have to define three custom exception types.

There are some advantages to such a design, but it effectively boils down to using exceptions for flow control.
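A minimal sketch of the exception-based alternative might look like this. The exception names and the ToStatusCode method are hypothetical. Integer status codes stand in for ASP.NET's ActionResult to keep the example framework-free. Note that nothing in the method signature tells the caller which exceptions to expect, so the catch blocks rely on convention alone.

```csharp
using System;

// Hypothetical custom exceptions, one per failure outcome.
public sealed class InvalidReservationInputException : Exception
{
    public InvalidReservationInputException(string message) : base(message) { }
}

public sealed class NoSuchRestaurantException : Exception { }

public sealed class NoTablesAvailableException : Exception { }

public static class ExceptionBasedAdapter
{
    // The happy path returns a value. Every other outcome is flow control
    // via exceptions, which the compiler doesn't force callers to handle.
    public static int ToStatusCode<T>(Func<T> makeReservation)
    {
        try
        {
            makeReservation();
            return 201;
        }
        catch (InvalidReservationInputException) { return 400; }
        catch (NoSuchRestaurantException) { return 404; }
        catch (NoTablesAvailableException) { return 500; }
    }
}
```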

Is there a way to model heterogeneous, mutually exclusive values? Another object-oriented staple is to introduce a type hierarchy. You could have four different classes that implement the same interface, or inherit from the same base class. If we go in this direction, then what behaviour should we define for this type? What do all four objects have in common? The only thing that they have in common is that we need to convert them to ActionResult.

We can't, however, have a method like ToActionResult() that converts the object to ActionResult, because that would couple the API to ASP.NET.

You could, of course, use downcasts to check the type of the return value, but if you do that, you might as well leave the method as shown above. If you plan on dynamic type checks and casts, the only base class you need is object.
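This sketch shows the downcast variant, using C# type patterns in a switch expression. The class names are hypothetical, and status codes again stand in for ActionResult. Because the compiler can't know that these four classes are the only possible outcomes, the switch needs a fallback arm, and the parameter might as well be typed as object.

```csharp
using System;

// Hypothetical outcome hierarchy with no shared behaviour.
public abstract class ReservationOutcome { }

public sealed class Succeeded : ReservationOutcome { }

public sealed class InvalidInput : ReservationOutcome
{
    public InvalidInput(string message) => Message = message;

    public string Message { get; }
}

public sealed class NoSuchRestaurant : ReservationOutcome { }

public sealed class NoTablesAvailable : ReservationOutcome { }

public static class DowncastAdapter
{
    // Dynamic type checks: the base class adds nothing that object doesn't.
    public static int ToStatusCode(object outcome) => outcome switch
    {
        Succeeded => 201,
        InvalidInput => 400,
        NoSuchRestaurant => 404,
        NoTablesAvailable => 500,
        // Required, because anyone may add another subclass.
        _ => throw new ArgumentException("Unknown outcome.", nameof(outcome))
    };
}
```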

Visitor #

If only there was a way to return heterogeneous, mutually exclusive data structures. If only C# had sum types...

Fortunately, while C# doesn't have sum types, it is possible to achieve the same goal. Use a Visitor as a sum type.

You could start with a type like this:

public sealed class MakeReservationResult
{
    public T Accept<T>(IMakeReservationVisitor<T> visitor)
    {
        // Implementation to follow...
        throw new NotImplementedException();
    }
}

As usual with the Visitor design pattern, you'll have to inspect the Visitor interface to learn about the alternatives that it supports:

public interface IMakeReservationVisitor<T>
{
    T Success(Reservation reservation);
    T InvalidInput(string message);
    T NoSuchRestaurant();
    T NoTablesAvailable();
}

This enables us to communicate that there are exactly four possible outcomes in a way that doesn't depend on ASP.NET.

The 'only' remaining work on the MakeReservationResult class is to implement the Accept method. Are you ready? Okay, here we go:

public sealed class MakeReservationResult
{
    private readonly IMakeReservationResult imp;
 
    private MakeReservationResult(IMakeReservationResult imp)
    {
        this.imp = imp;
    }
 
    public static MakeReservationResult Success(Reservation reservation)
    {
        return new MakeReservationResult(new SuccessResult(reservation));
    }
 
    public static MakeReservationResult InvalidInput(string message)
    {
        return new MakeReservationResult(new InvalidInputResult(message));
    }
 
    public static MakeReservationResult NoSuchRestaurant()
    {
        return new MakeReservationResult(new NoSuchRestaurantResult());
    }
 
    public static MakeReservationResult NoTablesAvailable()
    {
        return new MakeReservationResult(new NoTablesAvailableResult());
    }
 
    public T Accept<T>(IMakeReservationVisitor<T> visitor)
    {
        return this.imp.Accept(visitor);
    }
 
    private interface IMakeReservationResult
    {
        T Accept<T>(IMakeReservationVisitor<T> visitor);
    }
 
    private sealed class SuccessResult : IMakeReservationResult
    {
        private readonly Reservation reservation;
 
        public SuccessResult(Reservation reservation)
        {
            this.reservation = reservation;
        }
        
        public T Accept<T>(IMakeReservationVisitor<T> visitor)
        {
            return visitor.Success(reservation);
        }
    }
 
    private sealed class InvalidInputResult : IMakeReservationResult
    {
        private readonly string message;
 
        public InvalidInputResult(string message)
        {
            this.message = message;
        }
 
        public T Accept<T>(IMakeReservationVisitor<T> visitor)
        {
            return visitor.InvalidInput(message);
        }
    }
 
    private sealed class NoSuchRestaurantResult : IMakeReservationResult
    {
        public T Accept<T>(IMakeReservationVisitor<T> visitor)
        {
            return visitor.NoSuchRestaurant();
        }
    }
 
    private sealed class NoTablesAvailableResult : IMakeReservationResult
    {
        public T Accept<T>(IMakeReservationVisitor<T> visitor)
        {
            return visitor.NoTablesAvailable();
        }
    }
}

That's a lot of boilerplate code, but it's so automatable that there are programming languages that can do this for you. On .NET, it's called F#, and all of that would be a single line of code.

Use-case implementation #

Implementing MakeReservation is now easy, since it mostly involves moving code from the Controller to the MakeReservationUseCase class, and changing it so that it returns the appropriate MakeReservationResult objects instead of ActionResult objects.

public sealed class MakeReservationUseCase
{
    public MakeReservationUseCase(
        IClock clock,
        IRestaurantDatabase restaurantDatabase,
        IReservationsRepository repository)
    {
        Clock = clock;
        RestaurantDatabase = restaurantDatabase;
        Repository = repository;
    }
 
    public IClock Clock { get; }
    public IRestaurantDatabase RestaurantDatabase { get; }
    public IReservationsRepository Repository { get; }
 
    public async Task<MakeReservationResult> MakeReservation(
        int restaurantId,
        string? id,
        string? at,
        string? email,
        string? name,
        int quantity)
    {
        if (!Guid.TryParse(id, out var rid))
            rid = Guid.NewGuid();
        if (!DateTime.TryParse(at, out var rat))
            return MakeReservationResult.InvalidInput("Invalid date.");
        if (email is null)
            return MakeReservationResult.InvalidInput("Invalid email.");
        if (quantity < 1)
            return MakeReservationResult.InvalidInput("Invalid quantity.");
        var reservation = new Reservation(
            rid,
            rat,
            new Email(email),
            new Name(name ?? ""),
            quantity);
 
        var restaurant = await RestaurantDatabase.GetRestaurant(restaurantId).ConfigureAwait(false);
        if (restaurant is null)
            return MakeReservationResult.NoSuchRestaurant();
 
        using var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
 
        var reservations = await Repository.ReadReservations(restaurant.Id, reservation.At)
            .ConfigureAwait(false);
        var now = Clock.GetCurrentDateTime();
        if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
            return MakeReservationResult.NoTablesAvailable();
 
        await Repository.Create(restaurant.Id, reservation).ConfigureAwait(false);
 
        scope.Complete();
 
        return MakeReservationResult.Success(reservation);
    }
}

I had to re-implement input validation, because the TryParse method is defined on ReservationDto, and the Use-case Model shouldn't be coupled to that class. Still, you could argue that if I'd immediately implemented the use-case architecture, I would never have had the parser defined on the DTO.

Decoupled Controller #

The Controller method may now delegate implementation to a MakeReservationUseCase object:

[HttpPost("restaurants/{restaurantId}/reservations")]
public async Task<ActionResult> Post(int restaurantId, ReservationDto dto)
{
    if (dto is null)
        throw new ArgumentNullException(nameof(dto));
 
    var result = await makeReservationUseCase.MakeReservation(
        restaurantId,
        dto.Id,
        dto.At,
        dto.Email,
        dto.Name,
        dto.Quantity).ConfigureAwait(false);
    return result.Accept(new PostReservationVisitor(restaurantId));
}

While that looks nice and slim, it's not all, because you also need to define the PostReservationVisitor class:

private class PostReservationVisitor : IMakeReservationVisitor<ActionResult>
{
    private readonly int restaurantId;
 
    public PostReservationVisitor(int restaurantId)
    {
        this.restaurantId = restaurantId;
    }
 
    public ActionResult Success(Reservation reservation)
    {
        return Reservation201Created(restaurantId, reservation);
    }
 
    public ActionResult InvalidInput(string message)
    {
        return new BadRequestObjectResult(message);
    }
 
    public ActionResult NoSuchRestaurant()
    {
        return new NotFoundResult();
    }
 
    public ActionResult NoTablesAvailable()
    {
        return NoTables500InternalServerError();
    }
}

Notice that this implementation has to receive the restaurantId value through its constructor, since this piece of data isn't part of the IMakeReservationVisitor API. If only we could have handled all that pattern matching with a simple closure...

Well, you can. You could have used Church encoding instead of the Visitor pattern, but many programmers find that less idiomatic, or not sufficiently object-oriented.
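Here's a minimal sketch of the Church-encoded alternative. Instead of accepting a Visitor, the result exposes a Match method that takes one function per case, so a caller can capture restaurantId in a closure. The Reservation record is a hypothetical, simplified stand-in for the Domain Model class. PostReservationSketch.Describe is also hypothetical, and strings stand in for ActionResult values.

```csharp
using System;

// Hypothetical, simplified stand-in for the Domain Model's Reservation.
public sealed record Reservation(Guid Id, DateTime At, int Quantity);

// Church-encoded sum type: Match takes one function per case.
public abstract class MakeReservationResult
{
    // Private constructor: only the nested cases below can derive from this class.
    private MakeReservationResult() { }

    public abstract T Match<T>(
        Func<Reservation, T> success,
        Func<string, T> invalidInput,
        Func<T> noSuchRestaurant,
        Func<T> noTablesAvailable);

    public static MakeReservationResult Success(Reservation reservation) =>
        new SuccessCase(reservation);
    public static MakeReservationResult InvalidInput(string message) =>
        new InvalidInputCase(message);
    public static MakeReservationResult NoSuchRestaurant() => new NoSuchRestaurantCase();
    public static MakeReservationResult NoTablesAvailable() => new NoTablesAvailableCase();

    private sealed class SuccessCase : MakeReservationResult
    {
        private readonly Reservation reservation;
        public SuccessCase(Reservation reservation) => this.reservation = reservation;
        public override T Match<T>(
            Func<Reservation, T> success, Func<string, T> invalidInput,
            Func<T> noSuchRestaurant, Func<T> noTablesAvailable) => success(reservation);
    }

    private sealed class InvalidInputCase : MakeReservationResult
    {
        private readonly string message;
        public InvalidInputCase(string message) => this.message = message;
        public override T Match<T>(
            Func<Reservation, T> success, Func<string, T> invalidInput,
            Func<T> noSuchRestaurant, Func<T> noTablesAvailable) => invalidInput(message);
    }

    private sealed class NoSuchRestaurantCase : MakeReservationResult
    {
        public override T Match<T>(
            Func<Reservation, T> success, Func<string, T> invalidInput,
            Func<T> noSuchRestaurant, Func<T> noTablesAvailable) => noSuchRestaurant();
    }

    private sealed class NoTablesAvailableCase : MakeReservationResult
    {
        public override T Match<T>(
            Func<Reservation, T> success, Func<string, T> invalidInput,
            Func<T> noSuchRestaurant, Func<T> noTablesAvailable) => noTablesAvailable();
    }
}

public static class PostReservationSketch
{
    // restaurantId is captured by the closure, so no Visitor class is needed.
    public static string Describe(int restaurantId, MakeReservationResult result) =>
        result.Match(
            success: r => $"201 Created: restaurant {restaurantId}, reservation {r.Id:N}",
            invalidInput: message => $"400 Bad Request: {message}",
            noSuchRestaurant: () => "404 Not Found",
            noTablesAvailable: () => "500 Internal Server Error");
}
```

The type is still closed to new cases, because the private constructor prevents subclasses outside MakeReservationResult. The caller, however, no longer needs a dedicated class per interpretation.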

Be that as it may, the Controller now plays the role of an Adapter between the ASP.NET framework and the framework-neutral Use-case Model. Is it all worth it?

Reflection #

Where does that put us? It's certainly possible to decouple the Use-case Model from the specific framework, but at what cost?

In this example, I had to introduce two new classes and one interface, as well as four private implementation classes and a private interface.

And that was to support just one use case. If I want to implement a query (HTTP GET), I would need to go through similar motions, but with slight variations. And again for updates (HTTP PUT). And again for deletes. And again for the next resource, such as the restaurant calendar, daily schedule, management of tenants, and so on.

The cost seems rather substantial to me. Do the benefits outweigh it? What are the benefits?

Well, you now have a technology-neutral application model. You could, conceivably, tear out ASP.NET and replace it with, oh... ServiceStack. Perhaps. Theoretically. I haven't tried.

This strikes me as an argument similar to insisting that hiding database access behind an interface enables us to replace SQL Server with a document database. That rarely happens, and is not really why we do that.

So to be fair, decoupling also protects us from changes in libraries and frameworks. It makes it easier to modify one part of the system without having to worry (too much) about other parts. It makes it easier to subject subsystems to automated testing.

Does the above refactoring improve testability? Not really. MakeReservationUseCase may be testable, but so was the original Controller. The entire code base for Code That Fits in Your Head was developed with test-driven development (TDD). The Controllers are mostly covered by self-hosted integration tests, and bolstered with a few unit tests that directly call their action methods.

Another argument for a decoupled Use-case Model is that it might enable you to transplant the entire application to a new context. Since it doesn't depend on ASP.NET, you could reuse it in an Android or iPhone app. Or a batch job. Or an AI-assisted chat bot. Right? Couldn't you?

I'd be surprised if that were the case. Every application type has its own style of user interaction, and they tend to be incompatible. The user-interface flow of a web application is fundamentally different from a rich native app.

In short, I consider the notion of a technology-neutral Use-case Model to be a distraction. That's why I usually don't bother.

Conclusion #

I usually implement Controllers, message handlers, application entry points, and so on as fairly 'fat Adapters' over particular frameworks. I do that because I expect the particular application or user interaction to be intimately tied to that kind of application. This doesn't mean that I just throw everything in there.

Fat Adapters should still be covered by automated tests. They should still show appropriate decoupling. Usually, I treat each as an Impureim Sandwich. All impure actions happen in the Fat Adapter, and everything else is done by a pure function. Granted, however, this kind of architecture comes much more naturally when you are working in a programming language that supports it.

C# doesn't really, but you can make it work. And that work, contrary to modelling use cases as classes, is, in my experience, well worth the effort.
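For readers unfamiliar with the pattern, here's a minimal sketch of an Impureim Sandwich in C#: impure reads first, then a pure decision, then impure writes. The Booking record, the capacity rule, and the delegate-based dependencies are simplifying assumptions for illustration. They're not the code base's actual MaitreD logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, simplified data type.
public sealed record Booking(DateTime At, int Quantity);

public static class ImpureimSandwich
{
    // Pure: the result depends only on the arguments, and there's no I/O.
    public static bool CanAccommodate(
        int capacity,
        IReadOnlyCollection<Booking> existing,
        Booking candidate) =>
        existing.Where(b => b.At.Date == candidate.At.Date).Sum(b => b.Quantity)
            + candidate.Quantity <= capacity;

    // The sandwich: impure read, pure decision, impure write.
    public static async Task<bool> TryBook(
        Func<DateTime, Task<IReadOnlyCollection<Booking>>> readBookings,
        Func<Booking, Task> writeBooking,
        int capacity,
        Booking candidate)
    {
        var existing = await readBookings(candidate.At);              // Impure
        var accepted = CanAccommodate(capacity, existing, candidate); // Pure
        if (accepted)
            await writeBooking(candidate);                            // Impure
        return accepted;
    }
}
```

All the interesting logic lives in CanAccommodate, which can be tested without Test Doubles. TryBook only orchestrates.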

Comments

Thomas Skovsende #

I do not really strongly disagree, but what would you do in the case where you had multiple entry points for your use cases? I.e., I have an HTTP endpoint that creates a reservation, but I also need to be able to listen to a message bus where reservations can come in. A lot of the logic is the same in both cases.

2025-04-09 09:17 UTC

Mark Seemann #

Thomas, thank you for writing. I'll give you two answers, because I believe that the specific answer doesn't generalize. On the other hand, the specific answer also has merit.

In the particular case, it seems as though an 'obvious' architecture would be to have the HTTP endpoint do the minimal work required to parse the incoming data, and then put it on the message bus together with all the other messages. I could imagine situations where that would be appropriate, but I can also imagine some edge(?) cases where that still couldn't work.

As a general answer, however, having some common function or object that handles both cases could make sense. That's pretty much the architecture I spent this article discouraging. That said, as I tried to qualify, I usually don't run into situations where such an architecture is warranted. Case in point, I haven't run into a scenario like the one you describe. Other people, however, also wrote to tell me that they have two endpoints, such as both gRPC and HTTP, or both SOAP and REST, so while I, personally, haven't seen this much, it clearly does happen.

In short, I don't mind that kind of architecture when it addresses an actual problem. Often, though, that kind of requirement isn't present, and in that case, this kind of 'use-case model' architecture shouldn't be the default.

The reason I also gave you the specific answer is that I often get the impression that people seek general, universal solutions, and this could make them miss elegant shortcuts.

2025-04-13 15:17 UTC


Phased breaking changes

2025-03-17T14:02:00+00:00

Giving advance warning before breaking client code.

I was recently listening to Jimmy Bogard on .NET Rocks! talking about 14 versions of AutoMapper. It made me reminisce about how I dealt with versioning of AutoFixture, in the approximately ten years I helmed that project.

Jimmy has done open source longer than I have, and it sounds as though he's found a way that works for him. When I led AutoFixture, I did things a bit differently, which I'll outline in this article. In no way do I mean to imply that that way was better than Jimmy's. It may, however, strike a chord with a reader or two, so I present it in the hope that some readers may find the following ideas useful.

Scope #

This article is about versioning a code base. Typically, a code base contains 'modules' of a kind, and client code that relies on those modules. In object-oriented programming, modules are often called classes, but in general, what matters in this context is that some kind of API exists.

The distinction between API and client code is most clear if you're maintaining a reusable library, and you don't know the client developers, but even internal application code has APIs and client code. The following may still be relevant if you're working in a code base together with colleagues.

This article discusses code-level APIs. Examples include C# code that other .NET code can call, but the same ideas may also apply to Java objects callable from Clojure, Haskell code callable by other Haskell code, etc. It does not discuss versioning of REST APIs or other kinds of online services. I have, in the past, discussed versioning in such a context, and refer you, among other articles, to REST implies Content Negotiation and Retiring old service versions.

Additionally, some of the techniques outlined here are specific to .NET, or even C#. If, as I suspect, JavaScript or other languages don't have those features, then these techniques don't apply. They're hardly universal.

Semantic versioning #

The first few years of AutoFixture, I didn't use a systematic versioning scheme. That changed when I encountered Semantic Versioning: In 2011 I changed AutoFixture versioning to Semantic Versioning. This forced me to think explicitly about breaking changes.

As an aside, in recent years I've encountered the notion that Semantic Versioning is somehow defunct. This is often based on the observation that Semantic Versioning 2.0.0 was published in 2013. Surely, if no further development has taken place, it's been abandoned by its maintainer? This may or may not be the case. Does it matter?

The original author, Tom Preston-Werner, may have lost interest in Semantic Versioning. Or perhaps it's simply done. Regardless of the underlying reasons, I find Semantic Versioning useful as it is. The fact that it hasn't changed since 2013 may be an indication that it's stable. After all, it's not a piece of software. It's a specification that helps you think about versioning, and in my opinion, it does an excellent job of that.

As I already stated, once I started using Semantic Versioning I began to think explicitly about breaking changes.

Advance warning #

Chapter 10 in Code That Fits in Your Head is about making changes to existing code bases. Unless you're working on a solo project with no other programmers, changes you make impact other people. If you can, avoid breaking other people's code. The chapter discusses some techniques for that, but also briefly covers how to introduce breaking changes. Some of that chapter is based on my experience with AutoFixture.

If your language has a way to retire an API, use it. In Java you can use the @Deprecated annotation, and in C# the equivalent [Obsolete] attribute. In C#, any client code that uses a method with the [Obsolete] attribute will now emit a compiler warning.

By default, this will only be a warning, and there's certainly a risk that people don't look at those. On the other hand, if you follow my advice from Code That Fits in Your Head, you should treat warnings as errors. If you do, however, those warnings emitted by [Obsolete] attributes will prevent your code from compiling. Or, if you're the one who just adorned a method with that attribute, you should understand that you may just have inconvenienced someone else.

Therefore, whenever you add such an attribute, do also add a message that tells client developers how to move on from the API that you've just retired. As an example, here's an (ASP.NET) method that handles GET requests for calendar resources:

[Obsolete("Use Get method with restaurant ID.")]
[HttpGet("calendar/{year}/{month}")]
public ActionResult LegacyGet(int year, int month)

To be honest, that message may be a bit on the terse side, but the point is that there's another method on the same class that takes an additional restaurantId. While I'm clearly not perfect, and should have written a more detailed message, the point is that you should make it as easy as possible for client developers to deal with the problem that you've just given them. My rules for exception messages also apply here.
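One way to make the transition easy is to keep the retired method working by having it delegate to its replacement. The following sketch is framework-free. The CalendarApi class, the string return values, and the DefaultRestaurantId fallback are hypothetical assumptions, not the book's code. Client code that calls LegacyGet still compiles, but the compiler emits warning CS0618 together with the supplied message.

```csharp
using System;

// Hypothetical, framework-free stand-in for the calendar Controller.
public sealed class CalendarApi
{
    // Hypothetical fallback that keeps the retired method working.
    private const int DefaultRestaurantId = 1;

    // First phase: callers get warning CS0618, including this message.
    [Obsolete("Use Get(restaurantId, year, month) instead. LegacyGet always uses the default restaurant and will become a compile error in the next major version.")]
    public string LegacyGet(int year, int month) =>
        Get(DefaultRestaurantId, year, month);

    public string Get(int restaurantId, int year, int month) =>
        $"Calendar for restaurant {restaurantId}: {year}-{month:D2}";
}
```

The message names the replacement, explains what the old method does in the meantime, and announces the next step.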

It's been more than a decade, but as I remember it, in the AutoFixture code base, I kept a list of APIs that I intended to deprecate at the next major revision. In other words, there were methods I considered fair use in a particular major version, but that I planned to phase out over multiple revisions. There were, however, a few methods that I immediately adorned with the [Obsolete] attribute, because I realized that they created problems for people.

The plan, then, was to take it up a notch when releasing a new major version. To be honest, though, I never got to execute the final steps of the plan.

Escalation #

By default, the [Obsolete] attribute emits a warning, but by supplying true as a second argument, you can turn the warning into a compiler error.

[Obsolete("Use Get method with restaurant ID.", true)]
[HttpGet("calendar/{year}/{month}")]
public ActionResult LegacyGet(int year, int month)

You could argue that for people who treat warnings as errors, even a warning is a breaking change, but there can be no discussion that when you flip that bit, this is certainly a breaking change.

Thus, you should only escalate to this level when you publish a new major release.

Code already compiled against previous versions of your deprecated code may still work, but that's it. Code isn't going to compile against an API deprecated like that.

That's the reason it's important to give client developers ample warning.

With AutoFixture, I personally never got to that point, because I'm not sure that I arrived at this deprecation strategy until major version 3, which then had a run from early 2013 to late 2017. In other words, the library had a run of 4½ years without breaking changes. And when major version 4 rolled around, I'd left the project.

Even after setting the error flag to true, code already compiled against earlier versions may still be able to run against newer binaries. Thus, you still need to keep the deprecated API around for a while longer. Completely removing a deprecated method should only happen in yet another major version release.
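Old binaries keep working because [Obsolete] is metadata that the compiler enforces. The runtime doesn't check it. The following sketch demonstrates this, using reflection as a stand-in for client code that was compiled before the escalation. The CalendarApi class is hypothetical. Calling LegacyGet directly in source code would fail with compiler error CS0619, but the reflective call succeeds.

```csharp
#nullable enable
using System;
using System.Reflection;

public sealed class CalendarApi
{
    // Escalated: any source code that calls this method fails to compile (CS0619).
    [Obsolete("Use Get(restaurantId, year, month) instead.", true)]
    public string LegacyGet(int year, int month) => Get(1, year, month);

    public string Get(int restaurantId, int year, int month) =>
        $"Calendar for restaurant {restaurantId}: {year}-{month:D2}";
}

public static class BinaryCompatibilityDemo
{
    // Reflection stands in for a client compiled against an earlier version:
    // the method still exists in the binary, and the runtime happily calls it.
    public static (bool IsError, string Result) InvokeLegacyGet()
    {
        MethodInfo method = typeof(CalendarApi).GetMethod("LegacyGet")!;
        ObsoleteAttribute attribute = method.GetCustomAttribute<ObsoleteAttribute>()!;
        var result = (string)method.Invoke(new CalendarApi(), new object[] { 2025, 3 })!;
        return (attribute.IsError, result);
    }
}
```

This is why deleting the method is a separate, later step: until then, previously compiled clients still depend on it.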

Conclusion #

To summarize, deprecating an API could be considered a breaking change. If you take that position, imagine that your current Semantic Version is 2.44.2. Deprecating a method would then require that you release version 3.0.0.

In any case, you make some more changes to your code, reaching version 3.5.12. For various reasons, you decide to release version 4.0.0, in which you can also turn the error flag on. Even so, the deprecated API remains in the library.

Only in version 5.0.0 can you entirely delete it.

Depending on how often you change major versions, this whole process may take years. I find that appropriate.


Appeal to aithority

2025-03-10T14:40:00+00:00

No, it's not a typo.

A few months ago, I was listening to a semi-serious programme from the Danish public service radio. This is a weekly programme about language that I always listen to as a podcast. The host is the backbone of the show, but in addition to new guests each week, he's flanked by a regular expert who is highly qualified to answer questions about etymology, grammar, semantics, etc.

In the episode I'm describing, the expert got a question that a listener had previously emailed. To answer, (s)he started like this (and I'm paraphrasing): I don't actually know the answer to this question, so I did what everyone does these days, when they don't know the answer: I asked ChatGPT.

(S)he then proceeded to read aloud what ChatGPT had answered, and concluded with some remarks along the lines that that answer sounded quite plausible.

If I used ten to twenty hours of my time re-listening to every episode from the past few months, I could find the particular episode, link to it, transcribe the exact words, and translate them to English to the best of my abilities. I am, however, not going to do that. First, I'm not inclined to use that much time writing an essay on which I make no income. Second, my aim is not to point fingers at anyone in particular, so I'm being deliberately vague. As you may have noticed, I've even masked the person's sex. Not because I don't remember, but to avoid singling out anyone.

The expert in question is a regular of the programme, and I've heard him or her give good and knowledgeable answers to many tricky questions. As far as I could tell, this particular question was unanswerable, along the lines of why is 'table' called 'table' rather than 'griungth'?

The correct answer would have been I don't know, and I don't think anyone else does.

Being a veteran of the programme, (s)he must have realized beforehand that this wouldn't be good radio, and instead decided to keep it light-hearted.

I get that, and I wouldn't be writing about it now if it didn't look like an example of an increasing trend.

People are using large language models (LLMs) to advocate for their positions.

Appeal to authority #

Appeal to authority is no new technique in discourse.

"You may also, should it be necessary, not only twist your authorities, but actually falsify them, or quote something which you have invented entirely yourself. As a rule, your opponent has no books at hand, and could not use them if he had."

The Art of Being Right, Arthur Schopenhauer, 1831

This seems similar to how people have begun using so-called artificial intelligence (AI) to do their arguing for them. We may, instead, call this appeal to aithority.

Epistemological cul-de-sac #

We've all seen plenty of examples of LLMs being wrong. I'm not going to tire you with any of those here, but I did outline my experience with GitHub Copilot in 2022. While these technologies may have made some advances since then, they still make basic mistakes.

Not only that. They're also non-deterministic. Ask a system a question once, and you get one answer. Ask the same question later, and you may get a variation of the same answer, or perhaps even a contradictory answer. If someone exhibits an answer they got from an LLM as an argument in their favour, consider that they may have been asking it five or six times before they received an answer they liked.

Finally, you can easily ask leading questions. Even if someone shows you a screen shot of a chat with an LLM, they may have clipped prior instructions that nudge the system towards a particular bias.

I've seen people post screen shots that an LLM claims that F# is a better programming language than C#. While I'm sympathetic to that claim, that's not an argument. Just like how you feel about something isn't an argument.

This phenomenon seems to be a new trend. People use answers from LLMs as evidence that they are right. I consider this an epistemological dead end.

Real authority #

Regular readers of this blog may have noticed that I often go to great lengths to track down appropriate sources to cite. I do this for several reasons. One is simply out of respect for the people who figured out things before us. Another reason is to strengthen my own arguments.

It may seem that I, too, appeal to authority. Indeed, I do. When not used in the way Schopenhauer describes, citing authority is a necessary epistemological shortcut. If someone who knows much about a particular subject has reached a conclusion based on his or her work, we may (tentatively) accept the conclusion without going through all the same work. As Carl Sagan said, "If you wish to make an apple pie from scratch, you must first invent the universe." You can't do all basic research by yourself. At some point, you'll have to take expert assertions at face value, because you don't have the time, the education, or the money to build your own Large Hadron Collider.

Don't blindly accept an argument on the sole grounds that someone famous said something, but on the other hand, there's no reason to dismiss out of hand what Albert Einstein had to say about gravity, just because you've heard that you shouldn't accept an argument based on appeal to authority.

Conclusion #

I'm concerned that people increasingly seem to resort to LLMs to argue a case. The LLM says this, so it must be right.

Sometimes, people will follow up their arguments with of course, it's just an AI, but... and then proceed to unfold their preferred argument. Even if this seems as though the person is making a 'real' argument, starting from an LLM answer establishes a baseline for the discussion. It still lends an aura of truth to something that may be false.


Reactive monad

2025-03-03T09:30:00+00:00

IObservable<T> is (also) a monad.

This article is an instalment in an article series about monads. While the previous articles showed, in great detail, how to turn various classes into monads, this article mostly serves as a place-holder. The purpose is only to point out that you don't have to create all monads yourself. Sometimes, they come as part of a reusable library.

Rx is one such library, and IObservable<T> forms a monad. Reactive Extensions for .NET defines a SelectMany method for IObservable<T>, so if source is an IObservable<int>, you can translate it to IObservable<char> like this:

IObservable<char> dest = source.SelectMany(i => Observable.Repeat('x', i));

Since the SelectMany method is, indeed, called SelectMany and has the signature

public static IObservable<TResult> SelectMany<TSource, TResult>(
    this IObservable<TSource> source,
    Func<TSource, IObservable<TResult>> selector)

you can also use C#'s query syntax:

IObservable<char> dest = from i in source
                         from x in Observable.Repeat('x', i)
                         select x;

In both of the above examples, I've explicitly declared the type of dest instead of using the var keyword. There's no practical reason to do this; I only did it to make the type clear to you.

Left identity #

As I've already written time and again, a few test cases don't prove that any of the monad laws hold, but they can help illustrate what they imply. For example, here's an illustration of the left-identity law, written as a parametrized xUnit.net test:

[Theory]
[InlineData(1)]
[InlineData(2)]
[InlineData(3)]
public async Task LeftIdentity(int a)
{
    IObservable<char> h(int i) => Observable.Repeat('x', i);
 
    IList<char>  left = await Observable.Return(a).SelectMany(h).ToList();
    IList<char> right = await h(a).ToList();
 
    Assert.Equal(left, right);
}

Not only does the System.Reactive library define monadic bind in the form of SelectMany, but also return, with the aptly named Observable.Return function. .NET APIs often forget to do so explicitly, which means that I often have to go hunting for it, or guess what the developers may have called it. Not here; thank you, Rx team.

Right identity #

In the same spirit, we may write another test to illustrate the right-identity law:

[Theory]
[InlineData("foo")]
[InlineData("bar")]
[InlineData("baz")]
public async Task RightIdentity(string a)
{
    IObservable<char> f(string s) => s.ToObservable();
 
    IObservable<char> m = f(a);
    IList<char>  left = await m.SelectMany(Observable.Return).ToList();
    IList<char> right = await m.ToList();
 
    Assert.Equal(left, right);
}

In both this and the previous test, you can see that the test has to await the observables in order to verify that the resulting collections are identical. Clearly, if you're instead dealing with infinite streams of data, you can't rely on such a simplifying assumption. For the general case, you must instead turn to other (proof) techniques to convince yourself that the laws hold. That's not my agenda here, so I'll skip that part.

Associativity #

Finally, we may illustrate the associativity law like this:

[Theory]
[InlineData("foo")]
[InlineData("123")]
[InlineData("4t2")]
public async Task Associativity(string a)
{
    IObservable<char> f(string s) => s.ToObservable();
    IObservable<byte> g(char c)
    {
        if (byte.TryParse(c.ToString(), out var b))
            return Observable.Return(b);
        else
            return Observable.Empty<byte>();
    }
    IObservable<bool> h(byte b) => Observable.Repeat(b % 2 == 0, b);
 
    IObservable<char> m = f(a);
    IList<bool>  left = await m.SelectMany(g).SelectMany(h).ToList();
    IList<bool> right = await m.SelectMany(x => g(x).SelectMany(h)).ToList();
 
    Assert.Equal(left, right);
}

This test composes three observable-producing functions in two different ways, to verify that they produce the same values.

The first function, f, simply turns a string into an observable stream. For example, the string "foo" becomes the stream of characters 'f', 'o', 'o'.

The next function, g, tries to parse the incoming character as a number. I've chosen byte as the data type, since there's no reason to try to parse a value that can, at best, be one of the digits 0 to 9 into a full 32-bit integer. A byte is already too large. If the character can be parsed, it's published as a byte value; if not, an empty stream of data is returned. For example, the character stream 'f', 'o', 'o' results in three empty streams, whereas the stream '4', 't', '2' produces one singleton stream containing the byte 4, followed by an empty stream, followed again by a stream containing the single number 2.

The third and final function, h, turns a number into a stream of Boolean values; true if the number is even, and false if it's odd. The number of values is equal to the number itself. Thus, when composed together, "123" becomes the stream false, true, true, false, false, false. This is true for both the left and the right list, even though they're results of two different compositions.

Conclusion #

The point of this article is mostly that monads are commonplace. While you may discover them in your own code, they may also come in a reusable library. If you already know C# LINQ based off IEnumerable<T>, parts of Rx will be easy for you to learn. After all, it's the same abstraction, and familiar abstractions make code readable.
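The claim about IEnumerable<T> is easy to try out. Here's a sketch that translates the f, g, and h functions from the associativity test to LINQ to Objects, swapping Observable.Return, Observable.Empty, and Observable.Repeat for an array, Enumerable.Empty, and Enumerable.Repeat. The class name EnumerableMonadSketch and its method names are mine, and since sequences are pulled rather than pushed, this illustrates the shared shape of the abstraction rather than Rx itself.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableMonadSketch
{
    // f: a string already is a sequence of characters.
    public static IEnumerable<char> F(string s) => s;

    // g: parse a character as a digit; an empty sequence if it isn't one.
    public static IEnumerable<byte> G(char c) =>
        byte.TryParse(c.ToString(), out var b)
            ? new[] { b }
            : Enumerable.Empty<byte>();

    // h: repeat 'is even' as many times as the number itself.
    public static IEnumerable<bool> H(byte b) => Enumerable.Repeat(b % 2 == 0, b);

    // Bind g first, then bind h to the result.
    public static List<bool> Left(string a) =>
        F(a).SelectMany(G).SelectMany(H).ToList();

    // Bind a composed function that binds g, then h, for each element.
    public static List<bool> Right(string a) =>
        F(a).SelectMany(x => G(x).SelectMany(H)).ToList();
}
```

For the input "123", both Left and Right return false, true, true, false, false, false, the same values the observable version produces.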

Next: The IO monad.


Easier encapsulation with static types

2025-02-24T14:05:00+00:00

A metaphor.

While I'm still struggling with the notion that dynamically typed languages may have compelling advantages, I keep coming back to the benefits of statically typed languages. One such benefit is how it enables the communication of contracts, as I recently discussed in Encapsulating rod-cutting.

As usual, I base my treatment of encapsulation on my reading of Bertrand Meyer's seminal Object-Oriented Software Construction. A major aspect of encapsulation is the explicit establishment of contracts. What is expected of client code before it can invoke an operation (preconditions)? What is guaranteed to be true after the operation completes (postconditions)? And what is always true of a particular data structure (invariants)?

Contracts constitute the practical part of encapsulation. A contract can give you a rough sense of how well-encapsulated an API is: The more statements you can utter about the contract, the better the encapsulation. You may even be able to take all those assertions about the contract and implement them as property-based tests. In other words, if you can think of many properties to write as tests, the API in question probably has good encapsulation. If, on the other hand, you can't think of a single precondition, postcondition, or invariant, this may indicate that encapsulation is lacking.

While the overall notion of encapsulation provides guidance on how to achieve it, specific contracts describe what is possible, and how to successfully interact with an API. In short, the what and the how.

They don't, however, explain why encapsulation is valuable.

Why encapsulate? #

Successful code bases are big. Such a code base rarely fits in your head in its entirety. And the situation is only exacerbated by multiple programmers working concurrently on the code. Even if you knew most of the code base by heart, your team members are changing it, and you aren't going to be aware of all the modifications.

Encapsulation offers a solution to this problem. Instead of knowing every detail of the entire code base, encapsulation should enable you to interact with an API (originally, an object) without knowing all the implementation details. This is the raison d'être of contracts. Ideally, knowing the contract and the purpose of an object and its methods should be enough.

Imagine that you've designed an API with a strong contract. Is your work done? Not yet. Somehow, you'll have to communicate the contract to all present and future client developers.

How do you convey a contract to potential users? I can think of a few ways. Good names are important, but only skin-deep. You can also publish documentation, or use the type system. The following metaphor explores those two alternatives.

Doing a puzzle #

When I was a boy, I had a puzzle called Das verflixte Hunde-Spiel, which roughly translates to the confounded dog game. I've previously described the game and an algorithm for solving it, but that's not my concern here. Rather, I'd like to discuss how one might abstract the information carried by each tile.

As the picture suggests, the game consists of nine square tiles, each with two dog heads and two tails. The objective of the puzzle is to lay all nine tiles in a three-by-three grid such that all the heads fit the opposing tails. The dogs come in four different colour patterns, and each head must fit a tail of the same pattern.

It turns out that there are umpteen variations of this kind of puzzle. This one has cartoon dogs, but you can find similar games with frogs, cola bottles, playing card suits, trains, ladybirds, fast food, flowers, baseball players, owls, etc. This suggests that a generalization may exist. Perhaps an abstraction, even.

"Abstraction is the elimination of the irrelevant and the amplification of the essential"

Robert C. Martin, Designing Object-Oriented C++ Applications Using The Booch Method, ch. 00

How to eliminate the irrelevant and amplify the essential of a tile?

To recapitulate, a single tile looks like this:

In a sense, we may regard most of the information on such a tile as 'implementation details'. In a code metaphor, imagine looking at a tile like this as being equivalent to looking at the source code of a method or function (i.e. API). That's not the essence we need to correctly assemble the puzzle.

Imagine that you have to lay down the tiles according to a known solution. Since you already know the solution, this task only involves locating and placing each of the nine tiles. In this case, there are only nine tiles, each with four possible rotations, so if you already know what you're looking for, that is, of course, a tractable endeavour.

Now imagine that you'd like to put the tiles together without having to navigate by the full information content of each tile. In programming, we often need to do this. We have to identify objects that are likely to perform some subtask for us, and we have to figure out how to interact with such an object to achieve our goals. Preferably, we'd like to be able to do this without having to read all the source code of the candidate object. Encapsulation promises that this should be possible.

The backside of the tiles #

If we want to eliminate the irrelevant, we have to hide the information on each tile. As a first step, consider what happens if we flip the tiles around so that we only see their backs.

Obviously, each backside is entirely devoid of information, which means that we're now flying blind. Even if we know how to solve the puzzle, our only recourse is to blindly pick and rotate each of the nine tiles. As the previous article calculated, when picking at random, the odds of arriving at any valid solution are 1 to 5,945,425,920. Not surprisingly, total absence of information doesn't work.

We already knew that, because, while we want to eliminate the irrelevant, we also must amplify the essential. Thus, we need to figure out what that might be.

Perhaps we could write the essential information on the back of each tile. In the metaphor, this would correspond to writing documentation for an API.

Documentation #

To continue the metaphor, I asked various acquaintances to each 'document' a tile. I deliberately only gave them the instruction that they should enable me to assemble the puzzle based on what was on the back of each tile. Some asked for additional directions, as to format, etc., but I refused to give any. People document code in various different ways, and I wanted to capture similar variation. Let's review some of the documentation I received.

Since I asked around among acquaintances, all respondents were Danes, and some chose to write the documentation in Danish, as is the case with this one.

Unless you have an explicit, enforced policy, you might run into a similar situation in software documentation. I've seen more than one example of software documentation written in Danish, simply because the programmer who wrote it didn't consider anything other than his or her native language. I'm sure most Europeans have similar experiences.

The text on the tile says, from the top and clockwise:

light brown dog/light snout/dark ears
dark brown, white/snout
orange tail/brown spots on/back
orange tail/orange back

Notice the disregard for capitalization rules or punctuation, a tendency among programmers that I've commented upon in Code That Fits in Your Head.

In addition to the text, the back of the above tile also includes six arrows. Four of them ought to be self-evident, but can you figure out what the two larger arrows indicate?

It turns out that the person writing this piece of documentation eventually realized that the description should be mirrored, because it was on the backside of the tile. To be fair to that person, I'd asked everyone to write with a permanent marker or pen, so correcting a mistake involved either a 'hack' like the two arrows, or starting over from scratch.

Let's look at some more 'documentation'. Another tile looks like this:

At first glance, I thought those symbols were Greek letters, but once you look closer, you start to realize what's going on. In the upper right corner, you see a stylized back and tail. Likewise, the lower left corner has a stylized face in the form of a smiley. The lines then indicate which sides have a head and which have a tail.

Additionally, each side is encoded with a letter. I'll leave it as an exercise for the reader to figure out what G and B indicate, but also notice the two examples of a modified R. The one to the right indicates red with spots, and the other one uses the minus symbol to indicate red without spots.

On the one hand, this example does an admirable job of eliminating the irrelevant, but you may also find that it errs on the side of terseness. At the very least, it demands of the reader that he or she is already intimately familiar with the overall problem domain. You have to know the game well enough to be able to figure out that R- probably means red without spots.

Had this been software documentation, we might have been less than happy with this level of information. It may meet formal requirements, but is perhaps too idiosyncratic or esoteric.

Be that as it may, it's also possible to err on the other side.

In this example, the person writing the documentation essentially copied and described every detail on the front of the tile. Having no colours available, the person instead chose to describe in words the colour of each dog. Metaphorically speaking, the documentation replicates the implementation. It doesn't eliminate any irrelevant detail, and thereby it also fails to amplify the essential.

Here's another interesting example:

The text is in Danish. From the top clockwise, it says:

dark brown dog with blue collar
light brown dog with red collar
brown dog with small spots on back
Brown dog with big spots on back

Notice how the person writing this was aware that a tile has no natural up or down. Instead, each side is described with letters facing up when that side is up. You have to rotate the documentation in order to read all four sides. You may find that impractical, but I actually consider that to represent something essential about each tile. To me, this is positive.

Even so, an important piece of information is missing. It doesn't say which sides have heads, and which ones have tails.

Finally, here's one that, in my eyes, amplifies the essential and eliminates the irrelevant:

Like the previous example, you have to rotate the documentation in order to read all four sides, but the text is much terser. If you ask me, Grey head, Burnt umber tail, Brown tail, and Spotted head amplifies the essential and eliminates everything else.

Notice, however, how inconsistent the documentation is. People chose different ways to communicate what they found important. Some people inadvertently left out essential information. Other people provided too much information. Some people never came through, so in a few cases, documentation was entirely absent. And finally, as I've already hinted, most people forgot to 'mirror' the information, but a few did remember.

All of this has direct counterparts in software documentation. The level of detail you get from documentation varies greatly, and often, the information that I actually care about is absent: Can I call this method with a negative number? Does the input string need to be formatted in a particular way? Does the method ever return null? Which exceptions may it throw?

I'm not against documentation, but it has limitations. As far as I can tell, though, that's your only option if you're working in a dynamically typed language.

Static types with limited expression #

Can you think of a way to constrain which puzzle pieces fit together with other pieces?

That's how jigsaw puzzles work. As a first attempt, we may try to cut out the pieces like this:

This does help some, because when presented with the subtask of locating the next piece, at least we no longer have to try it in four different rotations. Instead, we have only two options. Perhaps we'll choose to lay down the next piece like this:

You may also decide to rotate the right piece ninety degrees clockwise, but those are your only two rotation options.

We may decide to encode the shape of the pieces so that, say, the tabs indicate heads and the indentations indicate tails. This, at least, prevents us from joining head with head, or tail with tail.

This strikes me as an apt metaphor for C, or how many programmers use the type systems of C# or Java. It does prevent some easily preventable mistakes, but the types still don't carry enough information to enable you to identify exactly the pieces you need.

More expressive static types #

Static type systems come in various forms, and some are more expressive than others. To be honest, C#'s type system does come with good expressive powers, although it tends to require much ceremony. As far as I can tell, Java's type system is on par with C#. Let's assume that we either take the time to jump through the hoops that make these type systems expressive, or that we're using a language with a more expressive type system.

In the puzzle metaphor, we may decide to give a tile this shape:

Such a shape encodes all the information that we need, because each tab or indentation has a unique shape. We may not even have to remember exactly what a square indentation represents. If we're presented with the above tile and asked to lay down a compatible tile, we have to find one with a square tab.

Encoding the essential information into tile shapes enables us not only to prevent mistakes, but also to identify the correct composition of all the tiles.
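To connect the metaphor back to code, here's one hypothetical way to reduce a tile to its essential information in C#: four edges, each with a pattern and a head-or-tail marker. The type and member names are mine, and the pattern names are borrowed from the terse 'Grey head, Burnt umber tail, Brown tail, Spotted head' documentation above, so they may not match the real game. Note that Fits is a run-time check, whereas the uniquely shaped tabs in the metaphor correspond to what a type checker enforces at compile time. The sketch only shows which information such types would need to carry.

```csharp
// Hypothetical model; the pattern names are assumptions taken from one tile's documentation.
public enum Pattern { Grey, BurntUmber, Brown, Spotted }

public enum Half { Head, Tail }

// One side of a tile: a dog pattern, and whether that side shows a head or a tail.
public sealed record Edge(Pattern Pattern, Half Half)
{
    // Edges fit when the patterns match and exactly one of them is a head.
    public bool Fits(Edge other) => Pattern == other.Pattern && Half != other.Half;
}

// A tile reduced to its four edges, listed clockwise from the top.
public sealed record Tile(Edge Top, Edge Right, Edge Bottom, Edge Left)
{
    // Turning the tile 90 degrees clockwise moves each edge one step clockwise:
    // the old left edge becomes the new top, the old top becomes the new right, and so on.
    public Tile RotateClockwise() => new(Left, Top, Right, Bottom);

    // Can this tile be placed immediately to the right of 'neighbour'?
    public bool FitsRightOf(Tile neighbour) => Left.Fits(neighbour.Right);

    // Can this tile be placed immediately below 'neighbour'?
    public bool FitsBelow(Tile neighbour) => Top.Fits(neighbour.Bottom);
}
```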

For years, I've thought about static types as shapes of objects or functions. For practical purposes, static types can't express everything an operation may do, but I find it useful to use a good type system to my advantage.

Code examples #

You may find this a nice metaphor, and still fail to see how it translates to actual code. I'm not going to go into details here, but rather point to existing articles that give some introductions.

One place to start is to refactor from primitive obsession to domain models. Just wrapping a string or an integer in a predicative type helps communicate the purpose and constraints of a data type. Consider a constructor like this:

public Reservation(
    Guid id,
    DateTime at,
    Email email,
    Name name,
    NaturalNumber quantity)

While hardly sophisticated, it already communicates much information about preconditions for creating a Reservation object. Some of the constituent types (Guid and DateTime) are built in, so likely well-known to any developer working on a relevant code base. If you're wondering whether you can create a reservation with a negative quantity, the types already answer that.
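The NaturalNumber type itself isn't shown here. One plausible shape is a predicative type whose constructor rejects anything that isn't a natural number, so that every NaturalNumber in circulation is known to be valid. Here's a minimal sketch, assuming that natural numbers start at 1 (a definition that includes zero would change the guard). It's an illustration, not the actual type from the code base in question.

```csharp
using System;

// Hypothetical predicative type: an int that is known to be 1 or greater.
public sealed record NaturalNumber
{
    public NaturalNumber(int value)
    {
        // Guard clause: reject invalid values at the boundary.
        if (value < 1)
            throw new ArgumentOutOfRangeException(
                nameof(value), value, "A natural number must be 1 or greater.");
        Value = value;
    }

    // No setter or init accessor, so an instance can't later be made invalid.
    public int Value { get; }
}
```

With such a type, a constructor parameter like quantity can't be zero or negative; the check happens once, where an int becomes a NaturalNumber, rather than in every method that uses it.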

Languages with native support for sum types let you easily model mutually exclusive, heterogeneous closed type hierarchies, as shown in this example:

type PaymentService = { Name : string; Action : string }
 
type PaymentType =
| Individual of PaymentService
| Parent of PaymentService
| Child of originalTransactionKey : string * paymentService : PaymentService

And if your language doesn't natively support sum types, you can emulate them with the Visitor design pattern.
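As an illustration of the general technique, here's a sketch of how the F# PaymentType above might be emulated in C# with a Visitor. The interface, class, and visitor names are mine, chosen to mirror the F# cases. IPaymentType can still be implemented elsewhere, but any implementation can only dispatch to one of the three Visit methods, so consumers only ever see three cases.

```csharp
// Hypothetical C# emulation of the F# PaymentType sum type.
public sealed record PaymentService(string Name, string Action);

// One Visit method per case. Adding a case means changing this interface,
// which forces every visitor to handle it.
public interface IPaymentTypeVisitor<T>
{
    T VisitIndividual(PaymentService paymentService);
    T VisitParent(PaymentService paymentService);
    T VisitChild(string originalTransactionKey, PaymentService paymentService);
}

public interface IPaymentType
{
    T Accept<T>(IPaymentTypeVisitor<T> visitor);
}

public sealed class Individual : IPaymentType
{
    private readonly PaymentService paymentService;

    public Individual(PaymentService paymentService) =>
        this.paymentService = paymentService;

    public T Accept<T>(IPaymentTypeVisitor<T> visitor) =>
        visitor.VisitIndividual(paymentService);
}

public sealed class Parent : IPaymentType
{
    private readonly PaymentService paymentService;

    public Parent(PaymentService paymentService) =>
        this.paymentService = paymentService;

    public T Accept<T>(IPaymentTypeVisitor<T> visitor) =>
        visitor.VisitParent(paymentService);
}

public sealed class Child : IPaymentType
{
    private readonly string originalTransactionKey;
    private readonly PaymentService paymentService;

    public Child(string originalTransactionKey, PaymentService paymentService)
    {
        this.originalTransactionKey = originalTransactionKey;
        this.paymentService = paymentService;
    }

    public T Accept<T>(IPaymentTypeVisitor<T> visitor) =>
        visitor.VisitChild(originalTransactionKey, paymentService);
}

// Example visitor: plays the role of an exhaustive pattern match.
public sealed class DescribePayment : IPaymentTypeVisitor<string>
{
    public string VisitIndividual(PaymentService paymentService) =>
        $"Individual {paymentService.Name} ({paymentService.Action})";

    public string VisitParent(PaymentService paymentService) =>
        $"Parent {paymentService.Name} ({paymentService.Action})";

    public string VisitChild(string originalTransactionKey, PaymentService paymentService) =>
        $"Child of {originalTransactionKey}: {paymentService.Name} ({paymentService.Action})";
}
```

It takes considerably more ceremony than the four lines of F#, but the compiler still ensures that every visitor handles every case.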

You can, in fact, do some quite sophisticated tricks even with .NET's type system.

Conclusion #

People often argue about static types with the assumption that their main use is to prevent mistakes. They can help do that, too, but I also find static types an excellent communication medium. The benefit of using a static type system to model contracts is that, when a type system is already part of a language, it's a consistent, formalized framework for communication. Instead of inconsistent and idiosyncratic documentation, you can embed much information about a contract in the types of an API.

And indeed, not only can the types help communicate pre- and postconditions. The type checker also prevents errors.

A sufficiently sophisticated type system carries more information than most people realize. When I write Haskell code, I often need to look up a function. Unlike in other languages, I don't try to search for a function by guessing what name it might have. Rather, the Hoogle search engine enables you to search for a function by type.

Types are shapes, and shapes are like outlines of objects. Used well, they enable you to eliminate the irrelevant, and amplify the essential information about an API.


In defence of multiple WiP

2025-02-17T08:52:00+00:00

Programming isn't like factory work.

I was recently stuck on a programming problem. Specifically, part two of an Advent of Code puzzle, if you must know. As is my routine, I went for a run, which always helps to get unstuck. During the few hours away from the keyboard, I'd had a new idea. When I returned to the computer, I had my new algorithm implemented in about an hour, and it calculated the correct result in sub-second time.

I'm not writing this to brag. On the contrary, I suck at Advent of Code (which is a major reason that I do it). The point is rather that programming is fundamentally non-linear in effort. Not only are some algorithms orders of magnitude faster than other algorithms, but it's also the case that the amount of time you put into solving a problem doesn't always correlate with the outcome.

Sometimes, the most productive way to solve a problem is to let it rest and go do something else.

One-piece flow #

Doesn't this conflict with the ideal of one-piece flow? That is, that you should only have one piece of work in progress (WiP).

Yes, it does.

It's not that I don't understand basic queue theory, haven't read The Goal, or that I'm unaware of the compelling explanations given by, among other people, Henrik Kniberg. I do, myself, lean (pun intended) towards lean software development.

I only offer the following as a counterpoint to other voices. As I've described before, when I seem to disagree with the mainstream view on certain topics, the explanation may rather be that I'm concerned with a different problem than other people are. If your problem is a dysfunctional organization where everyone has dozens of tasks in progress, where nothing ever gets done because it's considered more important to start new work items than to complete ongoing work, and where 'utilization' is at 100% because of 'efficiency', then yes, I'd also recommend limiting WiP.

The idea in one-piece flow is that you're only working on one thing at a time.

Perhaps you can divide the task into subtasks, but you're only supposed to start something new when you're done with the current job. Compared to the alternative of starting a lot of concurrent tasks in order to deal with wait times in the system, I agree with the argument that this is often better. One-piece flow often prompts you to take a good, hard look at where and how delays occur in your process.

Even so, I find it ironic that most of 'the Lean squad' is so busy blaming Taylorism for everything that's wrong with many software development organizations, only to go advocate for another management style rooted in factory work.

Programming isn't manufacturing.

Urgent or important #

As Eisenhower quoted an unnamed college president:

"I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent."

It's hard to overstate how liberating it can be to ignore the urgent and focus on the important. Over the decades, I've kept returning to the realization that you often reach the best solutions to software problems by letting them stew.

I'm sure I've already told the following story elsewhere, but it bears repeating. Back in 2009 I started an open-source project called AutoFixture and also managed to convince my then-employer, Safewhere, to use it in our code base.

Maintaining or evolving AutoFixture wasn't my job, though. It was a work-related hobby, so nothing related to it was urgent. When in the office, I worked on Safewhere code, but biking back and forth between home and work, I thought about AutoFixture problems. Often, these problems would be informed by how we used it in Safewhere. My point is that the problems I was thinking about were real problems that I'd encountered in my day job, not just something I'd dreamt up for fun.

I was mostly thinking about API designs. Given that this was ideally a general-purpose open-source project, I didn't want to solve narrow problems with specific solutions. I wanted to find general designs that would address not only the immediate concerns, but also other problems that I had yet to discover.

Many an evening I spent trying out an idea I'd had on my bicycle. Often, it turned out that the idea wouldn't work. While that might be, in one sense, dismaying, on the other hand, it only meant that I'd learned about yet another way that didn't work.

Because there was no pressure to release a new version of AutoFixture, I could take the time to get it right. (After a fashion. You may disagree with the notion that AutoFixture is well-designed. I designed its APIs to the best of my abilities during the decade I led the project. And when I discovered property-based testing, I passed on the reins.)

Get there earlier by starting later #

There's a 1944 science fiction short story by A. E. van Vogt called Far Centaurus that I'm now going to spoil.

In it, four astronauts embark on a 500-year journey to Alpha Centauri, using suspended animation. When they arrive, they discover that the system is long settled, from Earth.

During their 500 years en route, humans invented faster space travel. Even though later generations started later, they arrived earlier. They discovered a better way to get from a to b.

Compared to one-piece flow, we may illustrate this metaphor like this:

When presented with a problem, we don't start working on it right away. Or, we do, but the work we do is thinking rather than typing. We may even do some prototyping at that stage, but if no good solution presents itself, we put away the problem for a while.

We may return to the problem from time to time, and what may happen is that we realize that there's a much better, quicker way of accomplishing the goal than we first believed (as, again, recently happened to me). Once we have that realization, we may initiate the work, and it may even turn out that we're done earlier than if we'd immediately started hacking at the problem.

By starting later, we've learned more. Like much knowledge work, software development is a profoundly non-linear endeavour. You may find a new way of doing things that is orders of magnitude faster than what you originally had in mind. Not only in terms of big-O notation, but also in terms of implementation effort.

When doing Advent of Code, I've repeatedly been struck by how the more efficient algorithm is often also simpler to implement.

Multiple WiP #

As the above figure suggests, you're probably not going to spend all your time thinking or doing. The figure has plenty of air in between the activities.

This may seem wasteful to efficiency nerds, but again: Knowledge work isn't factory work.

You can't think by command. If you've ever tried meditating, you'll know just how hard it is to empty your mind, or in any way control what goes on in your head. Focus on your breath. Indeed, and a few minutes later you snap out of a reverie about what to make for dinner, only to discover that you were able to focus on your breath for all of ten seconds.

As I already alluded to in the introduction, I regularly exercise during the work day. I also go grocery shopping, or do other chores. I've consistently found that I solve all hard problems when I'm away from the computer, not while I'm at it. I think Rich Hickey calls it hammock-driven development.

When presented with an interesting problem, I usually can't help thinking about it. What often happens, however, is that I'm mulling over multiple interesting problems during my day.

You could say that I actually have multiple pieces of work in progress. Some of them lie dormant for a long time, only to occasionally pop up and be put away again. I've even had problems that I'd essentially given up on resurface later, once I'd learned enough new things. At that point, I sometimes realize that what I previously thought was impossible is actually quite simple.

It's amazing what you can accomplish when you focus on the things that are important, rather than the things that are urgent.

One size doesn't fit all #

How do I know that this will always work? How can I be sure that an orders-of-magnitude insight will occur if I just wait long enough?

There are no guarantees. My point is rather that this happens with surprising regularity. To me, at least.

Your software organization may include tasks that represent truly menial work. Yet, if you have too much of that, why haven't you automated it away?

Still, I'm not going to tell anyone how to run their development team. I'm only pointing out a weakness with the common one-piece-flow narrative: It treats work as mostly a result of effort, and as if it were somehow interchangeable with other development tasks.

Most crucially, it models the amount of time required to complete a task as being independent of time: Whether you start a job today or in a month, it'll take x days to complete.

What if, instead, the effort was a function of time (as well as other factors)? The later you start, the simpler the work might be.

This of course doesn't happen automatically. Even if I have all my good ideas away from the keyboard, I still spend quite a bit of time at the keyboard. You need to work enough with a problem before inspiration can strike.

I'd recommend more slack time, more walks in the park, more grocery shopping, more doing the dishes.

Conclusion #

Programming is knowledge work. We may even consider it creative work. And while you can nurture creativity, you can't force it.

I find it useful to have multiple things going on at the same time, because concurrent tasks often cross-pollinate. What I learn from engaging with one task may produce a significant insight into another, otherwise unrelated problem. The lack of urgency, the lack of deadlines, foster this approach to problem-solving.

But I'm not going to tell you how to run your software development process. If you want to treat it as an assembly line, that's your decision.

You'll probably get work done anyway. Months of work can save days of thinking.

This blog is totally free, but if you like it, please consider supporting it.

Geographic hulls

2025-02-10T07:14:00+00:00

Seven lines of Python code.

Can you tell what this is?

I showed this to both my wife and my son, and they immediately recognized it for what it is. On the other hand, they're also both culturally primed for it.

After all, it's a map of Denmark, although I've transformed each of the major islands, as well as the peninsula of Jutland, to their convex hulls.

Here's the original map I used for the transformation:

I had a reason to do this, having to do with the coastline paradox, but my underlying motivation isn't really that important for this article, since I rather want to discuss how I did it.

The short answer is that I used Python. You have to admit that Python has a fabulous ecosystem for all kinds of data crunching, including visualizations. I'd actually geared up to implement a Graham scan myself, but that turned out not to be necessary.

GeoPandas to the rescue #

I'm a novice Python programmer, but I've used Matplotlib before to visualize data, so I found it natural to start with a few web searches to figure out how to get to grips with the problem.

I quickly found GeoPandas, which works on top of Matplotlib to render and visualize geographical data.

My next problem was to find a data set for Denmark, which I found on SimpleMaps. I chose to download and work with the GeoJSON format.

Originally, I'd envisioned implementing a Graham scan myself. After all, I'd done that before in F#, and it's a compelling exercise. It turned out, however, that convex hulls are already available in the GeoPandas API, via the convex_hull property.

I had trouble separating the data file's multi-part geometry into multiple single geometries. This meant that when I tried to find the convex hull, I got the hull of the entire map, instead of each island individually. The solution was to use the explode method, which splits each multi-part geometry into its individual parts.

Once I figured that out, it turned out that all I needed was seven lines of Python code, including imports and a blank line:

import geopandas as gpd
import matplotlib.pyplot as plt
 
map = gpd.read_file('dk.json')
map.explode().boundary.plot(edgecolor='green').set_axis_off()
map.explode().convex_hull.boundary.plot().set_axis_off()
plt.show()

In this script, I display the unmodified map before the convex hulls. This is only an artefact of my process. As I've already admitted, this is new ground for me, and I initially wanted to verify that I could even read in and display a GeoJSON file.

For both maps I use the boundary property to draw only the outline of the map, rather than filled polygons.
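Why explode matters is easy to see without GeoPandas. The following pure-Python sketch is not what GeoPandas does internally. It implements Andrew's monotone chain algorithm, a close relative of the Graham scan, and applies it to two made-up 'islands', first one part at a time (what exploding the geometry enables) and then to all points at once (which matches the whole-map hull described above).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of OA x OB; positive means a counter-clockwise (left) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain. Returns hull vertices counter-clockwise,
    starting at the lowest x (then lowest y); collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain ends where the other begins, so drop those duplicates.
    return lower[:-1] + upper[:-1]

# Two hypothetical 'islands', each given as a list of outline points.
island_a = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]
island_b = [(5, 0), (6, 1), (5, 2), (5.5, 1)]

# One hull per part, as after exploding the multi-part geometry...
per_part = [convex_hull(part) for part in (island_a, island_b)]
# ...versus a single hull around every point of the multi-part geometry.
combined = convex_hull(island_a + island_b)
```

The combined hull bridges the water between the two islands, while hulling each part separately keeps them apart.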

Enveloping the map parts #

Mostly for fun, but also to illustrate what a convex hull is, we can layer the two visualizations in a single image. In order to do that, a few changes to the code are required.

import geopandas as gpd
import matplotlib.pyplot as plt
 
map = gpd.read_file('dk.json')
_, ax = plt.subplots()
map.explode().boundary.plot(ax=ax, edgecolor='green').set_axis_off()
map.explode().convex_hull.boundary.plot(ax=ax).set_axis_off()
plt.show()

This little script now produces this image:

Those readers who know Danish geography may wonder what's going on with Falster. Since it's the sixth-largest island in Denmark, shouldn't it have its own convex hull? Yes, it should, yet here it's connected to Zealand. Granted, two bridges connect the two, but that's hardly sufficient to consider them one island. There are plenty of bridges in Denmark, so according to that criterion, most of Denmark is connected. In fact, on the above map, only Bornholm, Samsø, Læsø, Ærø, Fanø, and Anholt would then remain as islands.

Rather, this only highlights the quality, or lack thereof, of the data set. I don't want to complain about a free resource, and the data set has served my purposes well enough. I mostly point this out in case readers were puzzled about this. In fact, a similar case applies to Nørrejyske Ø, which in the GeoJSON map is connected to Jutland at Aalborg. Yes, there's a bridge there. No, that shouldn't qualify as a land connection.

Other countries #

As you may have noticed, apart from the hard-coded file name, nothing in the code is specific to Denmark. This means that you can play around with other countries. Here I've downloaded various GeoJSON data sets from GeoJSON Maps of the globe, which seems to be using the same source data set that the Danish data set is also based on. In other words, if I download the file for Denmark from that site, it looks exactly as above.

Can you guess which country this is?

Or this one?

While this is all good fun, not all countries have interesting convex hulls:

While I'll let you have a bit of fun guessing, you can hover your cursor over each image to reveal which country it is.

Conclusion #

Your default position when working with Python should probably be: There's already a library for that.

In this article, I've described how I wanted to show Denmark, but only the convex hull of each of the larger islands, as well as the Jutland peninsula. Of course, there was already a library for that, so that I only needed to write seven lines of code to produce the figures I wanted.

Granted, it took a few hours of research to put those seven lines together, but I'm only a novice Python programmer, and I'm sure an old hand could do it much faster.

Modelling data relationships with C# types

2025-02-03T07:24:00+00:00

A C# example implementation of Ghosts of Departed Proofs.

This article continues where Modelling data relationships with F# types left off. It ports the F# example code to C#. If you don't read F# source code, you may instead want to read Implementing rod-cutting to get a sense of the problem being addressed.

I'm going to assume that you've read enough of the previous articles to get a sense of the example, but in short, this article examines whether it's possible to use the type system to model data relationships. Specifically, we have methods that operate on a collection and a number. The precondition for calling these methods is that the number is a valid (one-based) index into the collection.

While you would typically implement such a precondition with a Guard Clause and communicate it via documentation, you can also use the Ghosts of Departed Proofs technique to instead leverage the type system. Please see the previous article for an overview.

That said, I'll repeat one point here: The purpose of these articles is to showcase a technique, using a simple example to make it, I hope, sufficiently clear what's going on. All this machinery is hardly warranted for an example as simple as this. All of this is a demonstration, not a recommendation.

Size proofs #

As in the previous article, we may start by defining what a 'size proof' looks like. In C#, it may idiomatically be a class with an internal constructor.

public sealed class Size<T>
{
    public int Value { get; }
 
    internal Size(int value)
    {
        Value = value;
    }
 
    // Also override ToString, Equals, and GetHashCode...
}

Since the constructor is internal, client code can't create Size<T> instances, and thereby can't decide a concrete type for the phantom type T.

Issuing size proofs #

How may client code create Size<T> objects? It may ask a PriceList<T> object to issue a proof:

public sealed class PriceList<T>
{
    public IReadOnlyCollection<int> Prices { get; }
 
    internal PriceList(IReadOnlyCollection<int> prices)
    {
        Prices = prices;
    }
 
    public Size<T>? TryCreateSize(int candidate)
    {
        if (0 < candidate && candidate <= Prices.Count)
            return new Size<T>(candidate);
        else
            return null;
    }
 
    // More members go here...

If the requested candidate integer represents a valid (one-indexed) position in the PriceList<T> object, the return value is a Size<T> object that contains the candidate. If, on the other hand, the candidate isn't in the valid range, no object is returned.

Since both PriceList<T> and Size<T> classes are immutable, once a 'size proof' has been issued, it remains valid. As I've previously argued, immutability makes encapsulation simpler.

This kind of API does, however, look like it's turtles all the way down. After all, the PriceList<T> constructor is also internal. Now the question becomes: How does client code create PriceList<T> objects?

The short answer is that it doesn't. Instead, it'll be given an object by the library API. You'll see how that works later, but first, let's review what such an API enables us to express.
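The following Python sketch doesn't attempt the phantom-type trick; it's only a rough run-time analogue of TryCreateSize, not a port of the C# design. Instead of a type parameter T, each Size remembers which price list issued it, and price_of (a hypothetical member, not part of the C# API) rejects proofs from any other price list with a ValueError. Nothing stops Python client code from calling Size(...) directly, so this is a convention, not a guarantee.

```python
from typing import List, Optional

class Size:
    """A 'size proof'. Client code is expected to obtain it only from
    PriceList.try_create_size, never to construct it directly."""
    def __init__(self, value: int, issuer: "PriceList") -> None:
        self.value = value
        self.issuer = issuer  # the price list that vouched for this size

class PriceList:
    def __init__(self, prices: List[int]) -> None:
        # Immutable copy, so an issued proof stays valid.
        self.prices = tuple(prices)

    def try_create_size(self, candidate: int) -> Optional[Size]:
        # Valid sizes are one-based positions: 1..len(prices).
        if 0 < candidate <= len(self.prices):
            return Size(candidate, self)
        return None

    def price_of(self, size: Size) -> int:
        # Hypothetical member. With no compile-time phantom type, a proof
        # issued by another price list is rejected at run time instead.
        if size.issuer is not self:
            raise ValueError("size proof was issued by a different price list")
        return self.prices[size.value - 1]
```

The trade-off is obvious: where the C# compiler refuses to build code that mixes up two price lists, this version only fails when the offending line runs.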

Proof-based Cut API #

As described in Encapsulating rod-cutting, returning a collection of 'cut' objects better communicates postconditions than returning a tuple of two arrays, as the original algorithm suggested. In other words, we're going to need a type for that.

public sealed record Cut<T>(int Revenue, Size<T> Size);

In this case we can get by with a simple record type. Since one of the properties is of the type Size<T>, client code can't create Cut<T> instances, just like it can't create Size<T> or PriceList<T> objects. This is what we want, because a Cut<T> object encapsulates a proof that it's valid, related to the original collection of prices.

We can now define the Cut method as an instance method on PriceList<T>. Notice how all the T type arguments line up. As input, the Cut method only accepts Size<T> proofs issued by a compatible price list. This is enforced at compile time, not at run time.

public IReadOnlyCollection<Cut<T>> Cut(Size<T> n)
{
    var p = Prices.Prepend(0).ToArray();
    var r = new int[n.Value + 1];
    var s = new int[n.Value + 1];
    r[0] = 0;
    for (int j = 1; j <= n.Value; j++)
    {
        var q = int.MinValue;
        for (int i = 1; i <= j; i++)
        {
            var candidate = p[i] + r[j - i];
            if (q < candidate)
            {
                q = candidate;
                s[j] = i;
            }
        }
        r[j] = q;
    }
 
    var cuts = new List<Cut<T>>();
    for (int i = 1; i <= n.Value; i++)
    {
        var revenue = r[i];
        var size = new Size<T>(s[i]);
        cuts.Add(new Cut<T>(revenue, size));
    }
    return cuts;
}

For good measure, I'm showing the entire implementation, but you only need to pay attention to the method signature. The point is that n is constrained by the type system to be in a valid range.

Proof-based Solve API #

The same technique can be applied to the Solve method. Just align the Ts.

public IReadOnlyCollection<Size<T>> Solve(Size<T> n)
{
    var cuts = Cut(n).ToArray();
    var sizes = new List<Size<T>>();
    var size = n;
    while (size.Value > 0)
    {
        sizes.Add(cuts[size.Value - 1].Size);
        size = new Size<T>(size.Value - cuts[size.Value - 1].Size.Value);
    }
    return sizes;
}

This is another instance method on PriceList<T>, which is where T is defined.

Proof-based revenue API #

Finally, we may also implement a method to calculate the revenue from a given sequence of cuts.

public int CalculateRevenue(IReadOnlyCollection<Size<T>> cuts)
{
    var arr = Prices.ToArray();
    return cuts.Sum(c => arr[c.Value - 1]);
}

Not surprisingly, I hope, CalculateRevenue is another instance method on PriceList<T>. The cuts will typically come from a call to Solve, but it's entirely possible for client code to create an ad-hoc collection of Size<T> objects by repeatedly calling TryCreateSize.
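For readers who'd like to follow the algorithm itself without the C# ceremony, here's a rough Python port of Cut, Solve, and CalculateRevenue. It's a sketch, not a translation of the proof machinery: plain integers stand in for Size<T>, so cut checks the range at run time instead of relying on the type system. The snake_case names are illustrative.

```python
from typing import List, Tuple

def cut(prices: List[int], n: int) -> List[Tuple[int, int]]:
    """Bottom-up rod cutting. Returns (revenue, size of first piece) for
    every rod length 1..n, mirroring the shape of Cut's output."""
    # The C# version gets this guarantee from the Size<T> proof.
    if not 0 < n <= len(prices):
        raise ValueError("n must be a valid one-based index into prices")
    p = [0] + list(prices)  # p[i]: price of a piece of length i
    r = [0] * (n + 1)       # r[j]: best revenue for a rod of length j
    s = [0] * (n + 1)       # s[j]: size of the first piece in that optimum
    for j in range(1, n + 1):
        q = float("-inf")   # plays the role of int.MinValue
        for i in range(1, j + 1):
            candidate = p[i] + r[j - i]
            if q < candidate:  # strict '<': ties keep the smallest i
                q = candidate
                s[j] = i
        r[j] = q
    return [(r[i], s[i]) for i in range(1, n + 1)]

def solve(prices: List[int], n: int) -> List[int]:
    """Piece sizes of an optimal way to cut a rod of length n."""
    cuts = cut(prices, n)
    sizes: List[int] = []
    size = n
    while size > 0:
        first = cuts[size - 1][1]  # first piece of the optimum for this length
        sizes.append(first)
        size -= first
    return sizes

def calculate_revenue(prices: List[int], sizes: List[int]) -> int:
    """Sum of the prices of the given (one-based) piece sizes."""
    return sum(prices[size - 1] for size in sizes)
```

With the CLRS prices [1, 5, 8, 9, 10, 17, 17, 20, 24, 30], solve(prices, 7) yields [1, 6], for a revenue of 1 + 17 = 18.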

Running client code #

How does client code use this API? It calls an Accept method with an implementation of this interface:

public interface IPriceListVisitor<TResult>
{
    TResult Visit<T>(PriceList<T> priceList);
}

Why 'visitor'? This doesn't quite look like a Visitor, and yet, it still does.

Imagine, for a moment, that we could enumerate all types that T could inhabit.

TResult Visit(PriceList<Type1> priceList);
TResult Visit(PriceList<Type2> priceList);
TResult Visit(PriceList<Type3> priceList);
// ⋮
TResult Visit(PriceList<TypeN> priceList);

Clearly we can't do that, since there's no bound on the types that T could inhabit, but if we could, the interface would look like a Visitor.

I find the situation sufficiently similar to name the interface with the Visitor suffix. Now we only need a class with an Accept method.

public sealed class RodCutter(IReadOnlyCollection<int> prices)
{
    public TResult Accept<TResult>(IPriceListVisitor<TResult> visitor)
    {
        return visitor.Visit(new PriceList<object>(prices));
    }
}

Client code may create a RodCutter object, as well as one or more classes that implement IPriceListVisitor<TResult>, and in this way interact with the library API.

Let's see some examples. We'll start with the original CLRS example, written as an xUnit.net test.

[Fact]
public void ClrsExample()
{
    var sut = new RodCutter([1, 5, 8, 9, 10, 17, 17, 20, 24, 30]);
 
    var actual = sut.Accept(new CutRodVisitor(10));
 
    var expected = new[] {
        ( 1,  1),
        ( 5,  2),
        ( 8,  3),
        (10,  2),
        (13,  2),
        (17,  6),
        (18,  1),
        (22,  2),
        (25,  3),
        (30, 10)
    };
    Assert.Equal(expected, actual);
}

CutRodVisitor is a nested class that implements the IPriceListVisitor<TResult> interface:

private class CutRodVisitor(int i) :
    IPriceListVisitor<IReadOnlyCollection<(int, int)>>
{
    public IReadOnlyCollection<(int, int)> Visit<T>(PriceList<T> priceList)
    {
        var n = priceList.TryCreateSize(i);
        if (n is null)
            return [];
        else
        {
            var cuts = priceList.Cut(n);
            return cuts.Select(c => (c.Revenue, c.Size.Value)).ToArray();
        }
    }
}

The CutRodVisitor class returns a collection of tuples. Why doesn't it just return cuts directly?

It can't, because it wouldn't type-check. Think about it for a moment. When you implement the interface, you need to pick a type for TResult. You can't, however, declare it to implement IPriceListVisitor<Cut<T>> (where T would be the T from Visit<T>), because at the class level, you don't know what T is. Neither does the compiler.

Your Visit<T> implementation must work for any T.

Preventing misalignment #

Finally, here's a demonstration of how the phantom type prevents confusing or mixing up two (or more) different price lists. Consider this rather artificial example:

[Fact]
public void NestTwoSolutions()
{
    var sut = new RodCutter([1, 2, 2]);
    var inner = new RodCutter([1]);
 
    (int, int)? actual = sut.Accept(new NestedRevenueVisitor(inner));
 
    Assert.Equal((3, 1), actual);
}

This unit test creates two price arrays and calls Accept on one of them (the 'outer' one, you may say), while passing the inner one to the Visitor, which at first glance just looks like this:

private class NestedRevenueVisitor(RodCutter inner) :
    IPriceListVisitor<(int, int)?>
{
    public (int, int)? Visit<T>(PriceList<T> priceList)
    {
        return inner.Accept(new InnerRevenueVisitor<T>(priceList));
    }
 
    // Inner visitor goes here...
}

Notice that it only delegates to yet another Visitor, passing the 'outer' priceList as a constructor parameter to the next Visitor. The purpose of this is to bring two PriceList<T> objects in scope at the same time. This will enable us to examine what happens if we make a programming mistake.

First, however, here's the proper, working implementation without mistakes:

private class InnerRevenueVisitor<T>(PriceList<T> priceList1) : IPriceListVisitor<(int, int)?>
{
    public (int, int)? Visit<T1>(PriceList<T1> priceList2)
    {
        var n1 = priceList1.TryCreateSize(3);
        if (n1 is null)
            return null;
        else
        {
            var cuts1 = priceList1.Solve(n1);
            var revenue1 = priceList1.CalculateRevenue(cuts1);
 
            var n2 = priceList2.TryCreateSize(1);
            if (n2 is null)
                return null;
            else
            {
                var cuts2 = priceList2.Solve(n2);
                var revenue2 = priceList2.CalculateRevenue(cuts2);
 
                return (revenue1, revenue2);
            }
        }
    }
}

Notice how priceList1 and priceList2 are now both in scope. So far, they're not mixed up, so the Visit implementation queries first one and then the other for the optimal revenue. If all works well (which it does), it returns a tuple with the two revenues.

What happens if I make a mistake? What if, for example, I write priceList2.Solve(n1)? It shouldn't be possible to use n1, which was issued by priceList1, with priceList2. And indeed this isn't possible. With that mistake, the code doesn't compile. The compiler error is:

Argument 1: cannot convert from 'Ploeh.Samples.RodCutting.Size<T>' to 'Ploeh.Samples.RodCutting.Size<T1>'

When you look at the types, that makes sense. After all, there's no guarantee that T is equal to T1.

You'll run into similar problems if you mix up the two 'contexts' in other ways. The code doesn't compile. Which is what you want.

Conclusion #

This article demonstrates how to use the Ghosts of Departed Proofs technique in C#. In some ways, I find that it comes across as more idiomatic in C# than in F#. I think this is because rank-2 polymorphism is only possible in F# when using its object-oriented features. Since F# is a functional-first programming language, it seems a little out of place there, whereas it looks more at home in C#.

Perhaps I should have designed the F# code to make use of objects to the same degree as I've done here in C#.

I think I actually like how the C# API turned out, although having to define and implement a class every time you need to supply a Visitor may feel a bit cumbersome. Even so, developer experience shouldn't be exclusively about saving a few keystrokes. After all, typing isn't a bottleneck.

Dependency inversion without inversion of control

2025-01-27T13:02:00+00:00

Here, have a sandwich.

For years I've been thinking about the Dependency Inversion Principle (DIP) and Inversion of Control (IoC) as two different things. While there's some overlap, they're not the same. To make matters more confusing, most people seem to consider IoC and Dependency Injection (DI) as interchangeable synonyms. As Steven van Deursen and I explain in DIPPP, they're not the same.

I recently found myself in a discussion on Stack Overflow where I was trying to untangle that confusion for a fellow Stack Overflow user. While I hadn't included a pedagogical Venn diagram, perhaps I should have.

This figure suggests that the sets are of equal size, which doesn't have to be the case. The point, rather, is that while the intersection may be substantial, each relative complement is not only not empty, but richly populated.

In this article, I'm not going to spend more time on the complement IoC without DIP. Rather, I'll expand on how to apply the DIP without IoC.

Appeal to authority? #

While writing the Stack Overflow answer, I'd tried to limit citations to 'original sources'. Sometimes, when a problem is technically concrete, it makes sense for me to link to one of my own works, but I've found that when the discussion is more abstract, that rarely helps convince people. That's understandable. I'd also be sceptical if I were to run into some rando who immediately proceeded to argue a case by linking to his or her own blog.

This strategy, however, elicited this response:

"Are you aware of any DIP-compliant example from Robert Martin that does not utilize polymorphism? The original paper along with some of Martin's lectures certainly seem to imply the DIP requires polymorphism."

comment, jaco0646

That's a fair question, and once I started looking for such examples, I had to admit that I couldn't find any. Eventually, I asked Robert C. Martin directly.

"Does the DIP require polymorphism? I argue that it does't, but I've managed to entangle myself in a debate where original sources count. Could you help us out?"

Tweet, me

To which he answered in much detail, but of which the essential response was:

"The DIP does not require polymorphism. Polymorphism is just one of several mechanisms to achieve dependency inversion."

Tweet, Robert C. Martin

While this was the answer I'd hoped for, it's easy to dismiss this exchange as an appeal to authority. On the other hand, as Carl Sagan said, "If you wish to make an apple pie from scratch, you must first invent the universe," which obviously isn't practical, and so we instead stand on the shoulders of giants.

In this context, asking Robert C. Martin was relevant because he's the original author of works that introduce the DIP. It's reasonable to assume that he has relevant insights on the topic.

It's not that I can't argue my case independently, but rather that I didn't think that the comments section of a Stack Overflow question was the right place to do that. This blog, on the other hand, is mine, I can use all the words I'd like, and I'll now proceed to do so.

Kernel of the idea #

All of Robert C. Martin's treatments of the DIP that I've found start with the general idea and then proceed to show examples of implementing it in code. As I've already mentioned, I haven't found a text of Martin's where the example doesn't utilize IoC.

The central idea, however, says nothing about IoC.

"A. High-level modules should not depend on low-level modules. Both should depend on abstractions.

"B. Abstractions should not depend on details. Details should depend upon abstractions."

APPP, Robert C. Martin

While only Martin knows what he actually meant, I can attempt a congenial reading of the work. What is most important here, I think, is that the word abstraction doesn't have to denote a particular kind of language construct, such as an abstract class or interface. Rather,

"Abstraction is the elimination of the irrelevant and the amplification of the essential."

Designing Object-Oriented C++ Applications Using The Booch Method, ch. 00, Robert C. Martin, his emphasis

The same connotation of abstraction seems to apply to the definition of the DIP. If, for example, we consider a Domain Model (the business logic) the essence we'd like to amplify, we may rightfully consider a particular persistence mechanism a detail. Even more concretely, if you want to take restaurant reservations via a REST API, the business rules that determine whether or not you can accept a reservation shouldn't depend on a particular database technology.

While code examples are useful, there's evidently a risk that if the examples are too much alike, it may constrain readers' thinking. All Martin's examples seem to involve IoC, but for years now, I've mostly been interested in the Dependency Inversion Principle itself. Abstractions should not depend on details. That's the kernel of the idea.

IoC isn't functional #

My thinking was probably helped along by exploring functional programming (FP). A natural question arises when one embarks on learning FP: How does IoC fit with FP? The short answer, it turns out, is that it doesn't. DI, at least, makes everything impure.

Does this mean, then, that FP precludes the DIP? That would be a problem, since the notion that abstractions shouldn't depend on details seems important. Doing FP shouldn't entail giving up on important architectural rules. Fortunately, that turns out not to be the case. Quite the contrary, a consistent application of functional architecture seems to lead to Ports and Adapters. It'd go against the grain of FP to have a Domain Model query a relational database. Even if abstracted away, a database exists outside the process space of an application, and is inherently impure. IoC doesn't address that concern.

In FP, there are other ways to address such problems.

DIP sandwich #

While you can always model pure interactions with free monads, it's usually not necessary. In most cases, an Impureim Sandwich suffices.

The sample code base that accompanies Code That Fits in Your Head takes a similar approach. While it's possible to refactor it to an explicit Impureim Sandwich, the code presented in the book follows the kindred notion of Functional Core, Imperative Shell.

The code base implements an online restaurant reservation system, and the Domain Model is a set of data structures and pure functions that operate on them. The central and most complex function is the WillAccept method shown here. It decides whether to accept a reservation request, based on restaurant table configurations, existing reservations, business rules related to seating durations, etc. It does this without depending on details. It doesn't know about databases, the application's configuration system, or how to send emails in case it decides to accept a reservation.

All of this is handled by the application's HTTP Model, using the demarcation shown in Decomposing CTFiYH's sample code base. The HTTP Model defines Controllers, Data Transfer Objects (DTOs), middleware, and other building blocks required to drive the actual REST API.

The ReservationsController class contains, among many other methods, this helper method that illustrates the point:

private async Task<ActionResult> TryCreate(Restaurant restaurant, Reservation reservation)
{
    using var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
 
    var reservations = await Repository.ReadReservations(restaurant.Id, reservation.At);
    var now = Clock.GetCurrentDateTime();
    if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        return NoTables500InternalServerError();
 
    await Repository.Create(restaurant.Id, reservation);
 
    scope.Complete();
 
    return Reservation201Created(restaurant.Id, reservation);
}

Notice the call to restaurant.MaitreD.WillAccept. The Controller gathers all data required to call the pure function and subsequently acts on the return value. This keeps the abstraction (MaitreD) free of implementation details.
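To make the shape of that interaction explicit, here's a minimal F# sketch of the same structure. It's not the book's code. The Reservation record, the capacity rule in willAccept, and the tryCreate function are simplified, hypothetical stand-ins. The impure read and write are passed in as functions only so that the sketch can run without a database.

```fsharp
open System

// Hypothetical, heavily simplified stand-in for a reservation.
type Reservation = { At : DateTime; Quantity : int }

// Pure: the decision depends only on the arguments. The capacity rule is a
// made-up simplification, not the book's WillAccept logic.
let willAccept (capacity : int) (existing : Reservation list) (candidate : Reservation) =
    let reserved = existing |> List.sumBy (fun r -> r.Quantity)
    reserved + candidate.Quantity <= capacity

// Impure shell: read, make a pure decision, then act on the result.
let tryCreate (read : DateTime -> Reservation list) (create : Reservation -> unit) capacity (candidate : Reservation) =
    let existing = read candidate.At                 // impure: read
    if willAccept capacity existing candidate then   // pure: decide
        create candidate                             // impure: write
        Ok candidate
    else
        Error "No tables available"

// Usage, with an in-memory list standing in for the database.
let store = ResizeArray<Reservation> ()
let readAll (_ : DateTime) = List.ofSeq store
let at = DateTime (2025, 1, 20, 19, 0, 0)
let first = tryCreate readAll store.Add 4 { At = at; Quantity = 3 }
let second = tryCreate readAll store.Add 4 { At = at; Quantity = 2 }
```

The capacity rule is deliberately naive. What matters is that willAccept can be tested without any test doubles, while tryCreate only gathers data and acts on the decision.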

DI addressing another concern #

You may be wondering what exactly Repository is. If you've bought the book, you also have access to the sample code base, in which case you'd be able to look it up. It turns out that it's an injected dependency. While this may seem a bit contradictory, it also gives me the opportunity to discuss that this isn't an all-or-nothing proposition.

Consider the architecture diagram from Decomposing CTFiYH's sample code base, repeated here for convenience:

In the context of this diagram, the DIP is being applied in two different ways. From the outer Web Host to the HTTP Model, the decomposed code base uses ordinary DI. From the HTTP Model to the Domain Model, there's no inversion of control, but rather the important essence of the DIP: That the Domain Model doesn't depend on any of the details that surround it. Even so, the dependencies remain inverted, as indicated by the arrows.

What little DI that's left remains to support automated testing. Injecting Repository and a few other real dependencies enabled me to test-drive the externally visible behaviour of the system with state-based self-hosted tests.

If I hadn't cared about that, I could have hard-coded the SqlReservationsRepository object directly into the Controller and merged the Web Host with the HTTP Model. The Web Host is quite minimal anyway. This would, of course, have meant that the DIP no longer applied at that level, but even so, the interaction between the HTTP Model and the Domain Model would still follow the principle.

One important point about the above figure is that it's not to scale. The Web Host is in reality just six small classes, and the SQL and SMTP libraries each only contain a single class.

Conclusion #

Despite the similar names, the Dependency Inversion Principle isn't equivalent to Inversion of Control or Dependency Injection. There's a sizeable overlap between them, but the DIP doesn't require IoC.

I often use the Functional Core, Imperative Shell architecture, or the Impureim Sandwich pattern to invert the dependencies without inverting control. This keeps most of my code more functional, which also means that it fits better in my head and is intrinsically testable.

This blog is totally free, but if you like it, please consider supporting it.

Modelling data relationships with F# types

2025-01-20T07:24:00+00:00

An F# example implementation of Ghosts of Departed Proofs.

In a previous article, Encapsulating rod-cutting, I used a code example to discuss how to communicate an API's contract to client developers; that is, users of the API. In the article, I wrote:

"All this said, however, it's also possible that I'm missing an obvious design alternative. If you can think of a way to model this relationship in a non-predicative way, please write a comment."

And indeed, a reader helpfully offered an alternative:

"Regarding the relation between the array and the index, you will find the paper called "Ghosts of departed proofs" interesting. Maybe an overkill in this case, maybe not, but a very interesting and useful technique in general."

borar

I wouldn't call it 'an obvious design alternative', but nonetheless find it interesting. In this article, I'll pick up the code from Encapsulating rod-cutting and show how the 'Ghosts of Departed Proofs' (GDP) technique may be applied.

Problem review #

Before we start with the GDP technique, a brief review of the problem is in order. For the complete overview, you should read the Encapsulating rod-cutting article. In the present article, however, we'll focus on one particular problem related to encapsulation:

Ideally, the cut function should take two input arguments. The first argument, p, is an array or list of prices. The second argument, n, is the size of a rod to cut optimally. One precondition states that n must be less than or equal to the length of p. This is because the algorithm needs to look up the price of a rod of size n, and it can't do that if n is greater than the length of p. The implied relationship is that p is indexed by rod size, so that if you want to find the price of a rod of size n, you look at the nth element in p.

How may we model such a relationship in a way that protects the precondition?

An obvious choice, particularly in object-oriented design, is to use a Guard Clause. In the F# code base, it might look like this:

let cut (p : _ array) n =
    if p.Length <= n
    then raise (ArgumentOutOfRangeException "n must be less than the length of p")
 
    // The rest of the function body...

You might argue that in F# and other functional programming languages, throwing exceptions isn't idiomatic. Instead, you ought to return Result or Option values, here the latter:

let cut (p : _ array) n =
    if p.Length <= n
    then None
    else
        // The rest of the function body...
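For completeness, here's a sketch of what the Option-returning variation might look like with the body filled in. It assumes the imperative translation from Implementing rod-cutting, where p[0] is a placeholder and p[i] is the price of a rod of size i. The name tryCut is hypothetical. Unlike the snippet above, this version also rejects a negative n.

```fsharp
open System

// Option-returning variation of the imperative Extended-Bottom-Up-Cut-Rod translation.
// Assumes p[0] is a placeholder, so p must have more than n elements.
let tryCut (p : int array) n =
    if n < 0 || p.Length <= n then None
    else
        let r = Array.zeroCreate (n + 1)   // r[j]: the best revenue for a rod of size j
        let s = Array.zeroCreate (n + 1)   // s[j]: the size of the first piece to cut off
        r[0] <- 0
        for j = 1 to n do
            let mutable q = Int32.MinValue
            for i = 1 to j do
                if q < p[i] + r[j - i] then
                    q <- p[i] + r[j - i]
                    s[j] <- i
            r[j] <- q
        Some (r, s)

let prices = [| 0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30 |]   // p[0] is a placeholder
let result = tryCut prices 10
let tooLong = tryCut prices 11
```

With these CLRS prices, result contains the revenues 0, 1, 5, 8, 10, 13, 17, 18, 22, 25, 30 and the first-cut sizes 0, 1, 2, 3, 2, 2, 6, 1, 2, 3, 10. tooLong is None.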

To be clear, in most code bases, this is exactly what I would do. What follows is rather exotic, and hardly suitable for all use cases.

Proofs as values #

It's not too hard to model the lower boundary of the n parameter. As is often the case, it turns out that the number must be a natural number. I already covered that in the previous article. It's much harder, however, to model the upper boundary of the value, because it depends on the size of p.

The following is based on the paper Ghosts of Departed Proofs, as well as a helpful Gist also provided by Borar. (The link to the paper is to what I believe is the 'official' page for it, and since it's part of the ACM digital library, it's behind a paywall. Even so, as is the case with most academic papers, it's easy enough to find a PDF of it somewhere else. Not that I endorse content piracy, but it's my impression that academic papers are usually disseminated by the authors themselves.)

The idea is to enable a library to issue a 'proof' about a certain condition. In the example I'm going to use here, the proof is that a certain number is in the valid range for a given list of prices.

We actually can't entirely escape the need for a run-time check, but we do gain two other benefits. The first is that we're now using the type system to communicate a relationship that otherwise would have to be described in written documentation. The second is that once the proof has been issued, there's no need to perform additional run-time checks.

This can help move an API towards a more total, as opposed to partial, definition, which again moves towards what Michael Feathers calls unconditional code. This is particularly useful if the alternative is an API that 'forgets' which run-time guarantees have already been checked. The paper has some examples. I've also recently encountered similar situations when doing Advent of Code 2024. On many days, my solution involved immutable maps (like hash tables) that I'd recurse over. In many cases I'd write an algorithm where I knew with absolute certainty that a particular key was in the map (if, for example, I'd just put it there three lines earlier). In such cases, you don't want a total function that returns an option or Maybe value. You want a partial function. Or a type-level guarantee that the value is, indeed, in the map.

For the example in this article, it's overkill, so you may wonder what the point is. On the other hand, a simple example makes it easier to follow what's going on. Hopefully, once you understand the technique, you can extrapolate it to situations where it might be more warranted.

Proof contexts #

The overall idea should look familiar to practitioners of statically-typed functional programming. Instead of plain functions and data structures, we introduce a special 'context' in which we have to run our computations. This is similar to how the IO monad works, or, in fact, most monads. You're not supposed to get the value out of the monad. Rather, you should inject the desired behaviour into the monad.

We find a similar design with existential types, or with the ST monad, on which the ideas in the GDP paper are acknowledged to be based. We even see a mutation-based variation in the article A mutable priority collection, where we may think of the Edit API as a variation of the ST monad, since it allows 'localized' state mutation.

I'll attempt to illustrate it like this:

A library offers a set of functions and data structures for immediate use. In addition, it also provides a higher-order function that enables client code to embed a computation into a special 'sandbox' area where special rules apply. The paper calls such a context a 'name', which it does because it's trying to be as general as possible. As I'm writing this, I find it easier to think of this 'sandbox' as a 'proof context'. It's a context in which proof values exist. Crucially, as we shall see, they don't exist outside of this context.

Size proofs #

In the rod-cutting example, we particularly care about proving that a given number n is within the size of the price list. We do this by representing the proof as a value:

type Size<'a> = private Size of int with
    member this.Value = let (Size i) = this in i
    override this.ToString () = let (Size i) = this in string i

Two things are special about this type definition:

The constructor is private.
It has a phantom type 'a.

A phantom type is a generic type parameter that has no run-time value. Notice that Size<'a> contains no value of the type 'a. The type only exists at compile-time.
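To see a phantom type in isolation, consider this minimal sketch. The marker types IssuerA and IssuerB, the Token<'issuer> type, and the functions issueA, issueB, and combine are all hypothetical. They only serve to show that the compiler can keep apart two values whose run-time representations are identical.

```fsharp
// Hypothetical marker types. They're never instantiated; they only serve as type arguments.
type IssuerA = interface end
type IssuerB = interface end

// The phantom type parameter 'issuer has no run-time value; only the int is stored.
type Token<'issuer> = private Token of int with
    member this.Value = let (Token i) = this in i

// Hypothetical issuers. Outside the defining module, the private constructor
// leaves these as the only way to obtain tokens.
let issueA i : Token<IssuerA> = Token i
let issueB i : Token<IssuerB> = Token i

// Both arguments must carry the same 'issuer.
let combine (Token x : Token<'issuer>) (Token y : Token<'issuer>) : Token<'issuer> =
    Token (x + y)

let three = combine (issueA 1) (issueA 2)     // Compiles: both are Token<IssuerA>
// let oops = combine (issueA 1) (issueB 2)   // Doesn't compile: IssuerA isn't IssuerB
```

At run time, (issueA 7).Value and (issueB 7).Value are both just the integer 7. Only the type checker sees the difference.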

You can think of the type parameter as similar to a security token. The issuer of the proof associates a particular security token to vouch for its validity. Usually, when we talk about security tokens, they do have a run-time representation (typically a byte array) because we need to exchange them with other processes. This is, for example, how claims-based authentication works.

In this case, our concern isn't security. Rather, we wish to communicate and enforce certain relationships. Since we wish to leverage the type system, we use a type as a token.

Since the Size constructor is private, the library controls how it issues proofs, a bit like a claims issuer can sign a claim with its private key.

Okay, but how are Size proofs issued?

Issuing size proofs #

As you'll see later, more than one API may issue Size proofs, but the most fundamental is that you can query a price list for such a proof:

type PriceList<'a> = private PriceList of int list with
    member this.Length = let (PriceList prices) = this in prices.Length
    member this.trySize candidate : Size<'a> option =
        if 0 < candidate && candidate <= this.Length
        then Some (Size candidate)
        else None

The trySize member function returns Some Size<'a> proof if the candidate is between 1 and the length of the price list, and None otherwise. As discussed above, we can't completely avoid a run-time check. Once we have the proof, however, we don't need to repeat that check when we use that Size value with the same PriceList.

Notice how immutability is an essential part of this design. If, in the object-oriented manner, we allow a price list to change, we could make it shorter. This could invalidate some proof that we previously issued. Since, however, the price list is immutable, we can trust that once we've checked a size, it remains valid. You can also think of this as a sort of encapsulation, in the sense that once we've assured ourselves that an object, or here rather a value, is valid, it remains valid. Indeed, encapsulation is simpler with immutability.

You probably still have some questions. For instance, how do we ensure that a size proof issued by one price list can't be used against another price list? Imagine that you have two price lists. One has ten prices, the other twenty. You could have the larger one issue a proof that size 17 is valid. What prevents you from using that proof with the smaller price list?

That's the job of that phantom type. Notice how a PriceList<'a> issues a Size<'a> proof. It's the same generic type argument.

Usually, I extol F#'s type inference, and I prefer to avoid type annotations unless they're required. When it comes to GDP, however, type annotations are necessary, because we need these phantom types to line up. Without the type annotations, they wouldn't do that.

In the above example, the smaller price list might have the type PriceList<'a> and the larger one the type PriceList<'b>. The smaller would issue proofs of the type Size<'a>, and the larger one proofs of the type Size<'b>. As you'll see, you can't use a Size<'a> where a Size<'b> is required, or vice versa.

You may still wonder how one then creates PriceList<'a> values. After all, that type also has a private constructor.

We'll get back to that later.

Proof-based cut API #

Before we look at how client code may consume APIs based on proofs such as Size<'a>, we should review their expressive power. What does this design enable us to say?

While the first example above, with the Guard Clause alternative, was based on the initial imperative implementation shown in the article Implementing rod-cutting, the rest of the present article builds on the refactored code from Encapsulating rod-cutting.

The first change I need to introduce is to the Cut record type:

type Cut<'a> = { Revenue : int; Size : Size<'a> }

Notice that I've changed the type of the Size property to Size<'a>. This has the implication that Cut<'a> now also has a phantom type, and since client code can't create Size<'a> values, by transitivity it means that neither can client code create Cut<'a> values. These values can only be issued as proofs.

This enables us to change the type definition of the cut function:

let cut (PriceList prices : PriceList<'a>) (Size n : Size<'a>) : Cut<'a> list =
    // Implementation follows here...

Notice how all the phantom types line up. In order to call the function, client code must supply a Size<'a> value issued by a compatible PriceList<'a> value. Upon a valid call, the function returns a list of Cut<'a> values.

Pay attention to what is being communicated. You may find this strange and impenetrable, but for a reader who understands GDP, much about the contract is communicated through the types. We can see that n relates to prices, because the 'proof token' (the generic type parameter 'a) is the same for both arguments. A reader who understands how Size<'a> proofs are issued will now understand what the precondition is: The n argument must be valid according to the size of the prices argument.

The type of the cut function also communicates a postcondition: It guarantees that the Size value of each returned Cut<'a> is valid according to the supplied prices. In other words, it means that no defensive coding is necessary. Client code doesn't have to check whether or not the price of each indicated cut can actually be found in prices. The types guarantee that they can.

You may consider the cut function a 'secondary' issuer of Size<'a> proofs, since it returns such values. If you wanted to call cut again with one of those values, you could.

Compared to the previous article, I don't think I changed much else in the cut function, besides the initial function declaration, and the last line of code, but for good measure, here's the entire function:

let cut (PriceList prices : PriceList<'a>) (Size n : Size<'a>) : Cut<'a> list =
    // Prepend a placeholder price of 0 so that p[i] is the price of a rod of size i.
    let p = 0 :: prices |> Array.ofList
 
    let findBestCut revenues j =
        [1..j]
        |> List.map (fun i -> p[i] + Map.find (j - i) revenues, i)
        |> List.maxBy fst
 
    let aggregate acc j =
        let revenues = snd acc
        let q, i = findBestCut revenues j
        let cuts = fst acc
        // cons isn't part of FSharp.Core; this assumes a helper such as let cons x xs = x :: xs
        cuts << (cons (q, i)), Map.add revenues.Count q revenues
 
    [1..n]
    |> List.fold aggregate (id, Map.add 0 0 Map.empty)
    |> fst <| [] // Evaluate Hughes list
    |> List.map (fun (r, i) -> { Revenue = r; Size = Size i })

The cut function is part of the same module as Size<'a>, so even though the constructor is private, the cut function can still use it.

Thus, the entire proof mechanism is for external use. Internally, the library code may take shortcuts, so it's up to the library author to convince him- or herself that the contract holds. In this case, I'm quite confident that the function only issues valid proofs. After all, I've lifted the algorithm from an acclaimed textbook, and this particular implementation is covered by more than 10,000 test cases.

Proof-based solve API #

The solve code hasn't changed, I believe:

let solve prices n =
    let cuts = cut prices n
    let rec imp n =
        if n <= 0 then [] else
            let idx = n - 1
            let s = cuts[idx].Size
            s :: imp (n - s.Value)
    imp n.Value

While the code hasn't changed, the type has. In this case, no explicit type annotations are necessary, because the types are already correctly inferred from the use of cut:

solve: prices: PriceList<'a> -> n: Size<'a> -> Size<'a> list

Again, the phantom types line up as desired.

Proof-based revenue calculation #

Although I didn't show it in the previous article, I also included a function to calculate the revenue from a list of cuts. It gets the same treatment as the other functions:

let calculateRevenue (PriceList prices : PriceList<'a>) (cuts : Size<'a> list) =
    cuts |> List.sumBy (fun s -> prices[s.Value - 1])

Again we see how the GDP-based API communicates a precondition: The cuts must be valid according to the prices; that is, each cut, indicated by its Size property, must be guaranteed to be within the range defined by the price list. This makes the function total; or, unconditional code, as Michael Feathers would put it. The function can't fail at run time.

(I am, once more, deliberately ignoring the entirely independent problem of potential integer overflows.)

While you could repeatedly call PriceList<'a>.trySize to produce a list of cuts, the most natural way to produce such a list is to first call solve, and then pass its result to calculateRevenue.

The function returns int.
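To make the arithmetic concrete, here's the same calculation stripped of the proof types, using plain int sizes. This unchecked variation and its name are hypothetical. It's only meant to show the shift from 1-based rod sizes to 0-based list indices in s.Value - 1, and what happens when nothing guarantees that a size is in range.

```fsharp
// Hypothetical, unchecked variation: sizes are plain ints, so nothing guarantees they're in range.
let calculateRevenueUnchecked (prices : int list) (cuts : int list) =
    // Rod sizes start at 1, whereas list indices start at 0.
    cuts |> List.sumBy (fun s -> prices[s - 1])

// The CLRS example prices.
let prices = [1; 5; 8; 9; 10; 17; 17; 20; 24; 30]
let twoAndTwo = calculateRevenueUnchecked prices [2; 2]   // 5 + 5 = 10
let wholeRod = calculateRevenueUnchecked prices [10]      // 30

// With a size of 11, prices[10] is out of range, so this fails at run time.
// A Size<'a> proof issued by the same price list rules that case out at compile time.
let failsAtRunTime =
    try
        calculateRevenueUnchecked prices [11] |> ignore
        false
    with _ -> true
```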

Proof-based printing #

Finally, here's printSolution:

let printSolution p n = solve p n |> List.iter (printfn "%O")

It hasn't changed much since the previous incarnation, but the type is now PriceList<'a> -> Size<'a> -> unit. Again, the precondition is the same as for cut.

Running client code #

How in the world do you write client code against this API? After all, the types all have private constructors, so we can't create any values.

If you trace the code dependencies, you'll notice that PriceList<'a> sits at the heart of the API. If you have a PriceList<'a>, you'd be able to produce the other values, too.

So how do you create a PriceList<'a> value?

You don't. You call the following runPrices function, and give it a PriceListRunner that it'll embed and run in the 'sandbox' illustrated above.

type PriceListRunner<'r> =
    abstract Run<'a> : PriceList<'a> -> 'r
 
let runPrices pl (ctx : PriceListRunner<'r>) = ctx.Run (PriceList pl)

As the paper describes, the GDP trick hinges on rank-2 polymorphism, and the only way (that I know of) this is supported in F# is on methods. An object is therefore required, and we define the PriceListRunner<'r> type for that purpose. Since it has only an abstract member and no constructor, F# infers it to be an interface.

Client code must implement that interface to call the runPrices function. Fortunately, since F# has object expressions, client code might look like this:

[<Fact>]
let ``CLRS example`` () =
    let p = [1; 5; 8; 9; 10; 17; 17; 20; 24; 30]
    let actual = Rod.runPrices p { new PriceListRunner<_> with
        member __.Run pl = option {
            let! n = pl.trySize 10
            let cuts = Rod.cut pl n
            return List.map (fun c -> (c.Revenue, c.Size.Value)) cuts } }
    [
        ( 1,  1)
        ( 5,  2)
        ( 8,  3)
        (10,  2)
        (13,  2)
        (17,  6)
        (18,  1)
        (22,  2)
        (25,  3)
        (30, 10)
    ] |> Some =! actual

This is an xUnit.net test where actual is produced by runPrices and an object expression that defines the code to run in the proof context. When the Run method runs, it runs with a concrete type that the compiler picked for 'a. This type is only in scope within that method, and can't escape it.

The implementing class is given a PriceList<'a> as an input argument. In this example, it tries to create a size of 10, which succeeds because the price list has ten elements.

Notice that the Run method transforms the cuts to tuples. Why doesn't it return cuts directly?

It can't. It's part of the deal. If I change the last line of Run to return cuts, the code no longer compiles. The compiler error is:

This code is not sufficiently generic. The type variable 'a could not be generalized because it would escape its scope.

Remember I wrote that 'a can't escape the scope of Run? This is enforced by the type system.
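The mechanism isn't specific to rod-cutting. As a sketch of the map-like scenario mentioned earlier, here's a minimal, hypothetical module that issues index proofs for an array. The names Indexed, Index<'a>, Checked<'a, 'T>, CheckedRunner, and run are mine. The design is only modelled on the PriceList<'a> API above and isn't taken from the GDP paper.

```fsharp
module Indexed =
    // A proof that an int is a valid index into the Checked array tagged with the same 'a.
    type Index<'a> = private Index of int

    // An array tagged with the phantom type 'a. The constructor is private, and run
    // copies its input, so the array can't change after proofs have been issued.
    type Checked<'a, 'T> = private Checked of 'T array with
        member this.Length = let (Checked xs) = this in xs.Length
        member this.TryIndex (i : int) : Index<'a> option =
            if 0 <= i && i < this.Length then Some (Index i) else None
        // No Option and no guard: since Run must work for any 'a, an Index<'a>
        // can only have come from TryIndex on this very array.
        member this.Get (idx : Index<'a>) : 'T =
            let (Checked xs) = this
            let (Index i) = idx
            xs[i]

    // Rank-2 polymorphism through a generic interface method, like PriceListRunner<'r>.
    type CheckedRunner<'T, 'r> =
        abstract Run<'a> : Checked<'a, 'T> -> 'r

    let run (xs : 'T array) (runner : CheckedRunner<'T, 'r>) : 'r =
        runner.Run (Checked (Array.copy xs))

// Client code outside the module can't construct Checked or Index values directly.
let third =
    Indexed.run
        [| "a"; "b"; "c" |]
        { new Indexed.CheckedRunner<string, string option> with
            member __.Run arr = arr.TryIndex 2 |> Option.map arr.Get }

let fourth =
    Indexed.run
        [| "a"; "b"; "c" |]
        { new Indexed.CheckedRunner<string, string option> with
            member __.Run arr = arr.TryIndex 3 |> Option.map arr.Get }
```

As with cuts, if you change a Run body to return arr.TryIndex 2 directly, the compiler should reject it, because 'a would escape its scope.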

Preventing misalignment #

You may already consider it a benefit that this kind of API design uses the type system to communicate pre- and postconditions. Perhaps you also wonder how it prevents errors. As already discussed, if you're dealing with multiple price lists, it shouldn't be possible to use a size proof issued by one, with another. Let's see how that might look. We'll start with a correctly coded unit test:

[<Fact>]
let ``Nest two solutions`` () =
    let p1 = [1; 2; 2]
    let p2 = [1]
 
    let actual = Rod.runPrices p1 { new PriceListRunner<_> with
        member __.Run pl1 = option {
            let! n1 = pl1.trySize 3
            let cuts1 = Rod.solve pl1 n1
            let r = Rod.calculateRevenue pl1 cuts1
 
            let! inner = Rod.runPrices p2 { new PriceListRunner<_> with
                member __.Run pl2 = option {
                    let! n2 = pl2.trySize 1
                    let cuts2 = Rod.solve pl2 n2
                    return Rod.calculateRevenue pl2 cuts2 } }
 
            return (r, inner) } }
 
    Some (3, 1) =! actual

This code compiles because I haven't mixed up the Size or Cut values. What happens if I 'accidentally' change the 'inner' Rod.solve call to let cuts2 = Rod.solve pl2 n1?

The code doesn't compile:

Type mismatch. Expecting a 'Size<'a>' but given a 'Size<'b>' The type ''a' does not match the type ''b'

This is fortunate, because n1 wouldn't work with pl2. Consider that n1 contains the number 3, which is valid for the larger list pl1, but not the shorter list pl2.

Proofs are issued with a particular generic type argument: the type-level 'token', if you will. It's possible for a library API to explicitly propagate such proofs; you see a hint of that in cut, which not only takes as input a Size<'a> value, but also issues new proofs as a result.

At the same time, this design prevents proofs from being mixed up. Each set of proofs belongs to a particular proof context.

You get the same compiler error if you accidentally mix up some of the other terms.

Conclusion #

One goal in the GDP paper is to introduce a type-safe API design that's also ergonomic. Matt Noonan, the author, defines ergonomic as a design where correct use of the API doesn't place an undue burden on the client developer. The paper's example language is Haskell, where rank-2 polymorphism has little impact on the user.

F# only supports rank-2 polymorphism in method definitions, which makes consuming a GDP API more awkward than in Haskell. The need to create a new type, and the few lines of boilerplate that entails, is a drawback.

Even so, the GDP trick is a nice addition to your functional tool belt. You'll hardly need it every day, but I personally like having some specialized tools lying around together with the everyday ones.

But wait! The reason that F# has support for rank-2 polymorphism through object methods is because C# has that language feature. This must mean that the GDP technique works in C# as well, doesn't it? Indeed it does.

Next: Modelling data relationships with C# types.


Recawr Sandwich

2025-01-13T15:52:00+00:00

A pattern variation.

After writing the articles Collecting and handling result values and Short-circuiting an asynchronous traversal, I realized that it might be valuable to describe a more disciplined variation of the Impureim Sandwich pattern.

The book Design Patterns describes each pattern over a number of sections. There's a description of the overall motivation, the structure of the pattern, UML diagrams, example code, and more. One section discusses various implementation variations. I find it worthwhile, too, to explicitly draw attention to a particular variation of the more general Impureim Sandwich pattern.

This variation imposes an additional constraint to the general pattern. While this may, at first glance, seem limiting, constraints liberate.

As a specialization, you may consider Recawr Sandwiches as a subset of all Impureim Sandwiches.

Read, calculate, write #

In short, the constraint is that the Sandwich should be organized in the following order:

Read data. This step is impure.
Calculate a result from the data. This step is a pure function.
Write data. This step is impure.

If the sandwich has more than three layers, this order should still be maintained. Once you start writing data to the network, to disk, to a database, or to the user interface, you shouldn't go back to reading in more data.
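As a minimal F# sketch of the read, calculate, write order, here's a hypothetical bulk adjustment of quantities. An in-memory Dictionary stands in for a database, and all the names are made up for illustration.

```fsharp
open System.Collections.Generic

// Read (impure): look up current quantities for the requested keys that exist.
let readExisting (store : Dictionary<string, int>) (keys : string list) : Map<string, int> =
    keys
    |> List.filter store.ContainsKey
    |> List.map (fun k -> k, store[k])
    |> Map.ofList

// Calculate (pure): compute new quantities and the keys that weren't found.
// Assumes each key appears at most once in requested.
let calculate (existing : Map<string, int>) (requested : (string * int) list) =
    let found, notFound = requested |> List.partition (fun (k, _) -> existing.ContainsKey k)
    let newValues = found |> List.map (fun (k, delta) -> k, Map.find k existing + delta)
    newValues, List.map fst notFound

// Write (impure): persist the calculated values.
let write (store : Dictionary<string, int>) (newValues : (string * int) list) =
    for (k, v) in newValues do
        store[k] <- v

// The sandwich: read, then calculate, then write. Nothing is read after the first write.
let bulkAdjust (store : Dictionary<string, int>) (requested : (string * int) list) =
    let existing = readExisting store (List.map fst requested)
    let newValues, notFound = calculate existing requested
    write store newValues
    notFound

let store = Dictionary<string, int> ()
store["apple"] <- 3
store["pear"] <- 1
let missing = bulkAdjust store [ ("apple", 2); ("kiwi", 5) ]
```

After the call, missing is ["kiwi"], apple has the quantity 5, and pear is unchanged. All decisions happen in calculate, which you can test without a store.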

Naming #

The name Recawr Sandwich is made from the first letters of REad CAlculate WRite. It's pronounced recover sandwich.

When the idea of naming this variation originally came to me, I first thought of the name read/write sandwich, but then I thought that the most important ingredient, the pure function, was missing. I've considered some other variations, such as read, pure, write sandwich or input, referential transparency, output sandwich, but none of them quite gets the point across, I think, in the same way as read, calculate, write.

Precipitating example #

To be clear, I've been applying the Recawr Sandwich pattern for years, but it sometimes takes a counter-example before you realize that some implicit, tacit knowledge should be made explicit. This happened to me as I was discussing this implementation of Impureim Sandwich:

// Impure
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));
 
// Pure
var result = results.Aggregate(
    new BulkUpdateResult([], [], []),
    (state, result) =>
        result.Match(
            storedItem => state.Store(storedItem),
            notFound => state.Fail(notFound.Item),
            error => state.Error(error)));
 
// Impure
await dbContext.SaveChangesAsync();
return new OkResult(result);

Notice that the top impure step traverses a collection of items to apply each to an action called UpdateItem. As I discussed in the article, I don't actually know what UpdateItem does, but the name strongly suggests that it updates a particular database row. Even if the actual write doesn't happen until SaveChangesAsync is called, this still seems off.

To be honest, I didn't realize this until I started thinking about how I'd go about solving the implied problem, if I had to do it from scratch. Because I probably wouldn't do it like that at all.

It strikes me that doing the update 'too early' makes the code more complicated than it has to be.

What would a Recawr Sandwich look like?

Recawr example #

Perhaps one could instead start by querying the database about which items are actually in it, then prepare the result, and finally make the update.

// Read
var existing = await FilterExisting(itemsToUpdate, dbContext);
 
// Calculate
var result = new BulkUpdateResult([.. existing], [.. itemsToUpdate.Except(existing)], []);
 
// Write
var results = await existing.Traverse(item => UpdateItem(item, dbContext));
await dbContext.SaveChangesAsync();
return new OkResult(result);

To be honest, this variation behaves differently when Error values occur, but then again, I wasn't entirely sure what the purpose of the error value even was. If it's to model errors that client code can't recover from, throw an exception instead.

In any case, the example is typical of many I/O-heavy operations, which veer dangerously close to the degenerate. There really isn't a lot of logic required, so one may reasonably ask whether the example is useful. It was, however, the example that got me thinking about giving the Recawr Sandwich an explicit name.

Other examples #

All the examples in the original Impureim Sandwich article are actually Recawr Sandwiches. Other articles with clear Recawr Sandwich examples are:

In other words, I'm just retroactively giving these examples a more specific label.

What's an example of an Impureim Sandwich which is not a Recawr Sandwich? Ironically, the first example in this article.

Conclusion #

A Recawr Sandwich is a specialization of the slightly more general Impureim Sandwich pattern. It specializes by assigning roles to the two impure layers of the sandwich. In the first, the code reads data. In the second impure layer, it writes data. In between, it performs referentially transparent calculations.

While more constraining, this specialization offers a good rule of thumb. Most well-designed sandwiches follow this template.


Encapsulating rod-cutting

2025-01-06T10:45:00+00:00

Focusing on usage over implementation.

This article is a part of a small article series about implementation and usage mindsets. The hypothesis is that programmers who approach a problem with an implementation mindset may gravitate toward dynamically typed languages, whereas developers concerned with long-term maintenance and sustainability of a code base may be more inclined toward statically typed languages. This could be wrong, and is almost certainly too simplistic, but is still, I hope, worth thinking about. In the previous article you saw examples of an implementation-centric approach to problem-solving. In this article, I'll discuss what a usage-first perspective entails.

A usage perspective indicates that you're first and foremost concerned with how useful a programming interface is. It's what you do when you take advantage of test-driven development (TDD). First, you write a test, which furnishes an example of what a usage scenario looks like. Only then do you figure out how to implement the desired API.

In this article I didn't use TDD, since I already had a particular implementation. Even so, while I didn't mention it in the previous article, I did add tests to verify that the code works as intended. In fact, because I wrote a few Hedgehog properties, I have more than 10,000 test cases covering my implementation.

I bring this up because TDD is only one way to focus on sustainability and encapsulation. It's the most scientific methodology that I know of, but you can employ more ad-hoc, ex-post analysis processes. I'll do that here.

Imperative origin #

In the previous article you saw how the Extended-Bottom-Up-Cut-Rod pseudocode was translated to this F# function:

let cut (p : _ array) n =
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
    r, s

In case anyone is wondering: This is a bona-fide pure function, even if the implementation is as imperative as can be. Given the same input, cut always returns the same output, and there are no side effects. We may wish to implement the function in a more idiomatic way, but that's not our first concern. My first concern, at least, is to make sure that preconditions, invariants, and postconditions are properly communicated.

The same goal applies to the printSolution action, also repeated here for your convenience.

let printSolution p n =
    let _, s = cut p n
    let mutable n = n
    while n > 0 do
        printfn "%i" s[n]
        n <- n - s[n]

It's not that I'm uninterested in more idiomatic implementations, but they are, by definition, just implementation details. So first, I'll discuss encapsulation. Or, if you will, the usage perspective.
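For comparison with a dynamically typed setting, here's a rough Python transliteration of the two F# snippets above. It's a sketch, not the code from the previous article. The names `cut` and `print_solution` mirror the F# names, and `float("-inf")` stands in for `Int32.MinValue`. The type hints only document the shapes of inputs and outputs, because Python doesn't enforce them at run time.

```python
def cut(p: list[int], n: int) -> tuple[list[int], list[int]]:
    """Bottom-up rod cutting. p is zero-prefixed: p[i] is the price of a piece of length i."""
    r = [0] * (n + 1)  # r[j]: best revenue for a rod of length j
    s = [0] * (n + 1)  # s[j]: size of the first piece in an optimal cut of length j
    for j in range(1, n + 1):
        q = float("-inf")  # plays the role of Int32.MinValue
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s


def print_solution(p: list[int], n: int) -> None:
    """Print the piece sizes of an optimal cut, one per line."""
    _, s = cut(p, n)
    while n > 0:
        print(s[n])
        n -= s[n]
```

With the CLRS prices `[0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]` and `n = 7`, `print_solution` prints `1` and then `6`. Nothing in either signature says that `n` must be smaller than `len(p)`, and that is the same question the following sections raise about the F# types.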

Names and types #

Based on the above two code snippets, we're given two artefacts: cut and printSolution. Since F# is a statically typed language, each operation also has a type.

The type of cut is int array -> int -> int array * int array. If you're not super-comfortable with F# type signatures, this means that cut is a function that takes an integer array and an integer as inputs, and returns a tuple as output. The output tuple is a pair; that is, it contains two elements, and in this particular case, both elements have the same type: They are both integer arrays.

Likewise, the type of printSolution is int array -> int -> unit, which again indicates that inputs must be an integer array and an integer. In this case the output is unit, which, in a sense, corresponds to void in many C-based languages.

Both operations belong to a module called Rod, so their slightly longer, more formal names are Rod.cut and Rod.printSolution. Even so, good names are only skin-deep, and I'm not even convinced that these are particularly good names. To be fair to myself, I adopted the names from the pseudocode from Introduction to Algorithms. Had I been freer to name functions and design APIs, I might have chosen different names. As it is, currently, there's no documentation, so the types are the only source of additional information.

Can we infer proper usage from these types? Do they sufficiently well communicate preconditions, invariants, and postconditions? In other words, do the types satisfactorily indicate the contract of each operation? Do the functions exhibit good encapsulation?

We may start with the cut function. It takes as inputs an integer array and an integer. Are empty arrays allowed? Are all integers valid, or perhaps only natural numbers? What about zeroes? Are duplicates allowed? Does the array need to be sorted? Is there a relationship between the array and the integer? Can the single integer parameter be negative?

And what about the return value? Are the two integer arrays related in any way? Can one be empty, but the other large? Can they both be empty? May negative numbers or zeroes be present?

Similar questions apply to the printSolution action.

Not all such questions can be answered by types, but since we already have a type system at our disposal, we might as well use it to address those questions that are easily modelled.

Encapsulating the relationship between price array and rod length #

The first question I decided to answer was this: Is there a relationship between the array and the integer?

The array, you may recall, is an array of prices. The integer is the length of the rod to cut up.

A relationship clearly exists. Since the price array starts with a 0 placeholder, the rod length must be less than the length of the array. If it isn't, cut throws an IndexOutOfRangeException. We can't calculate the optimal cuts if we lack price information.

Likewise, we can already infer that the length must be a non-negative number.

While we could choose to enforce this relationship with Guard Clauses, we may also consider a simpler API. Let the function infer the rod length from the array length.

let cut (p : _ array) =
    let n = p.Length - 1
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
    r, s

You may argue that this API is more implicit, which we generally don't like. The implication is that the rod length is determined by the array length. If you have a (one-indexed) price array of length 10, then how do you calculate the optimal cuts for a rod of length 7?

By shortening the price array:

> let p = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|];;
val p: int array = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|]

> cut (p |> Array.take (7 + 1));;
val it: int array * int array =
  ([|0; 1; 5; 8; 10; 13; 17; 18|], [|0; 1; 2; 3; 2; 2; 6; 1|])

This is clearly still sub-optimal. Notice, for example, how you need to add 1 to 7 in order to deal with the prefixed 0. On the other hand, we're not done with the redesign, so it may be worth pursuing this course a little further.

(To be honest, while this is the direction I ultimately chose, I'm not blind to the disadvantages of this implicit design. It makes it less clear to a client developer how to indicate a rod length. An alternative design would keep the price array and the rod length as two separate parameters, but then introduce a Guard Clause to check that the rod length doesn't exceed the length of the price array. Outside of dependent types I can't think of a way to model such a relationship between two values, and I admit to having no practical experience with dependent types. All this said, however, it's also possible that I'm missing an obvious design alternative. If you can think of a way to model this relationship in a non-predicative way, please write a comment.)
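The Guard Clause alternative mentioned above could look something like the following Python sketch. It keeps the price array and the rod length as separate parameters, and it rejects any length that the zero-prefixed array can't support. The helper name `rod_length_is_valid` and the choice of `ValueError` are assumptions made for illustration.

```python
def rod_length_is_valid(p: list[int], n: int) -> bool:
    """A zero-prefixed price array of length len(p) supports rod lengths 0..len(p) - 1."""
    return 0 <= n < len(p)


def cut(p: list[int], n: int) -> tuple[list[int], list[int]]:
    # Guard Clause: fail fast with a descriptive message. Without it, a too-large n raises an
    # IndexError deep in the loop, and a negative n silently returns empty lists.
    if not rod_length_is_valid(p, n):
        raise ValueError(
            f"Rod length {n} is out of range; with {len(p)} prices "
            f"(including the leading 0), it must be between 0 and {len(p) - 1}."
        )
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s
```

This check still happens at run time. That is the predicative trade-off described above: the relationship is enforced, but the signature doesn't reveal it.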

I gave printSolution the same treatment, after first having extracted a solve function in order to separate decisions from effects.

let solve p =
    let _, s = cut p
    let l = ResizeArray ()
    let mutable n = p.Length - 1
    while n > 0 do
        l.Add s[n]
        n <- n - s[n]
    l |> List.ofSeq
 
let printSolution p = solve p |> List.iter (printfn "%i")

The implementation of the solve function is still imperative, but if you view it as a black box, it's referentially transparent. We'll get back to the implementation later.
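In Python, the same separation of decisions from effects might look like the following sketch. `cut` infers the rod length from the array, `solve` returns the sizes as a list without printing anything, and `print_solution` is a thin impure wrapper around it. As with the F# version, you express a shorter rod by slicing the price list.

```python
def cut(p: list[int]) -> tuple[list[int], list[int]]:
    """The rod length is inferred: a zero-prefixed price list of length k describes a rod of length k - 1."""
    n = len(p) - 1
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s


def solve(p: list[int]) -> list[int]:
    """Decision: the sizes of the pieces in an optimal cut. No side effects."""
    _, s = cut(p)
    sizes = []
    n = len(p) - 1
    while n > 0:
        sizes.append(s[n])
        n -= s[n]
    return sizes


def print_solution(p: list[int]) -> None:
    """Effect: print each size on its own line."""
    for size in solve(p):
        print(size)
```

For example, `solve([0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30][:7 + 1])` returns `[1, 6]`. The `+ 1` is the same wart as in the F# REPL session.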

Returning a list of cuts #

Let's return to all the questions I enumerated above, particularly the questions about the return value. Are the two integer arrays related?

Indeed they are! In fact, they have the same length.

As explained in the previous article, in the original pseudocode, the r array is supposed to be zero-indexed, but non-empty and containing 0 as the first element. The s array is supposed to be one-indexed, and be exactly one element shorter than the r array. In practice, in all three implementations shown in that article, I made both arrays zero-indexed, non-empty, and of the exact same length. This is also true for the F# implementation.

We can communicate this relationship much better to client developers by changing the return type of the cut function. Currently, the return type is int array * int array, indicating a pair of arrays. Instead, we can change the return type to an array of pairs, thereby indicating that the values are related two-and-two.

That would be a decent change, but we can further improve the API. A pair of integers is still implicit, because it isn't clear which integer represents the revenue and which one represents the size. Instead, we introduce a custom type with clear labels:

type Cut = { Revenue : int; Size : int }

Then we change the cut function to return a collection of Cut values:

let cut (p : _ array) =
    let n = p.Length - 1
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
 
    let result = ResizeArray ()
    for i = 0 to n do
        result.Add { Revenue = r[i]; Size = s[i] }
    result |> List.ofSeq

The type of cut is now int array -> Cut list. Notice that I decided to return a linked list rather than an array. This is mostly because I consider linked lists to be more idiomatic than arrays in a context of functional programming (FP), but to be honest, I'm not sure that it makes much difference as a return value.

In any case, you'll observe that the implementation is still imperative. The main topic of this article is how to give an API good encapsulation, so I treat the actual code as an implementation detail. It's not the most important thing.
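A Python counterpart to the `Cut` record could be a frozen dataclass. The following sketch mirrors the F# version at this stage of the refactoring: it still expects a zero-prefixed price list, and it still returns the leading zero cut. The field names `revenue` and `size` are lower-cased versions of the F# labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    revenue: int  # best revenue for a rod of this length
    size: int     # size of the first piece in an optimal cut


def cut(p: list[int]) -> list[Cut]:
    n = len(p) - 1
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    # Zip the two parallel arrays into one list of labelled pairs.
    return [Cut(revenue=rev, size=size) for rev, size in zip(r, s)]
```

Because `r` and `s` always have the same length, `zip` loses nothing. The relationship between the two arrays is now part of the return type instead of something a caller has to know.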

Linked list input #

Although I wrote that I'm not sure it makes much difference whether cut returns an array or a list, it does matter when it comes to input values. Currently, cut takes an int array as input.

As the implementation so amply demonstrates, F# arrays are mutable; you can mutate the cells of an array. A client developer may worry, then, whether cut modifies the input array.

From the implementation code we know that it doesn't, but encapsulation is all about sparing client developers the burden of having to read the implementation. Rather, an API should communicate its contract in as succinct a way as possible, either via documentation or the type system.

In this case, we can use the type system to communicate this postcondition. Changing the input type to a linked list effectively communicates to all users of the API that cut doesn't mutate the input. This is because F# linked lists are truly immutable.

let cut prices =
    let p = prices |> Array.ofList
    let n = p.Length - 1
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
 
    let result = ResizeArray ()
    for i = 0 to n do
        result.Add { Revenue = r[i]; Size = s[i] }
    result |> List.ofSeq

The type of the cut function is now int list -> Cut list, which informs client developers of an invariant. You can trust that cut will not change the input arguments.
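Python offers no compile-time guarantee to match, but a tuple is immutable at run time. Accepting `tuple[int, ...]` therefore sends callers a similar signal. This is a sketch of that idea rather than a translation of the F# code, and the dataclass mirrors the earlier `Cut` record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cut:
    revenue: int
    size: int


def cut(prices: tuple[int, ...]) -> list[Cut]:
    """prices is zero-prefixed and, being a tuple, can't be modified by this function."""
    n = len(prices) - 1
    # As in the F# version, any mutation is confined to local working arrays.
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < prices[i] + r[j - i]:
                q = prices[i] + r[j - i]
                s[j] = i
        r[j] = q
    return [Cut(rev, size) for rev, size in zip(r, s)]
```

Item assignment such as `prices[0] = 1` on a tuple raises `TypeError`, so the input can't change behind the caller's back. However, unlike the F# list type, nothing stops a caller from passing a list anyway.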

Natural numbers #

You've probably gotten the point by now, so let's move a bit quicker. There are still issues that we'd like to document. Perhaps the worst part of the current API is that client code is required to supply a prices list where the first element must be zero. That's a very specific requirement. It's easy to forget, and if you do, the cut function just silently fails. It doesn't throw an exception; it just gives you a wrong answer.

We may choose to add a Guard Clause, but why are we even putting that responsibility on the client developer? Why can't the cut function add that prefix itself? It can, and it turns out that once you do that, and also remove the initial zero element from the output, you're now working with natural numbers.

First, add a NaturalNumber wrapper of integers:

type NaturalNumber = private NaturalNumber of int with
    member this.Value = let (NaturalNumber i) = this in i
    static member tryCreate candidate =
        if candidate < 1 then None else Some <| NaturalNumber candidate
    override this.ToString () = let (NaturalNumber i) = this in string i

Since the case constructor is private, external code can only try to create values. Once you have a NaturalNumber value, you know that it's valid, but creation requires a run-time check. In other words, this is what Hillel Wayne calls predicative data.
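Python can't make a constructor private. The closest analogue is probably a frozen dataclass whose constructor validates its argument, plus a `try_create` that returns `None` instead of raising. The following sketch makes that assumption. It's still predicative data in Hillel Wayne's sense, because validity is only established by a run-time check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NaturalNumber:
    value: int

    def __post_init__(self) -> None:
        # Backstop: direct construction with an invalid value fails loudly.
        if self.value < 1:
            raise ValueError(f"{self.value} is not a natural number.")

    @staticmethod
    def try_create(candidate: int) -> Optional["NaturalNumber"]:
        """Return a NaturalNumber if candidate >= 1, otherwise None."""
        return NaturalNumber(candidate) if candidate >= 1 else None

    def __str__(self) -> str:
        return str(self.value)
```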

Armed with this new type, however, we can now strengthen the definition of the Cut record type:

type Cut = { Revenue : int; Size : NaturalNumber } with
    static member tryCreate revenue size =
        NaturalNumber.tryCreate size
        |> Option.map (fun size -> { Revenue = revenue; Size = size })

The Revenue may still be any integer, because it turns out that the algorithm also works with negative prices. (For a book that's very meticulous in its analysis of algorithms, CLRS is surprisingly silent on this topic. Thorough testing with Hedgehog, however, indicates that this is so.) On the other hand, the Size of the Cut must be a NaturalNumber. Since, again, we don't have any constructive way (outside of using refinement types) of modelling this requirement, we also supply a tryCreate function.
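The claim about negative prices is easy to probe. The following sketch isn't the Hedgehog test suite mentioned above. It's a hand-rolled randomized check using Python's `random` module that compares the bottom-up revenue with an exhaustive search over every way of cutting the rod. Passing it is evidence, not proof.

```python
import random


def bottom_up_revenue(prices: list[int]) -> int:
    """prices[k - 1] is the price of a piece of length k (no leading 0)."""
    p = [0, *prices]
    r = [0] * len(p)
    for j in range(1, len(p)):
        r[j] = max(p[i] + r[j - i] for i in range(1, j + 1))
    return r[-1]


def brute_force_revenue(prices: list[int]) -> int:
    """Try all 2^(n-1) ways to cut a rod of length n; bit k of mask means 'cut at position k + 1'."""
    n = len(prices)
    if n == 0:
        return 0
    best = None
    for mask in range(1 << (n - 1)):
        total, start = 0, 0
        for pos in range(1, n):
            if mask & (1 << (pos - 1)):
                total += prices[pos - start - 1]  # piece of length pos - start
                start = pos
        total += prices[n - start - 1]  # last piece, of length n - start
        best = total if best is None else max(best, total)
    return best


def agree_on_random_inputs(trials: int = 500, seed: int = 42) -> bool:
    """Compare both functions on random price lists of length 0 to 8, with prices in -20..20."""
    rng = random.Random(seed)
    for _ in range(trials):
        prices = [rng.randint(-20, 20) for _ in range(rng.randint(0, 8))]
        if bottom_up_revenue(prices) != brute_force_revenue(prices):
            return False
    return True
```

For example, with prices `[-1, -5]`, both functions return `-2`: two pieces of length 1 beat one piece of length 2.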

This enables us to define the cut function like this:

let cut prices =
    let p = prices |> List.append [0] |> Array.ofList
    let n = p.Length - 1
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
 
    let result = ResizeArray ()
    for i = 1 to n do
        Cut.tryCreate r[i] s[i] |> Option.iter result.Add
    result |> List.ofSeq

It still has the type int list -> Cut list, but the Cut type is now more restrictively designed. In other words, we've provided a more conservative definition of what we return, in keeping with Postel's law.

Furthermore, notice that the first line prepends 0 to the prices before converting them to the p array, so that the client developer doesn't have to do that. Likewise, when returning the result, the for loop goes from 1 to n, which means that it omits the first zero cut.
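A Python sketch of this version of `cut` might look like the following. It prepends the zero itself and skips the zero cut on the way out. `NaturalNumber` and `Cut` are hypothetical dataclass analogues of the F# types, trimmed to what this example needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NaturalNumber:
    value: int

    @staticmethod
    def try_create(candidate: int) -> Optional["NaturalNumber"]:
        return NaturalNumber(candidate) if candidate >= 1 else None


@dataclass(frozen=True)
class Cut:
    revenue: int
    size: NaturalNumber

    @staticmethod
    def try_create(revenue: int, size: int) -> Optional["Cut"]:
        natural = NaturalNumber.try_create(size)
        return None if natural is None else Cut(revenue, natural)


def cut(prices: list[int]) -> list[Cut]:
    p = [0, *prices]  # the caller no longer supplies the leading 0
    n = len(p) - 1
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    result = []
    for i in range(1, n + 1):  # start at 1: the zero cut is omitted
        c = Cut.try_create(r[i], s[i])
        if c is not None:
            result.append(c)
    return result
```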

These changes ripple through and also improve encapsulation of the solve function:

let solve prices =
    let cuts = cut prices
    let l = ResizeArray ()
    let mutable n = prices.Length
    while n > 0 do
        let idx = n - 1
        let s = cuts.[idx].Size
        l.Add s
        n <- n - s.Value
    l |> List.ofSeq

The type of solve is now int list -> NaturalNumber list.

This is about as strong as I can think of making the API using F#'s type system. A type like int list -> NaturalNumber list tells you something about what you're allowed to do, what you're expected to do, and what you can expect in return. You can provide (almost) any list of integers, whether positive, zero, or negative. You may also give an empty list. If we had wanted to prevent that, we could have used a NonEmpty list, as seen (among other places) in the article Conservative codomain conjecture.

Okay, to be perfectly honest, there's one more change that might be in order, but this is where I ran out of steam. One remaining precondition that I haven't yet discussed is that the input list must not contain 'too big' numbers. The problem is that the algorithm adds numbers together, and since 32-bit integers are bounded, you could run into overflow situations. Ask me how I know.

Changing the types to use 64-bit integers doesn't solve that problem (it only moves the boundary of where overflow happens), but consistently changing the API to work with BigInteger values might. To be honest, I haven't tried.
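The following Python sketch shows why overflow produces a wrong answer rather than an exception. It simulates unchecked 32-bit addition, which is what F#'s default `+` on `int` does, and compares the result with Python's arbitrary-precision integers. Python's built-in `int` behaves much like `BigInteger` in this respect. `add_int32` and `best_revenue` are illustrative helpers, not part of the original code.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def add_int32(a: int, b: int) -> int:
    """Add with two's-complement wrap-around, like unchecked 32-bit arithmetic."""
    total = (a + b) & 0xFFFFFFFF
    return total - 2**32 if total > INT32_MAX else total


def best_revenue(prices: list[int], add=lambda a, b: a + b) -> int:
    """Bottom-up rod-cutting revenue; `add` makes it possible to swap in wrapping arithmetic."""
    p = [0, *prices]
    r = [0] * len(p)
    for j in range(1, len(p)):
        r[j] = max(add(p[i], r[j - i]) for i in range(1, j + 1))
    return r[-1]
```

With prices `[INT32_MAX, 0]`, the correct answer is two pieces of length 1, worth `2 * INT32_MAX`. With `add_int32`, that sum wraps around to `-2`, so the algorithm silently picks the uncut rod and reports `0`.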

Functional programming #

From an encapsulation perspective, we're done now. By using the type system, we've emphasized how to use the API, rather than how it's implemented. Along the way, we even hid away some warts that came with the implementation. If I wanted to take this further, I would seriously consider making the cut function a private helper function, because it doesn't really return a solution. It only returns an intermediary value that makes it easier for the solve function to return the actual solution.

If you're even just a little bit familiar with F# or functional programming, you may have found it painful to read this far. All that imperative code. My eyes! For the love of God, please rewrite the implementation with proper FP idioms and patterns.

Well, the point of the whole article is that the implementation doesn't really matter. It's how client code may use the API that's important.

That is, of course, until you have to go and change the implementation code. In any case, as a little consolation prize for those brave FP readers who've made it all this way, here follow more functional implementations of the functions.

The NaturalNumber and Cut types haven't changed, so the first change comes with the cut function:

let private cons x xs = x :: xs
 
let cut prices =
    let p = 0 :: prices |> Array.ofList
    let n = p.Length - 1
 
    let findBestCut revenues j =
        [1..j]
        |> List.map (fun i -> p[i] + Map.find (j - i) revenues, i)
        |> List.maxBy fst
 
    let aggregate acc j =
        let revenues = snd acc
        let q, i = findBestCut revenues j
        let cuts = fst acc
        cuts << (cons (q, i)), Map.add revenues.Count q revenues
 
    [1..n]
    |> List.fold aggregate (id, Map.add 0 0 Map.empty)
    |> fst <| [] // Evaluate Hughes list
    |> List.choose (fun (r, i) -> Cut.tryCreate r i)

Even here, however, some implementation choices are dubious at best. For instance, I decided to use a Hughes list or difference list (see Tail Recurse for a detailed explanation of how this works in F#) without measuring whether or not it was better than just using normal list consing followed by List.rev (which is, in fact, often faster). That's one of the advantages of writing code for articles; such things don't really matter that much in that context.
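The two list-building strategies can be compared in a Python sketch. Python has no built-in cons list, so nested `(head, tail)` tuples stand in for one, with `None` as the empty list. A difference list is then a function, and `snoc` corresponds to F#'s `cuts << cons x`. This only illustrates the mechanics and makes no performance claims. Evaluating a difference list nests one call per element, so Python's default recursion limit caps it at roughly a thousand items.

```python
from typing import Any, Callable, Optional

ConsList = Optional[tuple]  # None is the empty list; (head, tail) is a cons cell
DiffList = Callable[[ConsList], ConsList]


def cons(x: Any, xs: ConsList) -> ConsList:
    return (x, xs)


def snoc(dl: DiffList, x: Any) -> DiffList:
    """Append x by composition; the Python spelling of F#'s `dl << cons x`."""
    return lambda xs: dl(cons(x, xs))


def build_with_hughes_list(items: list) -> ConsList:
    dl: DiffList = lambda xs: xs  # the empty difference list is the identity function
    for x in items:
        dl = snoc(dl, x)
    return dl(None)  # evaluate by applying to the empty list, like `<| []` in the F# code


def build_with_reverse(items: list) -> ConsList:
    acc: ConsList = None
    for x in items:
        acc = cons(x, acc)  # consing builds the list back to front...
    result: ConsList = None
    while acc is not None:  # ...so finish with a reverse
        head, acc = acc
        result = cons(head, result)
    return result


def to_python_list(xs: ConsList) -> list:
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out
```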

Another choice that may leave you scratching your head is that I decided to model the revenues as a map (that is, an immutable dictionary) rather than an array. I did this because I was concerned that with the move towards immutable code, I'd have n reallocations of arrays. Perhaps, I thought, adding incrementally to a Map structure would be more efficient.

But really, all of that is just wanking, because I haven't measured.

The FP-style implementation of solve is, I believe, less controversial:

let solve prices =
    let cuts = cut prices
    let rec imp n =
        if n <= 0 then [] else
            let idx = n - 1
            let s = cuts[idx].Size
            s :: imp (n - s.Value)
    imp prices.Length

This is a fairly standard implementation using a local recursive helper function.
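The following Python sketch is a rough analogue of both functional F# functions. `functools.reduce` plays the role of `List.fold`, and a dictionary copied with `{**revenues, j: q}` plays the role of the immutable `Map`. For brevity, it accumulates the cuts in a tuple instead of a Hughes list, and it drops the `NaturalNumber` wrapper, because every size here is at least 1. Like the F# code, it makes no attempt to be efficient.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Cut:
    revenue: int
    size: int  # always >= 1 here, so the NaturalNumber wrapper is omitted


def cut(prices: list[int]) -> list[Cut]:
    p = [0, *prices]
    n = len(p) - 1

    def find_best_cut(revenues: dict[int, int], j: int) -> tuple[int, int]:
        # (revenue, size of first piece); max with a key keeps the first (smallest) i on ties
        candidates = [(p[i] + revenues[j - i], i) for i in range(1, j + 1)]
        return max(candidates, key=lambda pair: pair[0])

    def aggregate(acc, j):
        cuts, revenues = acc
        q, i = find_best_cut(revenues, j)
        # Return new values rather than mutating: a longer tuple and an extended dict copy
        return cuts + ((q, i),), {**revenues, j: q}

    pairs, _ = reduce(aggregate, range(1, n + 1), ((), {0: 0}))
    return [Cut(revenue, size) for revenue, size in pairs]


def solve(prices: list[int]) -> list[int]:
    cuts = cut(prices)

    def imp(n: int) -> list[int]:
        # Local recursive helper, as in the F# version
        if n <= 0:
            return []
        size = cuts[n - 1].size
        return [size] + imp(n - size)

    return imp(len(prices))
```

For example, `solve([1, 5, 8, 9, 10, 17, 17])` returns `[1, 6]`.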

Both cut and solve have the types previously reported. In other words, this final refactoring to functional implementations didn't change their types.

Conclusion #

This article goes through a series of code improvements to illustrate how a static type system can make it easier to use an API. Use it correctly, that is.

There's a common misconception about ease of use that it implies typing fewer characters, or getting instant gratification. That's not my position. Typing is not a bottleneck, and in any case, not much is gained if you make it easier for client developers to get the wrong answers from your API.

Static types give you a consistent vocabulary you can use to communicate an API's contract to client developers. What must client code do in order to make a valid method or function call? What guarantees can client code rely on? Encapsulation, in other words.

P.S. 2025-01-20:

For a type-level technique for modelling the relationship between rod size and price list, see Modelling data relationships with F# types.


Pytest is fast

2024-12-30T16:01:00+00:00

One major attraction of Python. A recent realization.

Ever since I became aware of the distinction between statically and dynamically typed languages, I've struggled to understand the attraction of dynamically typed languages. As regular readers may have noticed, this is a bias that doesn't sit well with me. Clearly, there are advantages to dynamic languages that I fail to notice. Is it a question of mindset? Or is it a combination of several small advantages?

In this article, I'll discuss another potential benefit of at least one dynamically typed language, Python.

Fast feedback #

Rapid feedback is a cornerstone of modern software engineering. I've always considered the feedback from the compiler an important mechanism, but I've recently begun to realize that it comes with a price. While a good type system keeps you honest, compilation takes time, too.

Since I've been so entrenched in the camp of statically typed languages (C#, F#, Haskell), I've tended to regard compilation as a mandatory step. And since the compiler needs to run anyway, you might as well take advantage of it. Use the type system to make illegal states unrepresentable, and all that.

Even so, I've noticed that compilation time isn't a fixed size. This observation surely borders on the banal, but with sufficient cognitive bias, it can, apparently, take years to come to even such a trivial conclusion. After initial years with various programming languages, my formative years as a programmer were spent with C#. As it turns out, the C# compiler is relatively fast.

This is probably a combination of factors. Since C# is one of the most popular languages, it has a big and skilled engineering team, and it's my impression that much effort goes into making it as fast and efficient as possible.

I also speculate that, since the C# type system isn't as powerful as F#'s or Haskell's, there's simply work that it can't do. When you can't express certain constraints or relationships with the type system, the compiler can't check them, either.

That said, the C# compiler seems to have become slower over the years. This could be a consequence of all the extra language features that accumulate.

The F# compiler, in comparison, has always taken longer than the C# compiler. Again, this may be due to a combination of a smaller engineering team and that it actually can check more things at compile time, since the type system is more expressive.

This, at least, seems to fit with the observation that the Haskell compiler is even slower than F#. The language is incredibly expressive. There are a lot of constraints and relationships that you can model with the type system. Clearly, the compiler has to perform extra work to check that your types line up.

You're often left with the impression that if it compiles, it works. The drawback is that getting Haskell code to compile may be a non-trivial undertaking.

One thing is that you'll have to wait for the compiler. Another is that if you practice test-driven development (TDD), you'll have to compile the test code, too. Only once the tests are compiled can you run them. And TDD test suites should run in 10 seconds or less.

Skipping compilation with pytest #

A few years ago I had to learn a bit of Python, so I decided to try Advent of Code 2022 in Python. As the puzzles got harder, I added unit tests with pytest. When I ran them, I was taken aback at how fast they ran.

There's no compilation step, so the test suite runs immediately. Obviously, if you've made a mistake that a compiler would have caught, the test fails, but if the code makes sense to the interpreter, it just runs.
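In pytest, tests are plain functions whose names start with `test_`, and they use bare `assert` statements. There's nothing to compile before `pytest` collects and runs them. The following sketch is hypothetical: the file name is made up, and instead of the Advent of Code code it uses a compact version of the rod-cutting `solve` from the rod-cutting articles as the System Under Test.

```python
# test_rod_cutting.py (hypothetical file name); run with: pytest test_rod_cutting.py


def solve(prices: list[int]) -> list[int]:
    """Piece sizes of an optimal cut; prices[k - 1] is the price of a piece of length k."""
    p = [0, *prices]
    r = [0] * len(p)
    s = [0] * len(p)
    for j in range(1, len(p)):
        # key= makes max keep the first (smallest) i on ties, like the strict < in the F# loop
        r[j], s[j] = max(((p[i] + r[j - i], i) for i in range(1, j + 1)), key=lambda pair: pair[0])
    sizes, n = [], len(prices)
    while n > 0:
        sizes.append(s[n])
        n -= s[n]
    return sizes


def test_rod_of_length_seven():
    assert solve([1, 5, 8, 9, 10, 17, 17]) == [1, 6]


def test_empty_price_list_means_no_cuts():
    assert solve([]) == []


def test_sizes_add_up_to_rod_length():
    prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
    assert sum(solve(prices)) == len(prices)
```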

For various reasons, I ran out of steam, as one does with Advent of Code, but I managed to write a good little test suite. Until day 17, it ran in 0.15-0.20 seconds on my little laptop. To be honest, though, once I added tests for day 17, feedback time jumped to just under two seconds. This is clearly because I'd written some inefficient code for my System Under Test.

I can't really blame a test framework for being slow, when it's really my own code that slows it down.

A counter-argument is that a compiled language is much faster than an interpreted one. Thus, one might think that a faster language would counter poor implementations. Not so.

TDD with Haskell #

As I've already outlined, the Haskell compiler takes more time than C#, and obviously it takes more time than a language that isn't compiled at all. On the other hand, Haskell compiles to native machine code. My experience with it is that once you've compiled your program, it's fast.

In order to compare the two styles, I decided to record compilation and test times while doing Advent of Code 2024 in Haskell. I set up a Haskell code base with Stack and HUnit, as I usually do. As I worked through the puzzles, I'd be adding and running tests. Each time, I used the time command to measure how long stack test took to run.
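If you'd rather script such measurements than read them off the `time` output, the following Python sketch measures wall-clock time around a subprocess. That corresponds to `time`'s real figure, not to user or sys time. The helper name `wall_clock_seconds` is hypothetical, and `stack test` in the usage note is just the command from the text.

```python
import subprocess
import time


def wall_clock_seconds(command: list[str]) -> tuple[float, int]:
    """Run command and return (elapsed real time in seconds, exit code).

    Like the real figure reported by `time`, this measures wall-clock time,
    not CPU time, so it includes compilation, I/O, and any waiting.
    """
    start = time.perf_counter()
    completed = subprocess.run(command)  # no check=True: a failing test run should still be timed
    return time.perf_counter() - start, completed.returncode
```

For example, `wall_clock_seconds(["stack", "test"])` returns the feedback time together with the exit code, so failing runs can be recorded too.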

I've plotted the observations in this chart:

The chart shows more than a thousand observations, with the first to the left, and the latest to the right. The times recorded are the entire time it took from when I started a test run until I had an answer. For this, I used the time command's real time measurement, rather than user or sys time. What matters is the feedback time; not the CPU time.

Each measurement is in seconds. The dashed orange line indicates the linear trend.
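The text doesn't say how the trend line was computed. One common choice is an ordinary least-squares fit, with the observation index as x, and the following Python sketch assumes exactly that. On Python 3.10 or later, `statistics.linear_regression` computes the same slope and intercept.

```python
def linear_trend(ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ... (observation order)."""
    n = len(ys)
    if n < 2:
        raise ValueError("Need at least two observations to fit a line.")
    mean_x = (n - 1) / 2  # mean of 0..n-1
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With the made-up feedback times `[8.0, 8.5, 9.0, 9.5]`, the fit gives a slope of 0.5 seconds per observation and an intercept of 8.0 seconds.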

It's not the first time I've written Haskell code, so I knew what to expect. While you get the occasional fast turnaround time, it easily takes around ten seconds to compile even an empty code base. It seems that there's a constant overhead of that size. While there's an upward trend line as I added more and more code, and more tests, actually running the tests takes almost no time. The initial 'average' feedback time was around eight seconds, and 1100 observations later, the trend sits around 11.5 seconds. At this time, I had more than 200 test cases.

You may also notice that the observations vary quite a bit. You occasionally see sub-second times, but also turnaround times over thirty seconds. There's an explanation for both.

The sub-second times usually happen if I run the test suite twice without changing any code. In that case, the Haskell Stack correctly skips recompiling the code and instead just reruns the tests. This only highlights that I'm not waiting for the tests to execute. The tests are fast. It's the compiler that causes over 90% of the delay.

(Why would I rerun a test suite without changing any code? This mostly happens when I take a break from programming, or if I get distracted by another task. In such cases, when I return to the code, I usually run the test suite in order to remind myself of the state in which I left it. Sometimes, it turns out, I'd left the code in a state where the last thing I did was to run all tests.)

The other extremes have a different explanation.

IDE woes #

Why do I have to suffer through those turnaround times over twenty seconds? A few times over thirty?

The short answer is that these represent complete rebuilds. Most of these are caused by problems with the IDE. For Haskell development, I use Visual Studio Code with the Haskell extension.

Perhaps it's only my setup that's messed up, but whenever I change a function in the System Under Test (SUT), I can. not. make. VS Code pick up that the API changed. Even if I correct my tests so that they still compile and run successfully from the command line, VS Code will keep insisting that the code is wrong.

This is, of course, annoying. One of the major selling points of statically typed languages is that a good IDE can tell you if you made mistakes. Well, if it operates on an outdated view of what the SUT looks like, this no longer works.

I've tried restarting the Haskell Language Server, but that doesn't work. The only thing that works, as far as I've been able to discover, is to close VS Code, delete .stack-work, recompile, and reopen VS Code. Yes, that takes a minute or two, so not something I like doing too often.

Deleting .stack-work does trigger a full rebuild, which is why we see those long build times.

Striking a good balance #

What bothers me about dynamic languages is that I find discoverability and encapsulation so hard. I can't just look at the type of an operation and deduce what inputs it might take, or what the output might look like.

To be honest, if you give me a plain text file with F# or Haskell, I can't do that either. A static type system doesn't magically surface that kind of information. Instead, you may rely on an IDE to provide such information at your fingertips. The Haskell extension, for example, gives you a little automatic type annotation above your functions, as discussed in the article Pendulum swing: no Haskell type annotation by default, and shown in a figure reprinted here for your convenience:

If this is to work well, this information must be immediate and responsive. On my system it isn't.

It may, again, be that there's some problem with my particular tool chain setup. Or perhaps a four-year-old Lenovo X1 Carbon is just too puny a machine to effectively run such a tool.

On the other hand, I don't have similar issues with C# in Visual Studio (not VS Code). When I make changes, the IDE quickly responds and tells me if I've made a mistake. To be honest, even here, I feel that it was faster and more responsive a decade ago, but compared to Haskell programming, the feedback I get with C# is close to immediate.

My experience with F# is somewhere in between. Visual Studio is quicker to pick up changes in F# code than VS Code is to reflect changes in Haskell, but it's not as fast as C#.

With Python, what little IDE integration is available is usually not trustworthy. Essentially, when suggesting callable operations, the IDE is mostly guessing, based on what it's already seen.

But, good Lord! The tests run fast.

Conclusion #

My recent experiences with both Haskell and Python programming are giving me a better understanding of the balances and trade-offs involved with picking a language. While I still favour statically typed languages, I'm beginning to see some attractive qualities on the other side.

Particularly, if you buy the argument that TDD suites should run in 10 seconds or less, this effectively means that I can't do TDD in Haskell. Not with the hardware I'm running. Python, on the other hand, seems eminently well-suited for TDD.

That doesn't sit too well with me, but on the other hand, I'm glad. I've learned about a benefit of a dynamically typed language. While you may consider all of this ordinary and trite, it feels like a small breakthrough to me. I've been trying hard to see past my own limitations, and it finally feels as though I've found a few chinks in the armour of my biases.

I'll keep pushing those envelopes to see what else I may learn.

Comments

Daniel Tartaglia #

An interesting insight, but if you consider that the compiler is effectively an additional test suite that is verifying the types are being used correctly, that extra compilation time is really just a whole suite of tests that you didn't have to write. I can't help but wonder how long it would take to manually implement all the tests that would be required to satisfy those checks in Python, and how much slower the Python test suite would then be.

Like you, I have a strong bias for typesafe languages (or at least moderately typesafe ones). The way I've always explained it is as follows: Developers tend to work faster when writing with dynamic typed languages because they don't have to explain as much to a compiler. This literally means less code to write. However, because the developer hasn't fully explained themself, any follow-on developer does not have as much context to work with.

After all, whether the language requires it or not, the developers need to define and consider types. The only question is, do they have to write it down?

2025-01-01 01:26 UTC

Mark Seemann #

Daniel, thank you for writing. I'm well aware that a type checker is a 'first line of defence', and I agree that if we truly had to replicate everything that a type checker does, as tests, it would take a long time. It would take a long time to write all those tests, and it would probably also take a long time to execute them all.

That said, I think that any sane proponent of dynamically typed languages would counter that that's an unreasonable demand. After all, in most cases, it's hardly the case that the code was written by a monkey with a typewriter, but rather by a well-meaning human who did his or her best to write correct code.

In the end, however, it's all a question about context. How important is correctness, after all? Dan North once kindly pointed out to me that in many cases, the software owner doesn't even know what he or she wants. It's only through a series of iterations that we learn what a business system is supposed to do. Until we reach that point, correctness is, at best, a secondary priority. On the other hand, you should really test your outer space probe software.

But you're right. The types are still there, either way.

The last word in this debate has hardly been said yet, but you may also find my recent article series Implementation and usage mindsets interesting.

2025-01-07 06:53 UTC

This blog is totally free, but if you like it, please consider supporting it.

Implementing rod-cutting

2024-12-23T08:53:00+00:00

From pseudocode to implementation in three languages.

This article picks up where Implementation and usage mindsets left off, examining how easy it is to implement an algorithm in three different programming languages.

As an example, I'll use the bottom-up rod-cutting algorithm from Introduction to Algorithms.

Rod-cutting #

The problem is simple:

"Serling Enterprises buys long steel rods and cuts them into shorter rods, which it then sells. Each cut is free. The management of Serling Enterprises wants to know the best way to cut up the rods."

Introduction to Algorithms. Fourth edition, ch. 14.1

You're given an array of prices, or rather revenues, that each size is worth. The example from the book is given as a table:

length i	1	2	3	4	5	6	7	8	9	10
price p_i	1	5	8	9	10	17	17	20	24	30

Notice that while this implies an array like [1, 5, 8, 9, 10, 17, 17, 20, 24, 30], the array is understood to be one-indexed, as is the most common case in the book. Most languages, including all three languages in this article, have zero-indexed arrays, but it turns out that we can get around the issue by adding a leading zero to the array: [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30].

Thus, given that price array, the best you can do with a rod of length 10 is to leave it uncut, yielding a revenue of 30.

On the other hand, if you have a rod of length 7, you can cut it into two rods of lengths 1 and 6.

Another solution for a rod of length 7 is to cut it into three rods of sizes 2, 2, and 3. Both solutions yield a total revenue of 18. Thus, while more than one optimal solution exists, the algorithm given here only identifies one of them.
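To double-check the claim that a rod of length 7 has more than one optimal cutting, a brute-force enumeration can help. The following is a minimal Python sketch and not part of the book. The names P, all_cuttings, and optimal_cuttings are hypothetical. It enumerates every way to cut a rod as a non-decreasing tuple of piece lengths, so each combination of pieces appears once, and then keeps the cuttings with maximal revenue. It is exponential in n, so it is only good for small sanity checks. The book's bottom-up algorithm avoids that cost by reusing solutions to smaller subproblems.

```python
# The book's price table, padded with a leading zero so that P[i] is the
# price of a rod of length i (hypothetical name).
P = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def all_cuttings(n, smallest=1):
    """Yield every way to cut a rod of length n, as non-decreasing tuples
    of piece lengths, so that each combination of pieces appears once."""
    if n == 0:
        yield ()
        return
    for first in range(smallest, n + 1):
        for rest in all_cuttings(n - first, first):
            yield (first,) + rest


def optimal_cuttings(p, n):
    """Return the best revenue and every cutting that achieves it."""
    def revenue(pieces):
        return sum(p[length] for length in pieces)

    best = max(revenue(c) for c in all_cuttings(n))
    return best, [c for c in all_cuttings(n) if revenue(c) == best]


print(optimal_cuttings(P, 7))  # (18, [(1, 6), (2, 2, 3)])
```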

Extended-Bottom-Up-Cut-Rod(p, n)
    let r[0:n] and s[1:n] be new arrays
    r[0] = 0
    for j = 1 to n                // for increasing rod length j
        q = -∞
        for i = 1 to j            // i is the position of the first cut
            if q < p[i] + r[j - i]
                q = p[i] + r[j - i]
                s[j] = i          // best cut location so far for length j
        r[j] = q                  // remember the solution value for length j
    return r and s

Which programming language is this? It's no particular language, but rather pseudocode.

The reason that the function is called Extended-Bottom-Up-Cut-Rod is that the book pedagogically goes through a few other algorithms before arriving at this one. Going forward, I don't intend to keep that rather verbose name, but instead just call the function cut_rod, cutRod, or Rod.cut.

The p parameter is a one-indexed price (or revenue) array, as explained above, and n is a rod size (e.g. 10 or 7, reflecting the above examples).

Given the above price array and n = 10, the algorithm returns two arrays: r holds the maximum possible revenue for each rod length, and s holds the position of the first cut (that is, the size of the first piece) that achieves that revenue.

i	0	1	2	3	4	5	6	7	8	9	10
r[i]	0	1	5	8	10	13	17	18	22	25	30
s[i]		1	2	3	2	2	6	1	2	3	10

Such output doesn't really give a solution, but rather the raw data to find a solution. For example, for n = 10 (= i), you consult the table for (one-indexed) index 10, and see that you can get the revenue 30 from making a cut at position 10 (which effectively means no cut). For n = 7, you consult the table for index 7 and observe that you can get the total revenue 18 by making a cut at position 1. This leaves you with two rods, and you again consult the table. For n = 1, you can get the revenue 1 by making a cut at position 1; i.e. no further cut. For n = 7 - 1 = 6 you consult the table and observe that you can get the revenue 17 by making a cut at position 6, again indicating that no further cut is necessary.
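The r row follows from the recurrence that the inner loop of the pseudocode evaluates. The best revenue for length j is the maximum, over every first piece of size i, of p_i plus the best revenue for the remaining length j - i. As a worked check for n = 7, using the price table and the r row above:

$$
r_0 = 0, \qquad r_j = \max_{1 \le i \le j} \left( p_i + r_{j-i} \right)
$$

$$
r_7 = \max(1 + 17,\; 5 + 13,\; 8 + 10,\; 9 + 8,\; 10 + 5,\; 17 + 1,\; 17 + 0) = 18
$$

Four first cuts (i = 1, 2, 3, and 6) reach 18. The pseudocode only updates q when it finds a strictly larger value, so s[7] records the first of them, i = 1.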

Another procedure prints the solution, using the above process:

Print-Cut-Rod-Solution(p, n)
    (r, s) = Extended-Bottom-Up-Cut-Rod(p, n)
    while n > 0
        print s[n]      // cut location for length n
        n = n - s[n]    // length of the remainder of the rod

Again, the procedure is given as pseudocode.

How easy is it to translate this algorithm into code in a real programming language? Not surprisingly, this depends on the language.

Translation to Python #

The hypothesis of the previous article is that dynamically typed languages may be more suited for implementation tasks. The dynamically typed language that I know best is Python, so let's try that.

def cut_rod(p, n):
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    r[0] = 0
    for j in range(1, n + 1):
        q = float('-inf')
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s

That does, indeed, turn out to be straightforward. I had to figure out the syntax for initializing arrays, and how to represent negative infinity, but a combination of GitHub Copilot and a few web searches quickly cleared that up.

The same is true for the Print-Cut-Rod-Solution procedure.

def print_cut_rod_solution(p, n):
    r, s = cut_rod(p, n)
    while n > 0:
        print(s[n])
        n = n - s[n]

Apart from minor syntactical differences, the pseudocode translates directly to Python.
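Because print_cut_rod_solution only prints, it's awkward to test. The following hedged variant returns the cut sizes as a list instead. The name cut_rod_solution is hypothetical and doesn't come from the book. Returning a list also makes it easy to check the results against the r and s tables from earlier. cut_rod is repeated here so that the block runs on its own; its redundant r[0] = 0 line is dropped because the array already starts out filled with zeros.

```python
def cut_rod(p, n):
    # Bottom-up rod cutting, as translated above. p is one-indexed by way
    # of a leading zero. r[j] is the best revenue for length j, and s[j]
    # is the size of the first piece in an optimal cutting of length j.
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float('-inf')
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s


def cut_rod_solution(p, n):
    """Hypothetical helper: collect the piece sizes instead of printing them."""
    _, s = cut_rod(p, n)
    pieces = []
    while n > 0:
        pieces.append(s[n])
        n = n - s[n]
    return pieces


p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
r, s = cut_rod(p, 10)
print(r)                       # [0, 1, 5, 8, 10, 13, 17, 18, 22, 25, 30]
print(s[1:])                   # [1, 2, 3, 2, 2, 6, 1, 2, 3, 10]
print(cut_rod_solution(p, 7))  # [1, 6]
```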

So far, the hypothesis seems to hold. This particular dynamically typed language, at least, easily implements that particular algorithm. If we must speculate about underlying reasons, we may argue that a dynamically typed language is low on ceremony. You don't have to get side-tracked by declaring types of parameters, variables, or return values.

That, at least, is a common complaint about statically typed languages that I hear when I discuss with lovers of dynamically typed languages.

Let us, then, try to implement the rod-cutting algorithm in a statically typed language.

Translation to Java #

Together with other C-based languages, Java is infamous for requiring a high amount of ceremony to get anything done. How easy is it to translate the rod-cutting pseudocode to Java? Not surprisingly, it turns out that one has to jump through a few more hoops.

First, of course, one has to set up a code base and choose a build system. I'm not well-versed in Java development, but here I (more or less) arbitrarily chose Gradle. When you're new to an ecosystem, this can be a significant barrier, but I know from decades of C# programming that tooling alleviates much of that pain. Still, a single .py file this isn't.

Apart from that, the biggest hurdle turned out to be that, as far as I can tell, Java doesn't have native tuple support. Thus, in order to return two arrays, I would have to either pick a reusable package that implements tuples, or define a custom class for that purpose. Object-oriented programmers often argue that tuples represent poor design, since a tuple doesn't really communicate the role or intent of each element. Given that the rod-cutting algorithm returns two integer arrays, I'd be inclined to agree. You can't even tell them apart based on their types. For that reason, I chose to define a class to hold the result of the algorithm.

public class RodCuttingSolution {
    private int[] revenues;
    private int[] sizes;
 
    public RodCuttingSolution(int[] revenues, int[] sizes) {
        this.revenues = revenues;
        this.sizes = sizes;
    }
 
    public int[] getRevenues() {
        return revenues;
    }
 
    public int[] getSizes() {
        return sizes;
    }
}

Armed with this return type, the rest of the translation went smoothly.

public static RodCuttingSolution cutRod(int[] p, int n) {
    var r = new int[n + 1];
    var s = new int[n + 1];
    r[0] = 0;
    for (int j = 1; j <= n; j++) {
        var q = Integer.MIN_VALUE;
        for (int i = 1; i <= j; i++) {
            if (q < p[i] + r[j - i]) {
                q = p[i] + r[j - i];
                s[j] = i;
            }
        }
        r[j] = q;
    }
    return new RodCuttingSolution(r, s);
}

Granted, there's a bit more ceremony involved compared to the Python code, since one must declare the types of both the input parameters and the method's return type. You also have to declare the type of the arrays when initializing them, and you could argue that the for loop syntax is more complicated than Python's for ... in range ... syntax. One may also complain that all the brackets and parentheses make it harder to read the code.

While I'm used to such C-like code, I'm not immune to such criticism. I actually do find the Python code more readable.

Translating the Print-Cut-Rod-Solution pseudocode is a bit easier:

public static void printCutRodSolution(int[] p, int n) {
    var result = cutRod(p, n);
    while (n > 0) {
        System.out.println(result.getSizes()[n]);
        n = n - result.getSizes()[n];
    }
}

The overall structure of the code remains intact, but again we're burdened with extra ceremony. We have to declare input and output types, and call that awkward getSizes method to retrieve the array of cut sizes.
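For comparison, Python offers a middle ground between an anonymous tuple and a full class with getters. The following is a hedged sketch using typing.NamedTuple. The type name RodCuttingSolution simply mirrors the Java class, and nothing in the Python translation above uses it. A named tuple documents the role of each array while still allowing plain tuple unpacking.

```python
from typing import List, NamedTuple


class RodCuttingSolution(NamedTuple):
    """Names the roles of the two arrays that cut_rod returns."""
    revenues: List[int]  # r: maximum revenue for each rod length
    sizes: List[int]     # s: size of the first piece for each rod length


# Values for rod lengths 0..4, taken from the r and s tables earlier
# (s[0] is unused, so 0 is a placeholder).
solution = RodCuttingSolution(revenues=[0, 1, 5, 8, 10], sizes=[0, 1, 2, 3, 2])

r, s = solution           # unpacks like an ordinary tuple
print(solution.sizes[4])  # 2, accessed by name instead of by position
```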

It's possible that my Java isn't perfectly idiomatic. After all, although I've read many books with Java examples over the years, I rarely write Java code. Additionally, you may argue that static methods exhibit a code smell like Feature Envy. I might agree, but the purpose of the current example is to examine how easy or difficult it is to implement a particular algorithm in various languages. Now that we have an implementation in Java, we might wish to refactor to a more object-oriented design, but that's outside the scope of this article.

Given that the rod-cutting algorithm isn't the most complex algorithm that exists, we may jump to the conclusion that Java isn't that bad compared to Python. Consider, however, how the extra ceremony on display here impacts your work if you have to implement a larger algorithm, or if you need to iterate to find an algorithm on your own.

To be clear, C# would require a similar amount of ceremony, and I don't even want to think about doing this in C.

All that said, it'd be irresponsible to extrapolate from only a few examples. You'd need both more languages and more problems before it even seems reasonable to draw any conclusions. I don't, however, intend the present example to constitute a full argument. Rather, it's an illustration of an idea that I haven't pulled out of thin air.

One of the points of Zone of Ceremony is that the degree of awkwardness isn't necessarily correlated to whether types are dynamically or statically defined. While I'm sure that I miss lots of points made by 'dynamists', this is a point that I often feel is missed by that camp. One language that exemplifies that 'beyond-ceremony' zone is F#.

Translation to F# #

If I'm right, we should be able to translate the rod-cutting pseudocode to F# with approximately the same amount of trouble as when translating to Python. How do we fare?

let cut (p : _ array) n =
    let r = Array.zeroCreate (n + 1)
    let s = Array.zeroCreate (n + 1)
    r[0] <- 0
    for j = 1 to n do
        let mutable q = Int32.MinValue
        for i = 1 to j do
            if q < p[i] + r[j - i] then
                q <- p[i] + r[j - i]
                s[j] <- i
        r[j] <- q
    r, s

Fairly well, as it turns out, although we do have to annotate p by indicating that it's an array. Still, the underscore in front of the array keyword indicates that we're happy to let the compiler infer the type of array (which is int array).

(We can get around that issue by writing Array.item i p instead of p[i], but that's verbose in a different way.)

Had we chosen to instead implement the algorithm based on an input list or map, we wouldn't have needed the type hint. One could therefore argue that the reason that the hint is even required is because arrays aren't the most idiomatic data structure for a functional language like F#.

Otherwise, I don't find that this translation was much harder than translating to Python, and I personally prefer for j = 1 to n do over for j in range(1, n + 1):.

We also need to add the mutable keyword to allow q to change during the loop. You could argue that this is another example of additional ceremony. While I agree, it's not much related to static versus dynamic typing, but more to how values are immutable by default in F#. If I recall correctly, JavaScript similarly distinguishes between let, var, and const.

Translating Print-Cut-Rod-Solution requires, again due to values being immutable by default, a bit more effort than Python, but not much:

let printSolution p n =
    let _, s = cut p n
    let mutable n = n
    while n > 0 do
        printfn "%i" s[n]
        n <- n - s[n]

I had to shadow the n parameter with a mutable variable to stay as close to the pseudocode as possible. Again, one may argue that the overall problem here isn't the static type system, but that programming based on mutation isn't idiomatic for F# (or other functional programming languages). As you'll see in the next article, a more idiomatic implementation is even simpler than this one.

Notice, however, that the printSolution action requires no type declarations or annotations.

Let's see it all in use:

> let p = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|];;
val p: int array = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|]

> Rod.printSolution p 7;;
1
6

This little interactive session reproduces the example from the beginning of this article: Given the price array from the book and a rod of size 7, the printed solution indicates cuts at positions 1 and 6.

I find it telling that the translation to F# is on par with the translation to Python, even though the structure of the pseudocode is quite imperative.

Conclusion #

You could, perhaps, say that if your mindset is predominantly imperative, implementing an algorithm using Python is likely easier than using either F# or Java. If, on the other hand, you're mostly in an implementation mindset, but not strongly attached to whether the implementation should be imperative, object-oriented, or functional, I'd offer the conjecture that a language like F# is as implementation-friendly as a language like Python.

If, on the other hand, you're more focused on encapsulating and documenting how an existing API works, perhaps that shift of perspective suggests another evaluation of dynamically versus statically typed languages.

In any case, the F# code shown here is hardly idiomatic, so it might be illuminating to see what happens if we refactor it.

Next: Encapsulating rod-cutting.


A restaurant sandwich

2024-12-16T19:11:00+00:00

An Impureim Sandwich example in C#.

When learning functional programming (FP) people often struggle with how to organize code. How do you discern and maintain purity? How do you do Dependency Injection in FP? What does a functional architecture look like?

A common FP design pattern is the Impureim Sandwich. The entry point of an application is always impure, so you push all impure actions to the boundary of the system. This is also known as Functional Core, Imperative Shell. If you have a micro-operation-based architecture, which includes all web-based systems, you can often get by with a 'sandwich'. Perform impure actions to collect all the data you need. Pass all data to a pure function. Finally, use impure actions to handle the referentially transparent return value from the pure function.
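As a rough illustration of that shape, here is a hedged Python sketch. It is not the book's code. The names Reservation, will_accept, and handle_update, the dictionary standing in for a database, and the simple capacity rule are all hypothetical, although will_accept is loosely modelled on the MaitreD.WillAccept call that appears later. The shell performs an impure read, passes everything to a pure function, and then acts on the result with an impure write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    id: str
    quantity: int


def will_accept(capacity, existing, candidate):
    """Pure core: decides solely from its inputs, with no I/O and no clock."""
    booked = sum(r.quantity for r in existing if r.id != candidate.id)
    return booked + candidate.quantity <= capacity


def handle_update(db, capacity, candidate):
    """Imperative shell: impure read, pure decision, impure write."""
    existing = list(db.values())                       # impure: read
    if not will_accept(capacity, existing, candidate):  # pure decision
        return 500
    db[candidate.id] = candidate                       # impure: write
    return 200


db = {"a": Reservation("a", 6)}
print(handle_update(db, 10, Reservation("b", 3)))  # 200
print(handle_update(db, 10, Reservation("c", 2)))  # 500
```

Only handle_update touches the dictionary. Because will_accept is referentially transparent, you can test it without any database at all.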

No design pattern applies universally, and neither does this one. In my experience, however, it's surprisingly often possible to apply this architecture. We're far past the Pareto principle's 80 percent.

Examples may help illustrate the pattern, as well as explore its boundaries. In this article you'll see how I refactored an entry point of a REST API, specifically the PUT handler in the sample code base that accompanies Code That Fits in Your Head.

Starting point #

As discussed in the book, the architecture of the sample code base is, in fact, Functional Core, Imperative Shell. This isn't, however, the main theme of the book, and the code doesn't explicitly apply the Impureim Sandwich. In spirit, that's actually what's going on, but it isn't clear from looking at the code. This was a deliberate choice I made, because I wanted to highlight other software engineering practices. This does have the effect, though, that the Impureim Sandwich is invisible.

For example, the book follows the 80/24 rule closely. This was a didactic choice on my part. Most code bases I've seen in the wild have far too big methods, so I wanted to hammer home the message that it's possible to develop and maintain a non-trivial code base with small code blocks. This meant, however, that I had to split up HTTP request handlers (in ASP.NET known as action methods on Controllers).

The most complex HTTP handler is the one that handles PUT requests for reservations. Clients use this action when they want to make changes to a restaurant reservation.

The action method actually invoked by an HTTP request is this Put method:

[HttpPut("restaurants/{restaurantId}/reservations/{id}")]
public async Task<ActionResult> Put(
    int restaurantId,
    string id,
    ReservationDto dto)
{
    if (dto is null)
        throw new ArgumentNullException(nameof(dto));
    if (!Guid.TryParse(id, out var rid))
        return new NotFoundResult();
 
    Reservation? reservation = dto.Validate(rid);
    if (reservation is null)
        return new BadRequestResult();
 
    var restaurant = await RestaurantDatabase
        .GetRestaurant(restaurantId).ConfigureAwait(false);
    if (restaurant is null)
        return new NotFoundResult();
 
    return
        await TryUpdate(restaurant, reservation).ConfigureAwait(false);
}

Since I, for pedagogical reasons, wanted to fit each method inside an 80x24 box, I made a few somewhat unnatural design choices. The above code is one of them. While I don't consider it completely indefensible, this method does a bit of up-front input validation and verification, and then delegates execution to the TryUpdate method.

This may seem all fine and dandy until you realize that the only caller of TryUpdate is that Put method. A similar thing happens in TryUpdate: It calls a method that has only that one caller. We may try to inline those two methods to see if we can spot the Impureim Sandwich.

Inlined Transaction Script #

Inlining those two methods leaves us with a larger, Transaction Script-like entry point:

[HttpPut("restaurants/{restaurantId}/reservations/{id}")]
public async Task<ActionResult> Put(
    int restaurantId,
    string id,
    ReservationDto dto)
{
    if (dto is null)
        throw new ArgumentNullException(nameof(dto));
    if (!Guid.TryParse(id, out var rid))
        return new NotFoundResult();
 
    Reservation? reservation = dto.Validate(rid);
    if (reservation is null)
        return new BadRequestResult();
 
    var restaurant = await RestaurantDatabase
        .GetRestaurant(restaurantId).ConfigureAwait(false);
    if (restaurant is null)
        return new NotFoundResult();
 
    using var scope = new TransactionScope(
        TransactionScopeAsyncFlowOption.Enabled);
 
    var existing = await Repository
        .ReadReservation(restaurant.Id, reservation.Id)
        .ConfigureAwait(false);
    if (existing is null)
        return new NotFoundResult();
 
    var reservations = await Repository
        .ReadReservations(restaurant.Id, reservation.At)
        .ConfigureAwait(false);
    reservations =
        reservations.Where(r => r.Id != reservation.Id).ToList();
    var now = Clock.GetCurrentDateTime();
    var ok = restaurant.MaitreD.WillAccept(
        now,
        reservations,
        reservation);
    if (!ok)
        return NoTables500InternalServerError();
 
    await Repository.Update(restaurant.Id, reservation)
        .ConfigureAwait(false);
 
    scope.Complete();
 
    return new OkObjectResult(reservation.ToDto());
}

While I've definitely seen longer methods in the wild, this variation is already so big that it no longer fits on my laptop screen. I have to scroll up and down to read the whole thing. When looking at the bottom of the method, I have to remember what was at the top, because I can no longer see it.

A major point of Code That Fits in Your Head is that what limits programmer productivity is human cognition. If you have to scroll your screen because you can't see the whole method at once, does that fit in your brain? Chances are, it doesn't.

Can you spot the Impureim Sandwich now?

If you can't, that's understandable. It's not really clear because there are quite a few small decisions being made in this code. You could argue, for example, that this decision is referentially transparent:

if (existing is null)
    return new NotFoundResult();

These two lines of code are deterministic and have no side effects. The branch only returns a NotFoundResult when existing is null. Additionally, these two lines of code are surrounded by impure actions both before and after. Is this the Sandwich, then?

No, it's not. This is how idiomatic imperative code looks. To borrow a diagram from another article, pure and impure code is interleaved without discipline:

Even so, the above Put method implements the Functional Core, Imperative Shell architecture. The Put method is the Imperative Shell, but where's the Functional Core?

Shell perspective #

One thing to be aware of is that when looking at the Imperative Shell code, the Functional Core is close to invisible. This is because it's typically only a single function call.

In the above Put method, this is the Functional Core:

var ok = restaurant.MaitreD.WillAccept(
    now,
    reservations,
    reservation);
if (!ok)
    return NoTables500InternalServerError();

It's only a few lines of code, and had I not given myself the constraint of staying within an 80 character line width, I could have instead laid it out like this and inlined the ok flag:

if (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
    return NoTables500InternalServerError();

Now that I try this, in fact, it turns out that this actually still stays within 80 characters. To be honest, I don't know exactly why I had that former code instead of this, but perhaps I found the latter alternative too dense. Or perhaps I simply didn't think of it. Code is rarely perfect. Usually when I revisit a piece of code after having been away from it for some time, I find something that I want to change.

In any case, that's beside the point. What matters here is that when you're looking through the Imperative Shell code, the Functional Core looks insignificant. Blink and you'll miss it. Even if we ignore all the other small pure decisions (the if statements) and pretend that we already have an Impureim Sandwich, from this viewpoint, the architecture looks like this:

It's tempting to ask, then: What's all the fuss about? Why even bother?

This is a natural experience for a code reader. After all, if you don't know a code base well, you often start at the entry point to try to understand how the application handles a certain stimulus. Such as an HTTP PUT request. When you do that, you see all of the Imperative Shell code before you see the Functional Core code. This could give you the wrong impression about the balance of responsibility.

After all, code like the above Put method has inlined most of the impure code so that it's right in your face. Granted, there's still some code hiding behind, say, Repository.ReadReservations, but a substantial fraction of the imperative code is visible in the method.

On the other hand, the Functional Core is just a single function call. If we inlined all of that code, too, the picture might rather look like this:

This obviously depends on the de-facto ratio of pure to imperative code. In any case, inlining the pure code is a thought experiment only, because the whole point of functional architecture is that a referentially transparent function fits in your head. Regardless of the complexity and amount of code hiding behind that MaitreD.WillAccept function, the return value is equal to the function call. It's the ultimate abstraction.

Standard combinators #

As I've already suggested, the inlined Put method looks like a Transaction Script. The cyclomatic complexity fortunately hovers around the magical number seven, and branching is exclusively organized around Guard Clauses. Apart from that, there are no nested if statements or for loops.

Apart from the Guard Clauses, this mostly looks like a procedure that runs in a straight line from top to bottom. The exception is all those small conditionals that may cause the procedure to exit prematurely. Conditions like this:

if (!Guid.TryParse(id, out var rid))
    return new NotFoundResult();

if (reservation is null)
    return new BadRequestResult();

Such checks occur throughout the method. Each of them is actually a small pure island amidst all the imperative code, but each is ad hoc. Each checks if it's possible for the procedure to continue, and returns a kind of error value if it decides that it's not.

Is there a way to model such 'derailments' from the main flow?

If you've ever encountered Scott Wlaschin's Railway Oriented Programming, you may already see where this is going. Railway-oriented programming is a fantastic metaphor, because it gives you a way to visualize that you have, indeed, a main track, but then you have a side track that you may shuffle some trains to. And once the train is on the side track, it can't go back to the main track.

That's how the Either monad works. Instead of all those ad-hoc if statements, we should be able to replace them with what we may call standard combinators. The most important of these combinators is monadic bind. Composing a Transaction Script like Put with standard combinators will 'hide away' those small decisions, and make the Sandwich nature more apparent.
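To make the railway metaphor concrete without the Task layer, here is a minimal, hedged Python sketch of an Either type with monadic bind. Left, Right, and bind are written from scratch here and don't come from any library. The steps parse_id, validate, and handle are hypothetical stand-ins for the guard clauses in Put. Once a step returns Left, bind skips every remaining step.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Left:
    """Side track: carries the reason for derailing (here, an HTTP status)."""
    value: Any


@dataclass(frozen=True)
class Right:
    """Main track: carries the value computed so far."""
    value: Any


def bind(either, f: Callable[[Any], Any]):
    """Monadic bind: apply f only while on the main track."""
    return f(either.value) if isinstance(either, Right) else either


# Hypothetical Either-valued steps, standing in for the guard clauses in Put.
def parse_id(text):
    return Right(int(text)) if text.isdigit() else Left(404)


def validate(rid):
    return Right({"id": rid}) if rid > 0 else Left(400)


def handle(text):
    # No if statements here: bind decides whether to continue.
    return bind(bind(Right(text), parse_id), validate)


print(handle("7"))    # Right(value={'id': 7})
print(handle("abc"))  # Left(value=404)
print(handle("0"))    # Left(value=400)
```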

If we had had pure code, we could just have composed Either-valued functions. Unfortunately, most of what's going on in the Put method happens in a Task-based context. Thankfully, Either is one of those monads that nest well, implying that we can turn the combination into a composed TaskEither monad. The linked article shows the core TaskEither SelectMany implementations.

The way to encode all those small decisions between 'main track' or 'side track', then, is to wrap 'naked' values in the desired Task<Either<L, R>> container:

Task.FromResult(id.TryParseGuid().OnNull((ActionResult)new NotFoundResult()))

This little code snippet makes use of a few small building blocks that we also need to introduce. First, .NET's standard TryParse APIs don't compose, but since they're isomorphic to Maybe-valued functions, you can write an adapter like this:

public static Guid? TryParseGuid(this string candidate)
{
    if (Guid.TryParse(candidate, out var guid))
        return guid;
    else
        return null;
}

In this code base, I treat nullable reference types as equivalent to the Maybe monad, but if your language doesn't have that feature, you can use Maybe instead.

To implement the Put method, however, we don't want nullable (or Maybe) values. We need Either values, so we may introduce a natural transformation:

public static Either<L, R> OnNull<L, R>(this R? candidate, L left) where R : struct
{
    if (candidate.HasValue)
        return Right<L, R>(candidate.Value);
 
    return Left<L, R>(left);
}

In Haskell one might just make use of the built-in Maybe catamorphism:

ghci> maybe (Left "boo!") Right $ Just 123
Right 123
ghci> maybe (Left "boo!") Right $ Nothing
Left "boo!"

Such conversions from Maybe to Either hover just around the Fairbairn threshold, but since we are going to need it more than once, it makes sense to add a specialized OnNull transformation to the C# code base. The one shown here handles nullable value types, but the code base also includes an overload that handles nullable reference types. It's almost identical.
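The same two building blocks translate fairly directly to Python. The following hedged sketch uses hypothetical names. try_parse_uuid plays the role of TryParseGuid by turning the exception-based uuid.UUID constructor into a function that returns Optional, and on_none plays the role of OnNull by converting that Optional into an Either. Note that Python's UUID parsing accepts a different set of formats from .NET's Guid.TryParse, so this is an analogy rather than an exact port.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Left:
    value: Any


@dataclass(frozen=True)
class Right:
    value: Any


def try_parse_uuid(candidate: str) -> Optional[uuid.UUID]:
    """Adapter from exception-based parsing to a Maybe-like Optional."""
    try:
        return uuid.UUID(candidate)
    except ValueError:
        return None


def on_none(candidate, left):
    """Natural transformation from Optional (Maybe) to Either."""
    return Left(left) if candidate is None else Right(candidate)


print(on_none(try_parse_uuid("not a guid"), 404))  # Left(value=404)
```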

Support for query syntax #

There's more than one way to consume monadic values in C#. While many C# developers like LINQ, most seem to prefer the familiar method call syntax; that is, just call the Select, SelectMany, and Where methods as the normal extension methods they are. Another option, however, is to use query syntax. This is what I'm aiming for here, since it'll make it easier to spot the Impureim Sandwich.

You'll see the entire sandwich later in the article. Before that, I'll highlight details and explain how to implement them. You can always scroll down to see the end result, and then scroll back here, if that's more to your liking.

The sandwich starts by parsing the id into a GUID using the above building blocks:

var sandwich =
    from rid in Task.FromResult(id.TryParseGuid().OnNull((ActionResult)new NotFoundResult()))

It then immediately proceeds to Validate (parse, really) the dto into a proper Domain Model:

from reservation in dto.Validate(rid).OnNull((ActionResult)new BadRequestResult())

Notice that the second from expression doesn't wrap the result with Task.FromResult. How does that work? Is the return value of dto.Validate already a Task? No, this works because I added 'degenerate' SelectMany overloads:

public static Task<Either<L, R1>> SelectMany<L, R, R1>(
    this Task<Either<L, R>> source,
    Func<R, Either<L, R1>> selector)
{
    return source.SelectMany(x => Task.FromResult(selector(x)));
}
 
public static Task<Either<L, R1>> SelectMany<L, U, R, R1>(
    this Task<Either<L, R>> source,
    Func<R, Either<L, U>> k,
    Func<R, U, R1> s)
{
    return source.SelectMany(x => k(x).Select(y => s(x, y)));
}

Notice that the selector only produces an Either<L, R1> value, rather than Task<Either<L, R1>>. This allows query syntax to 'pick up' the previous value (rid, which is 'really' a Task<Either<ActionResult, Guid>>) and continue with a function that doesn't produce a Task, but rather just an Either value. The first of these two overloads then takes that Either value and wraps it with Task.FromResult. The second overload is just the usual ceremony that enables query syntax.
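Python has neither query syntax nor overload resolution, so the following is only a loose analogy, and all names in it are hypothetical. A coroutine that produces an Either stands in for Task<Either<L, R>>. bind_async plays the role of the Task-based SelectMany. bind_either mimics the 'degenerate' overload: it lifts a plain Either-valued function with an already-completed awaitable, much as Task.FromResult does.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class Left:
    value: Any


@dataclass(frozen=True)
class Right:
    value: Any


async def from_result(value):
    """Rough analogue of Task.FromResult: an awaitable that is already done."""
    return value


async def bind_async(source: Awaitable, selector: Callable[[Any], Awaitable]):
    """Async Either bind: await the source, then continue only on Right."""
    either = await source
    if isinstance(either, Left):
        return either
    return await selector(either.value)


def bind_either(source: Awaitable, selector: Callable[[Any], Any]):
    """'Degenerate' variant: the selector returns a plain Either, so lift
    its result with from_result before delegating to bind_async."""
    return bind_async(source, lambda x: from_result(selector(x)))


def validate(rid):              # synchronous, Either-valued
    return Right(rid) if rid > 0 else Left(400)


async def get_restaurant(rid):  # asynchronous, Either-valued
    await asyncio.sleep(0)      # stand-in for real I/O
    return Right(f"restaurant {rid}") if rid == 1 else Left(404)


async def main(rid):
    start = from_result(Right(rid))
    return await bind_async(bind_either(start, validate), get_restaurant)


print(asyncio.run(main(1)))  # Right(value='restaurant 1')
```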

Why, then, doesn't the sandwich use the same trick for rid? Why does it explicitly call Task.FromResult?

As far as I can tell, this is because of type inference. It looks as though the C# compiler infers the monad's type from the first expression. If I change the first expression to

from rid in id.TryParseGuid().OnNull((ActionResult)new NotFoundResult())

the compiler thinks that the query expression is based on Either<L, R>, rather than Task<Either<L, R>>. This means that once we run into the first Task value, the entire expression no longer works.

By explicitly wrapping the first expression in a Task, the compiler correctly infers the monad we'd like it to. If there's a more elegant way to do this, I'm not aware of it.

Values that don't fail #

The sandwich proceeds to query various databases, using the now-familiar OnNull combinators to transform nullable values to Either values.

from restaurant in RestaurantDatabase
    .GetRestaurant(restaurantId)
    .OnNull((ActionResult)new NotFoundResult())
from existing in Repository
    .ReadReservation(restaurant.Id, reservation.Id)
    .OnNull((ActionResult)new NotFoundResult())

This works like before because both GetRestaurant and ReadReservation are queries that may fail to return a value. Here's the interface definition of ReadReservation:

Task<Reservation?> ReadReservation(int restaurantId, Guid id);

Notice the question mark that indicates that the result may be null.

The GetRestaurant method is similar.

The next query that the sandwich has to perform, however, is different. The return type of the ReadReservations method is Task<IReadOnlyCollection<Reservation>>. Notice that the type contained in the Task is not nullable. Barring database connection errors, this query can't fail. If it finds no data, it returns an empty collection.

Since the value isn't nullable, we can't use OnNull to turn it into a Task<Either<L, R>> value. We could try to use the Right creation function for that.

public static Either<L, R> Right<L, R>(R right)
{
    return Either<L, R>.Right(right);
}

This works, but is awkward:

from reservations in Repository
    .ReadReservations(restaurant.Id, reservation.At)
    .Traverse(rs => Either.Right<ActionResult, IReadOnlyCollection<Reservation>>(rs))

The problem with calling Either.Right is that while the compiler can infer which type to use for R, it doesn't know what the L type is. Thus, we have to tell it, and we can't tell it what L is without also telling it what R is. Even though it already knows that.

In such scenarios, the F# compiler can usually figure it out, and GHC always can (unless you add some exotic language extensions to your code). C# doesn't have any syntax that enables you to tell the compiler about only the type that it doesn't know about, and let it infer the rest.
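To illustrate the claim about GHC, here is a small sketch in which Right gets no type annotation at all. readReservations and sandwich are hypothetical stand-ins. GHC infers the Left type (String) from the sandwich's declared return type, because fmap length only changes the Right side.

```haskell
-- Hypothetical stand-in for a query that can't fail.
readReservations :: Int -> IO [String]
readReservations restaurantId =
  pure ["reservation at restaurant " ++ show restaurantId]

-- No type arguments are given to Right. GHC infers Either String [String]
-- for e, because the final result must be Either String Int.
sandwich :: Int -> IO (Either String Int)
sandwich restaurantId = do
  e <- fmap Right (readReservations restaurantId)
  pure (fmap length e)
```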

All is not lost, though, because there's a little trick you can use in cases such as this. You can let the C# compiler infer the R type so that you only have to tell it what L is. It's a two-stage process. First, define an extension method on R:

public static RightBuilder<R> ToRight<R>(this R right)
{
    return new RightBuilder<R>(right);
}

The only type argument on this ToRight method is R, and since the right parameter is of the type R, the C# compiler can always infer the type of R from the type of right.

What's RightBuilder<R>? It's this little auxiliary class:

public sealed class RightBuilder<R>
{
    private readonly R right;
 
    public RightBuilder(R right)
    {
        this.right = right;
    }
 
    public Either<L, R> WithLeft<L>()
    {
        return Either.Right<L, R>(right);
    }
}

The code base for Code That Fits in Your Head was written on .NET Core 3.1, but today you could have made this a record instead. The only purpose of this class is to break the type inference into two steps so that the R type can be automatically inferred. In this way, you only need to tell the compiler what the L type is.

from reservations in Repository
    .ReadReservations(restaurant.Id, reservation.At)
    .Traverse(rs => rs.ToRight().WithLeft<ActionResult>())

As indicated, this style of programming isn't language-neutral. Even if you find this little trick neat, I'd much rather have the compiler just figure it out for me. The entire sandwich query expression is already defined as working with Task<Either<ActionResult, R>>, and the L type can't change like the R type can. Functional compilers can figure this out, and while I intend this article to show object-oriented programmers how functional programming sometimes works, I don't wish to pretend that it's a good idea to write code like this in C#. I've covered that ground already.

Not surprisingly, there's a mirror-image ToLeft/WithRight combo, too.

Working with Commands #

The ultimate goal with the Put method is to modify a row in the database. The method to do that has this interface definition:

Task Update(int restaurantId, Reservation reservation);

When explaining it to non-C# programmers, I usually describe that non-generic Task class as 'asynchronous void'. The Update method is an asynchronous Command.

Task and void aren't legal values for use with LINQ query syntax, so you have to find a way to work around that limitation. In this case I defined a local helper method to make it look like a Query:

async Task<Reservation> RunUpdate(int restaurantId, Reservation reservation, TransactionScope scope)
{
    await Repository.Update(restaurantId, reservation).ConfigureAwait(false);
    scope.Complete();
    return reservation;
}

It just echoes back the reservation parameter once the Update has completed. This makes it composable in the larger query expression.

You'll probably not be surprised when I tell you that both F# and Haskell handle this scenario gracefully, without requiring any hoop-jumping.
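Here is a sketch of what that means in Haskell. The unit type () is an ordinary value, so an action of type IO () composes in a do block like any other. update, putReservation, and runUpdate below are hypothetical stand-ins for the repository Command, the surrounding workflow, and the C# helper. If you still want an echo like RunUpdate, the Prelude's <$ operator is enough.

```haskell
-- Hypothetical stand-in for the repository's Update Command.
update :: Int -> String -> IO ()
update _ _ = pure ()

-- The Command sits in the middle of the workflow; no helper is required.
putReservation :: Int -> String -> IO (Either String String)
putReservation restaurantId reservation = do
  update restaurantId reservation
  pure (Right reservation)

-- If an echo is still wanted, (<$) replaces the () result with the argument.
runUpdate :: Int -> String -> IO String
runUpdate restaurantId reservation =
  reservation <$ update restaurantId reservation
```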

Full sandwich #

Those are all the building blocks. Here's the full sandwich definition, colour-coded like the examples in Impureim sandwich.

Task<Either<ActionResult, OkObjectResult>> sandwich =
    from rid in Task.FromResult(
        id.TryParseGuid().OnNull((ActionResult)new NotFoundResult()))
    from reservation in
        dto.Validate(rid).OnNull(
            (ActionResult)new BadRequestResult())
 
    from restaurant in RestaurantDatabase
        .GetRestaurant(restaurantId)
        .OnNull((ActionResult)new NotFoundResult())
    from existing in Repository
        .ReadReservation(restaurant.Id, reservation.Id)
        .OnNull((ActionResult)new NotFoundResult())
    from reservations in Repository
        .ReadReservations(restaurant.Id, reservation.At)
        .Traverse(rs => rs.ToRight().WithLeft<ActionResult>())
    let now = Clock.GetCurrentDateTime()
 
    let reservations2 =
        reservations.Where(r => r.Id != reservation.Id)
    let ok = restaurant.MaitreD.WillAccept(
        now,
        reservations2,
        reservation)
    from reservation2 in
        ok 
            ? reservation.ToRight().WithLeft<ActionResult>()
            : NoTables500InternalServerError().ToLeft().WithRight<Reservation>()
 
    from reservation3 in 
        RunUpdate(restaurant.Id, reservation2, scope)
        .Traverse(r => r.ToRight().WithLeft<ActionResult>())
    select new OkObjectResult(reservation3.ToDto());

As is evident from the colour-coding, this isn't quite a sandwich. The structure is honestly more accurately depicted like this:

As I've previously argued, while the metaphor becomes strained, this still works well as a functional-programming architecture.

As defined here, the sandwich value is a Task that must be awaited.

Either<ActionResult, OkObjectResult> either = await sandwich.ConfigureAwait(false);
return either.Match(x => x, x => x);

By awaiting the task, we get an Either value. The Put method, on the other hand, must return an ActionResult. How do you turn an Either object into a single object?

By pattern matching on it, as the code snippet shows. The L type is already an ActionResult, so we return it without changing it. If C# had had a built-in identity function, I'd have used that, but idiomatically, we instead use the x => x lambda expression.

The same is the case for the R type, because OkObjectResult inherits from ActionResult. The identity expression automatically performs the type conversion for us.

This, by the way, is a recurring pattern with Either values that I run into in all languages. You've essentially computed an Either<T, T>, with the same type on both sides, and now you just want to return whichever T value is contained in the Either value. You'd think this is such a common pattern that Haskell has a nice abstraction for it, but even Hoogle fails to suggest a commonly-accepted function that does this. Apparently, either id id is considered below the Fairbairn threshold, too.
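For reference, here is that unnamed function as a Haskell definition. The name fromEither and the Response type are hypothetical. In C#, the subtyping between OkObjectResult and ActionResult does part of the work. In Haskell, both sides must already have the same type.

```haskell
-- Return whichever value an Either with the same type on both sides contains.
fromEither :: Either a a -> a
fromEither = either id id

-- Hypothetical response type, loosely modelled on the ActionResult cases above.
data Response = NotFound | BadRequest | Ok String
  deriving (Show, Eq)

respond :: Either Response Response -> Response
respond = fromEither
```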

Conclusion #

This article presents an example of a non-trivial Impureim Sandwich. When I introduced the pattern, I gave a few examples. I'd deliberately chosen these examples to be simple so that they highlighted the structure of the idea. The downside of that didactic choice is that some commenters found the examples too simplistic. Therefore, I think that there's value in going through more complex examples.

The code base that accompanies Code That Fits in Your Head is complex enough that it borders on the realistic. It was deliberately written that way, and since I assume that the code base is familiar to readers of the book, I thought it'd be a good resource to show how an Impureim Sandwich might look. I explicitly chose to refactor the Put method, since it's easily the most complicated process in the code base.

The benefit of that code base is that it's written in a programming language that reaches a large audience. Thus, for readers curious about functional programming, I thought that this could also be a useful introduction to some intermediate concepts.

As I've commented along the way, however, I wouldn't expect anyone to write production C# code like this. If you're able to do this, you're also able to do it in a language better suited for this programming paradigm.

Implementation and usage mindsets

2024-12-09T21:45:00+00:00

A one-dimensional take on the enduring static-versus-dynamic debate.

It recently occurred to me that one possible explanation for the long-standing, and probably never-ending, debate about static versus dynamic types may be that each camp has disjoint perspectives on the kinds of problems their favourite languages help them solve. In short, my hypothesis is that perhaps lovers of dynamically-typed languages often approach a problem from an implementation mindset, whereas proponents of static types emphasize usage.

I'll expand on this idea here, and then provide examples in two subsequent articles.

Background #

For years I've struggled to understand 'the other side'. While I'm firmly in the statically typed camp, I realize that many highly skilled programmers and thought leaders enjoy, or get great use out of, dynamically typed languages. This worries me, because it might indicate that I'm stuck in a local maximum.

In other words, just because I, personally, prefer static types, it doesn't follow that static types are universally better than dynamic types.

In reality, it's probably rather the case that we're dealing with a false dichotomy, and that the problem is really multi-dimensional.

"Let me stop you right there: I don't think there is a real dynamic typing versus static typing debate.

"What such debates normally are is language X vs language Y debates (where X happens to be dynamic and Y happens to be static)."

Kevlin Henney

Even so, I can't help thinking about such things. Am I missing something?

For the past few years, I've dabbled with Python to see what writing in a popular dynamically typed language is like. It's not a bad language, and I can clearly see how it's attractive. Even so, I'm still frustrated every time I return to some Python code after a few weeks or more. The lack of static types makes it hard for me to pick up, or revisit, old code.

A question of perspective? #

Whenever I run into a difference of opinion, I often interpret it as a difference in perspective. Perhaps it's my academic background as an economist, but I consider it a given that people have different motivations, and that incentives influence actions.

A related kind of analysis deals with problem definitions. Are we even trying to solve the same problem?

I've discussed such questions before, but in a different context. Here, it strikes me that perhaps programmers who gravitate toward dynamically typed languages are focused on a different problem than the other group.

Again, I'd like to emphasize that I don't consider the world so black and white in reality. Some developers straddle the two camps, and as the above Kevlin Henney quote suggests, there really aren't only two kinds of languages. C and Haskell are both statically typed, but the similarities stop there. Likewise, I don't know if it's fair to put JavaScript and Clojure in the same bucket.

That said, I'd still like to offer the following hypothesis, in the spirit that although all models are wrong, some are useful.

The idea is that if you're trying to solve a problem related to implementation, dynamically typed languages may be more suitable. If you're trying to implement an algorithm, or even trying to invent one, a dynamic language seems useful. One year, I did a good chunk of Advent of Code in Python, and didn't find it harder than in Haskell. (I ultimately ran out of steam for reasons unrelated to Python.)

On the other hand, if your main focus is on how your code will be used, perhaps you'll find a statically typed language more useful. At least, I do. I can use the static type system to communicate how my APIs work. How to instantiate my classes. How to call my functions. How return values are shaped. In other words, the preconditions, invariants, and postconditions of my reusable code: Encapsulation.

Examples #

Some examples may be in order. In the next two articles, I'll first examine how easy it is to implement an algorithm in various programming languages. Then I'll discuss how to encapsulate that algorithm.

The articles will both discuss the rod-cutting problem from Introduction to Algorithms, but I'll introduce the problem in the next article.

Conclusion #

I'd be naive if I believed that a single model can fully explain why some people prefer dynamically typed languages, and others rather like statically typed languages. Even so, suggesting a model helps me understand how to analyze problems.

My hypothesis is that dynamically typed languages may be suitable for implementing algorithms, whereas statically typed languages offer better encapsulation.

This may be used as a heuristic for 'picking the right tool for the job'. If I need to suss out an algorithm, perhaps I should do it in Python. If, on the other hand, I need to publish a reusable library, perhaps Haskell is a better choice.

Next: Implementing rod-cutting.

Short-circuiting an asynchronous traversal

2024-12-02T09:32:00+00:00

Another C# example.

This article is a continuation of an earlier post about refactoring a piece of imperative code to a functional architecture. It all started with a Stack Overflow question, but read the previous article, and you'll be up to speed.

Imperative outset #

To begin, consider this mostly imperative code snippet:

var storedItems = new List<ShoppingListItem>();
var failedItems = new List<ShoppingListItem>();
var state = (storedItems, failedItems, hasError: false);
foreach (var item in itemsToUpdate)
{
    OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext);
    state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, bool)>(
        storedItem => { storedItems.Add(storedItem); return state;  },
        notFound => { failedItems.Add(item); return state; },
        error => { state.hasError = true; return state; }
        );
    if (state.hasError)
        return Results.BadRequest();
}
 
await dbContext.SaveChangesAsync();
 
return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems]));

I'll recap a few points from the previous article. Apart from one crucial detail, it's similar to the other post. One has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.

The least obvious detail is that the code uses a library called OneOf, which supplies general-purpose, but rather abstract, sum types. The container type OneOf, as well as the two indicator types NotFound and Error, are defined in that library.

The Match method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem returns.

One more detail also warrants an explicit description: The itemsToUpdate object is an input argument of the type IEnumerable<ShoppingListItem>.

The major difference from before is that now the update process short-circuits on the first Error. If an error occurs, it stops processing the rest of the items. In that case, it now returns Results.BadRequest(), and it doesn't save the changes to dbContext.

The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?

Short-circuiting traversal #

The standard Traverse function isn't lazy, or rather, it does consume the entire input sequence. Even various Haskell data structures I investigated do that. And yes, I even tried to traverse ListT. If there's a data structure that you can traverse with deferred execution of I/O-bound actions, I'm not aware of it.

That said, all is not lost, but you'll need to implement a more specialized traversal. While consuming the input sequence, the function needs to know when to stop. It can't do that on just any IEnumerable<T>, because it has no information about T.

If, on the other hand, you specialize the traversal to a sequence of items with more information, you can stop processing if it encounters a particular condition. You could generalize this to, say, IEnumerable<Either<L, R>>, but since I already have the OneOf library in scope, I'll use that, instead of implementing or pulling in a general-purpose Either data type.

In fact, I'll just use a three-way OneOf type compatible with the one that UpdateItem returns.

internal static async Task<IEnumerable<OneOf<T1, T2, Error>>> Sequence<T1, T2>(
    this IEnumerable<Task<OneOf<T1, T2, Error>>> tasks)
{
    var results = new List<OneOf<T1, T2, Error>>();
    foreach (var task in tasks)
    {
        var result = await task;
        results.Add(result);
        if (result.IsT2)
            break;
    }
    return results;
}

This implementation doesn't care what T1 or T2 is, so they're free to be ShoppingListItem and NotFound. The third type argument, on the other hand, must be Error.

The if conditional looks a bit odd, but as I wrote, the types that ship with the OneOf library have rather abstract APIs. A three-way OneOf value comes with three case tests called IsT0, IsT1, and IsT2. Notice that the library uses a zero-indexed naming convention for its type parameters. IsT2 returns true if the value is the third kind, in this case Error. If a task turns out to produce an Error, the Sequence method adds that one error, but then stops processing any remaining items.

Some readers may complain that the entire implementation of Sequence is imperative. It hardly matters that much, since the mutation doesn't escape the method. The behaviour is as functional as it's possible to make it. It's fundamentally I/O-bound, so we can't consider it a pure function. That said, if we hypothetically imagine that all the tasks are deterministic and have no side effects, the Sequence function does become a pure function when viewed as a black box. From the outside, you can't tell that the implementation is imperative.

It is possible to implement Sequence in a proper functional style, and it might make a good exercise. I think, however, that it'll be difficult in C#. In F# or Haskell I'd use recursion, and while you can do that in C#, I admit that I've lost sight of whether or not tail recursion is supported by the C# compiler.
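Here is a minimal Haskell sketch of the recursive approach. OneOf3, with its T0, T1, and T2 constructors, is a hypothetical stand-in for the OneOf library's three-way type, named after its zero-indexed convention. The function returns as soon as an action produces a T2, so no later action runs. Note that this version isn't tail recursive, because it conses each result onto the result of the recursive call.

```haskell
-- Hypothetical stand-in for OneOf<T1, T2, Error>; T2 plays the role of Error.
data OneOf3 a b c = T0 a | T1 b | T2 c
  deriving (Show, Eq)

isT2 :: OneOf3 a b c -> Bool
isT2 (T2 _) = True
isT2 _ = False

-- Run the actions in order. Keep the first T2 result, but run nothing after it.
sequenceUntilT2 :: [IO (OneOf3 a b c)] -> IO [OneOf3 a b c]
sequenceUntilT2 [] = pure []
sequenceUntilT2 (action : rest) = do
  result <- action
  if isT2 result
    then pure [result]
    else (result :) <$> sequenceUntilT2 rest

-- The traversal is just a map followed by the specialized sequence.
traverseUntilT2 :: (x -> IO (OneOf3 a b c)) -> [x] -> IO [OneOf3 a b c]
traverseUntilT2 f = sequenceUntilT2 . map f
```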

Be that as it may, the traversal implementation doesn't change.

internal static Task<IEnumerable<OneOf<TResult, T2, Error>>> Traverse<T1, T2, TResult>(
    this IEnumerable<T1> items,
    Func<T1, Task<OneOf<TResult, T2, Error>>> selector)
{
    return items.Select(selector).Sequence();
}

You can now Traverse the itemsToUpdate:

// Impure
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));

As the // Impure comment may suggest, this constitutes the first impure layer of an Impureim Sandwich.

Aggregating the results #

Since the above statement awaits the traversal, the results object is a 'pure' object that can be passed to a pure function. This does, however, assume that ShoppingListItem is an immutable object.

The next step must collect results and NotFound-related failures, but contrary to the previous article, it must short-circuit if it encounters an Error. This again suggests an Either-like data structure, but again I'll repurpose a OneOf container. I'll start by defining a seed for an aggregation (a left fold).

var seed =
    OneOf<(IEnumerable<ShoppingListItem>, IEnumerable<ShoppingListItem>), Error>
        .FromT0(([], []));

This type can be either a tuple or an error. The .NET tendency is often to define an explicit Result<TSuccess, TFailure> type, where TSuccess is defined to the left of TFailure. This, for example, is how F# defines Result types, and other .NET libraries tend to emulate that design. That's also what I've done here, although I admit that I'm regularly confused when going back and forth between F# and Haskell, where the Right case is idiomatically considered to indicate success.

As already discussed, OneOf follows a zero-indexed naming convention for type parameters, so FromT0 indicates the first (or leftmost) case. The seed is thus initialized with a tuple that contains two empty sequences.

As in the previous article, you can now use the Aggregate method to collect the result you want.

OneOf<BulkUpdateResult, Error> result = results
    .Aggregate(
        seed,
        (state, result) =>
            result.Match(
                storedItem => state.MapT0(
                    t => (t.Item1.Append(storedItem), t.Item2)),
                notFound => state.MapT0(
                    t => (t.Item1, t.Item2.Append(notFound.Item))),
                e => e))
    .MapT0(t => new BulkUpdateResult(t.Item1.ToArray(), t.Item2.ToArray()));

This expression is a two-step composition. I'll get back to the concluding MapT0 in a moment, but let's first discuss what happens in the Aggregate step. Since the state is now a discriminated union, the big lambda expression not only has to Match on the result, but it also has to deal with the two mutually exclusive cases in which state can be.

Although it comes third in the code listing, it may be easiest to explain if we start with the error case. Keep in mind that the seed starts with the optimistic assumption that the operation is going to succeed. If, however, we encounter an error e, we now switch the state to the Error case. Once in that state, it stays there.

The two other result cases map over the first (i.e. the success) case, appending the result to the appropriate sequence in the tuple t. Since these expressions map over the first (zero-indexed) case, these updates only run as long as the state is in the success case. If the state is in the error state, these lambda expressions don't run, and the state doesn't change.

After having collected the tuple of sequences, the final step is to map over the success case, turning the tuple t into a BulkUpdateResult. That's what MapT0 does: It maps over the first (zero-indexed) case, which contains the tuple of sequences. It's a standard functor projection.
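A Haskell sketch of the same left fold shows why the error state 'sticks'. Outcome, Error, and the BulkUpdateResult record are hypothetical stand-ins for the C# types. fmap only touches a Right state, so once a Failed outcome switches the state to Left, the Stored and NotFound cases leave it alone.

```haskell
-- Hypothetical stand-ins for the C# types.
data Error = Error
  deriving (Show, Eq)

data Outcome item = Stored item | NotFound item | Failed Error
  deriving (Show, Eq)

data BulkUpdateResult item = BulkUpdateResult
  { storedItems :: [item]
  , failedItems :: [item]
  } deriving (Show, Eq)

-- One fold step. Stored and NotFound map over a Right state only;
-- an error switches the state to Left, where fmap can no longer change it.
step :: Either Error ([item], [item]) -> Outcome item -> Either Error ([item], [item])
step state (Stored x) = fmap (\(stored, failed) -> (stored ++ [x], failed)) state
step state (NotFound x) = fmap (\(stored, failed) -> (stored, failed ++ [x])) state
step _ (Failed e) = Left e

-- Fold from an optimistic seed, then map the tuple into the result record.
aggregate :: [Outcome item] -> Either Error (BulkUpdateResult item)
aggregate = fmap (uncurry BulkUpdateResult) . foldl step (Right ([], []))
```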

Saving the changes and returning the results #

The final, impure step in the sandwich is to save the changes and return the results:

// Impure
return await result.Match(
    async bulkUpdateResult =>
    {
        await dbContext.SaveChangesAsync();
        return Results.Ok(bulkUpdateResult);
    },
    _ => Task.FromResult(Results.BadRequest()));

Note that it only calls dbContext.SaveChangesAsync() in case the result is a success.
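Here is the same final step sketched in Haskell, with a hypothetical Response type. The either function selects the branch, and the save action (a stand-in for dbContext.SaveChangesAsync()) appears only in the success branch, so it can't run on the error path.

```haskell
-- Hypothetical response type standing in for Results.Ok and Results.BadRequest.
data Response r = Ok r | BadRequest
  deriving (Show, Eq)

-- The final impure layer: only the success case runs the save action,
-- and (<$) returns Ok r once save has completed.
finish :: IO () -> Either e r -> IO (Response r)
finish save = either (const (pure BadRequest)) (\r -> Ok r <$ save)
```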

Accumulating the bulk-update result #

So far, I've assumed that the final BulkUpdateResult class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use that to aggregate the result, instead of an anonymous tuple.

internal BulkUpdateResult Store(ShoppingListItem item) =>
    new([.. StoredItems, item], FailedItems);
 
internal BulkUpdateResult Fail(ShoppingListItem item) =>
    new(StoredItems, [.. FailedItems, item]);

I would have personally preferred the name NotFound instead of Fail, but I was going with the original post's failedItems terminology, and I thought that it made more sense to call a method Fail when it adds to a collection called FailedItems.

Adding these two instance methods to BulkUpdateResult simplifies the composing code:

// Pure
OneOf<BulkUpdateResult, Error> result = results
    .Aggregate(
        OneOf<BulkUpdateResult, Error>.FromT0(new([], [])),
        (state, result) =>
            result.Match(
                storedItem => state.MapT0(bur => bur.Store(storedItem)),
                notFound => state.MapT0(bur => bur.Fail(notFound.Item)),
                e => e));

This variation starts with an empty BulkUpdateResult and then uses Store or Fail as appropriate to update the state. The final, impure step of the sandwich remains the same.

Conclusion #

It's a bit trickier to implement a short-circuiting traversal than the standard traversal. You can still implement a specialized Sequence or Traverse method, but it requires that the input stream carries enough information to decide when to stop processing more items. In this article, I used a specialized three-way union, but you could generalize this to use a standard Either or Result type.

Nested monads

2024-11-25T07:31:00+00:00

You can stack some monads in such a way that the composition is also a monad.

This article is part of a series of articles about functor relationships. In a previous article you learned that nested functors form a functor. You may have wondered if monads compose in the same way. Does a monad nested in a monad form a monad?

As far as I know, there's no universal rule like that, but some monads compose well. Fortunately, it's been my experience that the combinations that you need in practice are among those that exist and are well-known. In a Haskell context, it's often the case that you need to run some kind of 'effect' inside IO. Perhaps you want to use Maybe or Either nested within IO.

In .NET, you may run into a similar need to compose task-based programming with an effect. This happens more often in F# than in C#, since F# comes with other native monads (option and Result, to name the most common).

Abstract shape #

You'll see some real examples in a moment, but as usual it helps to outline what it is that we're looking for. Imagine that you have a monad. We'll call it F in keeping with tradition. In this article series, you've seen how two or more functors compose. When discussing the abstract shapes of things, we've typically called our two abstract functors F and G. I'll stick to that naming scheme here, because monads are functors (that you can flatten).

Now imagine that you have a value that stacks two monads: F<G<T>>. If the inner monad G is the 'right' kind of monad, that configuration itself forms a monad.

In the diagram, I've simply named the combined monad FG, which is a naming strategy I've seen in the real world, too: TaskResult, etc.

As I've already mentioned, if there's a general theorem that says that this is always possible, I'm not aware of it. To the contrary, I seem to recall reading that this is distinctly not the case, but the source escapes me at the moment. One hint, though, is offered in the documentation of Data.Functor.Compose:

"The composition of applicative functors is always applicative, but the composition of monads is not always a monad."

Thankfully, the monads that you mostly need to compose do, in fact, compose. They include Maybe, Either, State, Reader, and Identity (okay, that one maybe isn't that useful). In other words, for any monad F, a stack such as F<Maybe<T>> also forms a monad.

Notice that it's the 'inner' monad that determines whether composition is possible. Not the 'outer' monad.
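Here is a minimal Haskell sketch of F<Maybe<T>> as a monad for an arbitrary outer monad m. It re-implements the idea behind MaybeT from the transformers package (Control.Monad.Trans.Maybe); it is not that package's code. The bind uses only the outer monad's own >>= and pure. The Maybe-specific part is the case split, which is one way to see why the inner monad is the one that matters. The sketch doesn't verify the monad laws.

```haskell
import Control.Monad (ap, liftM)

-- A value of the stacked type m (Maybe a), wrapped so it can have instances.
newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

instance Monad m => Functor (MaybeT m) where
  fmap = liftM

instance Monad m => Applicative (MaybeT m) where
  pure = MaybeT . pure . Just
  (<*>) = ap

-- Works for any outer monad m: run it, then split on the inner Maybe.
instance Monad m => Monad (MaybeT m) where
  MaybeT mx >>= f = MaybeT $ do
    x <- mx
    case x of
      Nothing -> pure Nothing
      Just y -> runMaybeT (f y)
```

The same instance works with IO as the outer monad, and equally with a list.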

For what it's worth, I'm basing much of this on my personal experience, which was again helpfully guided by Control.Monad.Trans.Class. I don't, however, wish to turn this article into an article about monad transformers, because if you already know Haskell, you can read the documentation and look at examples. And if you don't know Haskell, the specifics of monad transformers don't readily translate to languages like C# or F#.

The conclusions do translate, but the specific language mechanics don't.

Let's look at some common examples.

TaskMaybe monad #

We'll start with a simple, yet realistic example. The article Asynchronous Injection shows a simple operation that involves reading from a database, making a decision, and potentially writing to the database. The final composition, repeated here for your convenience, is an asynchronous (that is, Task-based) process.

return await Repository.ReadReservations(reservation.Date)
    .Select(rs => maîtreD.TryAccept(rs, reservation))
    .SelectMany(m => m.Traverse(Repository.Create))
    .Match(InternalServerError("Table unavailable"), Ok);

The problem here is that TryAccept returns Maybe<Reservation>, but since the overall workflow already 'runs in' an asynchronous monad (Task), the monads are now nested as Task<Maybe<T>>.

The way I dealt with that issue in the above code snippet was to rely on a traversal, but it's actually an inelegant solution. The way that the SelectMany invocation maps over the Maybe<Reservation> m is awkward. Instead of composing a business process, the scaffolding is on display, so to speak. Sometimes this is unavoidable, but at other times, there may be a better way.

In my defence, when I wrote that article in 2019 I had another pedagogical goal than teaching nested monads. It turns out, however, that you can rewrite the business process using the Task<Maybe<T>> stack as a monad in its own right.

A monad needs two functions: return and either bind or join. In C# or F#, you can often treat return as 'implied', in the sense that you can always wrap new Maybe<T> in a call to Task.FromResult. You'll see that in a moment.

While you can be cavalier about monadic return, you'll need to explicitly implement either bind or join. In this case, it turns out that the sample code base already had a SelectMany implementation:

public static async Task<Maybe<TResult>> SelectMany<T, TResult>(
    this Task<Maybe<T>> source,
    Func<T, Task<Maybe<TResult>>> selector)
{
    Maybe<T> m = await source;
    return await m.Match(
        nothing: Task.FromResult(new Maybe<TResult>()),
        just: x => selector(x));
}

The method first awaits the Maybe value, and then proceeds to Match on it. In the nothing case, you see the implicit return being used. In the just case, the SelectMany method calls selector with whatever x value was contained in the Maybe object. The result of calling selector already has the desired type Task<Maybe<TResult>>, so the implementation simply returns that value without further ado.

This enables you to rewrite the SelectMany call in the business process so that it instead looks like this:

return await Repository.ReadReservations(reservation.Date)
    .Select(rs => maîtreD.TryAccept(rs, reservation))
    .SelectMany(r => Repository.Create(r).Select(i => new Maybe<int>(i)))
    .Match(InternalServerError("Table unavailable"), Ok);

At first glance, it doesn't look like much of an improvement. To be sure, the lambda expression within the SelectMany method no longer operates on a Maybe value, but rather on the Reservation Domain Model r. On the other hand, we're now saddled with that graceless Select(i => new Maybe<int>(i)).

Had this been Haskell, we could have made this more succinct by eta reducing the Maybe case constructor and using the <$> infix operator instead of fmap; something like Just <$> create r. In C#, on the other hand, we can do something that Haskell doesn't allow. We can overload the SelectMany method:

public static Task<Maybe<TResult>> SelectMany<T, TResult>(
    this Task<Maybe<T>> source,
    Func<T, Task<TResult>> selector)
{
    return source.SelectMany(x => selector(x).Select(y => new Maybe<TResult>(y)));
}

This overload generalizes the 'pattern' exemplified by the above business process composition. Instead of a specific method call, it now works with any selector function that returns Task<TResult>. Since selector only returns a Task<TResult> value, and not a Task<Maybe<TResult>> value, as actually required in this nested monad, the overload has to map (that is, Select) the result by wrapping it in a new Maybe<TResult>.
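Haskell doesn't allow this kind of overloading. A rough equivalent is a second function with its own name, which wraps the selector's result in Just before delegating to the general bind. In this sketch both names (bindIOMaybe, bindIO) are hypothetical, and IO again stands in for Task:

```haskell
-- General monadic bind for IO (Maybe a), mirroring the first SelectMany.
bindIOMaybe :: IO (Maybe a) -> (a -> IO (Maybe b)) -> IO (Maybe b)
bindIOMaybe source selector = source >>= maybe (return Nothing) selector

-- Counterpart of the SelectMany overload: the selector returns a plain
-- IO b, so its result is wrapped with Just <$> (that is, fmap Just).
bindIO :: IO (Maybe a) -> (a -> IO b) -> IO (Maybe b)
bindIO source selector = bindIOMaybe source (\x -> Just <$> selector x)
```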

This now enables you to improve the business process composition to something more readable.

return await Repository.ReadReservations(reservation.Date)
    .Select(rs => maîtreD.TryAccept(rs, reservation))
    .SelectMany(Repository.Create)
    .Match(InternalServerError("Table unavailable"), Ok);

It even turned out to be possible to eta reduce the lambda expression instead of the (also valid, but more verbose) r => Repository.Create(r).

If you're interested in the sample code, I've pushed a branch named use-monad-stack to the GitHub repository.

Not surprisingly, the F# bind function is much terser:

let bind f x = async {
    match! x with
    | Some x' -> return! f x'
    | None -> return None }

You can find that particular snippet in the code base that accompanies the article Refactoring registration flow to functional architecture, although as far as I can tell, it's not actually in use in that code base. I probably just added it because I could.

You can find Haskell examples of combining MaybeT with IO in various articles on this blog. One of them is Dependency rejection.
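Here's a minimal sketch of that idea, assuming the transformers package (which ships with GHC). It mimics the reservation workflow with MaybeT over IO. The tryAccept and create functions are hypothetical stand-ins that use a party size as the 'reservation'. They are not the functions from the sample code base.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))

-- Hypothetical pure decision: accept parties of at most four.
tryAccept :: Int -> Maybe Int
tryAccept quantity = if quantity <= 4 then Just quantity else Nothing

-- Hypothetical impure action that 'saves' a reservation and returns an ID.
create :: Int -> IO Int
create quantity = return (1000 + quantity)

-- MaybeT IO short-circuits on Nothing, so there's no explicit case analysis.
acceptAndCreate :: Int -> IO (Maybe Int)
acceptAndCreate quantity = runMaybeT $ do
  accepted <- MaybeT (return (tryAccept quantity))  -- lift the pure Maybe
  lift (create accepted)                             -- lift the plain IO action
```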

TaskResult monad #

A similar, but slightly more complex, example involves nesting Either values in asynchronous workflows. In some languages, such as F#, Either is instead called Result, and asynchronous workflows are modelled by a Task container, as already demonstrated above. Thus, on .NET at least, this nested monad is often called TaskResult, but you may also see AsyncResult, AsyncEither, or other combinations. Depending on the programming language, such names may be used only for modules, and not for the container type itself. In C# or F# code, for example, you may look in vain for a class called TaskResult<T>, and instead find a TaskResult static class or module.

In C# you can define monadic bind like this:

public static async Task<Either<L, R1>> SelectMany<L, R, R1>(
    this Task<Either<L, R>> source,
    Func<R, Task<Either<L, R1>>> selector)
{
    if (source is null)
        throw new ArgumentNullException(nameof(source));
 
    Either<L, R> x = await source.ConfigureAwait(false);
    return await x.Match(
        l => Task.FromResult(Either.Left<L, R1>(l)),
        selector).ConfigureAwait(false);
}

Here I've again passed the eta-reduced selector straight to the right case of the Either value, but r => selector(r) works, too.

The left case shows another example of 'implicit monadic return'. I didn't bother defining an explicit Return function, but rather use Task.FromResult(Either.Left<L, R1>(l)) to return a Task<Either<L, R1>> value.
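For comparison, a Haskell sketch of the same bind might look like the following. IO again stands in for Task, and the name bindIOEither is hypothetical.

```haskell
-- Monadic bind for IO (Either l r), mirroring the C# SelectMany above.
bindIOEither :: IO (Either l r) -> (r -> IO (Either l r1)) -> IO (Either l r1)
bindIOEither source selector = do
  x <- source
  -- Left short-circuits via an 'implicit' return; Right continues.
  either (return . Left) selector x
```

The transformers package packages the same idea as the ExceptT monad transformer.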

As is usual in C#, you'll also need to add a special overload to enable the syntactic sugar of query expressions:

public static Task<Either<L, R1>> SelectMany<L, U, R, R1>(
    this Task<Either<L, R>> source,
    Func<R, Task<Either<L, U>>> k,
    Func<R, U, R1> s)
{
    return source.SelectMany(x => k(x).Select(y => s(x, y)));
}

You'll see a comprehensive example using these functions in a future article.

In F# I'd often first define a module with a few functions including bind, and then use those implementations to define a computation expression, but in one article, I jumped straight to the expression builder:

type AsyncEitherBuilder () =
    // Async<Result<'a,'c>> * ('a -> Async<Result<'b,'c>>)
    // -> Async<Result<'b,'c>>
    member this.Bind(x, f) =
        async {
            let! x' = x
            match x' with
            | Success s -> return! f s
            | Failure e -> return Failure e }
    // 'a -> 'a
    member this.ReturnFrom x = x
 
let asyncEither = AsyncEitherBuilder ()

That article also shows usage examples. Another article, A conditional sandwich example, shows more examples of using this nested monad, although there, the computation expression is named taskResult.
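The Haskell counterpart of a computation expression is do notation. It becomes available once the nested stack has a Monad instance. Here's a minimal sketch with a hypothetical IOEither newtype and hypothetical example functions. In practice you'd probably use ExceptT from the transformers package, which implements the same idea for any base monad.

```haskell
-- Hypothetical newtype that turns IO (Either e a) into a monad in its own right.
newtype IOEither e a = IOEither { runIOEither :: IO (Either e a) }

instance Functor (IOEither e) where
  fmap f (IOEither x) = IOEither (fmap (fmap f) x)

instance Applicative (IOEither e) where
  pure = IOEither . return . Right
  IOEither mf <*> IOEither mx = IOEither $ do
    ef <- mf
    case ef of
      Left e  -> return (Left e)    -- short-circuit; mx never runs
      Right f -> fmap (fmap f) mx

instance Monad (IOEither e) where
  IOEither x >>= f = IOEither $ do
    ex <- x
    case ex of
      Left e  -> return (Left e)
      Right a -> runIOEither (f a)

-- Hypothetical example: do notation plays the role of the asyncEither builder.
parsePositive :: Int -> IOEither String Int
parsePositive n = IOEither (return (if n > 0 then Right n else Left "not positive"))

addPositives :: Int -> Int -> IOEither String Int
addPositives a b = do
  x <- parsePositive a
  y <- parsePositive b
  return (x + y)
```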

Stateful computations that may fail #

To be honest, nested monads are mostly useful when some kind of 'effect' (typically errors) is embedded in an I/O-bound computation. In Haskell, this means IO, in C# Task, and in F# either Task or Async.

Other combinations are possible, but I've rarely encountered a need for additional nested monads outside of Haskell. In multi-paradigmatic languages, you can usually find other good designs that address issues that you may occasionally run into in a purely functional language. The following example is a Haskell-only example. You can skip it if you don't know or care about Haskell.

Imagine that you want to keep track of some statistics related to a software service you offer. If the variance of some number (say, response time) exceeds 10 then you want to issue an alert that the SLA was violated. Apparently, in your system, reliability means staying consistent.

You have millions of observations, and they keep arriving, so you need an online algorithm. For average and variance we'll use Welford's algorithm.

The following code uses these imports:

import Control.Monad
import Control.Monad.Trans.State.Strict
import Control.Monad.Trans.Maybe

First, you can define a data structure to hold the aggregate values required for the algorithm, as well as an initial, empty value:

data Aggregate = Aggregate { count :: Int, meanA :: Double, m2 :: Double } deriving (Eq, Show)
 
emptyA :: Aggregate
emptyA = Aggregate 0 0 0

You can also define a function to update the aggregate values with a new observation:

update :: Aggregate -> Double -> Aggregate
update (Aggregate count mean m2) x =
  let count' = count + 1
      delta = x - mean
      mean' = mean + delta / fromIntegral count'
      delta2 = x - mean'
      m2' = m2 + delta * delta2
  in Aggregate count' mean' m2'

Given an existing Aggregate record and a new observation, this function implements the algorithm to calculate a new Aggregate record.
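To see the update function at work, you can fold a batch of observations over it. The following self-contained sketch repeats the Aggregate type and update function from above, with local variables renamed so they don't shadow the record fields. It also adds a hypothetical aggregateAll helper.

```haskell
import Data.List (foldl')

data Aggregate = Aggregate { count :: Int, meanA :: Double, m2 :: Double } deriving (Eq, Show)

emptyA :: Aggregate
emptyA = Aggregate 0 0 0

-- Welford's update step, as above.
update :: Aggregate -> Double -> Aggregate
update (Aggregate n mean sq) x =
  let n' = n + 1
      delta = x - mean
      mean' = mean + delta / fromIntegral n'
      delta2 = x - mean'
  in Aggregate n' mean' (sq + delta * delta2)

-- Hypothetical helper: aggregate a batch with a strict left fold.
aggregateAll :: [Double] -> Aggregate
aggregateAll = foldl' update emptyA
```

Take the observations 2, 4, 4, 4, 5, 5, 7, and 9. Their mean is 5 and their sum of squared deviations (m2) is 32, so the population variance m2 / count is 4, up to floating-point rounding.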

The values in an Aggregate record, however, are only intermediary values that you can use to calculate statistics such as mean, variance, and sample variance. You'll need a data type and function to do that, as well:

data Statistics =
  Statistics
    { mean :: Double, variance :: Double, sampleVariance :: Maybe Double }
    deriving (Eq, Show)
 
extractStatistics :: Aggregate -> Maybe Statistics
extractStatistics (Aggregate count mean m2) =
  if count < 1 then Nothing
  else
    let variance = m2 / fromIntegral count
        sampleVariance =
          if count < 2 then Nothing else Just $ m2 / fromIntegral (count - 1)
    in Just $ Statistics mean variance sampleVariance

This is where the computation becomes 'failure-prone'. Granted, we only have a real problem when we have zero observations, but this still means that we need to return a Maybe Statistics value in order to avoid division by zero.

(There might be other designs that avoid that problem, or you might simply decide to tolerate that edge case and code around it in other ways. I've decided to design the extractStatistics function in this particular way in order to furnish an example. Work with me here.)

Let's say that, as the next step, you'd like to compose these two functions into a single function that adds a new observation, computes the statistics, and also returns the updated Aggregate.

You could write it like this:

addAndCompute :: Double -> Aggregate -> Maybe (Statistics, Aggregate)
addAndCompute x agg = do
  let agg' = update agg x
  stats <- extractStatistics agg'
  return (stats, agg')

This implementation uses do notation to automate handling of Nothing values. Still, it's a bit inelegant with its two agg values only distinguishable by the prime sign after one of them, and the need to explicitly return a tuple of the value and the new state.

This is the kind of problem that the State monad addresses. You could instead write the function like this:

addAndCompute :: Double -> State Aggregate (Maybe Statistics)
addAndCompute x = do
  modify $ flip update x
  gets extractStatistics

You could actually also write it as a one-liner, but that's already a bit too terse for my liking:

addAndCompute :: Double -> State Aggregate (Maybe Statistics)
addAndCompute x = modify (`update` x) >> gets extractStatistics

And if you really hate your co-workers, you can always visit pointfree.io to entirely obscure that expression, but I digress.

The point is that the State monad amplifies the essential and eliminates the irrelevant.

Now you'd like to add a function that issues an alert if the variance is greater than 10. Again, you could write it like this:

monitor :: Double -> State Aggregate (Maybe String)
monitor x = do
  stats <- addAndCompute x
  case stats of
    Just Statistics { variance } -> return $
      if 10 < variance
      then Just "SLA violation"
      else Nothing
    Nothing -> return Nothing

But again, the code is graceless with its explicit handling of Maybe cases. Whenever you see code that matches Maybe cases and maps Nothing to Nothing, your spider sense should be tingling. Could you abstract that away with a functor or monad?

Yes you can! You can use the MaybeT monad transformer, which nests Maybe computations inside another monad. In this case State:

monitor :: Double -> State Aggregate (Maybe String)
monitor x = runMaybeT $ do
  Statistics { variance } <- MaybeT $ addAndCompute x
  guard (10 < variance)
  return "SLA Violation"

The function type is the same, but the implementation is much simpler. First, the code lifts the Maybe-valued addAndCompute result into MaybeT and pattern-matches on the variance. Since the code is now 'running in' a Maybe-like context, the rest of the do block only executes if there's a Statistics value to extract. If, on the other hand, addAndCompute returns Nothing, the function short-circuits right there.

The guard works just like imperative Guard Clauses. The third line of code only runs if the variance is greater than 10. In that case, it returns an alert message.

The entire do workflow gets unwrapped with runMaybeT so that we return back to a normal stateful computation that may fail.

Let's try it out:

ghci> (evalState $ monitor 1 >> monitor 7) emptyA
Nothing
ghci> (evalState $ monitor 1 >> monitor 8) emptyA
Just "SLA Violation"

Good, rigorous testing suggests that it's working.
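Why does one run alert and the other not? extractStatistics computes the population variance (m2 divided by count). After the first call, monitor 1, the variance is 0. After the second call, each run holds two observations, which gives:

```latex
\begin{aligned}
\bar{x} &= \frac{1 + 7}{2} = 4, & \sigma^2 &= \frac{(1 - 4)^2 + (7 - 4)^2}{2} = \frac{18}{2} = 9 \le 10 \\
\bar{x} &= \frac{1 + 8}{2} = 4.5, & \sigma^2 &= \frac{(1 - 4.5)^2 + (8 - 4.5)^2}{2} = \frac{24.5}{2} = 12.25 > 10
\end{aligned}
```

Only the second run satisfies guard (10 < variance).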

Conclusion #

You sometimes run into situations where monads are nested. This mostly happens in I/O-bound computations, where you may have a Maybe or Either value embedded inside Task or IO. This can sometimes make working with the 'inner' monad awkward, but in many cases there's a good solution at hand.

Some monads, like Maybe, Either, State, Reader, and Identity, nest nicely inside other monads. Thus, if your 'inner' monad is one of those, you can turn the nested arrangement into a monad in its own right. This may help simplify your code base.

In addition to the common monads listed here, there are a few more exotic ones that also play well in a nested configuration. Additionally, if your 'inner' monad is a custom data structure of your own creation, it's up to you to investigate if it nests nicely in another monad. As far as I can tell, though, if you can make it nest in one monad (e.g. Task, Async, or IO) you can probably make it nest in any monad.

Next: Software design isomorphisms.

This blog is totally free, but if you like it, please consider supporting it.

Collecting and handling result values

2024-11-18T07:39:00+00:00

The answer is traverse. It's always traverse.

I recently came across a Stack Overflow question about collecting and handling sum types (AKA discriminated unions or, in this case, result types). While the question was tagged functional-programming, the overall structure of the code was so imperative, with so much interleaved I/O, that it hardly qualified as functional architecture.

Instead, I gave an answer which involved a minimal change to the code. Subsequently, the original poster asked to see a more functional version of the code. That's a bit too large a task for a Stack Overflow answer, I think, so I'll do it here on the blog instead.

Further comments and discussion on the original post reveal that the poster is interested in two alternatives. I'll start with the alternative that's only discussed, but not shown, in the question. The motivation for this ordering is that this variation is easier to implement than the other one, and I consider it pedagogical to start with the simplest case.

I'll do that in this article, and then follow up with another article that covers the short-circuiting case.

Imperative outset #

To begin, consider this mostly imperative code snippet:

var storedItems = new List<ShoppingListItem>();
var failedItems = new List<ShoppingListItem>();
var errors = new List<Error>();
var state = (storedItems, failedItems, errors);
foreach (var item in itemsToUpdate)
{
    OneOf<ShoppingListItem, NotFound, Error> updateResult = await UpdateItem(item, dbContext);
    state = updateResult.Match<(List<ShoppingListItem>, List<ShoppingListItem>, List<Error>)>(
        storedItem => { storedItems.Add(storedItem); return state;  },
        notFound => { failedItems.Add(item); return state; },
        error => { errors.Add(error); return state; }
        );
}
 
await dbContext.SaveChangesAsync();
 
return Results.Ok(new BulkUpdateResult([.. storedItems], [.. failedItems], [.. errors]));

There are quite a few things to take in, and one has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.

The Match method implements standard Church encoding, which enables the code to pattern-match on the three alternative values that UpdateItem returns.

One more detail also warrants an explicit description: The itemsToUpdate object is an input argument of the type IEnumerable<ShoppingListItem>.

The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?

Standard traversal #

I'll pretend that we only need to turn the above code snippet into a functional design. Thus, I'm ignoring that the code is most likely part of a larger code base. Because of the implied database interaction, the method isn't a pure function. Unless it's a top-level method (that is, at the boundary of the application), it doesn't exemplify larger-scale functional architecture.

That said, my goal is to refactor the code to an Impureim Sandwich: Impure actions first, then the meat of the functionality as a pure function, and then some more impure actions to complete the functionality. This strongly suggests that the first step should be to map over itemsToUpdate and call UpdateItem for each.

If, however, you do that, you get this:

IEnumerable<Task<OneOf<ShoppingListItem, NotFound, Error>>> results =
    itemsToUpdate.Select(item => UpdateItem(item, dbContext));

The results object is a sequence of tasks. If we consider Task as a surrogate for IO, each task should be considered impure, as it's either non-deterministic, has side effects, or both. This means that we can't pass results to a pure function, and that frustrates the ambition to structure the code as an Impureim Sandwich.

This is one of the most common problems in functional programming, and the answer is usually: Use a traversal.

IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));

Because this first, impure layer of the sandwich awaits the task, results is now an immutable value that can be passed to the pure step. This, by the way, assumes that ShoppingListItem is immutable, too.
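For readers who know Haskell, the same move is traverse applied to a list. The following sketch uses hypothetical Item, UpdateResult, and updateItem definitions in place of the C# types and the database call. Mapping updateItem over the list would yield [IO UpdateResult], whereas traverse yields a single IO [UpdateResult]:

```haskell
-- Hypothetical stand-ins for the shopping-list types in the C# code.
newtype Item = Item String deriving (Eq, Show)

data UpdateResult = Stored Item | NotFound Item | Failed String
  deriving (Eq, Show)

-- Hypothetical impure update; it only prints, to keep the sketch small.
updateItem :: Item -> IO UpdateResult
updateItem item@(Item name) = do
  putStrLn ("Updating " ++ name)
  return $ case name of
    ""      -> Failed "empty name"
    'x' : _ -> NotFound item
    _       -> Stored item

-- traverse runs every update and collects the results in one IO value,
-- which can then be handed to a pure function.
updateAll :: [Item] -> IO [UpdateResult]
updateAll = traverse updateItem
```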

Notice that I adjusted one of the cases of the discriminated union to NotFound<ShoppingListItem> rather than just NotFound. While the OneOf library ships with a NotFound type, it doesn't have a generic container of that name, so I defined it myself:

internal sealed record NotFound<T>(T Item);

I added it to make the next step simpler.

Aggregating the results #

The next step is to sort the results into three 'buckets', as it were.

// Pure
var seed =
    (
        Enumerable.Empty<ShoppingListItem>(),
        Enumerable.Empty<ShoppingListItem>(),
        Enumerable.Empty<Error>()
    );
var result = results.Aggregate(
    seed,
    (state, result) =>
        result.Match(
            storedItem => (state.Item1.Append(storedItem), state.Item2, state.Item3),
            notFound => (state.Item1, state.Item2.Append(notFound.Item), state.Item3),
            error => (state.Item1, state.Item2, state.Item3.Append(error))));

It's also possible to inline the seed value, but here I defined it in a separate expression in an attempt at making the code a little more readable. I don't know if I succeeded, because regardless of where it goes, it's hardly idiomatic to break tuple initialization over multiple lines. I had to, though, because otherwise the code would run too far to the right.

The lambda expression handles each result in results and uses Match to append the value to its proper 'bucket'. The outer result is a tuple of the three collections.
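A Haskell sketch of the same bucketing step might use a right fold. Item and UpdateResult are hypothetical stand-ins for the C# types, and the error case simply carries a String:

```haskell
newtype Item = Item String deriving (Eq, Show)

data UpdateResult = Stored Item | NotFound Item | Failed String
  deriving (Eq, Show)

-- Sort results into three 'buckets'. Consing inside foldr keeps each
-- bucket in the original order of the results.
partitionResults :: [UpdateResult] -> ([Item], [Item], [String])
partitionResults = foldr step ([], [], [])
  where
    step (Stored item)   (stored, failed, errs) = (item : stored, failed, errs)
    step (NotFound item) (stored, failed, errs) = (stored, item : failed, errs)
    step (Failed err)    (stored, failed, errs) = (stored, failed, err : errs)
```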

Saving the changes and returning the results #

The final, impure step in the sandwich is to save the changes and return the results:

// Impure
await dbContext.SaveChangesAsync();
return Results.Ok(
    new BulkUpdateResult([.. result.Item1], [.. result.Item2], [.. result.Item3]));

To be honest, the last line of code is pure, but that's not unusual when it comes to Impureim Sandwiches.

Accumulating the bulk-update result #

So far, I've assumed that the final BulkUpdateResult class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use them to aggregate the result, instead of an anonymous tuple.

internal BulkUpdateResult Store(ShoppingListItem item) =>
    new([.. StoredItems, item], FailedItems, Errors);
 
internal BulkUpdateResult Fail(ShoppingListItem item) =>
    new(StoredItems, [.. FailedItems, item], Errors);
 
internal BulkUpdateResult Error(Error error) =>
    new(StoredItems, FailedItems, [.. Errors, error]);

Adding these three instance methods to BulkUpdateResult simplifies the composing code:

// Impure
IEnumerable<OneOf<ShoppingListItem, NotFound<ShoppingListItem>, Error>> results =
    await itemsToUpdate.Traverse(item => UpdateItem(item, dbContext));
 
// Pure
var result = results.Aggregate(
    new BulkUpdateResult([], [], []),
    (state, result) =>
        result.Match(
            storedItem => state.Store(storedItem),
            notFound => state.Fail(notFound.Item),
            error => state.Error(error)));
 
// Impure
await dbContext.SaveChangesAsync();
return Results.Ok(result);

This variation starts with an empty BulkUpdateResult and then uses Store, Fail, or Error as appropriate to update the state.
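The same refactoring can be sketched in Haskell with a record and copy-and-update functions. All names below are hypothetical: the BulkUpdateResult fields, store, failItem, addError, and summarize. failItem and addError are named so that they don't clash with Prelude's fail and error.

```haskell
import Data.List (foldl')

newtype Item = Item String deriving (Eq, Show)

data UpdateResult = Stored Item | NotFound Item | Failed String
  deriving (Eq, Show)

data BulkUpdateResult = BulkUpdateResult
  { storedItems :: [Item], failedItems :: [Item], errors :: [String] }
  deriving (Eq, Show)

-- Copy-and-update functions; ++ is linear, which is fine for a sketch.
store, failItem :: Item -> BulkUpdateResult -> BulkUpdateResult
store item r = r { storedItems = storedItems r ++ [item] }
failItem item r = r { failedItems = failedItems r ++ [item] }

addError :: String -> BulkUpdateResult -> BulkUpdateResult
addError err r = r { errors = errors r ++ [err] }

-- The pure middle of the sandwich: start empty, update once per result.
summarize :: [UpdateResult] -> BulkUpdateResult
summarize = foldl' step (BulkUpdateResult [] [] [])
  where
    step acc (Stored item)   = store item acc
    step acc (NotFound item) = failItem item acc
    step acc (Failed err)    = addError err acc
```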

Parallel Sequence #

If the tasks you want to traverse are thread-safe, you might consider making the traversal concurrent. You can use Task.WhenAll for that. Apart from returning an array rather than an IEnumerable<T>, it has the same type as Sequence, so if you can live with the extra non-determinism that comes with parallel execution, you can implement Sequence with it instead:

internal static async Task<IEnumerable<T>> Sequence<T>(this IEnumerable<Task<T>> tasks)
{
    return await Task.WhenAll(tasks);
}

Since the method signature doesn't change, the rest of the code remains unchanged.
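In Haskell, the async package offers a comparable concurrent traversal as mapConcurrently. Using only base, a minimal sketch with forkIO and MVar might look like the following. traverseConcurrently and rethrow are hypothetical names. The sketch returns results in input order, but it makes no attempt to cancel the remaining threads if one of them fails.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)
import Control.Monad ((>=>))

-- Re-raise an exception captured in a worker thread, or return its result.
rethrow :: Either SomeException a -> IO a
rethrow = either throwIO pure

-- Start every action on its own thread, then wait for the results in order.
traverseConcurrently :: (a -> IO b) -> [a] -> IO [b]
traverseConcurrently f xs = do
  vars <- traverse start xs
  traverse (takeMVar >=> rethrow) vars
  where
    start x = do
      var <- newEmptyMVar
      _ <- forkIO (try (f x) >>= putMVar var)
      return var
```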

Conclusion #

One of the most common stumbling blocks in functional programming is when you have a collection of values, and you need to perform an impure action (typically I/O) for each. This leaves you with a collection of impure values (Task in C#, Task or Async in F#, IO in Haskell, etc.). What you actually need is a single impure value that contains the collection of results.

The solution to this kind of problem is to traverse the collection, rather than mapping over it (with Select, map, fmap, or similar). Note that computer scientists often talk about traversing a data structure like a tree. This is a less well-defined use of the word, and not directly related. That said, you can also write Traverse and Sequence functions for trees.

This article used a Stack Overflow question as the starting point for an example showing how to refactor imperative code to an Impureim Sandwich.

This completes the first variation requested in the Stack Overflow question.

Next: Short-circuiting an asynchronous traversal.


Traversals

2024-11-11T07:45:00+00:00

How to convert a list of tasks into an asynchronous list, and similar problems.

This article is part of a series of articles about functor relationships. In a previous article you learned about natural transformations, and then how functors compose. You can skip several of them if you like, but you might find the one about functor compositions relevant. Still, this article can be read independently of the rest of the series.

You can go a long way with just a single functor or monad. Consider how useful C#'s LINQ API is, or similar kinds of APIs in other languages - typically map and flatMap methods. These APIs work exclusively with the List monad (which is also a functor). Working with lists, sequences, or collections is so useful that many languages have other kinds of special syntax specifically aimed at working with multiple values: List comprehension.

Asynchronous monads like Task<T> or F#'s Async<'T> are another kind of functor so useful in their own right that languages have special async and await keywords to compose them.

Sooner or later, though, you run into situations where you'd like to combine two different functors.

Lists and tasks #

It's not unusual to combine collections and asynchrony. If you make an asynchronous database query, you could easily receive something like Task<IEnumerable<Reservation>>. This, in isolation, hardly causes problems, but things get more interesting when you need to compose multiple reads.

Consider a query like this:

public static Task<Foo> Read(int id)

What happens if you have a collection of IDs that you'd like to read? This happens:

var ids = new[] { 42, 1337, 2112 };
IEnumerable<Task<Foo>> fooTasks = ids.Select(id => Foo.Read(id));

You get a collection of Tasks, which may be awkward because you can't await it. Perhaps you'd prefer a single Task that contains a collection: Task<IEnumerable<Foo>>. In other words, you'd like to flip the functors:

IEnumerable<Task<Foo>>
Task<IEnumerable<Foo>>

The top type is what you have. The bottom type is what you'd like to have.

The combination of asynchrony and collections is so common that .NET has special methods to do that. I'll briefly mention one of these later, but what's the general solution to this problem?

Whenever you need to flip two functors, you need a traversal.

Sequence #

As is almost always the case, we can look to Haskell for a canonical definition of traversals - or, as the type class is called: Traversable.

A traversable functor is a functor that enables you to flip that functor and another functor, like the above C# example. In more succinct syntax:

t (f a) -> f (t a)

Here, t symbolises any traversable functor (like IEnumerable<T> in the above C# example), and f is another functor (like Task<T>, above). By flipping the functors I mean making t and f change places; just like IEnumerable and Task, above.
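In Haskell, that signature belongs to sequenceA. The following small sketch, with hypothetical binding names, shows it flipping a list of Maybe values. It also shows the opposite direction, because Maybe is traversable and lists are applicative:

```haskell
-- sequenceA :: (Traversable t, Applicative f) => t (f a) -> f (t a)
-- Here t is the list functor and f is Maybe.
allPresent :: Maybe [Int]
allPresent = sequenceA [Just 1, Just 2, Just 3]

oneMissing :: Maybe [Int]
oneMissing = sequenceA [Just 1, Nothing, Just 3]

-- The roles can be swapped: t is Maybe and f is the list functor.
flippedOther :: [Maybe Int]
flippedOther = sequenceA (Just [1, 2])
```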

Thinking of functors as containers we might depict the function like this:

To the left, we have an outer functor t (e.g. IEnumerable) that contains another functor f (e.g. Task) that again 'contains' values of type a (in C# typically called T). We'd like to flip how the containers are nested so that f contains t.

Contrary to what you might expect, the function that does that isn't called traverse; it's called sequence. (For those readers who are interested in Haskell specifics, the function I'm going to be talking about is actually called sequenceA. There's also a function called sequence, but it's not as general. The reasons for the odd names are related to the evolution of various Haskell type classes.)

The sequence function doesn't work for any old functor. First, t has to be a traversable functor. We'll get back to that later. Second, f has to be an applicative functor. (To be honest, I'm not sure if this is always required, or if it's possible to produce an example of a specific functor that isn't applicative, but where it's still possible to implement a sequence function. The Haskell sequenceA function has Applicative f as a constraint, but as far as I can tell, this only means that this is a sufficient requirement - not that it's necessary.)

Since tasks (e.g. Task<T>) are applicative functors (they are, because they are monads, and all monads are applicative functors), that second requirement is fulfilled for the above example. I'll show you how to implement a Sequence function in C# and how to use it, and then we'll return to the general discussion of what a traversable functor is:

public static Task<IEnumerable<T>> Sequence<T>(
    this IEnumerable<Task<T>> source)
{
    return source.Aggregate(
        Task.FromResult(Enumerable.Empty<T>()),
        async (acc, t) =>
        {
            var xs = await acc;
            var x = await t;
            return xs.Concat(new[] { x });
        });
}

This Sequence function enables you to flip any IEnumerable<Task<T>> to a Task<IEnumerable<T>>, including the above fooTasks:

Task<IEnumerable<Foo>> foosTask = fooTasks.Sequence();

You can also implement sequence in F#:

// Async<'a> list -> Async<'a list>
let sequence asyncs =
    let go acc t = async {
        let! xs = acc
        let! x  = t
        return List.append xs [x] }
    List.fold go (async { return [] }) asyncs

and use it like this:

let fooTasks = ids |> List.map Foo.Read
let foosTask = fooTasks |> Async.sequence

For this example, I put the sequence function in a local Async module; it's not part of any published Async module.

These C# and F# examples are specific translations: From lists of tasks to a task of list. If you need another translation, you'll have to write a new function for that particular combination of functors. Haskell has more general capabilities, so that you don't have to write functions for all combinations. I'm not assuming that you know Haskell, however, so I'll proceed with the description.
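To give an idea of the more general Haskell approach, here's a sketch of a list-specific sequence that works for any applicative functor, not only for asynchronous ones. The name sequenceList is hypothetical. It was chosen to avoid clashing with Prelude's sequence.

```haskell
-- One definition serves Maybe, Either e, IO, and any other Applicative.
-- (:) <$> fx <*> acc runs fx before acc, so effects happen left to right.
sequenceList :: Applicative f => [f a] -> f [a]
sequenceList = foldr (\fx acc -> (:) <$> fx <*> acc) (pure [])
```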

Traversable functor #

The sequence function requires that the 'other' functor (the one that's not the traversable functor) is an applicative functor, but what about the traversable functor itself? What does it take to be a traversable functor?

I must admit that I have to rely on Haskell specifics to a greater extent than normal. For most other concepts and abstractions in the overall article series, I've been able to draw on various sources, chief of which is Category Theory for Programmers. In various articles, I've cited my sources whenever possible. While I've relied on Haskell libraries for 'canonical' ways to represent concepts in a programming language, I've tried to present ideas as having a more universal origin than just Haskell.

When it comes to traversable functors, I haven't come across universal reasoning like that which gives rise to concepts like monoids, functors, Church encodings, or catamorphisms. This is most likely a failing on my part.

Traversals of the Haskell kind are, however, so useful that I find it appropriate to describe them. In my consulting work, I find that they solve many of the problems that people run into with functional programming.

Thus, based on Haskell's Data.Traversable, a traversable functor must:

be a functor
be a 'foldable' functor
define a sequence or traverse function

You've already seen examples of sequence functions, and I'm also assuming that (since you've made it so far in the article already) you know what a functor is. But what's a foldable functor?

Haskell comes with a Foldable type class. It defines a class of data that has a particular type of catamorphism. As I've outlined in my article on catamorphisms, Haskell's notion of a fold sometimes coincides with a (or 'the') catamorphism for a type, and sometimes not. For Maybe and List they do coincide, while they don't for Either or Tree. It's not that you can't define Foldable for Either or Tree, it's just that it's not 'the' general catamorphism for that type.

I can't tell whether Foldable is a universal abstraction, or if it's just an ad-hoc API that turns out to be useful in practice. It looks like the latter to me, but my knowledge is limited. Perhaps I'll be wiser in a year or two.

I will, however, take it as licence to treat this topic a little less formally than I've done with other articles. While there are laws associated with Traversable, they are rather complex, so I'm going to skip them.

The above requirements will enable you to define traversable functors if you run into some more exotic ones, but in practice, the common functors List, Maybe, Either, Tree, and Identity are all traversable. That is useful to know. If any of those functors is the outer functor in a composition of functors, then you can flip it to the inner position as long as the other functor is an applicative functor.
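As an illustration of the three requirements, here's a minimal sketch of a traversable binary tree. This Tree type is hypothetical and isn't the rose tree from Data.Tree. GHC could derive these instances with DeriveFunctor, DeriveFoldable, and DeriveTraversable, but writing them by hand shows what each one involves.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a) deriving (Eq, Show)

instance Functor Tree where
  fmap _ Leaf = Leaf
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)

instance Foldable Tree where
  -- In-order fold: left subtree, then the node's value, then the right subtree.
  foldr _ z Leaf = z
  foldr f z (Node l x r) = foldr f (f x (foldr f z r)) l

instance Traversable Tree where
  -- Rebuild the same shape inside the applicative functor.
  traverse _ Leaf = pure Leaf
  traverse f (Node l x r) = Node <$> traverse f l <*> f x <*> traverse f r
```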

Since IEnumerable<T> is traversable, and Task<T> (or Async<'T>) is an applicative functor, it's possible to use Sequence to convert IEnumerable<Task<Foo>> to Task<IEnumerable<Foo>>.

Traverse #

The C# and F# examples you've seen so far arrive at the desired type in a two-step process. First they produce the 'wrong' type with ids.Select(Foo.Read) or ids |> List.map Foo.Read, and then they use Sequence to arrive at the desired type.

When you use two expressions, you need two lines of code, and you also need to come up with a name for the intermediary value. It might be easier to chain the two function calls into a single expression:

Task<IEnumerable<Foo>> foosTask = ids.Select(Foo.Read).Sequence();

Or, in F#:

let foosTask = ids |> List.map Foo.Read |> Async.sequence

Chaining Select/map with Sequence/sequence is so common that it's a named function: traverse. In C#:

public static Task<IEnumerable<TResult>> Traverse<T, TResult>(
    this IEnumerable<T> source,
    Func<T, Task<TResult>> selector)
{
    return source.Select(selector).Sequence();
}

This makes usage a little easier:

Task<IEnumerable<Foo>> foosTask = ids.Traverse(Foo.Read);

In F# the implementation might be similar:

// ('a -> Async<'b>) -> 'a list -> Async<'b list>
let traverse f xs = xs |> List.map f |> sequence

Usage then looks like this:

let foosTask = ids |> Async.traverse Foo.Read

As you can tell, if you've already implemented sequence you can always implement traverse. The converse is also true: If you've already implemented traverse, you can always implement sequence. You'll see an example of that later.
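In Haskell, each direction can be sketched in one line for any traversable t and applicative f. The primed names are hypothetical. They were chosen to avoid clashing with the Prelude.

```haskell
-- traverse from sequenceA: map first, then flip the functors.
traverse' :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse' f = sequenceA . fmap f

-- sequenceA from traverse: traverse with the identity function.
sequenceA' :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceA' = traverse id
```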

A reusable idea #

If you know the .NET Task Parallel Library (TPL), you may demur that my implementation of Sequence seems like an inefficient version of Task.WhenAll, and that Traverse could be written like this:

public static async Task<IEnumerable<TResult>> Traverse<T, TResult>(
    this IEnumerable<T> source,
    Func<T, Task<TResult>> selector)
{
    return await Task.WhenAll(source.Select(selector));
}

This alternative is certainly possible. Whether it's more efficient I don't know; I haven't measured. As foreshadowed in the beginning of the article, the combination of collections and asynchrony is so common that .NET has special APIs to handle that. You may ask, then: What's the point?

The point is that a traversable functor is a reusable idea.

You may be able to find existing APIs like Task.WhenAll to deal with combinations of collections and asynchrony, but what if you need to deal with asynchronous Maybe or Either? Or a List of Maybes?

There may be no existing API to flip things around - before you add it. Now you know that there's a (dare I say it?) design pattern you can implement.

Asynchronous Maybe #

Once people go beyond collections they often run into problems. You may, for example, decide to use the Maybe monad in order to model the presence or absence of a value. Then, once you combine Maybe-based decision values with asynchronous processing, you may run into trouble.

For example, in my article Asynchronous Injection I modelled the core domain logic as returning Maybe<Reservation>. When handling an HTTP request, the application should use that value to determine what to do next. If the return value is empty it should do nothing, but when the Maybe value is populated, it should save the reservation in a data store using this method:

Task<int> Create(Reservation reservation)

Finally, if accepting the reservation, the HTTP handler (ReservationsController) should return the reservation ID, which is the int returned by Create. Please refer to the article for details. It also links to the sample code on GitHub.

The entire expression is, however, Task-based:

public async Task<IActionResult> Post(Reservation reservation)
{
    return await Repository.ReadReservations(reservation.Date)
        .Select(rs => maîtreD.TryAccept(rs, reservation))
        .SelectMany(m => m.Traverse(Repository.Create))
        .Match(InternalServerError("Table unavailable"), Ok);
}

The Select and SelectMany methods are defined on the Task monad. The m in the SelectMany lambda expression is the Maybe<Reservation> returned by TryAccept. What would happen if you didn't have a Traverse method?

Task<Maybe<Task<int>>> whatIsThis = Repository.ReadReservations(reservation.Date)
    .Select(rs => maîtreD.TryAccept(rs, reservation))
    .Select(m => m.Select(Repository.Create));

Notice that whatIsThis (so named because it's a temporary variable used to investigate the type of the expression so far) has an awkward type: Task<Maybe<Task<int>>>. That's a Task within a Maybe within a Task.

This makes it difficult to continue the composition and return an HTTP result.

Instead, use Traverse:

Task<Task<Maybe<int>>> whatIsThis = Repository.ReadReservations(reservation.Date)
    .Select(rs => maîtreD.TryAccept(rs, reservation))
    .Select(m => m.Traverse(Repository.Create));

This flips the inner Maybe<Task<int>> to Task<Maybe<int>>. Now you have a Maybe within a Task within a Task. The outer two Tasks are now nicely nested, and it's a job for a monad to remove one level of nesting. That's the reason that the final composition uses SelectMany instead of Select.
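The same flip can be sketched in Haskell, with IO in the role of Task and >>= in the role of SelectMany. All the functions below are hypothetical simplifications that only mimic the shape of the article's code; in particular, readReservations ignores the date, and the capacity of 10 is made up.

```haskell
-- Hypothetical stand-in for Repository.ReadReservations: quantities already reserved.
readReservations :: IO [Int]
readReservations = pure [2, 3]

-- Hypothetical maître d': accept the quantity if the total stays within 10.
tryAccept :: [Int] -> Int -> Maybe Int
tryAccept rs q = if sum rs + q <= 10 then Just q else Nothing

-- Hypothetical Repository.Create: pretend to save and return a made-up ID.
create :: Int -> IO Int
create q = pure (100 + q)

-- Mapping twice leaves an unexecuted IO action inside the Maybe.
whatIsThis :: Int -> IO (Maybe (IO Int))
whatIsThis q = fmap (\rs -> fmap create (tryAccept rs q)) readReservations

-- traverse flips Maybe (IO Int) to IO (Maybe Int), and >>= removes the
-- extra layer of IO, like SelectMany does for Task.
post :: Int -> IO (Maybe Int)
post q = readReservations >>= \rs -> traverse create (tryAccept rs q)
```

Here, post 4 returns Just 104, while post 6 exceeds the made-up capacity and returns Nothing without ever calling create.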

The Traverse function is implemented like this:

public static Task<Maybe<TResult>> Traverse<T, TResult>(
    this Maybe<T> source,
    Func<T, Task<TResult>> selector)
{
    return source.Match(
        nothing: Task.FromResult(new Maybe<TResult>()),
        just: async x => new Maybe<TResult>(await selector(x)));
}

The idea is reusable. You can also implement a similar traversal in F#:

// ('a -> Async<'b>) -> 'a option -> Async<'b option>
let traverse f = function
    | Some x -> async {
        let! x' = f x
        return Some x' }
    | None -> async { return None }

You can see the F# function as well as a usage example in the article Refactoring registration flow to functional architecture.
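A Haskell version of the same traversal is equally short. In the following sketch, traverseMaybe is a hypothetical name; base's Traversable instance for Maybe already behaves this way. Notice that nothing ties it to asynchrony: any applicative functor can take the place of Task or Async.

```haskell
-- Traverse a Maybe with any applicative effect f.
traverseMaybe :: Applicative f => (a -> f b) -> Maybe a -> f (Maybe b)
traverseMaybe _ Nothing = pure Nothing   -- nothing to run; just wrap Nothing
traverseMaybe f (Just x) = Just <$> f x  -- run f, then rewrap the result in Just
```

For example, traverseMaybe (\x -> Right (x + 1)) (Just 41) gives Right (Just 42) when Either String Int is the applicative functor.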

Sequence from traverse #

You've already seen that if you have a sequence function, you can implement traverse. I also claimed that the reverse is true: If you have traverse you can implement sequence.

When you've encountered these kinds of dual definitions a couple of times, you start to expect the ubiquitous identity function to make an appearance, and indeed it does:

let sequence x = traverse id x

That's the F# version where the identity function is built in as id. In C# you'd use a lambda expression:

public static Task<Maybe<T>> Sequence<T>(this Maybe<Task<T>> source)
{
    return source.Traverse(x => x);
}

Since C# doesn't come with a predefined identity function, it's idiomatic to use x => x instead.
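Haskell also has id built in, so the same one-liner works there, and it works for any Traversable, not just Maybe. The name sequenceViaTraverse is hypothetical; in practice you'd simply call base's sequenceA.

```haskell
-- sequence expressed as traverse with the identity function.
sequenceViaTraverse :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequenceViaTraverse = traverse id
```

For example, sequenceViaTraverse (Just [1, 2]) flips a Maybe of a list into the list [Just 1, Just 2].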

Conclusion #

Traversals are useful when you need to 'flip' the order of two different, nested functors. The outer one must be a traversable functor, and the inner an applicative functor.

Common traversable functors are List, Maybe, Either, Tree, and Identity, but there are more than those. In .NET, you often need them in combination with Tasks. In Haskell, they are useful when combined with IO.

Next: Nested monads.

Comments

qfilip #

Thanks for this one. You might be interested in Andrew Lock's take on the whole subject as well.

2024-11-17 14:51 UTC

This blog is totally free, but if you like it, please consider supporting it.

Pendulum swing: no Haskell type annotation by default

2024-11-04T07:45:00+00:00

Are Haskell IDE plugins now good enough that you don't need explicit type annotations?

More than three years ago, I published a small article series to document that I'd changed my mind on various small practices. Belatedly, here comes a fourth article, which, frankly, is a cousin rather than a sibling. Still, it fits the overall theme well enough to become another instalment in the series.

Here, I consider using fewer Haskell type annotations, following a practice that I've always followed in F#.

To be honest, though, it's not that I've already applied the following practice for a long time, and only now write about it. It's rather that I feel the need to write this article to kick an old habit and start a new one.

Inertia #

As I write in the dedication in Code That Fits in Your Head,

"To my parents:

"My mother, Ulla Seemann, to whom I owe my attention to detail.

"My father, Leif Seemann, from whom I inherited my contrarian streak."

Code That Fits in Your Head, dedication

One should always be careful when simplifying one's personality to a simple, easy-to-understand model, but a major point here is that I have two traits that pull in almost opposite directions.

Despite much work, I only make slow progress. My desire to make things neat and proper almost cancels out my tendency to go against the norms. I tend to automatically toe whatever line exists until the cognitive dissonance becomes so great that I can no longer ignore it.

I then write an article for the blog to clarify my thoughts.

You may read what comes next and ask, what took you so long?!

I can only refer to the above. I may look calm on the surface, but underneath I'm paddling like the dickens. Despite much work, though, only limited progress is visible.

Nudge #

Haskell is a statically typed language with the most powerful type system I know my way around. The types carry so much information that one can often infer a function's contract from the type alone. This is also fortunate, since many Haskell libraries tend to have, shall we say, minimal documentation. Even so, I've often found myself able to figure out how to use an unfamiliar Haskell API by examining the various types that a library exports.

In fact, the type system is so powerful that it drives a specialized search engine. If you need a function with the type (String -> IO Int) -> [String] -> IO [Int] you can search for it. Hoogle will list all functions that match that type, including functions that are more abstract than your specialized need. You don't even have to imagine what the name might be.
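To illustrate, base's traverse has a more abstract type that specializes to exactly that signature, so the following binding type-checks. The name readAll is hypothetical, and mapM would work as well.

```haskell
-- traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
-- specialized to t = [], f = IO, a = String, b = Int.
readAll :: (String -> IO Int) -> [String] -> IO [Int]
readAll = traverse
```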

Since the type system is so powerful, it's a major means of communication. Thus, it makes sense that GHC regularly issues a warning if a function lacks a type annotation.

While the compiler enables you to control which warnings are turned on, the missing-signatures warning is included in the popular -Wall flag that most people, I take it, use. I do, at least.

If you forget to declare the type of a function, the compiler will complain:

src\SecurityManager.hs:15:1: warning: [GHC-38417] [-Wmissing-signatures]
    Top-level binding with no type signature:
      createUser :: (Monad m, Text.Printf.PrintfArg b,
                     Text.Printf.PrintfArg (t a), Foldable t, Eq (t a)) =>
                    (String -> m ()) -> m (t a) -> (t a -> b) -> m ()
   |
15 | createUser writeLine readLine encrypt = do
   | ^^^^^^^^^^

This is a strong nudge that you're supposed to give each function a type declaration, so I've been doing that for years. Neat and proper.

Of course, if you treat warnings as errors, as I recommend, the nudge becomes a law.

Learning from F# #

While I try to adopt the style and idioms of any language I work in, it's always annoyed me that I had to add a type annotation to a Haskell function. After all, the compiler can usually infer the type. Frankly, adding a type signature feels like redundant ceremony. It's like having to declare a function in a header file before being able to implement it in another file.

This particularly bothers me because I've long since abandoned type annotations in F#. As far as I can tell, most of the F# community has, too.

When you implement an F# function, you just write the implementation and let the compiler infer the type. (Code example from Zone of Ceremony.)

let inline consume quantity =
    let go (acc, xs) x =
        if quantity <= acc
        then (acc, Seq.append xs (Seq.singleton x))
        else (acc + x, xs)
    Seq.fold go (LanguagePrimitives.GenericZero, Seq.empty) >> snd

Since F# often has to interact with .NET code written in C#, you regularly have to add some type annotations to help the compiler along:

let average (timeSpans : NonEmpty<TimeSpan>) =
    [ timeSpans.Head ] @ List.ofSeq timeSpans.Tail
    |> List.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks

Even so, I follow the rule of minimal annotations: Only add the type information required to compile, and let the compiler infer the rest. For example, the above average function has the inferred type NonEmpty<TimeSpan> -> TimeSpan. While I had to specify the input type in order to be able to use the Ticks property, I didn't have to specify the return type. So I didn't.

My impression from reading other people's F# code is that this is a common, albeit not universal, approach to type annotation.

This minimizes ceremony, since you only need to declare and maintain the types that the compiler can't infer. There's no reason to repeat the work that the compiler can already do, and in practice, if you do, it just gets in the way.

Motivation for explicit type definitions #

When I extol the merits of static types, proponents of dynamically typed languages often argue that the types are in the way. Granted, this is a discussion that I still struggle with, but based on my understanding of the argument, it seems entirely reasonable. After all, if you have to spend time declaring the type of each and every parameter, as well as a function's return type, it does seem to be in the way. This is only exacerbated if you later change your mind.

Programming is, to a large extent, an explorative activity. You start with one notion of how your code should be structured, but as you progress, you learn. You'll often have to go back and change existing code. This, as far as I can tell, is much easier in, say, Python or Clojure than in C# or Java.

If, however, one extrapolates from the experience with Java or C# to all statically typed languages, that would be a logical fallacy. My point with Zone of Ceremony was exactly that there's a group of languages 'to the right' of high-ceremony languages with low levels of ceremony. Even though they're statically typed.

I have to admit, however, that in that article I cheated a little in order to drive home a point. While you can write Haskell code in a low-ceremony style, the tooling (in the form of the -Wall warning set, at least) encourages a high-ceremony style. Add those type definitions, even though they're redundant.

It's not that I don't understand some of the underlying motivation behind that rule. Daniel Wagner enumerated several reasons in a 2013 Stack Overflow answer. Some of the reasons still apply, but on the other hand, the world has also moved on in the intervening decade.

To be honest, the Haskell IDE situation has always been precarious. One day, it works really well; the next day, I struggle with it. Over the years, though, things have improved.

There was a time when an explicit type definition was an indisputable help, because you couldn't rely on tools to light up and tell you what the inferred type was.

Today, on the other hand, the Haskell extension for Visual Studio Code automatically displays the inferred type above a function implementation:

To be clear, the top line that shows the type definition is not part of the source code. It's just shown by Visual Studio Code as a code lens (I think it's called), and it automatically changes if I edit the code in such a way that the type changes.

If you can rely on such automatic type information, it seems that an explicit type declaration is less useful. It's at least one less reason to add type annotations to the source code.

Ceremony example #

In order to explain what I mean by the types being in the way, I'll give an example. Consider the code example from the article Legacy Security Manager in Haskell. In it, I described how every time I made a change to the createUser action, I had to effectively remove and re-add the type declaration.

It doesn't have to be like that. If instead I'd started without type annotations, I could have moved forward without being slowed down by having to edit type definitions. Take the first edit, breaking the dependency on the console, as an example. Without type annotations, the createUser action would look exactly as before, just without the type declaration. Its type would still be IO ().

After the first edit, the first lines of the action now look like this:

createUser writeLine readLine = do
  () <- writeLine "Enter a username"
  -- ...

Even without a type definition, the action still has a type. The compiler infers it to be (Monad m, Eq a, IsChar a) => (String -> m ()) -> m [a] -> m (), which is certainly a bit of a mouthful, but exactly what I had explicitly added in the other article.

The code doesn't compile until I also change the main method to pass the new parameters:

main = createUser putStrLn getLine

You'd have to make a similar edit in, say, Python, although there'd be no compiler to remind you. My point isn't that this is better than a dynamically typed language, but rather that it's on par. The types aren't in the way.
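Here's a minimal, hypothetical sketch in the same style; greetUser, fakeWriteLine, and fakeReadLine are made-up names, not code from the other article. It's an action without any type annotation that receives its I/O as arguments. As a test double, it relies on base giving pairs a Monad instance when the first component is a Monoid, so each 'written' line accumulates in a list.

```haskell
{-# OPTIONS_GHC -Wno-missing-signatures #-}

-- No type annotation: GHC should infer
-- Monad m => (String -> m ()) -> m [Char] -> m ()
greetUser writeLine readLine = do
  () <- writeLine "Enter a username"
  username <- readLine
  writeLine ("Hello, " ++ username)

-- Test doubles in the ([String], a) writer-like monad: writes are
-- collected in the list, and reads always return the same canned name.
fakeWriteLine s = ([s], ())
fakeReadLine = ([], "ploeh")
```

Evaluating greetUser fakeWriteLine fakeReadLine gives (["Enter a username","Hello, ploeh"],()). Production code would pass putStrLn and getLine instead, and no type definition has to change either way.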

We see the similar lack of required ceremony when the createUser action finally pulls in the comparePasswords and validatePassword functions:

createUser writeLine readLine encrypt = do
  () <- writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
 
  writeLine $ either
    id
    (printf "Saving Details for User (%s, %s, %s)" username fullName . encrypt)
    (validatePassword =<< comparePasswords password confirmPassword)

Again, there's no type annotation, and while the type actually does change to

(Monad m, PrintfArg b, PrintfArg (t a), Foldable t, Eq (t a)) =>
(String -> m ()) -> m (t a) -> (t a -> b) -> m ()

it impacts none of the existing code. Again, the types aren't in the way, and no ceremony is required.

Compare that inferred type signature with the explicit final type annotation in the previous article. The inferred type is much more abstract and permissive than the explicit declaration, although I also grant that Daniel Wagner had a point that you can make explicit type definitions more reader-friendly.

Flies in the ointment #

Do the inferred types communicate intent? That's debatable. For example, it's not immediately clear that the above t a allows String.

Another thing that annoys me is that I had to add that unit binding on the first line:

createUser writeLine readLine encrypt = do
  () <- writeLine "Enter a username"
  -- ...

The reason for that is that if I don't do that (that is, if I just write writeLine "Xyz" all the way), the compiler infers the type of writeLine to be String -> m b2, rather than just String -> m (). In effect, I want b2 ~ (), but because the compiler thinks that b2 may be anything, it issues an unused-do-bind warning.

The idiomatic way to resolve that situation is to add a type definition, but that's the situation I'm trying to avoid. Thus, my desire to do without annotations pushes me to write unnatural implementation code. This reminds me of the notion of test-induced damage. This is at best a disagreeable compromise.

It also annoys me that implementation details leak out to the inferred type, witnessed by the PrintfArg type constraint. What happens if I change the implementation to use list concatenation?

createUser writeLine readLine encrypt = do
  () <- writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
 
  let createMsg pwd =
        "Saving Details for User (" ++ username ++ ", " ++ fullName ++ ", " ++ pwd ++ ")"
  writeLine $ either
    id
    (createMsg . encrypt)
    (validatePassword =<< comparePasswords password confirmPassword)

If I do that, the type also changes:

Monad m => (String -> m ()) -> m [Char] -> ([Char] -> [Char]) -> m ()

While we get rid of the PrintfArg type constraint, the type becomes otherwise more concrete, now operating on String values (keeping in mind that String is a type synonym for [Char]).

The code still compiles, and all tests still pass, because the abstraction I've had in mind all along is essentially this last type.

The writeLine action should take a String and have some side effect, but return no data. The type String -> m () nicely models that, striking a fine balance between being sufficiently concrete to capture intent, but still abstract enough to be testable.

The readLine action should provide input String values, and again m String nicely models that concern.

Finally, encrypt is indeed a naked String endomorphism: String -> String.

With my decades of experience with object-oriented design, it still strikes me as odd that implementation details can make a type more abstract, but once you think it over, it may be okay.

More liberal abstractions #

The inferred types are consistently more liberal than the abstraction I have in mind, which is

Monad m => (String -> m ()) -> m String -> (String -> String) -> m ()

In all cases, the inferred types include that type as a subset.

I hope that I've created the above diagram so that it makes sense, but the point I'm trying to get across is that the two type definitions in the lower middle are equivalent, and are the most specific types. That's the intended abstraction. Thinking of types as sets, all the other inferred types are supersets of that type, in various ways. Even though implementation details leak out in the shape of PrintfArg and IsChar, these are effectively larger sets.

This takes some getting used to: The implementation details are more liberal than the abstraction. This seems to be at odds with the Dependency Inversion Principle (DIP), which suggests that abstractions shouldn't depend on implementation details. I'm not yet sure what to make of this, but I suspect that this is more of a problem of overlapping linguistic semantics than of software design. What I mean is that I have a feeling that 'implementation detail' has more than one meaning. At least, in the perspective of the DIP, an implementation detail limits your options. For example, depending on a particular database technology is more constraining than depending on some abstract notion of what the persistence mechanism might be. Contrast this with an implementation detail such as the PrintfArg type constraint. It doesn't narrow your options; on the contrary, it makes the implementation more liberal.

Still, while an implementation should be liberal in what it accepts, it's probably not a good idea to publish such a capability to the wider world. After all, if you do, someone will eventually rely on it.
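One way to reconcile the two is to let an internal helper keep its liberal, inferred type, and give only the published function an explicit, narrower annotation. The following sketch uses hypothetical names (formatDetails, saveMessage) and only Text.Printf from base.

```haskell
{-# OPTIONS_GHC -Wno-missing-signatures #-}
import Text.Printf (printf)

-- Internal helper without a type annotation. GHC infers a liberal type
-- constrained by PrintfArg (for both arguments) and PrintfType (for the result).
formatDetails username fullName =
  printf "Saving Details for User (%s, %s)" username fullName

-- Published function: the explicit annotation narrows the inferred type
-- to the intended abstraction, so callers can't come to rely on the extra generality.
saveMessage :: String -> String -> String
saveMessage = formatDetails
```

Evaluating saveMessage "ploeh" "Mark" gives "Saving Details for User (ploeh, Mark)".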

For internal use only #

Going through all these considerations, I think I'll revise my position as the following.

I'll forgo type annotations as long as I explore a problem space. For internal application use, this may effectively mean forever, in the sense that how you compose an application from smaller building blocks is likely to be in permanent flux. Here I have in mind your average web asset or other public-facing service that's in constant development. You keep adding new features, or changing domain logic as the overall business evolves.

As I've also recently discussed, Haskell is a great scripting language, and I think that here, too, I'll dial down the type definitions.

If I ever do another Advent of Code in Haskell, I think I'll also eschew explicit type annotations.

On the other hand, I can see that once an API stabilizes, you may want to lock it down. This may also apply to internal abstractions if you're working in a team and you explicitly want to communicate what a contract is.

If the code is a reusable library, I think that explicit type definitions are still required. Both for the reasons outlined by Daniel Wagner, and also to avoid being the victim of Hyrum's law.

That's why I phrase this pendulum swing as a new default. I'll begin programming without type definitions, but add them as needed. The point is rather that there may be parts of a code base where they're never needed, and then it's okay to keep going without them.

You can use a language pragma to opt out of the missing-signatures compiler warning on a module-by-module basis:

{-# OPTIONS_GHC -Wno-missing-signatures #-}

This will enable me to rely on type inference in parts of the code base, while keeping the build clean of compiler warnings.

Conclusion #

I've always appreciated the F# compiler's ability to infer types and just let type changes automatically ripple through the code base. For that reason, the Haskell norm of explicitly adding a (redundant) type annotation has always vexed me.

It often takes me a long time to reach seemingly obvious conclusions, such as: Don't always add type definitions to Haskell functions. Let the type inference engine do its job.

The reason it takes me so long to take such a small step is that I want to follow 'best practice'; I want to write idiomatic code. When the standard compiler-warning set complains about missing type definitions, it takes me significant deliberation to discard such advice. I could imagine other programmers being in the same situation, which is one reason I wrote this article.

The point isn't that type definitions are a universally bad idea. They aren't. Rather, the point is only that it's also okay to do without them in parts of a code base. Perhaps only temporarily, but in some cases maybe permanently.

The missing-signatures warning shouldn't, I now believe, be considered an absolute law, but rather a contextual rule.


Functor compositions

2024-10-28T06:58:00+00:00

A functor nested within another functor forms a functor. With examples in C# and another language.

This article is part of a series of articles about functor relationships. In this one you'll learn about a universal composition of functors. In short, if you have one functor nested within another functor, then this composition itself gives rise to a functor.

Together with other articles in this series, this result can help you answer questions such as: Does this data structure form a functor?

Since functors tend to be quite common, and since they're useful enough that many programming languages have special support or syntax for them, the ability to recognize a potential functor can be useful. Given a type like Foo<T> (C# syntax) or Bar<T1, T2>, being able to recognize it as a functor can come in handy. One scenario is if you yourself have just defined this data type. Recognizing that it's a functor strongly suggests that you should give it a Select method in C#, a map function in F#, and so on.

Not all generic types give rise to a (covariant) functor. Some are rather contravariant functors, and some are invariant.

If, on the other hand, you have a data type where one functor is nested within another functor, then the data type itself gives rise to a functor. You'll see some examples in this article.

Abstract shape #

Before we look at some examples found in other code, it helps if we know what we're looking for. Imagine that you have two functors F and G, and you're now considering a data structure that contains a value where G is nested inside of F.

public sealed class GInF<T>
{
    private readonly F<G<T>> ginf;
 
    public GInF(F<G<T>> ginf)
    {
        this.ginf = ginf;
    }
 
    // Methods go here...

The GInF<T> class has a single class field. The type of this field is an F container, but 'inside' F there's a G functor.

This kind of data structure gives rise to a functor. Knowing that, you can give it a Select method:

public GInF<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new GInF<TResult>(ginf.Select(g => g.Select(selector)));
}

The composed Select method calls Select on the F functor, passing it a lambda expression that calls Select on the G functor. That nested Select call produces an F<G<TResult>> that the composed Select method finally wraps in a new GInF<TResult> object that it returns.
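In Haskell, the same shape can be written as a newtype over any two functors. This is only an illustration with a hypothetical name. Base already ships the same idea as Compose in Data.Functor.Compose.

```haskell
-- Some functor g nested inside some functor f, like GInF<T>.
newtype GInF f g a = GInF { getGInF :: f (g a) }

-- Map the outer functor with a function that maps the inner functor.
instance (Functor f, Functor g) => Functor (GInF f g) where
  fmap h (GInF x) = GInF (fmap (fmap h) x)
```

For instance, mapping (+ 1) over GInF [Just 1, Nothing, Just 3] yields a GInF wrapping [Just 2, Nothing, Just 4].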

I'll have more to say about how this generalizes to a nested composition of more than two functors, but first, let's consider some examples.

Priority list #

A common configuration is when the 'outer' functor is a collection, and the 'inner' functor is some other kind of container. The article An immutable priority collection shows a straightforward example. The PriorityCollection<T> class composes a single class field:

private readonly Prioritized<T>[] priorities;

The priorities field is an array (a collection) of Prioritized<T> objects. That type is a simple record type:

public sealed record Prioritized<T>(T Item, byte Priority);

If we squint a little and consider only the parameter list, we may realize that this is fundamentally an 'embellished' tuple: (T Item, byte Priority). A pair forms a bifunctor, but in the Haskell Prelude a tuple is also a Functor instance over its rightmost element. In other words, if we'd swapped the Prioritized<T> constructor parameters, it might have naturally looked like something we could fmap:

ghci> fmap (elem 'r') (55, "foo")
(55,False)

Here we have a tuple of an integer and a string. Imagine that the number 55 is the priority that we give to the label "foo". This little ad-hoc example demonstrates how to map that tuple to another tuple with a priority, but now it instead holds a Boolean value indicating whether or not the string contained the character 'r' (which it didn't).

You can easily swap the elements:

ghci> import Data.Tuple
ghci> swap (55, "foo")
("foo",55)

This looks just like the Prioritized<T> parameter list. This also implies that if you originally have the parameter list in that order, you could swap it, map it, and swap it again:

ghci> swap $ fmap (elem 'r') $ swap ("foo", 55)
(False,55)

My point is only that Prioritized<T> is isomorphic to a known functor. In reality you rarely need to analyze things that thoroughly to come to that realization, but the bottom line is that you can give Prioritized<T> a lawful Select method:

public sealed record Prioritized<T>(T Item, byte Priority)
{
    public Prioritized<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new(selector(Item), Priority);
    }
}

Hardly surprising, but since this article postulates that a functor of a functor is a functor, and since we already know that collections give rise to a functor, we should deduce that we can give PriorityCollection<T> a Select method. And we can:

public PriorityCollection<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new PriorityCollection<TResult>(
        priorities.Select(p => p.Select(selector)).ToArray());
}

Notice how much this implementation looks like the above GInF<T> 'shape' implementation.
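For comparison, here's a Haskell sketch of the same two types. The names mirror the C# ones, but the code is only an illustration, and it doesn't attempt to model any invariants the original class may enforce. Prioritized derives its Functor instance, whereas PriorityCollection spells out the fmap-within-fmap composition explicitly.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Word (Word8)

-- Like Prioritized<T>: the derived fmap maps the item and keeps the priority.
data Prioritized a = Prioritized { item :: a, priority :: Word8 }
  deriving (Eq, Show, Functor)

-- A collection (outer functor) of Prioritized values (inner functor).
newtype PriorityCollection a = PriorityCollection [Prioritized a]
  deriving (Eq, Show)

instance Functor PriorityCollection where
  fmap f (PriorityCollection ps) = PriorityCollection (fmap (fmap f) ps)
```

Mapping elem 'r' over a collection holding "foo" (priority 55) and "bar" (priority 10) produces False and True, with the priorities unchanged.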

Tree #

An example only marginally more complicated than the above is shown in A Tree functor. The Tree<T> class shown in that article contains two constituents:

private readonly IReadOnlyCollection<Tree<T>> children;
 
public T Item { get; }

Just like PriorityCollection<T> there's a collection, as well as a 'naked' T value. The main difference is that here, the collection is of the same type as the object itself: Tree<T>.

You've seen a similar example in the previous article, which also had a recursive data structure. If you assume, however, that Tree<T> gives rise to a functor, then so does the nested composition of putting it in a collection. This means, from the 'theorem' put forth in this article, that IReadOnlyCollection<Tree<T>> composes as a functor. Finally you have a product of a T (which is isomorphic to the Identity functor) and that composed functor. From Functor products it follows that that's a functor too, which explains why Tree<T> forms a functor. The article shows the Select implementation.
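A Haskell sketch of a comparable rose tree makes the reasoning visible in the fmap definition. The type is hypothetical, though similar in shape to Data.Tree in the containers package. The item is mapped directly, which is the Identity part, and the children are mapped with fmap within fmap, which is the list-of-trees composition.

```haskell
-- An item plus a collection of child trees.
data Tree a = Node a [Tree a]
  deriving (Eq, Show)

instance Functor Tree where
  -- Product of the 'naked' item and the composed list-of-trees functor.
  fmap f (Node x children) = Node (f x) (fmap (fmap f) children)
```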

Binary tree Zipper #

In both previous articles you've seen pieces of the puzzle explaining why the binary tree Zipper gives rise to a functor. There's one missing piece, however, that we can now finally address.

Recall that BinaryTreeZipper<T> composes these two objects:

public BinaryTree<T> Tree { get; }
public IEnumerable<Crumb<T>> Breadcrumbs { get; }

We've already established that both BinaryTree<T> and Crumb<T> form functors. In this article you've learned that a functor in a functor is a functor, which applies to IEnumerable<Crumb<T>>. Both of the above read-only properties are functors, then, which means that the entire class is a product of functors. The Select method follows:

public BinaryTreeZipper<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new BinaryTreeZipper<TResult>(
        Tree.Select(selector),
        Breadcrumbs.Select(c => c.Select(selector)));
}

Notice that this Select implementation calls Select on the 'outer' Breadcrumbs, passing it a lambda expression that calls Select on each Crumb<T>. This is similar to the previous examples in this article.
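The same product-of-functors structure can be sketched in Haskell. The BinaryTree and Crumb types below are assumptions, loosely modelled on the well-known Learn You a Haskell zipper, so the C# BinaryTree<T> and Crumb<T> may differ in detail.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Assumed shapes for the tree and the breadcrumbs.
data BinaryTree a = Empty | Node a (BinaryTree a) (BinaryTree a)
  deriving (Eq, Show, Functor)

data Crumb a = LeftCrumb a (BinaryTree a) | RightCrumb a (BinaryTree a)
  deriving (Eq, Show, Functor)

-- A product of two functors: a tree, and a list (functor) of crumbs (functor).
data Zipper a = Zipper (BinaryTree a) [Crumb a]
  deriving (Eq, Show)

instance Functor Zipper where
  fmap f (Zipper t bs) = Zipper (fmap f t) (fmap (fmap f) bs)
```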

Other nested containers #

There are plenty of other examples of functors that contain other functor values. Asynchronous programming supplies its own family of examples.

The way that C# and many other languages model asynchronous or I/O-bound actions is to wrap them in a Task container. If the value inside the Task<T> container is itself a functor, you can make that a functor, too. Examples include Task<IEnumerable<T>>, Task<Maybe<T>> (or its close cousin Task<T?>; notice the question mark), Task<Result<T1, T2>>, etc. You'll run into such types every time you have an I/O-bound or concurrent operation that returns IEnumerable<T>, Maybe<T> etc. as an asynchronous result.

While you can make such a nested task functor a functor in its own right, you rarely need that in languages with native async and await features, since those languages nudge you in other directions.

You can, however, run into other issues with task-based programming, but you'll see examples and solutions in a future article.

You'll run into other examples of nested containers with many property-based testing libraries. They typically define Test Data Generators, often called Gen<T>. For .NET, FsCheck, Hedgehog, and CsCheck all do this. For Haskell, QuickCheck, too, defines Gen a.

You often need to generate random collections, in which case you'd work with Gen<IEnumerable<T>> or a similar collection type. If you need random Maybe values, you'll work with Gen<Maybe<T>>, and so on.

On the other hand, sometimes you need to work with a collection of generators, such as seq<Gen<'a>>.

These are all examples of functors within functors. It's not a given that you must treat such a combination as a functor in its own right. To be honest, typically, you don't. On the other hand, if you find yourself writing Select within Select, or map within map, depending on your language, it might make your code more succinct and readable if you give that combination a specialized functor affordance.
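What such a 'specialized functor affordance' might look like in Haskell: in this minimal sketch, a hypothetical newtype, Lookups, wraps a list of Maybe values and gives the combination its own Functor instance, so that callers write a single fmap instead of map within map.

```haskell
-- Hypothetical wrapper around a list of optional results.
newtype Lookups a = Lookups [Maybe a]
  deriving (Show, Eq)

instance Functor Lookups where
  -- One fmap for the list, one for each Maybe inside it.
  fmap f (Lookups xs) = Lookups (fmap (fmap f) xs)

lookups :: Lookups String
lookups = Lookups [Just "foo", Nothing, Just "ba"]
```

With this instance, fmap length lookups maps each present string to its length and leaves Nothing alone.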

Higher arities #

Like the previous two articles, the 'theorem' presented here generalizes to more than two functors. If you have a third H functor, then F<G<H<T>>> also gives rise to a functor. You can easily prove this by simple induction. We may first consider the base case. With a single functor (n = 1) any functor (say, F) is trivially a functor.

In the induction step (n > 1), you then assume that the n - 1 'stack' of functors already gives rise to a functor, and then proceed to prove that the configuration where all those nested functors are wrapped by yet another functor also forms a functor. Since the 'inner stack' of functors forms a functor (by assumption), you only need to prove that a configuration of the outer functor, and that 'inner stack', gives rise to a functor. You've seen how this works in this article, but I admit that a few examples constitute no proof. I'll leave you with only a sketch of this step, but you may consider using equational reasoning as demonstrated by Bartosz Milewski and then prove the functor laws for such a composition.
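The induction step can be illustrated, though not proven, by composing fmap with itself: each additional fmap reaches one layer deeper into the 'stack'. The sketch below (nested and fmap3 are illustrative names) spot-checks both functor laws on one sample value of three nested functors. A sample is evidence, not a proof.

```haskell
-- Three nested functors: a list of Maybe of Either String.
nested :: [Maybe (Either String Int)]
nested = [Just (Right 1), Nothing, Just (Left "error"), Just (Right 3)]

-- Each fmap peels off one layer of the 'stack'.
fmap3 :: (Functor f, Functor g, Functor h) => (a -> b) -> f (g (h a)) -> f (g (h b))
fmap3 = fmap . fmap . fmap

-- Identity law: mapping id changes nothing.
identityHolds :: Bool
identityHolds = fmap3 id nested == nested

-- Composition law: mapping (show . (+ 1)) equals mapping (+ 1), then show.
compositionHolds :: Bool
compositionHolds = fmap3 (show . (+ 1)) nested == (fmap3 show . fmap3 (+ 1)) nested
```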

The Haskell Data.Functor.Compose module defines a general-purpose data type to compose functors. You may, for example, compose a tuple inside a Maybe inside a list:

import Data.Functor.Compose (Compose (..))
thriceNested :: Compose [] (Compose Maybe ((,) Integer)) String
thriceNested = Compose [Compose (Just (42, "foo")), Compose Nothing, Compose (Just (89, "ba"))]

You can easily fmap that data structure, for example by evaluating whether the number of characters in each string is an odd number (if it's there at all):

ghci> fmap (odd . length) thriceNested
Compose [Compose (Just (42,True)),Compose Nothing,Compose (Just (89,False))]

The first element now has True as the second tuple element, since "foo" has an odd number of characters (3). The next element is Nothing, because Nothing maps to Nothing. The third element has False in the rightmost tuple element, since "ba" doesn't have an odd number of characters (it has 2).

Relations to monads #

A nested 'stack' of functors may remind you of the way that I prefer to teach monads: A monad is a functor you can flatten. In short, the definition is the ability to 'flatten' F<F<T>> to F<T>. A function that can do that is often called join or Flatten.

So far in this article, we've been looking at stacks of different functors, abstractly denoted F<G<T>>. There's no rule, however, that says that F and G may not be the same. If F = G then F<G<T>> is really F<F<T>>. This starts to look like the antecedent of the monad definition.

While the starting point may be the same, these notions are not equivalent. Whether or not you can flatten F<F<T>> (which is what would make F a monad), it universally gives rise to a functor. On the other hand, we can hardly talk about flattening F<G<T>>, because that would imply that you'd have to somehow 'throw away' either F or G. There may be specific functors (e.g. Identity) for which this is possible, but there's no universal law to that effect.
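A small Haskell illustration of the difference: join from Control.Monad flattens a list of lists, or a Maybe of a Maybe, because those are F<F<T>> for a monad F. A list of Maybe values is a perfectly good nested functor, but 'flattening' it means picking a policy. catMaybes from Data.Maybe is one such list-specific choice that throws away the Maybe layer; it isn't a universal law.

```haskell
import Control.Monad (join)
import Data.Maybe (catMaybes)

-- F<F<T>> where F is a monad: join flattens it.
flatList :: [Int]
flatList = join [[1, 2], [3]]

flatMaybe :: Maybe Int
flatMaybe = join (Just (Just 42))

-- F<G<T>> with F different from G: still a functor (fmap within fmap)...
mixed :: [Maybe Int]
mixed = fmap (fmap (+ 1)) [Just 1, Nothing, Just 3]

-- ...but reducing it to [Int] requires a specific choice,
-- here discarding the Nothing values.
flattenedMixed :: [Int]
flattenedMixed = catMaybes mixed
```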

Not all 'stacks' of functors are monads. All monads, on the other hand, are functors.

Conclusion #

A data structure that configures one type of functor inside of another functor itself forms a functor. The examples shown in this article are mostly constrained to two functors, but if you have a 'stack' of three, four, or more functors, that arrangement still gives rise to a functor.

This is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such stacks into functors, in the way that Haskell's Compose container almost does. Thus, knowing when you can safely give your generic types a Select method or map function may come in handy.

To be honest, though, this result is hardly the most important 'theorem' concerning stacks of functors. In reality, you often run into situations where you do have a stack of functors, but they're in the wrong order. You may have a collection of asynchronous tasks, but you really need an asynchronous task that contains a collection of values. The next article addresses that problem.

Next: Traversals.

This blog is totally free, but if you like it, please consider supporting it.

Legacy Security Manager in Haskell

2024-10-21T06:14:00+00:00

A translation of the kata, and my first attempt at it.

In early 2013 Richard Dalton published an article about legacy code katas. The idea is to present a piece of 'legacy code' that you have to somehow refactor or improve. Of course, in order to make the exercise manageable, it's necessary to reduce it to some essence of what we might regard as legacy code. It'll only be one aspect of true legacy code. For the legacy Security Manager exercise, the main problem is that the code is difficult to unit test.

The original kata presents the 'legacy code' in C#, which may exclude programmers who aren't familiar with that language and platform. Since I find the exercise useful, I've previously published a port to Python. In this article, I'll port the exercise to Haskell, as well as walk through one attempt at achieving the goals of the kata.

The legacy code #

The original C# code is a static procedure that uses the Console API to ask a user a few simple questions, do some basic input validation, and print a message to the standard output stream. That's easy enough to port to Haskell:

module SecurityManager (createUser) where
 
import Text.Printf (printf)
 
createUser :: IO ()
createUser = do
  putStrLn "Enter a username"
  username <- getLine
  putStrLn "Enter your full name"
  fullName <- getLine
  putStrLn "Enter your password"
  password <- getLine
  putStrLn "Re-enter your password"
  confirmPassword <- getLine
 
  if password /= confirmPassword
  then
    putStrLn "The passwords don't match"
  else
    if length password < 8
    then
      putStrLn "Password must be at least 8 characters in length"
    else do
      -- Encrypt the password (just reverse it, should be secure)
      let array = reverse password
      putStrLn $
        printf "Saving Details for User (%s, %s, %s)" username fullName array

Notice how the Haskell code seems to suffer slightly from the Arrow code smell, which is a problem that the C# code actually doesn't exhibit. The reason is that when using Haskell in an 'imperative style' (which you can, after a fashion, with do notation), you can't 'exit early' from an if check. The problem is that you can't have if-then without else.

Haskell has other language features that enable you to get rid of Arrow code, but in the spirit of the exercise, this would take us too far away from the original C# code. Making the code prettier should be a task for the refactoring exercise, rather than the starting point.

I've published the code to GitHub, if you want a leg up.

Combined with Richard Dalton's original article, that's all you need to try your hand at the exercise. In the rest of this article, I'll go through my own attempt at the exercise. That said, while this was my first attempt at the Haskell version of it, I've done it multiple times in C#, and once in Python. In other words, this isn't my first rodeo.

Break the dependency on the Console #

As warned, the rest of the article is a walkthrough of the exercise, so if you'd like to try it yourself, stop reading now. On the other hand, if you want to read on, but follow along in the GitHub repository, I've pushed the rest of the code to a branch called first-pass.

The first part of the exercise is to break the dependency on the console. In a language like Haskell where functions are first-class citizens, this part is trivial. I removed the type declaration, moved putStrLn and getLine to parameters and renamed them. Finally, I asked the compiler what the new type is, and added the new type signature.

import Text.Printf (printf, IsChar)
 
createUser :: (Monad m, Eq a, IsChar a) => (String -> m ()) -> m [a] -> m ()
createUser writeLine readLine = do
  writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
 
  if password /= confirmPassword
  then
    writeLine "The passwords don't match"
  else
    if length password < 8
    then
      writeLine "Password must be at least 8 characters in length"
    else do
      -- Encrypt the password (just reverse it, should be secure)
      let array = reverse password
      writeLine $
        printf "Saving Details for User (%s, %s, %s)" username fullName array

I also changed the main action of the program to pass putStrLn and getLine as arguments:

import SecurityManager (createUser)
 
main :: IO ()
main = createUser putStrLn getLine

Manual testing indicates that I didn't break any functionality.

Get the password comparison feature under test #

The next task is to get the password comparison feature under test. Over a small series of Git commits, I added these inlined, parametrized HUnit tests:

"Matching passwords" ~: do
  pw <- ["password", "12345678", "abcdefgh"]
  let actual = comparePasswords pw pw
  return $ Right pw ~=? actual
,
"Non-matching passwords" ~: do
  (pw1, pw2) <-
    [
      ("password", "PASSWORD"),
      ("12345678", "12345677"),
      ("abcdefgh", "bacdefgh"),
      ("aaa", "bbb")
    ]
  let actual = comparePasswords pw1 pw2
  return $ Left "The passwords don't match" ~=? actual

The resulting implementation is this comparePasswords function:

comparePasswords :: String -> String -> Either String String
comparePasswords pw1 pw2 =
  if pw1 == pw2
  then Right pw1
  else Left "The passwords don't match"

You'll notice that I chose to implement it as an Either-valued function. While I consider validation a solved problem, the usual solution involves some applicative validation container. In this exercise, validation is already short-circuiting, which means that we can use the standard monadic composition that Either affords.

At this point in the exercise, I just left the comparePasswords function there, without trying to use it within createUser. The reason for that is that Either-based composition is sufficiently different from if-then-else code that I wanted to get the entire system under test before I attempted that.

Get the password validation feature under test #

The third task of the exercise is to get the password validation feature under test. That's similar to the previous task. Once more, I'll show the tests first, and then the function driven by those tests, but I want to point out that both code artefacts came iteratively into existence through the usual red-green-refactor cycle.

"Validate short password" ~: do
  pw <- ["", "1", "12", "abc", "1234", "gtrex", "123456", "1234567"]
  let actual = validatePassword pw
  return $ Left "Password must be at least 8 characters in length" ~=? actual
,
"Validate long password" ~: do
  pw <- ["12345678", "123456789", "abcdefghij", "elevenchars"]
  let actual = validatePassword pw
  return $ Right pw ~=? actual

The resulting function is hardly surprising.

validatePassword :: String -> Either String String
validatePassword pw =
  if length pw < 8
  then Left "Password must be at least 8 characters in length"
  else Right pw

As in the previous step, I chose to postpone using this function from within createUser until I had a set of characterization tests. That may not be entirely in the spirit of the four subtasks of the exercise, but on the other hand, I intended to do more than just those four activities. The code here is actually simple enough that I could easily refactor without full test coverage, but recalling that this is a legacy code exercise, I find it warranted to pretend that it's complicated.

To be fair to the exercise, there'd also be value in attempting to extract each feature piecemeal, because it's not always possible to add complete characterization test coverage to a piece of gnarly legacy code. Be that as it may, I've already done that kind of exercise in C# a few times, and I had a different agenda for the Haskell exercise. In short, I was curious about what sort of inferred type createUser would have, once I'd gone through all four subtasks. I'll return to that topic in a moment. First, I want to address the fourth subtask.

Allow different encryption algorithms to be used #

The final part of the exercise is to add a feature to allow different encryption algorithms to be used. Once again, when you're working in a language where functions are first-class citizens, and higher-order functions are idiomatic, one solution is easily at hand:

createUser :: (Monad m, Foldable t, Eq (t a), PrintfArg (t a), PrintfArg b)
           => (String -> m ()) -> m (t a) -> (t a -> b) -> m ()
createUser writeLine readLine encrypt = do
  writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
 
  if password /= confirmPassword
  then
    writeLine "The passwords don't match"
  else
    if length password < 8
    then
      writeLine "Password must be at least 8 characters in length"
    else do
      let array = encrypt password
      writeLine $
        printf "Saving Details for User (%s, %s, %s)" username fullName array

The only change I've made is to promote encrypt to a parameter. This, of course, ripples through the code that calls the action, but currently, that's only the main action, where I had to add reverse as a third argument:

main :: IO ()
main = createUser putStrLn getLine reverse

Before I made the change, I removed the type annotation from createUser, because adding a parameter causes the type to change. Keeping the type annotation would have caused a compilation error. Eschewing type annotations makes it easier to make changes. Once I'd made the change, I added the new annotation, inferred by the Haskell Visual Studio Code extension.

I was curious what kind of abstraction would arise. Would it be testable in some way?

Testability #

Consider the inferred type of createUser above. It's quite abstract, and I was curious if it was flexible enough to allow testability without adding test-induced damage. In short, in object-oriented programming, you often need to add Dependency Injection to make code testable, and the valid criticism is that this makes code more complicated than it would otherwise have been. I consider such reproval justified, although I disagree with the conclusion. It's not the desire for testability that causes the damage, but rather that object-oriented design is at odds with testability.

That's my conjecture, anyway, so I'm always curious when working with other paradigms like functional programming. Is idiomatic code already testable, or do you need to 'do damage to it' in order to make it testable?

As a Haskell action goes, I would consider its type fairly idiomatic. The code, too, is straightforward, although perhaps rather naive. It looks like beginner Haskell, and as we'll see later, we can rewrite it to be more elegant.

Before I started the exercise, I wondered whether it'd be necessary to use free monads to model pure command-line interactions. Since createUser returns m (), where m is any Monad instance, using a free monad would be possible, but turns out to be overkill. After having thought about it a bit, I recalled that in many languages and platforms, you can redirect standard in and standard out for testing purposes. The way you do that is typically by replacing each with some kind of text stream. Based on that knowledge, I thought I could use the State monad for characterization testing, with a list of strings for each text stream.

In other words, the code is already testable as it is. No test-induced damage here.

Characterization tests #

To use the State monad, I started by importing Control.Monad.Trans.State.Lazy into my test code. This enabled me to write the first characterization test:

"Happy path" ~: flip evalState
    (["just.inhale", "Justin Hale", "12345678", "12345678"], []) $ do
  let writeLine x = modify (second (++ [x]))
  let readLine = state (\(i, o) -> (head i, (tail i, o)))
  let encrypt = reverse
 
  createUser writeLine readLine encrypt
 
  actual <- gets snd
  let expected = [
        "Enter a username",
        "Enter your full name",
        "Enter your password",
        "Re-enter your password",
        "Saving Details for User (just.inhale, Justin Hale, 87654321)"]
  return $ expected ~=? actual

I consulted my earlier code from An example of state-based testing in Haskell instead of reinventing the wheel, so if you want a more detailed walkthrough, you may want to consult that article as well as this one.

The type of the state that the test makes use of is ([String], [String]). As the lambda expression suggests by naming the elements i and o, the two string lists are used for respectively input and output. The test starts with an 'input stream' populated by 'user input' values, corresponding to each of the four answers a user might give to the questions asked.

The readLine function works by pulling the head off the input list i, while on the other hand not touching the output list o. Its type is State ([a], b) a, compatible with createUser, which requires its readLine parameter to have the type m (t a), where m is a Monad instance, and t a Foldable instance. The effective type turns out to be t a ~ [Char] = String, so that readLine effectively has the type State ([String], b) String. Since State ([String], b) is a Monad instance, it fits the m type argument of the requirement.

The same kind of reasoning applies to writeLine, which appends the input value to the 'output stream', which is the second list in the I/O tuple.

The test runs the createUser action and then checks that the output list contains the expected values.

A similar test verifies the behaviour when the passwords don't match:

"Mismatched passwords" ~: flip evalState
    (["i.lean.right", "Ilene Wright", "password", "Password"], []) $ do
  let writeLine x = modify (second (++ [x]))
  let readLine = state (\(i, o) -> (head i, (tail i, o)))
  let encrypt = reverse
 
  createUser writeLine readLine encrypt
 
  actual <- gets snd
  let expected = [
        "Enter a username",
        "Enter your full name",
        "Enter your password",
        "Re-enter your password",
        "The passwords don't match"]
  return $ expected ~=? actual

You can see the third and final characterization test in the GitHub repository.
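That third test isn't reproduced here. As a hedged sketch of what a short-password characterization could look like, the following self-contained module copies the createUser version with encrypt as a parameter, but with a more concrete type signature than the one GHC infers, and runs it against the same kind of State-based fake console. It assumes the transformers package (as the tests above do), and runConsole is a hypothetical helper name; this is not the repository's test.

```haskell
import Control.Monad.Trans.State.Lazy (evalState, gets, modify, state)
import Data.Bifunctor (second)
import Text.Printf (printf)

-- createUser with encrypt promoted to a parameter, given a concrete type.
createUser :: Monad m => (String -> m ()) -> m String -> (String -> String) -> m ()
createUser writeLine readLine encrypt = do
  writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
  if password /= confirmPassword
  then
    writeLine "The passwords don't match"
  else
    if length password < 8
    then
      writeLine "Password must be at least 8 characters in length"
    else do
      let array = encrypt password
      writeLine $
        printf "Saving Details for User (%s, %s, %s)" username fullName array

-- Hypothetical helper: feed the inputs to createUser and return everything
-- it wrote. The state is (remaining input lines, output lines so far).
runConsole :: [String] -> [String]
runConsole inputs = flip evalState (inputs, []) $ do
  let writeLine x = modify (second (++ [x]))
      readLine = state (\(i, o) -> (head i, (tail i, o)))
  createUser writeLine readLine reverse
  gets snd
```

Running it with a matching, seven-character password should yield the four prompts followed by the short-password message.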

Refactored action #

With full test coverage I could proceed to refactor the createUser action, pulling in the two functions I'd test-driven into existence earlier:

createUser :: (Monad m, PrintfArg a)
           => (String -> m ()) -> m String -> (String -> a) -> m ()
createUser writeLine readLine encrypt = do
  writeLine "Enter a username"
  username <- readLine
  writeLine "Enter your full name"
  fullName <- readLine
  writeLine "Enter your password"
  password <- readLine
  writeLine "Re-enter your password"
  confirmPassword <- readLine
 
  writeLine $ either
    id
    (printf "Saving Details for User (%s, %s, %s)" username fullName . encrypt)
    (validatePassword =<< comparePasswords password confirmPassword)

Because createUser now calls comparePasswords and validatePassword, the type of the overall composition is also more concrete. That's really just an artefact of my (misguided?) decision to give each of the two helper functions types that are more concrete than necessary.

As you can see, I left the initial call-and-response sequence intact, since I didn't feel that it needed improvement.
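Regarding the helper functions' overly concrete types: here's a hedged sketch of what less concrete signatures might look like. These generalizations are my guess, not code from the repository, and checkPasswords is a hypothetical name for the composition used in createUser. Because Either's monadic bind stops at the first Left, a mismatch is reported even when the password is also too short.

```haskell
-- Generalized: any type with equality can be compared.
comparePasswords :: Eq a => a -> a -> Either String a
comparePasswords pw1 pw2 =
  if pw1 == pw2
  then Right pw1
  else Left "The passwords don't match"

-- Generalized: any Foldable container has a length.
validatePassword :: Foldable t => t a -> Either String (t a)
validatePassword pw =
  if length pw < 8
  then Left "Password must be at least 8 characters in length"
  else Right pw

-- The same composition as in createUser; short-circuits on the first Left.
checkPasswords :: String -> String -> Either String String
checkPasswords pw1 pw2 = validatePassword =<< comparePasswords pw1 pw2
```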

Conclusion #

I ported the Legacy Security Manager kata to Haskell because I thought it'd be easy enough to port the code itself, and I also found the exercise compelling enough in its own right.

The most interesting point, I think, is that the createUser action remains testable without making any other concession to testability than turning it into a higher-order function. For pure functions, we would expect this to be the case, since pure functions are intrinsically testable, but for impure actions like createUser, this isn't a given. Interacting exclusively with the command-line API is, however, sufficiently simple that we can get by with the State monad. No free monad is needed, and so test-induced damage is kept at a minimum.


Functor sums

2024-10-14T18:26:00+00:00

A choice of two or more functors gives rise to a functor. An article for object-oriented programmers.

This article is part of a series of articles about functor relationships. In this one you'll learn about a universal composition of functors. In short, if you have a sum type of functors, that data structure itself gives rise to a functor.

Together with other articles in this series, this result can help you answer questions such as: Does this data structure form a functor?

Not all generic types give rise to a (covariant) functor. Some are rather contravariant functors, and some are invariant.

If, on the other hand, you have a data type which is a sum of two or more (covariant) functors with the same type parameter, then the data type itself gives rise to a functor. You'll see some examples in this article.

Abstract shape in F# #

Before we look at some examples found in other code, it helps if we know what we're looking for. You'll see a C# example in a minute, but since sum types require so much ceremony in C#, we'll make a brief detour around F#.

Imagine that you have two lawful functors, F and G. Also imagine that you have a data structure that holds either an F<'a> value or a G<'a> value:

type FOrG<'a> = FOrGF of F<'a> | FOrGG of G<'a>

The name of the type is FOrG. In the FOrGF case, it holds an F<'a> value, and in the FOrGG case it holds a G<'a> value.

The point of this article is that since both F and G are (lawful) functors, then FOrG also gives rise to a functor. The composed map function can pattern-match on each case and call the respective map function that belongs to each of the two functors.

let map f forg =
    match forg with
    | FOrGF fa -> FOrGF (F.map f fa)
    | FOrGG ga -> FOrGG (G.map f ga)

For clarity I've named the values fa indicating f of a and ga indicating g of a.

Notice that it's an essential requirement that the individual functors (here F and G) are parametrized by the same type parameter (here 'a). If your data structure contains F<'a> and G<'b>, the 'theorem' doesn't apply.
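The same shape can be sketched in Haskell. Here the type takes the two functors as parameters, so FOrG f g is a functor whenever f and g are. The names mirror the F# example; toEither is a hypothetical helper that only exists to make results easy to inspect.

```haskell
-- Either an f a or a g a; both share the type parameter a.
data FOrG f g a = FOrGF (f a) | FOrGG (g a)

instance (Functor f, Functor g) => Functor (FOrG f g) where
  -- Pattern-match on the case and delegate to that functor's fmap.
  fmap h (FOrGF fa) = FOrGF (fmap h fa)
  fmap h (FOrGG ga) = FOrGG (fmap h ga)

-- Collapse to Either so that results can be compared and printed.
toEither :: FOrG f g a -> Either (f a) (g a)
toEither (FOrGF fa) = Left fa
toEither (FOrGG ga) = Right ga
```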

Abstract shape in C# #

The same kind of abstract shape requires much more boilerplate in C#. When defining a sum type in a language that doesn't support them, we may instead turn to the Visitor design pattern or use Church encoding. While the two are isomorphic, Church encoding is a bit simpler, whereas the Visitor pattern seems more object-oriented. In this example I've chosen the simplicity of Church encoding.

Like in the above F# code, I've named the data structure the same, but it's now a class:

public sealed class FOrG<T>

Two constructors enable you to initialize it with either an F<T> or a G<T> value.

public FOrG(F<T> f)

public FOrG(G<T> g)

Notice that F<T> and G<T> share the same type parameter T. If a class had, instead, composed either F<T1> or G<T2>, the 'theorem' wouldn't apply.

Finally, a Match method completes the Church encoding.

public TResult Match<TResult>(
    Func<F<T>, TResult> whenF,
    Func<G<T>, TResult> whenG)

Regardless of exactly what F and G are, you can add a Select method to FOrG<T> like this:

public FOrG<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return Match(
        whenF: f => new FOrG<TResult>(f.Select(selector)),
        whenG: g => new FOrG<TResult>(g.Select(selector)));
}

Since we assume that F and G are functors, which in C# idiomatically have a Select method, we pass the selector to their respective Select methods. f.Select returns a new F value, while g.Select returns a new G value, but there's a constructor for each case, so the composed Select method repackages those return values in new FOrG<TResult> objects.
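The Church encoding itself can also be sketched in Haskell. Assuming the RankNTypes extension, a value is represented by its own match function, and the Functor instance calls match and re-wraps each case, much like the C# Select does. The names FOrGC, inF, inG, match, and toEither are hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded sum of f a and g a: a value is its own match function.
newtype FOrGC f g a = FOrGC (forall r. (f a -> r) -> (g a -> r) -> r)

-- The two 'constructors'.
inF :: f a -> FOrGC f g a
inF fa = FOrGC (\whenF _ -> whenF fa)

inG :: g a -> FOrGC f g a
inG ga = FOrGC (\_ whenG -> whenG ga)

match :: FOrGC f g a -> (f a -> r) -> (g a -> r) -> r
match (FOrGC m) = m

instance (Functor f, Functor g) => Functor (FOrGC f g) where
  -- Like the C# Select: match, map the inner functor, and re-wrap.
  fmap h x = match x (inF . fmap h) (inG . fmap h)

-- Convert to Either to inspect a value.
toEither :: FOrGC f g a -> Either (f a) (g a)
toEither x = match x Left Right
```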

I'll have more to say about how this generalizes to a sum of more than two alternatives, but first, let's consider some examples.

Open or closed endpoints #

The simplest example that I can think of is that of range endpoints. A range may be open, closed, or a mix thereof. Some mathematical notations use (1, 6] to indicate the range between 1 and 6, where 1 is excluded from the range, but 6 is included. An alternative notation is ]1, 6].

A given endpoint (1 and 6, above) is either open or closed, which implies a sum type. In F# I defined it like this:

type Endpoint<'a> = Open of 'a | Closed of 'a

If you're at all familiar with F#, this is clearly a discriminated union, which is just what the F# documentation calls sum types.

The article Range as a functor goes through examples in Haskell, F#, and C#, demonstrating, among other points, how an endpoint sum type forms a functor.
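As a quick sketch (not the code from that article), the endpoint type translates directly to Haskell. Each case holds a 'naked' value, which is the Identity functor in disguise, so fmap simply applies the function inside whichever case it finds. The range value is a hypothetical example encoding (1, 6].

```haskell
data Endpoint a = Open a | Closed a
  deriving (Show, Eq)

instance Functor Endpoint where
  -- Apply the function to the value, keeping the case (open or closed).
  fmap f (Open x) = Open (f x)
  fmap f (Closed x) = Closed (f x)

-- The range (1, 6] as a pair of endpoints.
range :: (Endpoint Int, Endpoint Int)
range = (Open 1, Closed 6)
```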

Binary tree #

The next example we'll consider is the binary tree from A Binary Tree Zipper in C#. In the original Haskell Zippers article, the data type is defined like this:

data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show)

Even if you're not familiar with Haskell syntax, the vertical bar (|) indicates a choice between the left-hand side and the right-hand side. Many programming languages use the | character for Boolean disjunction (or), so the syntax should be intuitive. In this definition, a binary tree is either empty or a node with a value and two subtrees. What interests us here is that it's a sum type.

One way this manifests in C# is in the choice of two alternative constructors:

public BinaryTree() : this(Empty.Instance)
{
}
 
public BinaryTree(T value, BinaryTree<T> left, BinaryTree<T> right)
    : this(new Node(value, left.root, right.root))
{
}

BinaryTree<T> clearly has a generic type parameter. Does the class give rise to a functor?

It does if it's composed from a sum of two functors. Is that the case?

On the 'left' side, it seems that we have nothing. In the Haskell code, it's called Empty. In the C# code, this case is represented by the parameterless constructor (also known as the default constructor). There's no T there, so that doesn't look much like a functor.

All is, however, not lost. We may view this lack of data as a particular value ('nothing') wrapped in the Const functor. In Haskell and F# a value without data is called unit and written (). In C# or Java you may think of it as void, although unit is a value that you can pass around, which isn't the case for void.

In Haskell, we could instead represent Empty as Const (), which is a bona-fide Functor instance that you can fmap:

ghci> import Data.Functor.Const
ghci> emptyNode = Const ()
ghci> fmap (+1) emptyNode
Const ()

This example pretends to 'increment' a number that isn't there. Not that you'd need to do this. I'm only showing you this to make the argument that the empty node forms a functor.

The 'right' side of the sum type is most succinctly summarized by the Haskell code:

Node a (Tree a) (Tree a)

It's a 'naked' generic value and two generic trees. In C# it's the parameter list

(T value, BinaryTree<T> left, BinaryTree<T> right)

Does that make a functor? Yes, it's a triple of a 'naked' generic value and two recursive subtrees, all sharing the same T. Just like in the previous article, we can view a 'naked' generic value as equivalent to the Identity functor, so that parameter is a functor. The other ones are recursive types: They are of the same type as the type we're trying to evaluate, BinaryTree<T>. If we assume that that forms a functor, that triple is a product type of functors. From the previous article, we know that that gives rise to a functor.

This means that in C#, for example, you can add the idiomatic Select method:

public BinaryTree<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return Aggregate(
        whenEmpty: () => new BinaryTree<TResult>(),
        whenNode: (value, left, right) =>
            new BinaryTree<TResult>(selector(value), left, right));
}

In languages that support pattern-matching on sum types (such as F#), you'd have to match on each case and explicitly deal with the recursive mapping. Notice, however, that here I've used the Aggregate method to implement Select. The Aggregate method is the BinaryTree<T> class' catamorphism, and it already handles the recursion for us. In other words, left and right are already BinaryTree<TResult> objects.

What remains is only to tell Aggregate what to do when the tree is empty, and how to transform the 'naked' node value. The Select implementation handles the former by returning a new empty tree, and the latter by invoking selector(value).
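For comparison, here's a Haskell sketch of the same idea. The foldTree function below is a hypothetical catamorphism for the Tree type from the Zippers article (it's not Data.Tree's foldTree, which works on rose trees). As with Aggregate, the recursion is already handled, so fmap only has to say what to do with each case.

```haskell
data Tree a = Empty | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

-- Catamorphism: replace Empty with whenEmpty and each Node with whenNode.
foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree whenEmpty _ Empty = whenEmpty
foldTree whenEmpty whenNode (Node v l r) =
  whenNode v (foldTree whenEmpty whenNode l) (foldTree whenEmpty whenNode r)

instance Functor Tree where
  -- l and r have already been mapped by the time the lambda sees them.
  fmap f = foldTree Empty (\v l r -> Node (f v) l r)

sample :: Tree Int
sample = Node 1 (Node 2 Empty Empty) (Node 3 Empty Empty)
```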

Not only does the binary tree form a functor, but it turns out that the Zipper does as well, because the breadcrumbs also give rise to a functor.

Breadcrumbs #

The original Haskell Zippers article defines a breadcrumb for the binary tree Zipper like this:

data Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a) deriving (Show)

That's another sum type with generics on the left as well as the right. In C# the two options may be best illustrated by these two creation methods:

public static Crumb<T> Left<T>(T value, BinaryTree<T> right)
{
    return Crumb<T>.Left(value, right);
}
 
public static Crumb<T> Right<T>(T value, BinaryTree<T> left)
{
    return Crumb<T>.Right(value, left);
}

Notice that the Left and Right choices have the same structure: A 'naked' generic T value, and a BinaryTree<T> object. Only the names differ. This suggests that we only need to think about one of them, and then we can reuse our conclusion for the other.

As we've already done once, we consider a T value equivalent with Identity<T>, which is a functor. We've also, just above, established that BinaryTree<T> forms a functor. We have a product (argument list, or tuple) of functors, so that combination forms a functor.

Since this is true for both alternatives, this sum type, too, gives rise to a functor. This enables you to implement a Select method:

public Crumb<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return Match(
        (v, r) => Crumb.Left(selector(v), r.Select(selector)),
        (v, l) => Crumb.Right(selector(v), l.Select(selector)));
}

By now the pattern should be familiar. Call selector(v) directly on the 'naked' values, and pass selector to any other functors' Select method.
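The same pattern in a Haskell sketch: apply the function to the 'naked' value and fmap over the subtree. To keep the example short, Tree's instance is derived with the DeriveFunctor extension, while the Crumb instance is written out by hand.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data Tree a = Empty | Node a (Tree a) (Tree a)
  deriving (Show, Eq, Functor)

data Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a)
  deriving (Show, Eq)

instance Functor Crumb where
  -- Apply f to the 'naked' value, and fmap over the Tree functor.
  fmap f (LeftCrumb v r) = LeftCrumb (f v) (fmap f r)
  fmap f (RightCrumb v l) = RightCrumb (f v) (fmap f l)
```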

That gives us almost all the building blocks we need to declare BinaryTreeZipper<T> a functor as well, but one last theorem remains before we can do that. We'll conclude this work in the next article.

Higher arities #

Although we finally saw a 'real' triple product, all the sum types have involved binary choices between a 'left side' and a 'right side'. As was the case with functor products, the result generalizes to higher arities. A sum type with any number of cases forms a functor if all the cases give rise to a functor.

We can, again, use canonicalized forms to argue the case. (See Thinking with Types for a clear explanation of canonicalization of types.) A two-way choice is isomorphic to Either, and a three-way choice is isomorphic to Either a (Either b c). Just like it's possible to build triples, quadruples, etc. by nesting pairs, we can construct n-ary choices by nesting Eithers. It's the same kind of inductive reasoning.
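As an illustration of the nesting argument, here's a sketch of a hypothetical three-way choice, Choice3, together with the two functions that witness its isomorphism with Either a (Either b c). Round-tripping through either representation loses nothing.

```haskell
-- A hypothetical three-way sum type, used only for illustration.
data Choice3 a b c = First a | Second b | Third c deriving (Show, Eq)

-- Nest the second and third cases inside another Either.
toEither :: Choice3 a b c -> Either a (Either b c)
toEither (First x) = Left x
toEither (Second y) = Right (Left y)
toEither (Third z) = Right (Right z)

fromEither :: Either a (Either b c) -> Choice3 a b c
fromEither (Left x) = First x
fromEither (Right (Left y)) = Second y
fromEither (Right (Right z)) = Third z
```

A four-way choice would nest one level deeper, and so on.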

This is relevant because just as Haskell's base library provides Data.Functor.Product for composing two (and thereby any number of) functors, it also provides Data.Functor.Sum for composing functor sums.

The Sum type defines two case constructors: InL and InR, but it's isomorphic with Either:

canonizeSum :: Sum f g a -> Either (f a) (g a)
canonizeSum (InL x) = Left x
canonizeSum (InR y) = Right y
 
summarizeEither :: Either (f a) (g a) -> Sum f g a
summarizeEither (Left x) = InL x
summarizeEither (Right y) = InR y

The point is that we can compose not only a choice of two, but of any number of functors, to a single functor type. A simple example is this choice between Maybe, list, or Tree:

maybeOrListOrTree :: Sum (Sum Maybe []) Tree String
maybeOrListOrTree = InL (InL (Just "foo"))

If we'd rather embed a list in that type, we can do that as well:

maybeOrListOrTree' :: Sum (Sum Maybe []) Tree String
maybeOrListOrTree' = InL (InR ["bar", "baz"])

Both values have the same type, and since that type is a Functor instance, you can fmap over either of them:

ghci> fmap (elem 'r') maybeOrListOrTree
InL (InL (Just False))
ghci> fmap (elem 'r') maybeOrListOrTree'
InL (InR [True,False])

These queries examine each String to determine whether or not they contain the letter 'r', which only "bar" does.
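The examples above don't exercise the Tree case. Here's a self-contained sketch that does, assuming that Tree is Data.Tree from the containers package (the rose tree that ships with GHC). The names treeValue, containsR, and treeCase are hypothetical. fmap reaches every label in the tree, while the InR wrapper stays put.

```haskell
import Data.Functor.Sum (Sum (..))
import Data.Tree (Tree (..))

-- A value in the third case: a small rose tree of strings.
treeValue :: Sum (Sum Maybe []) Tree String
treeValue = InR (Node "bar" [Node "foo" [], Node "baz" []])

-- fmap maps every label of the tree; the Sum structure is unchanged.
containsR :: Sum (Sum Maybe []) Tree Bool
containsR = fmap (elem 'r') treeValue

-- Extract the tree, if the value is in the Tree case.
treeCase :: Sum f Tree a -> Maybe (Tree a)
treeCase (InR t) = Just t
treeCase (InL _) = Nothing
```

Here, too, only "bar" contains an 'r'.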

The point, anyway, is that sum types of any arity form a functor if all the cases do.

Conclusion #

In the previous article, you learned that a functor product gives rise to a functor. In this article, you learned that a functor sum does, too. If a data structure contains a choice of two or more functors, then that data type itself forms a functor.

As the previous article argues, this is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such sums into functors, in the way that Haskell's Sum container almost does. Thus, knowing when you can safely give your generic types a Select method or map function may come in handy.

There's one more rule like this one.

Next: Functor compositions.

This blog is totally free, but if you like it, please consider supporting it.

The Const functor

2024-10-07T18:37:00+00:00

Package a constant value, but make it look like a functor. An article for object-oriented programmers.

This article is an instalment in an article series about functors. In previous articles, you've learned about useful functors such as Maybe and Either. You've also seen at least one less-than-useful functor: The Identity functor. In this article, you'll learn about another (practically) useless functor called Const. You can skip this article if you want.

Like Identity, the Const functor may not be that useful, but it nonetheless exists. You probably won't need it for actual programming tasks, but, as with Identity, knowing that it exists can be useful as an analysis tool. It may help you quickly evaluate whether a particular data structure affords various compositions. For example, it may enable you to quickly identify whether, say, a constant type and a list may compose to a functor.

This article starts with C#, then proceeds via F# to finally discuss Haskell's built-in Const functor. You can just skip the languages you don't care about.

C# Const class #

While C# supports records, and you can implement Const as one, I here present it as a full-fledged class. For readers who may not be that familiar with modern C#, a normal class may be more recognizable.

public sealed class Const<T1, T2>
{
    public T1 Value { get; }
 
    public Const(T1 value)
    {
        Value = value;
    }
 
    public Const<T1, TResult> Select<TResult>(Func<T2, TResult> selector)
    {
        return new Const<T1, TResult>(Value);
    }
 
    public override bool Equals(object obj)
    {
        return obj is Const<T1, T2> @const &&
               EqualityComparer<T1>.Default.Equals(Value, @const.Value);
    }
 
    public override int GetHashCode()
    {
        return -1584136870 + EqualityComparer<T1>.Default.GetHashCode(Value);
    }
}

The point of the Const functor is to make a constant value look like a functor; that is, a container that you can map from one type to another. The difference from the Identity functor is that Const doesn't allow you to map the constant. Rather, it cheats and pretends to have a mappable type that, however, has no value associated with it; a phantom type.

In Const<T1, T2>, the T2 type parameter is the 'pretend' type. While the class contains a T1 value, it contains no T2 value. The Select method, on the other hand, maps T2 to TResult. The operation is close to being a no-op, but still not quite. While it doesn't do anything particularly practical, it does change the type of the returned value.
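In Haskell, the same idea fits in a few lines. The following is a hand-rolled sketch, named MyConst to avoid clashing with the Const that base already provides. The second type parameter is a phantom that never appears on the right-hand side of the definition, so fmap can change it without touching the stored value.

```haskell
-- Hand-rolled illustration; base provides the real thing as Data.Functor.Const.
-- The type parameter `a` is a phantom: no value of type `a` is ever stored.
newtype MyConst v a = MyConst { getMyConst :: v } deriving (Show, Eq)

instance Functor (MyConst v) where
  -- The function is ignored; only the phantom type changes from a to b.
  fmap _ (MyConst v) = MyConst v

-- Map an Int phantom to a Double phantom; the String value is untouched.
relabelled :: MyConst String Double
relabelled = fmap (\i -> sqrt (fromIntegral (i :: Int))) (MyConst "foo" :: MyConst String Int)
```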

Here's a simple example of using the Select method:

Const<string, double> c = new Const<string, int>("foo").Select(i => Math.Sqrt(i));

The new c value also contains "foo". Only its type has changed.

If you find this peculiar, think of it as similar to mapping an empty list, or an empty Maybe value. In those cases, too, no values change; only the type changes. The difference between empty Maybe objects or empty lists, and the Const functor is that Const isn't empty. There is a value; it's just not the value being mapped.
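The analogy is easy to check in Haskell. In this sketch, mapping show over an empty list or over Nothing changes only the element type, just as mapping over base's Const (from Data.Functor.Const) leaves its value alone. The difference is that the Const value isn't empty.

```haskell
import Data.Functor.Const (Const (..))

-- Empty containers: the element type changes from Int to String, no values change.
emptyList :: [String]
emptyList = fmap show ([] :: [Int])

nothing :: Maybe String
nothing = fmap show (Nothing :: Maybe Int)

-- Const isn't empty, but its value isn't the one being mapped either.
constant :: Const String String
constant = fmap show (Const "foo" :: Const String Int)
```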

Functor laws #

Although the Const functor doesn't really do anything, it still obeys the functor laws. To illustrate it (but not to prove it), here's an FsCheck property that exercises the first functor law:

[Property(QuietOnSuccess = true)]
public void ConstObeysFirstFunctorLaw(int i)
{
    var left = new Const<int, string>(i);
    var right = new Const<int, string>(i).Select(x => x);
 
    Assert.Equal(left, right);
}

If you think it over for a minute, this makes sense. The test creates a Const<int, string> that contains the integer i, and then proceeds to map the string that isn't there to 'itself'. Clearly, this doesn't change the i value contained in the Const<int, string> container.

In the same spirit, a property demonstrates the second functor law:

[Property(QuietOnSuccess = true)]
public void ConstObeysSecondFunctorLaw(
    Func<string, byte> f,
    Func<int, string> g,
    short s)
{
    Const<short, byte> left = new Const<short, int>(s).Select(g).Select(f);
    Const<short, byte> right = new Const<short, int>(s).Select(x => f(g(x)));
 
    Assert.Equal(left, right);
}

Again, the same kind of almost-no-op takes place. The g function first changes the int type to string, and then f changes the string type to byte, but no value ever changes; only the second type parameter. Thus, left and right remain equal, since they both contain the same value s.
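For Haskell readers, here's a dependency-free sketch of the same two properties, using base's Const. Instead of FsCheck (or QuickCheck), it simply evaluates the laws over a handful of sample values. The functions f and g are arbitrary choices, not taken from the article.

```haskell
import Data.Functor.Const (Const (..))

-- First functor law: fmap id leaves the Const unchanged.
firstLaw :: Int -> Bool
firstLaw i = fmap id c == c
  where
    c = Const i :: Const Int String

-- Second functor law: mapping g, then f, equals mapping (f . g) once.
-- g and f are arbitrary example functions.
secondLaw :: Int -> Bool
secondLaw s = (fmap f . fmap g) c == fmap (f . g) c
  where
    c = Const s :: Const Int Int
    g = show :: Int -> String
    f = length :: String -> Int
```

Sampling a few inputs illustrates the laws; the equational proofs in the Haskell section are what establish them.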

F# Const #

In F# we may idiomatically express Const as a single-case union:

type Const<'v, 'a> = Const of 'v

Here I've chosen to name the first type parameter 'v (for value) in order to keep the 'functor type parameter' name 'a. This enables me to meaningfully annotate the functor mapping function with the type 'a -> 'b:

module Const =
    let get (Const x) = x
    let map (f : 'a -> 'b) (Const x : Const<'v, 'a>) : Const<'v, 'b> = Const x

Usually, you don't need to annotate F# functions like map, but in this case I added explicit types in order to make it a recognizable functor map.

I could also have defined map like this:

// 'a -> Const<'b,'c> -> Const<'b,'d>
let map f (Const x) = Const x

This still works, but is less recognizable as a functor map, since f may be any 'a. Notice that if type inference is left to its own devices, it names the input type Const<'b,'c> and the return type Const<'b,'d>. This also means that if you want to supply f as a mapping function, this is legal, because we may consider 'a ~ 'c -> 'd. It's still a functor map, but a less familiar representation.
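Haskell's type checker makes the same point. In this sketch, the hypothetical anyMap spells out in its signature that the first argument is unconstrained. It therefore accepts a function of the expected shape, but it also accepts a unit value. It's still the same mapping, only less recognizable.

```haskell
import Data.Functor.Const (Const (..))

-- Like the unannotated F# map: the first argument can be anything,
-- because it's ignored.
anyMap :: x -> Const v a -> Const v b
anyMap _ (Const v) = Const v

viaFunction :: Const String Bool
viaFunction = anyMap (even :: Int -> Bool) (Const "foo" :: Const String Int)

viaUnit :: Const String Bool
viaUnit = anyMap () (Const "foo" :: Const String Int)
```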

Similar to the above C# code, two FsCheck properties demonstrate that the Const type obeys the functor laws.

[<Property(QuietOnSuccess = true)>]
let ``Const obeys first functor law`` (i : int) =
    let  left = Const i
    let right = Const i |> Const.map id
 
    left =! right
 
[<Property(QuietOnSuccess = true)>]
let ``Const obeys second functor law`` (f : string -> byte) (g : int -> string) (s : int16) =
    let  left = Const s |> Const.map g |> Const.map f
    let right = Const s |> Const.map (g >> f)
 
    left =! right

The assertions use Unquote's =! operator, which I usually read as should equal or must equal.

Haskell Const #

The Haskell base library already comes with a Const newtype.

You can easily create a new Const value:

ghci> Const "foo"
Const "foo"

If you inquire about its type, GHCi will tell you in a rather verbose way that the first type parameter is String, but the second may be any type b:

ghci> :t Const "foo"
Const "foo" :: forall {k} {b :: k}. Const String b

You can also map by 'incrementing' its non-existent second value:

ghci> (+1) <$> Const "foo"
Const "foo"
ghci> :t (+1) <$> Const "foo"
(+1) <$> Const "foo" :: Num b => Const String b

While the value remains Const "foo", the type of b is now constrained to a Num instance, which follows from the use of the + operator.

Functor law proofs #

If you look at the source code for the Functor instance, it looks much like its F# equivalent:

instance Functor (Const m) where
    fmap _ (Const v) = Const v

We can use equational reasoning with the notation that Bartosz Milewski uses to prove that both functor laws hold, starting with the first:

  fmap id (Const x)
= { definition of fmap }
  Const x
= { definition of id }
  id (Const x)

Clearly, there's not much to that part. What about the second functor law?

  fmap (g . f) (Const x)
= { definition of fmap }
  Const x
= { definition of fmap }
  fmap g (Const x)
= { definition of fmap }
  fmap g (fmap f (Const x))
= { definition of composition }
  (fmap g . fmap f) (Const x)

While that proof takes a few more steps, most are as trivial as the first proof.

Conclusion #

The Const functor is hardly a programming construct you'll use in your day-to-day work, but the fact that it exists can be used to generalize some results that involve functors. Now, whenever you have a result that involves a functor, you know that it also generalizes to constant values, just like the Identity functor teaches us that 'naked' type parameters can be thought of as functors.

To give a few examples, we may already know that Tree<T> (C# syntax) is a functor, but a 'naked' generic type parameter T also gives rise to a functor (Identity), as does a non-generic type (such as int or MyCustomClass).

Thus, if you have a function that operates on any functor, it may also, conceivably, operate on data structures that have non-generic types. This may for example be interesting when we begin to consider how functors compose.
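As a small illustration, here's a sketch of a hypothetical function, describe, written against any Functor. Because Const Int is a Functor, the same function also applies to a plain Int wrapped in Const, just as it applies to a list or to Maybe.

```haskell
import Data.Functor.Const (Const (..))

-- Works for any functor: replace every element with its textual representation.
describe :: (Functor f, Show a) => f a -> f String
describe = fmap show

listResult :: [String]
listResult = describe [1, 2, 3 :: Int]

maybeResult :: Maybe String
maybeResult = describe (Just True)

-- A non-generic Int, made to look like a functor with Const.
constResult :: Const Int String
constResult = describe (Const 42 :: Const Int Double)
```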

Next: The State functor.


Das verflixte Hunde-Spiel

2024-10-03T17:41:00+00:00

A puzzle kata, and a possible solution.

When I was a boy I had a nine-piece puzzle that I'd been gifted by the Swiss branch of my family. It's called Das verflixte Hunde-Spiel, which means something like the confounded dog game in English. And while a puzzle with nine pieces doesn't sound like much, it is, in fact, incredibly difficult.

It's just a specific incarnation of a kind of game that you've almost certainly encountered, too.

There are nine tiles, each with two dog heads and two dog ends. A dog may be coloured in one of four different patterns. The object of the game is to lay out the nine tiles in a 3x3 square so that all dog halves line up.

Game details #

The game is from 1979. Two of the tiles are identical, and, according to the information on the back of the box, two possible solutions exist. Described from top clockwise, the tiles are the following:

Brown head, grey head, umber tail, spotted tail
Brown head, spotted head, brown tail, umber tail
Brown head, spotted head, grey tail, umber tail
Brown head, spotted head, grey tail, umber tail
Brown head, umber head, spotted tail, grey tail
Grey head, brown head, spotted tail, umber tail
Grey head, spotted head, brown tail, umber tail
Grey head, umber head, brown tail, spotted tail
Grey head, umber head, grey tail, spotted tail

I've taken the liberty of using a shorthand for the patterns. The grey dogs are actually also spotted, but since there's only one grey pattern, the grey label is unambiguous. The dogs I've named umber are actually rather burnt umber, but that's too verbose for my tastes, so I just named them umber. Finally, the label spotted indicates dogs that are actually burnt umber with brown blotches.

Notice that there are two tiles with a brown head, a spotted head, a grey tail, and an umber tail.

The object of the game is to lay down the tiles in a 3x3 square so that all dogs fit. For further reference, I've numbered each position from one to nine like this:

1 2 3
4 5 6
7 8 9

What makes the game hard? There are nine tiles, so if you start with the upper left corner, you have nine choices. If you just randomly put down the tiles, you now have eight left for the top middle position, and so on. Standard combinatorics indicate that there are at least 9! = 362,880 permutations.

That's not the whole story, however, since you can rotate each tile in four different ways. You can rotate the first tile four ways, the second tile four ways, etc., for a total of 4⁹ = 262,144 ways. Multiply these two numbers together, and you get 4⁹ · 9! = 95,126,814,720 combinations. No wonder this puzzle is hard if there are only two solutions.
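The arithmetic is easy to double-check. This sketch counts tile orders and rotations separately and multiplies them, under the same assumption as the text: every tile may go in any position, in any of its four rotations.

```haskell
-- Number of ways to order the nine tiles: 9!
orderings :: Integer
orderings = product [1 .. 9]

-- Each of the nine tiles can be placed in four rotations: 4^9
rotationChoices :: Integer
rotationChoices = 4 ^ (9 :: Integer)

-- All orderings combined with all rotations.
combinations :: Integer
combinations = orderings * rotationChoices
```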

When analysed this way, however, there are actually 16 solutions, but that still makes it incredibly unlikely to arrive at a solution by chance. I'll get back to why there are 16 solutions later. For now, you should have enough information to try your hand with this game, if you'd like.

I found that the game made for an interesting kata: Write a program that finds all possible solutions to the puzzle.

If you'd like to try your hand at this exercise, I suggest that you pause reading here.

In the rest of the article, I'll outline my first attempt. Spoiler alert: I'll also show one of the solutions.

Types #

When you program in Haskell, it's natural to start by defining some types.

data Half = Head | Tail deriving (Show, Eq)

data Pattern = Brown | Grey | Spotted | Umber deriving (Show, Eq)

data Tile = Tile {
  top :: (Pattern, Half),
  right :: (Pattern, Half),
  bottom :: (Pattern, Half),
  left :: (Pattern, Half) }
  deriving (Show, Eq)

Each tile describes what you find on its top, right side, bottom, and left side.

We're also going to need a function to evaluate whether two halves match:

matches :: (Pattern, Half) -> (Pattern, Half) -> Bool
matches (p1, h1) (p2, h2) = p1 == p2 && h1 /= h2

This function demands that the patterns match, but that the halves are opposites.
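A few sample evaluations make the rule concrete. The Half and Pattern types are repeated from above so that the sketch loads on its own; the names completeDog, twoHeads, and mixedDog are only for illustration.

```haskell
data Half = Head | Tail deriving (Show, Eq)

data Pattern = Brown | Grey | Spotted | Umber deriving (Show, Eq)

matches :: (Pattern, Half) -> (Pattern, Half) -> Bool
matches (p1, h1) (p2, h2) = p1 == p2 && h1 /= h2

-- Same pattern, opposite halves: the dog is complete.
completeDog :: Bool
completeDog = (Brown, Head) `matches` (Brown, Tail)

-- Two heads never match, even with the same pattern.
twoHeads :: Bool
twoHeads = (Brown, Head) `matches` (Brown, Head)

-- A head and a tail of different patterns don't match either.
mixedDog :: Bool
mixedDog = (Brown, Head) `matches` (Grey, Tail)
```

Since both == and /= are symmetric, so is matches: it doesn't matter which tile you consider first.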

You can use the Tile type and its constituents to define the nine tiles of the game:

tiles :: [Tile]
tiles =
  [
    Tile (Brown, Head) (Grey, Head) (Umber, Tail) (Spotted, Tail),
    Tile (Brown, Head) (Spotted, Head) (Brown, Tail) (Umber, Tail),
    Tile (Brown, Head) (Spotted, Head) (Grey, Tail) (Umber, Tail),
    Tile (Brown, Head) (Spotted, Head) (Grey, Tail) (Umber, Tail),
    Tile (Brown, Head) (Umber, Head) (Spotted, Tail) (Grey, Tail),
    Tile (Grey, Head) (Brown, Head) (Spotted, Tail) (Umber, Tail),
    Tile (Grey, Head) (Spotted, Head) (Brown, Tail) (Umber, Tail),
    Tile (Grey, Head) (Umber, Head) (Brown, Tail) (Spotted, Tail),
    Tile (Grey, Head) (Umber, Head) (Grey, Tail) (Spotted, Tail)
  ]

Because I'm the neatnik that I am, I've sorted the tiles in lexicographic order, but the solution below doesn't rely on that.

Brute force doesn't work #

Before I started, I cast around the internet to see if there was an appropriate algorithm for the problem. While I found a few answers on Stack Overflow, none of them gave me any indication that a sophisticated algorithm was available. (Even so, there may be one, and I just didn't find it.)

It seems clear, however, that you can implement some kind of recursive search-tree algorithm that cuts a branch off as soon as it realizes that it doesn't work. I'll get back to that later, so let's leave that for now.

Since I'd planned on writing the code in Haskell, I decided to first try something that might look like brute force. Because Haskell is lazily evaluated, you can sometimes get away with techniques that look wasteful when you're used to strict/eager evaluation. In this case, it turned out not to work, but it's often quicker to just make the attempt than to analyze the problem.

As already outlined, I first attempted a purely brute-force solution, betting that Haskell's lazy evaluation would be enough to skip over the unnecessary calculations:

allRotationsOf9 = replicateM 9 [0..3]

allRotations :: [Tile] -> [[Tile]]
allRotations ts = fmap (\rs -> (\(r, t) -> rotations t !! r) <$> zip rs ts) allRotationsOf9

allConfigurations :: [[Tile]]
allConfigurations = permutations tiles >>= allRotations

solutions = filter isSolution allConfigurations

My idea with the allConfigurations value was that it would enumerate all 95 billion combinations. Whether it actually does that, I was never able to verify, because if I try to run that code, my poor laptop runs for a couple of hours before it eventually runs out of memory. In other words, the GHCi process crashes.

I haven't shown isSolution or rotations, because I consider the implementations irrelevant. This attempt doesn't work anyway.

Now that I look at it, it's quite clear why this isn't a good strategy. There's little to be gained from lazy evaluation when the final step merely filters a list. Even with lazy evaluation, the code still has to run through all 95 billion combinations.

Things might have been different if I just had to find one solution. With a little luck, it might be that the first solution appears after, say, a hundred million iterations, and lazy evaluation would then have meant that the remaining combinations would never run. Not so here, but hindsight is 20-20.
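The difference is easy to see on a smaller scale. In this sketch, asking for only the first element that satisfies a predicate stops as soon as one is found, even on an infinite list, whereas collecting all matches requires traversing the entire (here finite) list. The predicate isInteresting is an arbitrary stand-in for something like isSolution.

```haskell
-- Arbitrary stand-in for a predicate like isSolution.
isInteresting :: Integer -> Bool
isInteresting n = n `mod` 7919 == 0

-- Lazy evaluation lets this stop at the first match, even on an infinite list.
firstInteresting :: [Integer]
firstInteresting = take 1 (filter isInteresting [1 ..])

-- Finding every match has no such shortcut: the whole list must be examined.
allInteresting :: [Integer]
allInteresting = filter isInteresting [1 .. 100000]
```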

Search tree #

Back to the search tree idea. It goes like this: Start from the top left position and pick a random tile and rotation. Now pick an arbitrary tile that fits and place it to the right of the first one, and so on. As far as I can tell, you can always place the first four tiles, but from there, you can easily encounter a combination that allows no further tiles. Here's an example:

None of the remaining five tiles fit in the fifth position. This means that we don't have to do any permutations that involve these four tiles in that combination. While the algorithm has to search through all five remaining tiles and rotations to discover that none fit in position 5, once it knows that, it doesn't have to go through the remaining four positions. That's 4⁴ · 4! = 6,144 combinations that it can skip every time it discovers an impossible beginning. That doesn't sound like that much, but if we assume that this happens more often than not, it's still an improvement by orders of magnitude.

We may think of this algorithm as constructing a search tree, but immediately pruning all branches that aren't viable, as close to the root as possible.

Matches #

Before we get to the algorithm proper we need a few simple helper functions. One kind of function is a predicate that determines if a particular tile can occupy a given position. Since we may place any tile in any rotation in the first position, we don't need to write a predicate for that, but if we wanted to generalize, const True would do.

Whether or not we can place a given tile in the second position depends exclusively on the tile in the first position:

tile2Matches :: Tile -> Tile -> Bool
tile2Matches t1 t2 = right t1 `matches` left t2

If the right dog part of the first tile matches the left part of the second tile, the return value is True; otherwise, it's False. Note that I'm using infix notation for matches. I could also have written the function as

tile2Matches :: Tile -> Tile -> Bool
tile2Matches t1 t2 = matches (right t1) (left t2)

but it doesn't read as well.

In any case, the corresponding matching functions for the third and fourth tile look similar:

tile3Matches :: Tile -> Tile -> Bool
tile3Matches t2 t3 = right t2 `matches` left t3

tile4Matches :: Tile -> Tile -> Bool
tile4Matches t1 t4 = bottom t1 `matches` top t4

Notice that tile4Matches compares the fourth tile with the first tile rather than the third tile, because position 4 is directly beneath position 1, rather than to the right of position 3 (cf. the grid above). For that reason it also compares the bottom of tile 1 to the top of the fourth tile.

The matcher for the fifth tile is different:

tile5Matches :: Tile -> Tile -> Tile -> Bool
tile5Matches t2 t4 t5 = bottom t2 `matches` top t5 && right t4 `matches` left t5

This is the first predicate that depends on two, rather than one, previous tiles. In position 5 we need to examine both the tile in position 2 and the one in position 4.

The same is true for position 6:

tile6Matches :: Tile -> Tile -> Tile -> Bool
tile6Matches t3 t5 t6 = bottom t3 `matches` top t6 && right t5 `matches` left t6

but then the matcher for position 7 looks like the predicate for position 4:

tile7Matches :: Tile -> Tile -> Bool
tile7Matches t4 t7 = bottom t4 `matches` top t7

This is, of course, because the tile in position 7 only has to consider the tile in position 4. Finally, not surprisingly, the two remaining predicates look like something we've already seen:

tile8Matches :: Tile -> Tile -> Tile -> Bool
tile8Matches t5 t7 t8 = bottom t5 `matches` top t8 && right t7 `matches` left t8

tile9Matches :: Tile -> Tile -> Tile -> Bool
tile9Matches t6 t8 t9 = bottom t6 `matches` top t9 && right t8 `matches` left t9

You may suggest that it'd be possible to reduce the number of predicates. After all, there are effectively only three different predicates: One that only looks at the tile to the left, one that only looks at the tile above, and one that looks both to the left and above.

Indeed, I could have boiled it down to just three functions:

matchesHorizontally :: Tile -> Tile -> Bool
matchesHorizontally x y = right x `matches` left y

matchesVertically :: Tile -> Tile -> Bool
matchesVertically x y = bottom x `matches` top y

matchesBoth :: Tile -> Tile -> Tile -> Bool
matchesBoth x y z = matchesVertically x z && matchesHorizontally y z

but I now run the risk of calling the wrong predicate from my implementation of the algorithm. As you'll see, I'll call each predicate by name at each appropriate step, but if I had only these three functions, there's a risk that I might mistakenly use matchesHorizontally when I should have used matchesVertically, or vice versa. Reducing eight one-liners to three one-liners doesn't really seem to warrant the risk.
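Which of the three shapes a position needs follows directly from numbering the positions row by row, as the predicates above imply. As a sketch, these hypothetical helpers compute, for positions 1 to 9, which position (if any) lies to the left and which lies above, and classify each position accordingly. The result lines up with tile2Matches through tile9Matches.

```haskell
-- Positions are numbered 1..9 row by row, three per row.
leftOf :: Int -> Maybe Int
leftOf p
  | (p - 1) `mod` 3 == 0 = Nothing -- first column
  | otherwise = Just (p - 1)

above :: Int -> Maybe Int
above p
  | p <= 3 = Nothing -- top row
  | otherwise = Just (p - 3)

-- Which kind of predicate each position needs.
data Check = NoCheck | Horizontal | Vertical | Both deriving (Show, Eq)

checkFor :: Int -> Check
checkFor p = case (leftOf p, above p) of
  (Nothing, Nothing) -> NoCheck
  (Just _, Nothing) -> Horizontal
  (Nothing, Just _) -> Vertical
  (Just _, Just _) -> Both
```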

Rotations #

In addition to examining whether a given tile fits in a given position, we also need to be able to rotate any tile:

rotateClockwise :: Tile -> Tile
rotateClockwise (Tile t r b l) = Tile l t r b

rotateCounterClockwise :: Tile -> Tile
rotateCounterClockwise (Tile t r b l) = Tile r b l t

upend :: Tile -> Tile
upend (Tile t r b l) = Tile b l t r

What is really needed, it turns out, is to enumerate all four rotations of a tile:

rotations :: Tile -> [Tile]
rotations t = [t, rotateClockwise t, upend t, rotateCounterClockwise t]
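Because all these functions are pure, their relationships can be checked directly. This sketch repeats the type definitions so that it loads on its own, and uses the first tile from the list above as a sample. The checks confirm that four clockwise turns restore the original tile, and that rotations lists the same tiles as repeatedly rotating clockwise.

```haskell
data Half = Head | Tail deriving (Show, Eq)

data Pattern = Brown | Grey | Spotted | Umber deriving (Show, Eq)

data Tile = Tile {
  top :: (Pattern, Half),
  right :: (Pattern, Half),
  bottom :: (Pattern, Half),
  left :: (Pattern, Half) }
  deriving (Show, Eq)

rotateClockwise :: Tile -> Tile
rotateClockwise (Tile t r b l) = Tile l t r b

rotateCounterClockwise :: Tile -> Tile
rotateCounterClockwise (Tile t r b l) = Tile r b l t

upend :: Tile -> Tile
upend (Tile t r b l) = Tile b l t r

rotations :: Tile -> [Tile]
rotations t = [t, rotateClockwise t, upend t, rotateCounterClockwise t]

-- The first tile of the game, used as a sample.
sampleTile :: Tile
sampleTile = Tile (Brown, Head) (Grey, Head) (Umber, Tail) (Spotted, Tail)
```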

Since this, like everything else here, is a pure function, I experimented with defining a 'memoized tile' type that embedded all four rotations upon creation, so that the algorithm doesn't need to call the rotations function millions of times, but I couldn't measure any discernible performance improvement from it. There's no reason to make things more complicated than they need to be, so I didn't keep that change. (Since I do, however, use Git tactically, I did, of course, stash the experiment.)

Permutations #

While I couldn't make things work by enumerating all 95 billion combinations, enumerating all 362,880 permutations of non-rotated tiles is well within the realm of the possible:

allPermutations :: [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
allPermutations =
  (\[t1, t2, t3, t4, t5, t6, t7, t8, t9] -> (t1, t2, t3, t4, t5, t6, t7, t8, t9))
  <$> permutations tiles

Doing this in GHCi on my old laptop takes 300 milliseconds, which is good enough compared to what comes next.

This list value uses permutations to enumerate all the permutations. You may already have noticed that it converts the result into a nine-tuple. The reason for that is that this enables the algorithm to pattern-match into specific positions without having to resort to the index operator, which is both partial and requires iteration of the list to reach the indexed element. Granted, the list is only nine elements long, and often the algorithm will only need to index to the fourth or fifth element. On the other hand, it's going to do it a lot. Perhaps it's a premature optimization, but if it is, it's at least one that makes the code more, rather than less, readable.
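The lambda in allPermutations is safe in practice, because every permutation of tiles has nine elements, but its list pattern is still partial. If you'd rather avoid that, one option is a total conversion into Maybe. This is a hypothetical alternative, not the article's code; mapMaybe from Data.Maybe discards any list of the wrong length.

```haskell
import Data.List (permutations)
import Data.Maybe (mapMaybe)

-- Total conversion: Nothing unless the list has exactly nine elements.
toNineTuple :: [a] -> Maybe (a, a, a, a, a, a, a, a, a)
toNineTuple [t1, t2, t3, t4, t5, t6, t7, t8, t9] =
  Just (t1, t2, t3, t4, t5, t6, t7, t8, t9)
toNineTuple _ = Nothing

-- Generic counterpart of allPermutations, for any list of elements.
allNineTuples :: [a] -> [(a, a, a, a, a, a, a, a, a)]
allNineTuples = mapMaybe toNineTuple . permutations
```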

Algorithm #

I found it easiest to begin at the 'bottom' of what is effectively a recursive algorithm, even though I didn't implement it that way. At the 'bottom', I imagine that I'm almost done: That I've found eight tiles that match, and now I only need to examine if I can rotate the final tile so that it matches:

solve9th ::  (a, b, c, d, e, Tile, g, Tile, Tile)
         -> [(a, b, c, d, e, Tile, g, Tile, Tile)]
solve9th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile9Matches t6 t8) $ rotations t9
  return (t1, t2, t3, t4, t5, t6, t7, t8, match)

Recalling that Haskell functions compose from right to left, the function starts by enumerating the four rotations of the ninth and final tile t9. It then filters those four rotations by the tile9Matches predicate.

The match value is a rotation of t9 that matches t6 and t8. Whenever solve9th finds such a match, it returns the entire nine-tuple, because the assumption is that the first eight tiles are already valid.

Notice that the function uses do notation in the list monad, so it's quite possible that the first filter expression produces no match. In that case, the second line of code never runs, and instead, the function returns the empty list.
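The short-circuiting behaviour of the list monad is easy to observe in isolation. In this sketch, when filter finds nothing, the rest of the do block never runs and the result is the empty list. When it finds several matches, the rest runs once per match.

```haskell
-- If the filter yields no elements, the final line never executes.
noMatch :: [(Int, String)]
noMatch = do
  x <- filter even [1, 3, 5]
  return (x, show x)

-- With several matches, the remainder runs once per match.
someMatches :: [(Int, String)]
someMatches = do
  x <- filter even [1 .. 6]
  return (x, show x)
```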

How do we find a tuple where the first eight elements are valid? Well, if we have seven valid tiles, we may consider the eighth and subsequently call solve9th:

solve8th ::  (a, b, c, d, Tile, Tile, Tile, Tile, Tile)
         -> [(a, b, c, d, Tile, Tile, Tile, Tile, Tile)]
solve8th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile8Matches t5 t7) $ rotations t8
  solve9th (t1, t2, t3, t4, t5, t6, t7, match, t9)

This function looks a lot like solve9th, but it instead enumerates the four rotations of the eighth tile t8 and filters with the tile8Matches predicate. Due to the do notation, it'll only call solve9th if it finds a match.

Once more, this function assumes that the first seven tiles are already in a legal constellation. How do we find seven valid tiles? The same way we find eight: By assuming that we have six valid tiles, and then finding the seventh, and so on:

solve7th ::  (a, b, c, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(a, b, c, Tile, Tile, Tile, Tile, Tile, Tile)]
solve7th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile7Matches t4) $ rotations t7
  solve8th (t1, t2, t3, t4, t5, t6, match, t8, t9)

solve6th ::  (a, b, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(a, b, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve6th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile6Matches t3 t5) $ rotations t6
  solve7th (t1, t2, t3, t4, t5, match, t7, t8, t9)

solve5th ::  (a, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(a, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve5th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile5Matches t2 t4) $ rotations t5
  solve6th (t1, t2, t3, t4, match, t6, t7, t8, t9)

solve4th ::  (Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve4th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile4Matches t1) $ rotations t4
  solve5th (t1, t2, t3, match, t5, t6, t7, t8, t9)

solve3rd ::  (Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve3rd (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile3Matches t2) $ rotations t3
  solve4th (t1, t2, match, t4, t5, t6, t7, t8, t9)

solve2nd ::  (Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve2nd (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- filter (tile2Matches t1) $ rotations t2
  solve3rd (t1, match, t3, t4, t5, t6, t7, t8, t9)

You'll observe that solve7th down to solve2nd are very similar. The only things that really vary are the predicates, and the positions of the tile being examined, as well as its neighbours. Clearly I can generalize this code, but I'm not sure it's worth it. I wrote a few of these in the order I've presented them here, because it helped me think the problem through, and to be honest, once I had two or three of them, GitHub Copilot picked up on the pattern and wrote the remaining functions for me.

Granted, typing isn't a programming bottleneck, so we should rather ask if this kind of duplication looks like a maintenance problem. Given that this is a one-time exercise, I'll just leave it be and move on.

In particular, if you're struggling to understand how this implements the 'truncated search tree', keep in mind that, e.g., solve5th is likely to produce no valid match, in which case it'll never call solve6th. The same may happen in solve6th, etc.

The 'top' function is a bit different because it doesn't need to filter anything:

solve1st ::  (Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)
         -> [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solve1st (t1, t2, t3, t4, t5, t6, t7, t8, t9) = do
  match <- rotations t1
  solve2nd (match, t2, t3, t4, t5, t6, t7, t8, t9)

In the first position, any tile in any rotation is legal, so solve1st only enumerates all four rotations of t1 and calls solve2nd for each.

The final step is to compose allPermutations with solve1st:

solutions :: [(Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)]
solutions = allPermutations >>= solve1st

Running this in GHCi on my 4½-year-old laptop produces all 16 solutions in approximately 22 seconds.

Evaluation #

Is that good performance? Well, it turns out that it's possible to substantially improve on the situation. As I've mentioned a couple of times, so far I've been running the program from GHCi, the Haskell REPL. Most of the 22 seconds are spent interpreting or compiling the code.

If I compile the code with some optimizations turned on, the executable runs in approximately 300 ms. That seems quite decent, if I may say so.

I can think of a few tweaks to the code that might conceivably improve things even more, but when I measure them, there's no discernible difference. Thus, I'll keep the code as shown here.

Here's one of the solutions:

The information on the box claims that there are two solutions. Why does the code shown here produce 16 solutions?

There's a good explanation for that. Recall that two of the tiles are identical. In the above solution picture, they're tiles 1 and 3, although they're rotated 90° in relation to each other. This implies that you could take tile 1, rotate it counter-clockwise and put it in position 3, while simultaneously taking tile 3, rotating it clockwise, and putting it in position 1. Visually, you can't tell the difference, so they don't count as two distinct solutions. The algorithm, however, doesn't make that distinction, so it enumerates what is effectively the same solution twice.

Not surprisingly, it turns out that all 16 solutions are doublets in that way. We can confirm that by evaluating length $ nub solutions, which returns 8.

Eight solutions are, however, still four times more than two. Can you figure out what's going on?

The algorithm also enumerates four rotations of each solution. Once we take this into account, there are only two visually distinct solutions left. One of them is shown above. I also have a picture of the other one, but I'm not going to totally spoil things for you.
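The rotational symmetry can be removed by choosing one canonical representative per rotation class and counting distinct representatives. Here's a hypothetical sketch on plain grids of labels. It only rotates the board. In the actual puzzle, rotating the board also rotates each tile, so a real version would need a tile-rotation step as well. The names rotateCW, rotations4, canonical, and distinctUpToRotation are made up for this illustration.

```haskell
import Data.List (nub, transpose)

-- Rotate a square grid (a list of rows) 90 degrees clockwise.
-- Simplification: the cells are plain labels, so nothing inside a cell rotates.
rotateCW :: [[a]] -> [[a]]
rotateCW = map reverse . transpose

-- All four rotations of a grid, starting with the grid itself.
rotations4 :: [[a]] -> [[[a]]]
rotations4 = take 4 . iterate rotateCW

-- Use the smallest rotation as the representative of its rotation class.
canonical :: Ord a => [[a]] -> [[a]]
canonical = minimum . rotations4

-- Count the boards that remain distinct once rotations are identified.
distinctUpToRotation :: Ord a => [[[a]]] -> Int
distinctUpToRotation = length . nub . map canonical
```

Applied to the four rotations of a single board, distinctUpToRotation returns 1. This is the same kind of division that takes the article's eight nub-distinct solutions down to two.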

Conclusion #

When I was eight, I might have had the time and the patience to actually lay the puzzle. Despite the incredibly bad odds, I vaguely remember finally solving it. There must be some more holistic processing going on in the brain, if even a kid can solve the puzzle, because it seems inconceivable that it should be done as described here.

Today, I don't care for that kind of puzzle in analog form, but I did, on the other hand, find it an interesting programming exercise.

The code could be smaller, but I like it as it is. While a bit on the verbose side, I think that it communicates well what's going on.

I was pleasantly surprised that I managed to get execution time down to 300 ms. I'd honestly not expected that when I started.

Comments

Andreas Källberg #

Thanks for a nice blog post! I found the challenge interesting, so I have written my own version of the code that both tries to be faster and removes the redundant solutions, so that it only generates two solutions in total. The code is available here. It executes in roughly 8 milliseconds both in ghci and compiled (and takes a second to compile and run using runghc) on my laptop.

In order to improve the performance, I start with a blank grid and add tiles one by one until it is no longer possible to do so, and then backtrack, kind of like how you would do it by hand. As a tiny bonus (I haven't actually measured whether it makes any practical difference), I also chose the order of filling in the grid so that the tiles constrain each other as much as possible, by filling 2-by-2 squares as early as possible. I have, however, calculated the number of boards explored in each of the two variations. With a spiral order, 6852 boards are explored, while with a linear order, 9332 boards are explored.

In order to eliminate rotational symmetry, I start by filling the center square and fixing its rotation, rather than trying all rotations for it, since we could view any initial rotation of the center square as equivalent to rotating the whole board. In order to eliminate the identical solutions from the two identical tiles, I changed the encoding to use a number next to the tile to say how many copies are left of it, so when we choose a tile, there is only a single way to choose each tile, even if there are multiple copies of it. Both of these would also in theory make the code slightly faster if the time wasn't already dominated by general IO and other unrelated things.
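The count-based encoding described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's actual code. A bag is a list of (item, copies) pairs with positive counts, and pickFromBag and bagPermutations are hypothetical names. Each step offers every distinct item only once, so two identical tiles can't give rise to duplicate arrangements.

```haskell
-- Choose one item from a bag encoded as (item, copies) pairs.
-- Each distinct item is offered once, however many copies remain.
-- Assumes every count is positive; a zero count would still be offered.
pickFromBag :: [(a, Int)] -> [(a, [(a, Int)])]
pickFromBag [] = []
pickFromBag ((x, n) : rest) =
  (x, remaining) : [(y, (x, n) : rest') | (y, rest') <- pickFromBag rest]
  where
    -- Decrement the count, dropping the entry when the last copy is used.
    remaining = if n > 1 then (x, n - 1) : rest else rest

-- All distinct orderings of the bag's contents.
bagPermutations :: [(a, Int)] -> [[a]]
bagPermutations [] = [[]]
bagPermutations bag = do
  (x, rest) <- pickFromBag bag
  xs <- bagPermutations rest
  return (x : xs)
```

For example, bagPermutations [('a', 2), ('b', 1)] produces "aab", "aba", and "baa". Permuting the three letters as if they were distinct items would produce six orderings.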

I also added various pretty printing and tracing utilities to the code, so you can see exactly how it executes and which partial solutions it explores.

2024-10-16 00:32 UTC

Mark Seemann #

Thank you for writing. I did try filling the two-by-two square first, as you suggest, but in isolation it makes no discernible difference.

I haven't tried your two other optimizations. The one to eliminate rotations should, I guess, reduce the search space to a fourth of mine, unless I'm mistaken. That would reduce my 300 ms to approximately 75 ms.

I can't easily guess how much time the other optimization shaves off, but it could be the one that makes the bigger difference.

2024-10-19 08:21 UTC

This blog is totally free, but if you like it, please consider supporting it.

FSZipper in C#

2024-09-23T06:13:00+00:00

Another functional model of a file system, with code examples in C#.

This article is part of a series about Zippers. In this one, I port the FSZipper data structure from the Learn You a Haskell for Great Good! article Zippers.

A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better than it's done there. Additionally, I'll make heavy use of certain standard constructs to port Haskell code, most notably Church encoding to model sum types in languages that don't natively have them, such as C#. In some cases, I'll implement the Church encoding using the data structure's catamorphism. Since the cyclomatic complexity of the resulting code is quite low, you may be able to follow what's going on even if you don't know what Church encoding or catamorphisms are, but if you want to understand the background and motivation for that style of programming, you can consult the cited resources.

The code shown in this article is available on GitHub.

File system item initialization and structure #

If you haven't already noticed, Haskell (and other statically typed functional programming languages like F#) makes heavy use of sum types, and the FSZipper example is no exception. It starts with a one-liner to define a file system item, which may be either a file or a folder. In C# we must instead use a class:

public sealed class FSItem

Contrary to the two previous examples, the FSItem class has no generic type parameter. This is because I'm following the Haskell example code as closely as possible, but as I've previously shown, you can model a file hierarchy with a general-purpose rose tree.

Staying consistent with the two previous articles, I'll use Church encoding to model a sum type, and as discussed in the previous article I use a private implementation for that.

private readonly IFSItem imp;
 
private FSItem(IFSItem imp)
{
    this.imp = imp;
}
 
public static FSItem CreateFile(string name, string data)
{
    return new(new File(name, data));
}
 
public static FSItem CreateFolder(string name, IReadOnlyCollection<FSItem> items)
{
    return new(new Folder(name, items));
}

Two static creation methods enable client developers to create a single FSItem object, or an entire tree, like the example from the Haskell code, here ported to C#:

private static readonly FSItem myDisk =
    FSItem.CreateFolder("root",
    [
        FSItem.CreateFile("goat_yelling_like_man.wmv", "baaaaaa"),
        FSItem.CreateFile("pope_time.avi", "god bless"),
        FSItem.CreateFolder("pics",
        [
            FSItem.CreateFile("ape_throwing_up.jpg", "bleargh"),
            FSItem.CreateFile("watermelon_smash.gif", "smash!!"),
            FSItem.CreateFile("skull_man(scary).bmp", "Yikes!")
        ]),
        FSItem.CreateFile("dijon_poupon.doc", "best mustard"),
        FSItem.CreateFolder("programs",
        [
            FSItem.CreateFile("fartwizard.exe", "10gotofart"),
            FSItem.CreateFile("owl_bandit.dmg", "mov eax, h00t"),
            FSItem.CreateFile("not_a_virus.exe", "really not a virus"),
            FSItem.CreateFolder("source code",
            [
                FSItem.CreateFile("best_hs_prog.hs", "main = print (fix error)"),
                FSItem.CreateFile("random.hs", "main = print 4")
            ])
        ])
    ]);

Since the imp class field is just a private implementation detail, a client developer needs a way to query an FSItem object about its contents.

File system item catamorphism #

Just like the previous article, I'll start with the catamorphism. This is essentially the rose tree catamorphism, just less generic, since FSItem doesn't have a generic type parameter.

public TResult Aggregate<TResult>(
    Func<string, string, TResult> whenFile,
    Func<string, IReadOnlyCollection<TResult>, TResult> whenFolder)
{
    return imp.Aggregate(whenFile, whenFolder);
}

The Aggregate method delegates to its internal implementation class field, which is defined as the private nested interface IFSItem:

private interface IFSItem
{
    TResult Aggregate<TResult>(
        Func<string, string, TResult> whenFile,
        Func<string, IReadOnlyCollection<TResult>, TResult> whenFolder);
}

As discussed in the previous article, the interface is hidden away because it's only a vehicle for polymorphism. It's not intended to be used by client developers (although that would be benign), nor to be implemented by them (which could break encapsulation). There are, and should only ever be, two implementations. The one that represents a file is the simplest:

private sealed record File(string Name, string Data) : IFSItem
{
    public TResult Aggregate<TResult>(
        Func<string, string, TResult> whenFile,
        Func<string, IReadOnlyCollection<TResult>, TResult> whenFolder)
    {
        return whenFile(Name, Data);
    }
}

The File record's Aggregate method unconditionally calls the supplied whenFile function argument with the Name and Data that were originally supplied via its constructor.

The Folder implementation is a bit trickier, mostly due to its recursive nature, but also because I wanted it to have structural equality.

private sealed class Folder : IFSItem
{
    private readonly string name;
    private readonly IReadOnlyCollection<FSItem> items;
 
    public Folder(string Name, IReadOnlyCollection<FSItem> Items)
    {
        name = Name;
        items = Items;
    }
 
    public TResult Aggregate<TResult>(
        Func<string, string, TResult> whenFile,
        Func<string, IReadOnlyCollection<TResult>, TResult> whenFolder)
    {
        return whenFolder(
            name,
            items.Select(i => i.Aggregate(whenFile, whenFolder)).ToList());
    }
 
    public override bool Equals(object? obj)
    {
        return obj is Folder folder &&
               name == folder.name &&
               items.SequenceEqual(folder.items);
    }
 
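    public override int GetHashCode()
    {
        // Hash the collection's size rather than its reference, so that
        // structurally equal folders also get equal hash codes.
        return HashCode.Combine(name, items.Count);
    }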
}

It, too, unconditionally calls one of the two functions passed to its Aggregate method, but this time whenFolder. It does that, however, by first recursively calling Aggregate within a Select expression. It needs to do that because the whenFolder function expects the subtree to have been already converted to values of the TResult return type. This is a common pattern with catamorphisms, and takes a bit of time getting used to. You can see similar examples in the articles Tree catamorphism, Rose tree catamorphism, Full binary tree catamorphism, as well as the previous one in this series.
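For comparison, here's a minimal Haskell sketch of the same catamorphism. The FSItem type is the one from the Learn You a Haskell article, with Eq added so that values can be compared. The names foldFSItem and countFiles are hypothetical. As in the C# Folder.Aggregate, the children are converted before whenFolder is called.

```haskell
type Name = String
type Data = String

-- The file system item from 'Learn You a Haskell', with Eq added for testing.
data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

-- Catamorphism: replace File with whenFile and Folder with whenFolder,
-- recursively converting the subtree first (like the C# Select call).
foldFSItem :: (Name -> Data -> r) -> (Name -> [r] -> r) -> FSItem -> r
foldFSItem whenFile _ (File n d) = whenFile n d
foldFSItem whenFile whenFolder (Folder n items) =
  whenFolder n (map (foldFSItem whenFile whenFolder) items)

-- Example use: count the files in a tree.
countFiles :: FSItem -> Int
countFiles = foldFSItem (\_ _ -> 1) (\_ counts -> sum counts)
```

Folding with the constructors themselves, foldFSItem File Folder, returns the original tree unchanged. This is a quick sanity check that a fold really is a catamorphism.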

I also had to make Folder a class rather than a record, because I wanted the type to have structural equality, and you can't override Equals on records (and if the base class library has any collection type with structural equality, I'm not aware of it).

File system item Church encoding #

True to the structure of the previous article, the catamorphism doesn't look quite like a Church encoding, but it's possible to define the latter from the former.

public TResult Match<TResult>(
    Func<string, string, TResult> whenFile,
    Func<string, IReadOnlyCollection<FSItem>, TResult> whenFolder)
{
    return Aggregate(
        whenFile: (name, data) =>
            (item: CreateFile(name, data), result: whenFile(name, data)),
        whenFolder: (name, pairs) =>
        {
            var items = pairs.Select(i => i.item).ToList();
            return (CreateFolder(name, items), whenFolder(name, items));
        }).result;
}

The trick is the same as in the previous article: Build up an intermediate tuple that contains both the current item as well as the result being accumulated. Once the Aggregate method returns, the Match method returns only the result part of the resulting tuple.

I implemented the whenFolder expression as a code block, because both tuple elements needed the items collection. You can inline the Select expression, but that would cause it to run twice. That's probably a premature optimization, but it also made the code a bit shorter, and, one may hope, a bit more readable.
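The tuple trick translates directly to Haskell. This sketch includes a hypothetical foldFSItem catamorphism so that it compiles on its own. The name matchFSItem is likewise made up. One difference seems worth noting. The C# Match calls whenFolder at every folder level during the aggregation. In Haskell, laziness means that only the root's result is ever forced.

```haskell
type Name = String
type Data = String

data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

-- Hypothetical catamorphism, included so that this block is self-contained.
foldFSItem :: (Name -> Data -> r) -> (Name -> [r] -> r) -> FSItem -> r
foldFSItem whenFile _ (File n d) = whenFile n d
foldFSItem whenFile whenFolder (Folder n items) =
  whenFolder n (map (foldFSItem whenFile whenFolder) items)

-- Church-style match derived from the fold: rebuild each item alongside
-- its result, then keep only the result of the root.
matchFSItem :: (Name -> Data -> r) -> (Name -> [FSItem] -> r) -> FSItem -> r
matchFSItem whenFile whenFolder = snd . foldFSItem onFile onFolder
  where
    onFile n d = (File n d, whenFile n d)
    onFolder n pairs =
      let items = map fst pairs -- rebuilt children, shared by both elements
      in (Folder n items, whenFolder n items)
```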

File system breadcrumb #

Finally, things seem to be becoming a little easier. The port of FSCrumb is straightforward.

public sealed class FSCrumb
{
    public FSCrumb(
        string name,
        IReadOnlyCollection<FSItem> left,
        IReadOnlyCollection<FSItem> right)
    {
        Name = name;
        Left = left;
        Right = right;
    }
 
    public string Name { get; }
    public IReadOnlyCollection<FSItem> Left { get; }
    public IReadOnlyCollection<FSItem> Right { get; }
 
    public override bool Equals(object? obj)
    {
        return obj is FSCrumb crumb &&
               Name == crumb.Name &&
               Left.SequenceEqual(crumb.Left) &&
               Right.SequenceEqual(crumb.Right);
    }
 
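    public override int GetHashCode()
    {
        // Hash the collections' sizes rather than their references, so that
        // structurally equal crumbs also get equal hash codes.
        return HashCode.Combine(Name, Left.Count, Right.Count);
    }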
}

The only reason this isn't a record is, once again, that I want to override Equals so that the type can have structural equality. Visual Studio wants me to convert to a primary constructor. That would simplify the code a bit, but actually not that much.

(I'm still somewhat conservative in my choice of new C# language features. Not that I have anything against primary constructors which, after all, F# has had forever. I'm holding back for didactic reasons: Not every reader is on the latest language version, and some readers may be using another programming language entirely. On the other hand, primary constructors seem natural and intuitive, so I may start using them here on the blog as well. I don't think that they're going to be much of a barrier to understanding.)

Now that we have both the data type we want to zip, as well as the breadcrumb type we need, we can proceed to add the Zipper.

File system Zipper #

The FSZipper C# class fills the position of the eponymous Haskell type alias. Its data structure and initialization are straightforward.

public sealed class FSZipper
{
    private FSZipper(FSItem fSItem, IReadOnlyCollection<FSCrumb> breadcrumbs)
    {
        FSItem = fSItem;
        Breadcrumbs = breadcrumbs;
    }
 
    public FSZipper(FSItem fSItem) : this(fSItem, [])
    {
    }
 
    public FSItem FSItem { get; }
    public IReadOnlyCollection<FSCrumb> Breadcrumbs { get; }
 
    // Methods follow here...

True to the style I've already established, I've made the master constructor private in order to highlight that the Breadcrumbs are the responsibility of the FSZipper class itself. It's not something client code need worry about.

Going down #

The Haskell Zippers article introduces fsUp before fsTo, but if we want to see some example code, we need to navigate to somewhere before we can navigate up. Thus, I'll instead start with the function that navigates to a child node.

public FSZipper? GoTo(string name)
{
    return FSItem.Match(
        (_, _) => null,
        (folderName, items) =>
        {
            FSItem? item = null;
            var ls = new List<FSItem>();
            var rs = new List<FSItem>();
            foreach (var i in items)
            {
                if (item is null && i.IsNamed(name))
                    item = i;
                else if (item is null)
                    ls.Add(i);
                else
                    rs.Add(i);
            }
 
            if (item is null)
                return null;
 
            return new FSZipper(
                item,
                Breadcrumbs.Prepend(new FSCrumb(folderName, ls, rs)).ToList());
        });
}

This is by far the most complicated navigation we've seen so far, and I've even taken the liberty of writing an imperative implementation. It's not that I don't know how I could implement it in a purely functional fashion, but I've chosen this implementation for a couple of reasons. The first is that, frankly, it was easier this way.

This stems from the second reason: That the .NET base class library, as far as I know, offers no functionality like Haskell's break function. I could have written such a function myself, but felt that it was too much of a digression, even for me. Maybe I'll do that another day. It might make for a nice little exercise.
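For comparison, here's how break does the splitting that the C# loop performs by hand. The sketch is adapted from the Learn You a Haskell fsTo function, with one adaptation. It returns Maybe instead of failing on a missing name, which mirrors the nullable C# return type. The type definitions are repeated so that the block compiles on its own.

```haskell
type Name = String
type Data = String

data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

data FSCrumb = FSCrumb Name [FSItem] [FSItem]
  deriving (Eq, Show)

type FSZipper = (FSItem, [FSCrumb])

nameIs :: Name -> FSItem -> Bool
nameIs name (File n _) = name == n
nameIs name (Folder n _) = name == n

-- 'break' splits the items at the first one with the desired name:
-- everything before it, and a list that starts with the match (if any).
fsTo :: Name -> FSZipper -> Maybe FSZipper
fsTo name (Folder folderName items, bs) =
  case break (nameIs name) items of
    (ls, item : rs) -> Just (item, FSCrumb folderName ls rs : bs)
    (_, []) -> Nothing -- no item with that name
fsTo _ (File _ _, _) = Nothing -- can't navigate into a file
```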

The third reason is that C# doesn't afford pattern matching on sequences, in the shape of destructuring the head and the tail of a list. (Not that I know of, anyway, but that language changes rapidly at the moment, and it does have some pattern-matching features now.) This means that I have to check item for null anyway.

In any case, while the implementation is imperative, an external caller can't tell. The GoTo method is still referentially transparent. Which means that it fits in your head.

You may have noticed that the implementation calls IsNamed, which is also new.

public bool IsNamed(string name)
{
    return Match((n, _) => n == name, (n, _) => n == name);
}

This is an instance method I added to FSItem.

In summary, the GoTo method enables client code to navigate down in the file hierarchy, as this unit test demonstrates:

[Fact]
public void GoToSkullMan()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.GoTo("skull_man(scary).bmp");
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFile("skull_man(scary).bmp", "Yikes!"),
        actual.FSItem);
}

The example is elementary. First go to the pics folder, and from there to the skull_man(scary).bmp file.

Going up #

Going back up the hierarchy isn't as complicated.

public FSZipper? GoUp()
{
    if (Breadcrumbs.Count == 0)
        return null;
 
    var head = Breadcrumbs.First();
    var tail = Breadcrumbs.Skip(1);
 
    return new FSZipper(
        FSItem.CreateFolder(head.Name, [.. head.Left, FSItem, .. head.Right]),
        tail.ToList());
}

If the Breadcrumbs collection is empty, we're already at the root, in which case we can't go further up. In that case, the GoUp method returns null, as does the GoTo method if it can't find an item with the desired name. This possibility is explicitly indicated by the FSZipper? return type; notice the question mark, which indicates that the value may be null. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the Maybe monad (which is also what you'd idiomatically do in Haskell).

If Breadcrumbs is not empty, it means that there's a place to go up to. It also implies that the previous operation navigated down, and the only way that's possible is if the previous node was a folder. Thus, the GoUp method knows that it needs to reconstitute a folder, and from the head breadcrumb, it knows that folder's name, and what was originally to the Left and Right of the Zipper's FSItem property.
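A Maybe-returning variant of the Learn You a Haskell fsUp makes the same two cases explicit. An empty breadcrumb list means that the focus is already at the root. Otherwise, the head crumb has what's needed to reconstitute the parent folder. The Maybe return type is an adaptation, since the original function simply assumes that there's a crumb.

```haskell
type Name = String
type Data = String

data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

data FSCrumb = FSCrumb Name [FSItem] [FSItem]
  deriving (Eq, Show)

type FSZipper = (FSItem, [FSCrumb])

fsUp :: FSZipper -> Maybe FSZipper
fsUp (_, []) = Nothing -- already at the root
fsUp (item, FSCrumb name ls rs : bs) =
  -- Put the focused item back between its left and right siblings.
  Just (Folder name (ls ++ [item] ++ rs), bs)
```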

This unit test demonstrates how client code may use the GoUp method:

[Fact]
public void GoUpFromSkullMan()
{
    var sut = new FSZipper(myDisk);
    // This is the same as the GoToSkullMan test
    var newFocus = sut.GoTo("pics")?.GoTo("skull_man(scary).bmp");
 
    var actual = newFocus?.GoUp()?.GoTo("watermelon_smash.gif");
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFile("watermelon_smash.gif", "smash!!"),
        actual.FSItem);
}

This test first repeats the navigation also performed by the other test, then uses GoUp to go one level up, which finally enables it to navigate to the watermelon_smash.gif file.

Renaming a file or folder #

A Zipper enables you to navigate a data structure, but you can also use it to modify the element in focus. One option is to rename a file or folder.

public FSZipper Rename(string newName)
{
    return new FSZipper(
        FSItem.Match(
            (_, dat) => FSItem.CreateFile(newName, dat),
            (_, items) => FSItem.CreateFolder(newName, items)),
        Breadcrumbs);
}

The Rename method 'pattern-matches' on the 'current' FSItem and in both cases creates a new file or folder with the new name. Since it doesn't need the old name for anything, it uses the wildcard pattern to ignore that value. This operation is always possible, so the return type is FSZipper, without a question mark, indicating that the method never returns null.
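The Haskell original pattern-matches in the same way. Here's a sketch along the lines of the Learn You a Haskell fsRename. The type definitions are repeated so that it compiles on its own. Like the C# Rename, it ignores the old name and can't fail.

```haskell
type Name = String
type Data = String

data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

data FSCrumb = FSCrumb Name [FSItem] [FSItem]
  deriving (Eq, Show)

type FSZipper = (FSItem, [FSCrumb])

-- Replace the name of the focused item, keeping its contents and the crumbs.
fsRename :: Name -> FSZipper -> FSZipper
fsRename newName (File _ dat, bs) = (File newName dat, bs)
fsRename newName (Folder _ items, bs) = (Folder newName items, bs)
```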

The following unit test replicates the Haskell article's example by renaming the pics folder to cspi.

[Fact]
public void RenamePics()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.Rename("cspi").GoUp();
 
    Assert.NotNull(actual);
    Assert.Empty(actual.Breadcrumbs);
    Assert.Equal(
        FSItem.CreateFolder("root",
        [
            FSItem.CreateFile("goat_yelling_like_man.wmv", "baaaaaa"),
            FSItem.CreateFile("pope_time.avi", "god bless"),
            FSItem.CreateFolder("cspi",
            [
                FSItem.CreateFile("ape_throwing_up.jpg", "bleargh"),
                FSItem.CreateFile("watermelon_smash.gif", "smash!!"),
                FSItem.CreateFile("skull_man(scary).bmp", "Yikes!")
            ]),
            FSItem.CreateFile("dijon_poupon.doc", "best mustard"),
            FSItem.CreateFolder("programs",
            [
                FSItem.CreateFile("fartwizard.exe", "10gotofart"),
                FSItem.CreateFile("owl_bandit.dmg", "mov eax, h00t"),
                FSItem.CreateFile("not_a_virus.exe", "really not a virus"),
                FSItem.CreateFolder("source code",
                [
                    FSItem.CreateFile("best_hs_prog.hs", "main = print (fix error)"),
                    FSItem.CreateFile("random.hs", "main = print 4")
                ])
            ])
        ]),
        actual.FSItem);
}

Since the test uses GoUp after Rename, the actual value contains the entire tree, while the Breadcrumbs collection is empty.

Adding a new file #

Finally, we can add a new file to a folder.

public FSZipper? Add(FSItem item)
{
    return FSItem.Match<FSZipper?>(
        whenFile: (_, _) => null,
        whenFolder: (name, items) => new FSZipper(
            FSItem.CreateFolder(name, items.Prepend(item).ToList()),
            Breadcrumbs));
}

This operation may fail, since we can't add a file to a file. This is, again, clearly indicated by the return type, which allows null.

This implementation adds the file to the start of the folder, but it would also be possible to add it at the end. I would consider that slightly more idiomatic in C#, but here I've followed the Haskell example code, which conses the new item to the beginning of the list. As is idiomatic in Haskell.
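For comparison, here's a sketch of the corresponding Haskell function, adapted from the Learn You a Haskell fsNewFile. One change is an adaptation: it returns Nothing when the focus is a file, mirroring the nullable C# return type. The original function has no case for files.

```haskell
type Name = String
type Data = String

data FSItem = File Name Data | Folder Name [FSItem]
  deriving (Eq, Show)

data FSCrumb = FSCrumb Name [FSItem] [FSItem]
  deriving (Eq, Show)

type FSZipper = (FSItem, [FSCrumb])

fsNewFile :: FSItem -> FSZipper -> Maybe FSZipper
fsNewFile _ (File _ _, _) = Nothing -- can't add an item to a file
fsNewFile item (Folder name items, bs) =
  -- Cons the new item to the front; items ++ [item] would append it instead.
  Just (Folder name (item : items), bs)
```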

The following unit test reproduces the Haskell article's example.

[Fact]
public void AddPic()
{
    var sut = new FSZipper(myDisk);
 
    var actual = sut.GoTo("pics")?.Add(FSItem.CreateFile("heh.jpg", "lol"))?.GoUp();
 
    Assert.NotNull(actual);
    Assert.Equal(
        FSItem.CreateFolder("root",
        [
            FSItem.CreateFile("goat_yelling_like_man.wmv", "baaaaaa"),
            FSItem.CreateFile("pope_time.avi", "god bless"),
            FSItem.CreateFolder("pics",
            [
                FSItem.CreateFile("heh.jpg", "lol"),
                FSItem.CreateFile("ape_throwing_up.jpg", "bleargh"),
                FSItem.CreateFile("watermelon_smash.gif", "smash!!"),
                FSItem.CreateFile("skull_man(scary).bmp", "Yikes!")
            ]),
            FSItem.CreateFile("dijon_poupon.doc", "best mustard"),
            FSItem.CreateFolder("programs",
            [
                FSItem.CreateFile("fartwizard.exe", "10gotofart"),
                FSItem.CreateFile("owl_bandit.dmg", "mov eax, h00t"),
                FSItem.CreateFile("not_a_virus.exe", "really not a virus"),
                FSItem.CreateFolder("source code",
                [
                    FSItem.CreateFile("best_hs_prog.hs", "main = print (fix error)"),
                    FSItem.CreateFile("random.hs", "main = print 4")
                ])
            ])
        ]),
        actual.FSItem);
    Assert.Empty(actual.Breadcrumbs);
}

This example also follows the edit with a GoUp call, with the effect that the Zipper is once more focused on the entire tree. The assertion verifies that the new heh.jpg file is the first file in the pics folder.

Conclusion #

The code for FSZipper is actually a bit simpler than for the binary tree. This, I think, is mostly attributable to the FSZipper having fewer constituent sum types. While sum types are trivial, and extraordinarily useful in languages that natively support them, they require a lot of boilerplate in a language like C#.

Do you need something like FSZipper in C#? Probably not. As I've already discussed, this article series mostly exists as a programming exercise.


Functor products

2024-09-16T06:08:00+00:00

A tuple or class of functors is also a functor. An article for object-oriented developers.

This article is part of a series of articles about functor relationships. In this one you'll learn about a universal composition of functors. In short, if you have a product type of functors, that data structure itself gives rise to a functor.

Together with other articles in this series, this result can help you answer questions such as: Does this data structure form a functor?

Since functors tend to be quite common, and since they're useful enough that many programming languages have special support or syntax for them, the ability to recognize a potential functor can be useful. Given a type like Foo<T> (C# syntax) or Bar<T1, T2>, being able to recognize it as a functor can come in handy. One scenario is if you yourself have just defined such a data type. Recognizing that it's a functor strongly suggests that you should give it a Select method in C#, a map function in F#, and so on.

Not all generic types give rise to a (covariant) functor. Some are rather contravariant functors, and some are invariant.

If, on the other hand, you have a data type which is a product of two or more (covariant) functors with the same type parameter, then the data type itself gives rise to a functor. You'll see some examples in this article.

Abstract shape #

Before we look at some examples found in other code, it helps if we know what we're looking for. Most (if not all?) languages support product types. In canonical form, they're just tuples of values, but in an object-oriented language like C#, such types are typically classes.

Imagine that you have two functors F and G, and you're now considering a data structure that contains a value of both types.

public sealed class FAndG<T>
{
    public FAndG(F<T> f, G<T> g)
    {
        F = f;
        G = g;
    }
 
    public F<T> F { get; }
    public G<T> G { get; }
 
    // Methods go here...

The name of the type is FAndG<T> because it contains both an F<T> object and a G<T> object.

Notice that it's an essential requirement that the individual functors (here F and G) are parametrized by the same type parameter (here T). If your data structure contains F<T1> and G<T2>, the following 'theorem' doesn't apply.

The point of this article is that such an FAndG<T> data structure forms a functor. The Select implementation is quite unsurprising:

public FAndG<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new FAndG<TResult>(F.Select(selector), G.Select(selector));
}

Since we've assumed that both F and G already are functors, they must come with some projection function. In C# it's idiomatically called Select, while in F# it'd typically be called map:

// ('a -> 'b) -> FAndG<'a> -> FAndG<'b>
let map f fandg = { F = F.map f fandg.F; G = G.map f fandg.G }

assuming a record type like

type FAndG<'a> = { F : F<'a>; G : G<'a> }

In both the C# Select example and the F# map function, the composed functor passes the function argument (selector or f) to both F and G and uses it to map both constituents. It then composes a new product from these individual results.
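In Haskell, the same shape can be sketched with concrete functors. The following assumes that F is a list and G is Maybe. Both choices are just for illustration. The test checks the two functor laws on a sample value. The base library also provides the general version as Product in Data.Functor.Product, whose constructor is Pair.

```haskell
import Data.Functor.Product (Product (Pair))

-- A concrete product of two functors that share the type parameter 'a':
-- a list (playing F) and a Maybe (playing G).
data FAndG a = FAndG {fPart :: [a], gPart :: Maybe a}
  deriving (Eq, Show)

instance Functor FAndG where
  -- Map both constituents with the same function and rebuild the product.
  fmap f (FAndG xs m) = FAndG (fmap f xs) (fmap f m)

-- The general version from base, specialised to the same two functors.
doubled :: Product [] Maybe Int
doubled = fmap (* 2) (Pair [1, 2, 3] (Just 4))
```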

I'll have more to say about how this generalizes to a product of more than two functors, but first, let's consider some examples.

List Zipper #

One of the simplest examples I can think of is a List Zipper, which in Haskell is nothing but a type alias of a tuple of lists:

type ListZipper a = ([a],[a])

In the article A List Zipper in C# you saw how the ListZipper<T> class composes two IEnumerable<T> objects.

private readonly IEnumerable<T> values;
public IEnumerable<T> Breadcrumbs { get; }
 
private ListZipper(IEnumerable<T> values, IEnumerable<T> breadcrumbs)
{
    this.values = values;
    Breadcrumbs = breadcrumbs;
}

Since we already know that sequences like IEnumerable<T> form functors, we now know that so must ListZipper<T>. And indeed, the Select implementation looks similar to the above 'shape outline'.

public ListZipper<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new ListZipper<TResult>(values.Select(selector), Breadcrumbs.Select(selector));
}

It passes the selector function to the Select method of both values and Breadcrumbs, and composes the results into a new ListZipper<TResult>.
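Since ListZipper is only a type alias in Haskell, it can't get its own Functor instance, but the mapping is still easy to write. The following minimal sketch uses a hypothetical mapListZipper helper. It relies on the Bifunctor instance for pairs from Data.Bifunctor to map both lists with the same function.

```haskell
import Data.Bifunctor (bimap)

type ListZipper a = ([a], [a])

-- Map the same function over both halves of the zipper.
mapListZipper :: (a -> b) -> ListZipper a -> ListZipper b
mapListZipper f = bimap (map f) (map f)
```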

While this example is straightforward, it may not be the most compelling, because ListZipper<T> composes two identical functors: IEnumerable<T>. The knowledge that functors compose is more general than that.

Non-empty collection #

Next after the above List Zipper, the simplest example I can think of is a non-empty list. On this blog I originally introduced it in the article Semigroups accumulate, but here I'll use the variant from NonEmpty catamorphism. It composes a single value of the type T with an IReadOnlyCollection<T>.

public NonEmptyCollection(T head, params T[] tail)
{
    if (head == null)
        throw new ArgumentNullException(nameof(head));
 
    this.Head = head;
    this.Tail = tail;
}
 
public T Head { get; }
 
public IReadOnlyCollection<T> Tail { get; }

The Tail, being an IReadOnlyCollection<T>, easily forms a functor, since it's a kind of list. But what about Head, which is a 'naked' T value? Does that form a functor? If so, which one?

Indeed, a 'naked' T value is isomorphic to the Identity functor. This situation is an example of how knowing about the Identity functor is useful, even if you never actually write code that uses it. Once you realize that T is equivalent to a functor, you've now established that NonEmptyCollection<T> composes two functors. Therefore, it must itself form a functor, and you realize that you can give it a Select method.

public NonEmptyCollection<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new NonEmptyCollection<TResult>(selector(Head), Tail.Select(selector).ToArray());
}

Notice that even though we understand that T is equivalent to the Identity functor, there's no reason to actually wrap Head in an Identity<T> container just to call Select on it and unwrap the result. Rather, the above Select implementation directly invokes selector with Head. It is, after all, a function that takes a T value as input and returns a TResult object as output.
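Here's a hypothetical Haskell sketch of the same data type that shows the Identity functor at work. The function fmapViaIdentity deliberately takes the detour through Identity to show that it gives the same result as the direct mapping. In practice, you'd just apply the function to the head. Haskell's base library already offers a non-empty list type, a :| [a], in Data.List.NonEmpty.

```haskell
import Data.Functor.Identity (Identity (..))

-- A head value plus a (possibly empty) tail.
data NonEmptyCollection a = NonEmptyCollection a [a]
  deriving (Eq, Show)

instance Functor NonEmptyCollection where
  -- The head is mapped directly; the tail is mapped as a list.
  fmap f (NonEmptyCollection h t) = NonEmptyCollection (f h) (map f t)

-- The same mapping, routed through Identity to show that the 'naked'
-- head behaves like the Identity functor.
fmapViaIdentity :: (a -> b) -> NonEmptyCollection a -> NonEmptyCollection b
fmapViaIdentity f (NonEmptyCollection h t) =
  NonEmptyCollection (runIdentity (fmap f (Identity h))) (map f t)
```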

Ranges #

It's hard to come up with an example that's both somewhat compelling and realistic, and at the same time prototypically pure. Stripped of all 'noise', functor products are just tuples, but that hardly makes for a compelling example. On the other hand, most other examples I can think of combine results about functors where they compose in more than one way. Not only as products, but also as sums of functors, as well as nested compositions. You'll be able to read about these in future articles, but for the next examples, you'll have to accept some claims about functors at face value.

In Range as a functor you saw how both Endpoint<T> and Range<T> are functors. The article shows functor implementations for each, in C#, F#, and Haskell. For now we'll ignore the deeper underlying reason why Endpoint<T> forms a functor, and instead focus on Range<T>.

In Haskell I never defined an explicit Range type, but rather just treated ranges as tuples. As stated repeatedly already, tuples are the essential products, so if you accept that Endpoint gives rise to a functor, then a 'range tuple' does, too.

In F# Range is defined like this:

type Range<'a> = { LowerBound : Endpoint<'a>; UpperBound : Endpoint<'a> }

Such a record type is also easily identified as a product type. In a sense, we can think of a record type as a 'tuple with metadata', where the metadata contains names of elements.

In C# Range<T> is a class with two Endpoint<T> fields.

private readonly Endpoint<T> min;
private readonly Endpoint<T> max;
 
public Range(Endpoint<T> min, Endpoint<T> max)
{
    this.min = min;
    this.max = max;
}

In a sense, you can think of such an immutable class as equivalent to a record type, only with substantially more ceremony. The point is that because a range is a product of two functors, it itself gives rise to a functor. You can see all the implementations in Range as a functor.
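As a minimal, self-contained sketch of that argument in C#, assume a simplified, hypothetical Endpoint<T> record that pairs a value with an inclusiveness flag. The article Range as a functor models endpoints differently. Once Endpoint<T> has a Select method, Range<T> gets one by mapping both of its constituents.

```csharp
using System;

// Hypothetical, simplified Endpoint: a value plus a flag, not the original sum type.
public sealed record Endpoint<T>(T Value, bool IsInclusive)
{
    // Map the value; the flag carries no T, so it passes through unchanged.
    public Endpoint<TResult> Select<TResult>(Func<T, TResult> selector) =>
        new(selector(Value), IsInclusive);
}

// A range is a product of two endpoints, so its Select maps each of them.
public sealed class Range<T>
{
    private readonly Endpoint<T> min;
    private readonly Endpoint<T> max;

    public Range(Endpoint<T> min, Endpoint<T> max)
    {
        this.min = min;
        this.max = max;
    }

    public Endpoint<T> Min => min;
    public Endpoint<T> Max => max;

    public Range<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Range<TResult>(min.Select(selector), max.Select(selector));
    }
}
```

Mapping i => i * 2 over a range from an inclusive 1 to an exclusive 10 would produce a range from an inclusive 2 to an exclusive 20.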

Binary tree Zipper #

In A Binary Tree Zipper in C# you saw that the BinaryTreeZipper<T> class has two properties:

public BinaryTree<T> Tree { get; }
public IEnumerable<Crumb<T>> Breadcrumbs { get; }

Both have the same generic type parameter T, so the question is whether BinaryTreeZipper<T> forms a functor. We now know that the answer is affirmative if BinaryTree<T> and IEnumerable<Crumb<T>> are both functors.

For now, believe me when I claim that this is the case. This means that you can add a Select method to the class:

public BinaryTreeZipper<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return new BinaryTreeZipper<TResult>(
        Tree.Select(selector),
        Breadcrumbs.Select(c => c.Select(selector)));
}

By now, this should hardly be surprising: Call Select on each constituent functor and create a proper return value from the results.

Higher arities #

All examples have involved products of only two functors, but the result generalizes to higher arities. To gain an understanding of why, consider that it's always possible to rewrite tuples of higher arities as nested pairs. As an example, a triple like (42, "foo", True) can be rewritten as (42, ("foo", True)) without loss of information. The latter representation is a pair (a two-tuple) where the first element is 42, but the second element is another pair. These two representations are isomorphic, meaning that we can go back and forth without losing data.

By induction you can generalize this result to any arity. The point is that the only data type you need to describe a product is a pair.
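In C#, the isomorphism can be sketched as a pair of generic functions. Nest and Unnest are hypothetical names chosen for this illustration. Composing them in either order gives back the value you started with, which is what makes the two representations interchangeable.

```csharp
// Converting between a triple and a nested pair, in both directions.
public static class TupleIsomorphism
{
    // (a, b, c) -> (a, (b, c))
    public static (T1, (T2, T3)) Nest<T1, T2, T3>((T1, T2, T3) triple) =>
        (triple.Item1, (triple.Item2, triple.Item3));

    // (a, (b, c)) -> (a, b, c)
    public static (T1, T2, T3) Unnest<T1, T2, T3>((T1, (T2, T3)) pair) =>
        (pair.Item1, pair.Item2.Item1, pair.Item2.Item2);
}
```

With these, Nest((42, "foo", true)) is (42, ("foo", true)), and Unnest takes you back again.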

Haskell's base library defines a specialized container called Product (in the Data.Functor.Product module) for this very purpose: If you have two Functor instances, you can Pair them up, and they become a single Functor.

Let's start with a Pair of Maybe and a list:

ghci> Pair (Just "foo") ["bar", "baz", "qux"]
Pair (Just "foo") ["bar","baz","qux"]

This is a single 'object', if you will, that composes those two Functor instances. This means that you can map over it:

ghci> elem 'b' <$> Pair (Just "foo") ["bar", "baz", "qux"]
Pair (Just False) [True,True,False]

Here I've used the infix <$> operator as an alternative to fmap. By mapping elem 'b' over the Pair, I'm asking every value inside the container whether or not it contains the character b. The Maybe value doesn't, while the first two list elements do.

If you want to compose three, rather than two, Functor instances, you nest the Pairs, just as you can nest tuples:

ghci> elem 'b' <$> Pair (Identity "quux") (Pair (Just "foo") ["bar", "baz", "qux"])
Pair (Identity False) (Pair (Just False) [True,True,False])

This example now introduces the Identity container as a third Functor instance. I could have used any other Functor instance instead of Identity, but many of them are more awkward to create or display. For example, the Reader and State functors have no Show instances in Haskell, meaning that GHCi doesn't know how to print them as values. Other Functor instances require more code to create; any non-trivial Tree, for example, takes up substantial editor space to express.

Conclusion #

A product of functors may itself be made a functor. The examples shown in this article are all constrained to two functors, but if you have a product of three, four, or more functors, that product still gives rise to a functor.

This is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such products into functors, in the way that Haskell's Product container almost does. Thus, knowing when you can safely give your generic types a Select method or map function may come in handy.

There are more rules like this one. The next article examines another.

Next: Functor sums.

This blog is totally free, but if you like it, please consider supporting it.

A Binary Tree Zipper in C#

2024-09-09T06:09:00+00:00

A port of another Haskell example, still just because.

This article is part of a series about Zippers. In this one, I port the Zipper data structure from the Learn You a Haskell for Great Good! article also called Zippers.

The code shown in this article is available on GitHub.

Binary tree initialization and structure #

In the Haskell code, the binary Tree type is a recursive sum type, defined on a single line of code. C#, on the other hand, has no built-in language construct that supports sum types, so a more elaborate solution is required. At least two options are available to us. One is to model a sum type as a Visitor. Another is to use Church encoding. In this article, I'll do the latter.

I find the type name (Tree) used in the Zippers article a bit too vague, and since I consider explicit better than implicit, I'll use a more precise class name:

public sealed class BinaryTree<T>

Even so, there are different kinds of binary trees. In a previous article I've shown a catamorphism for a full binary tree. This variation is not as strict, since it allows a node to have zero, one, or two children. Or, strictly speaking, a node always has exactly two children, but both, or one of them, may be empty. BinaryTree<T> uses Church encoding to distinguish between the two, but we'll return to that in a moment.

First, we'll examine how the class allows initialization:

private readonly IBinaryTree root;
 
private BinaryTree(IBinaryTree root)
{
    this.root = root;
}
 
public BinaryTree() : this(Empty.Instance)
{
}
 
public BinaryTree(T value, BinaryTree<T> left, BinaryTree<T> right)
    : this(new Node(value, left.root, right.root))
{
}

The class uses a private root object to implement behaviour, and constructor chaining for initialization. The master constructor is private, since the IBinaryTree interface is private. The parameterless constructor implicitly indicates an empty node, whereas the other public constructor indicates a node with a value and two children. Yes, I know that I just wrote that explicit is better than implicit, but it turns out that with C#'s target-typed new expressions, constructing trees in code becomes easier with this design choice:

BinaryTree<int> sut = new(
    42,
    new(),
    new(2, new(), new()));

As the variable name suggests, I've taken this code example from a unit test.

Private interface #

The class delegates method calls to the root field, which is an instance of the private, nested IBinaryTree interface:

private interface IBinaryTree
{
    TResult Aggregate<TResult>(
        Func<TResult> whenEmpty,
        Func<T, TResult, TResult, TResult> whenNode);
}

Why is IBinaryTree a private interface? Why does that interface even exist?

To be frank, I could have chosen another implementation strategy. Since there are only two mutually exclusive alternatives (node or empty), I could also have indicated which is which with a Boolean flag. You can see an example of that implementation tactic in the Table class in the sample code that accompanies Code That Fits in Your Head.

Using a Boolean flag, however, only works when there are exactly two choices. If you have three or more, things become more complicated. You could try to use an enum, but in most languages, these tend to be nothing but glorified integers, and are typically not type-safe. If you define a three-way enum, there's no guarantee that a value of that type takes only one of these three values, and a good compiler will typically insist that you check for any other value as well. The C# compiler certainly does.
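To illustrate the point about C# enums, here's a small sketch with a hypothetical three-way TrafficLight enum. Casting an arbitrary integer to the enum type compiles and runs, which is why a switch over it needs a fallback arm.

```csharp
using System;

// A hypothetical three-way enum.
public enum TrafficLight { Red, Amber, Green }

public static class TrafficLightDemo
{
    // The cast compiles for any int, whether or not it names a member.
    public static TrafficLight FromInt(int i) => (TrafficLight)i;

    public static string Describe(TrafficLight light) => light switch
    {
        TrafficLight.Red => "stop",
        TrafficLight.Amber => "get ready",
        TrafficLight.Green => "go",
        // Without this arm the compiler warns that the switch isn't exhaustive,
        // since values such as (TrafficLight)42 are legal.
        _ => throw new ArgumentOutOfRangeException(nameof(light), light, "Undefined traffic light."),
    };
}
```

Here, FromInt(42) yields a TrafficLight value that matches none of the three named members, and Describe has no choice but to reject it at run time.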

Church encoding offers a better alternative, but since it makes use of polymorphism, the most idiomatic choice in C# is either an interface or a base class. Since I favour interfaces over base classes, that's what I've chosen here, but for the purposes of this little digression, it makes no difference: The following argument applies to base classes as well.

An interface (or base class) suggests to users of an API that they can implement it in order to extend behaviour. That's an impression I don't wish to give client developers. The purpose of the interface is exclusively to enable double dispatch to work. There are only two implementations of the IBinaryTree interface, and under no circumstances should there be more.

The interface is an implementation detail, which is why both it, and its implementations, are private.

Binary tree catamorphism #

The IBinaryTree interface defines a catamorphism for the BinaryTree<T> class. Since we may often view a catamorphism as a sort of 'generalized fold', and since these kinds of operations in C# are typically called Aggregate, that's what I've called the method.

An aggregate function affords a way to traverse a data structure and collect information into a single value, here of type TResult. The return type may, however, be a complex type, including another BinaryTree<T>. You'll see examples of complex return values later in this article.

As already discussed, there are exactly two implementations of IBinaryTree. The one representing an empty node is the simplest:

private sealed class Empty : IBinaryTree
{
    public readonly static Empty Instance = new();
 
    private Empty()
    {
    }
 
    public TResult Aggregate<TResult>(
        Func<TResult> whenEmpty,
        Func<T, TResult, TResult, TResult> whenNode)
    {
        return whenEmpty();
    }
}

The Aggregate implementation unconditionally calls the supplied whenEmpty function, which returns some TResult value unknown to the Empty class.

Although not strictly necessary, I've made the class a Singleton. Since I like to take advantage of structural equality to write better tests, it was either that, or overriding Equals and GetHashCode.

The other implementation gets around that problem by being a record:

private sealed record Node(T Value, IBinaryTree Left, IBinaryTree Right) : IBinaryTree
{
    public TResult Aggregate<TResult>(
        Func<TResult> whenEmpty,
        Func<T, TResult, TResult, TResult> whenNode)
    {
        return whenNode(
            Value,
            Left.Aggregate(whenEmpty, whenNode),
            Right.Aggregate(whenEmpty, whenNode));
    }
}

It, too, unconditionally calls one of the two functions passed to its Aggregate method, but this time whenNode. It does that, however, by first recursively calling Aggregate on both Left and Right. It needs to do that because the whenNode function expects the subtrees to have been already converted to values of the TResult return type. This is a common pattern with catamorphisms, and takes a bit of time getting used to. You can see similar examples in the articles Tree catamorphism, Rose tree catamorphism, and Full binary tree catamorphism.

The BinaryTree<T> class defines a public Aggregate method that delegates to its root field:

public TResult Aggregate<TResult>(
    Func<TResult> whenEmpty,
    Func<T, TResult, TResult, TResult> whenNode)
{
    return root.Aggregate(whenEmpty, whenNode);
}
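With the public Aggregate method in place, client code can already answer many questions about a tree. The following self-contained sketch condenses the class above into one compilable unit and adds a few queries on top of it. The BinaryTreeQueries class and its methods are hypothetical additions, not part of the article's code. Each query only states what an empty tree is worth and how to combine a node's value with the results already computed for its subtrees.

```csharp
using System;

public sealed class BinaryTree<T>
{
    private readonly IBinaryTree root;

    private BinaryTree(IBinaryTree root) { this.root = root; }

    public BinaryTree() : this(Empty.Instance) { }

    public BinaryTree(T value, BinaryTree<T> left, BinaryTree<T> right)
        : this(new Node(value, left.root, right.root)) { }

    public TResult Aggregate<TResult>(
        Func<TResult> whenEmpty,
        Func<T, TResult, TResult, TResult> whenNode)
    {
        return root.Aggregate(whenEmpty, whenNode);
    }

    private interface IBinaryTree
    {
        TResult Aggregate<TResult>(
            Func<TResult> whenEmpty,
            Func<T, TResult, TResult, TResult> whenNode);
    }

    private sealed class Empty : IBinaryTree
    {
        public static readonly Empty Instance = new();

        private Empty() { }

        public TResult Aggregate<TResult>(
            Func<TResult> whenEmpty,
            Func<T, TResult, TResult, TResult> whenNode) => whenEmpty();
    }

    private sealed record Node(T Value, IBinaryTree Left, IBinaryTree Right) : IBinaryTree
    {
        // Fold both subtrees first, then hand their results to whenNode.
        public TResult Aggregate<TResult>(
            Func<TResult> whenEmpty,
            Func<T, TResult, TResult, TResult> whenNode) =>
            whenNode(
                Value,
                Left.Aggregate(whenEmpty, whenNode),
                Right.Aggregate(whenEmpty, whenNode));
    }
}

// Hypothetical queries written only in terms of Aggregate.
public static class BinaryTreeQueries
{
    // Empty counts as 0; a node is 1 plus its already-counted subtrees.
    public static int Count<T>(this BinaryTree<T> tree) =>
        tree.Aggregate(() => 0, (_, l, r) => 1 + l + r);

    // Empty has height 0; a node is one level above its taller subtree.
    public static int Height<T>(this BinaryTree<T> tree) =>
        tree.Aggregate(() => 0, (_, l, r) => 1 + Math.Max(l, r));

    public static int Sum(this BinaryTree<int> tree) =>
        tree.Aggregate(() => 0, (x, l, r) => x + l + r);
}
```

For the tree new(42, new(), new(2, new(), new(7, new(), new()))), these queries give a count of 3, a height of 3, and a sum of 51.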

The astute reader may now remark that the Aggregate method doesn't look like a Church encoding.

Binary tree Church encoding #

A Church encoding will typically have a Match method that enables client code to match on all the alternative cases in the sum type, without those confusing already-converted TResult values. It turns out that you can implement the desired Match method with the Aggregate method.

One of the advantages of doing meaningless coding exercises like this one is that you can pursue various ideas that interest you. One idea that interests me is the potential universality of catamorphisms. I conjecture that a catamorphism is an algebraic data type's universal API, and that you can implement all other methods or functions with it. I admit that I haven't done much research in the form of perusing existing literature, but at least it seems to be the case conspicuously often.

As it is here.

public TResult Match<TResult>(
    Func<TResult> whenEmpty,
    Func<T, BinaryTree<T>, BinaryTree<T>, TResult> whenNode)
{
    return root
        .Aggregate(
            () => (tree: new BinaryTree<T>(), result: whenEmpty()),
            (x, l, r) => (
                new BinaryTree<T>(x, l.tree, r.tree),
                whenNode(x, l.tree, r.tree)))
        .result;
}

Now, I readily admit that it took me a couple of hours tossing and turning in my bed before this solution came to me. I don't find it intuitive at all, but it works.

The Aggregate method requires that the whenNode function's left and right values are of the same TResult type as the return type. How do we reconcile that requirement with the Match method's variation, where its whenNode function requires the left and right values to be BinaryTree<T> values, while the return type is still TResult?

The way out of this conundrum, it turns out, is to combine both in a tuple. Thus, when Match calls Aggregate, the implied TResult type is not the TResult visible in the Match method declaration. Rather, it's inferred to be of the type (BinaryTree<T>, TResult). That is, a tuple where the first element is a BinaryTree<T> value, and the second element is a TResult value. The C# compiler's type inference engine then figures out that (BinaryTree<T>, TResult) must also be the return type of the Aggregate method call.

That's not what Match should return, but the second tuple element contains a value of the correct type, so it returns that. Since I've given the tuple elements names, the Match implementation accomplishes that by returning the result tuple field.
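The tuple trick isn't specific to trees. As an illustrative aside that isn't part of the article, here's a sketch on natural numbers, represented as non-negative int values with a hypothetical Nat.Aggregate fold. A fold only hands each step the result computed for the predecessor, but by pairing that result with the predecessor itself, Match can recover it. This is essentially the classic way of defining a predecessor function in terms of a fold.

```csharp
using System;

// Illustrative aside: natural numbers as non-negative ints.
public static class Nat
{
    // Catamorphism: apply 'succ' n times, starting from 'zero'.
    public static TResult Aggregate<TResult>(int n, TResult zero, Func<TResult, TResult> succ)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n));

        var acc = zero;
        for (var i = 0; i < n; i++)
            acc = succ(acc);
        return acc;
    }

    // Match in terms of Aggregate, using the same tuple trick as BinaryTree<T>.Match:
    // carry the number rebuilt so far next to the result computed for it.
    public static TResult Match<TResult>(int n, Func<TResult> whenZero, Func<int, TResult> whenSucc)
    {
        return Aggregate(
                n,
                (nat: 0, result: whenZero()),
                p => (p.nat + 1, whenSucc(p.nat)))
            .result;
    }
}
```

With this, Nat.Match(5, () => -1, pred => pred) returns 4. Because a result is computed alongside every intermediate number, whenSucc runs once per step of the fold rather than just once; BinaryTree<T>.Match likewise invokes its callbacks for every subtree. That's harmless for pure functions, but worth knowing.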

Breadcrumbs #

That's just the tree that we want to zip. So far, we can only move from root to branches, but not the other way. Before we can define a Zipper for the tree, we need a data structure to store breadcrumbs (the navigation log, if you will).

In Haskell it's just another one-liner, but in C# this requires another full-fledged class:

public sealed class Crumb<T>

It's another sum type, so once more, I make the constructor private and use a private class field for the implementation:

private readonly ICrumb imp;
 
private Crumb(ICrumb imp)
{
    this.imp = imp;
}
 
internal static Crumb<T> Left(T value, BinaryTree<T> right)
{
    return new(new LeftCrumb(value, right));
}
 
internal static Crumb<T> Right(T value, BinaryTree<T> left)
{
    return new(new RightCrumb(value, left));
}

To stay consistent throughout the code base, I also use Church encoding to distinguish between a Left and Right breadcrumb, and the technique is similar. First, define a private interface:

private interface ICrumb
{
    TResult Match<TResult>(
        Func<T, BinaryTree<T>, TResult> whenLeft,
        Func<T, BinaryTree<T>, TResult> whenRight);
}

Then, use private nested types to implement the interface.

private sealed record LeftCrumb(T Value, BinaryTree<T> Right) : ICrumb
{
    public TResult Match<TResult>(
        Func<T, BinaryTree<T>, TResult> whenLeft,
        Func<T, BinaryTree<T>, TResult> whenRight)
    {
        return whenLeft(Value, Right);
    }
}

The RightCrumb record is essentially just the 'mirror image' of the LeftCrumb record, and just as was the case with BinaryTree<T>, the Crumb<T> class exposes an externally accessible Match method that just delegates to the private class field:

public TResult Match<TResult>(
    Func<T, BinaryTree<T>, TResult> whenLeft,
    Func<T, BinaryTree<T>, TResult> whenRight)
{
    return imp.Match(whenLeft, whenRight);
}

Finally, all the building blocks are ready for the actual Zipper.

Zipper data structure and initialization #

In the Haskell code, the Zipper is another one-liner, and really just a type alias. In C#, once more, we're going to need a full class.

public sealed class BinaryTreeZipper<T>

The Haskell article simply calls this type alias Zipper, but I find that name too general, since there's more than one kind of Zipper. I think I understand that the article chooses that name for didactic reasons, but here I've chosen a more consistent disambiguation scheme, so I've named the class BinaryTreeZipper<T>.

The Haskell example is just a type alias for a tuple, and the C# class is similar, although with significantly more ceremony:

public BinaryTree<T> Tree { get; }
public IEnumerable<Crumb<T>> Breadcrumbs { get; }
 
private BinaryTreeZipper(
    BinaryTree<T> tree,
    IEnumerable<Crumb<T>> breadcrumbs)
{
    Tree = tree;
    Breadcrumbs = breadcrumbs;
}
 
public BinaryTreeZipper(BinaryTree<T> tree) : this(tree, [])
{
}

I've here chosen to add an extra bit of encapsulation by making the master constructor private. This prevents client code from creating an arbitrary object with breadcrumbs without having navigated through the tree. To be honest, I don't think it violates any contract even if we allow this, but it at least highlights that the role of Breadcrumbs is to keep a log of what previously happened to the object.

Navigation #

We can now reproduce the navigation functions from the Haskell article.

public BinaryTreeZipper<T>? GoLeft()
{
    return Tree.Match<BinaryTreeZipper<T>?>(
        whenEmpty: () => null,
        whenNode: (x, l, r) => new BinaryTreeZipper<T>(
            l,
            Breadcrumbs.Prepend(Crumb.Left(x, r))));
}

Going left 'pattern-matches' on the Tree and, if not empty, constructs a new BinaryTreeZipper object with the left tree, and a Left breadcrumb that stores the 'current' node value and the right subtree. If the 'current' node is empty, on the other hand, the method returns null. This possibility is explicitly indicated by the BinaryTreeZipper<T>? return type; notice the question mark, which indicates that the value may be null. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the Maybe monad (which is also what you'd idiomatically do in Haskell).

The GoRight method is similar to GoLeft.

We may also attempt to navigate up in the tree, undoing our last downward move:

public BinaryTreeZipper<T>? GoUp()
{
    if (!Breadcrumbs.Any())
        return null;
    var head = Breadcrumbs.First();
 
    var tail = Breadcrumbs.Skip(1);
    return head.Match(
        whenLeft: (x, r) => new BinaryTreeZipper<T>(
            new BinaryTree<T>(x, Tree, r),
            tail),
        whenRight: (x, l) => new BinaryTreeZipper<T>(
            new BinaryTree<T>(x, l, Tree),
            tail));
}

This is another operation that may fail. If we're already at the root of the tree, there are no Breadcrumbs, in which case the only option is to return a value indicating that the operation failed; here, null, but in other languages perhaps None or Nothing.

If, on the other hand, there's at least one breadcrumb, the GoUp method uses the most recent one (head) to construct a new BinaryTreeZipper<T> object that reconstitutes the opposite (sibling) subtree and the parent node. It does that by 'pattern-matching' on the head breadcrumb, which enables it to distinguish a left breadcrumb from a right breadcrumb.

Finally, we may keep trying to GoUp until we reach the root:

public BinaryTreeZipper<T> TopMost()
{
    return GoUp()?.TopMost() ?? this;
}

You'll see an example of that a little later.

Modifications #

Continuing the port of the Haskell code, we can Modify the current node with a function:

public BinaryTreeZipper<T> Modify(Func<T, T> f)
{
    return new BinaryTreeZipper<T>(
        Tree.Match(
            whenEmpty: () => new BinaryTree<T>(),
            whenNode: (x, l, r) => new BinaryTree<T>(f(x), l, r)),
        Breadcrumbs);
}

This operation always succeeds, since it chooses to ignore the change if the tree is empty. Thus, there's no question mark on the return type, indicating that the method never returns null.

Finally, we may replace a node with a new subtree:

public BinaryTreeZipper<T> Attach(BinaryTree<T> tree)
{
    return new BinaryTreeZipper<T>(tree, Breadcrumbs);
}

The following unit test demonstrates a combination of several of the methods shown above:

[Fact]
public void AttachAndGoTopMost()
{
    var sut = new BinaryTreeZipper<char>(freeTree);

    var farLeft = sut.GoLeft()?.GoLeft()?.GoLeft()?.GoLeft();
    var actual = farLeft?.Attach(new('Z', new(), new())).TopMost();
 
    Assert.NotNull(actual);
    Assert.Equal(
        new('P',
            new('O',
                new('L',
                    new('N',
                        new('Z', new(), new()),
                        new()),
                    new('T', new(), new())),
                new('Y',
                    new('S', new(), new()),
                    new('A', new(), new()))),
            new('L',
                new('W',
                    new('C', new(), new()),
                    new('R', new(), new())),
                new('A',
                    new('A', new(), new()),
                    new('C', new(), new())))),
        actual.Tree);
    Assert.Empty(actual.Breadcrumbs);
}

The test starts with freeTree (not shown) and first navigates to the leftmost empty node. Here it uses Attach to add a new 'singleton' subtree with the value 'Z'. Finally, it uses TopMost to return to the root node.

In the Assert phase, the test verifies that the actual object contains the expected values.
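For experimenting without the rest of the code base, here's a compact, self-contained sketch of the same Zipper operations, including the GoRight method that the article only describes. To keep it short, it makes a simplifying assumption: hypothetical SimpleTree<T> and Crumb<T> records replace the Church-encoded BinaryTree<T> and Crumb<T>, with null standing for an empty tree and a Boolean recording the direction of each move.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified tree: null stands for the empty tree (no Church encoding).
public sealed record SimpleTree<T>(T Value, SimpleTree<T>? Left, SimpleTree<T>? Right);

// Breadcrumb: which way we went, the parent's value, and the subtree we didn't enter.
public sealed record Crumb<T>(bool WentLeft, T Value, SimpleTree<T>? Other);

public sealed class TreeZipper<T>
{
    public SimpleTree<T>? Tree { get; }
    public IEnumerable<Crumb<T>> Breadcrumbs { get; }

    private TreeZipper(SimpleTree<T>? tree, IEnumerable<Crumb<T>> breadcrumbs)
    {
        Tree = tree;
        Breadcrumbs = breadcrumbs;
    }

    public TreeZipper(SimpleTree<T>? tree) : this(tree, Enumerable.Empty<Crumb<T>>())
    {
    }

    public TreeZipper<T>? GoLeft() =>
        Tree is null
            ? null
            : new TreeZipper<T>(Tree.Left, Breadcrumbs.Prepend(new Crumb<T>(true, Tree.Value, Tree.Right)));

    // The mirror image of GoLeft: focus on the right subtree, remember the left one.
    public TreeZipper<T>? GoRight() =>
        Tree is null
            ? null
            : new TreeZipper<T>(Tree.Right, Breadcrumbs.Prepend(new Crumb<T>(false, Tree.Value, Tree.Left)));

    public TreeZipper<T>? GoUp()
    {
        if (!Breadcrumbs.Any())
            return null;

        // Rebuild the parent from the most recent breadcrumb and the current focus.
        var head = Breadcrumbs.First();
        var parent = head.WentLeft
            ? new SimpleTree<T>(head.Value, Tree, head.Other)
            : new SimpleTree<T>(head.Value, head.Other, Tree);
        return new TreeZipper<T>(parent, Breadcrumbs.Skip(1));
    }

    public TreeZipper<T> TopMost() => GoUp()?.TopMost() ?? this;

    // Like the article's Modify, this leaves an empty focus unchanged.
    public TreeZipper<T> Modify(Func<T, T> f) =>
        new TreeZipper<T>(Tree is null ? null : Tree with { Value = f(Tree.Value) }, Breadcrumbs);

    public TreeZipper<T> Attach(SimpleTree<T>? tree) => new TreeZipper<T>(tree, Breadcrumbs);
}
```

Navigating GoLeft twice from a root 'P' whose left child is a leaf 'O', attaching a 'Z' leaf, and calling TopMost yields a tree where 'Z' is the left child of 'O', mirroring the article's test on a smaller scale.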

Conclusion #

The Tree Zipper shown here is a port of the example given in the Haskell Zippers article. As I've already discussed in the introduction article, this data structure doesn't make much sense in C#, where you can easily implement a navigable tree with two-way links. Even if this requires state mutation, you can package such a data structure in a proper object with good encapsulation, so that operations don't leave any dangling pointers or the like.

As far as I can tell, the code shown in this article isn't useful in production code, but I hope that, at least, you still learned something from it. I always learn a new thing or two from doing programming exercises and writing about them, and this was no exception.

In the next article, I continue with the last of the Haskell article's three examples.

Next: FSZipper in C#.


Keeping cross-cutting concerns out of application code

2024-09-02T06:19:00+00:00

Don't inject third-party dependencies. Use Decorators.

I recently came across a Stack Overflow question that reminded me of a topic I've been meaning to write about for a long time: Cross-cutting concerns.

When it comes to the usual suspects (logging, fault tolerance, caching), the best solution is usually to apply the Decorator pattern.

I often see code that uses Dependency Injection (DI) to inject, say, a logging interface into application code. You can see an example of that in Repeatable execution, as well as a suggestion for a better design. Not surprisingly, the better design involves logging Decorators.

The Stack Overflow question isn't about logging, but rather about fault tolerance: Circuit Breakers, retry policies, timeouts, etc.

Injected concern #

The question does a good job of presenting a minimal, reproducible example. At the outset, the code looks like this:

public class MyApi
{
    private readonly ResiliencePipeline pipeline;
    private readonly IOrganizationService service;
 
    public MyApi(ResiliencePipelineProvider<string> provider, IOrganizationService service)
    {
        this.pipeline = provider.GetPipeline("retry-pipeline");
        this.service = service;
    }
 
    public List<string> GetSomething(QueryByAttribute query)
    {
        var result = this.pipeline.Execute(() => service.RetrieveMultiple(query));
        return result.Entities.Cast<string>().ToList();
    }
}

The Stack Overflow question asks how to test this implementation, but I'd rather take the example as an opportunity to discuss design alternatives. Not surprisingly, it turns out that with a more decoupled design, testing becomes easier, too.

Before we proceed, a few words about this example code. I assume that this isn't Andy Cooke's actual production code. Rather, I interpret it as a reduced example that highlights the actual question. This is important because you might ask: Why bother testing two lines of code?

Indeed, as presented, the GetSomething method is so simple that you may consider not testing it. Thus, I interpret the second line of code as a stand-in for more complicated production code. Hold on to that thought, because once I'm done, that's all that's going to be left, and you may then think that it's so simple that it really doesn't warrant all this hoo-ha.

Coupling #

As shown, the MyApi class is coupled to Polly, because ResiliencePipeline is defined by that library. To be clear, all I've heard is that Polly is a fine library. I've used it for a few projects myself, but I also admit that I haven't that much experience with it. I'd probably use it again the next time I need a Circuit Breaker or similar, so the following discussion isn't a denouncement of Polly. Rather, it applies to all third-party dependencies, or perhaps even dependencies that are part of your language's base library.

Coupling is a major cause of spaghetti code and code rot in general. To write sustainable code, you should be cognizant of coupling. The most decoupled code is code that you can easily delete.

This doesn't mean that you shouldn't use high-quality third-party libraries like Polly. Among myriads of software engineering heuristics, we know that we should be aware of the not-invented-here syndrome.

When it comes to classic cross-cutting concerns, the Decorator pattern is usually a better design than injecting the concern into application code. The above example clearly looks innocuous, but imagine injecting a ResiliencePipeline, a logger, and perhaps a caching service, and your real application code eventually disappears in 'infrastructure code'.

It's not that we don't want to have these third-party dependencies, but rather that we want to move them somewhere else.

Resilient Decorator #

The concern in the above example is the desire to make the IOrganizationService dependency more resilient. The MyApi class only becomes more resilient as a transitive effect. The first refactoring step, then, is to introduce a resilient Decorator.

public sealed class ResilientOrganizationService(
    ResiliencePipeline pipeline,
    IOrganizationService inner) : IOrganizationService
{
    public QueryResult RetrieveMultiple(QueryByAttribute query)
    {
        return pipeline.Execute(() => inner.RetrieveMultiple(query));
    }
}

As Decorators must, this class composes another IOrganizationService while also implementing that interface itself. It does so by being an Adapter over the Polly API.

I've applied Nikola Malovic's 4th law of DI:

"Every constructor of a class being resolved should not have any implementation other then accepting a set of its own dependencies."

Inversion Of Control, Single Responsibility Principle and Nikola’s laws of dependency injection, Nikola Malovic, 2009

Instead of injecting a ResiliencePipelineProvider<string> only to call GetPipeline on it, it just receives a ResiliencePipeline and saves the object for use in the RetrieveMultiple method. It does that via a primary constructor, which is a recent C# language addition. It's just syntactic sugar for Constructor Injection, and as usual F# developers should feel right at home.
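If you'd like to see the Decorator's shape without taking a dependency on Polly, here's a self-contained sketch in which a plain retry loop stands in for the ResiliencePipeline. The simplified RetrieveMultiple signature and the RetryingOrganizationService and FlakyOrganizationService classes are hypothetical. In the scenario above you'd keep delegating to Polly, which also offers timeouts, back-off, and Circuit Breakers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the service interface from the question.
public interface IOrganizationService
{
    IReadOnlyList<string> RetrieveMultiple(string query);
}

// Retry Decorator with a hand-rolled loop in place of Polly, to stay dependency-free.
public sealed class RetryingOrganizationService(
    int maxRetryAttempts,
    IOrganizationService inner) : IOrganizationService
{
    public IReadOnlyList<string> RetrieveMultiple(string query)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return inner.RetrieveMultiple(query);
            }
            catch (Exception) when (attempt < maxRetryAttempts)
            {
                // Swallow the failure and try again; once the retries are
                // used up, the filter is false and the exception propagates.
            }
        }
    }
}

// Hypothetical test double that fails a fixed number of times before succeeding.
public sealed class FlakyOrganizationService(int failures) : IOrganizationService
{
    public int Calls { get; private set; }

    public IReadOnlyList<string> RetrieveMultiple(string query)
    {
        Calls++;
        if (Calls <= failures)
            throw new TimeoutException("Simulated transient failure.");
        return new[] { query };
    }
}
```

As with ResilientOrganizationService, the retry policy lives entirely in the Decorator; with four retry attempts, a service that fails twice is called three times in total, and the consumer never notices.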

Simplifying MyApi #

Now that you have a resilient version of IOrganizationService you don't need to have any Polly code in MyApi. Remove it and simplify:

public class MyApi
{
    private readonly IOrganizationService service;
 
    public MyApi(IOrganizationService service)
    {
        this.service = service;
    }
 
    public List<string> GetSomething(QueryByAttribute query)
    {
        var result = service.RetrieveMultiple(query);
        return result.Entities.Cast<string>().ToList();
    }
}

As promised, there's almost nothing left of it now, but I'll remind you that I consider the second line of GetSomething as a stand-in for something more complicated that you might need to test. As it is now, though, testing it is trivial:

[Theory]
[InlineData("foo", "bar", "baz")]
[InlineData("qux", "quux", "corge")]
[InlineData("grault", "garply", "waldo")]
public void GetSomething(params string[] expected)
{
    var service = new Mock<IOrganizationService>();
    service
        .Setup(s => s.RetrieveMultiple(new QueryByAttribute()))
        .Returns(new QueryResult(expected));
    var sut = new MyApi(service.Object);
 
    var actual = sut.GetSomething(new QueryByAttribute());
 
    Assert.Equal(expected, actual);
}

The larger point, however, is that not only have you now managed to keep third-party dependencies out of your application code, you've also simplified it and made it easier to test.

Composition #

You can still create a resilient MyApi object in your Composition Root:

var service = new ResilientOrganizationService(pipeline, inner);
var myApi = new MyApi(service);

Decomposing the problem in this way, you decouple your application code from third-party dependencies. You can define ResilientOrganizationService in the application's Composition Root, which also keeps the Polly dependency there. Even so, you can implement MyApi as part of your application layer.

I usually illustrate Ports and Adapters, or, if you will, Clean Architecture as concentric circles, but in this diagram I've skewed the circles to make space for the boxes. In other words, the diagram is 'not to scale'. Ideally, the outermost layer is much smaller and thinner than any of the other layers. I've also included an inner green layer which indicates the architecture's Domain Model, but since I assume that MyApi is part of some application layer, I've left the Domain Model empty.
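The same approach scales to the situation imagined earlier, with several cross-cutting concerns. In this dependency-free sketch, a logging and a caching Decorator are stacked in the Composition Root, while MyApi remains oblivious to both. The simplified string-based signatures and the LoggingOrganizationService, CachingOrganizationService, and CountingOrganizationService classes are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the service interface from the question.
public interface IOrganizationService
{
    IReadOnlyList<string> RetrieveMultiple(string query);
}

// Application code: knows nothing about logging or caching.
public sealed class MyApi(IOrganizationService service)
{
    public List<string> GetSomething(string query) =>
        service.RetrieveMultiple(query).ToList();
}

// Decorator that records each query before delegating.
public sealed class LoggingOrganizationService(
    ICollection<string> log,
    IOrganizationService inner) : IOrganizationService
{
    public IReadOnlyList<string> RetrieveMultiple(string query)
    {
        log.Add($"RetrieveMultiple({query})");
        return inner.RetrieveMultiple(query);
    }
}

// Decorator that remembers earlier results per query (no expiry; sketch only).
public sealed class CachingOrganizationService(IOrganizationService inner) : IOrganizationService
{
    private readonly Dictionary<string, IReadOnlyList<string>> cache = new();

    public IReadOnlyList<string> RetrieveMultiple(string query)
    {
        if (!cache.TryGetValue(query, out var result))
        {
            result = inner.RetrieveMultiple(query);
            cache[query] = result;
        }
        return result;
    }
}

// Hypothetical inner service that counts how often it's hit.
public sealed class CountingOrganizationService : IOrganizationService
{
    public int Calls { get; private set; }

    public IReadOnlyList<string> RetrieveMultiple(string query)
    {
        Calls++;
        return new[] { query.ToUpperInvariant() };
    }
}
```

A Composition Root could then write new MyApi(new LoggingOrganizationService(log, new CachingOrganizationService(inner))). Which Decorator wraps which is a composition decision: in this order every call is logged, including cache hits, while the inner service is only hit once per query.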

Reasons to decouple #

Why is it important to decouple application code from Polly? First, keep in mind that in this discussion Polly is just a stand-in for any third-party dependency. It's up to you as a software architect to decide how you'll structure your code, but third-party dependencies are one of the first things I look for. A third-party component changes with time, and often independently of your base platform. You may have to deal with breaking changes or security patches at inopportune times. The organization that maintains the component may cease to operate. This happens to commercial entities and open-source contributors alike, although for different reasons.

Second, even a top-tier library like Polly will undergo changes. If your time horizon is five to ten years, you'll be surprised how much things change. You may protest that no-one designs software systems with such a long view, but I think that if you ask the business people involved with your software, they most certainly expect your system to last a long time.

I believe that I heard on a podcast that some Microsoft teams had taken a dependency on Polly. Assuming, for the sake of argument, that this is true, while we may not wish to depend on some random open-source component, depending on Polly is safe, right? In the long run, it isn't. Five years ago, you had the same situation with Json.NET, but then Microsoft hired James Newton-King and had him make a JSON API as part of the .NET base library. While Json.NET isn't dead by any means, now you have two competing JSON libraries, and Microsoft uses their own in the frameworks and libraries that they release.

Deciding to decouple your application code from a third-party component is ultimately a question of risk management. It's up to you to make the bet. Do you pay the up-front cost of decoupling, or do you postpone it, hoping it'll never be necessary?

I usually do the former, because the cost is low, and there are other benefits as well. As I've already touched on, unit testing becomes easier.

Configuration #

Since Polly only lives in the Composition Root, you'll also need to define the ResiliencePipeline there. You can write the code that creates that pipeline wherever you like, but it might be natural to make it a creation function on the ResilientOrganizationService class:

public static ResiliencePipeline CreatePipeline()
{
    return new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 4
        })
        .AddTimeout(TimeSpan.FromSeconds(1))
        .Build();
}

That's just an example, and perhaps not what you'd like to do. Perhaps you'd rather define some of these values in a configuration file. Thus, this isn't what you have to do, but rather what you could do.

If you use this option, however, you could take the return value of this method and inject it into the ResilientOrganizationService constructor.

Conclusion #

Cross-cutting concerns, like caching, logging, security, or, in this case, fault tolerance, are usually best addressed with the Decorator pattern. In this article, you saw an example of using the Decorator pattern to decouple the concern of fault tolerance from the consumer of the service that you need to handle in a fault-tolerant manner.

The specific example dealt with the Polly library, but the point isn't that Polly is a particularly nasty third-party component that you need to protect yourself against. Rather, it just so happened that I came across a Stack Overflow question that used Polly, and I thought it was a nice example.

As far as I can tell, Polly is actually one of the top .NET open-source packages, so this article is not a denouncement of Polly. It's just a sketch of how to move useful dependencies around in your code base to make sure that they impact your application code as little as possible.

This blog is totally free, but if you like it, please consider supporting it.

A List Zipper in C#

2024-08-26T13:19:00+00:00

A port of a Haskell example, just because.

This article is part of a series about Zippers. In this one, I port the ListZipper data structure from the Learn You a Haskell for Great Good! article also called Zippers.

A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better than it's done there.

The code shown in this article is available on GitHub.

Initialization and structure #

In the Haskell code, ListZipper is just a type alias, but C# doesn't have that, so instead, we'll have to introduce a class.

public sealed class ListZipper<T> : IEnumerable<T>

Since it implements IEnumerable<T>, it may be used like any other sequence, but it also comes with some special operations that enable client code to move forward and backward, as well as insert and remove values.

The class has the following fields, properties, and constructors:

private readonly IEnumerable<T> values;
public IEnumerable<T> Breadcrumbs { get; }
 
private ListZipper(IEnumerable<T> values, IEnumerable<T> breadcrumbs)
{
    this.values = values;
    Breadcrumbs = breadcrumbs;
}
 
public ListZipper(IEnumerable<T> values) : this(values, [])
{
}
 
public ListZipper(params T[] values) : this(values.AsEnumerable())
{
}

It uses constructor chaining to initialize a ListZipper object with proper encapsulation. Notice that the master constructor is private. This prevents client code from initializing an object with arbitrary Breadcrumbs. Rather, the Breadcrumbs (the log, if you will) is going to be the result of various operations performed by client code, and only the ListZipper class itself can use this constructor.

You may consider the constructor that takes a single IEnumerable<T> as the 'main' public constructor, and the other one as a convenience that enables a client developer to write code like new ListZipper<string>("foo", "bar", "baz").

The class' IEnumerable<T> implementation only enumerates the values:

public IEnumerator<T> GetEnumerator()
{
    return values.GetEnumerator();
}

In other words, when enumerating a ListZipper, you only get the 'forward' values. Client code may still examine the Breadcrumbs, since this is a public property, but it should have little need for that.

(I admit that making Breadcrumbs public is a concession to testability, since it enabled me to write assertions against this property. It's a form of structural inspection, which is a technique that I use much less than I did a decade ago. Still, in this case, while you may argue that it violates information hiding, it at least doesn't allow client code to put an object in an invalid state. Had the ListZipper class been a part of a reusable library, I would probably have hidden that data, too, but since this is exercise code, I found this an acceptable compromise. Notice, too, that in the original Haskell code, the breadcrumbs are available to client code.)

Regular readers of this blog may be aware that I usually favour IReadOnlyCollection<T> over IEnumerable<T>. Here, on the other hand, I've allowed values to be any IEnumerable<T>, which includes infinite sequences. I decided to do that because Haskell lists, too, may be infinite, and as far as I can tell, ListZipper actually does work with infinite sequences. I have, at least, written a few tests with infinite sequences, and they pass. (I may still have missed an edge case or two. I can't rule that out.)

Movement #

It's not much fun just being able to initialize an object. You also want to be able to do something with it, such as moving forward:

public ListZipper<T>? GoForward()
{
    var head = values.Take(1);
    if (!head.Any())
        return null;
 
    var tail = values.Skip(1);
    return new ListZipper<T>(tail, head.Concat(Breadcrumbs));
}

You can move forward through any IEnumerable, so why make things so complicated? The benefit of this GoForward method (function, really) is that it records where it came from, which means that moving backwards becomes an option:

public ListZipper<T>? GoBack()
{
    var head = Breadcrumbs.Take(1);
    if (!head.Any())
        return null;
 
    var tail = Breadcrumbs.Skip(1);
    return new ListZipper<T>(head.Concat(values), tail);
}

This test may serve as an example of client code that makes use of those two operations:

[Fact]
public void GoBack1()
{
    var sut = new ListZipper<int>(1, 2, 3, 4);
 
    var actual = sut.GoForward()?.GoForward()?.GoForward()?.GoBack();
 
    Assert.Equal([3, 4], actual);
    Assert.Equal([2, 1], actual?.Breadcrumbs);
}

Going forward takes the first element off values and adds it to the front of Breadcrumbs. Going backwards is nearly symmetrical: It takes the first element off the Breadcrumbs and adds it back to the front of the values. Used in this way, Breadcrumbs works as a stack.
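Since Breadcrumbs behaves as a stack, one possible variation (not what the article's class does) is to store the breadcrumbs in .NET's ImmutableStack<T> from System.Collections.Immutable, which makes the push/pop discipline explicit. The following is only a sketch under that assumption: it models the zipper as a value tuple instead of a class, and the local GoForward and GoBack functions are illustrative stand-ins for the methods shown above. It replays the GoBack1 test:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// Moving forward pushes the focused value onto the breadcrumb stack.
(IEnumerable<T> Values, ImmutableStack<T> Breadcrumbs)? GoForward<T>(
    (IEnumerable<T> Values, ImmutableStack<T> Breadcrumbs) zipper)
{
    var head = zipper.Values.Take(1).ToList(); // materializes at most one element
    if (head.Count == 0)
        return null;
    return (zipper.Values.Skip(1), zipper.Breadcrumbs.Push(head[0]));
}

// Moving back pops the most recent breadcrumb and puts it in front of the values.
(IEnumerable<T> Values, ImmutableStack<T> Breadcrumbs)? GoBack<T>(
    (IEnumerable<T> Values, ImmutableStack<T> Breadcrumbs) zipper)
{
    if (zipper.Breadcrumbs.IsEmpty)
        return null;
    var rest = zipper.Breadcrumbs.Pop(out var top);
    return (zipper.Values.Prepend(top), rest);
}

var start = (Values: (IEnumerable<int>)new[] { 1, 2, 3, 4 },
             Breadcrumbs: ImmutableStack<int>.Empty);
var actual = GoBack(GoForward(GoForward(GoForward(start).Value).Value).Value).Value;

// ImmutableStack<T> enumerates from the top of the stack downwards.
Console.WriteLine(string.Join(", ", actual.Values));      // 3, 4
Console.WriteLine(string.Join(", ", actual.Breadcrumbs)); // 2, 1
```

Changing the type of Breadcrumbs like this means it's no longer a drop-in replacement for the article's class, but the behaviour of the two operations is the same.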

Notice that both GoForward and GoBack admit the possibility of failure. If values is empty, you can't go forward. If Breadcrumbs is empty, you can't go back. In both cases, the functions return null, which is also indicated by the ListZipper<T>? return types; notice the question mark, which indicates that the value may be null. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the Maybe monad (which is also what you'd idiomatically do in Haskell).

To be clear, the Zippers article does discuss handling failures using Maybe, but only applies it to its binary tree example. Thus, the error handling shown here is my own addition.

Modifications #

In addition to moving back and forth in the list, we can also modify it. The following operations are also not in the Zippers article, but are rather my own contributions. Adding a new element is easy:

public ListZipper<T> Insert(T value)
{
    return new ListZipper<T>(values.Prepend(value), Breadcrumbs);
}

Notice that this operation is always possible. Even if the list is empty, we can Insert a value. In that case, it just becomes the list's first and only element.

A simple test demonstrates usage:

[Fact]
public void InsertAtFocus()
{
    var sut = new ListZipper<string>("foo", "bar");
 
    var actual = sut.GoForward()?.Insert("ploeh").GoBack();
 
    Assert.NotNull(actual);
    Assert.Equal(["foo", "ploeh", "bar"], actual);
    Assert.Empty(actual.Breadcrumbs);
}

Likewise, we may attempt to remove an element from the list:

public ListZipper<T>? Remove()
{
    if (!values.Any())
        return null;
 
    return new ListZipper<T>(values.Skip(1), Breadcrumbs);
}

Contrary to Insert, the Remove operation will fail if values is empty. Notice that this doesn't necessarily imply that the list as such is empty, but only that the focus is at the end of the list (which, of course, never happens if values is infinite):

[Fact]
public void RemoveAtEnd()
{
    var sut = new ListZipper<string>("foo", "bar").GoForward()?.GoForward();
 
    var actual = sut?.Remove();
 
    Assert.Null(actual);
    Assert.NotNull(sut);
    Assert.Empty(sut);
    Assert.Equal(["bar", "foo"], sut.Breadcrumbs);
}

In this example, the focus is at the end of the list, so there's nothing to remove. The list, however, is not empty, but all the data currently reside in the Breadcrumbs.
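To illustrate the remark that values never runs out when it's infinite, here's a minimal sketch. It isn't the article's ListZipper<T> class; it models the zipper as a value tuple and reimplements GoForward and Remove as local functions, and Naturals is a hypothetical helper. Because Take, Skip, and Concat are deferred, and Any pulls at most one element, neither operation ever tries to enumerate the whole sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: an infinite, lazily generated sequence 1, 2, 3, ...
IEnumerable<int> Naturals()
{
    for (var i = 1; ; i++)
        yield return i;
}

// Zipper sketched as a value tuple; GoForward returns null if nothing is ahead.
(IEnumerable<T> Values, IEnumerable<T> Breadcrumbs)? GoForward<T>(
    (IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) zipper)
{
    var head = zipper.Values.Take(1);
    if (!head.Any()) // pulls at most one element, so this is safe on infinite input
        return null;
    return (zipper.Values.Skip(1), head.Concat(zipper.Breadcrumbs));
}

// Remove returns null only when there's no value in focus.
(IEnumerable<T> Values, IEnumerable<T> Breadcrumbs)? Remove<T>(
    (IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) zipper)
{
    if (!zipper.Values.Any())
        return null;
    return (zipper.Values.Skip(1), zipper.Breadcrumbs);
}

var current = (Values: Naturals(), Breadcrumbs: Enumerable.Empty<int>());
for (var i = 0; i < 3; i++)
    current = GoForward(current).Value; // never null: an infinite sequence has no end
var removed = Remove(current).Value;    // never null, for the same reason

// Only ever materialize finite prefixes of the infinite sequence.
Console.WriteLine(string.Join(", ", current.Breadcrumbs));     // 3, 2, 1
Console.WriteLine(string.Join(", ", removed.Values.Take(3))); // 5, 6, 7
```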

Finally, we can combine insertion and removal to implement a replacement operation:

public ListZipper<T>? Replace(T newValue)
{
    return Remove()?.Insert(newValue);
}

As the name implies, this operation replaces the value currently in focus with a completely different value. Here's an example:

[Fact]
public void ReplaceAtFocus()
{
    var sut = new ListZipper<string>("foo", "bar", "baz");
 
    var actual = sut.GoForward()?.Replace("qux")?.GoBack();
 
    Assert.NotNull(actual);
    Assert.Equal(["foo", "qux", "baz"], actual);
    Assert.Empty(actual.Breadcrumbs);
}

Once more, this may fail if the current focus is empty, so Replace also returns a nullable value.
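The way Replace inherits its failure mode from Remove is easy to check in isolation. The following sketch again uses a value-tuple zipper rather than the article's class, with illustrative local Insert, Remove, and Replace functions, and sets up the focus by hand instead of calling GoForward. It mirrors the ReplaceAtFocus test, and also shows Replace failing at the end of the list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Insert always succeeds: the new value becomes the focus.
(IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) Insert<T>(
    (IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) zipper, T value) =>
    (zipper.Values.Prepend(value), zipper.Breadcrumbs);

// Remove fails (returns null) when nothing is in focus.
(IEnumerable<T> Values, IEnumerable<T> Breadcrumbs)? Remove<T>(
    (IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) zipper)
{
    if (!zipper.Values.Any())
        return null;
    return (zipper.Values.Skip(1), zipper.Breadcrumbs);
}

// Replace is Remove followed by Insert, so it fails whenever Remove fails.
(IEnumerable<T> Values, IEnumerable<T> Breadcrumbs)? Replace<T>(
    (IEnumerable<T> Values, IEnumerable<T> Breadcrumbs) zipper, T newValue)
{
    var removed = Remove(zipper);
    if (!removed.HasValue)
        return null;
    return Insert(removed.Value, newValue);
}

// Focus on "bar", as after a single GoForward in the ReplaceAtFocus test.
var focused = (Values: (IEnumerable<string>)new[] { "bar", "baz" },
               Breadcrumbs: (IEnumerable<string>)new[] { "foo" });
var replaced = Replace(focused, "qux").Value;
// Breadcrumbs are stored most-recent-first, so reverse them to rebuild the whole list.
var whole = replaced.Breadcrumbs.Reverse().Concat(replaced.Values);
Console.WriteLine(string.Join(", ", whole)); // foo, qux, baz

// At the end of the list nothing is in focus, so Replace fails.
var atEnd = (Values: Enumerable.Empty<string>(),
             Breadcrumbs: (IEnumerable<string>)new[] { "bar", "foo" });
Console.WriteLine(Replace(atEnd, "qux").HasValue); // False
```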

Conclusion #

For a C# developer, the ListZipper<T> class looks odd. Why would you ever want to use this data structure? Why not just use List<T>?

As I hope I've made clear in the introduction article, I can't, indeed, think of a good reason.

I've gone through this exercise to hone my skills, and to prepare myself for the more intimidating exercise of implementing a binary tree Zipper.

Next: A Binary Tree Zipper in C#.


Zippers

2024-08-19T14:13:00+00:00

Some functional programming examples ported to C#, just because.

Many algorithms rely on data structures that enable the implementation to move in more than one way. A simple example is a doubly-linked list, where an algorithm can move both forward and backward from a given element. Other examples are various tree-based algorithms, such as red-black trees where certain operations trigger reorganization of the tree. Yet other data structures, such as Fibonacci heaps, combine doubly-linked lists with trees that allow navigation in more than one direction.

In an imperative programming language, you can easily implement such data structures, as long as the language allows data mutation. Here's a simple example:

var node1 = new Node<string>("foo");
var node2 = new Node<string>("bar") { Previous = node1 };
node1.Next = node2;

It's possible to double-link node1 to node2 by first creating node1. At that point, node2 still doesn't exist, so you can't yet assign node1.Next, but once you've initialized node2, you can mutate the state of node1 by changing its Next property.
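The Node<T> class in the above snippet isn't defined here. If you'd like to see the same kind of mutation with a type that ships with .NET, LinkedList<T> is a mutable doubly-linked list. In this sketch, adding the second node silently updates the first node's Next link:

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<string>();
var node1 = list.AddLast("foo");
var node2 = list.AddLast("bar"); // AddLast also updates node1's Next link

// From either node you can move in both directions.
Console.WriteLine(node1.Next?.Value);      // bar
Console.WriteLine(node2.Previous?.Value);  // foo
Console.WriteLine(node1.Previous is null); // True
```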

When data structures are immutable (as they must be in functional programming) this is no longer possible. How may you get around that limitation?

Alternatives #

Some languages get around this problem in various ways. Haskell, because of its lazy evaluation, enables a technique called tying the knot that, frankly, makes my head hurt.

Even though I write a decent amount of Haskell code, that's not something that I make use of. Usually, it turns out, you can solve most problems by thinking about them differently. By choosing another perspective, and another data structure, you can often arrive at a good, functional solution to your problem.

One family of general-purpose data structures is called Zippers. The general idea is that the data structure has a natural 'focus' (e.g. the head of a list), but it also keeps a record of 'breadcrumbs', that is, where the caller has previously been. This enables client code to 'go back' or 'go up', if the natural direction is to 'go forward' or 'go down'. It's a bit like Event Sourcing, in that every operation leaves a log entry that can later be used to reconstruct what happened. Repeatable Execution also comes to mind, although it's not quite the same.

For an introduction to Zippers, I recommend the excellent and highly readable article Zippers. In this article series, I'm going to assume that you're familiar with the contents of that article.

C# ports #

While I may add more articles to this series in the future, as I'm writing this, I have nothing more planned than writing about how it's possible to implement the article's three Zippers in C#.

Why would you want to do this?

To be honest, for production code, I can't think of a good reason. I did it for a few reasons, most of them didactic. Additionally, writing code for exercise helps you improve. If you know enough Haskell to understand what's going on in the Zippers article, you may consider porting some of it to your favourite language, as an exercise.

It may help you grok functional programming.

That's really it, though. There's no reason to use Zippers in a language like C#, which idiomatically makes use of mutation. If you want a doubly-linked list, you can just write code as shown in the beginning of this article.

If you're interested in an F# perspective on Zippers, Tomas Petricek has a cool article: Processing trees with F# zipper computation.

Conclusion #

Zippers constitute a family of data structures that enables you to move in multiple directions. Left and right in a list. Up or down in a tree. For an imperative programmer, that's literally just another day at the office, but in disciplined functional programming, making cyclic graphs can be surprisingly tricky.

Even in functional programming, I rarely reach for a Zipper, since I can often find a library with a higher level of abstraction that does what I need it to do. Still, learning new ways to solve problems never seems a waste to me.

In the next three articles, I'll go through the examples from the Zippers article and show how I ported them to C#. While that article starts with a binary tree, I'll instead begin with the doubly-linked list, since it's the simplest of the three.

Next: A List Zipper in C#.


Using only a Domain Model to persist restaurant table configurations

2024-08-12T12:57:00+00:00

A data architecture example in C# and ASP.NET.

This is part of a small article series on data architectures. In this, the third instalment, you'll see an alternative way of modelling data in a server-based application. One that doesn't rely on statically typed classes to model data. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 REST API.

Keep in mind that while the sample code does store data in a relational database, the term table in this article mainly refers to physical tables, rather than database tables.

The idea is to use 'raw' serialization APIs to handle communication with external systems. For the presentation layer, the example even moves representation concerns to middleware, so that it's nicely abstracted away from the application layer.

An architecture diagram like this attempts to capture the design:

Here, the arrows indicate mappings, not dependencies.

Like in the DTO-based Ports and Adapters architecture, the goal is to be able to design Domain Models unconstrained by serialization concerns, while also being able to format external data unconstrained by Reflection-based serializers. Thus, while this architecture is centred on a Domain Model, there are no Data Transfer Objects (DTOs) to represent JSON, XML, or database rows.

HTTP interaction #

To establish the context of the application, here's how HTTP interactions may play out. The following is a copy of the identically named section in the article Using Ports and Adapters to persist restaurant table configurations, repeated here for your convenience.

A client can create a new table with a POST HTTP request:

POST /tables HTTP/1.1
content-type: application/json

{ "communalTable": { "capacity": 16 } }

Which might elicit a response like this:

HTTP/1.1 201 Created
Location: https://example.com/Tables/844581613e164813aa17243ff8b847af

Clients can later use the address indicated by the Location header to retrieve a representation of the resource:

GET /Tables/844581613e164813aa17243ff8b847af HTTP/1.1
accept: application/json

Which would result in this response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"communalTable":{"capacity":16}}

By default, ASP.NET handles and returns JSON. Later in this article you'll see how well it deals with other data formats.

Boundary #

ASP.NET supports some variation of the model-view-controller (MVC) pattern, and Controllers handle HTTP requests. At the outset, the action method that handles the POST request looks like this:

[HttpPost]
public async Task<IActionResult> Post(Table table)
{
    var id = Guid.NewGuid();
    await repository.Create(id, table).ConfigureAwait(false);
 
    return new CreatedAtActionResult(
        nameof(Get),
        null,
        new { id = id.ToString("N") },
        null);
}

While this looks identical to the Post method for the Shared Data Model architecture, it's not, because it's not the same Table class. Not by a long shot. The Table class in use here is the one originally introduced in the article Serializing restaurant tables in C#, with a few inconsequential differences.

How does a Controller action method receive an input parameter directly in the form of a Domain Model, keeping in mind that this particular Domain Model is far from serialization-friendly? The short answer is middleware, which we'll get to in a moment. Before we look at that, however, let's also look at the Get method that supports HTTP GET requests:

[HttpGet("{id}")]
public async Task<IActionResult> Get(string id)
{
    if (!Guid.TryParseExact(id, "N", out var guid))
        return new BadRequestResult();
    Table? table = await repository.Read(guid).ConfigureAwait(false);
    if (table is null)
        return new NotFoundResult();
    return new OkObjectResult(table);
}

This, too, looks exactly like the Shared Data Model architecture, again with the crucial difference that the Table class is completely different. The Get method just takes the table object and wraps it in an OkObjectResult and returns it.

The Table class is, in reality, extraordinarily opaque, and not at all friendly to serialization, so how does the service turn it into JSON?

JSON middleware #

Most web frameworks come with extensibility points where you can add middleware. A common need is to be able to add custom serializers. In ASP.NET they're called formatters, and can be added at application startup:

builder.Services.AddControllers(opts =>
{
    opts.InputFormatters.Insert(0, new TableJsonInputFormatter());
    opts.OutputFormatters.Insert(0, new TableJsonOutputFormatter());
});

As the names imply, TableJsonInputFormatter deserializes JSON input, while TableJsonOutputFormatter serializes strongly typed objects to JSON.

We'll look at each in turn, starting with TableJsonInputFormatter, which is responsible for deserializing JSON documents into Table objects, as used by, for example, the Post method.

JSON input formatter #

You create an input formatter by implementing the IInputFormatter interface, although in this example code base, inheriting from TextInputFormatter is enough:

internal sealed class TableJsonInputFormatter : TextInputFormatter

You can use the constructor to define which media types and encodings the formatter will support:

public TableJsonInputFormatter()
{
    SupportedMediaTypes.Add(MediaTypeHeaderValue.Parse("application/json"));
 
    SupportedEncodings.Add(Encoding.UTF8);
    SupportedEncodings.Add(Encoding.Unicode);
}

You'll also need to tell the formatter which .NET type it supports:

protected override bool CanReadType(Type type)
{
    return type == typeof(Table);
}

As far as I can tell, the ASP.NET framework will first determine which action method (that is, which Controller, and which method on that Controller) should handle a given HTTP request. For a POST request, as shown above, it'll determine that the appropriate action method is the Post method.

Since the Post method takes a Table object as input, the framework then goes through the registered formatters and asks them whether they can read from an HTTP request into that type. In this case, the TableJsonInputFormatter answers true only if the type is Table.

When CanReadType answers true, the framework then invokes a method to turn the HTTP request into an object:

public override async Task<InputFormatterResult> ReadRequestBodyAsync(
    InputFormatterContext context,
    Encoding encoding)
{
    using var rdr = new StreamReader(context.HttpContext.Request.Body, encoding);
    var json = await rdr.ReadToEndAsync().ConfigureAwait(false);
 
    var table = TableJson.Deserialize(json);
    if (table is { })
        return await InputFormatterResult.SuccessAsync(table).ConfigureAwait(false);
    else
        return await InputFormatterResult.FailureAsync().ConfigureAwait(false);
}

The ReadRequestBodyAsync method reads the HTTP request body into a string value called json, and then passes the value to TableJson.Deserialize. You can see the implementation of the Deserialize method in the article Serializing restaurant tables in C#. In short, it uses the default .NET JSON parser to probe a document object model. If it can turn the JSON document into a Table value, it does that. Otherwise, it returns null.
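The Deserialize method itself lives in the other article, so I won't repeat it here. As a rough illustration of the probing approach, though, here's a sketch using System.Text.Json's JsonDocument. TryReadCommunalCapacity is a hypothetical name, it only handles communal tables, and it returns a plain int? instead of a Table object so that it stays self-contained. The requirement that capacity be positive is my assumption about what the Domain Model accepts:

```csharp
using System;
using System.Text.Json;

// Hypothetical probe: returns the capacity of a communal table,
// or null if the JSON doesn't have that shape.
int? TryReadCommunalCapacity(string json)
{
    try
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty("communalTable", out var table) &&
            table.ValueKind == JsonValueKind.Object &&
            table.TryGetProperty("capacity", out var capacity) &&
            capacity.ValueKind == JsonValueKind.Number &&
            capacity.TryGetInt32(out var c) &&
            c > 0) // assumption: capacity must be a positive integer
            return c;
        return null;
    }
    catch (JsonException)
    {
        return null; // not well-formed JSON
    }
}

var ok = TryReadCommunalCapacity("{ \"communalTable\": { \"capacity\": 16 } }");
var wrongType = TryReadCommunalCapacity("{ \"communalTable\": { \"capacity\": \"lots\" } }");
Console.WriteLine(ok);                 // 16
Console.WriteLine(wrongType.HasValue); // False
```

Every step checks the shape of the DOM before reading from it, so malformed input ends up as null rather than an exception, which is what the input formatter needs in order to call FailureAsync.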

The above ReadRequestBodyAsync method then checks if the return value from TableJson.Deserialize is null. If it's not, it wraps the result in a value that indicates success. If it's null, it uses FailureAsync to indicate a deserialization failure.

With this input formatter in place as middleware, any action method that takes a Table parameter will automatically receive a deserialized JSON object, if possible.

JSON output formatter #

The TableJsonOutputFormatter class works much in the same way, but instead derives from the TextOutputFormatter base class:

internal sealed class TableJsonOutputFormatter : TextOutputFormatter

The constructor looks just like that of TableJsonInputFormatter, and instead of a CanReadType method, it has a CanWriteType method that also looks identical.

The WriteResponseBodyAsync method serializes a Table object to JSON:

public override Task WriteResponseBodyAsync(
    OutputFormatterWriteContext context,
    Encoding selectedEncoding)
{
    if (context.Object is Table table)
        return context.HttpContext.Response.WriteAsync(table.Serialize(), selectedEncoding);
 
    throw new InvalidOperationException("Expected a Table object.");
}

If context.Object is, in fact, a Table object, the method calls table.Serialize(), which you can also see in the article Serializing restaurant tables in C#. In short, it pattern-matches on the two possible kinds of tables and builds an appropriate abstract syntax tree or document object model that it then serializes to JSON.
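For a rough idea of what building such a document object model can look like, here's a sketch using System.Text.Json.Nodes (available since .NET 6). It isn't the article's Serialize method, which pattern-matches on a Table object; SerializeCommunal is a hypothetical helper that takes a raw capacity, and it only covers communal tables:

```csharp
using System;
using System.Text.Json.Nodes;

// Builds the DOM for a communal table, then serializes it to compact JSON.
string SerializeCommunal(int capacity)
{
    var dom = new JsonObject
    {
        ["communalTable"] = new JsonObject
        {
            ["capacity"] = capacity // implicit conversion from int to JsonNode
        }
    };
    return dom.ToJsonString();
}

Console.WriteLine(SerializeCommunal(16)); // {"communalTable":{"capacity":16}}
```

The output matches the response body shown in the HTTP interaction section.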

Data access #

While the application stores data in SQL Server, it uses no object-relational mapper (ORM). Instead, it simply uses ADO.NET, as also outlined in the article Do ORMs reduce the need for mapping?

At first glance, the Create method looks simple:

public async Task Create(Guid id, Table table)
{
    using var conn = new SqlConnection(connectionString);
    using var cmd = table.Accept(new SqlInsertCommandVisitor(id));
    cmd.Connection = conn;
 
    await conn.OpenAsync().ConfigureAwait(false);
    await cmd.ExecuteNonQueryAsync().ConfigureAwait(false);
}

The main work, however, is done by the nested SqlInsertCommandVisitor class:

private sealed class SqlInsertCommandVisitor(Guid id) : ITableVisitor<SqlCommand>
{
    public SqlCommand VisitCommunal(NaturalNumber capacity)
    {
        const string createCommunalSql = @"
            INSERT INTO [dbo].[Tables] ([PublicId], [Capacity])
            VALUES (@PublicId, @Capacity)";
        var cmd = new SqlCommand(createCommunalSql);
        cmd.Parameters.AddWithValue("@PublicId", id);
        cmd.Parameters.AddWithValue("@Capacity", (int)capacity);
        return cmd;
    }
 
    public SqlCommand VisitSingle(NaturalNumber capacity, NaturalNumber minimalReservation)
    {
        const string createSingleSql = @"
            INSERT INTO [dbo].[Tables] ([PublicId], [Capacity], [MinimalReservation])
            VALUES (@PublicId, @Capacity, @MinimalReservation)";
        var cmd = new SqlCommand(createSingleSql);
        cmd.Parameters.AddWithValue("@PublicId", id);
        cmd.Parameters.AddWithValue("@Capacity", (int)capacity);
        cmd.Parameters.AddWithValue("@MinimalReservation", (int)minimalReservation);
        return cmd;
    }
}

It 'pattern-matches' on the two possible kinds of table and returns an appropriate SqlCommand that the Create method then executes. Notice that no 'Entity' class is needed. The code works straight on SqlCommand.

The same is true for the repository's Read method:

public async Task<Table?> Read(Guid id)
{
    const string readByIdSql = @"
        SELECT [Capacity], [MinimalReservation]
        FROM [dbo].[Tables]
        WHERE [PublicId] = @id";
 
    using var conn = new SqlConnection(connectionString);
    using var cmd = new SqlCommand(readByIdSql, conn);
    cmd.Parameters.AddWithValue("@id", id);
 
    await conn.OpenAsync().ConfigureAwait(false);
    using var rdr = await cmd.ExecuteReaderAsync().ConfigureAwait(false);
    if (!await rdr.ReadAsync().ConfigureAwait(false))
        return null;
 
    var capacity = (int)rdr["Capacity"];
    var minimalReservation = rdr["MinimalReservation"] as int?;
    if (minimalReservation is null)
        return Table.TryCreateCommunal(capacity);
    else
        return Table.TryCreateSingle(capacity, minimalReservation.Value);
}

It works directly on SqlDataReader. Again, no extra 'Entity' class is required. If the data in the database makes sense, the Read method returns a well-encapsulated Table object.

XML formats #

That covers the basics, but how well does this kind of architecture stand up to changing requirements?

One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support not just one, but two, XML formats.

Granted, you may not run into that particular requirement that often, but it's typical of a kind of change that you're likely to run into. In REST APIs, for example, you should use content negotiation for versioning, and that's the same kind of problem.

To be fair, application code also changes for a variety of other reasons, including new features, changes to business logic, etc. I can't possibly cover all, though, and many of these are much better described than changes in wire formats.

As described in the introduction article, ideally the XML should support a format implied by these examples:

<communal-table>
  <capacity>12</capacity>
</communal-table>

<single-table>
  <capacity>4</capacity>
  <minimal-reservation>3</minimal-reservation>
</single-table>

Notice that while these two examples have different root elements, they're still considered to both represent a table. Although at the boundaries, static types are illusory, we may still, loosely speaking, consider both of those XML documents as belonging to the same 'type'.

With both of the previous architectures described in this article series, I've had to give up on this schema. The present data architecture, finally, is able to handle this requirement.

HTTP interactions with element-biased XML #

The service should support the new XML format when presented with the "application/xml" media type, either as a content-type header or accept header. An initial POST request may look like this:

POST /tables HTTP/1.1
content-type: application/xml

<communal-table><capacity>12</capacity></communal-table>

Which produces a reply like this:

HTTP/1.1 201 Created
Location: https://example.com/Tables/a77ac3fd221e4a5caaca3a0fc2b83ffc

And just like before, a client can later use the address in the Location header to request the resource. By using the accept header, it can indicate that it wishes to receive the reply formatted as XML:

GET /Tables/a77ac3fd221e4a5caaca3a0fc2b83ffc HTTP/1.1
accept: application/xml

Which produces this response with XML content in the body:

HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8

<communal-table><capacity>12</capacity></communal-table>

How do you add support for this new format?

Element-biased XML formatters #

Not surprisingly, you can add support for the new format by adding new formatters.

opts.InputFormatters.Add(new ElementBiasedTableXmlInputFormatter());
opts.OutputFormatters.Add(new ElementBiasedTableXmlOutputFormatter());

Importantly, and in stark contrast to the DTO-based Ports and Adapters example, you don't have to change the existing code to add XML support. If you're concerned about design heuristics such as the Single Responsibility Principle, you may consider this a win. Apart from the two lines of code adding the formatters, all other code to support this new feature is in new classes.

Both of the new formatters support the "application/xml" media type.

Deserializing element-biased XML #

The constructor and CanReadType implementation of ElementBiasedTableXmlInputFormatter are nearly identical to code you've already seen here, so I'll skip the repetition. The ReadRequestBodyAsync implementation is also conceptually similar, but of course differs in the details.

public override async Task<InputFormatterResult> ReadRequestBodyAsync(
    InputFormatterContext context,
    Encoding encoding)
{
    var xml = await XElement
        .LoadAsync(context.HttpContext.Request.Body, LoadOptions.None, CancellationToken.None)
        .ConfigureAwait(false);
 
    var table = TableXml.TryParseElementBiased(xml);
    if (table is { })
        return await InputFormatterResult.SuccessAsync(table).ConfigureAwait(false);
    else
        return await InputFormatterResult.FailureAsync().ConfigureAwait(false);
}

As is also the case with the JSON input formatter, the ReadRequestBodyAsync method really only implements an Adapter over a more specialized parser function:

internal static Table? TryParseElementBiased(XElement xml)
{
    if (xml.Name == "communal-table")
    {
        var capacity = xml.Element("capacity")?.Value;
        if (capacity is { })
        {
            if (int.TryParse(capacity, out var c))
                return Table.TryCreateCommunal(c);
        }
    }
 
    if (xml.Name == "single-table")
    {
        var capacity = xml.Element("capacity")?.Value;
        var minimalReservation = xml.Element("minimal-reservation")?.Value;
        if (capacity is { } && minimalReservation is { })
        {
            if (int.TryParse(capacity, out var c) &&
                int.TryParse(minimalReservation, out var mr))
                return Table.TryCreateSingle(c, mr);
        }
    }
 
    return null;
}

In keeping with the common theme of the Domain Model Only data architecture, it deserializes by examining an Abstract Syntax Tree (AST) or document object model (DOM), specifically making use of the XElement API. This class is really part of the LINQ to XML API, but you'll probably agree that the above code example makes little use of LINQ.
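To try the parsing logic outside of ASP.NET, here's a minimal, self-contained sketch of the same idea. It isn't the article's code: it parses a string rather than receiving an already loaded XElement, and it returns a hypothetical TableSketch record with plain int values. The validation rules in TryCreateCommunal and TryCreateSingle are assumptions that only approximate the real Table factory methods.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical stand-in for the article's Table: plain ints instead of
// NaturalNumber, and a null MinimalReservation for communal tables.
public sealed record TableSketch(int Capacity, int? MinimalReservation)
{
    // Assumed rule: a communal table needs a positive capacity.
    public static TableSketch? TryCreateCommunal(int capacity) =>
        capacity > 0 ? new TableSketch(capacity, null) : null;

    // Assumed rule: 0 < minimal reservation <= capacity.
    public static TableSketch? TryCreateSingle(int capacity, int minimalReservation) =>
        0 < minimalReservation && minimalReservation <= capacity
            ? new TableSketch(capacity, minimalReservation)
            : null;
}

public static class ElementBiasedParser
{
    // Parses element-biased XML such as
    // <communal-table><capacity>12</capacity></communal-table>.
    // Returns null for anything it doesn't recognise.
    public static TableSketch? TryParse(string text)
    {
        XElement xml;
        try
        {
            xml = XElement.Parse(text);
        }
        catch (XmlException)
        {
            return null; // Not well-formed XML.
        }

        // The explicit string conversion yields null for a missing element,
        // and int.TryParse returns false for null.
        if (!int.TryParse((string?)xml.Element("capacity"), out var capacity))
            return null;

        if (xml.Name == "communal-table")
            return TableSketch.TryCreateCommunal(capacity);

        if (xml.Name == "single-table" &&
            int.TryParse((string?)xml.Element("minimal-reservation"), out var minimalReservation))
            return TableSketch.TryCreateSingle(capacity, minimalReservation);

        return null;
    }
}
```

With this sketch, parsing <communal-table><capacity>12</capacity></communal-table> should give a TableSketch with a capacity of 12 and no minimal reservation, while malformed or unrecognised input gives null.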

Serializing element-biased XML #

Unsurprisingly, turning a Table object into element-biased XML involves steps similar to converting it to JSON. The ElementBiasedTableXmlOutputFormatter class' WriteResponseBodyAsync method contains this implementation:

public override Task WriteResponseBodyAsync(
    OutputFormatterWriteContext context,
    Encoding selectedEncoding)
{
    if (context.Object is Table table)
        return context.HttpContext.Response.WriteAsync(
            table.GenerateElementBiasedXml(),
            selectedEncoding);
 
    throw new InvalidOperationException("Expected a Table object.");
}

Again, the heavy lifting is done by a specialized function:

internal static string GenerateElementBiasedXml(this Table table)
{
    return table.Accept(new ElementBiasedTableVisitor());
}
 
private sealed class ElementBiasedTableVisitor : ITableVisitor<string>
{
    public string VisitCommunal(NaturalNumber capacity)
    {
        var xml = new XElement(
            "communal-table",
            new XElement("capacity", (int)capacity));
        return xml.ToString(SaveOptions.DisableFormatting);
    }
 
    public string VisitSingle(
        NaturalNumber capacity,
        NaturalNumber minimalReservation)
    {
        var xml = new XElement(
            "single-table",
            new XElement("capacity", (int)capacity),
            new XElement("minimal-reservation", (int)minimalReservation));
        return xml.ToString(SaveOptions.DisableFormatting);
    }
}

True to form, GenerateElementBiasedXml assembles an appropriate AST for the kind of table in question, and finally converts it to a string value.
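If the Visitor encoding is unfamiliar, the following sketch may help. It's a simplified, hypothetical version: ITableVisitorSketch and VisitableTable stand in for the article's ITableVisitor and Table, and use plain int values instead of NaturalNumber. The point is that the serializer never inspects the table's internals; it only supplies one method per kind of table.

```csharp
using System.Xml.Linq;

// Hypothetical, simplified Visitor interface; the article's ITableVisitor
// takes NaturalNumber arguments rather than ints.
public interface ITableVisitorSketch<T>
{
    T VisitCommunal(int capacity);
    T VisitSingle(int capacity, int minimalReservation);
}

// A Visitor-encoded table: the only way to get at its data is through Accept.
public abstract class VisitableTable
{
    private VisitableTable() { } // Only the nested classes below can derive.

    public abstract T Accept<T>(ITableVisitorSketch<T> visitor);

    public static VisitableTable CreateCommunal(int capacity) =>
        new CommunalTable(capacity);

    public static VisitableTable CreateSingle(int capacity, int minimalReservation) =>
        new SingleTable(capacity, minimalReservation);

    private sealed class CommunalTable : VisitableTable
    {
        private readonly int capacity;
        public CommunalTable(int capacity) => this.capacity = capacity;
        public override T Accept<T>(ITableVisitorSketch<T> visitor) =>
            visitor.VisitCommunal(capacity);
    }

    private sealed class SingleTable : VisitableTable
    {
        private readonly int capacity;
        private readonly int minimalReservation;
        public SingleTable(int capacity, int minimalReservation)
        {
            this.capacity = capacity;
            this.minimalReservation = minimalReservation;
        }
        public override T Accept<T>(ITableVisitorSketch<T> visitor) =>
            visitor.VisitSingle(capacity, minimalReservation);
    }
}

// Builds an XML tree for each kind of table, then renders it without whitespace.
public sealed class ElementBiasedXmlVisitor : ITableVisitorSketch<string>
{
    public string VisitCommunal(int capacity) =>
        new XElement("communal-table",
            new XElement("capacity", capacity))
        .ToString(SaveOptions.DisableFormatting);

    public string VisitSingle(int capacity, int minimalReservation) =>
        new XElement("single-table",
            new XElement("capacity", capacity),
            new XElement("minimal-reservation", minimalReservation))
        .ToString(SaveOptions.DisableFormatting);
}
```

With this sketch, VisitableTable.CreateCommunal(12).Accept(new ElementBiasedXmlVisitor()) produces <communal-table><capacity>12</capacity></communal-table>, the same representation as in the HTTP response at the start of this section.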

Attribute-biased XML #

I was curious how far I could take this kind of variation, so for the sake of exploration, I invented yet another XML format to support. Instead of making exclusive use of XML elements, this format uses XML attributes for primitive values.

<communal-table capacity="12" />
        
<single-table capacity="4" minimal-reservation="3" />

In order to distinguish this XML format from the other, I invented the vendor media type "application/vnd.ploeh.table+xml". The new formatters only handle this media type.
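Conceptually, each formatter claims a media type, and the framework picks one that matches the request. The following sketch illustrates only that idea, using a plain dictionary keyed on media type. It's a hypothetical simplification, not how ASP.NET implements formatter selection, which also considers lists of acceptable types, quality values, wildcards, and formatter order.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http.Headers;
using System.Xml.Linq;

public static class MediaTypeDispatch
{
    // One writer per supported media type. Media types are case-insensitive,
    // so the dictionary is, too.
    private static readonly Dictionary<string, Func<int, XElement>> writers =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["application/xml"] = capacity =>
                new XElement("communal-table", new XElement("capacity", capacity)),
            ["application/vnd.ploeh.table+xml"] = capacity =>
                new XElement("communal-table", new XAttribute("capacity", capacity)),
        };

    // Returns null when no writer supports the media type.
    public static string? TryWriteCommunal(string contentType, int capacity)
    {
        // Parsing strips parameters such as "; charset=utf-8".
        var mediaType = MediaTypeHeaderValue.Parse(contentType).MediaType;
        if (mediaType is null || !writers.TryGetValue(mediaType, out var write))
            return null;
        return write(capacity).ToString(SaveOptions.DisableFormatting);
    }
}
```

In this sketch, asking for application/vnd.ploeh.table+xml yields <communal-table capacity="12" />, application/xml yields the element-biased form, and an unsupported media type yields null, which a web service might translate to 406 Not Acceptable.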

There's not much new to report. The new formatters work like the previous ones. To parse the new format, a new function does the work, still based on XElement:

internal static Table? TryParseAttributeBiased(XElement xml)
{
    if (xml.Name == "communal-table")
    {
        var capacity = xml.Attribute("capacity")?.Value;
        if (capacity is { })
        {
            if (int.TryParse(capacity, out var c))
                return Table.TryCreateCommunal(c);
        }
    }
 
    if (xml.Name == "single-table")
    {
        var capacity = xml.Attribute("capacity")?.Value;
        var minimalReservation = xml.Attribute("minimal-reservation")?.Value;
        if (capacity is { } && minimalReservation is { })
        {
            if (int.TryParse(capacity, out var c) &&
                int.TryParse(minimalReservation, out var mr))
                return Table.TryCreateSingle(c, mr);
        }
    }
 
    return null;
}

Likewise, converting a Table object to this format looks like code you've already seen:

internal static string GenerateAttributeBiasedXml(this Table table)
{
    return table.Accept(new AttributedBiasedTableVisitor());
}
 
private sealed class AttributedBiasedTableVisitor : ITableVisitor<string>
{
    public string VisitCommunal(NaturalNumber capacity)
    {
        var xml = new XElement(
            "communal-table",
            new XAttribute("capacity", (int)capacity));
        return xml.ToString(SaveOptions.DisableFormatting);
    }
 
    public string VisitSingle(
        NaturalNumber capacity,
        NaturalNumber minimalReservation)
    {
        var xml = new XElement(
            "single-table",
            new XAttribute("capacity", (int)capacity),
            new XAttribute("minimal-reservation", (int)minimalReservation));
        return xml.ToString(SaveOptions.DisableFormatting);
    }
}
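Comparing TryParseAttributeBiased with TryParseElementBiased, the only difference is whether each primitive value is read with Element or Attribute. If that duplication bothers you, one option is to pass the difference in as a function. The following is a hypothetical refactoring sketch, not code from the article. ParsedTable is an illustrative stand-in for the Domain Model, and validation is left out for brevity.

```csharp
using System;
using System.Xml.Linq;

// Illustrative stand-in for the Domain Model; MinimalReservation is null
// for communal tables.
public sealed record ParsedTable(int Capacity, int? MinimalReservation);

// Hypothetical refactoring: one parser, parameterised by how it reads a
// named primitive value (child element or attribute).
public static class TableXmlSketch
{
    public static readonly Func<XElement, string, string?> FromElement =
        (xml, name) => (string?)xml.Element(name);

    public static readonly Func<XElement, string, string?> FromAttribute =
        (xml, name) => (string?)xml.Attribute(name);

    public static ParsedTable? TryParse(XElement xml, Func<XElement, string, string?> read)
    {
        if (!int.TryParse(read(xml, "capacity"), out var capacity))
            return null;

        if (xml.Name == "communal-table")
            return new ParsedTable(capacity, null);

        if (xml.Name == "single-table" &&
            int.TryParse(read(xml, "minimal-reservation"), out var minimalReservation))
            return new ParsedTable(capacity, minimalReservation);

        return null;
    }
}
```

Whether such a refactoring is worthwhile is a judgement call. Two small, independent parsers are easier to change separately if the formats later diverge.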

Consistent with adding the first XML support, I didn't have to touch any of the existing Controller or data access code.

Evaluation #

If you're concerned with separation of concerns, the Domain Model Only architecture gracefully handles variation in external formats without impacting application logic, Domain Model, or data access. You deal with each new format in a consistent and independent manner. The architecture offers the ultimate data representation flexibility, since you can implement any format that you can write as a stream of bytes.

Since static types are illusory at the boundary, this architecture is congruent with reality. For a REST service, at least, reality is what goes on the wire. While static types can also be used to model what wire formats look like, there's always a risk that you can use your IDE's refactoring tools to change a DTO in such a way that the code still compiles, but you've now changed the wire format. This could easily break existing clients.

When wire compatibility is important, I test-drive enough self-hosted tests that directly use and verify the wire format to give me a good sense of stability. Without DTO classes, it becomes increasingly important to cover externally visible behaviour with a trustworthy test suite, but really, if compatibility is important, you should be doing that anyway.

It almost goes without saying that a requirement for this architecture is that your chosen web framework supports it. As you've seen here, ASP.NET does, but that's not a given in general. Most web frameworks worth their salt will come with mechanisms that enable you to add new wire formats, but the question is how opinionated such extensibility points are. Do they expect you to work with DTOs, or are they more flexible than that?

You may consider the pure Domain Model Only data architecture too specialized for everyday use. I may do that, too. As I wrote in the introduction article, I don't intend these walk-throughs to be prescriptive. Rather, they explore what's possible, so that you and I have a bigger set of alternatives to choose from.

Hybrid architectures #

In the code base that accompanies Code That Fits in Your Head, I use a hybrid data architecture that I've used for years: ADO.NET for data access, as shown here, but DTOs for external JSON serialization. As demonstrated in the article Using Ports and Adapters to persist restaurant table configurations, using DTOs for the presentation layer may cause trouble if you need to support multiple wire formats. On the other hand, if you don't expect that this is a concern, you may decide to run that risk. I often do that.

When presenting these three architectures to a larger audience, one audience member told me that his team used another hybrid architecture: DTOs for the presentation layer, and separate DTOs for data access, but no Domain Model. I can see how this makes sense in a mostly CRUD-heavy application where nonetheless you need to be able to vary user interfaces independently from the database schema.

Finally, I should point out that the Domain Model Only data architecture is, in reality, also a kind of Ports and Adapters architecture. It just uses more low-level Adapter implementations than you idiomatically see.

Conclusion #

The Domain Model Only data architecture emphasises modelling business logic as a strongly-typed, well-encapsulated Domain Model, while eschewing statically typed DTOs for communication with external processes. What I most like about this alternative is that it leaves little confusion as to where functionality goes.

When you have, say, TableDto, Table, and TableEntity classes, you need a sophisticated and mature team to trust all developers to add functionality in the right place. If there's only a single Table Domain Model, it may be more evident to developers that only business logic belongs there, and other concerns ought to be addressed in different ways.

Even so, you may consider all the low-level parsing code not to your liking, and instead decide to use DTOs. I may too, depending on context.

Comments

Jes Hansen #

In this version of the data architecture, let's suppose that the controller that now accepts a Domain Object directly is part of a larger REST API. How would you handle discoverability of the API, as the usual OpenAPI (Swagger et al.) tools probably take offence at this type of request object?

2024-08-19 12:10 UTC

Mark Seemann #

Jes, thank you for writing. If by discoverability you mean 'documentation', I would handle that the same way I usually handle documentation requirements for REST APIs: by writing one or more documents that explain how the API works. If there are other possible uses of OpenAPI than that, and the GUI for performing ad-hoc experiments, I'm going to need to be taken to task, because then I'm not aware of them.

I've recently discussed my general misgivings about OpenAPI, and they apply here as well. I'm aware that other people feel differently about this, and that's okay too.

"the usual OpenAPI (Swagger et al.) tools probably take offence at this type of request object"

You may be right, but I haven't tried, so I don't know if this is the case.

2024-08-22 16:55 UTC

This blog is totally free, but if you like it, please consider supporting it.

Using a Shared Data Model to persist restaurant table configurations

2024-08-05T06:14:00+00:00

A data architecture example in C# and ASP.NET.

This is part of a small article series on data architectures. In this, the second instalment, you'll see a common attempt at addressing the mapping issue that I mentioned in the previous article. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 REST API.

Keep in mind that while the sample code does store data in a relational database, the term table in this article mainly refers to physical tables, rather than database tables.

The idea in this data architecture is to use a single, shared data model for each business object in the service. This is in contrast to the Ports and Adapters architecture, where you typically have a Data Transfer Object (DTO) for (JSON or XML) serialization, another class for the Domain Model, and a third to support an object-relational mapper.

An architecture diagram may attempt to illustrate the idea like this:

While ostensibly keeping alive the idea of application layers, this architecture allows data models to cross layers, so that the same class is used for database persistence, business logic, and presentation.

Data model #

Since the goal is to use a single class to model all application concerns, it means that we also need to use it for database persistence. The most commonly used ORM in .NET is Entity Framework, so I'll use that for the example. It's not something that I normally do, so it's possible that I could have done it better than what follows.

Still, assume that the database schema defines the Tables table like this:

CREATE TABLE [dbo].[Tables] (
    [Id]                 INT                NOT NULL IDENTITY PRIMARY KEY,
    [PublicId]           UNIQUEIDENTIFIER   NOT NULL UNIQUE,
    [Capacity]           INT                NOT NULL,
    [MinimalReservation] INT                NULL
)

I used a scaffolding tool to generate Entity Framework code from the database schema and then modified what it had created. This is the result:

public partial class Table
{
    [JsonIgnore]
    public int Id { get; set; }
 
    [JsonIgnore]
    public Guid PublicId { get; set; }
 
    public string Type => MinimalReservation.HasValue ? "single" : "communal";
 
    public int Capacity { get; set; }
 
    public int? MinimalReservation { get; set; }
}

Notice that I added [JsonIgnore] attributes to two of the properties, since I didn't want to serialize them to JSON. I also added the calculated property Type to include a discriminator in the JSON documents.
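To see how the two attributes and the calculated property affect the JSON, consider this sketch, which uses System.Text.Json without ASP.NET or Entity Framework. SharedTable is a simplified copy of the class above. The serializer options are an assumption: the article doesn't show its configuration, so the sketch uses web defaults (camelCase names) and omits null values, which appears necessary to reproduce the JSON documents shown below.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Simplified copy of the shared Table class, without Entity Framework.
public class SharedTable
{
    [JsonIgnore]
    public int Id { get; set; }

    [JsonIgnore]
    public Guid PublicId { get; set; }

    // Calculated discriminator. System.Text.Json serializes public getters
    // even without a setter, but ignores them when deserializing.
    public string Type => MinimalReservation.HasValue ? "single" : "communal";

    public int Capacity { get; set; }

    public int? MinimalReservation { get; set; }
}

public static class SharedTableJson
{
    // Assumption: camelCase names plus omitting null values.
    private static readonly JsonSerializerOptions options =
        new(JsonSerializerDefaults.Web)
        {
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
        };

    public static string Serialize(SharedTable table) =>
        JsonSerializer.Serialize(table, options);

    public static SharedTable? Deserialize(string json) =>
        JsonSerializer.Deserialize<SharedTable>(json, options);
}
```

Deserializing {"type":"communal","capacity":12} with the same options should leave MinimalReservation null; the incoming type value is simply ignored because Type has no setter.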

HTTP interaction #

A client can create a new table with a POST HTTP request:

POST /tables HTTP/1.1
content-type: application/json

{"type":"communal","capacity":12}

Notice that the JSON document doesn't follow the desired schema described in the introduction article. It can't, because the data architecture is bound to the shared Table class. Or at least, if it's possible to attain the desired format with a single class and only some strategically placed attributes, I'm not aware of it. As the article Using only a Domain Model to persist restaurant table configurations will show, it is possible to attain that goal with the appropriate middleware, but I consider doing that to be an example of the third architecture, so not something I will cover in this article.

The service will respond to the above request like this:

HTTP/1.1 201 Created
Location: https://example.com/Tables/777779466d2549d69f7e30b6c35bde3c

Clients can later use the address indicated by the Location header to retrieve a representation of the resource:

GET /Tables/777779466d2549d69f7e30b6c35bde3c HTTP/1.1
accept: application/json

Which elicits this response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"type":"communal","capacity":12}

The JSON format still doesn't conform to the desired format because the Controller in question deals exclusively with the shared Table data model.

Boundary #

At the boundary of the application, Controllers handle HTTP requests with action methods (an ASP.NET term). The framework matches requests by a combination of naming conventions and attributes. The Post action method handles incoming POST requests.

[HttpPost]
public async Task<IActionResult> Post(Table table)
{
    var id = Guid.NewGuid();
    await repository.Create(id, table).ConfigureAwait(false);
 
    return new CreatedAtActionResult(
        nameof(Get),
        null,
        new { id = id.ToString("N") },
        null);
}

Notice that the input parameter isn't a separate DTO, but rather the shared Table object. Since it's shared, the Controller can pass it directly to the repository without any mapping.

The same simplicity is on display in the Get method:

[HttpGet("{id}")]
public async Task<IActionResult> Get(string id)
{
    if (!Guid.TryParseExact(id, "N", out var guid))
        return new BadRequestResult();
    Table? table = await repository.Read(guid).ConfigureAwait(false);
    if (table is null)
        return new NotFoundResult();
    return new OkObjectResult(table);
}

Once the Get method has parsed the id, it goes straight to the repository, retrieves the table, and returns it if it's there. No mapping is required by the Controller. What about the repository?

Data access #

The SqlTablesRepository class reads and writes data from SQL Server using Entity Framework. The Create method is as simple as this:

public async Task Create(Guid id, Table table)
{
    ArgumentNullException.ThrowIfNull(table);
 
    table.PublicId = id;
    await context.Tables.AddAsync(table).ConfigureAwait(false);
    await context.SaveChangesAsync().ConfigureAwait(false);
}

The Read method is even simpler; it's a one-liner:

public async Task<Table?> Read(Guid id)
{
    return await context.Tables
        .SingleOrDefaultAsync(t => t.PublicId == id).ConfigureAwait(false);
}

Again, no mapping. Just return the database Entity.

XML serialization #

Simple, so far, but how does this data architecture handle changing requirements?

One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support XML.

As was also the case in the previous article, it quickly turns out that it's not possible to support any of the desired XML formats described in the introduction article. Instead, for the sake of exploring what is possible, I'll compromise and support XML documents like these examples:

<table>
  <type>communal</type>
  <capacity>12</capacity>
</table>

<table>
  <type>single</type>
  <capacity>4</capacity>
  <minimal-reservation>3</minimal-reservation>
</table>

This schema, it turns out, is the same as the element-biased format from the previous article. I could, instead, have chosen to support the attribute-biased format, but, because of the shared data model, not both.

Notice how using statically typed classes, attributes, and Reflection to guide serialization leads toward certain kinds of formats. You can't easily support any arbitrary JSON or XML schema, but are rather nudged into a more constrained subset of possible formats. There's nothing too bad about this. As usual, there are trade-offs involved. You concede flexibility, but gain convenience: Just slap some attributes on your DTO, and it works well enough for most purposes. I mostly point it out because this entire article series is about awareness of choice. There's always some cost to be paid.

That said, supporting that XML format is surprisingly easy:

[XmlRoot("table")]
public partial class Table
{
    [JsonIgnore, XmlIgnore]
    public int Id { get; set; }
 
    [JsonIgnore, XmlIgnore]
    public Guid PublicId { get; set; }
 
    [XmlElement("type"), NotMapped]
    public string? Type { get; set; }
 
    [XmlElement("capacity")]
    public int Capacity { get; set; }
 
    [XmlElement("minimal-reservation")]
    public int? MinimalReservation { get; set; }
 
    public bool ShouldSerializeMinimalReservation() =>
        MinimalReservation.HasValue;
 
    internal void InferType()
    {
        Type = MinimalReservation.HasValue ? "single" : "communal";
    }
}

Most of the changes are simple additions of the XmlRoot, XmlElement, and XmlIgnore attributes. In order to serialize the <type> element, however, I also had to convert the Type property to a read/write property, which had some ripple effects.
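Why did Type have to become a read/write property? XmlSerializer only serializes public fields and public read/write properties, so a calculated, get-only property is silently skipped. The following sketch, a simplified and hypothetical stand-in for the class above, shows both that and the ShouldSerializeMinimalReservation convention, which keeps a null value out of the output. The Kind property exists only to illustrate the get-only case.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Simplified stand-in for the XML-annotated Table class,
// without the Entity Framework and JSON attributes.
[XmlRoot("table")]
public class XmlTableSketch
{
    // Get-only: XmlSerializer ignores this property entirely.
    public string Kind => MinimalReservation.HasValue ? "single" : "communal";

    [XmlElement("type")]
    public string? Type { get; set; }

    [XmlElement("capacity")]
    public int Capacity { get; set; }

    [XmlElement("minimal-reservation")]
    public int? MinimalReservation { get; set; }

    // XmlSerializer calls ShouldSerialize{PropertyName} to decide
    // whether to emit the element at all.
    public bool ShouldSerializeMinimalReservation() => MinimalReservation.HasValue;
}

public static class XmlTableSketchIO
{
    private static readonly XmlSerializer serializer = new(typeof(XmlTableSketch));

    public static string Serialize(XmlTableSketch table)
    {
        // Suppress the XML declaration and the default xsi/xsd namespace declarations.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");

        using var stringWriter = new StringWriter();
        using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            serializer.Serialize(xmlWriter, table, namespaces);
        return stringWriter.ToString();
    }

    public static XmlTableSketch? Deserialize(string xml)
    {
        using var reader = new StringReader(xml);
        return (XmlTableSketch?)serializer.Deserialize(reader);
    }
}
```

Serializing a communal table with Type set to "communal" should produce <table><type>communal</type><capacity>12</capacity></table>, with neither a Kind element nor a minimal-reservation element.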

For one, I had to add the NotMapped attribute to tell Entity Framework that it shouldn't try to save the value of that property in the database. As you can see in the above SQL schema, the Tables table has no Type column.

This also meant that I had to change the Read method in SqlTablesRepository to call the new InferType method:

public async Task<Table?> Read(Guid id)
{
    var table = await context.Tables
        .SingleOrDefaultAsync(t => t.PublicId == id).ConfigureAwait(false);
    table?.InferType();
    return table;
}

I'm not happy with this kind of sequential coupling, but to be honest, this data architecture inherently has an appalling lack of encapsulation. Having to call InferType is just par for the course.

That said, despite a few stumbling blocks, adding XML support turned out to be surprisingly easy in this data architecture. Granted, I had to compromise on the schema, and could only support one XML schema, so we shouldn't really take this as an endorsement. To paraphrase Gerald Weinberg, if it doesn't have to work, it's easy to implement.

Evaluation #

There's no denying that the Shared Data Model architecture is simple. There's no mapping between different layers, and it's easy to get started. Like the DTO-based Ports and Adapters architecture, you'll find many documentation examples and getting-started guides that work like this. In a sense, you can say that it's what the ASP.NET framework, or, perhaps particularly the Entity Framework (EF), 'wants you to do'. To be fair, I find ASP.NET to be reasonably unopinionated, so what inveigling you may run into may be mostly attributable to EF.

While it may feel nice that it's easy to get started, instant gratification often comes at a cost. Consider the Table class shown here. Because of various constraints imposed by EF and the JSON and XML serializers, it has no encapsulation. One thing is that the sophisticated Visitor-encoded Table class introduced in the article Serializing restaurant tables in C# is completely out of the question, but you can't even add a required constructor like this one:

public Table(int capacity)
{
    Capacity = capacity;
}

Granted, it seems to work with both EF and the JSON serializer, which I suppose is a recent development, but it doesn't work with the XML serializer, which requires that

"A class must have a parameterless constructor to be serialized by XmlSerializer."

XML serialization, Microsoft documentation, 2023-04-05, retrieved 2024-07-27, their emphasis
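The difference is easy to observe. In the following sketch, GuardedTable is a hypothetical class whose only constructor takes a capacity. System.Text.Json (.NET 5 and later) can bind JSON properties to the parameters of such a constructor, whereas creating an XmlSerializer for the same type fails with an InvalidOperationException.

```csharp
using System;
using System.Text.Json;
using System.Xml.Serialization;

// Hypothetical class with only a parameterized constructor, similar to
// adding the constructor shown above to the shared Table class.
public class GuardedTable
{
    public GuardedTable(int capacity)
    {
        if (capacity < 1)
            throw new ArgumentOutOfRangeException(nameof(capacity));
        Capacity = capacity;
    }

    public int Capacity { get; set; }
}

public static class ConstructorRequirements
{
    // System.Text.Json matches the constructor parameter to the Capacity property.
    public static GuardedTable? FromJson(string json) =>
        JsonSerializer.Deserialize<GuardedTable>(json);

    // XmlSerializer rejects the type up front, when the serializer is created.
    public static bool XmlSerializerAccepts(Type type)
    {
        try
        {
            _ = new XmlSerializer(type);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

With this sketch, FromJson applied to {"Capacity":12} should return a table with a capacity of 12, while XmlSerializerAccepts(typeof(GuardedTable)) should return false.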

Even if this, too, changes in the future, DTO-based designs are at odds with encapsulation. If you doubt the veracity of that statement, I challenge you to complete the Priority Collection kata with serializable DTOs.

Another problem with the Shared Data Model architecture is that it so easily decays to a Big Ball of Mud. Even though the above architecture diagram hollowly insists that layering is still possible, a Shared Data Model is an attractor of behaviour. You'll soon find that a class like Table has methods that serve presentation concerns, others that implement business logic, and others again that address persistence issues. It has become a God Class.

From these problems it doesn't follow that the architecture doesn't have merit. If you're developing a CRUD-heavy application with a user interface (UI) that's merely a glorified table view, this could be a good choice. You would be coupling the UI to the database, so that if you need to change how the UI works, you might also have to modify the database schema, or vice versa.

This is still not an indictment, but merely an observation of consequences. If you can live with them, then choose the Shared Data Model architecture. I can easily imagine application types where that would be the case.

Conclusion #

In the Shared Data Model architecture you use a single model (here, a class) to handle all application concerns: UI, business logic, data access. While this shows a blatant disregard for the notion of separation of concerns, no law states that you must, always, separate concerns.

Sometimes it's okay to mix concerns, and then the Shared Data Model architecture is dead simple. Just make sure that you know when it's okay.

While this architecture is the ultimate in simplicity, it's also quite constrained. The third and final data architecture I'll cover, on the other hand, offers the ultimate in flexibility, at the expense (not surprisingly) of some complexity.

Next: Using only a Domain Model to persist restaurant table configurations.


Using Ports and Adapters to persist restaurant table configurations

2024-07-29T08:05:00+00:00

A data architecture example in C# and ASP.NET.

This is part of a small article series on data architectures. In the first instalment, you'll see the outline of a Ports and Adapters implementation. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 REST API.

Keep in mind that while the sample code does store data in a relational database, the term table in this article mainly refers to physical tables, rather than database tables.

While Ports and Adapters architecture diagrams are usually drawn as concentric circles, you can also draw (subsets of) it as more traditional layered diagrams:

Here, the arrows indicate mappings, not dependencies.

HTTP interaction #

A client can create a new table with a POST HTTP request:

POST /tables HTTP/1.1
content-type: application/json

{ "communalTable": { "capacity": 16 } }

Which might elicit a response like this:

HTTP/1.1 201 Created
Location: https://example.com/Tables/844581613e164813aa17243ff8b847af

Clients can later use the address indicated by the Location header to retrieve a representation of the resource:

GET /Tables/844581613e164813aa17243ff8b847af HTTP/1.1
accept: application/json

Which would result in this response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"communalTable":{"capacity":16}}

By default, ASP.NET handles and returns JSON. Later in this article you'll see how well it deals with other data formats.

Boundary #

ASP.NET supports some variation of the model-view-controller (MVC) pattern, and Controllers handle HTTP requests. At the outset, the action method that handles the POST request looks like this:

[HttpPost]
public async Task<IActionResult> Post(TableDto dto)
{
    ArgumentNullException.ThrowIfNull(dto);
 
    var id = Guid.NewGuid();
    await repository.Create(id, dto.ToTable()).ConfigureAwait(false);
 
    return new CreatedAtActionResult(nameof(Get), null, new { id = id.ToString("N") }, null);
}

As is idiomatic in ASP.NET, input and output are modelled by data transfer objects (DTOs), in this case called TableDto. I've already covered this little object model in the article Serializing restaurant tables in C#, so I'm not going to repeat it here.

The ToTable method, on the other hand, is a good example of how trying to cut corners leads to more complicated code:

internal Table ToTable()
{
    var candidate =
        Table.TryCreateSingle(SingleTable?.Capacity ?? -1, SingleTable?.MinimalReservation ?? -1);
    if (candidate is { })
        return candidate.Value;
 
    candidate = Table.TryCreateCommunal(CommunalTable?.Capacity ?? -1);
    if (candidate is { })
        return candidate.Value;
 
    throw new InvalidOperationException("Invalid TableDto.");
}

Compare it to the TryParse method in the Serializing restaurant tables in C# article. That one is simpler, and less error-prone.

I think that I wrote ToTable that way because I didn't want to deal with error handling in the Controller, and while I test-drove the code, I never wrote a test that supplies malformed input. I should have, and so should you, but this is demo code, and I never got around to it.
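For comparison, here's a sketch of a conversion that reports malformed input by returning null rather than throwing, which would leave a Controller free to respond with 400 Bad Request. It's hypothetical: the DTO shapes are guessed from the JSON examples in this article, and ParsedTableSketch and its validation rules are illustrative stand-ins for the Domain Model, not the TryParse method from the other article.

```csharp
using System.Text.Json;

// Hypothetical DTO shapes, guessed from JSON such as
// { "communalTable": { "capacity": 16 } }.
public sealed class CommunalTableDtoSketch
{
    public int Capacity { get; set; }
}

public sealed class SingleTableDtoSketch
{
    public int Capacity { get; set; }
    public int MinimalReservation { get; set; }
}

// Illustrative stand-in for the Domain Model.
public sealed record ParsedTableSketch(int Capacity, int? MinimalReservation);

public sealed class TableDtoSketch
{
    public CommunalTableDtoSketch? CommunalTable { get; set; }
    public SingleTableDtoSketch? SingleTable { get; set; }

    // Returns null for invalid input instead of throwing.
    // The validation rules are assumptions.
    public ParsedTableSketch? TryParse()
    {
        // Exactly one of the two alternatives must be present.
        if (CommunalTable is { } communal && SingleTable is null)
            return communal.Capacity > 0
                ? new ParsedTableSketch(communal.Capacity, null)
                : null;

        if (SingleTable is { } single && CommunalTable is null)
            return 0 < single.MinimalReservation && single.MinimalReservation <= single.Capacity
                ? new ParsedTableSketch(single.Capacity, single.MinimalReservation)
                : null;

        return null;
    }
}

public static class TableDtoSketchParser
{
    // Web defaults: camelCase names and case-insensitive property matching.
    private static readonly JsonSerializerOptions options = new(JsonSerializerDefaults.Web);

    public static ParsedTableSketch? TryParseJson(string json) =>
        JsonSerializer.Deserialize<TableDtoSketch>(json, options)?.TryParse();
}
```

A Controller using such a method would only need a null check to choose between 400 Bad Request and calling the repository.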

Enough about that. The other action method handles GET requests:

[HttpGet("{id}")]
public async Task<IActionResult> Get(string id)
{
    if (!Guid.TryParseExact(id, "N", out var guid))
        return new BadRequestResult();
    var table = await repository.Read(guid).ConfigureAwait(false);
    if (table is null)
        return new NotFoundResult();
    return new OkObjectResult(TableDto.From(table.Value));
}

The static TableDto.From method is identical to the ToDto method from the Serializing restaurant tables in C# article, just with a different name.

To summarize so far: At the boundary of the application, Controller methods receive or return TableDto objects, which are mapped to and from the Domain Model named Table.

Domain Model #

The Domain Model Table is also identical to the code shown in Serializing restaurant tables in C#. In order to comply with the Dependency Inversion Principle (DIP), mapping to and from TableDto is defined on the latter. The DTO, being an implementation detail, may depend on the abstraction (the Domain Model), but not the other way around.

In the same spirit, conversions to and from the database are defined entirely within the repository implementation.

Data access layer #

Keeping the example consistent, the code base also models data access with C# classes. It uses Entity Framework to read from and write to SQL Server. The class that models a row in the database is also a kind of DTO, even though here it's idiomatically called an entity:

public partial class TableEntity
{
    public int Id { get; set; }
 
    public Guid PublicId { get; set; }
 
    public int Capacity { get; set; }
 
    public int? MinimalReservation { get; set; }
}

I had a command-line tool scaffold the code for me, and since I don't usually live in that world, I don't know why it's a partial class. It seems to be working, though.

The SqlTablesRepository class implements the mapping between Table and TableEntity. For instance, the Create method looks like this:

public async Task Create(Guid id, Table table)
{
    var entity = table.Accept(new TableToEntityConverter(id));
    await context.Tables.AddAsync(entity).ConfigureAwait(false);
    await context.SaveChangesAsync().ConfigureAwait(false);
}

That looks simple, but only because all the work is done by the TableToEntityConverter, which is a nested class:

private sealed class TableToEntityConverter : ITableVisitor<TableEntity>
{
    private readonly Guid id;
 
    public TableToEntityConverter(Guid id)
    {
        this.id = id;
    }
 
    public TableEntity VisitCommunal(NaturalNumber capacity)
    {
        return new TableEntity
        {
            PublicId = id,
            Capacity = (int)capacity,
        };
    }
 
    public TableEntity VisitSingle(
        NaturalNumber capacity,
        NaturalNumber minimalReservation)
    {
        return new TableEntity
        {
            PublicId = id,
            Capacity = (int)capacity,
            MinimalReservation = (int)minimalReservation,
        };
    }
}

Mapping the other way is easier, so the SqlTablesRepository does it inline in the Read method:

public async Task<Table?> Read(Guid id)
{
    var entity = await context.Tables
        .SingleOrDefaultAsync(t => t.PublicId == id).ConfigureAwait(false);
    if (entity is null)
        return null;
 
    if (entity.MinimalReservation is null)
        return Table.TryCreateCommunal(entity.Capacity);
    else
        return Table.TryCreateSingle(
            entity.Capacity,
            entity.MinimalReservation.Value);
}
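The following sketch puts both directions of the mapping side by side in a form that runs without Entity Framework. All types are simplified, hypothetical stand-ins: DomainTable plays the role of the Visitor-encoded Table, IDomainTableVisitor the role of ITableVisitor (with int instead of NaturalNumber), and TableEntitySketch the role of the scaffolded TableEntity. The validation rules in the factory methods are assumptions.

```csharp
using System;

// Hypothetical, simplified Visitor interface (ints instead of NaturalNumber).
public interface IDomainTableVisitor<T>
{
    T VisitCommunal(int capacity);
    T VisitSingle(int capacity, int minimalReservation);
}

// Stand-in for the Visitor-encoded Domain Model.
public abstract class DomainTable
{
    private DomainTable() { }

    public abstract T Accept<T>(IDomainTableVisitor<T> visitor);

    public static DomainTable? TryCreateCommunal(int capacity) =>
        capacity > 0 ? new CommunalTable(capacity) : null;

    public static DomainTable? TryCreateSingle(int capacity, int minimalReservation) =>
        0 < minimalReservation && minimalReservation <= capacity
            ? new SingleTable(capacity, minimalReservation)
            : null;

    private sealed class CommunalTable : DomainTable
    {
        private readonly int capacity;
        public CommunalTable(int capacity) => this.capacity = capacity;
        public override T Accept<T>(IDomainTableVisitor<T> visitor) =>
            visitor.VisitCommunal(capacity);
    }

    private sealed class SingleTable : DomainTable
    {
        private readonly int capacity;
        private readonly int minimalReservation;
        public SingleTable(int capacity, int minimalReservation)
        {
            this.capacity = capacity;
            this.minimalReservation = minimalReservation;
        }
        public override T Accept<T>(IDomainTableVisitor<T> visitor) =>
            visitor.VisitSingle(capacity, minimalReservation);
    }
}

// Stand-in for the scaffolded Entity Framework row class.
public sealed class TableEntitySketch
{
    public int Id { get; set; }
    public Guid PublicId { get; set; }
    public int Capacity { get; set; }
    public int? MinimalReservation { get; set; }
}

// Domain Model to entity: one method per kind of table.
public sealed class TableToEntitySketch : IDomainTableVisitor<TableEntitySketch>
{
    private readonly Guid id;
    public TableToEntitySketch(Guid id) => this.id = id;

    public TableEntitySketch VisitCommunal(int capacity) =>
        new() { PublicId = id, Capacity = capacity };

    public TableEntitySketch VisitSingle(int capacity, int minimalReservation) =>
        new() { PublicId = id, Capacity = capacity, MinimalReservation = minimalReservation };
}

public static class TableEntityMapping
{
    public static TableEntitySketch ToEntity(DomainTable table, Guid id) =>
        table.Accept(new TableToEntitySketch(id));

    // Entity to Domain Model: a null MinimalReservation means communal.
    // Returns null if the row violates the Domain Model's rules.
    public static DomainTable? FromEntity(TableEntitySketch entity) =>
        entity.MinimalReservation is null
            ? DomainTable.TryCreateCommunal(entity.Capacity)
            : DomainTable.TryCreateSingle(entity.Capacity, entity.MinimalReservation.Value);
}
```

Note that in this sketch the mapping from entity to Domain Model isn't total: a row that violates the rules, such as a minimal reservation larger than the capacity, maps to null rather than to a table.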

Similar to the case of the DTO, mapping between Table and TableEntity is the responsibility of the SqlTablesRepository class, since data persistence is an implementation detail. According to the DIP it shouldn't be part of the Domain Model, and it isn't.

XML formats #

That covers the basics, but how well does this kind of architecture stand up to changing requirements?

One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support not just one, but two, XML formats.

As described in the introduction article, ideally the XML should support a format implied by these examples:

<communal-table>
  <capacity>12</capacity>
</communal-table>

<single-table>
  <capacity>4</capacity>
  <minimal-reservation>3</minimal-reservation>
</single-table>

To be honest, if there's a way to support this kind of schema by defining DTOs to be serialized and deserialized, I don't know what it looks like. That's not meant to imply that it's impossible. There's often an epistemological problem associated with proving things impossible, so I'll just leave it there.

To be clear, it's not that I don't know how to support that kind of schema at all. I do, as the article Using only a Domain Model to persist restaurant table configurations will show. I just don't know how to do it with DTO-based serialisation.

Element-biased XML #

Instead of the above XML schema, I'll explore how hard it is to support a variant schema, implied by these two examples:

<table>
  <type>communal</type>
  <capacity>12</capacity>
</table>

<table>
  <type>single</type>
  <capacity>4</capacity>
  <minimal-reservation>3</minimal-reservation>
</table>

This variation shares the same <table> root element and instead distinguishes between the two kinds of table with a <type> discriminator.

This kind of schema we can define with a DTO:

[XmlRoot("table")]
public class ElementBiasedTableXmlDto
{
    [XmlElement("type")]
    public string? Type { get; set; }
 
    [XmlElement("capacity")]
    public int Capacity { get; set; }
 
    [XmlElement("minimal-reservation")]
    public int? MinimalReservation { get; set; }
 
    public bool ShouldSerializeMinimalReservation() =>
        MinimalReservation.HasValue;
 
    // Mapping methods not shown here...
}

As you may have already noticed, however, this isn't the same type as TableJsonDto, so how are we going to implement the Controller methods that receive and send objects of this type?

Posting XML #

The service should still accept JSON as shown above, but now, additionally, it should also support HTTP requests like this one:

POST /tables HTTP/1.1
content-type: application/xml

<table><type>communal</type><capacity>12</capacity></table>

How do you implement this new feature?

My first thought was to add a Post overload to the Controller:

[HttpPost]
public async Task<IActionResult> Post(ElementBiasedTableXmlDto dto)
{
    ArgumentNullException.ThrowIfNull(dto);
 
    var id = Guid.NewGuid();
    await repository.Create(id, dto.ToTable()).ConfigureAwait(false);
 
    return new CreatedAtActionResult(
        nameof(Get),
        null,
        new { id = id.ToString("N") },
        null);
}

I just copied and pasted the original Post method and changed the type of the dto parameter. I also had to add a ToTable conversion to ElementBiasedTableXmlDto:

internal Table ToTable()
{
    if (Type == "single")
    {
        var t = Table.TryCreateSingle(Capacity, MinimalReservation ?? 0);
        if (t is { })
            return t.Value;
    }
 
    if (Type == "communal")
    {
        var t = Table.TryCreateCommunal(Capacity);
        if (t is { })
            return t.Value;
    }
 
    throw new InvalidOperationException("Invalid Table DTO.");
}

While all of that compiles, it doesn't work.

When you attempt to POST a request against the service, the ASP.NET framework now throws an AmbiguousMatchException indicating that "The request matched multiple endpoints". Which is understandable.

This led me to the first round of Framework Whac-A-Mole. What I'd like to do is to select the appropriate action method based on content-type or accept headers. How does one do that?

After some web searching, I came across a Stack Overflow answer that seemed to indicate a way forward.

Selecting the right action method #

One way to address the issue is to implement a custom ActionMethodSelectorAttribute:

public sealed class SelectTableActionMethodAttribute : ActionMethodSelectorAttribute
{
    public override bool IsValidForRequest(RouteContext routeContext, ActionDescriptor action)
    {
        if (action is not ControllerActionDescriptor cad)
            return false;
 
        if (cad.Parameters.Count != 1)
            return false;
        var dtoType = cad.Parameters[0].ParameterType;
 
        // Demo code only. This doesn't take into account a possible charset
        // parameter. See RFC 9110, section 8.3
        // (https://www.rfc-editor.org/rfc/rfc9110#field.content-type) for more
        // information.
        if (routeContext?.HttpContext.Request.ContentType == "application/json")
            return dtoType == typeof(TableJsonDto);
        if (routeContext?.HttpContext.Request.ContentType == "application/xml")
            return dtoType == typeof(ElementBiasedTableXmlDto);
 
        return false;
    }
}

As the code comment suggests, this isn't as robust as it should be. A content-type header may also look like this:

Content-Type: application/json; charset=utf-8

The exact string equality check shown above would fail in such a scenario, suggesting that a more sophisticated implementation is warranted. I'll skip that for now, since this demo code already compromises on the overall XML schema. For an example of more robust content negotiation implementations, see Using only a Domain Model to persist restaurant table configurations.
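For readers who want a less brittle check anyway, one option is to parse the header before comparing. The following is a minimal sketch, assuming .NET's System.Net.Http.Headers.MediaTypeHeaderValue parser. HasMediaType is a hypothetical helper, not part of the article's code base.

```csharp
using System;
using System.Net.Http.Headers;

// Hypothetical helper: compares only the media type of a Content-Type value,
// so parameters such as charset=utf-8 no longer break the match.
static bool HasMediaType(string? contentType, string expected)
{
    if (!MediaTypeHeaderValue.TryParse(contentType, out var parsed))
        return false;

    // Type and subtype are case-insensitive (RFC 9110, section 8.3.1).
    return string.Equals(parsed.MediaType, expected, StringComparison.OrdinalIgnoreCase);
}

Console.WriteLine(HasMediaType("application/json; charset=utf-8", "application/json")); // True
Console.WriteLine(HasMediaType("Application/XML", "application/xml"));                   // True
Console.WriteLine(HasMediaType("application/xml", "application/json"));                  // False
```

Inside IsValidForRequest, each string comparison could then become a call like HasMediaType(routeContext?.HttpContext.Request.ContentType, "application/json"). The sketch ignores parameters without validating them.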

Adorn both Post action methods with this custom attribute, and the service now handles both formats:

[HttpPost, SelectTableActionMethod]
public async Task<IActionResult> Post(TableJsonDto dto)
    // ...
 
[HttpPost, SelectTableActionMethod]
public async Task<IActionResult> Post(ElementBiasedTableXmlDto dto)
    // ...

While that handles POST requests, it doesn't implement content negotiation for GET requests.

Getting XML #

In order to GET an XML representation, clients can supply an accept header value:

GET /Tables/153f224c91fb4403988934118cc14024 HTTP/1.1
accept: application/xml

which will reply with

HTTP/1.1 200 OK
Content-Length: 59
Content-Type: application/xml; charset=utf-8

<table><type>communal</type><capacity>12</capacity></table>

How do we implement that?

Keep in mind that since this data-architecture variation uses two different DTOs to model JSON and XML, respectively, an action method can't just return an object of a single type and hope that the ASP.NET framework takes care of the rest. Again, I'm aware of middleware that'll deal nicely with this kind of problem, but not in this architecture; see Using only a Domain Model to persist restaurant table configurations for such a solution.

The best I could come up with, given the constraints I'd imposed on myself, then, was this:

[HttpGet("{id}")]
public async Task<IActionResult> Get(string id)
{
    if (!Guid.TryParseExact(id, "N", out var guid))
        return new BadRequestResult();
    var table = await repository.Read(guid).ConfigureAwait(false);
    if (table is null)
        return new NotFoundResult();
 
    // Demo code only. This doesn't take into account quality values.
    var accept =
        httpContextAccessor?.HttpContext?.Request.Headers.Accept.ToString();
    if (accept == "application/json")
        return new OkObjectResult(TableJsonDto.From(table.Value));
    if (accept == "application/xml")
        return new OkObjectResult(ElementBiasedTableXmlDto.From(table.Value));
 
    return new StatusCodeResult((int)HttpStatusCode.NotAcceptable);
}

As the comment suggests, this is once again code that barely passes the few tests that I have, but really isn't production-ready. An accept header may also look like this:

accept: application/xml; q=1.0,application/json; q=0.5

Given such an accept header, the service ought to return an XML representation with the application/xml content type, but instead, this Get method returns 406 Not Acceptable.

As I've already outlined, I'm not going to fix this problem, as this is only an exploration. It seems that we can already conclude that this style of architecture is ill-suited to deal with this kind of problem. If that's the conclusion, then why spend time fixing outstanding problems?
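The article deliberately leaves this unfixed. Still, to indicate roughly what honouring quality values involves, here's a minimal sketch, assuming System.Net.Http.Headers.MediaTypeWithQualityHeaderValue for parsing each media range. Negotiate and Specificity are hypothetical helpers, and the naive split on commas would misbehave if a quoted parameter ever contained a comma.

```csharp
using System;
using System.Linq;
using System.Net.Http.Headers;

// How specifically a media range matches a media type:
// 2 = exact, 1 = type/*, 0 = */*, -1 = no match.
static int Specificity(string range, string type)
{
    if (string.Equals(range, type, StringComparison.OrdinalIgnoreCase)) return 2;
    if (range == "*/*") return 0;
    if (range.EndsWith("/*", StringComparison.Ordinal)
        && type.StartsWith(range[..^1], StringComparison.OrdinalIgnoreCase)) return 1;
    return -1;
}

// Hypothetical helper: returns the supported media type the client prefers,
// or null if none is acceptable (i.e. 406 Not Acceptable).
static string? Negotiate(string? accept, params string[] supported)
{
    // RFC 9110: no Accept header means any media type is acceptable.
    if (string.IsNullOrWhiteSpace(accept))
        return supported.FirstOrDefault();

    // Naive split on commas; good enough for a sketch.
    var ranges = accept
        .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
        .Select(s => MediaTypeWithQualityHeaderValue.TryParse(s, out var parsed) ? parsed : null)
        .OfType<MediaTypeWithQualityHeaderValue>()
        .ToList();

    string? best = null;
    var bestQ = 0.0;
    foreach (var type in supported)
    {
        // The most specific matching range decides this type's quality value.
        var match = ranges
            .Select(r => (Media: r, Score: Specificity(r.MediaType ?? "", type)))
            .Where(m => m.Score >= 0)
            .OrderByDescending(m => m.Score)
            .FirstOrDefault();
        if (match.Media is null)
            continue;

        var q = match.Media.Quality ?? 1.0; // A missing q means q=1
        if (q > bestQ) // Strict '>' keeps the server's order on ties and skips q=0
        {
            best = type;
            bestQ = q;
        }
    }
    return best;
}

Console.WriteLine(Negotiate("application/xml; q=1.0,application/json; q=0.5",
    "application/json", "application/xml")); // application/xml
```

Following RFC 9110, the most specific matching range decides a type's quality, so */*;q=0.1 doesn't override an explicit application/json;q=0. A null result corresponds to the 406 Not Acceptable branch in the Get method.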

Attribute-biased XML #

Even so, just to punish myself, apparently, I also tried to add support for an alternative XML format that uses attributes to record primitive values. Again, I couldn't make the schema described in the introductory article work, but I did manage to add support for XML documents like these:

<table type="communal" capacity="12" />

<table type="single" capacity="4" minimal-reservation="3" />

The code is similar to what I've already shown, so I'll only list the DTO:

[XmlRoot("table")]
public class AttributeBiasedTableXmlDto
{
    [XmlAttribute("type")]
    public string? Type { get; set; }
 
    [XmlAttribute("capacity")]
    public int Capacity { get; set; }
 
    [XmlAttribute("minimal-reservation")]
    public int MinimalReservation { get; set; }
 
    public bool ShouldSerializeMinimalReservation() => 0 < MinimalReservation;
 
    // Mapping methods not shown here...
}

This DTO looks a lot like the ElementBiasedTableXmlDto class, only it adorns properties with XmlAttribute rather than XmlElement.

Evaluation #

Even though I had to compromise on essential goals, I wasted an appalling amount of time and energy on yak shaving and Framework Whac-A-Mole. The DTO-based approach to modelling external resources really doesn't work when you need to do substantial content negotiation.

Even so, a DTO-based Ports and Adapters architecture may be useful when that's not a concern. If, instead of a REST API, you're developing a web site, you'll typically not need to vary representation independently of resource. In other words, a web page is likely to have at most one underlying model.

Compared to other large frameworks I've run into, ASP.NET is fairly unopinionated. Even so, the idiomatic way to use it is based on DTOs. DTOs to represent external data. DTOs to represent UI components. DTOs to represent database rows (although they're often called entities in that context). You'll find a ton of examples using this data architecture, so it's incredibly well-described. If you run into problems, odds are that someone has blazed a trail before you.

Even outside of .NET, this kind of architecture is well-known. While I've learned a thing or two from experience, I've picked up a lot of my knowledge about software architecture from people like Martin Fowler and Robert C. Martin.

When you also apply the Dependency Inversion Principle, you'll get good separation of concerns. This aspect of Ports and Adapters is most explicitly described in Clean Architecture. For example, a change to the UI generally doesn't affect the database. You may find that example ridiculous, because why should it, but consult the article Using a Shared Data Model to persist restaurant table configurations to see how this may happen.

The major drawback of the DTO-based data architecture is that much mapping is required. With three different types (e.g. JSON DTO, Domain Model, and ORM Entity), you need four-way translation as indicated in the above figure. People often complain about all that mapping, and no: ORMs don't reduce the need for mapping.

Another problem is that this style of architecture is complicated. As I've argued elsewhere, Ports and Adapters often constitute an unstable equilibrium. While you can make it work, it requires a level of sophistication and maturity among team members that is not always present. And when it goes wrong, it may quickly deteriorate into a Big Ball of Mud.

Conclusion #

A DTO-based Ports and Adapters architecture is well-described and has good separation of concerns. In this article, however, we've seen that it doesn't deal successfully with realistic content negotiation. While that may seem like a shortcoming, it's a drawback that you may be able to live with. Particularly if you don't need to do content negotiation at all.

This way of organizing code around data is quite common. It's often the default data architecture, and I sometimes get the impression that a development team has 'chosen' to use it without considering alternatives.

It's not a bad architecture, despite the evidence to the contrary in this article. The scenario examined here may not be relevant to you. The main drawback of having all these objects playing different roles is all the mapping that's required.

The next data architecture attempts to address that concern.

Next: Using a Shared Data Model to persist restaurant table configurations.

This blog is totally free, but if you like it, please consider supporting it.

Three data architectures for the server

2024-07-25T18:30:00+00:00

A comparison, for educational purposes.

Use the right tool for the job. How often have you encountered that phrase when discussing software architecture?

There's nothing wrong with the sentiment per se, but it's almost devoid of meaning. It doesn't pass the 'not test'. Try to negate it and imagine if anyone would seriously hold that belief: Don't use the right tool for the job, said no-one ever.

Even so, the underlying idea is that there are better and worse ways to solve problems. In software architecture too. It follows that you should choose the better solution.

How to do that requires skill and experience. When planning a good software architecture, an important consideration is how it'll handle future requirements. This seems to indicate that an architect should be able to predict the future in order to pick the best architecture. Which is, in general, not possible. Predicting the future is not the topic of this article.

There is, however, a more practical issue related to the notion of using the right tool for the job. One that we can address.

Choice #

In order to choose the better solution, you need to be aware of alternatives. You can't choose if there's nothing to choose from. This seems obvious, but a flowchart may drive home the point in an even stronger fashion.

On the other hand, if you have options, you're now in a position to choose.

In order to make a decision, you must be able to identify alternatives. This is hardly earth-shattering, but perhaps a bit abstract. To make it concrete, in this article, I'll look at a particular example.

Default data architecture #

Many applications need some sort of persistent storage. Particularly when it comes to (relational) database-based systems, I've seen more than one organization defaulting to a single data architecture: A presentation layer with View Models, a business logic layer with Domain Models, and a data access layer with ORM objects. A few decades ago, you'd typically see that model illustrated with horizontal layers. This is no longer en vogue. Today, most organizations that I consult with will tell me that they've decided on Ports and Adapters. Even so, if you do it right, it's the same architecture.

Reusing a diagram from a recent article, we may draw it like this:

The architect or senior developer who made that decision is obviously aware of some of the lore in the industry. He or she can often name that data architecture as either Ports and Adapters, Hexagonal Architecture, Clean Architecture, or, more rarely, Onion Architecture.

I still get the impression that this way of arranging code was chosen by default, without much deliberation. I see it so often that it strikes me as a 'default architecture'. Are architects aware of alternatives? Can they compare the benefits and drawbacks of each alternative?

Three alternatives #

As an example, I'll explore three alternative data architectures, one of them being Ports and Adapters. My goal with this is only to raise awareness. Since I rarely (if ever) see my customers use anything other than Ports and Adapters, I think some readers may benefit from seeing some alternatives.

I'll show three ways to organize data with code, but that doesn't imply that these are the only three options. At the very least, some hybrid combinations are also possible. It's also possible that a fourth or fifth alternative exists, and I'm just not aware of it.

In three articles, you'll see each data architecture explored in more detail.

As the titles suggest, all three examples will attempt to address the same problem: How to persist table configurations for a restaurant. The scenario is the same as already outlined in the article Serialization with and without Reflection, and the example code base also attempts to follow the external data format of those articles.

Data formats #

In JSON, a table may be represented like this:

{
  "singleTable": {
    "capacity": 16,
    "minimalReservation": 10
  }
}

Or like this:

{ "communalTable": { "capacity": 10 } }
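In both shapes, the name of the single top-level property acts as a discriminator. As a minimal sketch, assuming System.Text.Json's JsonDocument, the two shapes could be told apart like this. ParseTableJson and its tuple return value are hypothetical and stand in for whatever Table type a code base would use.

```csharp
using System;
using System.Text.Json;

// Hypothetical parser for the wrapper-object JSON format. The top-level
// property name ("singleTable" or "communalTable") is the discriminator.
static (string Kind, int Capacity, int? MinimalReservation)? ParseTableJson(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    if (root.ValueKind != JsonValueKind.Object)
        return null;

    if (root.TryGetProperty("singleTable", out var single)
        && single.ValueKind == JsonValueKind.Object
        && single.TryGetProperty("capacity", out var cap)
        && single.TryGetProperty("minimalReservation", out var min))
        return ("single", cap.GetInt32(), (int?)min.GetInt32());

    if (root.TryGetProperty("communalTable", out var communal)
        && communal.ValueKind == JsonValueKind.Object
        && communal.TryGetProperty("capacity", out var communalCap))
        return ("communal", communalCap.GetInt32(), (int?)null);

    return null; // Unrecognized shape
}

Console.WriteLine(ParseTableJson("{ \"communalTable\": { \"capacity\": 10 } }"));
```

GetInt32 throws if a value isn't an integer, so this sketch only returns null when the overall shape is unrecognized.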

But I'll also explore what happens if you need to support multiple external formats, such as XML. Generally speaking, a given XML specification may lean towards favouring a verbose style based on elements, or a terser style based on attributes. An example of the former could be:

<communal-table>
  <capacity>12</capacity>
</communal-table>

<single-table>
  <capacity>4</capacity>
  <minimal-reservation>3</minimal-reservation>
</single-table>

while examples of the latter style include

<communal-table capacity="12" />

and

<single-table capacity="4" minimal-reservation="3" />

As it turns out, only one of the three data architectures is flexible enough to fully address such requirements.
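To make that requirement concrete, here's a minimal sketch, assuming LINQ to XML (System.Xml.Linq) instead of XmlSerializer, that reads all four of the above XML examples without any DTO. The root element name acts as the discriminator, and each value is read from an attribute if present, otherwise from a child element. ParseTableXml and its tuple return value are hypothetical; they aren't the example code base's Table type, nor necessarily how the Domain-Model-only article solves the problem.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical parser for the preferred XML formats. The root element name is
// the discriminator; each value may be an attribute or a child element.
static (string Kind, int Capacity, int? MinimalReservation)? ParseTableXml(string xml)
{
    var root = XElement.Parse(xml);

    // Reads an integer from an attribute or, failing that, a child element.
    static int? ReadInt(XElement e, string name)
    {
        var text = e.Attribute(name)?.Value ?? e.Element(name)?.Value;
        return int.TryParse(text, out var i) ? i : (int?)null;
    }

    var capacity = ReadInt(root, "capacity");
    if (capacity is null)
        return null;

    if (root.Name.LocalName == "communal-table")
        return ("communal", capacity.Value, (int?)null);

    if (root.Name.LocalName == "single-table")
    {
        var min = ReadInt(root, "minimal-reservation");
        if (min is null)
            return null;
        return ("single", capacity.Value, min);
    }

    return null; // Unknown root element, e.g. <table>
}

Console.WriteLine(ParseTableXml("<communal-table capacity=\"12\" />"));
Console.WriteLine(ParseTableXml(
    "<single-table><capacity>4</capacity><minimal-reservation>3</minimal-reservation></single-table>"));
```

XElement.Parse throws an XmlException on malformed input, which a real endpoint would also have to handle. Accepting both attributes and elements in one parser keeps the sketch short; a service could instead choose a parser based on the request's media type.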

Comparisons #

A REST API is the kind of application where data representation flexibility is most likely to be an issue. Thus, the fact that only one of the three alternative architectures has enough expressive power in that dimension doesn't disqualify the other two. Each comes with its own benefits and drawbacks.

	Ports and Adapters	Shared Data Model	Domain Model only
Advantages	Separation of concerns, well-described	Simple, no mapping	Flexible, congruent with reality
Disadvantages	Much mapping, easy to get wrong	Inflexible, God Class attractor	Requires non-opinionated framework, requires more testing

I'll discuss each alternative's benefits and drawbacks in their individual articles.

An important point of all this is that none of these articles are meant to be prescriptive. While I do have favourites, my biases are shaped by the kind of work I typically do. In other contexts, another alternative may prevail.

Example code #

As usual, example code is in C#. Of the three languages in which I'm most proficient (the other two being F# and Haskell), this is the most easily digestible for a larger audience.

All three alternatives are written with ASP.NET 8.0, and it's unavoidable that there will be some framework-specific details. In Code That Fits in Your Head, I made it an explicit point that while the examples in the book are in C#, the book (and the code in it) should be understandable by developers who normally use Java, C++, TypeScript, or similar C-based languages.

The book is, for that reason, light on .NET-specific details. Instead, I published an article that collects all the interesting .NET things I ran into while writing the book.

Not so here. The three articles cover enough ASP.NET particulars that readers who don't care about that framework are encouraged to skim-read.

I've developed the three examples as three branches of the same Git repository. The code is available upon request against a small support donation of 10 USD (or more). If you're one of my regular supporters, you have my gratitude and can get the code without further donation. Send me an email in both cases.

Conclusion #

There's more than one way to organize a code base to deal with data. Depending on context, one may be a better choice than another. Thus, it pays to be aware of alternatives.

In the remaining articles in this series, you'll see three examples of how to deal with persistent data from a database. In order to establish a baseline, the first covers the well-known Ports and Adapters architecture.

Next: Using Ports and Adapters to persist restaurant table configurations.


The end of trust?

2024-07-15T19:07:00+00:00

Software development in a globalized, hostile world.

Imagine that you're perusing the thriller section in an airport book store and come across a book with the following back cover blurb:

Programmers are dying.

Holly-Ann Kerr works as a data scientist for an NGO that fights workplace discrimination. While scrubbing input, she discovers an unusual pattern in the data. Some employees seem to have an unusually high fatal accident rate. Programmers are dying in traffic accidents, falling on stairs, from defective electrical wiring, smoking in bed. They work for a variety of companies. Some for Big Tech, others for specialized component vendors, some for IT-related NGOs, others again for utility companies. The deaths seem to have nothing in common, until H-A uncovers a disturbing pattern.

All victims had recently started in a new position. And all were of Iranian descent.

Is a racist killer on the loose? But if so, why is he only targeting new hires? And why only software developers?

When H-A shares her discovery with the wrong people, she soon discovers that she'll be the next victim.

Okay, I'm not a professional editor, so this could probably do with a bit of polish. Does it sound like an exciting piece of fiction, though?

I'm going to spoil the plot, since the book doesn't exist anyway.

An international plot #

(Apologies to Iranian readers. I have nothing against Iranians, but find the regime despicable. In any case, nothing in the following hinges on the ICC. You can replace it with another adversarial intelligence agency that you don't like, including, but not limited to, RGB, FSB, or a clandestine Chinese intelligence organization. You could probably even swap the roles and make CIA, MI5, or Mossad be the bad guys, if your loyalties lie elsewhere.)

In the story, it turns out that clandestine Iranian special operations are attempting to recruit moles in software organizations that constitute the supply chain of Western digital infrastructure.

Intelligence bureaus and software organizations that directly develop sensitive software tend to have good security measures. Planting a mole in such an organization is difficult. The entire supply chain of software dependencies, on the other hand, is much more vulnerable. If you can get an employee to install a backdoor in left-pad, chances are that you may attain remote execution capabilities on an ostensibly secure system.

In my hypothetical thriller, the Iranians kill those software developers that they fail to recruit. After all, one can't run a clandestine operation if people notify the police that they've been approached by a foreign power.

Long game #

Does that plot sound far-fetched?

I admit that I did turn some plot elements up to 11. This is, after all, supposed to be a thriller.

The story is, however, 'loosely based on real events'. Earlier this year, a Microsoft developer revealed a backdoor that someone had intentionally planted in xz Utils. That version of the software was close to being merged into Debian and Red Hat Linux distributions. It would have enabled an attacker to execute arbitrary code on an infected machine.

The attack was singularly sophisticated. It also looks as though it was initiated years ago by one or more persons who contributed real, useful work to an open-source project, apparently (in hindsight) with the sole intention of gaining the trust of the rest of the community.

This is such a long game that it reeks of an adversarial state actor. The linked article speculates on which foreign power may be behind the attack. No, not the Iranians, after all.

If you think about it, it's an entirely rational gambit for a foreign intelligence agency to make. It's not that the NSA hasn't already tried something comparable. If anything, the xz hack mostly seems far-fetched because it's so unnecessarily sophisticated.

Usually, the most effective hacking techniques utilize human trust or gullibility. Why spend enormous effort developing sophisticated buffer overrun exploits if you can get a (perhaps unwitting) insider to run arbitrary code for you?

It'd be much cheaper, and much more reliable, to recruit moles on the inside of software companies, and get them to add the backdoors you need. It doesn't necessarily have to be new hires, but perhaps (I'm speculating) it's easier to recruit people before they've developed any loyalty to their new team mates.

The soft underbelly #

Which software organizations are the most promising targets? If it were me, I'd particularly try to go after various component vendors. One category may be companies that produce RAD tools such as grid GUIs, but also service providers that offer free SDKs to, say, send email, generate invoices, send SMS, charge credit cards, etc.

I'm not implying that any such company has ill intent, but since such software runs on many machines, it's a juicy target if you can sneak a backdoor into one.

Why not open-source software (OSS)? Many OSS libraries run on even more machines, so wouldn't that be an even more attractive target for an adversary? Yes, but on the other hand, most popular open-source code is also scrutinized by many independent agents, so it's harder to sneak in a backdoor. As the attempted xz hack demonstrates, even a year-long sophisticated attack is at risk of being discovered.

Doesn't commercial or closed-source code receive the same level of scrutiny?

In my experience, not always. Of course, some development organizations use proper shared-code-ownership techniques like code reviews or pair programming, but others rely on siloed solo development. Programmers just check in code that no-one else ever looks at.

In such an organization, imagine how easy it'd be for a mole to add a backdoor to a widely-distributed library. He or she wouldn't even have to resort to sophisticated ways to obscure the backdoor, because no colleague would be likely to look at the code. Particularly not if you bury it in seven levels of nested for loops and call the class MonitorManager or similar. As long as the reusable library ships as compiled code, it's unlikely that customers will discover the backdoor before it's too late.

Trust #

Last year I published an article on trust in software development. The point of that piece wasn't that you should suspect your colleagues of ill intent, but rather that you can trust neither yourself nor your co-workers for the simple reason that people make mistakes.

Since then, I've been doing some work in the digital security space, and I've been forced to think about concerns like supply-chain attacks. The implications are, unfortunately, that you can't automatically trust that your colleague has benign intentions.

This, obviously, will vary with context. If you're only writing a small web site for your HR department to use, it's hard to imagine how an adversarial state actor could take advantage of a backdoor in your code. If so, it's unlikely that anyone will go to the trouble of planting a mole in your organization.

On the other hand, if you're writing any kind of reusable library or framework, you just might be an interesting target. If so, you can no longer entirely trust your team mates.

As a Dane, I'm deeply bothered by that. Denmark, along with the other Nordic countries, exhibits the highest levels of inter-societal trust in the world. I was raised to trust strangers, and so far, it's worked well enough for me. A business transaction in Denmark is often just a short email exchange. It's a great benefit to the business environment, and the economy in general, that we don't have to waste a lot of resources filling out forms, contracts, agreements, etc. Trust is the grease that makes society run smoother.

Even so, Scandinavians aren't naive. We don't believe that we can trust everyone. To a large degree, we rely on a lot of subtle social cues to assess a given situation. Some people shouldn't be trusted, and we're able to identify those situations, too.

What remains is that insisting that you can trust your colleague, just because he or she is your colleague, would be descending into teleology. I'm not a proponent of wishful thinking if good arguments suggest the contrary.

Shared code ownership #

Perhaps you shouldn't trust your colleagues. How does that impact software development?

The good news is that this is yet another argument for the beneficial practices of shared code ownership. Crucially, what this should entail is not just that everyone is allowed to edit any line of code, but rather that all team members take responsibility for the entire code base. No-one should be allowed to write code in splendid isolation.

There are several ways to address this concern. I often phrase it as follows: There should be at least two pairs of eyes on every line of code before a merge to master.

As I describe in Code That Fits in Your Head, you can achieve that goal with pair programming, ensemble programming, or code reviews (including agile pull request reviews). That's a broad enough palette that it should be possible for every developer in every organization to find a modus vivendi that fits any personality and context.

Just looking at each other's code could significantly raise the bar for a would-be mole to add a backdoor to the code base. As an added benefit, it might also raise the general code quality.

What this does suggest to me, however, is that a too simplistic notion of running on trunk may be dangerous. Letting everyone commit to master and trusting that everyone means well no longer strikes me as a good idea (again, given the context, and all that).

Or, if you do, you should consider having some sort of systematic posterior ~~post mortem~~ review process. I've read of organizations that do that, but specific sources escape me at the moment. With Git, however, it's absolutely within the realms of the possible to make a diff of all changes since the last ex-post review, and then go through those changes.

Conclusion #

The world is changed. I feel it in the OWASP top 10. I sense it in the shifting geopolitical climate. I smell it on the code I review.

Much that once was, is lost. The dream of a global computer network with boundless trust is no more. There are countries whose interests no longer align with ours. Who pay full-time salaries to people whose job it is to wage 'cyber warfare' against us. We can't rule out that parts of such campaigns include planting moles in our midst. Moles whose task it is to weaken the foundations of our digital infrastructure.

In that light, should you always trust your colleagues?

Despite the depressing thought that I probably shouldn't, I'm likely to bounce back to my usual Danish most-people-are-to-be-trusted attitude tomorrow. On the other hand, I'll still insist that more than one person is involved with every line of code. Not only because every other person may be a foreign agent, but mostly, still, because humans are fallible, and two brains think better than one.

Comments

Tyson Williams #

Or, if you do, you should consider having some sort of systematic post mortem review process. I've read of organizations that do that, but specific sources escape me at the moment.

My company has a Google Docs template for postmortem analysis that we use when something goes especially wrong. The primary focus is stating what went wrong according to the "five whys technique". Our template links to this post by Eric Ries. There is also this Wikipedia article on the subject. The section headings are "What happened" (one sentence), "Impact on Customers" (duration and severity), "What went wrong (5 Whys)", "What went right (optional)", "Corrective Actions" (and all of the content so far should be short enough to fit on one page), "Timeline" (a bulleted list asking for "Event beginning", "Time to Detect (monitoring)", "Time to Notify (alerting)", "Time to Respond (devops)", "Time to Troubleshoot (devops)", "Time to Mitigate (devops)", "Event end"), and "Logs (optional)".

2024-07-21 15:37 UTC

Mark Seemann #

Tyson, thank you for writing. I now realize that 'post mortem' was a poor choice of words on my part, since it implies that something went wrong. I should have written 'posterior' instead. I'll update the article.

I've been digging around a bit to see if I can find the article that originally made me aware of that option. I'm fairly sure that it wasn't Optimizing the Software development process for continuous integration and flow of work, but that article, on the other hand, seems to be the source that other articles cite. It's fairly long, and also discusses other concepts; the relevant section here is the one called Non-blocking reviews.

A shorter overview of this kind of process can be found in Non-Blocking, Continuous Code Reviews - a case study.

2024-07-26 08:04 UTC

Jiehong #

In change management and risk control, your "There should be at least two pairs of eyes on every line of code" is called the four-eyes principle, and it's standard practice in my industry (IT services provider for the travel industry).

It goes further, and requires two more pairs of eyes on any change, all the way from the code review to the loading of a specific piece of software into production.

It has a nice side effect during code reviews: it's an automatic way to disseminate knowledge in the team, so the bus factor is never 1.

I think that real people can mostly be trusted. But software is not always run by people. Even when it is, a single untrustworthy person's actions are amplified by software being run by mindless computers. It's like how one rotten apple is enough to spoil the whole bag.

In the end, and a bit counter-intuitively, trusting people less now leads to being able to trust more later: people are forced to say "you can trust me, and here is the proof". (E.g., Apple's recently announced Private Cloud Compute.)

2024-07-29 14:29 UTC

Mark Seemann #

Jiehong, thank you for writing. Indeed, in Code That Fits in Your Head I discuss how shared code ownership reduces the bus factor.

From this article and previous discussions I've had, I can see that the term trust is highly charged. People really don't like the notion that trust may be misplaced, or that mistrust, even, might be appropriate. I can't tell if it's a cultural bias of which I'm not aware. While English isn't my native language, I believe that I'm sufficiently acquainted with Anglo-Saxon culture to know of its most obvious quirks. Still, I'm sometimes surprised.

I admit that I, too, first consider whether I'm dealing with a deliberate adversary if I'm asked whether I trust someone, but soon after, there's a secondary interpretation that originates from normal human fallibility. I've already written about that: No, I don't trust my colleagues to be infallible, as I don't trust myself to be so.

Fortunately, it seems to me that the remedies that may address such concerns are the same, regardless of the underlying reasons.

2024-08-06 05:57 UTC

This blog is totally free, but if you like it, please consider supporting it.

Should interfaces be asynchronous?

2024-07-08T13:52:00+00:00

Async and await are notorious for being contagious. Must all interfaces be Task-based, just in case?

I recently came across this question on Mastodon:

"To async or not to async?

"How would you define a library interface for a service that probably will be implemented with an in memory procedure - let's say returning a mapped value to a key you registered programmatically - and a user of your API might want to implement a decorator that needs a 'long running task' - for example you want to log a msg into your db or load additional mapping from a file?

"Would you define the interface to return a Task<string> or just a string?"

Fandermill

While seemingly a simple question, it's both fundamental and turns out to have deeper implications than you may at first realize.

Interpretation #

Before I proceed, I'll make my interpretation of the question more concrete. This is just how I interpret the question, so it doesn't necessarily reflect the original poster's views.

The post itself doesn't explicitly mention a particular language, and since several languages now have async and await features, the question may be of more general interest than a question constrained to a single language. On the other hand, in order to have something concrete to discuss, it'll be useful to have some real code examples. From perusing the discussion surrounding the original post, I get the impression that the language in question may be C#. That suits me well, since it's one of the languages with which I'm most familiar, and is also a language where programmers of other C-based languages should still be able to follow along.

My interpretation of the implementation, then, is this:

public sealed class NameMap
{
    private readonly Dictionary<Guid, string> knownIds = new()
    {
        { new Guid("4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"), "Bob" },
        { new Guid("8D3B9093-7D43-4DD2-B317-DCEE4C72D845"), "Alice" }
    };
 
    public string GetName(Guid guid)
    {
        return knownIds.TryGetValue(guid, out var name) ? name : "Trudy";
    }
}

Nothing fancy, but, as Fandermill writes in a follow-up post:

"Used examples that first came into mind, but it could be anything really."

Fandermill

The point, as I understand it, is that the intended implementation doesn't require asynchrony. A Decorator, on the other hand, may.

Should we, then, declare an interface like the following?

public interface INameMap
{
    Task<string> GetName(Guid guid);
}

If we do, the NameMap class can't automatically implement that interface because the return types of the two GetName methods don't match. What are the options?

Conform #

While the following may not be the 'best' answer, let's get the obvious solution out of the way first. Let the implementation conform to the interface:

public sealed class NameMap : INameMap
{
    private readonly Dictionary<Guid, string> knownIds = new()
    {
        { new Guid("4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"), "Bob" },
        { new Guid("8D3B9093-7D43-4DD2-B317-DCEE4C72D845"), "Alice" }
    };
 
    public Task<string> GetName(Guid guid)
    {
        return Task.FromResult(
            knownIds.TryGetValue(guid, out var name) ? name : "Trudy");
    }
}

This variation of the NameMap class conforms to the interface by making the GetName method look asynchronous.

We may even keep the synchronous implementation around as a public method if some client code might need it:

public sealed class NameMap : INameMap
{
    private readonly Dictionary<Guid, string> knownIds = new()
    {
        { new Guid("4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"), "Bob" },
        { new Guid("8D3B9093-7D43-4DD2-B317-DCEE4C72D845"), "Alice" }
    };
 
    public Task<string> GetName(Guid guid)
    {
        return Task.FromResult(GetNameSync(guid));
    }
 
    public string GetNameSync(Guid guid)
    {
        return knownIds.TryGetValue(guid, out var name) ? name : "Trudy";
    }
}

Since C# doesn't support return-type-based overloading, we need to distinguish these two methods by giving them different names. In C# it might be more idiomatic to name the asynchronous method GetNameAsync and the synchronous method just GetName, but for reasons that would be too much of a digression now, I've never much liked that naming convention. In any case, I'm not going to go in this direction for much longer, so it hardly matters how we name these two methods.

Kinds of interfaces #

Another digression is, however, quite important. Before we can look at some more code, I'm afraid that we have to perform a bit of practical ontology, as it were. It starts with the question: Why do we even need interfaces?

I should also make clear, as a digression within a digression, that by 'interface' in this context, I'm really interested in any kind of mechanism that enables you to achieve polymorphism. In languages like C# or Java, we may in fact avail ourselves of the interface keyword, as in the above INameMap example, but we may equally well use a base class or perhaps just what C# calls a delegate. In other languages, we may use function or action types, or even function pointers.

Regardless of specific language constructs, there are, as far as I can tell, two kinds of interfaces:

Interfaces that enable variability or extensibility in behaviour.
Interfaces that mostly or exclusively exist to support automated testing.

While there may be some overlap between these two kinds, in my experience, the intersection between the two tends to be surprisingly small. Interfaces tend to mostly belong to one of those two categories.

Strategies and higher-order functions #

In design-patterns parlance, examples of the first kind include Builder, State, Chain of Responsibility, and Template Method, but this kind of interface is perhaps most starkly represented by the Strategy pattern. A Strategy is an encapsulated piece of behaviour that you pass around as a single 'thing' (an object).

And granted, you could also use a Strategy to access a database or make a web-service call, but that's not how the pattern was originally described. We'll return to that use case in the next section.

Rather, the first kind of interface exists to enable extensibility or variability in algorithms. Typical examples (from Design Patterns) include page layout, user interface component rendering, building a maze, finding the most appropriate help topic for a given application context, and so on. If we wish to relate this kind of interface to the SOLID principles, it mostly exists to support the Open-closed principle.

A good heuristic for identifying such interfaces is to consider the Reused Abstractions Principle (Jason Gorman, 2010. I'd link to it, but the page has fallen off the internet. Use your favourite web archive to read it.) If your code base contains multiple production-ready implementations of the same interface, you're reusing the interface, most likely to vary the behaviour of a general-purpose data structure.

And before the functional-programming (FP) crowd becomes too smug: FP uses this kind of interface all the time. In the FP jargon, however, we rather talk about higher-order functions and the interfaces we use to modify behaviour are typically modelled as functions and passed as lambda expressions. So when you write Cata((_, xs) => xs.Sum(), _ => 1) (as one does), you might as well just have passed a Visitor implementation to an Accept method.
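To make that equivalence concrete, here's a minimal C# sketch. The names IAccumulator, Adder, and Folds are made up for illustration; the point is only that a single-method interface and a Func delegate vary the same piece of behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Object-oriented style: the varying behaviour is an interface (a Strategy).
public interface IAccumulator
{
    int Combine(int acc, int x);
}

public sealed class Adder : IAccumulator
{
    public int Combine(int acc, int x) => acc + x;
}

public static class Folds
{
    // The Strategy is passed as an object...
    public static int Fold(IEnumerable<int> xs, int seed, IAccumulator strategy) =>
        xs.Aggregate(seed, (acc, x) => strategy.Combine(acc, x));

    // ...or, equivalently, as a function argument (a higher-order API).
    public static int Fold(IEnumerable<int> xs, int seed, Func<int, int, int> combine) =>
        xs.Aggregate(seed, combine);
}
```

Whether you pass new Adder() or the lambda expression (acc, x) => acc + x, Fold computes the same sum.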

This hints at a more quantifiable distinction: If the interface models something that's intended to be a pure function, it'd typically be part of a higher-order API in FP, while we in object-oriented design (once again) lack the terminology to distinguish these interfaces from the other kind.

These days, in C# I mostly use these kinds of interfaces for the Visitor pattern.

Seams #

The other kind of interface exists to afford automated testing. In Working Effectively with Legacy Code, Michael Feathers calls such interfaces Seams. Modern object-oriented code bases often use Dependency Injection (DI) to control which Strategies are in use in a given context. The production system may use an object that communicates with a relational database, while an automated test environment might replace that with a Test Double.

Yes, I wrote Strategies. As I suggested above, a Strategy is really a replaceable object in its purest form. When you use DI you may call all those interfaces IUserRepository, ICommandHandler, IEmailGateway, and so on, but they're really all Strategies.

Contrary to the first kind of interface, you typically only find a single production implementation of each of these interfaces. If you find more than one, the rest are usually Decorators (one that logs, one that caches, one that works as a Circuit Breaker, etc.). All other implementations will be defined in the test code as dynamic mocks or Fakes.
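To relate this to the original question, here's a sketch of a logging Decorator over the Task-based INameMap interface shown earlier. ILog is a hypothetical abstraction (in production it might write to a database), and InMemoryLog and ConstantNameMap are stand-ins for demonstration only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface INameMap
{
    Task<string> GetName(Guid guid);
}

// Hypothetical log abstraction; a real one might write to a database.
public interface ILog
{
    Task Write(string message);
}

// Decorator: adds asynchronous logging around any INameMap.
public sealed class LoggingNameMap : INameMap
{
    private readonly INameMap inner;
    private readonly ILog log;

    public LoggingNameMap(INameMap inner, ILog log)
    {
        this.inner = inner;
        this.log = log;
    }

    public async Task<string> GetName(Guid guid)
    {
        var name = await inner.GetName(guid);
        await log.Write($"Looked up {guid}: {name}");
        return name;
    }
}

// Stand-ins, for demonstration only.
public sealed class InMemoryLog : ILog
{
    public List<string> Messages { get; } = new();

    public Task Write(string message)
    {
        Messages.Add(message);
        return Task.CompletedTask;
    }
}

public sealed class ConstantNameMap : INameMap
{
    private readonly string name;

    public ConstantNameMap(string name) => this.name = name;

    public Task<string> GetName(Guid guid) => Task.FromResult(name);
}
```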

Code bases that rely heavily on DI in order to support testing rub many people the wrong way. In 2014 David Heinemeier Hansson published a serious criticism of such test-induced damage. For the record, I agree with the criticism, but not with the conclusion. While I still practice test-driven development, I only define interfaces for true architectural dependencies. So, yes, my code bases may have an IReservationsRepository or IEmailGateway, but no ICommandHandler or IUserManager.

The bottom line, though, is that some interfaces exist to support testing. If there's a better way to make inherently non-deterministic systems behave deterministically in a test context, I've yet to discover it.

(As an aside, it's worth looking into tests that adopt non-deterministic behaviour as a driving principle, or at least an unavoidable state of affairs. Property-based testing is one such approach, but I also found the article When I'm done, I don't clean up by Arialdo Martini interesting. You may also want to refer to my article Waiting to happen for a discussion of how to make tests independent of system time.)

Where to define interfaces #

The reason the above distinction is important is that it fundamentally determines where interfaces should be defined. In short, the first kind of interface is part of an object model's API, and should be defined together with that API. The second kind, on the other hand, is part of a particular application's architecture, and should be defined by the client code that talks to the interface.

As an example of the first kind, consider this recent example, where the IPriorityEditor<T> interface is part of the PriorityCollection<T> API. You must ship the interface together with the class, because the Edit method takes an interface implementation as an argument. It's how client code interacts with the API.

Another example is this Table class that comes with an ITableVisitor<T> interface. In both cases, we'd expect interface implementations to be deterministic. These interfaces don't exist to support automated testing, but rather to afford a flexible programming model.

For the sake of argument, imagine that you package such APIs in reusable libraries that you publish via a package manager. In that case, it's obvious that the interface is as much part of the package as the class.

Contrast this with the other kind of interface, as described in the article Decomposing CTFiYH's sample code base or showcased in the article An example of state-based testing in C#. In the latter example, the interfaces IUserReader and IUserRepository are not part of any pre-packaged library. Rather, they are defined by the application code to support application-specific needs.

This may be even more evident if you contemplate the diagram in Decomposing CTFiYH's sample code base. Interfaces like IPostOffice and IReservationsRepository only exist to support the application. Following the Dependency Inversion Principle,

"clients [...] own the abstract interfaces"

Robert C. Martin, APPP, chapter 11

In these code bases, only the Controllers (or rather the tests that exercise them) need these interfaces, so the Controllers get to define them.

Should it be asynchronous, then? #

Okay, so should INameMap.GetName return string or Task<string>, then?

Hopefully, at this point, it should be clear that the answer depends on what kind of interface it is.

If it's the first kind, the return type should support the requirements of the API. If the object model doesn't need the return type to be asynchronous, it shouldn't be.

If it's the second kind of interface, the application code decides what it needs, and defines the interface accordingly.

In neither case, however, is it the concrete class' responsibility to second-guess what client code might need.

But client code may need the method to be asynchronous. What's the harm of returning Task<string>, just in case?

The problem, as you may well be aware, is that the asynchronous programming model is contagious. Once you've made an API asynchronous, you can't easily make it synchronous, whereas if you have a synchronous API, you can easily make it asynchronous. This follows from Postel's law, in this case: Be conservative with what you send.
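A small sketch may make the asymmetry clearer. The helper functions ToAsync and ToSync are hypothetical names, not part of .NET. Going from synchronous to asynchronous only requires wrapping the result in a completed Task, whereas going the other way requires blocking until the Task completes.

```csharp
using System;
using System.Threading.Tasks;

// Synchronous to asynchronous: wrap the result in an already-completed Task.
// Cheap, and it can't block or deadlock.
Func<T, Task<TResult>> ToAsync<T, TResult>(Func<T, TResult> f) =>
    x => Task.FromResult(f(x));

// Asynchronous to synchronous: the only option is to block on the Task.
// This ties up the calling thread and can deadlock under a single-threaded
// SynchronizationContext (e.g. some UI frameworks or classic ASP.NET).
Func<T, TResult> ToSync<T, TResult>(Func<T, Task<TResult>> f) =>
    x => f(x).GetAwaiter().GetResult();
```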

Library API #

Imagine, for the sake of argument, that the NameMap class is defined in a reusable library, wrapped in a package and imported into your code base via a package manager (NuGet, Maven, pip, NPM, Hackage, RubyGems, etc.).

Clearly it shouldn't implement any interface in order to 'support unit testing', since such interfaces should be defined by application code.

It could implement one or more 'extensibility' interfaces, if such interfaces are part of the wider API offered by the library. In the case of the NameMap class, we don't really know if that's the case. To complete this part of the argument, then, I'd just leave it as shown in the first code example above. It doesn't need to implement any interface, and GetName can just return string.

Domain Model #

What if, instead of an external library, the NameMap class is part of an application's Domain Model?

In that case, you could define application-level interfaces as part of the Domain Model. In fact, most people do. Even so, I'd recommend that you don't, at least if you're aiming for a Functional Core, Imperative Shell architecture, a functional architecture, or even a Ports and Adapters or, if you will, Clean Architecture. The interfaces that exist only to support testing are application concerns, so keep them out of the Domain Model and instead define them in the Application Model.

You don't have to follow my advice. If you want to define interfaces in the Domain Model, I can't stop you. But what if, as I recommend, you define application-specific interfaces in the Application Model? If you do that, your NameMap Domain Model can't implement your INameMap interface, because the dependencies point the other way, and most languages will not allow circular dependencies.

In that case, what do you do if, as the original toot suggested, you need to Decorate the GetName method with some asynchronous behaviour?

You can always introduce an Adapter:

public sealed class NameMapAdapter : INameMap
{
    private readonly NameMap imp;
 
    public NameMapAdapter(NameMap imp)
    {
        this.imp = imp;
    }
 
    public Task<string> GetName(Guid guid)
    {
        return Task.FromResult(imp.GetName(guid));
    }
}

Now any NameMap object can look like an INameMap. This is exactly the kind of problem that the Adapter pattern addresses.

But, you say, that's too much trouble! I don't want to have to maintain two classes that are almost identical.

I understand the concern, and it may even be appropriate. Maybe you're right. As usual, I don't really intend this article to be prescriptive. Rather, I'm offering ideas for your consideration, and you can choose to adopt them or ignore them as it best fits your context.

When it comes to whether or not an Adapter is an unwarranted extra complication, I'll return to that topic later in this article.

Application Model #

The final architectural option is when the concrete NameMap class is part of the Application Model, where you'd also define the application-specific INameMap interface. In that case, we must assume that the NameMap class implements some application-specific concern. If you want it to implement an interface so that you can wrap it in a Decorator, then do that. This means that the GetName method must conform to the interface, and if that means that it must be asynchronous, then so be it.

As Kent Beck wrote in a Facebook article that used to be accessible without a Facebook account (but isn't any longer):

"Things that change at the same rate belong together. Things that change at different rates belong apart."

Naming From the Outside In, Kent Beck, Facebook, 2012

If the concrete NameMap class and the INameMap interface are both part of the application model, it's not unreasonable to guess that they may change together. (Do be mindful of Shotgun Surgery, though. If you expect the interface and the concrete class to frequently change, then perhaps another design might be more appropriate.)

Easier Adapters #

Before concluding this article, let's revisit the topic of introducing an Adapter for the sole purpose of 'architectural purity'. Should you really go to such lengths only to 'do it right'? You decide, but

You can only be pragmatic if you know how to be dogmatic.

What to test and not to test, me

I'm presenting a dogmatic solution for your consideration, so that you know what it might look like. Would I follow my own 'dogmatic' advice? Yes, I usually do, but then, I wouldn't log the return value of a pure function, so I wouldn't introduce an interface for that purpose, at least. To be fair to Fandermill, he or she also wrote: "or load additional mapping from a file", which could be an appropriate motivation for introducing an interface. I'd probably go with an Adapter in that case.
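Here's a sketch of what such an Adapter might look like. The file format (one guid=name pair per line) and the class name FileBackedNameMapAdapter are assumptions made for illustration. Notice that it's the Adapter, not the NameMap class, that becomes asynchronous.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public interface INameMap
{
    Task<string> GetName(Guid guid);
}

// The synchronous NameMap class, as in the first code example.
public sealed class NameMap
{
    private readonly Dictionary<Guid, string> knownIds = new()
    {
        { new Guid("4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"), "Bob" },
        { new Guid("8D3B9093-7D43-4DD2-B317-DCEE4C72D845"), "Alice" }
    };

    public string GetName(Guid guid)
    {
        return knownIds.TryGetValue(guid, out var name) ? name : "Trudy";
    }
}

// Hypothetical Adapter: first consults a text file of 'guid=name' lines
// (a made-up format), then falls back to the wrapped NameMap.
public sealed class FileBackedNameMapAdapter : INameMap
{
    private readonly NameMap imp;
    private readonly string path;

    public FileBackedNameMapAdapter(NameMap imp, string path)
    {
        this.imp = imp;
        this.path = path;
    }

    public async Task<string> GetName(Guid guid)
    {
        // File I/O is where asynchrony actually pays off.
        foreach (var line in await File.ReadAllLinesAsync(path))
        {
            var parts = line.Split('=', 2);
            if (parts.Length == 2 && Guid.TryParse(parts[0], out var id) && id == guid)
                return parts[1];
        }
        return imp.GetName(guid);
    }
}
```

The NameMap class stays synchronous and unaware of files; only the application-level Adapter deals with I/O.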

Whether or not an Adapter is an unwarranted complication depends, however, on language specifics. In high-ceremony languages like C#, Java, or C++, adding an Adapter involves at least one new file, and dozens of lines of code.

Consider, on the other hand, a low-ceremony language like Haskell. The corresponding getName function might close over a statically defined map and have the type getName :: UUID -> String.

How do you adapt such a pure function to an API that returns IO (which is roughly comparable to task-based programming)? Trivially:

getNameM :: Monad m => UUID -> m String
getNameM = return . getName

For didactic purposes I have here shown the 'Adapter' as an explicit function, but in idiomatic Haskell I'd consider this below the Fairbairn threshold; I'd usually just inline the composition return . getName if I needed to adapt the getName function to the Kleisli category.

You can do the same in F#, where the composition would be getName >> Task.FromResult. F# compositions usually go in the (for Westerners) intuitive left-to-right direction, whereas Haskell compositions follow the mathematical right-to-left convention.
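For comparison, here's a sketch of the closest C# analogue, assuming that the polymorphic 'interface' is a delegate type (Func<Guid, Task<string>>) rather than INameMap. C# has no built-in function-composition operator, so Compose is a hypothetical helper, and GetName is a stand-in for NameMap.GetName with made-up data.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for NameMap.GetName: a pure, synchronous lookup (hypothetical data).
string GetName(Guid guid) => guid == Guid.Empty ? "Nobody" : "Trudy";

// Hypothetical composition helper; C# doesn't ship one.
Func<A, C> Compose<A, B, C>(Func<A, B> f, Func<B, C> g) => x => g(f(x));

// Roughly getName >> Task.FromResult in F#, or return . getName in Haskell.
// Note the explicit type arguments that C# needs here.
Func<Guid, Task<string>> viaCompose =
    Compose<Guid, string, Task<string>>(GetName, Task.FromResult);

// The more common C# idiom: a lambda expression.
Func<Guid, Task<string>> viaLambda = guid => Task.FromResult(GetName(guid));
```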

The point, however, is that there's nothing conceptually complicated about an Adapter. Unfortunately, some languages require substantial ceremony to implement one.

Conclusion #

Should an API return a Task-based (asynchronous) value 'just in case'? In general: No.

You can't predict all possible use cases, so don't make an API more complicated than it has to be. If you need to implement an application-specific interface, use the Adapter design pattern.

A possible exception to this rule is if the entire API (the concrete implementation and the interface) only exists to support a specific application. If the interface and its concrete implementation are both part of the Application Model, you may as well skip the Adapter step and consider the concrete implementation as its own Adapter.

An immutable priority collection

2024-07-01T17:28:00+00:00

With examples in C# and F#.

This article is part of a series about encapsulation and immutability. After two attempts at an object-oriented, mutable implementation, I now turn toward immutability. As already suggested in the introductory article, immutability makes it easier to maintain invariants.

In the introductory article, I described the example problem in more detail, but in short, the exercise is to develop a class that holds a collection of prioritized items, with the invariant that the priorities must always sum to 100. It should be impossible to leave the object in a state where that's not true. It's quite an illuminating exercise, so if you have the time, you should try it for yourself before reading on.

Initialization #

Once again, I begin by figuring out how to initialize the object, and how to model it. Since it's a kind of collection, and since I now plan to keep it immutable, it seems natural to implement IReadOnlyCollection<T>.

In this, the third attempt, I'll reintroduce Prioritized<T>, with one important difference. It's now an immutable record:

public sealed record Prioritized<T>(T Item, byte Priority);

If you're not on a version of C# that supports records (which is also trivially true if you're not using C# at all), you can always define an immutable class by hand. It just requires more boilerplate code.

Prioritized<T> is going to be the T in the IReadOnlyCollection<T> implementation:

public sealed class PriorityCollection<T> : IReadOnlyCollection<Prioritized<T>>

Since an invariant should always hold, it should also hold at initialization, so the PriorityCollection<T> constructor must check that all is as it should be:

private readonly Prioritized<T>[] priorities;
 
public PriorityCollection(params Prioritized<T>[] priorities)
{
    if (priorities.Sum(p => p.Priority) != 100)
        throw new ArgumentException(
            "The sum of all priorities must be 100.",
            nameof(priorities));
    this.priorities = priorities;
}

The rest of the class is the IReadOnlyCollection<T> implementation, which simply delegates to the priorities field.
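For completeness, here's a sketch of what that delegation might look like. Unlike the constructor shown above, this version copies the incoming array before validating it. Since a params parameter can also receive an explicit array, a caller could otherwise keep a reference to it and mutate it after construction. The copy is an assumption of this sketch, not part of the class as shown above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public sealed record Prioritized<T>(T Item, byte Priority);

public sealed class PriorityCollection<T> : IReadOnlyCollection<Prioritized<T>>
{
    private readonly Prioritized<T>[] priorities;

    public PriorityCollection(params Prioritized<T>[] priorities)
    {
        // Defensive copy: validate and keep an array no caller can reach.
        var copy = priorities.ToArray();
        if (copy.Sum(p => p.Priority) != 100)
            throw new ArgumentException(
                "The sum of all priorities must be 100.",
                nameof(priorities));
        this.priorities = copy;
    }

    // The IReadOnlyCollection<T> members just delegate to the array.
    public int Count => priorities.Length;

    public IEnumerator<Prioritized<T>> GetEnumerator() =>
        ((IEnumerable<Prioritized<T>>)priorities).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```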

That's it, really. That's the API. We're done.

Projection #

But, you may ask, how does one edit such a collection?

(Comic originally by John Muellerleile.)

Humour aside, you don't edit an immutable object, but rather make a new object from a previous one. Most modern languages now come with built-in collection-projection APIs; in .NET, it's called LINQ. Here's an example. You begin with a collection with two items:

var pc = new PriorityCollection<string>(
    new Prioritized<string>("foo", 60),
    new Prioritized<string>("bar", 40));

You'd now like to add a third item with priority 20:

var newPriority = new Prioritized<string>("baz", 20);

How should you make room for this new item? One option is to evenly reduce each of the existing priorities:

var reduction = newPriority.Priority / pc.Count;
IEnumerable<Prioritized<string>> reduced = pc
    .Select(p => p with { Priority = (byte)(p.Priority - reduction) });

Notice that while the priorities in reduced no longer sum to 100, that's okay, because reduced isn't a PriorityCollection object. It's just an IEnumerable<Prioritized<string>>.

You can now Append the newPriority to the reduced sequence and repackage that in a PriorityCollection:

var adjusted = new PriorityCollection<string>(reduced.Append(newPriority).ToArray());

Like the original pc object, the adjusted object is valid upon construction, and since it's immutable, it'll remain valid.
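Since the reduction uses integer division, it may help to spell out the arithmetic. With two existing items and a new priority of 20, the numbers work out. More generally, with n existing items and a new priority p (assuming no existing priority is smaller than the reduction), the new total depends on whether n divides p:

```latex
\begin{aligned}
r &= \left\lfloor \tfrac{20}{2} \right\rfloor = 10, \qquad (60 - r) + (40 - r) + 20 = 50 + 30 + 20 = 100 \\
\text{total} &= 100 - n \left\lfloor \tfrac{p}{n} \right\rfloor + p
\end{aligned}
```

The total is 100 only when n divides p. Adding a priority of 20 to a three-item collection, for example, gives 100 - 18 + 20 = 102, so the PriorityCollection constructor would reject the result with an ArgumentException.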

Edit #

If you think this process of unwrapping and rewrapping seems cumbersome, we can make it a bit more palatable by defining a wrapping Edit function, similar to the one in the previous article:

public PriorityCollection<T> Edit(
    Func<IReadOnlyCollection<Prioritized<T>>, IEnumerable<Prioritized<T>>> edit)
{
    return new PriorityCollection<T>(edit(this).ToArray());
}

You can now write code equivalent to the above example like this:

var adjusted = pc.Edit(col =>
{
    var reduced = col.Select(p => p with { Priority = (byte)(p.Priority - reduction) });
    return reduced.Append(newPriority);
});

I'm not sure it's much of an improvement, though.

Using the right tool for the job #

While C# over the years has gained some functional-programming features, it's originally an object-oriented language, and working with immutable values may still seem a bit cumbersome. If so, consider using a language natively designed for this style of programming. On .NET, F# is the obvious choice.

First, you define the required types:

type Prioritized<'a> = { Item: 'a; Priority: byte }
 
type PriorityList = private PriorityList of Prioritized<string> list

Notice that PriorityList has a private constructor, so that client code can't just create any value. The type should protect its invariants, since encapsulation is also relevant in functional programming. Since client code can't directly create PriorityList objects, you instead supply a function for that purpose:

module PriorityList =
    let tryCreate priorities =
        if priorities |> List.sumBy (fun p -> int p.Priority) = 100 // sum as int: byte addition wraps
        then Some (PriorityList priorities)
        else None

That's really it, although you also need a way to work with the data. We supply two alternatives that correspond to the above C#:

let edit f (PriorityList priorities) = f priorities |> tryCreate
 
let toList (PriorityList priorities) = priorities

These functions are also defined on the PriorityList module.

Here's the same adjustment example as shown above in C#:

let pl  =
    [ { Item = "foo"; Priority = 60uy }; { Item = "bar"; Priority = 40uy } ]
    |> PriorityList.tryCreate
let newPriority = { Item = "baz"; Priority = 20uy }
let adjusted =
    pl
    |> Option.bind (PriorityList.edit (fun l ->
        l
        |> List.map (fun p ->
            { p with Priority = p.Priority - (newPriority.Priority / byte l.Length) })
        |> List.append [ newPriority ]))

The entire F# definition is 15 lines of code, including namespace declaration and blank lines.

Conclusion #

With an immutable data structure, you only need to check the invariants upon creation. Invariants therefore become preconditions. Once a value is created in a valid state, it stays valid because it never changes state.

If you're having trouble maintaining invariants in an object-oriented design, try making the object immutable. It's likely to make it easier to attain good encapsulation.

Comments

Jiehong #

First, it's a nice series of articles.

I see that nowadays C# has a generic projection, which is a sort of wither in Java parlance. It should be usable instead of having to define the `Edit` one.

A way to make it more palatable would be to have a `tryAddAndRedistrube(Prioritized element) : PriorityCollection | None` method to `PriorityCollection` that would try to reduce priorities of elements, before adding the new one and returning a new `PriorityCollection` using that same `with` projection. This would allow the caller to have a slightly simpler method to call, at the expense of having to store the new collection and assuming this is the intended way the caller wants to insert the element.

But it's usually not possible to anticipate all the ways clients want to add elements to something, so I think I prefer the open-ended way this API lets clients choose.

2024-07-29 13:53 UTC

Mark Seemann #

Thank you for writing. Whether or not a wither works in this case depends on language implementation details. For example, the F# example code doesn't allow copy-and-update expressions because the record constructor is private. This is as it should be, since otherwise, client code would be able to circumvent the encapsulation.

I haven't tried to refactor the C# class to a record, and I don't recall whether C# with expressions respect custom constructors. That's a good exercise for any reader to try out; unfortunately, I don't have time for that at the moment.

As to your other point, it's definitely conceivable that a library developer could add more convenient methods to the PriorityCollection<T> class, including one that uses a simple formula to redistribute existing priorities to make way for the new one. As far as I can tell, though, you'd be able to implement such more convenient APIs as extension methods that are implemented using the basic affordances already on display here. If so, we may consider the constructor and the IReadOnlyCollection<Prioritized<T>> interface as the fundamental API. Everything else, including the Edit method, could build off that.
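As an illustration, here's a hypothetical sketch of such a method, written only against the public constructor and the IReadOnlyCollection<Prioritized<T>> members. The name TryAddAndRedistribute (after Jiehong's suggestion) and the choice of returning null when an even redistribution isn't possible are assumptions. It's written as an ordinary static method; adding this to the first parameter would make it an extension method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public sealed record Prioritized<T>(T Item, byte Priority);

public sealed class PriorityCollection<T> : IReadOnlyCollection<Prioritized<T>>
{
    private readonly Prioritized<T>[] priorities;

    public PriorityCollection(params Prioritized<T>[] priorities)
    {
        if (priorities.Sum(p => p.Priority) != 100)
            throw new ArgumentException(
                "The sum of all priorities must be 100.",
                nameof(priorities));
        this.priorities = priorities;
    }

    public int Count => priorities.Length;

    public IEnumerator<Prioritized<T>> GetEnumerator() =>
        ((IEnumerable<Prioritized<T>>)priorities).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical convenience API, built only from the basic affordances above.
public static class PriorityCollectionOps
{
    // Returns null if an even redistribution isn't possible.
    public static PriorityCollection<T>? TryAddAndRedistribute<T>(
        PriorityCollection<T> pc, Prioritized<T> newItem)
    {
        // Take the new priority evenly from the existing items (integer division).
        var reduction = newItem.Priority / pc.Count;
        // Refuse rather than let a byte wrap around below zero.
        if (pc.Any(p => p.Priority < reduction))
            return null;

        var candidate = pc
            .Select(p => p with { Priority = (byte)(p.Priority - reduction) })
            .Append(newItem)
            .ToArray();

        // A remainder from the division means the total won't be exactly 100.
        return candidate.Sum(p => p.Priority) == 100
            ? new PriorityCollection<T>(candidate)
            : null;
    }
}
```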

2024-07-30 06:46 UTC

A mutable priority collection

2024-06-24T17:59:00+00:00

An encapsulated, albeit overly complicated, implementation.

This is the second in a series of articles about encapsulation and immutability. In the next article, you'll see how immutability makes encapsulation easier, but in order to appreciate that, you should see the alternative. This article, then, shows a working, albeit overly complicated, implementation that does maintain its invariants.

Initialization #

As the previous article demonstrated, inheriting directly from a base class seems like a dead end. Once you see the direction that I go in this article, you may argue that it'd be possible to also make that design work with an inherited collection. It may be, but I'm not convinced that it would improve anything. Thus, for this iteration, I decided to eschew inheritance.

On the other hand, we need an API to query the object about its state, and I found that it made sense to implement the IReadOnlyDictionary interface.

As before, invariants are statements that are always true about an object, and that includes a newly initialized object. Thus, the PriorityCollection<T> class should require enough information to safely initialize.

public sealed class PriorityCollection<T> : IReadOnlyDictionary<T, byte> where T : notnull
{
    private readonly Dictionary<T, byte> dict;
 
    public PriorityCollection(T initial)
    {
        dict = new Dictionary<T, byte> { { initial, 100 } };
    }
 
    // IReadOnlyDictionary implemented by delegating to dict field...
}

Several design decisions are different from the previous article. This design has no Prioritized<T> class. Instead it treats the item (of type T) as a dictionary key, and the priority as the value. The most important motivation for this design decision was that this enables me to avoid the 'leaf node mutation' problem that I demonstrated in the previous article. Notice how, while the general design in this iteration will be object-oriented and mutable, I already take advantage of a bit of immutability to make the design simpler and safer.

Another difference is that you can't initialize a PriorityCollection<T> object with a list. Instead, you only need to tell the constructor what the initial item is. The constructor will then infer that, since this is the only item so far, its priority must be 100. It can't be anything else, because that would violate the invariant. Thus, no assertion is required in the constructor.

Mutation API #

So far, the code only implements the IReadOnlyDictionary API, so we need to add some methods that will enable us to add new items and so on. As a start, we can add methods to add, remove, or update items:

public void Add(T key, byte value)
{
    AssertInvariants(dict.Append(KeyValuePair.Create(key, value)));
    dict.Add(key, value);
}
 
public void Remove(T key)
{
    AssertInvariants(dict.Where(kvp => !kvp.Key.Equals(key)));
    dict.Remove(key);
}
 
public byte this[T key]
{
    get { return dict[key]; }
    set
    {
        var l = dict.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
        l[key] = value;
        AssertInvariants(l);
 
        dict[key] = value;
    }
}

I'm not going to show the AssertInvariants helper method yet, since it's going to change anyway.

At this point, the implementation suffers from the same problem as the example in the previous article. While you can add new items, you can only add an item with priority 0. You can only remove items if they have priority 0. And you can only 'update' an item if you set the priority to the same value as it already had.
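The following minimal C# sketch makes this dead end concrete. It keeps only the dictionary, the simulate-then-assert Add method, and the indexer from the code above. The class name StrictPriorities<T> is hypothetical, and the sketch omits the IReadOnlyDictionary plumbing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down version of the class above: every change is
// checked against the invariant, and there is no way to suspend the check.
public sealed class StrictPriorities<T> where T : notnull
{
    private readonly Dictionary<T, byte> dict;

    public StrictPriorities(T initial)
    {
        dict = new Dictionary<T, byte> { { initial, 100 } };
    }

    public int Count => dict.Count;

    public void Add(T key, byte value)
    {
        // Simulate the change on a new sequence before touching the real dictionary.
        AssertInvariants(dict.Append(KeyValuePair.Create(key, value)));
        dict.Add(key, value);
    }

    public byte this[T key]
    {
        get => dict[key];
        set
        {
            // Simulate the update on a copy first.
            AssertInvariants(new Dictionary<T, byte>(dict) { [key] = value });
            dict[key] = value;
        }
    }

    private static void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException("The sum of all values must be 100.");
    }
}
```

With this design, Add("bar", 0) succeeds, but Add("baz", 10) and this["foo"] = 50 both throw. No single edit can ever change how the 100 points are distributed.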

We need to be able to add new items, change their priorities, and so on. How do we get around the above problem, without breaking the invariant?

Edit mode #

One way out of this conundrum is to introduce a kind of 'edit mode'. The idea is to temporarily turn off the maintenance of the invariant for long enough to allow edits.

At first glance, such an idea seems to go against the very definition of an invariant. After all, an invariant is a statement about the object that is always true. If you allow a client developer to turn off that guarantee, then, clearly, the guarantee is gone. Guarantees only work if you can trust them, and you can't trust them if they can be cancelled.

That idea in itself doesn't work, but if we can somehow encapsulate such an 'edit action' in an isolated scope that either succeeds or fails in its entirety, we may be getting somewhere. It's an idea similar to Unit of Work, although here we're not involving an actual database. Still, an 'edit action' is a kind of in-memory transaction.

For didactic reasons, I'll move toward that design in a series of steps, where the intermediate steps fail to maintain the invariant. We'll get there eventually. The first step is to introduce 'edit mode'.

private bool isEditing;

While I could have made that flag public, I found it more natural to wrap access to it in two methods:

public void BeginEdit()
{
    isEditing = true;
}
 
public void EndEdit()
{
    isEditing = false;
}

This still doesn't accomplish anything in itself, but the final change in this step is to change the assertion so that it respects the flag:

private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
{
    if (!isEditing && candidate.Sum(kvp => kvp.Value) != 100)
        throw new InvalidOperationException(
            "The sum of all values must be 100.");
}

Finally, you can add or change priorities, as this little F# example shows:

sut.BeginEdit ()
sut["foo"] <- 50uy
sut["bar"] <- 50uy
sut.EndEdit ()

Even if you nominally 'don't read F#', this little example is almost like C# without semicolons. The <- arrow is F#'s mutation or assignment operator, which in C# would be =, and the uy suffix is the F# way of stating that the literal is a byte.

The above example is well-behaved because the final state of the object is valid. The priorities sum to 100. Even so, no code in PriorityCollection<T> actually checks that, so we could trivially leave the object in an invalid state.
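The following sketch reproduces this intermediate step with a hypothetical, trimmed-down EditModePriorities<T> class. The assertion respects the flag, and EndEdit only clears it. The class exists for illustration and is not the full design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reduction of the 'edit mode' step: the assertion respects the
// flag, but EndEdit only clears it and checks nothing.
public sealed class EditModePriorities<T> where T : notnull
{
    private readonly Dictionary<T, byte> dict;
    private bool isEditing;

    public EditModePriorities(T initial)
    {
        dict = new Dictionary<T, byte> { { initial, 100 } };
    }

    public int Total => dict.Values.Sum(v => (int)v);

    public void BeginEdit() => isEditing = true;

    public void EndEdit() => isEditing = false;

    public byte this[T key]
    {
        get => dict[key];
        set
        {
            AssertInvariants(new Dictionary<T, byte>(dict) { [key] = value });
            dict[key] = value;
        }
    }

    private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (!isEditing && candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException("The sum of all values must be 100.");
    }
}
```

After BeginEdit, this["foo"] = 50, and EndEdit, nothing throws. The object is left with priorities that sum to 50.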

Assert invariant at end of edit #

The first step toward remedying that problem is to add a check to the EndEdit method:

public void EndEdit()
{
    isEditing = false;
    AssertInvariants(dict);
}

The class is still not effectively protecting its invariants, because a client developer could forget to call EndEdit, or client code might pass around a collection in edit mode. Other code, receiving such an object as an argument, may not know whether or not it's in edit mode, so again, doesn't know if it can trust it.

We'll return to that problem shortly, but first, there's another, perhaps more pressing issue that we should attend to.

Edit dictionary #

The current implementation directly edits the collection, and even if a client developer remembers to call EndEdit, other code, higher up in the call stack, could circumvent the check and leave the object in an invalid state. Not that I expect client developers to be deliberately malicious, but the notion that someone might wrap a method call in a try-catch block seems realistic.

The following F# unit test demonstrates the issue:

[<Fact>]
let ``Attempt to circumvent`` () =
    let sut = PriorityCollection<string> "foo"
 
    try
        sut.BeginEdit ()
        sut["foo"] <- 50uy
        sut["bar"] <- 48uy
        sut.EndEdit ()
    with _ -> ()
 
    100uy =! sut["foo"]
    test <@ sut.ContainsKey "bar" |> not @>

Again, let me walk you through it in case you're unfamiliar with F#.

The try-with block works just like C# try-catch blocks. Inside of that try-with block, the test enters edit mode, changes the values in such a way that the sum of them is 98, and then calls EndEdit. While EndEdit throws an exception, those four lines of code are wrapped in a try-with block that suppresses all exceptions.

The test attempts to verify that, since the edit failed, the "foo" value should be 100, and there should be no "bar" value. This turns out not to be the case. The test fails. The edits persist, even though EndEdit throws an exception, because there's no roll-back.

You could probably resolve that defect in various ways, but I chose to address it by introducing two, instead of one, backing dictionaries. One holds the data that always maintains the invariant, and the other is a temporary dictionary for editing.

private Dictionary<T, byte> current;
private readonly Dictionary<T, byte> encapsulated;
private readonly Dictionary<T, byte> editable;
private bool isEditing;
 
public PriorityCollection(T initial)
{
    encapsulated = new Dictionary<T, byte> { { initial, 100 } };
    editable = [];
    current = encapsulated;
}

There are two dictionaries: encapsulated holds the always-valid list of priorities, while editable is the dictionary that client code will be editing when in edit mode. Finally, current is either of these: editable when the object is in edit mode, and encapsulated when it's not. Most of the existing code shown so far now uses current, which before was called dict. The important changes are in BeginEdit and EndEdit.

public void BeginEdit()
{
    isEditing = true;
 
    editable.Clear();
    foreach (var kvp in current)
        editable.Add(kvp.Key, kvp.Value);
    current = editable;
}

Besides setting the isEditing flag, BeginEdit now copies all data from current to editable, and then sets current to editable. Keep in mind that encapsulated still holds the original, valid values.

Now that I'm writing this, I'm not even sure if this method is re-entrant, in the following sense: What happens if client code calls BeginEdit, makes some changes, and then calls BeginEdit again? It's questions like these that I don't feel intelligent enough to feel safe that I always answer correctly. That's why I like functional programming better. I don't have to think so hard.

Anyway, this will soon become irrelevant, since BeginEdit and EndEdit will eventually become private methods.
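Tracing the code suggests one answer to the re-entrancy question, at least for the version shown here. The following ReentrancyProbe<T> class is a hypothetical reduction that keeps the three dictionary fields and a copy of BeginEdit. The indexer skips the invariant check because the probe only exercises edit mode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reduction used only to trace what a second BeginEdit call does.
// BeginEdit is copied from the code above, minus the isEditing flag.
public sealed class ReentrancyProbe<T> where T : notnull
{
    private Dictionary<T, byte> current;
    private readonly Dictionary<T, byte> encapsulated;
    private readonly Dictionary<T, byte> editable;

    public ReentrancyProbe(T initial)
    {
        encapsulated = new Dictionary<T, byte> { { initial, 100 } };
        editable = new Dictionary<T, byte>();
        current = encapsulated;
    }

    public int Count => current.Count;

    // No invariant check: the probe only exercises edit mode.
    public byte this[T key]
    {
        get => current[key];
        set => current[key] = value;
    }

    public void BeginEdit()
    {
        editable.Clear();
        // On a second call, current already refers to editable, so this loop
        // iterates over the dictionary that was just cleared.
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }
}
```

If this reading is right, the second call clears editable and then copies from current, which by then is that same, now empty, dictionary. The edits made so far vanish. In the full class, a subsequent EndEdit would then find a sum of 0, throw, and restore the encapsulated values, unless client code adds new entries that sum to 100 in the meantime.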

The EndEdit method performs the inverse manoeuvre:

public void EndEdit()
{
    isEditing = false;
    try
    {
        AssertInvariants(current);
 
        encapsulated.Clear();
        foreach (var kvp in current)
            encapsulated.Add(kvp.Key, kvp.Value);
        current = encapsulated;
    }
    catch
    {
        current = encapsulated;
        throw;
    }
}

It first checks the invariant, and only copies the edited values to the encapsulated dictionary if the invariant still holds. Otherwise, it restores the original encapsulated values and rethrows the exception.

This helps make editing 'transactional', but it doesn't address the issue that the collection is in an invalid state during editing, or that a client developer may forget to call EndEdit.

Edit action #

As the next step towards addressing that problem, we may now introduce a 'wrapper method' for that little object protocol:

public void Edit(Action<PriorityCollection<T>> editAction)
{
    BeginEdit();
    editAction(this);
    EndEdit();
}

As you can see, it just wraps that little call sequence so that you don't have to remember to call BeginEdit and EndEdit. My F# test code comes with this example:

sut.Edit (fun col ->
    col["bar"] <- 55uy
    col["baz"] <- 45uy
    col.Remove "foo"
)

The fun col -> part is just F# syntax for a lambda expression. In C#, you'd write it as col =>.

We're close to a solution. What remains is to make BeginEdit and EndEdit private. This means that client code can only edit a PriorityCollection<T> object through the Edit method.
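This Edit method leaves one case open. If editAction throws, EndEdit never runs, so the object stays in edit mode with current pointing at the editable copy. One possible hardening is to roll back and rethrow. That is my own assumption, not part of the design described here. In the following sketch, the class name SafeEditPriorities<T>, the IsEditing property, and the AbortEdit helper are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the two-dictionary design in which Edit also
// rolls back when the edit action itself throws.
public sealed class SafeEditPriorities<T> where T : notnull
{
    private Dictionary<T, byte> current;
    private readonly Dictionary<T, byte> encapsulated;
    private readonly Dictionary<T, byte> editable = new Dictionary<T, byte>();

    public SafeEditPriorities(T initial)
    {
        encapsulated = new Dictionary<T, byte> { { initial, 100 } };
        current = encapsulated;
    }

    public bool IsEditing { get; private set; }

    public bool ContainsKey(T key) => current.ContainsKey(key);

    public byte this[T key]
    {
        get => current[key];
        set
        {
            AssertInvariants(new Dictionary<T, byte>(current) { [key] = value });
            current[key] = value;
        }
    }

    public void Edit(Action<SafeEditPriorities<T>> editAction)
    {
        BeginEdit();
        try
        {
            editAction(this);
        }
        catch
        {
            AbortEdit();
            throw;
        }
        EndEdit();
    }

    private void BeginEdit()
    {
        IsEditing = true;
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }

    private void EndEdit()
    {
        IsEditing = false;
        try
        {
            AssertInvariants(current);
            encapsulated.Clear();
            foreach (var kvp in current)
                encapsulated.Add(kvp.Key, kvp.Value);
        }
        finally
        {
            // Whether or not the check passed, point back at the valid data.
            current = encapsulated;
        }
    }

    // Leaves edit mode and discards whatever the edit action changed.
    private void AbortEdit()
    {
        IsEditing = false;
        current = encapsulated;
    }

    private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (!IsEditing && candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException("The sum of all values must be 100.");
    }
}
```

AbortEdit simply points current back at encapsulated. A failed edit action therefore leaves the collection as it was before the call, which matches what EndEdit already does when the invariant check fails.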

Replace action with interface #

You may complain that this solution isn't properly object-oriented, since it makes use of Action<T> and requires that client code uses lambda expressions.

We can easily fix that.

Instead of the action, you can introduce a Command interface with the same signature:

public interface IPriorityEditor<T> where T : notnull
{
    void EditPriorities(PriorityCollection<T> priorities);
}

Next, change the Edit method:

public void Edit(IPriorityEditor<T> editor)
{
    BeginEdit();
    editor.EditPriorities(this);
    EndEdit();
}

Now you have a nice, object-oriented design, with no lambda expressions in sight.
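The following sketch shows what a concrete Command might look like. RenameItem<T> is a hypothetical editor that gives an item a new key while keeping its priority. To keep the example self-contained, it includes a trimmed copy of the collection with only the members the editor needs. The finally block in EndEdit behaves the same as the catch-and-rethrow version above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IPriorityEditor<T> where T : notnull
{
    void EditPriorities(PriorityCollection<T> priorities);
}

// Hypothetical Command: renames an item while keeping its priority.
// Removing first briefly breaks the invariant, which only edit mode allows.
public sealed class RenameItem<T>(T oldKey, T newKey) : IPriorityEditor<T> where T : notnull
{
    public void EditPriorities(PriorityCollection<T> priorities)
    {
        var priority = priorities[oldKey];
        priorities.Remove(oldKey);
        priorities[newKey] = priority;
    }
}

// Trimmed copy of the class above, with only the members this example needs.
public sealed class PriorityCollection<T> where T : notnull
{
    private Dictionary<T, byte> current;
    private readonly Dictionary<T, byte> encapsulated;
    private readonly Dictionary<T, byte> editable = new Dictionary<T, byte>();
    private bool isEditing;

    public PriorityCollection(T initial)
    {
        encapsulated = new Dictionary<T, byte> { { initial, 100 } };
        current = encapsulated;
    }

    public bool ContainsKey(T key) => current.ContainsKey(key);

    public byte this[T key]
    {
        get => current[key];
        set
        {
            AssertInvariants(new Dictionary<T, byte>(current) { [key] = value });
            current[key] = value;
        }
    }

    public void Remove(T key)
    {
        AssertInvariants(current.Where(kvp => !kvp.Key.Equals(key)));
        current.Remove(key);
    }

    public void Edit(IPriorityEditor<T> editor)
    {
        BeginEdit();
        editor.EditPriorities(this);
        EndEdit();
    }

    private void BeginEdit()
    {
        isEditing = true;
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }

    private void EndEdit()
    {
        isEditing = false;
        try
        {
            AssertInvariants(current);
            encapsulated.Clear();
            foreach (var kvp in current)
                encapsulated.Add(kvp.Key, kvp.Value);
        }
        finally
        {
            // Success or failure, leave edit mode pointing at the valid data.
            current = encapsulated;
        }
    }

    private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (!isEditing && candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException("The sum of all values must be 100.");
    }
}
```

Calling Remove("foo") directly fails because the sum would drop to 0. The same removal inside an editor succeeds, since the invariant is only checked once the whole edit is complete.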

Full code dump #

The final code is complex enough that it's easy to lose track of what it looks like, as I walk through my process. To make it easier, here's the full code for the collection class:

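using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Linq;

public sealed class PriorityCollection<T> : IReadOnlyDictionary<T, byte>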
    where T : notnull
{
    private Dictionary<T, byte> current;
    private readonly Dictionary<T, byte> encapsulated;
    private readonly Dictionary<T, byte> editable;
    private bool isEditing;
 
    public PriorityCollection(T initial)
    {
        encapsulated = new Dictionary<T, byte> { { initial, 100 } };
        editable = [];
        current = encapsulated;
    }
 
    public void Add(T key, byte value)
    {
        AssertInvariants(current.Append(KeyValuePair.Create(key, value)));
        current.Add(key, value);
    }
 
    public void Remove(T key)
    {
        AssertInvariants(current.Where(kvp => !kvp.Key.Equals(key)));
        current.Remove(key);
    }
 
    public byte this[T key]
    {
        get { return current[key]; }
        set
        {
            var l = current.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
            l[key] = value;
            AssertInvariants(l);
 
            current[key] = value;
        }
    }
 
    public void Edit(IPriorityEditor<T> editor)
    {
        BeginEdit();
        editor.EditPriorities(this);
        EndEdit();
    }
 
    private void BeginEdit()
    {
        isEditing = true;
 
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }
 
    private void EndEdit()
    {
        isEditing = false;
        try
        {
            AssertInvariants(current);
 
            encapsulated.Clear();
            foreach (var kvp in current)
                encapsulated.Add(kvp.Key, kvp.Value);
            current = encapsulated;
        }
        catch
        {
            current = encapsulated;
            throw;
        }
    }
 
    private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (!isEditing && candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException(
                "The sum of all values must be 100.");
    }
 
    public IEnumerable<T> Keys
    {
        get { return current.Keys; }
    }
 
    public IEnumerable<byte> Values
    {
        get { return current.Values; }
    }
 
    public int Count
    {
        get { return current.Count; }
    }
 
    public bool ContainsKey(T key)
    {
        return current.ContainsKey(key);
    }
 
    public IEnumerator<KeyValuePair<T, byte>> GetEnumerator()
    {
        return current.GetEnumerator();
    }
 
    public bool TryGetValue(T key, [MaybeNullWhen(false)] out byte value)
    {
        return current.TryGetValue(key, out value);
    }
 
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

The IPriorityEditor<T> interface remains as shown above.

Conclusion #

Given how simple the problem is, this solution is surprisingly complicated, and I'm fairly sure that it's not even thread-safe.

At least it does, as far as I can tell, protect the invariant that the sum of priorities must always be exactly 100. Even so, it's just complicated enough that I wouldn't be surprised if a bug is lurking somewhere. It'd be nice if a simpler design existed.

Next: An immutable priority collection.

Comments

Joker_vD #

Where does the notion come from that a data structure invariant has to be true at all times? I am fairly certain that it's only required to be true at "quiescent" points of execution. That is, just as the loop invariant is only required to hold before and after each loop step but not inside the loop step, so the data structure invariant is only required to hold before and after each invocation of its public methods.

This definition actually has an interesting quirk which is absent in the loop invariant: a data structure's method can't, generally speaking, call other public methods of the very same data structure because the invariant might not hold at this particular point of execution! I've been personally bitten by this a couple of times, and I've seen others tripping over this subtle point as well. You yourself notice it when you muse about the re-entrancy of the BeginEdit method.

Now, this particular problem is quite similar to the problem with inner iteration, and can be solved the same way, with the outer editor, as you've done, although I would have probably provided each editor with its own, separate editable dictionary because right now, the editors cannot nest/compose... but that'd complicate implementation even further.

2024-07-03 22:19 UTC

Mark Seemann #

Thank you for writing. As so many other areas of knowledge, the wider field of software development suffers from the problem of overlapping or overloaded terminology. The word invariant is just one of them. In this context, invariant doesn't refer to loop invariants, or any other kind of invariants used in algorithmic analysis.

As outlined in the introduction article, when discussing encapsulation, I follow Object-Oriented Software Construction (OOSC). In that seminal work, Bertrand Meyer proposes the notion of design-by-contract, and specifically decomposes a contract into three parts: preconditions, invariants, and postconditions.

Having actually read the book, I'm well aware that it uses Eiffel as an exemplar of the concept. This has led many readers to conflate design-by-contract with Eiffel, and (in yet another logical derailment) conclude that it doesn't apply to, say, Java or C# programming.

It turns out, however, to transfer easily to other languages, and it's a concept with much practical potential.

A major problem with object-oriented design is that most ideas about good design are too 'fluffy' to be of immediate use to most developers. Take the Single Responsibility Principle (SRP) as an example. It's seductively easy to grasp the overall idea, but turns out to be hard to apply. Being able to identify reasons to change requires more programming experience than most people have. Or rather, the SRP is mostly useful to programmers who already have that experience. Being too 'fluffy', it's not a good learning tool.

I've spent quite some time with development organizations and individual programmers eager to learn, but struggling to find useful, concrete design rules. The decomposition of encapsulation into preconditions, invariants, and postconditions works well as a concrete, almost quantifiable heuristic.

Does it encompass everything that encapsulation means? Probably not, but it's by far the most effective heuristic that I've found so far.

Since I'm currently travelling, I don't have my copy of OOSC with me, but as far as I remember, the notion that an invariant should be true at all times originates there.

In any case, if an invariant doesn't always hold, then of what value is it? The whole idea behind encapsulation (as I read Meyer) is that client developers should be able to use 'objects' without having intimate knowledge of their implementation details. The use of contracts proposes to achieve that ideal by decoupling affordances from implementation details by condensing the legal protocol between object and client code into a contract. This means that a client developer, when making programming decisions, should be able to trust that certain guarantees stipulated by a contract always hold. If a client developer can't trust those guarantees, they aren't really guarantees.

"the data structure invariant is only required to hold before and after each invocation of its public methods"

I can see how a literal reading of OOSC may leave one with that impression. One must keep in mind, however, that the book was written in the eighties, at a time when multithreading wasn't much of a concern. (Incidentally, this is an omission that also mars a much later book on API design, the first edition of the .NET Framework Design Guidelines.)

In modern code, concurrent execution is a real possibility, so it is at least worth keeping in mind. I'm still most familiar with the .NET ecosystem, and in it, there are plenty of classes that are documented as not being thread-safe. You could say that such a statement is part of the contract, in which case what you wrote is true: The invariant is only required to hold before and after each method invocation.

If, on the other hand, you want to make the code thread-safe, you must be more rigorous than that. Then an invariant must truly always hold.

This is, of course, a design decision one may take. Just don't bother with thread-safety if it's not important.

Still, the overall thrust of this article series is that immutability makes encapsulation much simpler. This is also true when it comes to concurrency. Immutable data structures are automatically thread-safe.

2024-07-06 8:07 UTC


A failed attempt at priority collection with inheritance

2024-06-17T08:04:00+00:00

An instructive dead end.

This article is part of a short series on encapsulation and immutability. As the introductory article claims, object mutation makes it difficult to maintain invariants. In order to demonstrate the problem, I deliberately set out to do it wrong, and report on the result.

In subsequent articles in this series I will then show one way you can maintain the invariants in the face of mutation, as well as how much easier everything becomes if you choose an immutable design.

For now, however, I'll pretend to be naive and see how far I can get with that.

In the first article, I described the example problem in more detail, but in short, the exercise is to develop a class that holds a collection of prioritized items, with the invariant that the priorities must always sum to 100. It should be impossible to leave the object in a state where that's not true. It's quite an illuminating exercise, so if you have the time, you should try it for yourself before reading on.

Initialization #

In object-oriented design it's common to inherit from a base class. Since I'll try to implement a collection of prioritized items, it seems natural to inherit from Collection<T>:

public sealed class PriorityCollection<T> : Collection<Prioritized<T>>

Of course, I also had to define Prioritized<T>:

public sealed class Prioritized<T>
{
    public Prioritized(T item, byte priority)
    {
        Item = item;
        Priority = priority;
    }
 
    public T Item { get; set; }
    public byte Priority { get; set; }
}

Since Prioritized<T> is generic, it can be used to prioritize any kind of object. In the tests I wrote, however, I exclusively used strings.

A priority is a number between 0 and 100, so I chose to represent that with a byte. Not that this strongly protects invariants, because values can still exceed 100, but on the other hand, there's no reason to use a 32-bit integer to model a number between 0 and 100.

Now that I write this text, I realize that I could have added a Guard Clause to the Prioritized<T> constructor to enforce that precondition, but as you can tell, I didn't think of doing that. This omission, however, doesn't change the conclusion, because the problems that we'll run into stem from another source.
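A Guard Clause along those lines could look like the following sketch. GuardedPrioritized<T> is a hypothetical name. It only enforces the 0 to 100 range of a single priority and does nothing for the invariant about the sum.

```csharp
using System;

// Hypothetical variant of Prioritized<T> with Guard Clauses. It enforces the
// range of a single priority, not the sum across a collection.
public sealed class GuardedPrioritized<T>
{
    private byte priority;

    public GuardedPrioritized(T item, byte priority)
    {
        Item = item;
        Priority = priority; // goes through the guarded setter below
    }

    public T Item { get; set; }

    public byte Priority
    {
        get => priority;
        set
        {
            if (value > 100)
                throw new ArgumentOutOfRangeException(
                    nameof(value), value, "A priority must be between 0 and 100.");
            priority = value;
        }
    }
}
```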

In any case, just inheriting from Collection<Prioritized<T>> isn't enough to guarantee the invariant that the sum of priorities must be 100. An invariant must always hold, even for a newly initialized object. Thus, we need something like this to ensure that this is the case:

public sealed class PriorityCollection<T> : Collection<Prioritized<T>>
{
    public PriorityCollection(params Prioritized<T>[] priorities)
        : base(priorities)
    {
        AssertSumIsOneHundred();
    }
 
    private void AssertSumIsOneHundred()
    {
        if (this.Sum(p => p.Priority) != 100)
            throw new InvalidOperationException(
                "The sum of all priorities must be 100.");
    }
}

So far, there's no real need to have a separate AssertSumIsOneHundred helper method; I could have kept that check in the constructor, and that would have been simpler. I did, however, anticipate that I'd need the helper method in other parts of the code base. As it turned out, I did, but not without having to change it.

Protecting overrides #

The Collection<T> base class offers normal collection methods like Add, Insert, Remove and so on. The default implementation allows client code to make arbitrary changes to the collection, including clearing it. The PriorityCollection<T> class can't allow that, because such edits could easily violate the invariants.

Collection<T> is explicitly designed to be a base class, so it offers various virtual methods that inheritors can override to change the behaviour. In this case, this is necessary.

As it turned out, I quickly realized that I had to change my assertion helper method to check the invariant in various cases:

private static void AssertSumIsOneHundred(IEnumerable<Prioritized<T>> priorities)
{
    if (priorities.Sum(p => p.Priority) != 100)
        throw new InvalidOperationException(
            "The sum of all priorities must be 100.");
}

By taking the sequence of priorities as an input argument, this enables me to simulate what would happen if I make a change to the actual collection, for example when adding an item to the collection:

protected override void InsertItem(int index, Prioritized<T> item)
{
    AssertSumIsOneHundred(this.Append(item));
    base.InsertItem(index, item);
}

By using Append, the InsertItem method creates a sequence of values that simulates what the collection would look like if we add the candidate item. Append returns a new sequence, so this operation doesn't change the actual PriorityCollection<T>. That only happens if we get past the assertion and call base.InsertItem.
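The simulation is non-destructive, and that is easy to check in isolation. The following hypothetical AppendSimulation helper uses plain byte priorities instead of Prioritized<T>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: Enumerable.Append yields a new, lazily evaluated
// sequence, so the simulated sum is computed without modifying the source.
public static class AppendSimulation
{
    public static int SimulatedSum(IEnumerable<byte> priorities, byte candidate) =>
        priorities.Append(candidate).Sum(p => (int)p);
}
```

For the priorities 40 and 60 and a candidate of 10, the simulated sum is 110, and the source list still has two elements.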

Likewise, I can protect the invariant in the other overrides:

protected override void RemoveItem(int index)
{
    var l = this.ToList();
    l.RemoveAt(index);
    AssertSumIsOneHundred(l);
    base.RemoveItem(index);
}
 
protected override void SetItem(int index, Prioritized<T> item)
{
    var l = this.ToList();
    l[index] = item;
    AssertSumIsOneHundred(l);
    base.SetItem(index, item);
}

I can even use it in the implementation of ClearItems, although that may seem a tad redundant:

protected override void ClearItems()
{
    AssertSumIsOneHundred([]);
}

I could also just have thrown an exception directly from this method, since it's never okay to clear the collection. Clearing would violate the invariant, because the sum of an empty collection of priorities is zero.

As far as I recall, the entire API of Collection<T> is (transitively) based on those four virtual methods, so now that I've protected the invariant in all four, the PriorityCollection<T> class maintains the invariant, right?

Not yet. See if you can spot the problem.

There are, in fact, at least two remaining problems. One that we can recover from, and one that is insurmountable with this design. I'll get back to the serious problem later, but see if you can spot it already.
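The claim that Collection<T>'s public API funnels through those four virtual methods is easy to check with a hypothetical probe class. The probe records which protected method each public member reaches. As I read the Collection<T> documentation, Add and Insert go through InsertItem, Remove and RemoveAt through RemoveItem, the indexer setter through SetItem, and Clear through ClearItems. The test exercises each of those paths.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical probe: records which protected virtual method each public
// Collection<T> member ends up calling.
public sealed class RoutingProbe<T> : Collection<T>
{
    public List<string> Calls { get; } = new List<string>();

    protected override void InsertItem(int index, T item)
    {
        Calls.Add(nameof(InsertItem));
        base.InsertItem(index, item);
    }

    protected override void RemoveItem(int index)
    {
        Calls.Add(nameof(RemoveItem));
        base.RemoveItem(index);
    }

    protected override void SetItem(int index, T item)
    {
        Calls.Add(nameof(SetItem));
        base.SetItem(index, item);
    }

    protected override void ClearItems()
    {
        Calls.Add(nameof(ClearItems));
        base.ClearItems();
    }
}
```

One caveat: the Collection<T>(IList<T>) constructor wraps the list it receives instead of inserting items one by one, so InsertItem isn't called during construction. That's why the constructor above needs its own call to AssertSumIsOneHundred.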

Leaf mutation #

In the introductory article I wrote:

"If the mutation happens on a leaf node in an object graph, the leaf may have to notify its parent, so that the parent can recheck the invariants."

I realize that this may sound abstract, but the current code presents a simple example. What happens if you change the Priority of an item after you've initialized the collection?

Consider the following example. For various reasons, I wrote the examples (that is, the unit tests) for this exercise in F#, but even if you're not an F# developer, you can probably understand what's going on. First, we create a Prioritized<string> object and use it to initialize a PriorityCollection<string> object named sut:

let item = Prioritized<string> ("foo", 40uy)
let sut = PriorityCollection<string> (item, Prioritized<string> ("bar", 60uy))

The item has a priority of 40 (the uy suffix is the F# way of stating that the literal is a byte), and the other unnamed value has a priority of 60, so all is good so far; the sum is 100.

Since, however, item is a mutable object, we can now change its Priority:

item.Priority <- 50uy

This changes item.Priority to 50, but since none of the four virtual base class methods of Collection<T> are involved, the sut never notices, the assertion never runs, and the object is now in an invalid state.
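The following hypothetical, minimal reproduction shows the same problem in C#. MutableItem and CheckedAtConstruction stand in for Prioritized<T> and PriorityCollection<T>. They are specialized to strings and stripped down to the constructor check.

```csharp
using System;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical stand-in for Prioritized<string>: a mutable leaf.
public sealed class MutableItem
{
    public MutableItem(string item, byte priority)
    {
        Item = item;
        Priority = priority;
    }

    public string Item { get; set; }
    public byte Priority { get; set; }
}

// Hypothetical stand-in for PriorityCollection<string>: checks the sum at
// construction, but has no way of noticing later changes to its items.
public sealed class CheckedAtConstruction : Collection<MutableItem>
{
    public CheckedAtConstruction(params MutableItem[] items) : base(items)
    {
        if (Total != 100)
            throw new InvalidOperationException("The sum of all priorities must be 100.");
    }

    public int Total => this.Sum(i => (int)i.Priority);
}
```

The collection is valid right after construction. After item.Priority = 50, its priorities sum to 110, and no exception is thrown.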

That's what I meant when I discussed mutations in leaf nodes. You can think of a collection as a rather flat and boring tree. The collection object itself is the root, each of the items is a leaf, and no further nesting is allowed.

When you edit a leaf, the root isn't automatically aware of such an event. You explicitly have to wire the object graph up so that this happens.

Event propagation #

One possible way to address this issue is to take advantage of .NET's event system. If you're reading along, but you normally write in another language, you can also use the Observer pattern, or even ReactiveX.

We need to have Prioritized<T> raise events, and one option is to let it implement INotifyPropertyChanging:

public sealed class Prioritized<T> : INotifyPropertyChanging

A Prioritized<T> object can now raise its PropertyChanging event before accepting an edit:

public byte Priority
{
    get => priority;
    set
    {
        if (PropertyChanging is { })
            PropertyChanging(
                this,
                new PriorityChangingEventArgs(value));
        priority = value;
    }
}

where PriorityChangingEventArgs is a little helper class that carries the proposed value around:

public class PriorityChangingEventArgs(byte proposal)
    : PropertyChangingEventArgs(nameof(Priority))
{
    public byte Proposal { get; } = proposal;
}

A PriorityCollection<T> object can now subscribe to that event on each of the values it keeps track of, so that it can protect the invariant against leaf node mutations.

private void Priority_PropertyChanging(object? sender, PropertyChangingEventArgs e)
{
    if (sender is Prioritized<T> p &&
        e is Prioritized<T>.PriorityChangingEventArgs pcea)
    {
        var l = this.ToList();
        l[l.IndexOf(p)] = new Prioritized<T>(p.Item, pcea.Proposal);
        AssertSumIsOneHundred(l);
    }
}

Such a solution comes with its own built-in complexity, because the PriorityCollection<T> class must be careful to subscribe to the PropertyChanging event in various different places. A new Prioritized<T> object may be added to the collection during initialization, or via the InsertItem or SetItem methods. Furthermore, the collection should make sure to unsubscribe from the event if an item is removed from the collection.

To be honest, I didn't bother to implement these extra checks, because the point is moot anyway.
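The following code is only a sketch of what that bookkeeping might involve. ObservableItem and WatchingCollection are hypothetical stand-ins for Prioritized<T> and PriorityCollection<T>, specialized to strings. The collection subscribes to an item's PropertyChanging event when the item enters the collection and unsubscribes when it leaves.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.Linq;

// Hypothetical leaf that announces a proposed priority before accepting it.
public sealed class ObservableItem : INotifyPropertyChanging
{
    private byte priority;

    public ObservableItem(string item, byte priority)
    {
        Item = item;
        this.priority = priority;
    }

    public string Item { get; }

    public event PropertyChangingEventHandler? PropertyChanging;

    public byte Priority
    {
        get => priority;
        set
        {
            // A subscriber can veto the change by throwing.
            PropertyChanging?.Invoke(this, new PriorityChangingEventArgs(value));
            priority = value;
        }
    }

    public sealed class PriorityChangingEventArgs(byte proposal)
        : PropertyChangingEventArgs(nameof(ObservableItem.Priority))
    {
        public byte Proposal { get; } = proposal;
    }
}

// Hypothetical collection that keeps its event subscriptions in sync with its items.
public sealed class WatchingCollection : Collection<ObservableItem>
{
    // Copies into a List<T>; wrapping the fixed-size params array directly
    // would make the collection read-only.
    public WatchingCollection(params ObservableItem[] items) : base(items.ToList())
    {
        AssertSum(this.Select(i => i.Priority));
        foreach (var item in this)
            item.PropertyChanging += OnPriorityChanging;
    }

    protected override void InsertItem(int index, ObservableItem item)
    {
        AssertSum(this.Append(item).Select(i => i.Priority));
        base.InsertItem(index, item);
        item.PropertyChanging += OnPriorityChanging;
    }

    protected override void SetItem(int index, ObservableItem item)
    {
        AssertSum(this.Select((old, pos) => pos == index ? item.Priority : old.Priority));
        this[index].PropertyChanging -= OnPriorityChanging;
        base.SetItem(index, item);
        item.PropertyChanging += OnPriorityChanging;
    }

    protected override void RemoveItem(int index)
    {
        AssertSum(this.Where((_, pos) => pos != index).Select(i => i.Priority));
        this[index].PropertyChanging -= OnPriorityChanging;
        base.RemoveItem(index);
    }

    // An empty collection sums to zero, so clearing can never succeed.
    protected override void ClearItems() =>
        throw new InvalidOperationException("The sum of all priorities must be 100.");

    private void OnPriorityChanging(object? sender, PropertyChangingEventArgs e)
    {
        if (sender is ObservableItem p && e is ObservableItem.PriorityChangingEventArgs pcea)
            AssertSum(this.Select(i => ReferenceEquals(i, p) ? pcea.Proposal : i.Priority));
    }

    private static void AssertSum(IEnumerable<byte> priorities)
    {
        if (priorities.Sum(p => (int)p) != 100)
            throw new InvalidOperationException("The sum of all priorities must be 100.");
    }
}
```

Even in this reduced form, the constructor, InsertItem, SetItem, and RemoveItem all have to manage subscriptions. That is the built-in complexity mentioned above.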

Fatal flaw #

The design shown here comes with a fatal flaw. Can you tell what it is?

Since the invariant is that the priorities must always sum to exactly 100, it's impossible to add, remove, or change any items after initialization.

Or, rather, you can add new Prioritized<T> objects as long as their Priority is 0. Any other value breaks the invariant.

Likewise, the only item you can remove is one with a Priority of 0. Again, if you remove an item with any other Priority, you'd be violating the invariant.

A similar situation arises with editing an existing item. While you can change the Priority of an item, you can only 'change' it to the same value. So you can change 0 to 0, 42 to 42, or 100 to 100, but that's it.

But, I can hear you say, I'll only change 60 to 40 because I intend to add a new item with a 20 priority! In the end, the sum will be 100!

Yes, but this design doesn't know that, and you have no way of telling it.
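The gap shows up clearly in a small, hypothetical helper. The helper applies a batch of edits to a copy of the priorities and checks the sum either after every step or only at the end. It uses a plain dictionary with int priorities rather than the classes above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: applies a batch of (key, priority) edits to a copy of
// the priorities, checking the sum after every step or only at the end.
public static class BatchCheck
{
    public static bool Succeeds(
        IReadOnlyDictionary<string, int> start,
        IEnumerable<(string Key, int Priority)> edits,
        bool checkEveryStep)
    {
        var copy = start.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
        foreach (var (key, priority) in edits)
        {
            copy[key] = priority; // adds the key if it isn't there yet
            if (checkEveryStep && copy.Values.Sum() != 100)
                return false;
        }
        return copy.Values.Sum() == 100;
    }
}
```

Start from foo at 60 and bar at 40, then set foo to 40 and add baz with 20. The stepwise check fails at the first edit, where the sum is 80. The end-of-batch check accepts the batch.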

While we may be able to rectify the situation, I consider this design so compromised that I think it better to start afresh with this realization. Thus, I'll abandon this version of PriorityCollection<T> in favour of a fresh start in the next article.

Conclusion #

While I've titled this article "A failed attempt", the actual purpose was to demonstrate how 'aggregate' requirements make it difficult to maintain class invariants.

I've seen many code bases with poor encapsulation. As far as I can tell, a major reason for that is that the usual 'small-scale' object-oriented design techniques like Guard Clauses fall short when an invariant involves the interplay of multiple objects. And in real business logic, that's the rule rather than the exception.

Not all is lost, however. In the next article, I'll develop an alternative object-oriented solution to the priority collection problem.

Next: A mutable priority collection.

Comments

Daniel Frost #

2 things.

I had a difficult time getting this to work as a mutable type, and the only two things I could come up with (I spent some hours on it; it was in fact hard!) were:

1. To throw an exception when the items in the collection didn't sum up to the budget. That violates the invariant, because you can add and remove items all you want.
2. Another try, which I didn't finish, is to add some kind of result object that could tell about the validity of the collection and not expose the collection items before the result is valid. I haven't tried this, and it doesn't resemble a collection, but it could perhaps be a way to go.
I am also leaning towards a wrapper around the item type, making it immutable, so the items cannot change afterwards. Cheating?
I tried the events approach, but, as you put it yourself, the type you end up with isn't very friendly.

2024-06-18 11:54 UTC

Mark Seemann #

Daniel, thank you for writing. You'll be interested in the next articles in the series, then.

2024-06-18 13:55 UTC

This blog is totally free, but if you like it, please consider supporting it.

Simpler encapsulation with immutability

2024-06-12T15:33:00+00:00

A worked example.

I've noticed that many software organizations struggle with encapsulation when it comes to 'bigger' problems. It may be easy enough to define a NaturalNumber type, or to ensure that a minimum value is less than a maximum value, and so on. How do you, however, guarantee invariants once the scope of the problem becomes bigger and more complex?

In this series of articles, I'll attempt to illustrate how and why this worthy design goal seems elusive, and what you can do to achieve it.

Contracts #

As usual, when I discuss encapsulation, I first need to establish what I mean by the term. It is, after all, one of the most misunderstood concepts in software development. As regular readers will know, I follow the lead of Object-Oriented Software Construction. In that perspective, encapsulation is the appropriate modelling and application of preconditions, invariants, and postconditions.

Particularly when it comes to invariants, things seem to fall apart as the problem being modelled grows in complexity. Teams eventually give up guaranteeing any invariants, leaving client developers with no recourse but defensive coding, which in turn leads to code duplication, bugs, and maintenance problems.

If you need a reminder, an invariant is an assertion about an object that is always true. The more invariants an object has, the better guarantees it gives, and the more you can trust it. The more you can trust it, the less defensive coding you have to write. You don't have to check if return values are null, strings empty, numbers negative, collections empty, and so on.

Altogether, I usually denote the collection of invariants, pre-, and postconditions as a type's contract.

For a simple example like modelling a natural number, or a range, or a user name, most people are able to produce sensible and coherent designs. Once, however, the problem becomes more complex, and the invariants involve multiple interacting values, maintaining the contract becomes harder.

Immutability to the rescue #

I'm not going to bury the lede any longer. It strikes me that mutation is a major source of complexity. It's not that hard to check a set of conditions when you create a value (or object or record). What makes it hard to maintain invariants is when objects are allowed to change. This implies that for every possible change to the object, it needs to examine its current state in order to decide whether or not it should allow the operation.

If the mutation happens on a leaf node in an object graph, the leaf may have to notify its parent, so that the parent can recheck the invariants. If the graph has cycles, it becomes more complicated still, and if you want to make the problem truly formidable, try making the object thread-safe.

Making the object immutable makes most of these problems go away. You don't have to worry about thread-safety, because immutable values are automatically thread-safe; there's no state for any thread to change.

Even better, though, is that an immutable object's contract is smaller and simpler. It still has preconditions, because there are rules that govern what has to be true before you can create such an object. Furthermore, there may also be rules that stipulate what must be true before you can call a method on it.

Likewise, postconditions are still relevant. If you call a method on the object, it may give you guarantees about what it returns.

There are, however, no independent invariants.

Or rather, the invariants for an immutable object entirely coincide with its preconditions. If it was valid at creation, it remains valid.
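As an illustration, consider an immutable range whose only rule is that the lower bound mustn't exceed the upper bound. This is a hypothetical Python example, not code from this series. The check runs once, at construction. Since a frozen dataclass can't be modified afterwards, the precondition is all the invariant there is.

```python
# Hypothetical example: an immutable range whose only rule is low <= high.
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: int
    high: int

    def __post_init__(self) -> None:
        # Precondition, checked exactly once, when the value is created.
        if self.low > self.high:
            raise ValueError(f"low ({self.low}) must not exceed high ({self.high})")

    def contains(self, x: int) -> bool:
        return self.low <= x <= self.high

    def extend_to(self, x: int) -> "Range":
        # 'Changing' a range creates a new value, which re-runs the check.
        return Range(min(self.low, x), max(self.high, x))
```

Any attempt to assign to low or high afterwards raises an error, so there's no later state for the object to check.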

Priority collection #

As promised, I'll work through a problem to demonstrate what I mean. I'll first showcase how mutation makes the problem hard, and then how trivial it becomes with an immutable design.

The problem is this: Design and implement a class (or just a data structure if you don't want to do Object-Oriented programming) that models a priority list (not a Priority Queue) as you sometimes run into in surveys. You know, one of these survey questions that asks you to distribute 100 points among various options:

Option F: 30%
Option A: 25%
Option C: 25%
Option E: 20%
Option B: 0%
Option D: 0%

If you have the time, I suggest that you treat this problem as a kata. Try to do the exercise before reading the next articles in this series. You can assume the following, which is what I did.

The budget is 100. (You could make it configurable, but the problem is gnarly enough even with a hard-coded value.)
You don't need to include items with priority value 0, but you should allow it.
The sum of priorities must be exactly 100. This is the invariant.

The difficult part is that last invariant. Let me stress this requirement: At any time, the object should be in a consistent state; i.e. the sum of priorities should be exactly 100 at all times. Not 101 or 99, but 100. Good luck with that.

The object should also be valid at initialization.
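Here's a minimal Python sketch of an immutable variant. It assumes a hard-coded budget of 100 and string labels. It also assumes that negative priorities are illegal, which is my assumption and not part of the kata. The class and method names are hypothetical, and this isn't necessarily the design the series ends up with.

```python
# Hypothetical sketch: an immutable priority collection for the kata.
from dataclasses import dataclass

BUDGET = 100


@dataclass(frozen=True)
class PriorityCollection:
    # Immutable tuple of (option, priority) pairs.
    items: tuple[tuple[str, int], ...]

    def __post_init__(self) -> None:
        # Assumption: negative priorities are not allowed.
        if any(p < 0 for _, p in self.items):
            raise ValueError("priorities must be non-negative")
        # The precondition doubles as the invariant, since the value can't change.
        total = sum(p for _, p in self.items)
        if total != BUDGET:
            raise ValueError(f"priorities sum to {total}, not {BUDGET}")

    def priority_of(self, option: str) -> int:
        return dict(self.items)[option]


survey = PriorityCollection(
    (("F", 30), ("A", 25), ("C", 25), ("E", 20), ("B", 0), ("D", 0)))
```

With this design, moving from one valid distribution to another simply means creating a new value. There's no intermediate state in which the sum could be anything but 100.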

Of course, having read this far, you understand that all you have to do is to make the object immutable, but just for the sake of argument, try designing a mutable object with this invariant. Once you've tried your hand with that, read on.

Attempts #

There's educational value in going through even failed attempts. When I thought of this example, I fairly quickly outlined in my head one approach that was unlikely to ever work, one that could work, and the nice immutable solution that trivially works.

I'll cover each in turn in the following articles.

It's surprising how hard even a simple exercise like this one turns out to be, if you try to do it the object-oriented way.

In reality, business rules are much more involved than what's on display here. For only a taste of how bad it might get, read Hillel Wayne's suggestions regarding a similar kind of problem.

Conclusion #

If you've lived all your programming life with mutation as an ever-present possibility, you may not realize how much easier immutability makes everything. This includes invariants.

When you have immutable data, object graphs tend to be simpler. You can't easily define cyclic graphs (although Haskell, due to its laziness, surprisingly does enable this), and invariants essentially coincide with preconditions.

In the following articles, I'll show how mutability makes even simple invariants difficult to implement, and how immutability easily addresses the issue.

Next: A failed attempt at priority collection with inheritance.

Comments

Marken Foo #

I've been enjoying going through your articles in the past couple months, and I really like the very pedagogic treatment of functional programming and adjacent topics.

The kata here is an interesting one, but I don't think I'd link it with the concept of immutability/mutability. My immediate thought was a naïve struct that can represent illegal values and whose validity is managed through functions containing some tricky logic, but that didn't seem promising whether it was done immutably or not.

Instead, the phrase "distribute 100 points" triggered an association with the stars and bars method for similar problems. The idea is that we have N=100 points in a row, and insert dividers to break them into (numOptions) groups. Concretely, our data structure is (dividers: int array), which is a sorted array of length (numOptions + 1) where the first element is 0 and the last element is N=100. The priorities are then exactly the differences between adjacent elements of the array. The example in the kata (A=25, B=0, C=25, D=0, E=20, F=30) is then represented by the array [| 0; 25; 25; 50; 50; 70; 100|].

This solution seems to respect the invariant, has a configurable budget, can work with other numerical types, and works well whether immutable or not (if mutable, just ensure the array remains sorted, has min 0, and max N). The invariant is encoded in the representation of the data, which seems to me to be the more relevant point than mutability.
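The stars-and-bars representation can be sketched in a few lines of Python. The function names are hypothetical, and the budget is a parameter. The priorities are adjacent differences, so their sum telescopes to the last divider minus the first, i.e. N - 0 = N. Any array that passes the validity check therefore respects the budget automatically.

```python
# Hypothetical sketch of the stars-and-bars representation.
def is_valid_dividers(dividers: list[int], budget: int) -> bool:
    # Must start at 0, end at the budget, and never decrease.
    return (
        len(dividers) >= 2
        and dividers[0] == 0
        and dividers[-1] == budget
        and all(a <= b for a, b in zip(dividers, dividers[1:]))
    )


def priorities(dividers: list[int]) -> list[int]:
    # Each priority is the gap between two adjacent dividers.
    return [b - a for a, b in zip(dividers, dividers[1:])]


example = [0, 25, 25, 50, 50, 70, 100]  # the kata's A..F distribution
print(is_valid_dividers(example, 100))           # True
print(dict(zip("ABCDEF", priorities(example))))  # {'A': 25, 'B': 0, 'C': 25, 'D': 0, 'E': 20, 'F': 30}
```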

And a somewhat disjoint thought, the kata reminded me of a WinForms TableLayoutPanel (or MS Word table) whose column widths all must fit within the container's width...

2024-06-13 13:55 UTC

Mark Seemann #

Thank you for writing. The danger of writing these article series is always that as soon as I've published the first one, someone comes by and puts a big hole through my premise. Well, I write this blog for a couple of independent reasons, and one of them is to learn.

And you just taught me something. Thank you. That is, at least, an elegant implementation.

How would you design the API encapsulating that implementation?

Clearly, arrays already have APIs, so you could obviously define an array-like API that performs the appropriate boundary checks. That, however, doesn't seem to model the given problem. Rather, it reveals the implementation, and forces a client developer to think in terms of the data structure, rather than the problem (s)he has to solve.

Ideally, again channelling Bertrand Meyer, an object should present itself as an Abstract Data Type (ADT) that doesn't require client developers to understand the implementation details. I'm curious what such an API would look like.

You've already surprised me once, and please do so once again. I'm always happy to learn something new, and that little stars-and-bars concept I've now added to my tool belt.

All that said, this article makes a more general claim, although it's possible that the example it showcases is a tad too simple and naive to be a truly revealing one. The claim is that this kind of 'aggregate constraint' often causes so much trouble in the face of arbitrary state mutation that most programmers give up on encapsulation.

What happens if we instead expand the requirements a bit? Let's say that we will require the user to spend at least 90% of the budget, but no more than 100%. Also, there must be at least three prioritized items, and no individual item can receive more than a third of the budget.
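One way to express these expanded rules is as a single validation predicate. This Python sketch is hypothetical. It assumes integer priorities, and that a 'prioritized item' is one with a priority above zero. It also reads 'no more than a third of the budget' as 3 * priority <= budget.

```python
# Hypothetical validation of the expanded requirements.
def satisfies_expanded_rules(priorities: list[int], budget: int = 100) -> bool:
    total = sum(priorities)
    prioritized = [p for p in priorities if p > 0]
    return (
        all(p >= 0 for p in priorities)
        and 10 * total >= 9 * budget                  # at least 90% spent
        and total <= budget                           # no more than 100% spent
        and len(prioritized) >= 3                     # at least three items
        and all(3 * p <= budget for p in priorities)  # none above a third
    )
```

Each clause is simple on its own. The difficulty lies in keeping all of them true at once while the object changes.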

2024-06-14 14:22 UTC

Marken Foo #

Thank you for the response. Here are my thoughts - it's a bit of a wall of text, I might be wrong in any of the following, and the conclusion may be disappointing. When you ask how I'd design the API, I'd say it depends on how the priority list is going to be used. The implementation trick with stars and bars might just be a one-off trick that happens to work here, but it doesn't (shouldn't) affect the contract with the outside world.

If we're considering survey questions or budgets, the interest is in the priority values. So I think the problem then is about a list of priorities with an aggregate constraint. So I would define... an array-like API that performs the appropriate boundary checks (wow), but for the item priorities. My approach would be to go for "private data, public functions", and rely on a legal starting state and preserving the legality through the public API. In pseudocode:

                type PriorityList = { budget: int; dividers: int list }
                create :: numItems: int -> budget: int -> PriorityList

                // Returns priorities.
                getAll :: plist: PriorityList -> int list
                get :: itemIdx: int -> plist: PriorityList -> int

                // *Sets the priority for an item (taking the priority from other items, starting from the back).
                set :: itemIdx: int -> priority: int -> plist: PriorityList -> PriorityList

                // *Adds a new item to (the end of) the PriorityList (with priority zero).
                addItem :: plist: PriorityList -> PriorityList

                // *Removes an item from the PriorityList (and assigns its priority to the last item).
                removeItem :: itemIdx: int -> plist: PriorityList -> PriorityList

                // Utility functions: see text
                _toPriorities :: dividers: int list -> int list
                _toDividers :: priorities: int list -> int list

Crucially: since set, addItem, and removeItem must maintain the invariants, they must have "side effects" of altering other priorities. I think this is unavoidable here because we have aggregate/global constraints, rather than just elementwise/local constraints. (Is this why resizing rows and columns in WinForms TableLayoutPanels and MS Word tables is so tedious?) This will manifest in the API - the client needs to know what "side effects" there are (suggested behaviour in parentheses in the pseudocode comments above). See my crude attempt at implementation.

You may already see where this is going. If I accept that boundary checks are needed, then my secondary goal in encapsulation is to express the constraints as clearly as possible, and hopefully not spread the checking logic all over the code.

Whence the utility functions: it turned out to be useful to convert from a list of dividers to priorities, and vice versa. This is because the elementwise operations/invariants like the individual priority values are easier to express in terms of raw priorities, while the aggregate ones like the total budget are easier in terms of "dividers" (the cumulative priorities). There is a runtime cost to the conversion, but the code becomes clearer. This smells similar to feature envy...
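Converting priorities to dividers is a running total that starts at 0. That makes it the inverse of taking adjacent differences. Here is a minimal Python sketch. The names echo the pseudocode's _toDividers and _toPriorities, but the implementation is my assumption.

```python
from itertools import accumulate


def to_dividers(priorities: list[int]) -> list[int]:
    # Running total starting at 0: [25, 0, 25] -> [0, 25, 25, 50]
    return list(accumulate(priorities, initial=0))


def to_priorities(dividers: list[int]) -> list[int]:
    # Adjacent differences: [0, 25, 25, 50] -> [25, 0, 25]
    return [b - a for a, b in zip(dividers, dividers[1:])]
```

The split described above shows up directly. A per-item cap is an elementwise operation on to_priorities, while the total budget is simply the last element of to_dividers.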

So why not just have the underlying implementation hold a list of priorities in the first place?! Almost everything in the implementation needs translation back to that anyway. D'oh! I refactored myself back to the naïve approach. The original representation seemed elegant, but I couldn't find a way to manipulate it that clients would find intuitive and useful in the given problem.

But... if I approach the design from the angle "what advantages does the cumulative priority model offer?", I might come up with the following candidate API functions, which could be implemented cleanly in the "divider" space:

                // (same type, create, get, getAll, addItem as above)
                // Removes the item and merges its priority with the item before it.
                merge :: itemIdx: int -> plist: PriorityList -> PriorityList
                // Sets the priority of an item to zero and gives it to the item after it.
                collapse :: itemIdx: int -> plist: PriorityList -> PriorityList
                // Swaps the priority of an item and the one after it (e.g. to "bubble" a priority value forwards or backwards, although this is easier in the "priority" space)
                swap :: itemIdx: int -> plist: PriorityList -> PriorityList
                // Sets (alternative: adds to) the priority of an item, taking the priority from the items after it in sequence ("consuming" them in the forward direction)
                consume :: itemIdx: int -> priority: int -> plist: PriorityList -> PriorityList
                // Splits the item into 2 smaller items each with half the priority (could be generalised to n items)
                split :: itemIdx: int -> plist: PriorityList -> PriorityList
                // etc.

And this seems like a more fitting API for that table column width example I keep bringing up. What's interesting to me is that despite the data structures of the budget/survey question and the table column widths being isomorphic, we can come up with rather different APIs depending on which view we consider. I think this is my main takeaway from this exploration, actually.
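To see why operations such as merge, collapse, and split feel natural in divider space, here's a hypothetical Python sketch of the three. It assumes that item i occupies the span from dividers[i] to dividers[i + 1], and each function returns a new list. None of them touches the first or last divider, so the total budget is preserved by construction, and only the index checks remain.

```python
# Hypothetical implementations of three of the suggested operations.
def merge(dividers: list[int], i: int) -> list[int]:
    # Remove item i, giving its priority to item i - 1: drop divider i.
    if not 1 <= i < len(dividers) - 1:
        raise IndexError("merge needs an item before item i")
    return dividers[:i] + dividers[i + 1:]


def collapse(dividers: list[int], i: int) -> list[int]:
    # Set item i to zero and give its priority to item i + 1.
    if not 0 <= i < len(dividers) - 2:
        raise IndexError("collapse needs an item after item i")
    return dividers[:i + 1] + [dividers[i]] + dividers[i + 2:]


def split(dividers: list[int], i: int) -> list[int]:
    # Split item i in two; the second half gets the odd point, if any.
    if not 0 <= i < len(dividers) - 1:
        raise IndexError("no such item")
    mid = dividers[i] + (dividers[i + 1] - dividers[i]) // 2
    return dividers[:i + 1] + [mid] + dividers[i + 1:]
```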

As for the additional requirements, individually each constraint is easy to handle, but their composition is tricky. If it's easy to transform an illegal PriorityList to make it respect the invariants, we can just apply the transformation after every create/set/add/remove. Something like:

                type PriorityList =
                    { budget: int
                      dividers: int list
                      budgetCondition: int -> bool
                      maxPriority: int
                      minChoices: int }
                
                let _enforceBudget (predicate: int -> bool) (defaultBudget: int) (dividers: int list) : int list =
                    if (List.last dividers |> predicate) then
                        dividers
                    else
                        List.take (dividers.Length - 1) dividers @ [ defaultBudget ]
        
                let _enforceMaxPriority (maxPriority: int) (dividers: int list) : int list =
                    _toPriorities dividers |> List.map (fun p -> min p maxPriority) |> _toDividers

The problem is that those transforms may not preserve each other's invariants. Life would be easy if we could write a single transform to preserve everything (I haven't found one - notice that the two above are operating on different int lists, so it's tricky). Otherwise, we could write validations instead of transformations, then let create/set/add/remove fail by returning Option.None (explicitly fail) or the original list (silently fail). This comes at the cost of making the API less friendly.
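The composition problem shows up even in a tiny example. This hypothetical Python sketch mirrors the two F# transforms, but works on plain priorities rather than dividers. Giving the shortfall to the last item is what replacing the last divider with the default budget amounts to. The example starts from 50/30/20, with a cap of 33 and a budget window of 90 to 100. Capping breaks the budget rule, and repairing the budget breaks the cap again.

```python
from typing import Callable


def enforce_budget(priorities: list[int], ok: Callable[[int], bool],
                   default_budget: int) -> list[int]:
    # If the total is unacceptable, give the difference to the last item,
    # which is what replacing the last divider with default_budget amounts to.
    total = sum(priorities)
    if ok(total):
        return priorities
    return priorities[:-1] + [priorities[-1] + default_budget - total]


def enforce_max_priority(priorities: list[int], max_priority: int) -> list[int]:
    # Cap every item at max_priority.
    return [min(p, max_priority) for p in priorities]


def budget_ok(total: int) -> bool:
    return 90 <= total <= 100


capped = enforce_max_priority([50, 30, 20], 33)    # [33, 30, 20], total 83
repaired = enforce_budget(capped, budget_ok, 100)  # [33, 30, 37], cap broken
print(capped, repaired)
```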

Ultimately with this approach I can't see a way to make all illegal states unrepresentable without sprinkling ad-hoc checks everywhere in the code. The advantages of the "cumulative priorities" representation I can think of are (a) it makes the total budget invariant obvious, and (b) it maps nicely to a UI where you click and drag segments around. Since you might have gone down a different path in the series, I'm curious to see how that shapes up.

2024-06-15 14:48 UTC

Aliaksei Saladukhin #

Hello and thank you for your blog. It is really informative and provides great food for thought.

What if it were impossible to compile and run a program that would lead to an illegal (list) state?

I've tried to implement the priority collection in Rust, and what I've ended up with is a heterogeneous priority list with compile-time priority validation. The idea behind this implementation is simple: you declare a recursive generic struct, which holds the current element and a tail (another list or the unit type).

				struct PriorityList<const B: usize, const P: usize, H, T> {
					head: H,
					tail: T,
				}

If, for example, we need a list of two Strings with budget 100, and a 30/70 priority split, it will have the following type: PriorityList<100, 30, String, PriorityList<100, 70, String, ()>>. Note that information about the list budget and the current element's priority is contained in the generic arguments B and P, respectively. These are compile-time "variables", and will be replaced by their values in the compiled program.

Since each element of such a list is a list itself, and the budget is the same for each element, all elements except the first are invalid priority lists. So, to make it possible to create lists other than those containing a single element, or only one element with a priority above 0, validity checks must be targeted and deferred. To target invariant validation at the first element of the list, I've included validation in the list methods (except the set_priority method). Every time a list method is called, the compiler does a recursive computation of the priority sum and compares it with the list budget, giving a compile-time error if there is a mismatch. Consider the following example, which will compile and run:

				let list = ListBuilder::new::<10, 10>("Hello");
				let list = list.set_priority::<5>();

It seems as though the invariant has been violated, since the sum of priorities is less than the budget. But if we try to manipulate this list in any way other than adding an element or changing a priority, the program won't compile:

				// Won't compile
				let _ = list.pop();

				// Won't compile
				let list = list.push::<4>("Hi");

				// Will do
				let list = list.push::<5>("Hello there");

This implementation may not be as practical as it could be, due to verbose compilation error messages, but it's a good showcase and exercise. I've also uploaded the full source code to GitLab: https://gitlab.com/studiedlist/priority-collection

2024-06-18 08:47 UTC

Mark Seemann #

Marken, thank you for writing. It's always interesting to learn new techniques, and, as I previously mentioned, the array-based implementation certainly seems to make illegal states unrepresentable. And then, as we'll see in the last (yet unpublished) article in this little series, if we also make the data structure immutable, we'll have a truly simple and easy-to-understand API to work with.

I've tried experimenting with the F# script you linked, but I must admit that I'm having trouble understanding how to use it. You did write that it was a crude attempt, so I'm not complaining, but on the other hand, it doesn't work well as an example of good encapsulation. The following may seem as though I'm moving the goalpost, so apologies for that in advance.

Usually, when I consult development organizations about software architecture, the failure to maintain invariants is so fundamental that I usually have to start with that problem. That's the reason that this article series is so narrow-mindedly focused on contract, and seemingly not much else. We must not, though, lose sight of what ultimately motivates us to consider encapsulation beneficial. This is what I've tried to outline in Code That Fits in Your Head: That the human brain is ill-suited to keep all implementation details in mind at the same time. One way we may attempt to address this problem is to hide implementation details behind an API which, additionally, comes with some guarantees. Thus (and this is where you may, reasonably, accuse me of moving the goal post), not only should an object fulfil its contract, it should also be possible to interact with its API without understanding implementation details.

The API you propose seems to have problems, some of which may be rectifiable:

At a fundamental level, it's not really clear to me how to use the various functions in the script file.
The API doesn't keep track of what is being prioritized. This could probably be fixed.
It's not clear whether it's possible to transition from one arbitrary valid distribution to another arbitrary valid distribution.

I'll briefly expand on each.

As an example of the API being less than clear to me, I can't say that I understand what's going on here:

> create 1 100 |> set 1 50 |> addItem |> set 1 30;;
val it: PriorityList = { budget = 100
                         dividers = [0; 50; 100] }

As for what's being prioritized, you could probably mend that shortcoming by letting the array be an array of tuples.

The last part I'm not sure of, but you write:

"Crucially: since set, addItem, and removeItem must maintain the invariants, they must have "side effects" of altering other priorities."

As the most recent article in this series demonstrates, this isn't an overall limitation imposed by the invariant, but rather by your chosen API design. Specifically, assuming that you initially have a 23, 31, 46 distribution, how do you transition to a 19, 29, 43, 7, 2 distribution?

2024-06-27 6:42 UTC

Mark Seemann #

Aliaksei, thank you for writing. I've never programmed in Rust, so I didn't know it had that capability. At first I thought it was dependent typing, but after reading up on it, it seems as though it's not quite that.

An exercise like the one in this article series is useful because it can help shed light on options and their various combinations of benefits and drawbacks. Thus, there are no entirely right or wrong solutions to such an exercise.

Since I don't know Rust, I can't easily distinguish what might be possible drawbacks here. I usually regard making illegal states unrepresentable as a benefit, but we must always be careful not to go too far in that direction. One thing is to reject invalid states, but can we still represent all valid states? What if priority distributions are run-time values?

2024-06-28 7:21 UTC


You'll regret using natural keys

2024-06-03T19:46:00+00:00

Beating another dead horse.

Although I live in Copenhagen and mostly walk or ride my bicycle in order to get around town, I do own an old car for getting around the rest of the country. In Denmark, cars go through mandatory official inspection every other year, and I've been through a few of these in my life. A few years ago, the mechanic doing the inspection informed me that my car's chassis number was incorrect.

This did make me a bit nervous, because I'd bought the car used, and I was suddenly concerned that things weren't really as I thought. Had I unwittingly bought a stolen car?

But the mechanic just walked over to his computer in order to correct the error. That's when a different kind of unease hit me. When you've programmed for some decades, you learn to foresee various typical failure modes. Since a chassis number is an obvious candidate for a natural key, I already predicted that changing the number would prove to be either impossible, or have all sorts of cascading effects, ultimately terminating in official records no longer recognizing that the car is mine.

As it turned out, though, whoever made that piece of software knew what they were doing, because the mechanic just changed the chassis number, and that was that. This is now five or six years ago, and I still own the same car, and I've never had any problems with the official ownership records.

Uniqueness #

The reason I related this story is that I'm currently following an undergraduate course in databases and information systems. Since this course is aimed at students with no real-world experience, it wisely moves forward in a pedagogical progression. In order to teach database keys, it starts with natural keys. From a didactic perspective, this makes sense, but the result, so far, is that the young people I work with now propose database designs with natural keys.

I'm not blaming anyone. You have to learn to crawl before you can walk.

Still, this situation made me reflect on the following question: Are natural keys ever a good idea?

Let's consider an example. For a little project we're doing, we've created a database of the World's 50 best restaurants. My fellow students suggest a table design like this:

CREATE TABLE Restaurants (
    year TEXT NOT NULL,
    rank TEXT NOT NULL,
    restaurantName TEXT NOT NULL,
    cityName TEXT NOT NULL
);

Granted, at this point, this table definition defines no key at all. I'm not complaining about that. After all, a month ago, the students probably hadn't seen a database table.

From following the course curriculum, however, it'd be natural to define a key for the Restaurants table as the combination of restaurantName, cityName, and year. The assumption is that name and city uniquely identify a restaurant.

In this particular example, this assumption may actually turn out to hold. So far. After all, the data set isn't that big, and it's important for restaurants in that league to have recognizable names. If I had to guess, I'd say that there's probably only one Nobelhart & Schmutzig in the world.

Still, a good software architect should challenge the underlying assumptions. Is name and city a natural key? It's easy to imagine that it's not. What if we expand the key to include the country as well? Okay, but what if we had a restaurant named China Wok in Springfield, USA? Hardly unique. Add the state, you say? Probably still not unique.

Identity #

Ensuring uniqueness is only the first of many problems with natural keys. You may quickly reach the conclusion that for a restaurant database, a synthetic key is probably the best choice.

But what about 'natural' natural keys, so to speak? An example may be a car's chassis number. This is already an opaque number, and it probably originates from a database somewhere. Or how about a personal identification number? In Denmark we have the CPR number, and I understand that the US Social Security Number is vaguely analogous.

If you're designing a database that already includes such a personal identification number, you might be tempted to use it as a natural key. After all, it's already a key somewhere else, so it's guaranteed to be unique, right?

Yes, the number may uniquely identify a person, but the converse may not be true. A person may have more than one identification number. At least when time is a factor.

As an example, for technical-historical reasons, the Danish CPR number carries information (which keys shouldn't do), such as a person's date of birth and sex. Since 2014 a new law enables transsexual citizens to get a new CPR number that reflects their perceived gender. The consequence is that the same person may have more than one CPR number. Perhaps not more than one at the same time, but definitely two during a lifetime.

Even if existing keys are guaranteed to be unique, you can't assume that the uniqueness gives rise to a bijection. If you use an external unique key, you may lose track of the entities that you're trying to keep track of.

This is true not only for people, but cars, bicycles (which also have chassis numbers), network cards, etc.

Clerical errors #

Finally, even if you've found a natural key that is guaranteed to be unique and track the actual entity that you want to keep track of, there's a final argument against using an externally defined key in your system: Data-entry errors.

Take the story about my car's chassis number. The mechanic who spotted the discrepancy clearly interpreted it as a clerical error.

After a few decades of programming, I've learned that sooner or later, there will be errors in your data. Either it's a clerical error, or the end-user mistyped, or there was a data conversion error when importing from an external system. Or even data conversion errors within the same system, as it goes through upgrades and migrations.

Your system should be designed to allow corrections to data. This includes corrections of external keys, such as chassis numbers, government IDs, etc. This means that you can't use such keys as database keys in your own system.
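To illustrate the point, here's a hypothetical in-memory Python sketch. It is not a database schema. Each car gets a synthetic id, the chassis number is just a correctable attribute, and ownership refers to the id. The class names and chassis values are made up for illustration.

```python
# Hypothetical sketch: synthetic ids as keys, chassis numbers as plain data.
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Car:
    chassis_number: str  # external identifier, stored as ordinary, mutable data


class Registry:
    def __init__(self) -> None:
        self._ids = count(1)              # synthetic key generator
        self.cars: dict[int, Car] = {}
        self.owners: dict[int, str] = {}  # car id -> owner name

    def register(self, chassis_number: str, owner: str) -> int:
        car_id = next(self._ids)
        self.cars[car_id] = Car(chassis_number)
        self.owners[car_id] = owner
        return car_id

    def correct_chassis_number(self, car_id: int, corrected: str) -> None:
        # A clerical fix touches one attribute; nothing that refers to car_id changes.
        self.cars[car_id].chassis_number = corrected

    def owner_of_chassis(self, chassis_number: str) -> Optional[str]:
        # Chassis numbers remain searchable; they just aren't the key.
        for car_id, car in self.cars.items():
            if car.chassis_number == chassis_number:
                return self.owners[car_id]
        return None
```

Correcting the chassis number changes one attribute, and the ownership record still points to the same car. Had ownership been keyed on chassis numbers, the correction would have had to cascade to every reference.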

Heuristic #

Many were the times, earlier in my career, when I decided to use a 'natural key' as a key in my own database. As far as I recall, I've regretted it every single time.

These days I follow a hard heuristic: Always use synthetic keys for database tables.

Conclusion #

Is it ever a good idea to use natural keys in a database design? My experience tells me that it's not. Ultimately, regardless of how certain you can be that the natural key is stable and correctly tracks the entity that it's supposed to keep track of, data errors will occur. This includes errors in those natural keys.

You should be able to correct such errors without losing track of the involved entities. You'll regret using natural keys. Use synthetic keys.

Comments

James Snape #

There are lots of different types of keys. I agree that using natural keys as physical primary keys is a bad idea but you really should be modelling your data logically with natural keys. Thinking about uniqueness and identity is a part of your data design. Natural keys often end up as constraints, indexes and query plans. When natural keys are not unique enough then you need to consider additional attributes in your design to ensure access to a specific record.

Considering natural keys during design can help elicit additional requirements and business rules. "Does a social security number uniquely identify a person? If not why?" In the UK they recycle them so the natural key is a combination of national insurance number and birth year. You have to ask questions.

2024-06-04 15:43 UTC

Thomas Castiglione #

2024-06-05 9:33 UTC

Nicholas Peterson #

I largely agree with James Snape, but wanted to throw in a few other thoughts on top. Surrogates don't defend you from duplicate data; in fact they facilitate it, because the routine generating the surrogate key isn't influenced by any of the other data in the record. The concept of being unable to correct a natural key is also odd; why can't you? Start a transaction, insert a new record with the correct key, update the related records to point to the new record, then delete the old record, done. Want some crucial information about a related record but only have the surrogate to it? I guess you have to join it every time in order to get the columns the user actually wants to see. A foreign key that uses a natural key often prevents the join entirely, because it tells the user what they wanted to know.

I find the problem with natural keys usually comes from another source entirely. Developers write code and don't tend to prefer using SQL. They typically interact with databases through ORM libraries. ORMs are complicated and rely on conventions to uniformly deal with data. It's not uncommon for ORMs to dictate the structure of tables to some degree, or what datatypes to prefer. It's usually easier in an ORM to have a single datatype for keys (BIGINT?) and use it uniformly across all the tables.

2024-06-05 12:42 UTC

Mark Seemann #

James, Nicholas, thank you for writing. I realize that there are some unstated assumptions and implied concerns that I should have made more explicit. I certainly have no problem with adding constraints and other rules to model data. For the Danish CPR number, for example, while I wouldn't make it a primary key (for the reasons outlined in the article), I'd definitely put a UNIQUE constraint on it.

Another unspoken context that I had in mind is that systems often exist in a wider context where ACID guarantees fall apart. I suppose it's true that if you look at a database in isolation, you may be able to update a foreign key with the help of some cascading changes rippling through the database, but if you've ever shared the old key outside of the database, you now have orphaned data.

A simple example could be sending out an email with a link that embeds the old key. If you change the key after sending out the email, but before the user clicks, the link no longer works.

That's just a simple and easy-to-explain example. The more integration (particularly system-to-system integration) you have, the worse this kind of problem becomes. I briefly discussed the CPR number example with my doctor wife, and she immediately confirmed that this is a real problem in the Danish health sector, where many independent software systems need to exchange patient data.

You can probably work around such problems in various ways, but if you had avoided using natural keys, you wouldn't have had to change the key in the first place.

2024-06-06 6:56 UTC

Greg Hall #

I think it is best to have two separate generated keys for each row:

A key used only for relationships between tables. I like to call this relid, and make it serialised, so it is just an increasing number. This key is the primary key and should never be exposed outside the database.
A key used only outside the database as a unique reference to which row to update. I like to call this id, and make it a uuid, since it is well accepted to uniquely identify rows by a uuid, and to expose them to the outside world - many public APIs do this. Theoretically, the same uuid should never be generated twice, so this key doesn't necessarily have to be declared as unique.
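The two-key design could look roughly like this, again sketched in Python with SQLite. The customer and purchase_order tables and the helper functions are hypothetical. Joins and foreign keys use the integer relid, and only the UUID id is ever returned to callers.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    relid INTEGER PRIMARY KEY,  -- internal key: joins and foreign keys only
    id TEXT NOT NULL,           -- external key: a UUID shared with the outside world
    name TEXT NOT NULL
);
CREATE TABLE purchase_order (
    relid INTEGER PRIMARY KEY,
    id TEXT NOT NULL,
    customer_relid INTEGER NOT NULL REFERENCES customer(relid)
);
""")

def add_customer(name):
    """Insert a customer and return its external UUID; the relid never leaves."""
    external_id = str(uuid.uuid4())
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (external_id, name))
    return external_id

def add_order(customer_id):
    """Callers identify the customer by UUID; the row is linked via relid."""
    (customer_relid,) = conn.execute(
        "SELECT relid FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    order_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO purchase_order (id, customer_relid) VALUES (?, ?)",
        (order_id, customer_relid),
    )
    return order_id

def customer_name_for_order(order_id):
    """Look up by external id, join on the internal relid."""
    (name,) = conn.execute("""
        SELECT customer.name
        FROM purchase_order JOIN customer
          ON customer.relid = purchase_order.customer_relid
        WHERE purchase_order.id = ?
    """, (order_id,)).fetchone()
    return name
```

Following the remark about UUID uniqueness, the id columns carry no UNIQUE constraint here. In practice you'd probably still want an index on them, since every external lookup goes through that column.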

The relid can be used in simple foreign keys, and in bridging/join tables - tables that contain primary keys of multiple tables. Generally speaking, the relid is far more readable than a uuid - it is easier to hold in your head a simple integer, which usually is not that large, than a 36-character sequence that looks similar to other 36-character sequences. UUIDs generally look like a jumble.

A relid can be 32 bits for tables you're confident will never need more than 2.1 billion rows, which really is 99.99% of all tables ever created by 99.99% of applications. If this turns out to be wrong, it is possible to upgrade the relids to 64 bits for a given table. It's a bit of a pain, especially if there are lots of references to it, but it can be done.

The relid doesn't always have to be a serialised value, and you don't always have to call the column relid. Since the primary key is never exposed publicly, it doesn't matter if different column types or names are used for different use cases. For example, code tables might use one of the codes as the primary key.

I don't think it makes sense to be religious on key usage; just like everything else, there are valid reasons for periodically varying how they work. I'm sure somebody has a valid case where a single key is better than two. I just think it generally makes sense to have a pair of internal and external keys for most cases.

2024-06-07 3:31 UTC

James Snape #

The thing with database keys is that you really need to be precise about what you mean by a key. Any combination of attributes is a candidate key. There are also logical and physical representations of keys. For example, a SQL Server primary key is a physical record locator but logically a unique key constraint. Yes, these behave poorly when you use natural keys as the primary key, for all the reasons you mention. They are a complete implementation detail. Users should never see these attributes, though, and you shouldn't share the values outside of your implementation. Sharing integer surrogate keys in URLs is a classic issue, allowing enumeration attacks on your data if not secured properly.

Foreign keys are another logical and physical dual-use concept. In SQL Server a physical foreign key constraint must reference the primary key from a parent table, but logically that doesn't need to happen for relational theory to work.

Alternate keys are combinations of attributes that identify a record (or many records); these are often the natural keys you use in your user interface and WHERE clauses, etc. Alternate keys are also how systems communicate. Take your CPR number example: you cannot exchange patient data unless both systems agree on a common key. This can't be an internally generated surrogate value.

Natural keys also serve another purpose in parent-child relationships. By sharing natural key attributes with a parent you can ensure a child is not accidentally moved to a new parent plus you can query a child table without needing to join to the parent table.

There isn't a one-size-fits-all when it comes to databases and keys. Joe Celko has written extensively on the subject, so maybe it's better to read the following than my small commentary:

2024-06-07 09:57 UTC

Mark Seemann #

Greg, thank you for writing. I agree with everything you wrote, and I've been using that kind of design for... wow, at least a decade, it looks like! I originally adopted it for a slightly different reason, but even if motivated by a different concern, the design seems congruent with what you describe.

Like you also imply, only a Sith speaks in absolutes. The irony of the article is that I originally intended it to be more open-ended, in the sense that I was curious if there were genuinely good reasons to use natural keys. As I wrote it, the article turned out more unconditional than I originally had in mind.

I am, in reality, quite ready to consider arguments to the contrary. But really, I was curious: Is it ever a good idea to use natural keys as primary keys? It sounds like a rhetorical question, but I don't mind if someone furnishes a counter-example.

As Nicholas Peterson intimated, it's probably not a real problem if those keys never 'leave' the database. What I failed to make explicit in this article is that the problems I've consistently run into occur when a system has shared keys with external systems or users.

2024-06-14 11:26 UTC

Mark Seemann #

James, thank you for writing. I think we're discussing issues at different levels of abstraction. This just underscores how difficult technical writing is. I should have made my context and assumptions more explicit. The error is mine.

Everything you write sounds correct to me. I am aware of both relational calculus and relational algebra, so I'm familiar with the claims you make, and I don't dispute them.

My focus is rather on systems architecture. Even an 'internal' system may actually be composed from multiple independent systems, and my concern is that using natural keys to exchange data between such systems ultimately turns out to make things more difficult than they could have been. The only statement of yours with which I think I disagree is that you can't exchange data between systems unless you use natural keys. You definitely can, although you need to appoint one of the systems to be a 'master key issuer'.

In practice, like Greg Hall, I'd prefer using GUIDs for that purpose, rather than sequential numbers. That also addresses the concern about enumeration attacks. (Somewhat tangentially, I also recommend signing URLs with a private key in order to prevent reverse-engineering, or 'URL-hacking'.)

2024-06-14 11:55 UTC

James Snape #

I think we are basically agreeing here, because I would never use natural keys nor externally visible synthetic keys for physical primary keys. (I think this statement is even more restrictive than the article's main premise.) Well, with one exception: configurable enum-type tables, because the overhead of joining to resolve a single column value is inefficient. I would, however, always use a natural key for a logical primary key.

The only reason I'm slightly pedantic about this is the number of clients who have used surrogate keys in a logical model and then gone on to create databases where the concept of entity identity doesn't exist. This creates many of the issues Nicholas Peterson mentioned above: duplicates, historical change tracking, etc. Frankly, it doesn't help that lots of code examples for ORMs just start with an entity that has an ID attribute.

One final comment on sharing data based on a golden master synthetic key. The moment you do I would argue that you have now committed to maintaining that key through all types of data mergers and acquisitions. It must never collide, and always point to exactly the same record and only that record. Since users can use it to refer to an entity and it makes up part of your external API, it now meets the definition of a natural key. Whether you agree or not on my stretching the definition a little, you still should not use this attribute as the physical primary key (record locator) because we should not expose implementation details in our APIs. The first Celko article I linked to explains some of the difficulties for externally visible synthetic keys.

2024-06-14 13:45 UTC

Julius H #

I'd like to comment with an example where using a synthetic key came back to bite me. My system had posts and users with synthetic IDs. Now I wanted to track an unread state across them. Naively, I designed just another entity:

           
public int ID { get; set; }
public int PostID { get; set; }
public int UserID { get; set; }

And it worked flawlessly for years. One day, however, a user complained that he always got an exception: "Sequence contains more than one element". Of course, I used SingleOrDefault() in application code because I expected zero or one record per user and post. The quick solution was deleting the spurious table row. As a permanent solution I removed the ID field (and column) so that the unread state had its natural key as primary key (both columns). So if it happens again in the future, the app will error on inserting rather than on querying.

Since my application is in control of the IDs and it's just a very simple join table I think it was the best solution. If the future requirements hold different kinds of unread state, I can always add the key again.
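The change described above can be illustrated with a small Python and SQLite sketch. The table and column names are guesses, not the actual schema, and the foreign keys to the post and user tables are left out for brevity. With the pair of foreign keys as the composite primary key, a second row for the same user and post fails at insert time instead of surfacing later as a "Sequence contains more than one element" error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No separate ID column: the (post, user) pair *is* the key, so the
# database itself enforces 'at most one unread row per user and post'.
conn.execute("""
    CREATE TABLE unread_state (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (post_id, user_id)
    )
""")
conn.execute("INSERT INTO unread_state (post_id, user_id) VALUES (1, 42)")

try:
    # A buggy second insert for the same pair is rejected immediately.
    conn.execute("INSERT INTO unread_state (post_id, user_id) VALUES (1, 42)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```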

2024-07-22 14:40 UTC

Mark Seemann #

Julius, thank you for writing. I see what you mean, and would also tend to model this as just a table with two foreign keys. From the perspective of entity-relationship modelling, such a table isn't even an entity, but rather a relationship. For that reason, it doesn't need its own key; not because the combination is 'natural', but rather because it's not really an independent 'thing'.

2024-07-29 14:39 UTC

This blog is totally free, but if you like it, please consider supporting it.

Continuous delivery without a CI server

2024-05-27T13:34:00+00:00

An illustrative example.

More than a decade ago, I worked on a small project: a single-page application (SPA) with a REST API backend, deployed to Azure. As far as I recall, the REST API used blob storage, so all in all it wasn't a complex system.

We were two developers, and although we wanted to do continuous delivery (CD), we didn't have much development infrastructure. This was a little startup, and back then, there weren't a lot of free build services available. We were using GitHub, but it was before it had any free services to compile your code and run tests.

Given those constraints, we figured out a simple way to do CD, even though we didn't have a continuous integration (CI) server.

I'll tell you how we did this.

Shining an extraordinary light on the mundane #

The reason I'm relating this little story isn't to convince you that you, too, should do it that way. Rather, it's a didactic device. By doing something extreme, we can sometimes learn about the ordinary.

You can only be pragmatic if you know how to be dogmatic.

What to test and not to test, me

From what I hear and read, it seems that a lot of organizations believe that they're doing CI (or perhaps even CD) because they have a CI server. What the following tale will hopefully highlight is that, while build servers are useful, they aren't a requirement for CI or CD.

Distributed CD #

Dramatis personae: My colleague and me. Scene: One small SPA project with a REST API and blob storage, to be deployed to Azure. Code base in GitHub. Two laptops. Remote work.

One of us (let's say me) would start on implementing a feature, or fixing a bug. I'd use test-driven development (TDD) to get feedback on API ideas, as well as to accumulate a suite of regression tests. After a few hours of effective work, I'd send a pull request to my colleague.

Since we were only two people on the team, the responsibility was clear. It was the other person's job to review the pull request. It was also clear that the longer the reviewer dawdled, the less efficient the process would be. For that reason, we'd typically have agile pull requests with a good turnaround time.

While we were taking advantage of GitHub as a central coordination hub for pull requests, Git itself is famously distributed. Thus, we wondered whether it'd be possible to make the CD process distributed as well.

Yes, apart from GitHub, what we did was already distributed.

A little more automation #

Since we were both doing TDD, we already had automated tests. Due to the simple setup of the system, we'd already automated more than 80% of our process. It wasn't much of a stretch to automate whatever else needed automation. Such as deployment.

We agreed on a few simple rules:

Every part of our process should be automated.
Reviewing a pull request included running all tests.

When people review pull requests, they often just go to GitHub and look around before issuing an LGTM.

But, you do realize that this is Git, right? You can pull down the proposed changes and run them.

What if you're already in the middle of something, working on the same code base? Stash your changes and pull down the code.
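In Git terms, pulling down a pull request and running it might look like the following Python sketch, which shells out to git. It relies on GitHub exposing each pull request under the ref pull/<number>/head. The pr-<number> branch name and the default make test command are placeholders for whatever your project uses, not the commands we actually ran.

```python
import shlex
import subprocess

def review_commands(pr_number, test_command="make test"):
    """Commands a reviewer might run to exercise a GitHub pull request locally."""
    branch = f"pr-{pr_number}"  # hypothetical local branch name
    return [
        # Park any work in progress, including new untracked files.
        ["git", "stash", "push", "--include-untracked"],
        # GitHub exposes the PR's head commit under pull/<n>/head.
        ["git", "fetch", "origin", f"pull/{pr_number}/head:{branch}"],
        ["git", "switch", branch],
        # Run the whole automated test suite (placeholder command).
        shlex.split(test_command),
    ]

def review(pr_number, test_command="make test"):
    for cmd in review_commands(pr_number, test_command):
        # check=True raises on the first non-zero exit code,
        # so a failing test run stops the review right there.
        subprocess.run(cmd, check=True)
```

When the review is done, git switch - takes you back to the branch you were on, and git stash pop restores your work in progress.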

The consequence of this process was that every time a pull request was accepted, we already knew that it passed all automated tests on two physical machines. We actually didn't need a server to run the tests a third time.

After a merge, the final part of the development process mandated that the original author should deploy to production. We had a Bash script that did that.

Simplicity #

This process came with some built-in advantages. First of all, it was simple. There weren't a lot of moving parts, so there weren't many steps that could break.

Have you ever had the pleasure of troubleshooting a build? The code works on your machine, but not on the build server.

It sometimes turns out that there's a configuration mismatch with the compiler or test tools. Thus, the problem with the build server doesn't mean that you prevented a dangerous defect from being deployed to production. No, the code just didn't compile on the build server, but would actually have run fine on the production system.

It's much easier to troubleshoot issues on your own machine than on some remote server.

I've also seen build servers that were set up to run tests, but along the way, something had failed and the tests didn't run. And no-one was looking at logs or warning emails from the build system because that system would already be sending hundreds of warnings a day.

By agreeing to manually(!) run the automated tests as part of the review process, we were sure that they were exercised.

Finally, by keeping the process simple, we could focus on what mattered: Delivering value to our customer. We didn't have to waste time learning how a proprietary build system worked.

Does it scale? #

I know what you're going to say: This may have worked because the overall requirements were so simple. This will never work in a 'real' development organization, with a 'real' code base.

I understand. I never claimed that it would.

The point of this story is to highlight what CI and CD are. It's a way of working where you continuously integrate your code with everyone else's code, and where you continuously deploy changes to production.

In reality, having a dedicated build system for that can be useful. These days, such systems tend to be services that integrate with GitHub or other sites, rather than an actual server that you have to care for. Even so, having such a system doesn't mean that your organization makes use of CI or CD.

(Oh, and for the mathematically inclined: In this context continuous doesn't mean actually continuous. It just means arbitrarily often.)

Conclusion #

CI and CD are processes that describe how we work with code, and how we work together.

Continuous integration means that you often integrate your code with everyone else's code. How often? More than once a day.

Continuous deployment means that you often deploy code changes to production. How often? Every time new code is integrated.

A build system can be convenient to help along such processes, but it's strictly speaking not required.


Fundamentals

2024-05-20T07:04:00+00:00

How to stay current with technology progress.

A long time ago, I landed my dream job. My new employer was a consulting company, and my role was to be the resident Azure expert. Cloud computing was still in its infancy, and there was a good chance that I might be able to establish myself as a leading regional authority on the topic.

As part of the role, I was supposed to write articles and give presentations showing how to solve various problems with Azure. I dug in with fervour, writing sample code bases and even an MSDN Magazine article. To my surprise, after half a year I realized that I was bored.

At that time I'd already spent more than a decade learning new technology, and I knew that I was good at it. For instance, I worked five years for Microsoft Consulting Services, and a dirty little secret of that kind of role is that, although you're sold as an expert in some new technology, you're often only a few weeks ahead of your customer. For example, I was once engaged as a Windows Workflow Foundation expert at a time when it was still in beta. No-one had years of experience with that technology, but I was still expected to know much more about it than my customer.

I had lots of engagements like that, and they usually went well. I've always been good at cramming, and as a consultant you're also unencumbered by all the daily responsibilities and politics that often occupy the time and energy of regular employees. The point being that while I'm decent at learning new stuff, the role of being a consultant also facilitates that sort of activity.

After more than a decade of learning new frameworks, new software libraries, new programming languages, new tools, new online services, it turned out that I was ready for something else. After spending a few months learning Azure, I realized that I'd lost interest in that kind of learning. When investigating a new Azure SDK, I'd quickly come to the conclusion that, oh, this is just another object-oriented library. There are these objects, and you call this method to do that, etc. That's not to say that learning a specific technology is a trivial undertaking. The worse the design, the more difficult it is to learn.

Still, after years of learning new technologies, I'd started recognizing certain patterns. Perhaps, I thought, well-designed technologies are based on some fundamental ideas that may be worth learning instead.

Staying current #

A common lament among software developers is that the pace of technology is so overwhelming that they can't keep up. This is true. You can't keep up.

There will always be something that you don't know. In fact, most things you don't know. This isn't a condition isolated only to technology. The sum total of all human knowledge is so vast that you can't know it all. What you will learn, even after a lifetime of diligent study, will be a nanoscopic fraction of all human knowledge - even of everything related to software development. You can't stay current. Get used to it.

A more appropriate question is: How do I keep my skill set relevant?

Assuming that you wish to stay employable in some capacity, it's natural to be concerned with how your mad Flash skillz will land you the next gig.

Trying to keep abreast of all new technologies in your field is likely to lead to burnout. Rather, put yourself in a position so that you can quickly learn necessary skills, just in time.

Study fundamentals, rather than specifics #

Those many years ago, I realized that it'd be a better investment of my time to study fundamentals. Often, once you have some foundational knowledge, you can apply it in many circumstances. Your general knowledge will enable you to get quickly up to speed with specific technologies.

Success isn't guaranteed, but knowing fundamentals increases your chances.

This may still seem too abstract. Which fundamentals should you learn?

In the remainder of this article, I'll give you some examples. The following collection of general programmer knowledge spans software engineering, computer science, broad ideas, but also specific tools. I only intend this set of examples to serve as inspiration. The list isn't complete, nor does it constitute a minimum of what you should learn.

If you have other interests, you may put together your own research programme. What follows here are just some examples of fundamentals that I've found useful during my career.

A criterion, however, for constituting foundational knowledge is that you should be able to apply that knowledge in a wide variety of contexts. The fundamental should not be tied to a particular programming language, platform, or operating system.

Design patterns #

Perhaps the first foundational notion that I personally encountered was that of design patterns. As the Gang of Four (GoF) wrote in the book, a design pattern is an abstract description of a solution that has been observed 'in the wild', more than once, independently evolved.

Please pay attention to the causality. A design pattern isn't prescriptive, but descriptive. It's an observation that a particular code organisation tends to solve a particular problem.

There are lots of misconceptions related to design patterns. One of them is that the 'library of patterns' is finite, and more or less constrained to the patterns included in the original book.

There are, however, many more patterns. To illustrate how much wider this area is, here's a list of some patterns books in my personal library:

In addition to these, there are many more books in my library that are patterns-adjacent, including one of my own. The point is that software design patterns is a vast topic, and it pays to know at least the most important ones.

A design pattern fits the criterion that you can apply the knowledge independently of technology. The original GoF book has examples in C++ and Smalltalk, but I've found that they apply well to C#. Other people employ them in their Java code.

Knowing design patterns not only helps you design solutions. That knowledge also enables you to recognize patterns in existing libraries and frameworks. It's this fundamental knowledge that makes it easier to learn new technologies.

Often (although not always) successful software libraries and frameworks tend to follow known patterns, so if you're aware of these patterns, it becomes easier to learn such technologies. Again, be aware of the causality involved. I'm not claiming that successful libraries are explicitly designed according to published design patterns. Rather, some libraries become successful because they offer good solutions to certain problems. It's not surprising if such a good solution falls into a pattern that other people have already observed and recorded. It's like parallel evolution.

This was my experience when I started to learn the details of Azure. Many of those SDKs and APIs manifested various design patterns, and once I'd recognized a pattern it became much easier to learn the rest.

The idea of design patterns, particularly object-oriented design patterns, has its detractors, too. Let's visit that as the next set of fundamental ideas.

Functional programming abstractions #

As I'm writing this, yet another Twitter thread pokes fun at object-oriented design (OOD) patterns as being nothing but a published collection of workarounds for the shortcomings of object orientation. The people who most zealously pursue that agenda tend to be functional programmers.

Well, I certainly like functional programming (FP) better than OOD too, but rather than poking fun at OOD, I'm more interested in how design patterns relate to universal abstractions. I also believe that FP has shortcomings of its own, but I'll have more to say about that in a future article.

Should you learn about monoids, functors, monads, catamorphisms, and so on?

Yes you should, because these ideas also fit the criterion that the knowledge is technology-independent. I've used my knowledge of these topics in Haskell (hardly surprising) and F#, but also in C# and Python. The various LINQ methods are really just well-known APIs associated with, you guessed it, functors, monads, monoids, and catamorphisms.

Once you've learned these fundamental ideas, it becomes easier to learn new technologies. This has happened to me multiple times, for example in contexts as diverse as property-based testing and asynchronous message-passing architectures. Once I realize that an API gives rise to a monad, say, I know that certain functions must be available. I also know how I should best compose larger code blocks from smaller ones.
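To give a flavour of what these abstractions look like outside Haskell or F#, here's a deliberately small Python sketch for lists. The function names borrow Haskell's vocabulary (mconcat, fmap, bind, unit), but they're defined here, not imported from any library. In LINQ terms, Select corresponds to fmap, SelectMany to bind, and Aggregate to a fold. The last function shows the kind of 'certain functions must be available' knowledge mentioned above: given bind and unit, map comes for free.

```python
from functools import reduce

def mconcat(combine, identity, values):
    """Fold with a monoid: an associative combine plus its identity element.
    Because there's an identity, folding an empty list is well-defined."""
    return reduce(combine, values, identity)

def fmap(f, xs):
    """Functor: apply f to every element, preserving the list's shape."""
    return [f(x) for x in xs]

def bind(xs, f):
    """List monad: map each element to a list, then flatten the results."""
    return [y for x in xs for y in f(x)]

def unit(x):
    """List monad: wrap a single value."""
    return [x]

def fmap_via_bind(f, xs):
    """Any monad is also a functor: map can be derived from bind and unit."""
    return bind(xs, lambda x: unit(f(x)))

print(mconcat(lambda a, b: a + b, 0, [1, 2, 3]))  # 6
print(bind([1, 2], lambda x: [x, x * 10]))        # [1, 10, 2, 20]
```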

Must you know all of these concepts before learning, say, F#? No, not at all. Rather, a language like F# is a great vehicle for learning such fundamentals. There's a first time for learning anything, and you need to start somewhere. The point is that once you know these concepts, it becomes easier to learn the next thing.

If, for example, you already know what a monad is when learning F#, picking up the idea behind computation expressions is easy once you realize that it's just a compiler-specific way to enable syntactic sugaring of monadic expressions. You can learn how computation expressions work without that knowledge, too; it's just harder.

This is a recurring theme with many of these examples. You can learn a particular technology without knowing the fundamentals, but you'll have to put in more time to do that.

On to the next example.

SQL #

Which object-relational mapper (ORM) should you learn? Hibernate? Entity Framework?

How about learning SQL? I learned SQL in 1999, I believe, and it's served me well ever since. I consider raw SQL to be more productive than using an ORM. Once more, SQL is largely technology-independent. While each database typically has its own SQL dialect, the fundamentals are the same. I'm most well-versed in the SQL Server dialect, but I've also used my SQL knowledge to interact with Oracle and PostgreSQL. Once you know one SQL dialect, you can quickly solve data problems in one of the other dialects.

It doesn't matter much whether you're interacting with a database from .NET, Haskell, Python, Ruby, or another language. SQL is not only universal, the core of the language is stable. What I learned in 1999 is still useful today. Can you say the same about your current ORM?
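As a small illustration, the query below uses only standard constructs (LEFT JOIN, GROUP BY, COUNT, ORDER BY), so the same text should run with little or no change on SQL Server, PostgreSQL, or SQLite, whichever host language sends it. Python and SQLite are used here only because they're easy to run, and the author and book tables are made up. The DDL, rather than the query, is the part most likely to need dialect tweaks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title TEXT NOT NULL
);
INSERT INTO author (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO book (author_id, title) VALUES (1, 'Notes'), (1, 'Sketches'), (2, 'Manual');
""")

# Books per author, including authors without books
# (hence LEFT JOIN, and COUNT(b.id) rather than COUNT(*)).
BOOKS_PER_AUTHOR = """
    SELECT a.name, COUNT(b.id) AS books
    FROM author a
    LEFT JOIN book b ON b.author_id = a.id
    GROUP BY a.id, a.name
    ORDER BY books DESC, a.name
"""
result = conn.execute(BOOKS_PER_AUTHOR).fetchall()
print(result)  # [('Ada', 2), ('Grace', 1), ('Linus', 0)]
```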

Most programmers prefer learning the newest, most cutting-edge technology, but that's a risky gamble. Once upon a time Silverlight was a cutting-edge technology, and more than one of my contemporaries went all-in on it.

By the same token, most programmers find old stuff boring. It turns out, though, that it may be worthwhile learning some old technologies like SQL. Be aware of the Lindy effect: if something has been around for a long time, it's likely to stay around for a long time. This is true for the next example as well.

HTTP #

The HTTP protocol has been around since 1991. It's an effectively text-based protocol, and you can easily engage with a web server on a near-protocol level. This is true for other older protocols as well.

In my first IT job in the late 1990s, one of my tasks was to set up and maintain Exchange Servers. It was also my responsibility to make sure that email could flow not only within the organization, but that we could exchange email with the rest of the internet. In order to test my mail servers, I would often just telnet into them on port 25 and type in the correct, text-based instructions to send a test email.

Granted, it's not that easy to telnet into a modern web server on port 80, but a ubiquitous tool like curl accomplishes the same goal. I recently wrote how knowing curl is better than knowing Postman. While this wasn't meant as an attack on Postman specifically, neither was it meant as a facile claim that curl is the only tool useful for ad-hoc interaction with HTTP-based APIs. Sometimes you only realize an underlying truth when you write about a thing and then other people find fault with your argument. The underlying truth, I think, is that it pays to understand HTTP and to be able to engage with an HTTP-based web service at that level of abstraction.

Preferably in an automatable way.
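To see how little there is to the protocol at that level, here's a Python sketch that builds a minimal HTTP/1.1 GET request by hand and reads the status line of a raw response. The helper names are hypothetical. The request text is a subset of what curl -v shows on its lines starting with >. The fetch function only works against a plain-HTTP server on port 80; there's no TLS.

```python
import socket

def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # the Host header is mandatory in HTTP/1.1
        "Connection: close",   # ask the server to close when done
        "",                    # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(raw_response):
    """Split e.g. b'HTTP/1.1 200 OK\\r\\n...' into (version, code, reason)."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason

def fetch(host, path="/"):
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, 80), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling parse_status_line(fetch("example.com")) exercises the whole round trip, assuming the host still answers plain HTTP on port 80.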

Shells and scripting #

The reason I favour curl over other tools to interact with HTTP is that I already spend quite a bit of time at the command line. I typically have a little handful of terminal windows open on my laptop. If I need to test an HTTP server, curl is already available.

Many years ago, an employer introduced me to Git. Back then, there were no good graphical tools to interact with Git, so I had to learn to use it from the command line. I'm eternally grateful that it turned out that way. I still use Git from the command line.

When you install Git, by default you also install Git Bash. Since I was already using that shell to interact with Git, it began to dawn on me that it's a full-fledged shell, and that I could do all sorts of other things with it. It also struck me that learning Bash would be a better investment of my time than learning PowerShell. At the time, there was no indication that PowerShell would ever be relevant outside of Windows, while Bash was already available on most systems. Even today, knowing Bash strikes me as more useful than knowing PowerShell.

It's not that I do much Bash-scripting, but I could. Since I'm a programmer, if I need to automate something, I naturally reach for something more robust than shell scripting. Still, it gives me confidence to know that, since I already know Bash, Git, curl, etc., I could automate some tasks if I needed to.

Many a reader will probably complain that the Git CLI has horrible developer experience, but I will, again, postulate that it's not that bad. It helps if you understand some fundamentals.

Algorithms and data structures #

Git really isn't that difficult to understand once you realize that a Git repository is just a directed acyclic graph (DAG), and that branches are just labels that point to nodes in the graph. Some basic data structures are simply useful to know: DAGs, trees, graphs in general, adjacency lists, and adjacency matrices.
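As a rough illustration of the DAG view, the following Haskell sketch models a history as a map from each commit to its parents, and branches as labels pointing at commits. It's a deliberate simplification with hypothetical names (History, Branches, reachable); real Git identifies commits by hash and stores much more. It uses Data.Map and Data.Set from the containers package that ships with GHC.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical, simplified model of a Git history.
type CommitId = String

-- | Each commit maps to its parents: none for a root commit, two for a merge.
type History = Map.Map CommitId [CommitId]

-- | A branch is just a label pointing at a commit.
type Branches = Map.Map String CommitId

-- | All commits reachable from a commit by following parent links,
-- including the commit itself.
reachable :: History -> CommitId -> Set.Set CommitId
reachable history = go Set.empty
  where
    go seen c
      | c `Set.member` seen = seen
      | otherwise = foldl go (Set.insert c seen) (Map.findWithDefault [] c history)

-- | Commits reachable from a branch label, or an empty set for an unknown label.
branchCommits :: History -> Branches -> String -> Set.Set CommitId
branchCommits history branches name =
  maybe Set.empty (reachable history) (Map.lookup name branches)

-- Example: a root "a", a fork after "b", and a merge commit "e".
exampleHistory :: History
exampleHistory = Map.fromList
  [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["b"]), ("e", ["c", "d"])]

exampleBranches :: Branches
exampleBranches = Map.fromList [("main", "e"), ("feature", "d")]

-- | Commits on main that aren't on feature.
onlyOnMain :: Set.Set CommitId
onlyOnMain =
  branchCommits exampleHistory exampleBranches "main"
    `Set.difference` branchCommits exampleHistory exampleBranches "feature"
```

Here, onlyOnMain evaluates to the commits c and e, which is the set that git log feature..main would list for the same history.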

Knowing that such data structures exist is, however, not that useful if you don't know what you can do with them. If you have a graph, you can find a minimum spanning tree or a shortest-path tree, which sometimes turn out to be useful. Adjacency lists or matrices give you ways to represent graphs in code, which is why they are useful.
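Here's a small Haskell sketch of the two representations, assuming an undirected graph with vertices numbered from 0. The names fromEdges and toMatrix are hypothetical helpers.

```haskell
import qualified Data.Map.Strict as Map

-- | An undirected graph as an adjacency list: each vertex maps to its neighbours.
type AdjacencyList = Map.Map Int [Int]

-- | Build an adjacency list from undirected edges by recording each edge
-- in both directions.
fromEdges :: [(Int, Int)] -> AdjacencyList
fromEdges edges = Map.fromListWith (++) (concatMap both edges)
  where
    both (u, v) = [(u, [v]), (v, [u])]

-- | The same graph as an adjacency matrix over the vertices 0 .. n - 1.
toMatrix :: Int -> AdjacencyList -> [[Bool]]
toMatrix n adj =
  [ [v `elem` Map.findWithDefault [] u adj | v <- [0 .. n - 1]]
  | u <- [0 .. n - 1]
  ]
```

An adjacency list is compact for sparse graphs, while a matrix (typically stored in an array rather than the list of lists used here for readability) answers "is there an edge between u and v?" in constant time.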

Contrary to certain infamous interview practices, you don't need to know these algorithms by heart. It's usually enough to know that they exist. I can't remember Dijkstra's algorithm off the top of my head, but if I encounter a problem where I need to find the shortest path, I can look it up.
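If you do need to look it up, here's one compact way to sketch Dijkstra's algorithm in Haskell, using a Data.Set of (distance, vertex) pairs as a makeshift priority queue. It assumes non-negative edge weights, and the example graph is made up for illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | A weighted, directed graph: vertex -> [(neighbour, edge weight)].
type Graph = Map.Map Char [(Char, Int)]

-- | Shortest distances from a source vertex to every reachable vertex.
-- Assumes non-negative edge weights.
dijkstra :: Graph -> Char -> Map.Map Char Int
dijkstra graph source = go (Set.singleton (0, source)) (Map.singleton source 0)
  where
    go frontier dist = case Set.minView frontier of
      Nothing -> dist
      Just ((d, u), rest)
        -- Skip stale entries: u was already reached by a shorter path.
        | d > Map.findWithDefault maxBound u dist -> go rest dist
        | otherwise ->
            let improved =
                  [ (v, d + w)
                  | (v, w) <- Map.findWithDefault [] u graph
                  , d + w < Map.findWithDefault maxBound v dist
                  ]
                frontier' = foldr (\(v, dv) -> Set.insert (dv, v)) rest improved
                dist' = foldr (\(v, dv) -> Map.insertWith min v dv) dist improved
            in go frontier' dist'

example :: Graph
example = Map.fromList
  [ ('a', [('b', 7), ('c', 9), ('f', 14)])
  , ('b', [('c', 10), ('d', 15)])
  , ('c', [('d', 11), ('f', 2)])
  , ('d', [('e', 6)])
  , ('f', [('e', 9)])
  ]
```

For this graph, dijkstra example 'a' gives the distances a: 0, b: 7, c: 9, d: 20, e: 20, and f: 11.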

Or, if presented with the problem of constructing current state from an Event Store, you may realize that it's just a left fold over a linked list. (This isn't my own realization; I first heard it from Greg Young in 2011.)
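A minimal sketch of that idea in Haskell follows. The bank-account events and the Account type are hypothetical; the point is only that the current state is foldl' applied to an initial state and the list of events.

```haskell
import Data.List (foldl')

-- Hypothetical domain events, used only for illustration.
data Event
  = Deposited Int
  | Withdrew Int
  | Closed
  deriving (Show, Eq)

-- | The state to reconstruct from the events.
data Account = Account {balance :: Int, isOpen :: Bool}
  deriving (Show, Eq)

-- | Apply one event to the state accumulated so far.
apply :: Account -> Event -> Account
apply acc (Deposited amount) = acc {balance = balance acc + amount}
apply acc (Withdrew amount) = acc {balance = balance acc - amount}
apply acc Closed = acc {isOpen = False}

-- | Current state: a strict left fold of all events, oldest first,
-- starting from an initial state.
currentState :: [Event] -> Account
currentState = foldl' apply (Account 0 True)
```

For example, currentState [Deposited 100, Withdrew 30, Deposited 5] evaluates to Account {balance = 75, isOpen = True}.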

Now we're back at one of the first examples, that of FP knowledge. A fold is the catamorphism of a list. Again, these things are much easier to learn if you already know some fundamentals.
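To illustrate, here's a hand-written list catamorphism in Haskell. It replaces the empty list with one value and each cons cell with a function, which is what foldr does, with its two arguments in the opposite order. The name listCata is hypothetical.

```haskell
-- | The list catamorphism: replace [] with nil and every (:) with cons.
listCata :: b -> (a -> b -> b) -> [a] -> b
listCata nil _ [] = nil
listCata nil cons (x : xs) = cons x (listCata nil cons xs)

-- | Sum a list by replacing [] with 0 and (:) with (+).
sumList :: [Int] -> Int
sumList = listCata 0 (+)

-- | Count elements by ignoring each element and adding one.
lengthList :: [a] -> Int
lengthList = listCata 0 (\_ n -> n + 1)
```

Passing [] and (:) rebuilds the original list unchanged, which is a quick way to convince yourself that the catamorphism captures the list's structure.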

What to learn #

These examples may seem overwhelming. Do you really need to know all of that before things become easier?

No, that's not the point. I didn't start out knowing all these things, and some of them, I'm still not very good at. The point is rather that if you're wondering how to invest your limited time so that you can remain up to date, consider pursuing general-purpose knowledge rather than learning a specific technology.

Of course, if your employer asks you to use a particular library or programming language, you need to study that, if you're not already good at it. If, on the other hand, you decide to better yourself, you can choose what to learn next.

Ultimately, if you're learning for your own sake, the most important criterion may be: Choose something that interests you. If no-one forces you to study, it's too easy to give up if you lose interest.

If, however, you have the choice between learning Noun.js or design patterns, may I suggest the latter?

For life #

When are you done, you ask?

Never. There's more stuff than you can learn in a lifetime. I've met a lot of programmers who finally give up on the grind to keep up, and instead become managers.

As if there's nothing to learn when you're a manager. I'm fortunate that, before I went solo, I mainly had good managers. I'm under no illusion that they automatically became good managers. All I've heard said about management is that there's a lot to learn in that field, too. Really, it'd be surprising if that wasn't the case.

I can understand, however, how constantly having to learn the next library, the next framework, the next tool becomes tiring. As I've already outlined, I hit that wall more than a decade ago.

On the other hand, there are so many wonderful fundamentals that you can learn. You can do self-study, or you can enrol in a more formal programme if you have the opportunity. I'm currently following a course on compiler design. It's not that I expect to pivot to writing compilers for the rest of my career, but rather,

"It is considered a topic that you should know in order to be "well-cultured" in computer science.

"A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.

"The techniques used for constructing a compiler are useful for other purposes as well.

"There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language."

Introduction to Compiler Design (from the introduction), Torben Ægidius Mogensen

That's good enough for me, and so far, I'm enjoying the course (although it's also hard work).

You may not find this particular topic interesting, but then hopefully you can find something else that you fancy. 3D rendering? Machine learning? Distributed systems architecture?

Conclusion #

Technology moves at a pace with which it's impossible to keep up. It's not just you who's falling behind. Everyone is. Even the best-paid GAMMA programmer knows next to nothing of all there is to know in the field. They may have superior skills in certain areas, but there will be so much other stuff that they don't know.

You may think of me as a thought leader if you will. If nothing else, I tend to be a prolific writer. Perhaps you even think I'm a good programmer. I should hope so. Who fancies themselves bad at something?

You should, however, have seen me struggle with C programming during a course on computer systems programming. That's something I'd be happy never to revisit.

You can't know it all. You can't keep up. But you can focus on learning the fundamentals. That tends to make it easier to learn specific technologies that build on those foundations.

This blog is totally free, but if you like it, please consider supporting it.

Gratification

2024-05-13T06:27:00+00:00

Some thoughts on developer experience.

Years ago, I was introduced to a concept called developer ergonomics. Despite the name, it's not about good chairs, standing desks, or multiple monitors. Rather, the concept was related to how easy it'd be for a developer to achieve a certain outcome. How easy is it to set up a new code base in a particular language? How much work is required to save a row in a database? How hard is it to read rows from a database and display the data on a web page? And so on.

These days, we tend to discuss developer experience rather than ergonomics, and that's probably a good thing. This term more immediately conveys what it's about.

I've recently had some discussions about developer experience (DevEx, DX) with one of my customers, and this has led me to reflect more explicitly on this topic than previously. Most of what I'm going to write here are opinions and beliefs that go back a long time, but apparently, it's only recently that these notions have congealed in my mind under the category name developer experience.

This article may look like your usual old-man-yells-at-cloud article, but I hope that I can avoid that. It's not the case that I yearn for some lost past where 'we' wrote Plankalkül in Edlin. That, in fact, sounds like a horrible developer experience.

The point, rather, is that most attractive things come with consequences. For anyone who has been reading this blog even once in a while, this should come as no surprise.

Instant gratification #

Fat foods, cakes, and wine can be wonderful, but can be detrimental to your health if you overindulge. It can, however, be hard to resist a piece of chocolate, and even if we think that we shouldn't, we often fail to restrain ourselves. The temptation of instant gratification is simply too great.

There are other examples like this. The most obvious are the use of narcotics, lack of exercise, smoking, and dropping out of school. It may feel good in the moment, but can have long-term consequences.

Small children are notoriously bad at delaying gratification, and we often associate the ability to delay gratification with maturity. We all, however, give in from time to time. Food and wine are my weak spots, while I don't do drugs, and I didn't drop out of school.

It strikes me that we often talk about ideas related to developer experience in a way where we treat developers as children. To be fair, many developers also act like children. I don't know how many times I've heard something like, "I don't want to write tests/go through a code review/refactor! I just want to ship working code now!"

Fine, so do I.

Even if wine is bad for me, it makes life worth living. As the saying goes, even if you don't smoke, don't drink, exercise rigorously, eat healthily, don't do drugs, and don't engage in dangerous activities, you're not guaranteed to live until ninety, but you're guaranteed that it's going to feel that long.

Likewise, I'm aware that doing everything right can sometimes take so long that by the time we've deployed the software, it's too late. The point isn't to always or never do certain things, but rather to be aware of the consequences of our choices.

Developer experience #

I've no problem with aiming to make the experience of writing software as good as possible. Some developer-experience thought leaders talk about the importance of documentation, predictability, and timeliness. Neither do I mind that a development environment looks good, completes my words, or helps me refactor.

To return to the analogy of human vices, not everything that feels good is ultimately bad for you. While I do like wine and chocolate, I also love sushi, white asparagus, turbot, chanterelles, lumpfish roe caviar, true morels, Norway lobster, and various other foods that tend to be categorized as healthy.

A good IDE with refactoring support, statement completion, type information, test runner, etc. is certainly preferable to writing all code in Notepad.

That said, there's a certain kind of developer tooling and language features that strikes me as more akin to candy. These are typically tools and technologies that tend to demo well. Recent examples include OpenAPI, GitHub Copilot, C# top-level statements, code generation, and Postman. Not all of these are unequivocally bad, but they strike me as mostly aiming at immature developers.

The point of this article isn't to single out these particular products, standards, or language features, but on the other hand, in order to make a point, I do have to at least outline why I find them problematic. They're just examples, and I hope that by explaining what is on my mind, you can see the pattern and apply it elsewhere.

OpenAPI #

A standard like OpenAPI, for example, looks attractive because it automates or standardizes much work related to developing and maintaining REST APIs. Frameworks and tools that leverage that standard automatically create a machine-readable schema and contract, which can be used to generate client code. Furthermore, an OpenAPI-aware framework can also autogenerate an entire web-based graphical user interface, which developers can use for ad-hoc testing.

I've worked with clients who also published these OpenAPI user interfaces to their customers, so that it was easy to get started with the APIs. Easy onboarding.

Instant gratification.

What's the problem with this? There are clearly enough apparent benefits that I usually have a hard time talking my clients out of pursuing this strategy. What are the disadvantages? Essentially, OpenAPI locks you into level 2 APIs (in the Richardson Maturity Model). No hypermedia controls, no smooth conneg-based versioning, no HATEOAS. In fact, most of what makes REST flexible is lost. What remains is an ad-hoc, informally-specified, bug-ridden, slow implementation of half of SOAP.

I've previously described my misgivings about Copilot, and while I actually still use it, I don't want to repeat all of that here. Let's move on to another example.

Top-level statements #

Among many other language features, C# 9 got top-level statements. This means that you don't need to write a Main method in a static class. Rather, you can have a single C# code file where you can immediately start executing code.

It's not that I consider this language feature particularly harmful, but it also solves what seems to me a non-problem. It demos well, though. If I understand the motivation right, the feature exists because 'modern' developers are used to languages like Python where you can, indeed, just create a .py file and start adding code statements.

In an attempt to make C# more attractive to such an audience, it, too, got that kind of developer experience enabled.

You may argue that this is a bid to remove some of the ceremony from the language, but I'm not convinced that this moves the needle much. The level of ceremony that a language like C# has is much deeper than that. That's not to target C# in particular. Java is similar, and don't even get me started on C or C++! Did anyone say header files?

Do 'modern' developers choose Python over C# because they can't be arsed to write a Main method? If that's the only reason, it strikes me as incredibly immature. I want instant gratification, and writing a Main method is just too much trouble!

If developers do, indeed, choose Python or JavaScript over C# and Java, I hope and believe that it's for other reasons.

This particular C# feature doesn't bother me, but I find it symptomatic of a kind of 'innovation' where language designers target instant gratification.

Postman #

Let's consider one more example. You may think that I'm now attacking a company that, for all I know, makes a decent product. I don't really care about that, though. What I do care about is the developer mentality that makes a particular tool so ubiquitous.

I've met web service developers who would be unable to interact with the HTTP APIs that they are themselves developing if they didn't have Postman. Likewise, there are innumerable questions on Stack Overflow where people ask questions about HTTP APIs and post screen shots of Postman sessions.

It's okay if you don't know how to interact with an HTTP API. After all, there's a first time for everything, and there was a time when I didn't know how to do this either. Apparently, however, it's easier to install an application with a graphical user interface than it is to use curl.

Do yourself a favour and learn curl instead of using Postman. Curl is a command-line tool, which means that you can use it for both ad-hoc experimentation and automation. It takes five to ten minutes to learn the basics. It's also free.

It still seems to me that many people are of a mind that it's easier to use Postman than to learn curl. Ultimately, I'd wager that for any task you do with some regularity, it's more productive to learn the text-based tool than the point-and-click tool. In a situation like this, I'd suggest that delayed gratification beats instant gratification.

CV-driven development #

It is, perhaps, easy to get the wrong impression from the above examples. I'm not pointing fingers at just any 'cool' new technology. There are techniques, languages, frameworks, and so on, which people pick up because they're exciting for other reasons. Often, such technologies solve real problems in their niches, but are then applied for the sole reason that people want to get on the bandwagon. Examples include Kubernetes, mocks, DI Containers, reflection, AOP, and microservices. All of these have legitimate applications, but we also hear about many examples where people use them just to use them.

That's a different problem from the one I'm discussing in this article. Usually, learning about such advanced techniques requires delaying gratification. There's nothing wrong with learning new skills, but part of that process is also gaining the understanding of when to apply the skill, and when not to. That's a different discussion.

Innovation is fine #

The point of this article isn't that every innovation is bad. Contrary to Charles Petzold, I don't really believe that Visual Studio rots the mind, although I once did publish an article that navigated the same waters.

Despite my misgivings, I haven't uninstalled GitHub Copilot, and I do enjoy many of the features in both Visual Studio (VS) and Visual Studio Code (VS Code). I also welcome and use many new language features in various languages.

I can certainly appreciate how an IDE makes many things easier. Every time I have to begin a new Haskell code base, I long for the hand-holding offered by Visual Studio when creating a new C# project.

And although I don't use the debugger much, the built-in debuggers in VS and VS Code sure beat GDB. It even works in Python!

There's even tooling that I wish for, but apparently never will get.

Simple made easy #

In Simple Made Easy, Rich Hickey follows his usual look-up-a-word-in-the-dictionary-and-build-a-talk-around-the-definition style to contrast simple with easy. I find his distinction useful. A tool or technique that's close at hand is easy. This certainly includes many of the above instant-gratification examples.

An easy technique is not, however, necessarily simple. It may or may not be. Rich Hickey defines simple as the opposite of complex. Something that is complex is assembled from parts, whereas a simple thing is, ideally, single and indivisible. In practice, truly simple ideas and tools may not be available, and instead we may have to settle for things that are less complex than their alternatives.

Once you start looking for things that make simple things easy, you see them in many places. A big category that I personally favour contains all the language features and tools that make functional programming (FP) easier. FP tends to be simpler than object-oriented or procedural programming, because it explicitly distinguishes between and separates predictable code from unpredictable code. This does, however, in itself tend to make some programming tasks harder. How do you generate a random number? Look up the system time? Write a record to a database?

Several FP languages have special features that make even those difficult tasks easy. F# has computation expressions and Haskell has do notation.

Let's say you want to call a function that consumes a random number generator. In Haskell (as in .NET) random number generators are actually deterministic, as long as you give them the same seed. Generating a random seed, on the other hand, is non-deterministic, so has to happen in IO.

Without do notation, you could write the action like this:

rndSelect :: Integral i => [a] -> i -> IO [a]
rndSelect xs count = (\rnd -> rndGenSelect rnd xs count) <$> newStdGen

(The type annotation is optional.) While terse, this is hardly readable, and the developer experience also leaves something to be desired. Fortunately, however, you can rewrite this action with do notation, like this:

rndSelect :: Integral i => [a] -> i -> IO [a]
rndSelect xs count = do
  rnd <- newStdGen
  return $ rndGenSelect rnd xs count

Now we can clearly see that the action first creates the rnd random number generator and then passes it to rndGenSelect. That's what happened before, but it was buried in a lambda expression and Haskell's right-to-left causality. Most people would find the first version (without do notation) less readable, and more difficult to write.
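The two definitions of rndSelect behave the same because do notation is syntactic sugar for >>=, and for any lawful monad, fmap f m equals m >>= return . f. The following sketch avoids the random package so that it runs with base alone, and shows the three spellings side by side; the function names are made up for illustration.

```haskell
-- | fmap with a lambda, in the style of the first rndSelect.
withFmap :: Functor m => (a -> b) -> m a -> m b
withFmap f action = (\x -> f x) <$> action

-- | do notation, in the style of the second rndSelect.
withDo :: Monad m => (a -> b) -> m a -> m b
withDo f action = do
  x <- action
  return (f x)

-- | What the do block desugars to: bind, then return.
withBind :: Monad m => (a -> b) -> m a -> m b
withBind f action = action >>= \x -> return (f x)
```

Because the definitions are polymorphic in the monad, you can check them against Maybe or lists in GHCi, without involving IO at all.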

Related to developer ergonomics, though, do notation makes the simple code (i.e. code that separates predictable code from unpredictable code) easy (that is, at hand).

F# computation expressions offer the same kind of syntactic sugar, making it easy to write simple code.

Delay gratification #

While it's possible to set up a development context in such a way that it nudges you to work in a way that's ultimately good for you, temptation is everywhere.

Not only may new language features, IDE functionality, or frameworks entice you to do something that may be disadvantageous in the long run. There may also be actions you don't take because it just feels better to move on.

Do you take the time to write good commit messages? Not just a single-line heading, but a proper message that explains your context and reasoning?

Most people I've observed working with source control 'just want to move on', and can't be bothered to write a useful commit message.

I hear about the same mindset when it comes to code reviews, particularly pull request reviews. Everyone 'just wants to write code', and no-one wants to review other people's code. Yet, in a shared code base, you have to live with the code that other people write. Why not review it so that you have a chance to decide what that shared code base should look like?

Delay your own gratification a bit, and reap the rewards later.

Conclusion #

The only goal I have with this article is to make you think about the consequences of new and innovative tools and frameworks. Particularly if they are immediately compelling, they may be empty calories. Consider if there may be disadvantages to adopting a new way of doing things.

Some tools and technologies give you instant gratification, but may be unhealthy in the long run. This is, like most other things, context-dependent. In the long run your company may no longer be around. Sometimes, it pays to deliberately do something that you know is bad, in order to reach a goal before your competition. That was the original technical debt metaphor.

Often, however, it pays to delay gratification. Learn curl instead of Postman. Learn to design proper REST APIs instead of relying on OpenAPI. If you need to write ad-hoc scripts, use a language suitable for that.

Comments

Thomas Levesque #

Regarding Postman vs. curl, I have to disagree. Sure, curl is pretty easy to use. But while it's good for one-off tests, it sucks when you need to maintain a collection of requests that you can re-execute whenever you want. In a testing session, you either need to re-type the whole command, or reuse a previous command from the shell's history. Or have a file with all your commands and copy-paste to the shell. Either way, it's not a good experience.

That being said, I'm not very fond of Postman either. It's too heavyweight for what it does, IMHO, and the import/export mechanism is terrible for sharing collections with the team. These days, I tend to use VSCode extensions like httpYac or REST Client, or the equivalent that is now built into Visual Studio and Rider. It's much easier to work with than Postman (it's just text), while still being interactive. And since it's just a text file, you can just add it to Git to share it with the team.

2024-05-14 02:38 UTC

Jiehong #

@Thomas Levesque: I agree with you, yet VSCode or Rider's extensions lock you into an editor quite quickly.

But you can have the best of both worlds: a cli tool first, with editor extensions. Just like Hurl.

Note that you can run a curl command from a file with curl --config [curl_request.file], although chaining requests (like with login and secrets) becomes rather cumbersome very quickly.

2024-05-16 13:57 UTC

Mark Seemann #

Thank you, both, for writing. In the end, it's up to every team to settle on technical solutions that work for them, in that context. Likewise, it's up to each developer to identify methodology and tools that work for her or him, as long as it doesn't impact the rest of the team.

The reason I suggest curl over other alternatives is that not only is it free, it also tends to be ubiquitous. Most systems come with curl baked in - perhaps not a consumer installation of Windows, but if you have developer tools installed, it's highly likely that you have curl on your machine. It's a fundamental skill that may serve you well if you know it.

In addition to that, since curl is a CLI you can always script it if you need a kind of semi-automation. What prevents you from maintaining a collection of script files? They could even take command-line arguments, if you'd like.

That said, personally, if I realize that I need to maintain a collection of requests that I can re-execute whenever I want, I'd prefer writing a 'real' program. On the other hand, I find a tool like curl useful for ad-hoc testing.

2024-05-21 5:36 UTC

Johannes Egger #

... maintain a collection of requests that you can re-execute whenever you want.

@Thomas Levesque: that sounds like a proper collection of automatically executable tests would be a better fit. But yeah, it's just easier to write those simple commands than to set up a test project - instant gratification 😉

2024-05-28 17:02 UTC


Conservative codomain conjecture

2024-05-06T06:35:00+00:00

An API design heuristic.

For a while now, I've been wondering whether, in the language of Postel's law, one should favour being liberal in what one accepts over being conservative in what one sends. Yes, according to the design principle, a protocol or API should do both, but sometimes, you can't do that. Instead, you'll have to choose. I've recently reached the tentative conclusion that it may be a good idea to favour being conservative in what one sends.

Good API design explicitly considers contracts. What are the preconditions for invoking an operation? What are the postconditions? Are there any invariants? These questions are relevant far beyond object-oriented design. They are equally important in Functional Programming, as well as in service-oriented design.

If you have a type system at your disposal, you can often model pre- and postconditions as types. In practice, however, it frequently turns out that there's more than one way of doing that. You can model an additional precondition with an input type, but you can also model potential errors as a return type. Which option is best?

That's what this article is about, and my conjecture is that constraining the input type may be preferable, thus being conservative about what is returned.
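As a minimal sketch of that conjecture, assuming Haskell's Data.List.NonEmpty from base, the precondition can move into the input type so that the return type stays a plain number. The names averageNE and averageOfList are hypothetical.

```haskell
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)

-- | Average of a non-empty collection. The input type rules out the empty
-- case, so the result needs no Maybe or Either wrapper.
averageNE :: Fractional a => NonEmpty a -> a
averageNE xs = sum xs / fromIntegral (length xs)

-- | A caller holding an ordinary list must first convert it; nonEmpty
-- returns Nothing for an empty list.
averageOfList :: Fractional a => [a] -> Maybe a
averageOfList xs = averageNE <$> nonEmpty xs
```

The empty case doesn't disappear; it surfaces once, at the boundary where a caller tries to turn an ordinary list into a NonEmpty value.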

An average example #

That's all quite abstract, so for the rest of this article, I'll discuss this kind of problem in the context of an example. We'll revisit the good old example of calculating an average value. This example, however, is only a placeholder for any kind of API design problem. This article is only superficially about designing an API for calculating an average. More generally, this is about API design. I like the average example because it's easy to follow, and it does exhibit some characteristics that you can hopefully extrapolate from.

In short, what is the contract of the following method?

public static TimeSpan Average(this IEnumerable<TimeSpan> timeSpans)
{
    var sum = TimeSpan.Zero;
    var count = 0;
    foreach (var ts in timeSpans)
    {
        sum += ts;
        count++;
    }
    return sum / count;
}

What are the preconditions? What are the postconditions? Are there any invariants?

Before I answer these questions, I'll offer equivalent code in two other languages. Here it is in F#:

let average (timeSpans : TimeSpan seq) =
    timeSpans
    |> Seq.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks

And in Haskell:

average :: (Fractional a, Foldable t) => t a -> a
average xs = sum xs / fromIntegral (length xs)

These three examples have somewhat different implementations, but the same externally observable behaviour. What is the contract?

It seems straightforward: If you input a sequence of values, you get the average of all of those values. Are there any preconditions? Yes, the sequence can't be empty. Given an empty sequence, all three implementations throw an exception. (The Haskell version is a little more nuanced than that, but given an empty list of NominalDiffTime, it does throw an exception.)
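The "more nuanced" Haskell behaviour can be observed with types from base alone. NominalDiffTime lives in the time package, so this sketch assumes Double and Rational as stand-ins: for Double, 0 / 0 evaluates to NaN rather than throwing, while for Rational the division throws an ArithException.

```haskell
import Control.Exception (ArithException, evaluate, try)

-- The unconstrained average from above.
average :: (Fractional a, Foldable t) => t a -> a
average xs = sum xs / fromIntegral (length xs)

-- | For Double, 0 / 0 doesn't throw; it evaluates to NaN.
emptyDoubleAverage :: Double
emptyDoubleAverage = average []

-- | For Rational, dividing by zero throws an ArithException,
-- which try catches so that it can be inspected.
emptyRationalAverage :: IO (Either ArithException Rational)
emptyRationalAverage = try (evaluate (average []))
```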

Any other preconditions? At least one more: The sequence must be finite. All three functions allow infinite streams as input, but if given one, they will fail to return an average.

Are there any postconditions? I can only think of a statement that relates to the preconditions: If the preconditions are fulfilled, the functions will return the correct average value (within the precision allowed by floating-point calculations).

All of this, however, is just warming up. We've been over this ground before.

Modelling contracts #

Keep in mind that this average function is just an example. Think of it as a stand-in for a procedure that's much more complicated. Think of the most complicated operation in your code base.

Not only do real code bases have many complicated operations. Each comes with its own contract, different from the other operations, and if the team isn't explicitly thinking in terms of contracts, these contracts may change over time, as the team adds new features and fixes bugs.

It's difficult work to keep track of all those contracts. As I argue in Code That Fits in Your Head, it helps if you can automate away some of that work. One way is having good test coverage. Another is to leverage a static type system, if you're fortunate enough to work in a language that has one. As I've also already covered, you can't replace all rules with types, but it doesn't mean that using the type system is ineffectual. Quite the contrary. Every part of a contract that you can offload to the type system frees up your brain to think about something else - something more important, hopefully.

Sometimes there's no good way to model a precondition with a type, or perhaps it's just too awkward. At other times, there's really only a single way to address a concern. When it comes to the precondition that you can't pass an infinite sequence to the average function, change the type so that it takes some finite collection instead. That's not what this article is about, though.

Assuming that you've already dealt with the infinite-sequence issue, how do you address the other precondition?

Error-handling #

A typical object-oriented move is to introduce a Guard Clause:

public static TimeSpan Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (!timeSpans.Any())
        throw new ArgumentOutOfRangeException(
            nameof(timeSpans),
            "Can't calculate the average of an empty collection.");
 
    var sum = TimeSpan.Zero;
    foreach (var ts in timeSpans)
        sum += ts;
    return sum / timeSpans.Count;
}

You could do the same in F#:

let average (timeSpans : TimeSpan seq) =
    if Seq.isEmpty timeSpans then
        raise (
            ArgumentOutOfRangeException(
                nameof timeSpans,
                "Can't calculate the average of an empty collection."))
 
    timeSpans
    |> Seq.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks

You could also replicate such behaviour in Haskell, but it'd be highly unidiomatic. Instead, I'd rather discuss one idiomatic solution in Haskell, and then back-port it.

While you can throw exceptions in Haskell, you typically handle predictable errors with a sum type. Here's a version of the Haskell function equivalent to the above C# code:

average :: (Foldable t, Fractional a) => t a -> Either String a
average xs =
  if null xs
    then Left "Can't calculate the average of an empty collection."
    else Right $ sum xs / fromIntegral (length xs)

For readers who don't know the Haskell base library by heart, null is a predicate that checks whether or not a collection is empty. It has nothing to do with null pointers.

This variation returns an Either value. In practice you shouldn't just return a String as the error value, but rather a strongly-typed value that other code can deal with in a robust manner.
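One way to follow that advice is a dedicated error type. The AverageError type below is a hypothetical sketch, not something from a library.

```haskell
-- | A hypothetical, strongly-typed error for the average operation.
data AverageError = EmptyCollection
  deriving (Show, Eq)

-- | Like the String-based version, but callers can pattern match on the error.
average :: (Foldable t, Fractional a) => t a -> Either AverageError a
average xs
  | null xs = Left EmptyCollection
  | otherwise = Right (sum xs / fromIntegral (length xs))
```

With a single constructor, AverageError carries no more information than Nothing would, but a sum type like this can grow additional cases as the operation becomes more complicated.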

On the other hand, in this particular example, there's really only one error condition that the function is able to detect, so you often see a variation where instead of a single error message, such a function just doesn't return anything:

average :: (Foldable t, Fractional a) => t a -> Maybe a
average xs = if null xs then Nothing else Just $ sum xs / fromIntegral (length xs)

This iteration of the function returns a Maybe value, indicating that a return value may or may not be present.

Liberal domain #

We can back-port this design to F#, where I'd also consider it idiomatic:

let average (timeSpans : IReadOnlyCollection<TimeSpan>) =
    if timeSpans.Count = 0 then None else
        timeSpans
        |> Seq.averageBy (_.Ticks >> double)
        |> int64
        |> TimeSpan.FromTicks
        |> Some

This version returns a TimeSpan option rather than just a TimeSpan. While this may seem to put the burden of error-handling on the caller, nothing has really changed. The fundamental situation is the same. Now the function is just being more explicit (more honest, you could say) about the pre- and postconditions. The type system also now insists that you deal with the possibility of error, rather than just hoping that the problem doesn't occur.

In C# you can expand the codomain by returning a nullable TimeSpan value, but such an option may not always be available at the language level. Keep in mind that the Average method is just an example standing in for something that may be more complicated. If the original return type is a reference type rather than a value type, only recent versions of C# allow statically-checked nullable reference types. What if you're working in an older version of C#, or another language that doesn't have that feature?

In that case, you may need to introduce an explicit Maybe class and return that:

public static Maybe<TimeSpan> Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (timeSpans.Count == 0)
        return new Maybe<TimeSpan>();
 
    var sum = TimeSpan.Zero;
    foreach (var ts in timeSpans)
        sum += ts;
    return new Maybe<TimeSpan>(sum / timeSpans.Count);
}

Two things are going on here; one is obvious while the other is more subtle. Clearly, all of these alternatives change the static type of the function in order to make the pre- and postconditions more explicit. So far, they've all been loosening the codomain (the return type). This suggests a connection with Postel's law: be conservative in what you send, be liberal in what you accept. These variations are all liberal in what they accept, but it seems that the API design pays the price by also having to widen the set of possible return values. In other words, such designs aren't conservative in what they send.

Do we have other options?

Conservative codomain #

Is it possible to instead design the API in such a way that it's conservative in what it returns? Ideally, we'd like it to guarantee that it returns a number. This is possible by making the preconditions even more explicit. I've also covered that alternative already, so I'm just going to repeat the C# code here without further comments:

public static TimeSpan Average(this NotEmptyCollection<TimeSpan> timeSpans)
{
    var sum = timeSpans.Head;
    foreach (var ts in timeSpans.Tail)
        sum += ts;
    return sum / timeSpans.Count;
}

This variation promotes another precondition to a type. The precondition that the input collection mustn't be empty can be explicitly modelled with a type. This enables us to be conservative about the codomain. The method now guarantees that it will return a value.

This idea is also easily ported to F#:

type NonEmpty<'a> = { Head : 'a; Tail : IReadOnlyCollection<'a> }
 
let average (timeSpans : NonEmpty<TimeSpan>) =
    [ timeSpans.Head ] @ List.ofSeq timeSpans.Tail
    |> List.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks

The average function now takes a NonEmpty collection as input, and always returns a proper TimeSpan value.

Haskell already comes with a built-in NonEmpty collection type, and while it oddly doesn't come with an average function, it's easy enough to write:

import qualified Data.List.NonEmpty as NE

average :: Fractional a => NE.NonEmpty a -> a
average xs = sum xs / fromIntegral (NE.length xs)

You can find a recent example of using a variation of that function here.

Choosing between the two alternatives #

While Postel's law recommends having liberal domains and conservative codomains, in the case of the average API, we can't have both. If we design the API with a liberal input type, the output type has to be liberal as well. If we design with a restrictive input type, the output can be guaranteed. In my experience, you'll often find yourself in such a conundrum. The average API examined in this article is just one example of a problem that occurs often.

Given such a choice, what should you choose? Is it even possible to give general guidance on this sort of problem?

For decades, I considered such a choice a toss-up. After all, these solutions seem to be equivalent. Perhaps even isomorphic?

When I recently began to explore this isomorphism more closely, it dawned on me that there's a small asymmetry in the isomorphism that favours the conservative codomain option.

Isomorphism #

An isomorphism is a two-way translation between two representations. You can go back and forth between the two alternatives without loss of information.

Is this possible with the two alternatives outlined above? For example, if you have the conservative version, can you create the liberal alternative? Yes, you can:

average' :: Fractional a => [a] -> Maybe a
average' = fmap average . NE.nonEmpty

Not surprisingly, this is trivial in Haskell. If you have the conservative version, you can just map it over a more liberal input.

In F# it looks like this:

module NonEmpty =
    let tryOfSeq xs =
        if Seq.isEmpty xs then None
        else Some { Head = Seq.head xs; Tail = Seq.tail xs |> List.ofSeq }
 
let average' (timeSpans : IReadOnlyCollection<TimeSpan>) =
    NonEmpty.tryOfSeq timeSpans |> Option.map average

In C# we can create a liberal overload that calls the conservative method:

public static TimeSpan? Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (timeSpans.Count == 0)
        return null;
 
    var arr = timeSpans.ToArray();
    return new NotEmptyCollection<TimeSpan>(arr[0], arr[1..]).Average();
}

Here I just used a Guard Clause and explicit construction of the NotEmptyCollection. I could also have added a NotEmptyCollection.TryCreate method, like in the F# and Haskell examples, but I chose the above slightly more imperative style in order to demonstrate that my point isn't tightly coupled to the concept of functors, mapping, and other Functional Programming trappings.
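For readers who think in Python, the same conservative-to-liberal direction can be sketched as follows. This is a minimal sketch, not a port of any of the above code: NonEmpty, try_non_empty, average, and average_liberal are hypothetical names, and Python type hints are only enforced by an external checker such as mypy, not at run time. The point is that the liberal function is assembled from the conservative one without any unchecked step.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NonEmpty(Generic[T]):
    """Hypothetical non-empty collection: a head plus a (possibly empty) tail."""

    head: T
    tail: Tuple[T, ...] = ()

    def __iter__(self) -> Iterator[T]:
        yield self.head
        yield from self.tail

    def __len__(self) -> int:
        return 1 + len(self.tail)


def try_non_empty(xs: Sequence[T]) -> Optional[NonEmpty[T]]:
    """Return a NonEmpty copy of xs, or None if xs is empty."""
    if len(xs) == 0:
        return None
    return NonEmpty(xs[0], tuple(xs[1:]))


def average(xs: NonEmpty[float]) -> float:
    """Conservative version: always returns a number for its restricted input."""
    return sum(xs) / len(xs)


def average_liberal(xs: Sequence[float]) -> Optional[float]:
    """Liberal version, built from the conservative one using only total operations."""
    ne = try_non_empty(xs)
    return None if ne is None else average(ne)
```

Calling average_liberal([1.0, 2.0, 3.0]) gives 2.0, while average_liberal([]) gives None. The only branch is the one that decides whether a NonEmpty value can be constructed at all.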

These examples highlight how you can trivially make a conservative API look like a liberal API. Is it possible to go the other way? Can you make a liberal API look like a conservative API?

Yes and no.

Consider the liberal Haskell version of average, shown above; that's the one that returns Maybe a. Can you make a conservative function based on that?

average' :: Fractional a => NE.NonEmpty a -> a
average' xs = fromJust $ average xs

Yes, this is possible, but only by resorting to the partial function fromJust. I'll explain why that is a problem once we've covered the two other languages. Here's the F# version:

let average' (timeSpans : NonEmpty<TimeSpan>) =
    [ timeSpans.Head ] @ List.ofSeq timeSpans.Tail |> average |> Option.get

In this variation, average is the liberal version shown above; the one that returns a TimeSpan option. In order to make a conservative version, the average' function can call the liberal average function, but has to resort to the partial function Option.get.

The same issue repeats a third time in C#:

public static TimeSpan Average(this NotEmptyCollection<TimeSpan> timeSpans)
{
    return timeSpans.ToList().Average().Value;
}

This time, the partial function is the unsafe Value property, which throws an InvalidOperationException if there's no value.

This even violates Microsoft's own design guidelines:

"AVOID throwing exceptions from property getters."

Krzysztof Cwalina and Brad Abrams

I've cited Cwalina and Abrams as the authors, since this rule can be found in my 2006 edition of Framework Design Guidelines. This isn't a new insight.

While the two alternatives are 'isomorphic enough' that we can translate both ways, the translations are asymmetric in the sense that one is safe, while the other has to resort to an inherently unsafe operation to make it work.
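The same asymmetry shows up in a Python sketch. Here the liberal average returns an Optional, and the hypothetical average_conservative function guarantees at least one argument through its signature. A type checker still can't see that the liberal call will therefore never return None, so the code needs an unchecked unwrap, the Python analogue of fromJust, Option.get, or Value. The names are illustrative assumptions, not an established API.

```python
from typing import Optional, Sequence


def average(xs: Sequence[float]) -> Optional[float]:
    """Liberal version: accepts any sequence, returns None when it is empty."""
    if len(xs) == 0:
        return None
    return sum(xs) / len(xs)


def average_conservative(head: float, *tail: float) -> float:
    """Conservative facade over the liberal function.

    The signature requires at least one number, but nothing in the type system
    connects that fact to average()'s Optional return value. We have to unwrap
    it ourselves, which is a partial step that relies on our own reasoning.
    """
    result = average((head, *tail))
    if result is None:  # believed unreachable, but no tool proves it
        raise ValueError("liberal average unexpectedly returned None")
    return result
```

With a function this small the unwrap is obviously safe; the burden of that reasoning, however, sits with the programmer rather than the type checker.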

Encapsulation #

I've called the operations fromJust, Option.get, and Value partial, and only just now used the word unsafe. You may protest that none of the three examples is unsafe in practice, since we know that the input is never empty. Thus, we know that the liberal function will always return a value, and therefore it's safe to call a partial function, even though these operations are unsafe in the general case.

While that's true, consider how the burden shifts. When you want to promote a conservative variant to a liberal variant, you can rely on all the operations being total. On the other hand, if you want to make a liberal variant look conservative, the onus is on you. None of the three type systems on display here can perform that analysis for you.

This may not be so bad when the example is as simple as taking the average of a collection of numbers, but does it scale? What if the operation you're invoking is much more complicated? Can you still be sure that you safely invoke a partial function on the return value?

As Code That Fits in Your Head argues, procedures quickly become so complicated that they no longer fit in your head. If you don't have well-described and patrolled contracts, you don't know what the postconditions are. You can't trust the return values from method calls, or even the state of the objects you passed as arguments. This tends to lead to defensive coding, where you write code that checks the state of everything all too often.

The remedy is, as always, good old encapsulation. In this case, check the preconditions at the beginning, and capture the result of that check in an object or type that is guaranteed to be always valid. This goes beyond making illegal states unrepresentable because it also works with predicative types. Once you're past the Guard Clauses, you don't have to check the preconditions again.
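As a rough illustration of checking a precondition once and capturing the result in a type, here's a Python sketch of a predicative type. PositiveInt and split_evenly are hypothetical names chosen for this example; the Guard Clause lives in the constructor, so any code that receives a PositiveInt can rely on the invariant without re-checking it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositiveInt:
    """Hypothetical predicative type: an int known to be greater than zero."""

    value: int

    def __post_init__(self) -> None:
        # The Guard Clause runs exactly once, when the value is created.
        if self.value <= 0:
            raise ValueError(f"Expected a positive integer, got {self.value}.")


def split_evenly(total: float, parts: PositiveInt) -> float:
    # No check here: parts.value > 0 is guaranteed by construction,
    # so this division can't raise ZeroDivisionError.
    return total / parts.value
```

split_evenly(10.0, PositiveInt(4)) returns 2.5, while PositiveInt(0) is rejected at the boundary instead of failing deep inside some calculation.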

This kind of thinking illustrates why you need a multidimensional view on API design. As useful as Postel's law sometimes is, it doesn't address all problems. In fact, it turned out to be unhelpful in this context, while another perspective proved more fruitful. Encapsulation is the art and craft of designing APIs in such a way that they suggest or even compel correct interactions. The more I think of this, the more it strikes me that a ranking is implied: Preconditions are more important than postconditions, because if the preconditions are unfulfilled, you can't trust the postconditions, either.

Mapping #

What's going on here? One perspective is to view types as sets. In the average example, the function maps from one set to another:

Which sets are they? We can think of the average function as a mapping from the set of non-empty collections of numbers to the set of real numbers. In programming, we can't represent real numbers, so instead, the left set is going to be the set of all the non-empty collections the computer or the language can represent and hold in (virtual) memory, and the right-hand set is the set of all the possible numbers of whichever type you'd like (32-bit signed integers, 64-bit floating-point numbers, 8-bit unsigned integers, etc.).

In reality, the left-hand set is much larger than the set to the right.

Drawing all those arrows quickly becomes awkward, so instead, we may draw each mapping as a pipe. Such a pipe also corresponds to a function. Here's an intermediate step in such a representation:

One common element is, however, missing from the left set. Which one?

Pipes #

The above mapping corresponds to the conservative variation of the function. It's a total function that maps all values in the domain to a value in the codomain. It accomplishes this trick by explicitly constraining the domain to only those elements on which it's defined. Due to the preconditions, that excludes the empty collection, which is therefore absent from the left set.

What if we also want to allow the empty collection to be a valid input?

Unless we find ourselves in some special context where it makes sense to define a 'default average value', we can't map an empty collection to any meaningful number. Rather, we'll have to map it to some special value, such as Nothing, None, or null:

This extra pipe is free, because it's supplied by the Maybe functor's mapping (Select, map, fmap).

What happens if we need to go the other way? What if the function is the liberal variant that also maps the empty collection to a special element that indicates a missing value?

In this case, it's much harder to disentangle the mappings. If you imagine that a liquid flows through the pipes, we can try to be careful and avoid 'filling up' the pipe.

The liquid represents the data that we do want to transmit through the pipe. As this illustration suggests, we now have to be careful that nothing goes wrong. In order to catch just the right outputs on the right side, you need to know how high the liquid may go, and attach a 'flat-top' pipe to it:

As this illustration tries to get across, this kind of composition is awkward and error-prone. What's worse is that you need to know how high the liquid is going to get on the right side. This depends on what actually goes on inside the pipe, and what kind of input goes into the left-hand side.

This is a metaphor. The longer the pipe is, the more difficult it gets to keep track of that knowledge. The stubby little pipe in these illustrations may correspond to the average function, which is an operation that easily fits in our heads. It's not too hard to keep track of the preconditions, and how they map to postconditions.

Thus, turning such a small liberal function into a conservative function is possible, but already awkward. If the operation is complicated, you can no longer keep track of all the details of how the inputs relate to the outputs.

Additive extensibility #

This really shouldn't surprise us. Most programming languages come with all sorts of facilities that enable extensibility: The ability to add more functionality, more behaviour, more capabilities, to existing building blocks. Conversely, few languages come with removability facilities. You can't commonly declare that an object is an instance of a class, except for one method, or that a function is just like another function, except that it doesn't accept a particular subset of input.

This explains why we can safely make a conservative function liberal, while it's difficult to make a liberal function conservative: Making a conservative function liberal adds functionality, whereas making a liberal function conservative attempts to remove functionality.

Conjecture #

All this leads me to the following conjecture: When faced with a choice between two versions of an API, where one has a liberal domain, and the other a conservative codomain, choose the design with the conservative codomain.

If you need the liberal version, you can create it from the conservative operation. The converse need not be true.

Conclusion #

Postel's law encourages us to be liberal with what we accept, but conservative with what we return. This is a good design heuristic, but sometimes you're faced with mutually exclusive alternatives. If you're liberal with what you accept, you'll also need to be too loose with what you return, because there are input values that you can't handle. On the other hand, sometimes the only way to be conservative with the output is to also be restrictive when it comes to input.

Given two such alternatives, which one should you choose?

This article conjectures that you should choose the conservative alternative. This isn't a political statement, but simply a result of the conservative design being the smaller building block. From a small building block, you can compose something bigger, whereas from a bigger unit, you can't easily extract something smaller that's still robust and useful.

This blog is totally free, but if you like it, please consider supporting it.

Service compatibility is determined based on policy

2024-04-29T11:12:00+00:00

A reading of the fourth Don Box tenet, with some commentary.

This article is part of a series titled The four tenets of SOA revisited. In each of these articles, I'll pull one of Don Box's four tenets of service-oriented architecture (SOA) out of the original MSDN Magazine article and add some of my own commentary. If you're curious why I do that, I cover that in the introductory article.

In this article, I'll go over the fourth tenet, quoting from the MSDN Magazine article unless otherwise indicated.

Service compatibility is determined based on policy #

The fourth tenet is the forgotten one. I could rarely remember exactly what it included, but it does give me an opportunity to bring up a few points about compatibility. The article said:

Object-oriented designs often confuse structural compatibility with semantic compatibility. Service-orientation deals with these two axes separately. Structural compatibility is based on contract and schema and can be validated (if not enforced) by machine-based techniques (such as packet-sniffing, validating firewalls). Semantic compatibility is based on explicit statements of capabilities and requirements in the form of policy.

Every service advertises its capabilities and requirements in the form of a machine-readable policy expression. Policy expressions indicate which conditions and guarantees (called assertions) must hold true to enable the normal operation of the service. Policy assertions are identified by a stable and globally unique name whose meaning is consistent in time and space no matter which service the assertion is applied to. Policy assertions may also have parameters that qualify the exact interpretation of the assertion. Individual policy assertions are opaque to the system at large, which enables implementations to apply simple propositional logic to determine service compatibility.

As you can tell, this description is the shortest of the four. This is also the point where I begin to suspect that my reading of the third tenet may deviate from what Don Box originally had in mind.

This tenet is also the most baffling to me. As I understand it, the motivation behind the four tenets was to describe assumptions about the kind of systems that people would develop with Windows Communication Foundation (WCF), or SOAP in general.

While I worked with WCF for a decade, the above description doesn't ring a bell. Reading it now, the description of policy sounds more like a system such as clojure.spec, although that's not something I know much about either. I don't recall WCF ever having a machine-readable policy subsystem, and if it had, I never encountered it.

It does seem, however, as though what I interpret as contract, Don Box called policy.

Despite my confusion, the word compatibility is worth discussing, regardless of whether that was what Don Box meant. A well-designed service is one where you've explicitly considered forwards and backwards compatibility.

Versioning #

Planning for forwards and backwards compatibility does not imply that you're expected to be able to predict the future. It's fine if you have enough experience developing and maintaining online systems to foresee certain likely changes that you may have to make in the future, but that's not what I have in mind.

Rather, what you should do is to have a system that enables you to detect breaking changes before you deploy them. Furthermore, you should have a strategy for how to deal with the perceived necessity to introduce breaking changes.

The most effective strategy that I know of is to employ explicit versioning, particularly message versioning. You can version an entire service as one indivisible block, but I often find it more useful to version at the message level. If you're designing a REST API, for example, you can take advantage of Content Negotiation.

If you like, you can use Semantic Versioning as a versioning scheme, but for services, the thing that mostly matters is the major version. Thus, you may simply label your messages with the version numbers 1, 2, etc.

If you already have a published service without explicit message version information, then you can still retrofit versioning afterwards. Imagine that your existing data looks like this:

{
  "singleTable": {
    "capacity": 16,
    "minimalReservation": 10
  }
}

This JSON document has no explicit version information, but you can interpret that as implying that the document has the 'default' version, which is always 1:

{
  "singleTable": {
    "version": 1,
    "capacity": 16,
    "minimalReservation": 10
  }
}

If you later realize that you need to make a breaking change, you can do that by increasing the (major) version:

{
  "singleTable": {
    "version": 2,
    "id": 12,
    "capacity": 16,
    "minimalReservation": 10
  }
}

Recipients can now look for the version property to learn how to interpret the rest of the message, and failing to find it, infer that this is version 1.
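A recipient following that rule might look something like this Python sketch. The function name read_single_table and the normalized dictionary it returns (with "id" set to None for version 1 messages) are assumptions made for illustration; the essential part is the default of 1 when the version property is absent, and an explicit rejection of versions the recipient doesn't know.

```python
import json


def read_single_table(document: str) -> dict:
    """Hypothetical reader for singleTable messages of version 1 or 2."""
    table = json.loads(document)["singleTable"]
    version = table.get("version", 1)  # no version property means version 1
    if version == 1:
        # Version 1 messages carry no id; None is an assumed placeholder.
        return {"id": None,
                "capacity": table["capacity"],
                "minimalReservation": table["minimalReservation"]}
    if version == 2:
        return {"id": table["id"],
                "capacity": table["capacity"],
                "minimalReservation": table["minimalReservation"]}
    raise ValueError(f"Unsupported singleTable version: {version}")
```

The unversioned document and the document explicitly labelled version 1 produce the same result, so existing clients keep working while newer clients can send version 2.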

As Don Box wrote, in a service-oriented system, you can't just update all systems in a single coordinated release. Therefore, you must never break compatibility. Versioning enables you to move forward in a way that does break with the past, but without breaking existing clients.

Ultimately, you may attempt to retire old service versions, but be ready to keep them around for a long time.

For more of my thoughts about backwards compatibility, see Backwards compatibility as a profunctor.

Conclusion #

The fourth tenet is the most nebulous, and I wonder if it was ever implemented. If it was, I'm not aware of it. Even so, compatibility is an important component of service design, so I took the opportunity to write about that. In most cases, it pays to think explicitly about message versioning.

I have the impression that Don Box had something in mind more akin to what I call contract. Whether you call it one thing or another, it stands to reason that you often need to attach extra rules to simple types. The schema may define an input value as a number, but the service does require that this particular number is a natural number. Or that a string is really a proper encoding of a date. Perhaps you call that policy. I call it contract. In any case, clearly communicating such expectations is important for systems to be compatible.

Fitting a polynomial to a set of points

2024-04-22T05:35:00+00:00

The story of a fiasco.

This is the second in a small series of articles titled Trying to fit the hype cycle. In the introduction, I've described the exercise I had in mind: Determining a formula, or at least a piecewise function, for the Gartner hype cycle. This, to be clear, is an entirely frivolous exercise with little practical application.

In the previous article, I extracted a set of (x, y) coordinates from a bitmap. In this article, I'll showcase my failed attempt at fitting the data to a polynomial.

Failure #

I've already revealed that I failed to accomplish what I set out to do. Why should you read on, then?

You don't have to, and I can't predict the many reasons my readers have for occasionally swinging by. Therefore, I can't tell you why you should keep reading, but I can tell you why I'm writing this article.

This blog is a mix of articles that I write because readers ask me interesting questions, and partly, it's my personal research-and-development log. In that mode, I write about things that I've learned, and I write in order to learn. One can learn from failure as well as from success.

I'm not that connected to 'the' research community (if such a thing exists), but I'm getting the sense that there's a general tendency in academia that researchers rarely publish their negative results. This could be a problem, because this means that the rest of us never learn about the thousands of ways that don't work.

Additionally, in 'the' programming community, we also tend to boast our victories and hide our failures. More than one podcast (sorry about the weasel words, but I don't remember which ones) has discussed how this gives young programmers the wrong impression of what programming is like. It is, indeed, a process of much trial and error, but usually, we only publish our polished, final result.

Well, I did manage to produce code to fit a polynomial to the Gartner hype cycle, but I never managed to get a good fit.

The big picture #

I realize that I have a habit of burying the lede when I write technical articles. I don't know if I've picked up that tendency from F#, which does demand that you define a value or function before you can use it. This, by the way, is a good feature.

Here, I'll try to do it the other way around, and start with the big picture:

data = numpy.loadtxt('coords.txt', delimiter=',')
 
x = data[:, 0]
t = data[:, 1]
w = fit_polynomial(x, t, 9)
 
plot_fit(x, t, w)

This, by the way, is a Python script, and it opens with these imports:

import numpy
import matplotlib.pyplot as plt

The first line of code reads the CSV file into the data variable. The first column in that file contains all the x values, and the second column the y values. The book that I've been following uses t for the data, rather than y. (Now that I think about it, I believe that this may only be because it works from an example in which the data to be fitted are 100 m dash times, denoted t.)

Once the script has extracted the data, it calls the fit_polynomial function to produce a set of weights w. The constant 9 is the degree of polynomial to fit, although I think that I've made an off-by-one error so that the result is only an eighth-degree polynomial.

Finally, the code plots the original data together with the polynomial:

The green dots are the (x, y) coordinates that I extracted in the previous article, while the red curve is the fitted eighth-degree polynomial. Even though we're definitely in the realm of over-fitting, it doesn't reproduce the Gartner hype cycle.

I arrived at the value 9 after some trial and error. After all, I wasn't trying to do any real science here, so over-fitting is definitely allowed. Even so, 9 seems to be the best fit I can achieve. With lower values, like 8, below, the curve deviates too much:

The value 10 looks much like 9, but above that (11), the curve completely disconnects from the data, it seems:

I'm not sure why it does this, to be honest. I would have thought that the more degrees you added, the more (over-)fitted the curve would be. Apparently, this is not so, or perhaps I made a mistake in my code.

Calculating the weights #

The fit_polynomial function calculates the polynomial coefficients using a linear algebra formula that I've found in at least two textbooks. Numpy makes it easy to invert, transpose, and multiply matrices, so the formula itself is just a one-liner. Here it is in the entire context of the function, though:

def fit_polynomial(x, t, degree):
    """
    Fits a polynomial to the given data.
 
    Parameters
    ----------
    x : Array of shape [n_samples]
    t : Array of shape [n_samples]
    degree : degree of the polynomial
 
    Returns
    -------
    w : Array of shape [degree + 1]
    """
 
    # This expansion creates a matrix, so we name that with an upper-case letter
    # rather than a lower-case letter, which is used for vectors.
    X = expand(x.reshape((len(x), 1)), degree)
    return numpy.linalg.inv(X.T @ X) @ X.T @ t

This may look daunting, but is really just two lines of code. The rest is docstring and a comment.

The above-mentioned formula is the last line of code. The one before that expands the input data x from a simple one-dimensional array to a matrix of those values squared, cubed, etc. That's how you use the least squares method if you want to fit it to a polynomial of arbitrary degree.
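The formula explicitly inverts X.T @ X. Explicit inversion is generally regarded as numerically fragile when that matrix is ill-conditioned, which tends to happen when high powers of large x values end up in the same matrix. Whether that contributes to the behaviour above isn't something this article establishes, so treat the following Python sketch as an assumption to investigate rather than a diagnosis. It fits the same model with numpy.linalg.lstsq, which solves the least-squares problem without forming an inverse, and reports the condition number of X.T @ X via numpy.linalg.cond. The helper names design_matrix, fit_polynomial_lstsq, and gram_condition are hypothetical.

```python
import numpy


def design_matrix(x, degree):
    """Columns x**0, x**1, ..., x**degree, the same layout that expand() produces."""
    x = numpy.asarray(x, dtype=float).ravel()
    return numpy.column_stack([x ** i for i in range(degree + 1)])


def fit_polynomial_lstsq(x, t, degree):
    """Least-squares weights, in increasing order of power, without inverting X.T @ X."""
    X = design_matrix(x, degree)
    w, *_ = numpy.linalg.lstsq(X, numpy.asarray(t, dtype=float), rcond=None)
    return w


def gram_condition(x, degree):
    """Condition number of X.T @ X, the matrix that fit_polynomial inverts."""
    X = design_matrix(x, degree)
    return numpy.linalg.cond(X.T @ X)
```

On exact data such as t = 1 + 2x + 3x², fit_polynomial_lstsq(x, t, 2) recovers weights close to [1, 2, 3]. Printing gram_condition(x, degree) for the extracted coordinates and increasing degrees would show how quickly the matrix that fit_polynomial inverts becomes ill-conditioned.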

Expansion #

The expand function looks like this:

def expand(x, degree):
    """
    Expands the given array to polynomial elements of the given degree.
 
    Parameters
    ----------
    x : Array of shape [n_samples, 1]
    degree : degree of the polynomial
 
    Returns
    -------
    Xp : Array of shape [n_samples, degree + 1]
    """
 
    Xp = numpy.ones((len(x), 1))
    for i in range(1, degree + 1):
        Xp = numpy.hstack((Xp, numpy.power(x, i)))
    return Xp

The function begins by creating a column vector of ones, here illustrated with only three rows:

>>> Xp = numpy.ones((3, 1))
>>> Xp
array([[1.],
       [1.],
       [1.]])

It then proceeds to loop over as many degrees as you've asked it to, each time adding a column to the Xp matrix. Here's an example of doing that up to a power of three, on example input [1,2,3]:

>>> x = numpy.array([1,2,3]).reshape((3, 1))
>>> x
array([[1],
       [2],
       [3]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 1)))
>>> Xp
array([[1., 1.],
       [1., 2.],
       [1., 3.]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 2))) 
>>> Xp
array([[1., 1., 1.],
       [1., 2., 4.],
       [1., 3., 9.]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 3))) 
>>> Xp
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  2.,  4.,  8.],
       [ 1.,  3.,  9., 27.]])

Once it's done looping, the expand function returns the resulting Xp matrix.
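For what it's worth, NumPy's numpy.vander function can build the same matrix in one call when asked for increasing powers. The following Python sketch, with the hypothetical name expand_vander, reproduces the [1,2,3] example above. It accepts either a flat array or an [n_samples, 1] column, and it returns an integer array for integer input, whereas expand always produces floats.

```python
import numpy


def expand_vander(x, degree):
    """Columns x**0, x**1, ..., x**degree, the same layout that expand() produces."""
    return numpy.vander(numpy.ravel(x), degree + 1, increasing=True)
```

For x = numpy.array([1,2,3]).reshape((3, 1)) and degree 3, this yields the rows [1, 1, 1, 1], [1, 2, 4, 8], and [1, 3, 9, 27], matching the final Xp above.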

Plotting #

Finally, here's the plot_fit procedure:

def plot_fit(x, t, w):
    """
    Plots the polynomial with the given weights and the data.
 
    Parameters
    ----------
    x : Array of shape [n_samples]
    t : Array of shape [n_samples]
    w : Array of shape [degree + 1]
    """
 
    xs = numpy.linspace(x[0], x[0]+len(x), 100)
    ys = numpy.polyval(w[::-1], xs)
 
    plt.plot(xs, ys, 'r')
    plt.scatter(x, t, s=10, c='g')
    plt.show()

This is fairly standard pyplot code, so I don't have much to say about it.
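One detail that's easy to miss is the w[::-1] in plot_fit. fit_polynomial returns the weights in increasing order of power (w[0] is the constant term), whereas numpy.polyval expects the highest power first. NumPy's numpy.polynomial.polynomial.polyval takes coefficients in increasing order instead, so it could evaluate w without reversing it. A small Python sketch with made-up weights:

```python
import numpy
from numpy.polynomial import polynomial as P

w = numpy.array([1.0, 2.0, 3.0])  # 1 + 2x + 3x^2, lowest power first
xs = numpy.array([0.0, 1.0, 2.0])

old_style = numpy.polyval(w[::-1], xs)  # expects the highest power first
new_style = P.polyval(xs, w)            # expects the lowest power first
```

Both evaluate the polynomial to 1, 6, and 17 at the three sample points.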

Conclusion #

When I started this exercise, I'd hoped that I could get close to the Gartner hype cycle by over-fitting the model to some ridiculous polynomial degree. This turned out not to be the case, for reasons that I don't fully understand. As I increase the degree, the curve begins to deviate from the data.

I can't say that I'm a data scientist or a statistician of any skill, so it's possible that my understanding is still too shallow. Perhaps I'll return to this article later and marvel at the ineptitude on display here.

Comments

Aaron M. Ucko #

I suspect that increasing the degree wound up backfiring by effectively putting too much weight on the right side, whose flatness clashed with the increasingly steep powers you were trying to mix in. A vertically offset damped sinusoid might make a better starting point for modeling, though identifying its parameters wouldn't be quite as straightforward. One additional wrinkle there is that you want to level fully off after the valley; you could perhaps make that happen by plugging a scaled arctangent or something along those lines into the sinusoid.

Incidentally, a neighboring post in my feed reader was about a new release of an open-source data analysis and curve fitting program (QSoas) that might help if you don't want to take such a DIY approach.

2024-05-16 02:37 UTC

Mark Seemann #

Aaron, thank you for writing. In retrospect, it becomes increasingly clear to me why this doesn't work. This highlights, I think, why it's a good idea to sometimes do stupid exercises like this one. You learn something from it, even when you fail.

2024-05-22 6:15 UTC

This blog is totally free, but if you like it, please consider supporting it.

Services share schema and contract, not class

2024-04-15T07:25:00+00:00

A reading of the third Don Box tenet, with some commentary.

In this article, I'll go over the third tenet, quoting from the MSDN Magazine article unless otherwise indicated.

Services share schema and contract, not class #

Compared to the second tenet, the following description may at first seem more dated. Here's what the article said:

Object-oriented programming encourages developers to create new abstractions in the form of classes. Most modern development environments not only make it trivial to define new classes, modern IDEs do a better job guiding you through the development process as the number of classes increases (as features like IntelliSense® provide a more specific list of options for a given scenario).

Classes are convenient abstractions as they share both structure and behavior in a single named unit. Service-oriented development has no such construct. Rather, services interact based solely on schemas (for structures) and contracts (for behaviors). Every service advertises a contract that describes the structure of messages it can send and/or receive as well as some degree of ordering constraints over those messages. This strict separation between structure and behavior vastly simplifies deployment, as distributed object concepts such as marshal-by-value require a common execution and security environment which is in direct conflict with the goals of autonomous computing.

Services do not deal in types or classes per se; rather, only with machine readable and verifiable descriptions of the legal "ins and outs" the service supports. The emphasis on machine verifiability and validation is important given the inherently distributed nature of how a service-oriented application is developed and deployed. Unlike a traditional class library, a service must be exceedingly careful about validating the input data that arrives in each message. Basing the architecture on machine-validatible schema and contract gives both developers and infrastructure the hints they need to protect the integrity of an individual service as well as the overall application as a whole.

Because the contract and schema for a given service are visible over broad ranges of both space and time, service-orientation requires that contracts and schema remain stable over time. In the general case, it is impossible to propagate changes in schema and/or contract to all parties who have ever encountered a service. For that reason, the contract and schema used in service-oriented designs tend to have more flexibility than traditional object-oriented interfaces. It is common for services to use features such as XML element wildcards (like xsd:any) and optional SOAP header blocks to evolve a service in ways that do not break already deployed code.

With its explicit discussion of XML, SOAP, and XSD, this description may seem more stuck in 2004 than the first two tenets.

I'll cover the most obvious consequence first.

At the boundaries... #

In the MSDN article, the four tenets guide the design of Windows Communication Foundation (WCF) - a technology that in 2004 was still under development. While SOAP already existed as a platform-independent protocol, WCF was a .NET endeavour. Most developers using the Microsoft platform at the time were used to some sort of binary protocol, such as DCOM or .NET Remoting. Thus, it makes sense that Don Box was deliberately explicit that this was not how SOA (or WCF) was supposed to work.

In fact, since SOAP is platform-independent, you could write a web service in one language (say, Java) and consume it with a different language (e.g. C++). WCF was Microsoft's SOAP technology for .NET.

If you squint enough that you don't see the explicit references to XML or SOAP, however, the description still applies. Today, you may exchange data with JSON over REST, Protocol Buffers via gRPC, or something else, but it's still common to have a communications protocol that is independent of specific service implementations. A service may be written in Python, Haskell, C, or any other language that supports the wire format. As this little list suggests, the implementation language doesn't even have to be object-oriented.


A formal interface definition language (IDL) may enable you to automate serialization and deserialization, but IDLs are usually limited to defining the shape of data and operations. Don Box talks about validation, and types don't replace validation - particularly if you allow xsd:any. That particular remark is quite at odds with the notion that a formal schema definition is necessary, or even desirable.

And indeed, today we often see JSON-based REST APIs that are more loosely defined. Even so, the absence of a machine-readable IDL doesn't entail the absence of a schema. As Alexis King wrote in the context of the static-versus-dynamic-types debate, dynamic type systems are not inherently more open. A similar argument can be made about schema. Regardless of whether or not a formal specification exists, a service always has a de facto schema.

To be honest, though, when I try to interpret what this and the next tenet seem to imply, an IDL may have been all that Don Box had in mind. By schema he may only have meant XSD, and by contract, he may only have meant SOAP. More broadly speaking, this notion of contract may entail nothing more than a list of named operations, and references to schemas that indicate what input each operation takes, and what output it returns.

What I have in mind with the rest of this article may be quite an embellishment on that notion. In fact, my usual interpretation of the word contract may be more aligned with what Don Box calls policy. Thus, if you want a very literal reading of the four tenets, what comes next may fit better with the fourth tenet, that service compatibility is determined based on policy.

Regardless of whether you think that the following discussion belongs here, or in the next article, I'll assert that it's paramount to designing and developing useful and maintainable web services.

Encapsulation #

If we, once more, ignore the particulars related to SOAP and XML, we may rephrase the notion of schema and contract as follows. Schema describes the shape of data: Is it a number, a string, a tuple, or a combination of these? Is there only one, or several? Is the data composed from smaller such definitions? Does the composition describe the combination of several such definitions, or does it describe mutually exclusive alternatives?

Compliant data may be encoded as objects or data structures in memory, or serialized to JSON, XML, CSV, byte streams, etc. We may choose to call a particular agglomeration of data a message, which we may pass from one system to another. The first tenet already used this metaphor.

You can't, however, just pass arbitrary valid messages from one system to another. Certain operations allow certain data, and may promise to return other kinds of messages. In addition to the schema, we also need to describe a contract.

What's a contract? If you consult Object-Oriented Software Construction, a contract stipulates invariants, pre- and postconditions for various operations.

Preconditions state what must be true before an operation can take place. This often puts the responsibility on the caller to ensure that the system is in an appropriate state, and that the message that it intends to pass to the other system is valid according to that state.

Postconditions, on the other hand, detail what the caller can expect in return. This includes guarantees about response messages, but may also describe the posterior state of the system.

Invariants, finally, outline what is always true about the system.
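The following minimal Haskell sketch makes these three kinds of guarantees concrete. The capacity-reservation domain and every name in it (`Capacity`, `newCapacity`, `reserve`, and so on) are hypothetical and chosen only for illustration. They don't come from Don Box's article or from any real service.

```haskell
-- Hypothetical example: a venue with a fixed number of seats.
-- Invariant: 0 <= reserved <= total for every Capacity value.
data Capacity = Capacity { total :: Int, reserved :: Int }
  deriving (Eq, Show)

-- Smart constructor. In a real module, you'd hide the Capacity constructor
-- so that this is the only way to create a value, and the invariant holds.
newCapacity :: Int -> Maybe Capacity
newCapacity n
  | n >= 0    = Just (Capacity n 0)
  | otherwise = Nothing

remaining :: Capacity -> Int
remaining c = total c - reserved c

-- Precondition: the quantity is positive and there are enough free seats.
-- Postcondition: on success, the remaining seats drop by exactly the quantity.
reserve :: Int -> Capacity -> Either String Capacity
reserve quantity c
  | quantity <= 0          = Left "Quantity must be positive."
  | quantity > remaining c = Left "Not enough remaining capacity."
  | otherwise              = Right c { reserved = reserved c + quantity }
```

Violating a precondition produces a `Left` value that the caller must handle. Because of the guards, a successful `reserve` can never break the invariant.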

Although such a description of a contract originates from a book about object-oriented design, it's useful in other areas, too, such as functional programming. It strikes me that it applies equally well in the context of service-orientation.

The combination of contract and well-described message structure is, in other words, encapsulation. There's nothing wrong with that: It works. If you actually apply it as a design principle, that is.

Conclusion #

The third SOA tenet emphasizes that only data travels over service boundaries. In order to communicate effectively, services must agree on the shape of data, and which operations are legal when. While they exchange data, however, they don't share address space, or even internal representation.

One service may be written in F# and the client in Clojure. Even so, it's important that they have a shared understanding of what is possible, and what is not. The more explicit you, as a service owner, can be, the better.

Next: Service compatibility is determined based on policy.


Extracting curve coordinates from a bitmap

2024-04-08T05:32:00+00:00

Another example of using Haskell as an ad-hoc scripting language.

This article is part of a short series titled Trying to fit the hype cycle. In the first article, I outlined what it is that I'm trying to do. In this article, I'll describe how I extract a set of x and y coordinates from this bitmap:

(Actually, this is a scaled-down version of the image. The file I work with is a bit larger.)

As I already mentioned in the previous article, these days there are online tools for just about everything. Most likely, there's also an online tool that will take a bitmap like that and return a set of (x, y) coordinates.

Since I'm doing this as a programming exercise, I'm not interested in that. Rather, I'd like to write a little Haskell script to do it for me.

Module and imports #

Yes, I wrote Haskell script. As I've described before, with good type inference, a statically typed language can be as good for scripting as a dynamic one. Just as might be the case with, say, a Python script, you'll be iterating, trying things out until finally the script settles into its final form. What I present here is the result of my exercise. You should imagine that I made lots of mistakes underway, tried things that didn't work, commented out code and print statements, imported modules I eventually didn't need, etc. Just like I imagine you'd also do with a script in a dynamically typed language. At least, that's how I write Python, when I'm forced to do that.

In other words, the following is far from the result of perfect foresight, but rather the equilibrium into which the script settled.

I named the module HypeCoords, because the purpose of it is to extract the (x, y) coordinates from the above Gartner hype cycle image. These are the imports it turned out that I ultimately needed:

module HypeCoords where
 
import qualified Data.List.NonEmpty as NE
import Data.List.NonEmpty (NonEmpty((:|)))
import Codec.Picture
import Codec.Picture.Types

The Codec.Picture modules come from the JuicyPixels package. This is what enables me to read a .png file and extract the pixels.

Black and white #

If you look at the above bitmap, you may notice that it has some vertical lines in a lighter grey than the curve itself. My first task, then, is to get rid of those. The easiest way to do that is to convert the image to a black-and-white bitmap, with no grey scale.

Since this is a one-off exercise, I could easily do that with a bitmap editor, but on the other hand, I thought that this was a good first task to give myself. After all, I didn't know the JuicyPixels library at all, so this was an opportunity to start with a task just a notch simpler than the one that was my actual goal.

I thought that the easiest way to convert to a black-and-white image would be to turn all pixels white if they are lighter than some threshold, and black otherwise.

A PNG file has more information than I need, so I first converted the image to an 8-bit RGB bitmap. Even though the above image looks as though it's entirely grey scale, each pixel is actually composed of three colours. In order to compare a pixel with a threshold, I needed a single measure of how light or dark it is.

That turned out to be about as simple as it sounds: Just take the average of the three colours. Later, I'd need a function to compute the average for another reason, so I made it a reusable function:

average :: Integral a => NE.NonEmpty a -> a
average nel = sum nel `div` fromIntegral (NE.length nel)

It's a bit odd that the Haskell base library doesn't come with such a function (at least to my knowledge), but anyway, this one is specialized to do integer division. Notice that this function computes only non-exceptional averages, since it requires the input to be a NonEmpty list. No division-by-zero errors here, please!

Once I'd computed a pixel average and compared it to a threshold value, I wanted to replace it with either black or white. In order to make the code more readable I defined two named constants:

black :: PixelRGB8
black = PixelRGB8 minBound minBound minBound
white :: PixelRGB8
white = PixelRGB8 maxBound maxBound maxBound

With that in place, converting to black-and-white is only a few more lines of code:

toBW :: PixelRGB8 -> PixelRGB8
toBW (PixelRGB8 r g b) =
  let threshold = 192 :: Integer
      lum = average (fromIntegral r :| [fromIntegral g, fromIntegral b])
  in if lum <= threshold then black else white

I arrived at the threshold of 192 after a bit of trial-and-error. That's dark enough that the light vertical lines fall to the white side, while the real curve becomes black.

What remained was to glue the parts together to save the black-and-white file:

main :: IO ()
main = do
  readResult <- readImage "hype-cycle-cleaned.png"
  case readResult of
    Left msg -> putStrLn msg
    Right img -> do
      let bwImg = pixelMap toBW $ convertRGB8 img
      writePng "hype-cycle-bw.png" bwImg

The convertRGB8 function comes from JuicyPixels.

The hype-cycle-bw.png picture unsurprisingly looks like this:

Ultimately, I didn't need the black-and-white bitmap file. I just wrote the script to create the file in order to be able to get some insights into what I was doing. Trust me, I made a lot of stupid mistakes along the way, and among other issues had some 'fun' with integer overflows.
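The article doesn't say which integer overflows caused trouble, so the following is only an illustration of one plausible trap involving the `average` function above. JuicyPixels stores each colour channel as a `Pixel8`, which is a `Word8`. Summing three channels without first widening them therefore wraps around at 256. This is presumably why `toBW` converts each channel with `fromIntegral` and fixes the result type to `Integer` through the threshold. The names `naiveLum` and `widenedLum` are hypothetical.

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE
import Data.Word (Word8)

-- The average function from the article, repeated so that this sketch is
-- self-contained.
average :: Integral a => NE.NonEmpty a -> a
average nel = sum nel `div` fromIntegral (NE.length nel)

-- Averaging the channels directly as Word8: 200 + 200 + 200 = 600 wraps
-- around to 600 - 512 = 88, and 88 `div` 3 = 29.
naiveLum :: Word8 -> Word8 -> Word8 -> Word8
naiveLum r g b = average (r :| [g, b])

-- Widening each channel to Integer before summing avoids the overflow.
widenedLum :: Word8 -> Word8 -> Word8 -> Integer
widenedLum r g b = average (fromIntegral r :| [fromIntegral g, fromIntegral b])
```

In GHCi, `naiveLum 200 200 200` evaluates to 29, whereas `widenedLum 200 200 200` evaluates to 200. With a threshold of 192, the naive version would turn a light grey pixel black.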

Extracting image coordinates #

Now I had a general feel for how to work with the JuicyPixels library. It still required quite a bit of spelunking through the documentation before I found a useful API to extract all the pixels from a bitmap:

pixelCoordinates :: Pixel a => Image a -> [((Int, Int), a)]
pixelCoordinates = pixelFold (\acc x y px -> ((x,y),px):acc) []

While this is, after all, just a one-liner, I'm surprised that something like this doesn't come in the box. It returns a list of tuples, where the first element contains the pixel coordinates (another tuple), and the second element the pixel information (e.g. the RGB value).

One y value per x value #

There were a few more issues to be addressed. The black curve in the black-and-white bitmap is thicker than a single pixel. This means that for each x value, there will be several black pixels. In order to do linear regression, however, we need a single y value per x value.

One easy way to address that concern is to calculate the average y value for each x value. This may not always be the best choice, but as far as we can see in the above black-and-white image, it doesn't look as though there's any noise left in the picture. This means that we don't have to worry about outliers pulling the average value away from the curve. In other words, finding the average y value is an easy way to get what we need.

averageY :: Integral b => NonEmpty (a, b) -> (a, b)
averageY nel = (fst $ NE.head nel, average $ snd <$> nel)

The averageY function converts a NonEmpty list of tuples to a single tuple. Watch out! The input tuples are not the 'outer' tuples that pixelCoordinates returns, but rather a list of actual pixel coordinates. Each tuple is a set of coordinates, but since the function never manipulates the x coordinate, the type of the first element is just unconstrained a. It can literally be anything, but will, in practice, be an integer.

The assumption is that the input is a small list of coordinates that all share the same x coordinate, such as (42, 99) :| [(42, 100), (42, 102)]. The function simply returns a single tuple that it creates on the fly. For the first element of the return tuple, it picks the head tuple from the input ((42, 99) in the example), and then that tuple's fst element (42). For the second element, the function averages all the snd elements (99, 100, and 102) to get 100 (integer division, you may recall):

ghci> averageY ((42, 99) :| [(42, 100), (42, 102)])
(42,100)

What remains is to glue together the building blocks.

Extracting curve coordinates #

A few more steps were required, but these I just composed in situ. I found no need to define them as individual functions.

The final composition looks like this:

main :: IO ()
main = do
  readResult <- readImage "hype-cycle-cleaned.png"
  case readResult of
    Left msg -> putStrLn msg
    Right img -> do
      let bwImg = pixelMap toBW $ convertRGB8 img
      let blackPixels =
            fst <$> filter ((black ==) . snd) (pixelCoordinates bwImg)
      let h = imageHeight bwImg
      let lineCoords = fmap (h -) . averageY <$> NE.groupAllWith fst blackPixels
      writeFile "coords.txt" $
        unlines $ (\(x,y) -> show x ++ "," ++ show y) <$> lineCoords

The first lines of code, until and including let bwImg, are identical to what you've already seen.

We're only interested in the black pixels, so the main action uses the standard filter function to keep only those that are equal to the black constant value. Once the white pixels are gone, we no longer need the pixel information. The expression that defines the blackPixels value finally (remember, you read Haskell code from right to left) throws away the pixel information by only retaining the fst element. That's the tuple that contains the coordinates. You may want to refer back to the type signature of pixelCoordinates to see what I mean.

The blackPixels value has the type [(Int, Int)].

Two more things need to happen. One is to group the pixels together per x value so that we can use averageY. The other is that we want the coordinates as normal Cartesian coordinates, and right now, they're in screen coordinates.

When working with bitmaps, it's quite common that pixels are measured out from the top left corner, instead of from the bottom left corner. It's not difficult to flip the coordinates, but we need to know the height of the image:

let h = imageHeight bwImg

The imageHeight function is another JuicyPixels function.

Because I sometimes get carried away, I wrote the code in a 'nice' compact style that's less readable than it could be. I accomplished both of the remaining tasks with a single line of code:

let lineCoords = fmap (h -) . averageY <$> NE.groupAllWith fst blackPixels

This first groups the coordinates according to x value, so that all coordinates that share an x value are collected in a single NonEmpty list. This means that we can map averageY over all of those groups. Finally, the expression flips from screen coordinates to Cartesian coordinates by subtracting the y coordinate from the height h.
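To watch the composition at work without loading a bitmap, you can run the same steps on a tiny, hand-written list of black-pixel coordinates. The sample data and the `curveCoords` name are made up for illustration. The steps themselves (group by x, average y, flip using the image height) are the ones from `main`, and `average` and `averageY` are repeated from above.

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE

average :: Integral a => NE.NonEmpty a -> a
average nel = sum nel `div` fromIntegral (NE.length nel)

averageY :: Integral b => NonEmpty (a, b) -> (a, b)
averageY nel = (fst $ NE.head nel, average $ snd <$> nel)

-- Same steps as in main: group black pixels by x, average each group's
-- y values, and convert from screen coordinates (y grows downwards) to
-- Cartesian coordinates (y grows upwards) by subtracting y from the height.
curveCoords :: Int -> [(Int, Int)] -> [(Int, Int)]
curveCoords h blackPixels =
  fmap (h -) . averageY <$> NE.groupAllWith fst blackPixels

-- Hypothetical black pixels in a 10-pixel-high image, in no particular order.
samplePixels :: [(Int, Int)]
samplePixels = [(1, 8), (0, 9), (1, 7), (0, 8), (2, 5)]
```

`curveCoords 10 samplePixels` evaluates to `[(0,2),(1,3),(2,5)]`. Note that `h - y` maps the top pixel row (y = 0) to h and the bottom row (y = h - 1) to 1. If you'd rather have the bottom row at 0, `h - 1 - y` would do that. For curve fitting, the difference only shifts the whole curve vertically by one.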

The final writeFile expression writes the coordinates to a text file as comma-separated values. The first ten lines of that file look like this:

9,13
10,13
11,13
12,14
13,15
14,15
15,16
16,17
17,17
18,18
...

Do these points plot the Gartner hype cycle?

Sanity checking by plotting the coordinates #

To check whether the coordinates look useful, we could plot them. If I wanted to spend a few more hours, I could probably figure out how to do that with JuicyPixels as well, but on the other hand, I already know how to do that with Python:

import matplotlib.pyplot as plt
import numpy

data = numpy.loadtxt('coords.txt', delimiter=',')
x = data[:, 0]
t = data[:, 1]
plt.scatter(x, t, s=10, c='g')
plt.show()

That produces this plot:

LGTM.

Conclusion #

In this article, you've seen how a single Haskell script can extract curve coordinates from a bitmap. The file is 41 lines all in all, including module declaration and white space. This article shows every single line in that file, apart from some blank lines.

I loaded the file into GHCi and ran the main action in order to produce the CSV file.

I did spend a few hours looking around in the JuicyPixels documentation before I'd identified the functions that I needed. I didn't keep track of time, but all in all I'd guess that I spent more than three, but probably fewer than six, hours on this exercise.

This was the successful part of the overall exercise. Now onto the fiasco.

Next: Fitting a polynomial to a set of points.


Trying to fit the hype cycle

2024-04-01T07:14:00+00:00

An amateur tries his hand at linear modelling.

About a year ago, I was contemplating a conference talk I was going to give. Although I later abandoned the idea for other reasons, for a few days I was thinking about using the Gartner hype cycle for an animation. What I had in mind would require me to draw the curve in a way that would enable me to zoom in and out. Vector graphics would be much more useful for that job than a bitmap.

Along the way, I considered if there was a function that would enable me to draw it on the fly. A few web searches revealed the Cross Validated question Is there a linear/mixture function that can fit the Gartner hype curve? So I wasn't the first person to have that idea, but at the time I found it, the question was effectively dismissed without a proper answer. Off topic, dontcha know?

A web search also seems to indicate the existence of a few research papers where people have undertaken this task, but there's not a lot about it. True, the Gartner hype cycle isn't a real function, but it sounds like a relevant exercise in statistics, if one's into that kind of thing.

Eventually, for my presentation, I went with another way to illustrate what I wanted to say, so for half a year, I didn't think more about it.

Linear regression? #

Recently, however, I was following a course in mathematical analysis of data, and among other things, I learned how to fit a line to data. Not just a straight line, but any degree of polynomial. So I thought that perhaps it'd be an interesting exercise to see if I could fit the hype cycle to some high-degree polynomial - even though I do realize that the hype cycle isn't a real function, and neither does it look like a straight polynomial function.
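In the simplest case, a straight line (a polynomial of degree one), least-squares fitting has a closed-form solution based on the means of the data, as the following Haskell sketch shows. It illustrates the idea only and isn't the approach used later in this series. Fitting a higher-degree polynomial means solving a larger system of linear equations (the normal equations), which is usually a job for a linear-algebra library. The name `fitLine` is hypothetical.

```haskell
-- Least-squares fit of y = w0 + w1 * x to a list of points; returns (w0, w1).
-- Assumes at least two distinct x values. Otherwise, the slope is undefined
-- and the division yields NaN or Infinity.
fitLine :: [(Double, Double)] -> (Double, Double)
fitLine pts = (w0, w1)
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [(x - mx) * (y - my) | (x, y) <- pts]
    sxx = sum [(x - mx) ^ (2 :: Int) | (x, _) <- pts]
    w1  = sxy / sxx
    w0  = my - w1 * mx
```

For points that lie exactly on y = 1 + 2x, such as `[(0, 1), (1, 3), (2, 5)]`, `fitLine` returns `(1.0, 2.0)`.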

In order to fit a polynomial to the curve, I needed some data, so my first task was to convert an image to a series of data points.

I'm sure that there are online tools and apps that offer to do that for me, but the whole point of this was that I wanted to learn how to tackle problems like these. It's like doing katas. The journey is the goal.

This turned out to be an exercise consisting of two phases so distinct that I wrote them in two different languages.

As the articles will reveal, the first part went quite well, while the other was, essentially, a fiasco.

Conclusion #

There's not much point in finding a formula for the Gartner hype cycle, but the goal of this exercise was, for me, to tinker with some new techniques to see if I could learn from doing the exercise. And I did learn something.

In the next articles in this series, I'll go over some of the details.

Next: Extracting curve coordinates from a bitmap.


Services are autonomous

2024-03-25T08:31:00+00:00

A reading of the second Don Box tenet, with some commentary.

In this article, I'll go over the second tenet. The quotes are from the MSDN Magazine article unless otherwise indicated.

Services are autonomous #

Compared with the first tenet, you'll see that Don Box had more to say about this one. I, conversely, have less to add. First, here's what the article said:

Service-orientation mirrors the real world in that it does not assume the presence of an omniscient or omnipotent oracle that has awareness and control over all parts of a running system. This notion of service autonomy appears in several facets of development, the most obvious place being the area of deployment and versioning.

Object-oriented programs tend to be deployed as a unit. Despite the Herculean efforts made in the 1990s to enable classes to be independently deployed, the discipline required to enable object-oriented interaction with a component proved to be impractical for most development organizations. When coupled with the complexities of versioning object-oriented interfaces, many organizations have become extremely conservative in how they roll out object-oriented code. The popularity of the XCOPY deployment and private assemblies capabilities of the .NET Framework is indicative of this trend.

Service-oriented development departs from object-orientation by assuming that atomic deployment of an application is the exception, not the rule. While individual services are almost always deployed atomically, the aggregate deployment state of the overall system/application rarely stands still. It is common for an individual service to be deployed long before any consuming applications are even developed, let alone deployed into the wild. Amazon.com is one example of this build-it-and-they-will-come philosophy. There was no way the developers at Amazon could have known the multitude of ways their service would be used to build interesting and novel applications.

It is common for the topology of a service-oriented application to evolve over time, sometimes without direct intervention from an administrator or developer. The degree to which new services may be introduced into a service-oriented system depends on both the complexity of the service interaction and the ubiquity of services that interact in a common way. Service-orientation encourages a model that increases ubiquity by reducing the complexity of service interactions. As service-specific assumptions leak into the public facade of a service, fewer services can reasonably mimic that facade and stand in as a reasonable substitute.

The notion of autonomous services also impacts the way failures are handled. Objects are deployed to run in the same execution context as the consuming application. Service-oriented designs assume that this situation is the exception, not the rule. For that reason, services expect that the consuming application can fail without notice and often without any notification. To maintain system integrity, service-oriented designs use a variety of techniques to deal with partial failure modes. Techniques such as transactions, durable queues, and redundant deployment and failover are quite common in a service-oriented system.

Because many services are deployed to function over public networks (such as the Internet), service-oriented development assumes not only that incoming message data may be malformed but also that it may have been transmitted for malicious purposes. Service-oriented architectures protect themselves by placing the burden of proof on all message senders by requiring applications to prove that all required rights and privileges have been granted. Consistent with the notion of service autonomy, service-oriented architectures invariably rely on administratively managed trust relationships in order to avoid per-service authentication mechanisms common in classic Web applications.

Again, I'd like to highlight how general these ideas are. Once lifted out of the context of Windows Communication Foundation, all of this applies more broadly.

Perhaps a few details now seem dated, but in general I find that this description holds up well.

Wildlife #

It's striking that someone in 2004 observed that big, complex, coordinated releases are impractical. Even so, it doesn't seem as though adopting a network-based technology and architecture in itself solves that problem. I wrote about that in 2012, and I've seen Adam Ralph make a similar observation. Many organizations inadvertently create distributed monoliths. I think that this often stems from a failure to heed the tenet that services are autonomous.

I've experienced the following more than once. A team of developers rely on a service. As they take on a new feature, they realize that the way things are currently modelled prevents them from moving forward. Typical examples include mismatched cardinalities. For example, a customer record has a single active address, but the new feature requires that customers may have multiple active addresses. It could be that a customer has a permanent address, but also a summerhouse.

It is, however, the other service that defines how customer addresses are modelled, so the development team contacts the service team to discuss a breaking change. The service team agrees to the breaking change, but this means that the service and the relying client team now have to coordinate when they deploy the new versions of their software. The service is no longer autonomous.

I've already discussed this kind of problem in a previous article, and as Don Box also implied, this discussion is related to the question of versioning, which we'll get back to when covering the fourth tenet.

Transactions #

It may be worthwhile to comment on this sentence:

Techniques such as transactions, durable queues, and redundant deployment and failover are quite common in a service-oriented system.

Indeed, but particularly regarding database transactions, a service may use them internally (typically leveraging a database engine like SQL Server, Oracle, PostgreSQL, etc.), but not across services. Around the time Don Box wrote the original MSDN Magazine article, an extension to SOAP colloquially known as WS-Death Star was in the works, and it included WS Transaction.

I don't know whether Don Box had something like this in mind when he wrote the word transaction, but in my experience, you don't want to go there. If you need to, you can make use of database transactions to keep your own service ACID-consistent, but don't presume that this is possible with multiple autonomous services.

As always, even if a catchphrase such as services are autonomous sounds good, it's always illuminating to understand that there are trade-offs involved - and what they are. Here, a major trade-off is that you need to think about error-handling in a different way. If you don't already know how to address such concerns, look up lock-free transactions and eventual consistency. As Don Box also mentioned, durable queues are often part of such a solution, as is idempotence.
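To illustrate idempotence, consider a handler for messages that a durable queue may deliver more than once. The following Haskell sketch remembers which message IDs it has already processed, so handling the same message twice has the same effect as handling it once. The deposit domain and all names (`Deposit`, `Account`, `handleDeposit`) are hypothetical. A real service would also have to store the processed IDs durably, ideally in the same local transaction as the state change.

```haskell
import qualified Data.Set as Set

type MessageId = String

-- A hypothetical command: deposit an amount into an account.
data Deposit = Deposit { messageId :: MessageId, amount :: Integer }
  deriving (Eq, Show)

-- Service-internal state: the balance, plus the IDs of handled messages.
data Account = Account { balance :: Integer, processed :: Set.Set MessageId }
  deriving (Eq, Show)

emptyAccount :: Account
emptyAccount = Account 0 Set.empty

-- Idempotent handler: a message whose ID has been seen before is ignored.
handleDeposit :: Deposit -> Account -> Account
handleDeposit msg acc
  | messageId msg `Set.member` processed acc = acc
  | otherwise =
      acc { balance = balance acc + amount msg
          , processed = Set.insert (messageId msg) (processed acc) }
```

If the queue redelivers a message, `handleDeposit` returns the account unchanged. Folding `[d1, d1, d2]` over `emptyAccount` therefore counts `d1` only once.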

Validation #

It follows from this discussion that an autonomous service should, ideally, exist independently of the software ecosystem in which it lives. While an individual service can't impose its will on its surroundings, it can, and should, behave in a consistent and correct manner.

This does include deliberate consistency for the service itself. An autonomous service may make use of ACID or eventual consistency as the service owner deems appropriate.

It should also treat all input as suspect, until proven otherwise. Input validation is an important part of service design. It is my belief that validation is a solved problem, but that doesn't mean that you don't have to put in the work. You should consider correctness, versioning, as well as Postel's law.
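Here's a minimal Haskell sketch of what treating input as suspect might look like. As an assumption, the incoming message is modelled as a list of key-value pairs that stands in for a parsed JSON object, and the field names are hypothetical. Validation either turns the message into a domain value or reports every problem it finds. In the spirit of Postel's law, it ignores fields it doesn't recognise instead of rejecting the message.

```haskell
import Text.Read (readMaybe)

-- The untrusted input: a hypothetical, already-parsed message body.
type RawMessage = [(String, String)]

-- The validated domain value.
data Reservation = Reservation { email :: String, quantity :: Int }
  deriving (Eq, Show)

-- Look up a field, reporting an error if it's missing.
field :: String -> RawMessage -> Either String String
field name msg = maybe (Left ("Missing field: " ++ name)) Right (lookup name msg)

validEmail :: RawMessage -> Either String String
validEmail msg = do
  e <- field "email" msg
  if '@' `elem` e then Right e else Left "Invalid email."

validQuantity :: RawMessage -> Either String Int
validQuantity msg = do
  q <- field "quantity" msg
  case readMaybe q of
    Just n | n > 0 -> Right n
    _              -> Left "Quantity must be a positive integer."

errorsOf :: Either e a -> [e]
errorsOf = either pure (const [])

-- Collect all errors rather than stopping at the first. Unknown keys are
-- never looked at, which makes this a tolerant reader.
validate :: RawMessage -> Either [String] Reservation
validate msg =
  case (validEmail msg, validQuantity msg) of
    (Right e, Right q) -> Right (Reservation e q)
    (e, q)             -> Left (errorsOf e ++ errorsOf q)
```

For example, `validate []` returns `Left ["Missing field: email","Missing field: quantity"]`, while an otherwise valid message with an extra, unknown field still validates.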

Security #

A similar observation relates to security. Some services (particularly read-only services) may allow for anonymous access, but if a service needs to authenticate or authorize requests, consider how this is done in an autonomous manner. Looking up account information in a centralized database isn't the autonomous way. If a service does that, it now relies on the account database, and is no longer autonomous.

Instead, rely on claims-based identity. In my experience, OAuth with JWT is usually fine.

If your service needs to know something about the user that only an external source can tell it, don't look it up in an external system. Instead, demand that it's included in the JWT as a claim. Do you need to validate the age of the user? Require a date-of-birth or age claim. Do you need to know if the request is made on behalf of a system administrator? Demand a list of role claims.
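The following Haskell sketch authorizes a request purely from its claims. It assumes that a JWT library has already verified the token's signature and expiry and handed over the claims. Modelling those claims as a list of name-value pairs, and the claim names `role` and `age`, are assumptions for illustration. They aren't part of any particular library's API.

```haskell
import Text.Read (readMaybe)

-- Claims from an already-verified token (hypothetical representation).
-- A claim name may occur several times, e.g. one "role" entry per role.
type Claims = [(String, String)]

roles :: Claims -> [String]
roles claims = [v | (k, v) <- claims, k == "role"]

-- No lookup in an external account database: the token carries the facts.
requireRole :: String -> Claims -> Either String ()
requireRole role claims
  | role `elem` roles claims = Right ()
  | otherwise                = Left ("Missing role claim: " ++ role)

requireMinimumAge :: Int -> Claims -> Either String ()
requireMinimumAge minAge claims =
  case lookup "age" claims >>= readMaybe of
    Just age | age >= minAge -> Right ()
    Just _                   -> Left "User is too young."
    Nothing                  -> Left "Missing or malformed age claim."
```

Because both checks return `Either`, you can chain them, e.g. `requireRole "admin" claims >> requireMinimumAge 18 claims`. The chain stops at the first failed check.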

Conclusion #

The second of Don Box's four tenets of SOA states that services should be autonomous. At first glance, you may think that all this means is that a service shouldn't share its database with another service. That is, however, a minimum bar. You need to consider how a service exists in an environment that it doesn't control. Again, the wildlife metaphor seems apt. Particularly if your service is exposed to the internet, it lives in a hostile environment.

Not only should you consider all input belligerent, you must also take into account that friendly systems may disappear or change. Your service exists by itself, supported by itself, relying on itself. If you need to coordinate work with other service owners, that's a strong hint that your service isn't, after all, autonomous.

Next: Services share schema and contract, not class.

