Repeatable execution in Haskell

A way to figure out what to log, and what not to log, using Haskell.

This article is part of a series of articles about repeatable execution. The previous article argued that if you've logged the impure actions that a system made, you have enough information to reproduce what happened.

In most languages, it's difficult to discriminate between pure functions and impure actions, but Haskell explicitly makes that distinction. I often use it for proof of concepts for that reason. I'll do that here as well.

This proof of concept is mostly to verify what a decade of functional programming has already taught me. For the functionality that the previous article introduced, the impure actions involve a database and the system clock.

The code shown in this article is available on GitHub.

Pure interactions #

I'll use free monads to model impure interactions as pure functions. For this particular example code base, an impureim sandwich would have been sufficient. I do, however, get the impression that many readers find it hard to extrapolate from impureim sandwiches to a general architecture. For the benefit of those readers, the example uses free monads.

The system clock interaction is the simplest:

newtype ClockInstruction next = CurrentTime (LocalTime -> next) deriving Functor

There's only one instruction. It takes no input, but returns the current time and date.

For database interactions, I went through a few iterations and arrived at this set of instructions:

data ReservationsInstruction next =
    ReadReservation UUID (Maybe Reservation -> next)
  | ReadReservations LocalTime ([Reservation] -> next)
  | CreateReservation Reservation next
  deriving Functor

There's two queries and a command. The intent with the CreateReservation command is to create a new reservation row in the database. The two queries fetch a single reservation based on ID, or a set of reservations based on a date. A central type for this instruction set is Reservation:

data Reservation = Reservation
  { reservationId :: UUID
  , reservationDate :: LocalTime
  , reservationName :: String
  , reservationEmail :: String
  , reservationQuantity :: Int
  } deriving (Eq, Show, Read, Generic)

The program has to interact both with the system clock and the database, so ultimately it turned out to be useful to combine these two instruction sets into one:

type ReservationsProgram = Free (Sum ReservationsInstruction ClockInstruction)

I used the Sum functor to combine the two instruction sets, and then turned them into a Free monad.

With free monads, I find that my code becomes more readable if I define helper functions for each instruction:

readReservation :: UUID -> ReservationsProgram (Maybe Reservation)
readReservation rid = liftF $ InL $ ReadReservation rid id
 
readReservations :: LocalTime -> ReservationsProgram [Reservation]
readReservations t = liftF $ InL $ ReadReservations t id
 
createReservation :: Reservation -> ReservationsProgram ()
createReservation r = liftF $ InL $ CreateReservation r ()
 
currentTime :: ReservationsProgram LocalTime
currentTime = liftF $ InR $ CurrentTime id

There's much else going on in the code base, but that's how I model feature-specific impure actions.

Receive a reservation #

The central feature of the service is to receive and handle an HTTP POST request, as described in the introductory article. When a document arrives it triggers a series of non-trivial work:

The service validates the input data. Among other things, it checks that the reservation is in the future. It uses currentTime for this.
It queries the database for existing reservations. It uses readReservations for this.
It uses complex business logic to determine whether to accept the reservation. This essentially implements the Maître d' kata.
If it accepts the reservation, it stores it. It uses createReservation for this.

These steps manifest as this function:

tryAccept :: NominalDiffTime
          -> [Table]
          -> Reservation
          -> ExceptT (APIError ByteString) ReservationsProgram ()
tryAccept seatingDuration tables r = do
  now <- lift currentTime
  _ <- liftEither $ validateReservation now r
  reservations <-
    fmap (removeNonOverlappingReservations seatingDuration r) <$>
    lift $ readReservations $ reservationDate r
 
  _ <- liftEither $ canAccommodateReservation tables reservations r
 
  lift $ createReservation r

If you're interested in details, the code is available on GitHub. I may later write other articles about interesting details.

In the context of repeatable execution and logging, the key is that this is a pure function. It does, however, return a ReservationsProgram (free monad), so it's not going to do anything until interpreted. The interpreters are impure, so this is where logging has to take place.

HTTP API #

The above tryAccept function is decoupled from boundary concerns. It has little HTTP-specific functionality.

I've written the actual HTTP API using Servant. The following function translates the above Domain Model to an HTTP API:

type ReservationsProgramT = FreeT (Sum ReservationsInstruction ClockInstruction)
 
reservationServer :: NominalDiffTime
                  -> [Table]
                  -> ServerT ReservationAPI (ReservationsProgramT Handler)
reservationServer seatingDuration tables = getReservation :<|> postReservation
  where
    getReservation rid = do
      mr <- toFreeT $ readReservation rid
      case mr of
        Just r -> return r
        Nothing -> throwError err404
    postReservation r = do
      e <- toFreeT $ runExceptT $ tryAccept seatingDuration tables r
      case e of
        Right () -> return ()
        Left (ValidationError err) -> throwError $ err400 { errBody = err }
        Left  (ExecutionError err) -> throwError $ err500 { errBody = err }

This API also exposes a reservation as a resource you can query with a GET request, but I'm not going to comment much on that. It uses the above readReservation helper function, but there's little logic involved in the implementation.

The above reservationServer function implements, by the way, only a partial API. It defines the /reservations resource, as explained in the overview article. Its type is defined as:

type ReservationAPI =
       Capture "reservationId" UUID :> Get '[JSON] Reservation
  :<|> ReqBody '[JSON] Reservation :> Post '[JSON] ()

That's just one resource. Servant enables you define many resources and combine them into a larger API. For this example, the /reservations resource is all there is, so I define the entire API like this:

type API = "reservations" :> ReservationAPI

You can also define your complete server from several partial services, but in this example, I only have one:

server = reservationServer

Had I had more resources, I could have combined several values with a combinator, but now that I have only reservationServer it seems redundant, I admit.

Hosting the API #

The reservationServer function, and thereby also server, returns a ServerT value. Servant ultimately demands a Server value to serve it. We need to transform the ServerT value into a Server value, which we can do with hoistServer:

runApp :: String -> Int -> IO ()
runApp connStr port = do
  putStrLn $ "Starting server on port " ++ show port ++ "."
  putStrLn "Press Ctrl + C to stop the server."
  ls <- loggerSet
  let logLn s = pushLogStrLn ls $ toLogStr s
  let hoistSQL = hoistServer api $ runInSQLServerAndOnSystemClock logLn $ pack connStr
  (seatingDuration, tables) <- readConfig
  logHttp <- logHttpMiddleware ls
  run port $ logHttp $ serve api $ hoistSQL $ server seatingDuration tables

The hoistServer function enables you to translate a ServerT api m into a ServerT api n value. Since Server is a type alias for ServerT api Handler, we need to translate the complicated monad returned from server into a Handler. The runInSQLServerAndOnSystemClock function does most of the heavy lifting.

You'll also notice that the runApp function configures some logging. Apart from some HTTP-level middleware, the logLn function logs a line to a text file. The runApp function passes it as an argument to the runInSQLServerAndOnSystemClock function. We'll return to logging later in this article, but first I find it instructive to outline what happens in runInSQLServerAndOnSystemClock.

As the name implies, two major actions take place. The function interprets database interactions by executing impure actions against SQL Server. It also interprets clock interactions by querying the system clock.

Using the system clock #

The system-clock-based interpreter is the simplest of the two interpreters. It interprets ClockInstruction values by querying the system clock for the current time:

runOnSystemClock :: MonadIO m => ClockInstruction (m a) -> m a
runOnSystemClock (CurrentTime next) = liftIO (zonedTimeToLocalTime <$> getZonedTime) >>= next

This function translates a ClockInstruction (m a) to an m a value by executing the impure getZonedTime function. From the returned ZonedTime value, it then extracts the local time, which it passes to next.

You may have two questions:

Why map ClockInstruction (m a) instead of ClockInstruction a?
Why MonadIO?

I'll address each in turn.

My ultimate goal with each of these interpreters is to compose them into runInSQLServerAndOnSystemClock. As described above, this function transforms ServerT API (ReservationsProgramT Handler) into a ServerT API Handler (also known as Server API). Another way to put this is that we need to collapse ReservationsProgramT Handler to Handler by, so to speak, removing ReservationsProgramT.

Recall that a type like ReservationsProgramT Handler is really in 'curried' form. This is actually the parametrically polymorphic type ReservationsProgramT Handler a. Likewise, Handler is also parametrically polymorphic: Handler a. What we need, then, is a function with the type ReservationsProgramT Handler a -> Handler a or, more generally, FreeT f m a -> m a. This follows because ReservationsProgramT is an alias for FreeT ..., and Handler is a container of a values.

There's a function for that in Control.Monad.Trans.Free called iterT:

iterT :: (Functor f, Monad m) => (f (m a) -> m a) -> FreeT f m a -> m a

This fits our need. For each of the functors in ReservationsProgramT, then, we need a function f (m a) -> m a. Specifically, for ClockInstruction, we need to define a function with the type ClockInstruction (Handler a) -> Handler a. Consider, however, the definition of Handler. It's a newtype over a newtype, so much wrapping is required. If I specifically wanted to return that explicit type, I'd have to take the IO vale produced by getZonedTime and wrap it in Handler, which would require me to first wrap it in ExceptT, which again would require me to wrap it in Either. That's a lot of bother, but Handler is also a MonadIO instance, and that elegantly sidesteps the issue. By implementing runOnSystemClock with liftIO, it works for all MonadIO instances, including Handler.

Hopefully, that explains why runOnSystemClock has the type that it has.

Using the database #

The database interpreter is more complex than runOnSystemClock, but it follows the same principles. The reasoning outlined above also apply here.

runInSQLServer :: MonadIO m => Text -> ReservationsInstruction (m a) -> m a
runInSQLServer connStr (ReadReservation rid next) =
  liftIO (readReservation connStr rid) >>= next
runInSQLServer connStr (ReadReservations t next) =
  liftIO (readReservations connStr t) >>= next
runInSQLServer connStr (CreateReservation r next) =
  liftIO (insertReservation connStr r) >> next

Since ReservationsInstruction is a sum type with three cases, the runInSQLServer action has to handle all three. Each case calls a dedicated helper function. I'll only show one of these to give you a sense for how they look.

readReservations :: Text -> LocalTime -> IO [Reservation]
readReservations connStr (LocalTime d _) =
  let sql =
        "SELECT [Guid], [Date], [Name], [Email], [Quantity]\
        \FROM [dbo].[Reservations]\
        \WHERE CONVERT(DATE, [Date]) = " <> toSql d
  in withConnection connStr $ \conn -> fmap unDbReservation <$> query conn sql

You can see all the details about withConnection, unDbReservation, etcetera in the Git repository. The principal point is that these are just normal IO actions.

Basic composition #

The two interpreters are all we need to compose a working system:

runInSQLServerAndOnSystemClock :: MonadIO m => Text -> ReservationsProgramT m a -> m a
runInSQLServerAndOnSystemClock connStr = iterT go
  where go (InL rins) = DB.runInSQLServer connStr rins
        go (InR cins) = runOnSystemClock cins

The iterT function enables you to interpret a FreeT value, of which ReservationsProgramT is an alias. The go function just pattern-matches on the two cases of the Sum functor, and delegates to the corresponding interpreter.

This composition enables the system to run and do the intended work. You can start the server and make GET and POST requests against the /reservations resource, as outlined in the first article in this small series.

This verifies what I already hypothesized. This feature set requires two distinct sets of impure interactions:

Getting the current time
Querying and writing to a database

Once you've worked with Haskell for some time, you'll get good at predicting which actions are impure, and which functionality can be kept pure. The current result isn't surprising.

It does make it clear what ought to be logged. All the pure functionality can be reproduced if you have the inputs. You only need to log the impure interactions, and now you know what they are.

Compose with logging #

You need to log the impure operations, and you know that they're interacting with the system clock and the database. As usual, starting with the system clock is most accessible. You can write what's essentially a Decorator of any ClockInstruction interpreter:

logClock :: MonadIO m
         => (String -> IO ())
         -> (forall x. ClockInstruction (m x) -> m x)
         -> ClockInstruction (m a) -> m a
logClock logLn inner (CurrentTime next) = do
  output <- inner $ CurrentTime return
  liftIO $ writeLogEntry logLn "CurrentTime" () output
  next output

The logClock action decorates any inner interpreter with the logging action logLn. It returns an action of the same type as it decorates.

It relies on a helper function called writeLogEntry, which handles some of the formalities of formatting and time-stamping each log entry.

You can decorate any database interpreter in the same way:

logReservations :: MonadIO m
                => (String -> IO ())
                -> (forall x. ReservationsInstruction (m x) -> m x)
                -> ReservationsInstruction (m a) -> m a
logReservations logLn inner (ReadReservation rid next) = do
  output <- inner $ ReadReservation rid return
  liftIO $ writeLogEntry logLn "ReadReservation" rid output
  next output
logReservations logLn inner (ReadReservations t next) = do
  output <- inner $ ReadReservations t return
  liftIO $ writeLogEntry logLn "ReadReservations" t output
  next output
logReservations logLn inner (CreateReservation r next) = do
  output <- inner $ CreateReservation r (return ())
  liftIO $ writeLogEntry logLn "CreateReservation" r output
  next

The logReservations action follows the same template as logClock; only it has more lines of code because ReservationsInstruction is a discriminated union with three cases.

With these Decorator actions you can change the application composition so that it logs all impure inputs and outputs:

runInSQLServerAndOnSystemClock :: MonadIO m
                               => (String -> IO ())
                               -> Text
                               -> ReservationsProgramT m a -> m a
runInSQLServerAndOnSystemClock logLn connStr = iterT go
  where go (InL rins) = logReservations logLn (DB.runInSQLServer connStr) rins
        go (InR cins) = logClock logLn runOnSystemClock cins

This not only implements the desired functionality, but also Goldilogs: not too little, not too much, but just what you need. Notice that I didn't have to change any of my Domain Model or HTTP-specific code to enable logging. This cross-cutting concern is enabled entirely via composition.

Repeatability #

An HTTP request like this:

POST /reservations/ HTTP/1.1
Content-Type: application/json

{
  "id": "c3cbfbc7-6d64-4ead-84ef-7f89de5b7e1c",
  "date": "2020-03-20 19:00:00",
  "name": "Elissa Megan Powers",
  "email": "emp@example.com",
  "quantity": 3
}

produces a series of log entries like these:

LogEntry {logTime = 2019-12-29 20:21:53.0029235 UTC, logOperation = "CurrentTime", logInput = "()", logOutput = "2019-12-29 21:21:53.0029235"}
LogEntry {logTime = 2019-12-29 20:21:54.0532677 UTC, logOperation = "ReadReservations", logInput = "2020-03-20 19:00:00", logOutput = "[]"}
LogEntry {logTime = 2019-12-29 20:21:54.0809254 UTC, logOperation = "CreateReservation", logInput = "Reservation {reservationId = c3cbfbc7-6d64-4ead-84ef-7f89de5b7e1c, reservationDate = 2020-03-20 19:00:00, reservationName = \"Elissa Megan Powers\", reservationEmail = \"emp@example.com\", reservationQuantity = 3}", logOutput = "()"}
LogEntry {logTime = 2019-12-29 20:21:54 UTC, logOperation = "PostReservation", logInput = "\"{ \\\"id\\\": \\\"c3cbfbc7-6d64-4ead-84ef-7f89de5b7e1c\\\", \\\"date\\\": \\\"2020-03-20 19:00:00\\\", \\\"name\\\": \\\"Elissa Megan Powers\\\", \\\"email\\\": \\\"emp@example.com\\\", \\\"quantity\\\": 3 }\"", logOutput = "()"}

This is only a prototype to demonstrate what's possible. In an attempt to make things simple for myself, I decided to just log data by using the Show instance of each value being logged. In order to reproduce behaviour, I'll rely on the corresponding Read instance for the type. This was probably naive, and not a decision I would employ in a production system, but it's good enough for a prototype.

For example, the above log entry states that the CurrentTime instruction was evaluated and that the output was 2019-12-29 21:21:53.0029235. Second, the ReadReservations instruction was evaluated with the input 2020-03-20 19:00:00 and the output was the empty list ([]). The third line records that the CreateReservation instruction was evaluated with a particular input, and that the output was ().

The fourth and final record is the the actual values observed at the HTTP boundary.

You can load and parse the logged data into a unit test or an interactive session:

λ> l <- lines <$> readFile "the/path/to/the/log.txt"
λ> replayData = readReplayData l
λ> replayData
ReplayData {
  observationsOfPostReservation =
    [Reservation {
      reservationId = c3cbfbc7-6d64-4ead-84ef-7f89de5b7e1c,
      reservationDate = 2020-03-20 19:00:00,
      reservationName = "Elissa Megan Powers",
      reservationEmail = "emp@example.com",
      reservationQuantity = 3}],
  observationsOfRead = fromList [],
  observationsOfReads = fromList [(2020-03-20 19:00:00,[[]])],
  observationsOfCurrentTime = [2019-12-29 21:21:53.0029235]}
λ> r = head $ observationsOfPostReservation replayData
λ> r
Reservation {
  reservationId = c3cbfbc7-6d64-4ead-84ef-7f89de5b7e1c,
  reservationDate = 2020-03-20 19:00:00,
  reservationName = "Elissa Megan Powers",
  reservationEmail = "emp@example.com",
  reservationQuantity = 3}

(I've added line breaks and indentation to some of the output to make it more readable, compared to what GHCi produces.)

The most important thing to notice is the readReplayData function that parses the log file into Haskell data. I've also written a prototype of a function that can replay the actions as they happened:

λ> (seatingDuration, tables) <- readConfig
λ> replay replayData $ tryAccept seatingDuration tables r
Right ()

The original HTTP request returned 200 OK and that's exactly how reservationServer translates a Right () result. So the above interaction is a faithful reproduction of what actually happened.

Replay #

You may have noticed that I used a replay function above. This is only a prototype to get the point across. It's just another interpreter of ReservationsProgram (or, rather an ExceptT wrapper of ReservationsProgram):

replay :: ReplayData -> ExceptT e ReservationsProgram a -> Either e a
replay d = replayImp d . runExceptT
  where
    replayImp :: ReplayData -> ReservationsProgram a -> a
    replayImp rd p = State.evalState (iterM go p) rd
    go (InL (ReadReservation rid next)) = replayReadReservation rid >>= next
    go (InL (ReadReservations t next)) = replayReadReservations t >>= next
    go (InL (CreateReservation _ next)) = next
    go (InR (CurrentTime next)) = replayCurrentTime >>= next

While this is compact Haskell code that I wrote, I still found it so abstruse that I decided to add a type annotation to a local function. It's not required, but I find that it helps me understand what replayImp does. It uses iterM (a cousin to iterT) to interpret the ReservationsProgram. The entire interpretation is stateful, so runs in the State monad. Here's an example:

replayCurrentTime :: State ReplayData LocalTime
replayCurrentTime = do
  xs <- State.gets observationsOfCurrentTime
  let (observation:rest) = xs
  State.modify (\s -> s { observationsOfCurrentTime = rest })
  return observation

The replayCurrentTime function replays log observations of CurrentTime instructions. The observationsOfCurrentTime field is a list of observed values, parsed from a log. A ReservationsProgram might query the CurrentTime multiple times, so there could conceivably be several such observations. The idea is to replay them, starting with the earliest.

Each time the function replays an observation, it should remove it from the log. It does that by first retrieving all observations from the state. It then pattern-matches the observation from the rest of the observations. I execute my code with the -Wall option, so I'm puzzled that I don't get a warning from the compiler about that line. After all, the xs list could be empty. This is, however, prototype code, so I decided to ignore that issue.

Before the function returns the observation it updates the replay data by effectively removing the observation, but without touching anything else.

The replayReadReservation and replayReadReservations functions follow the same template. You can consult the source code repository if you're curious about the details. You may also notice that the go function doesn't do anything when it encounters a CreateReservation instruction. This is because that instruction has no return value, so there's no reason to consult a log to figure out what to return.

Summary #

The point of this article was to flesh out a fully functional feature (a vertical slice, if you're so inclined) in Haskell, in order to verify that the only impure actions involved are:

Getting the current time
Interacting with the application database

This turns out to be the case.

Furthermore, prototype code demonstrates that based on a log of impure interactions, you can repeat the logged execution.

Now that we know what is impure and what can be pure, we can reproduce the same architecture in C# (or another mainstream programming language).

Next: Repeatable execution in C#.

Comments

Jiehong #

The Free Monad, as any monad, enforces sequential operations.

How would you deal with having to sent multiple transactions (let's say to the db and via http), while also retrying n times if it fails?

2020-04-06 9:38 UTC

Mark Seemann #

Jiehong, thank you for writing. I'm not sure that I can give you a complete answer, as this is something that I haven't experimented with in Haskell.

In C#, on the other hand, you can implement stability patterns like Circuit Breaker and retries with Decorators. I don't see why you can't do that in Haskell as well.

2020-04-10 10:15 UTC

Published: Monday, 30 March 2020 08:02:00 UTC

Repeatable execution in Haskell by Mark Seemann