Which Semigroup best addresses the twist in the previous article?

This article is part of a short article series on applicative validation with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that was valid.

In this article, I'll show how I did a quick proof of concept in Haskell.

Data definitions #

You can't use the regular Either instance of Applicative for validation because it short-circuits on the first error. In other words, you can't collect multiple error messages, even if the input has multiple issues. Instead, you need a custom Applicative instance. You can easily write such an instance yourself, but there are a couple of libraries that already do this. For this prototype, I chose the validation package.

import Data.Bifunctor
import Data.Time
import Data.Semigroup
import Data.Validation

Apart from importing Data.Validation, I also need a few other imports for the proof of concept. All of them are well-known. I used no language extensions.

For the proof of concept, the input is a triple of a name, a date of birth, and an address:

data Input = Input {
  inputName :: Maybe String,
  inputDoB :: Maybe Day,
  inputAddress :: Maybe String }
  deriving (EqShow)

The goal is actually to parse (not validate) Input into a safer data type:

data ValidInput = ValidInput {
  validName :: String,
  validDoB :: Day,
  validAddress :: String }
  deriving (EqShow)

If parsing/validation fails, the output should report a collection of error messages and return the Input value with any valid data retained.

Looking for a Semigroup #

My hypothesis was that validation, even with that twist, can be implemented elegantly with an Applicative instance. The validation package defines its Validation data type such that it's an Applicative instance as long as its error type is a Semigroup instance:

Semigroup err => Applicative (Validation err)

The question is: which Semigroup can we use?

Since we need to return both a list of error messages and a modified Input value, it sounds like we'll need a product type of some sorts. A tuple will do; something like (Input, [String]). Is that a Semigroup instance, though?

Tuples only form semigroups if both elements give rise to a semigroup:

(Semigroup a, Semigroup b) => Semigroup (a, b)

The second element of my candidate is [String], which is fine. Lists are Semigroup instances. But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.

What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.

So the Semigroup instance we need is (Endo Input, [String]), and the validation output should be of the type Validation (Endo Input, [String]) a.

Validators #

Cool, we can now implement the validation logic; a function for each field, starting with the name:

validateName :: Input -> Validation (Endo Input, [String]) String
validateName (Input (Just name) _ _) | length name > 3 = Success name
validateName (Input (Just _) _ _) =
  Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"])
validateName _ = Failure (mempty, ["name is required"])

This function reproduces the validation logic implied by the forum question that started it all. Notice, particularly, that when the name is too short, the endomorphism resets inputName to Nothing.

The date-of-birth validation function works the same way:

validateDoB :: Day -> Input -> Validation (Endo Input, [String]) Day
validateDoB now (Input _ (Just dob) _) | addGregorianYearsRollOver (-12) now < dob =
  Success dob
validateDoB _ (Input _ (Just _) _) =
  Failure (Endo $ \x -> x { inputDoB = Nothing }, ["get off my lawn"])
validateDoB _ _ = Failure (mempty, ["dob is required"])

Again, the validation logic is inferred from the forum question, although I found it better keep the function pure by requiring a now argument.

The address validation is the simplest of the three validators:

validateAddress :: Monoid a => Input -> Validation (a, [String]) String
validateAddress (Input _ _ (Just a)) = Success a
validateAddress _ = Failure (mempty, ["add1 is required"])

This one's return type is actually more general than required, since I used mempty instead of Endo id. This means that it actually works for any Monoid a, which also includes Endo Input.

Composition #

All three functions return Validation (Endo Input, [String]), which has an Applicative instance. This means that we should be able to compose them together to get the behaviour we're looking for:

validateInput :: Day -> Input -> Either (Input, [String]) ValidInput
validateInput now args =
  toEither $
  first (first (`appEndo` args)) $
  ValidInput <$> validateName args <*> validateDoB now args <*> validateAddress args

That compiles, so it probably works.

Sanity check #

Still, it'd be prudent to check. Since this is only a proof of concept, I'm not going to set up a test suite. Instead, I'll just start GHCi for some ad-hoc testing:

λ> now <- localDay <&> zonedTimeToLocalTime <&> getZonedTime
λ> validateInput now & Input Nothing Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["name is required","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Bob") Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["no bob and toms allowed","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") Nothing Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2002 10 12) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["get off my lawn","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Just 2012-04-21, inputAddress = Nothing},
      ["add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) (Just "x")
Right (ValidInput {validName = "Alice", validDoB = 2012-04-21, validAddress = "x"})

In order to make the output more readable, I've manually edited the GHCi session by adding line breaks to the output.

It looks like it's working like it's supposed to. Only the last line successfully parses the input and returns a Right value.

Conclusion #

Before I started this proof of concept, I had an inkling of the way this would go. Instead of making the prototype in F#, I found it more productive to do it in Haskell, since Haskell enables me to compose things together. I particularly appreciate how a composition of types like (Endo Input, [String]) is automatically a Semigroup instance. I don't have to do anything. That makes the language great for prototyping things like this.

Now that I've found the appropriate semigroup, I know how to convert the code to F#. That's in the next article.

Next: An F# demo of validation with partial data round trip.


Comments

Great work and excellent post. I just had a few clarification quesitons.

...But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.

What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.

How rhetorical are those questions? Whatever the case, I will take the bait.

Any product type forms a semigroup if all of its elements do. You explicitly stated this for tuples of length 2; it also holds for records such as Input. Each field on that record has type Maybe a for some a, so it suffices to select a semigroup involving Maybe a. There are few different semigropus involving Maybe that have different functions.

I think the most common semigroup for Maybe a has the function that returns the first Just _ if one exists or else returns Nothing. Combining that with Nothing as the identity element gives the monoid that is typically associated with Maybe a (and I know by the name monoidal plus). Another monoid, and therefore a semigroup, is to return the last Just _ instead of the first.

Instead of the having a preference for Just _, the function could have a preference for Nothing. As before, when both inputs are Just _, the output could be either of the inputs.

I think either of those last two semigroups will achieved the desired behavior in the problem at hand. Your code never replaces an instace of Just a with a different instance, so we don't need a preference for some input when they are both Just _.

In the end though, I think the semigroup you derived from Endo leads to simpler code.

At the end of the type signature for validateName / validateDoB / validateAddress, what does String / Day / String mean?

Why did you pass all three arguments into every parsing/validation function? I think it is a bit simpler to only pass in the needed argument. Maybe you thought this was good enough for prototype code.

Why did you use add1 in your error message instead of address? Was it only for prototype code to make the message a bit shorter?

2020-12-21 14:21 UTC

Tyson, thank you for writing. The semigroup you suggest, I take it, would look something like this:

newtype Perhaps a = Perhaps { runPerhaps :: Maybe  a } deriving (EqShow)
 
instance Semigroup (Perhaps a) where
  Perhaps Nothing <> _ = Perhaps Nothing
  _ <> Perhaps Nothing = Perhaps Nothing
  Perhaps (Just x) <> _ = Perhaps (Just x)

That might work, but it's an atypical semigroup. I think that it's lawful - at least, I can't come up with a counterexample against associativity. It seems reminiscent of Boolean and (the All monoid), but it isn't a monoid, as far as I can tell.

Granted, a Monoid constraint isn't required to make the validation code work, but following the principle of least surprise, I still think that picking a well-known semigroup such as Endo is preferable.

Regarding your second question, the type signature of e.g. validateName is:

validateName :: Input -> Validation (Endo Input, [String]) String

Like Either, Validation has two type arguments: err and a; it's defined as data Validation err a. In the above function type, the return value is a Validation value where the err type is (Endo Input, [String]) and a is String.

All three validation functions share a common err type: (Endo Input, [String]). On the other hand, they return various a types: String, Day, and String, respectively.

Regarding your third question, I could also have defined the functions so that they would only have taken the values they'd need to validate. That would better fit Postel's law, so I should probably have done that...

As for the last question, I was just following the 'spec' implied by the original forum question.

2020-12-22 15:05 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 21 December 2020 06:54:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 21 December 2020 06:54:00 UTC