A Haskell proof of concept of validation with partial data round trip by Mark Seemann
Which Semigroup best addresses the twist in the previous article?
This article is part of a short article series on applicative validation with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that was valid.
In this article, I'll show how I did a quick proof of concept in Haskell.
Data definitions
You can't use the regular Either instance of Applicative for validation because it short-circuits on the first error. In other words, you can't collect multiple error messages, even if the input has multiple issues. Instead, you need a custom Applicative instance. You can easily write such an instance yourself, but there are a couple of libraries that already do this. For this prototype, I chose the validation package.
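The difference can be illustrated with a minimal sketch that depends only on base. The Validation type below is a local stand-in written for this illustration (the validation package ships its own, richer version), and nameE, ageE, nameV, and ageV are hypothetical checks that are not part of the article's code:

```haskell
-- Local stand-in for the Validation type from the validation package,
-- defined here only so that the sketch depends on nothing but base.
data Validation err a = Failure err | Success a
  deriving (Eq, Show)

instance Functor (Validation err) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, two failures are combined with (<>) instead of
-- keeping only the first one.
instance Semigroup err => Applicative (Validation err) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical checks, written once for each error-handling type.
nameE :: String -> Either [String] String
nameE n = if null n then Left ["name is required"] else Right n

ageE :: Int -> Either [String] Int
ageE a = if a < 0 then Left ["age must not be negative"] else Right a

nameV :: String -> Validation [String] String
nameV n = if null n then Failure ["name is required"] else Success n

ageV :: Int -> Validation [String] Int
ageV a = if a < 0 then Failure ["age must not be negative"] else Success a

-- Either stops at the first Left, so only one message survives.
shortCircuited :: Either [String] (String, Int)
shortCircuited = (,) <$> nameE "" <*> ageE (-1)

-- The stand-in Validation collects both messages.
accumulated :: Validation [String] (String, Int)
accumulated = (,) <$> nameV "" <*> ageV (-1)
```

Evaluating shortCircuited in GHCi should show a single error message, while accumulated should carry both.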
import Data.Bifunctor
import Data.Time
import Data.Semigroup
import Data.Validation
Apart from importing Data.Validation, I also need a few other imports for the proof of concept. All of them are well-known. I used no language extensions.
For the proof of concept, the input is a triple of a name, a date of birth, and an address:
data Input = Input {
  inputName :: Maybe String,
  inputDoB :: Maybe Day,
  inputAddress :: Maybe String }
  deriving (Eq, Show)
The goal is actually to parse (not validate) Input into a safer data type:
data ValidInput = ValidInput {
  validName :: String,
  validDoB :: Day,
  validAddress :: String }
  deriving (Eq, Show)
If parsing/validation fails, the output should report a collection of error messages and return the Input value with any valid data retained.
Looking for a Semigroup
My hypothesis was that validation, even with that twist, can be implemented elegantly with an Applicative instance. The validation package defines its Validation data type such that it's an Applicative instance as long as its error type is a Semigroup instance:
Semigroup err => Applicative (Validation err)
The question is: which Semigroup can we use?
Since we need to return both a list of error messages and a modified Input value, it sounds like we'll need a product type of some sort. A tuple will do; something like (Input, [String]). Is that a Semigroup instance, though?
Tuples only form semigroups if both elements give rise to a semigroup:
(Semigroup a, Semigroup b) => Semigroup (a, b)
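As a small sketch of what that instance does (the values are made up for illustration), combining two pairs combines each component with its own <> operation:

```haskell
-- Each component of the pair is combined with its own (<>):
-- strings concatenate, and so do lists.
combinedPair :: (String, [Int])
combinedPair = ("ab", [1, 2]) <> ("cd", [3])

-- The same holds when both components are lists of messages.
combinedErrors :: ([String], [String])
combinedErrors = (["a"], ["x"]) <> (["b"], ["y"])
```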
The second element of my candidate is [String], which is fine. Lists are Semigroup instances. But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.
What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.
So the Semigroup instance we need is (Endo Input, [String]), and the validation output should be of the type Validation (Endo Input, [String]) a.
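To see how such pairs combine, here is a minimal sketch. It assumes a simplified Input record (plain strings instead of Day, so that only base is needed), and resetName, resetDoB, and the error messages are hypothetical:

```haskell
import Data.Monoid (Endo (..))

-- Simplified stand-in for the article's Input record.
data Input = Input
  { inputName :: Maybe String
  , inputDoB :: Maybe String
  , inputAddress :: Maybe String
  } deriving (Eq, Show)

-- Two endomorphisms, each resetting one field.
resetName :: Endo Input
resetName = Endo $ \x -> x { inputName = Nothing }

resetDoB :: Endo Input
resetDoB = Endo $ \x -> x { inputDoB = Nothing }

-- Pairs of (Endo Input, [String]) combine component-wise:
-- the functions compose and the message lists concatenate.
combined :: (Endo Input, [String])
combined = (resetName, ["bad name"]) <> (resetDoB, ["bad dob"])

-- Running the composed function against an original value.
result :: (Input, [String])
result = (appEndo (fst combined) original, snd combined)
  where
    original = Input (Just "Bob") (Just "2002-10-12") (Just "x")
```

In result, both the name and the date of birth should be reset to Nothing, the address should be untouched, and both messages should be present.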
Validators
Cool, we can now implement the validation logic; a function for each field, starting with the name:
validateName :: Input -> Validation (Endo Input, [String]) String
validateName (Input (Just name) _ _) | length name > 3 = Success name
validateName (Input (Just _) _ _) =
  Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"])
validateName _ = Failure (mempty, ["name is required"])
This function reproduces the validation logic implied by the forum question that started it all. Notice, particularly, that when the name is too short, the endomorphism resets inputName to Nothing.
The date-of-birth validation function works the same way:
validateDoB :: Day -> Input -> Validation (Endo Input, [String]) Day
validateDoB now (Input _ (Just dob) _) | addGregorianYearsRollOver (-12) now < dob =
  Success dob
validateDoB _ (Input _ (Just _) _) =
  Failure (Endo $ \x -> x { inputDoB = Nothing }, ["get off my lawn"])
validateDoB _ _ = Failure (mempty, ["dob is required"])
Again, the validation logic is inferred from the forum question, although I found it better to keep the function pure by requiring a now argument.
The address validation is the simplest of the three validators:
validateAddress :: Monoid a => Input -> Validation (a, [String]) String
validateAddress (Input _ _ (Just a)) = Success a
validateAddress _ = Failure (mempty, ["add1 is required"])
This one's return type is actually more general than required, since I used mempty instead of Endo id. This means that it actually works for any Monoid a, which also includes Endo Input.
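The following sketch illustrates that generality. The names noReset, atEndo, and atString are hypothetical, and Endo Int stands in for Endo Input to keep the example short:

```haskell
import Data.Monoid (Endo (..))

-- A failure payload that resets nothing, usable at any Monoid a.
noReset :: Monoid a => (a, [String])
noReset = (mempty, ["add1 is required"])

-- At Endo Int, mempty is the identity function, so the value
-- passed through appEndo comes back unchanged.
atEndo :: Int
atEndo = appEndo (fst (noReset :: (Endo Int, [String]))) 42

-- At String, the very same definition has the empty string
-- as its first component.
atString :: (String, [String])
atString = noReset
```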
Composition
All three functions return Validation (Endo Input, [String]), which has an Applicative instance. This means that we should be able to compose them together to get the behaviour we're looking for:
validateInput :: Day -> Input -> Either (Input, [String]) ValidInput
validateInput now args =
  toEither $
  first (first (`appEndo` args)) $
  ValidInput <$> validateName args <*> validateDoB now args <*> validateAddress args
That compiles, so it probably works.
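The nested first (first (`appEndo` args)) step may be easier to follow in isolation. The sketch below uses Either as a stand-in carrier, since it is also a Bifunctor and lives in base; the article applies the same mapping to a Validation value. The names runReset, failed, unwrappedFailure, and unwrappedSuccess are hypothetical, and Endo Int replaces Endo Input for brevity:

```haskell
import Data.Bifunctor (first)
import Data.Monoid (Endo (..))

-- The outer 'first' targets the error side of the Either; the inner
-- 'first' targets the first component of the tuple, where the
-- accumulated function is run against the original value.
runReset :: a -> Either (Endo a, [String]) b -> Either (a, [String]) b
runReset original = first (first (`appEndo` original))

-- Hypothetical failure: the accumulated function doubles its input.
failed :: Either (Endo Int, [String]) Bool
failed = Left (Endo (* 2), ["some error"])

-- The function is replaced by the value it produces.
unwrappedFailure :: Either (Int, [String]) Bool
unwrappedFailure = runReset 21 failed

-- A success passes through untouched.
unwrappedSuccess :: Either (Int, [String]) Bool
unwrappedSuccess = runReset 21 (Right True)
```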
Sanity check
Still, it'd be prudent to check. Since this is only a proof of concept, I'm not going to set up a test suite. Instead, I'll just start GHCi for some ad-hoc testing:
λ> now <- localDay . zonedTimeToLocalTime <$> getZonedTime
λ> validateInput now $ Input Nothing Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["name is required","dob is required","add1 is required"])
λ> validateInput now $ Input (Just "Bob") Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
      ["no bob and toms allowed","dob is required","add1 is required"])
λ> validateInput now $ Input (Just "Alice") Nothing Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["dob is required","add1 is required"])
λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2002 10 12) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
      ["get off my lawn","add1 is required"])
λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2012 4 21) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Just 2012-04-21, inputAddress = Nothing},
      ["add1 is required"])
λ> validateInput now $ Input (Just "Alice") (Just $ fromGregorian 2012 4 21) (Just "x")
Right (ValidInput {validName = "Alice", validDoB = 2012-04-21, validAddress = "x"})
In order to make the output more readable, I've manually edited the GHCi session by adding line breaks to the output.
It looks like it's working like it's supposed to. Only the last line successfully parses the input and returns a Right value.
Conclusion
Before I started this proof of concept, I had an inkling of the way this would go. Instead of making the prototype in F#, I found it more productive to do it in Haskell, since Haskell enables me to compose things together. I particularly appreciate how a composition of types like (Endo Input, [String]) is automatically a Semigroup instance. I don't have to do anything. That makes the language great for prototyping things like this.
Now that I've found the appropriate semigroup, I know how to convert the code to F#. That's in the next article.
Next: An F# demo of validation with partial data round trip.
Comments
Great work and excellent post. I just had a few clarification questions.
How rhetorical are those questions? Whatever the case, I will take the bait.
Any product type forms a semigroup if all of its elements do. You explicitly stated this for tuples of length 2; it also holds for records such as Input. Each field on that record has type Maybe a for some a, so it suffices to select a semigroup involving Maybe a. There are a few different semigroups involving Maybe that have different functions.
I think the most common semigroup for Maybe a has the function that returns the first Just _ if one exists or else returns Nothing. Combining that with Nothing as the identity element gives the monoid that is typically associated with Maybe a (and which I know by the name monoidal plus). Another monoid, and therefore a semigroup, is to return the last Just _ instead of the first.
Instead of having a preference for Just _, the function could have a preference for Nothing. As before, when both inputs are Just _, the output could be either of the inputs.
I think either of those last two semigroups will achieve the desired behavior in the problem at hand. Your code never replaces an instance of Just a with a different instance, so we don't need a preference for some input when they are both Just _.
In the end though, I think the semigroup you derived from Endo leads to simpler code.
At the end of the type signature for validateName/validateDoB/validateAddress, what does String/Day/String mean?
Why did you pass all three arguments into every parsing/validation function? I think it is a bit simpler to only pass in the needed argument. Maybe you thought this was good enough for prototype code.
Why did you use add1 in your error message instead of address? Was it only for prototype code to make the message a bit shorter?
Tyson, thank you for writing. The semigroup you suggest, I take it, would look something like this:
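One way a Nothing-preferring semigroup could be written is sketched below. This is an illustration under assumptions rather than the exact code from the reply: the wrapper name Perhaps, the choice to keep the left-most value when both sides are Just, and the sample-based associativity check are all hypothetical.

```haskell
-- Hypothetical wrapper: Nothing wins over Just, and the
-- left-most Just wins when both sides are Just.
newtype Perhaps a = Perhaps { runPerhaps :: Maybe a }
  deriving (Eq, Show)

instance Semigroup (Perhaps a) where
  Perhaps Nothing <> _ = Perhaps Nothing
  _ <> Perhaps Nothing = Perhaps Nothing
  x <> _ = x

-- A small sample of values for a brute-force check.
samples :: [Perhaps Int]
samples = [Perhaps Nothing, Perhaps (Just 1), Perhaps (Just 2)]

-- Associativity over every triple drawn from the sample.
-- This is evidence, not a proof.
associativeOnSamples :: Bool
associativeOnSamples =
  and [ (x <> y) <> z == x <> (y <> z) | x <- samples, y <- samples, z <- samples ]
```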
That might work, but it's an atypical semigroup. I think that it's lawful - at least, I can't come up with a counterexample against associativity. It seems reminiscent of Boolean and (the All monoid), but it isn't a monoid, as far as I can tell.
Granted, a Monoid constraint isn't required to make the validation code work, but following the principle of least surprise, I still think that picking a well-known semigroup such as Endo is preferable.
Regarding your second question, the type signature of e.g. validateName is:
validateName :: Input -> Validation (Endo Input, [String]) String
Like Either, Validation has two type arguments: err and a; it's defined as data Validation err a. In the above function type, the return value is a Validation value where the err type is (Endo Input, [String]) and a is String.
All three validation functions share a common err type: (Endo Input, [String]). On the other hand, they return various a types: String, Day, and String, respectively.
Regarding your third question, I could also have defined the functions so that they would only have taken the values they'd need to validate. That would better fit Postel's law, so I should probably have done that...
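A narrower validateName might look like the following sketch. It assumes simplified stand-ins for Input and Validation so that it needs only base, and summarise is a hypothetical helper added only to make the result inspectable:

```haskell
import Data.Monoid (Endo (..))

-- Simplified stand-ins for the article's types.
data Input = Input
  { inputName :: Maybe String
  , inputAddress :: Maybe String
  } deriving (Eq, Show)

data Validation err a = Failure err | Success a

-- Takes only the field it inspects instead of the whole Input.
-- The endomorphism still targets Input, because that's what gets reset.
validateName :: Maybe String -> Validation (Endo Input, [String]) String
validateName (Just name) | length name > 3 = Success name
validateName (Just _) =
  Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"])
validateName Nothing = Failure (mempty, ["name is required"])

-- Hypothetical helper: runs the reset function so the outcome can be
-- compared and shown (Endo itself has no Eq or Show instance).
summarise :: Input -> Validation (Endo Input, [String]) String -> Either (Input, [String]) String
summarise original (Failure (reset, errs)) = Left (appEndo reset original, errs)
summarise _ (Success a) = Right a
```

A caller would then pass inputName args rather than args itself, for example summarise args (validateName (inputName args)).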
As for the last question, I was just following the 'spec' implied by the original forum question.