A Haskell proof of concept of validation with partial data round trip by Mark Seemann
Which Semigroup best addresses the twist in the previous article?
This article is part of a short article series on applicative validation with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that was valid.
In this article, I'll show how I did a quick proof of concept in Haskell.
Data definitions #
You can't use the regular Either instance of Applicative for validation because it short-circuits on the first error. In other words, you can't collect multiple error messages, even if the input has multiple issues. Instead, you need a custom Applicative instance. You can easily write such an instance yourself, but there are a couple of libraries that already do this. For this prototype, I chose the validation package.
import Data.Bifunctor import Data.Time import Data.Semigroup import Data.Validation
Apart from importing Data.Validation, I also need a few other imports for the proof of concept. All of them are well-known. I used no language extensions.
For the proof of concept, the input is a triple of a name, a date of birth, and an address:
data Input = Input { inputName :: Maybe String, inputDoB :: Maybe Day, inputAddress :: Maybe String } deriving (Eq, Show)
The goal is actually to parse (not validate) Input into a safer data type:
data ValidInput = ValidInput { validName :: String, validDoB :: Day, validAddress :: String } deriving (Eq, Show)
If parsing/validation fails, the output should report a collection of error messages and return the Input value with any valid data retained.
Looking for a Semigroup #
My hypothesis was that validation, even with that twist, can be implemented elegantly with an Applicative instance. The validation package defines its Validation data type such that it's an Applicative instance as long as its error type is a Semigroup instance:
Semigroup err => Applicative (Validation err)
The question is: which Semigroup can we use?
Since we need to return both a list of error messages and a modified Input value, it sounds like we'll need a product type of some sorts. A tuple will do; something like (Input, [String]). Is that a Semigroup instance, though?
Tuples only form semigroups if both elements give rise to a semigroup:
(Semigroup a, Semigroup b) => Semigroup (a, b)
The second element of my candidate is [String], which is fine. Lists are Semigroup instances. But what about Input? Can we somehow combine two Input values into one? It's not entirely clear how we should do that, so that doesn't seem too promising.
What we need to do, however, is to take the original Input and modify it by (optionally) resetting one or more fields. In other words, a series of functions of the type Input -> Input. Aha! There's the semigroup we need: Endo Input.
So the Semigroup instance we need is (Endo Input, [String]), and the validation output should be of the type Validation (Endo Input, [String]) a.
Validators #
Cool, we can now implement the validation logic; a function for each field, starting with the name:
validateName :: Input -> Validation (Endo Input, [String]) String validateName (Input (Just name) _ _) | length name > 3 = Success name validateName (Input (Just _) _ _) = Failure (Endo $ \x -> x { inputName = Nothing }, ["no bob and toms allowed"]) validateName _ = Failure (mempty, ["name is required"])
This function reproduces the validation logic implied by the forum question that started it all. Notice, particularly, that when the name is too short, the endomorphism resets inputName to Nothing.
The date-of-birth validation function works the same way:
validateDoB :: Day -> Input -> Validation (Endo Input, [String]) Day validateDoB now (Input _ (Just dob) _) | addGregorianYearsRollOver (-12) now < dob = Success dob validateDoB _ (Input _ (Just _) _) = Failure (Endo $ \x -> x { inputDoB = Nothing }, ["get off my lawn"]) validateDoB _ _ = Failure (mempty, ["dob is required"])
Again, the validation logic is inferred from the forum question, although I found it better keep the function pure by requiring a now argument.
The address validation is the simplest of the three validators:
validateAddress :: Monoid a => Input -> Validation (a, [String]) String validateAddress (Input _ _ (Just a)) = Success a validateAddress _ = Failure (mempty, ["add1 is required"])
This one's return type is actually more general than required, since I used mempty instead of Endo id. This means that it actually works for any Monoid a, which also includes Endo Input.
Composition #
All three functions return Validation (Endo Input, [String]), which has an Applicative instance. This means that we should be able to compose them together to get the behaviour we're looking for:
validateInput :: Day -> Input -> Either (Input, [String]) ValidInput validateInput now args = toEither $ first (first (`appEndo` args)) $ ValidInput <$> validateName args <*> validateDoB now args <*> validateAddress args
That compiles, so it probably works.
Sanity check #
Still, it'd be prudent to check. Since this is only a proof of concept, I'm not going to set up a test suite. Instead, I'll just start GHCi for some ad-hoc testing:
λ> now <- localDay <&> zonedTimeToLocalTime <&> getZonedTime
λ> validateInput now & Input Nothing Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
["name is required","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Bob") Nothing Nothing
Left (Input {inputName = Nothing, inputDoB = Nothing, inputAddress = Nothing},
["no bob and toms allowed","dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") Nothing Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
["dob is required","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2002 10 12) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Nothing, inputAddress = Nothing},
["get off my lawn","add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) Nothing
Left (Input {inputName = Just "Alice", inputDoB = Just 2012-04-21, inputAddress = Nothing},
["add1 is required"])
λ> validateInput now & Input (Just "Alice") (Just & fromGregorian 2012 4 21) (Just "x")
Right (ValidInput {validName = "Alice", validDoB = 2012-04-21, validAddress = "x"})
In order to make the output more readable, I've manually edited the GHCi session by adding line breaks to the output.
It looks like it's working like it's supposed to. Only the last line successfully parses the input and returns a Right value.
Conclusion #
Before I started this proof of concept, I had an inkling of the way this would go. Instead of making the prototype in F#, I found it more productive to do it in Haskell, since Haskell enables me to compose things together. I particularly appreciate how a composition of types like (Endo Input, [String]) is automatically a Semigroup instance. I don't have to do anything. That makes the language great for prototyping things like this.
Now that I've found the appropriate semigroup, I know how to convert the code to F#. That's in the next article.
Next: An F# demo of validation with partial data round trip.
Comments
Great work and excellent post. I just had a few clarification quesitons.
How rhetorical are those questions? Whatever the case, I will take the bait.
Any product type forms a semigroup if all of its elements do. You explicitly stated this for tuples of length 2; it also holds for records such as
Input. Each field on that record has typeMaybe afor somea, so it suffices to select a semigroup involvingMaybe a. There are few different semigropus involvingMaybethat have different functions.I think the most common semigroup for
Maybe ahas the function that returns the firstJust _if one exists or else returnsNothing. Combining that withNothingas the identity element gives the monoid that is typically associated withMaybe a(and I know by the name monoidal plus). Another monoid, and therefore a semigroup, is to return the lastJust _instead of the first.Instead of the having a preference for
Just _, the function could have a preference forNothing. As before, when both inputs areJust _, the output could be either of the inputs.I think either of those last two semigroups will achieved the desired behavior in the problem at hand. Your code never replaces an instace of
Just awith a different instance, so we don't need a preference for some input when they are bothJust _.In the end though, I think the semigroup you derived from
Endoleads to simpler code.At the end of the type signature for
validateName/validateDoB/validateAddress, what doesString/Day/Stringmean?Why did you pass all three arguments into every parsing/validation function? I think it is a bit simpler to only pass in the needed argument. Maybe you thought this was good enough for prototype code.
Why did you use
add1in your error message instead ofaddress? Was it only for prototype code to make the message a bit shorter?Tyson, thank you for writing. The semigroup you suggest, I take it, would look something like this:
That might work, but it's an atypical semigroup. I think that it's lawful - at least, I can't come up with a counterexample against associativity. It seems reminiscent of Boolean and (the All monoid), but it isn't a monoid, as far as I can tell.
Granted, a
Monoidconstraint isn't required to make the validation code work, but following the principle of least surprise, I still think that picking a well-known semigroup such asEndois preferable.Regarding your second question, the type signature of e.g.
validateNameis:Like
Either,Validationhas two type arguments:erranda; it's defined asdata Validation err a. In the above function type, the return value is aValidationvalue where theerrtype is(Endo Input, [String])andaisString.All three validation functions share a common
errtype:(Endo Input, [String]). On the other hand, they return variousatypes:String,Day, andString, respectively.Regarding your third question, I could also have defined the functions so that they would only have taken the values they'd need to validate. That would better fit Postel's law, so I should probably have done that...
As for the last question, I was just following the 'spec' implied by the original forum question.