Applicative validation by Mark Seemann
Validate input in applicative style for superior readability and composability.
This article is an instalment in an article series about applicative functors. It demonstrates how applicative style can be used to compose small validation functions into a larger validation function in such a way that no validation messages are lost and the composition remains readable.
All example code in this article is given in Haskell. No F# translation is offered, because Scott Wlaschin has an equivalent example covering input validation in F#.
JSON validation #
In my Pluralsight course about a functional architecture in F#, you can see an example of an on-line restaurant reservation system. I often return to that example scenario, so for regular readers of this blog, it should be known territory. For newcomers, imagine that you've been asked to develop an HTTP-based API that accepts JSON documents containing restaurant reservations. Such a JSON document could look like this:
{ "date": "2017-06-27 18:30:00+02:00", "name": "Mark Seemann", "email": "mark@example.com", "quantity": 4 }
It contains the date and time of the (requested) reservation, the email address and name of the person making the reservation, as well as the number of people who will be dining. Particularly, notice that the date and time is represented as a string value (specifically, in ISO 8601 format), since JSON has no built-in date and time data type.
In Haskell, you can represent such a JSON document using a type like this:
data ReservationJson = ReservationJson
  { jsonDate :: String
  , jsonQuantity :: Double
  , jsonName :: String
  , jsonEmail :: String
  } deriving (Eq, Show, Read, Generic)
Haskell's strength is in its type system, so you should prefer to model a reservation using a strong type:
data Reservation = Reservation
  { reservationDate :: ZonedTime
  , reservationQuantity :: Int
  , reservationName :: String
  , reservationEmail :: String
  } deriving (Show, Read)
Instead of modelling the date and time as a string, you model it as a ZonedTime value. Additionally, you should model quantity as an integer, since a floating point value doesn't make much sense.
While you can always translate a Reservation value to a ReservationJson value, the converse doesn't hold. There are ReservationJson values that you can't translate to Reservation. Such ReservationJson values are invalid.
You should write code to validate and translate ReservationJson values to Reservation values, if possible.
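To make the 'always possible' direction concrete, here's a minimal sketch of such a translation. The names toJson and sampleReservation are illustrative assumptions of mine, not part of the original code base, and the two types are repeated with fewer derived instances so that the block compiles on its own. It relies on the Show instance of ZonedTime to produce the date string.

```haskell
import Data.Time
  ( LocalTime (..)
  , TimeOfDay (..)
  , ZonedTime (..)
  , fromGregorian
  , hoursToTimeZone
  )

-- The weakly typed representation of the JSON document.
data ReservationJson = ReservationJson
  { jsonDate :: String
  , jsonQuantity :: Double
  , jsonName :: String
  , jsonEmail :: String
  } deriving (Eq, Show)

-- The strongly typed model.
data Reservation = Reservation
  { reservationDate :: ZonedTime
  , reservationQuantity :: Int
  , reservationName :: String
  , reservationEmail :: String
  } deriving (Show)

-- Hypothetical helper: this direction cannot fail, so no error type is needed.
toJson :: Reservation -> ReservationJson
toJson r = ReservationJson
  { jsonDate = show (reservationDate r)
  , jsonQuantity = fromIntegral (reservationQuantity r)
  , jsonName = reservationName r
  , jsonEmail = reservationEmail r
  }

-- Hypothetical sample value, built from the JSON example above.
sampleReservation :: Reservation
sampleReservation = Reservation
  { reservationDate =
      ZonedTime
        (LocalTime (fromGregorian 2017 6 27) (TimeOfDay 18 30 0))
        (hoursToTimeZone 2)
  , reservationQuantity = 4
  , reservationName = "Mark Seemann"
  , reservationEmail = "mark@example.com"
  }
```

Nothing can go wrong in this direction, which is why toJson returns a plain ReservationJson value rather than something like Maybe or Either.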
Specialised validations #
The ReservationJson type is a complex type, because it's composed of multiple (four) elements of different types. You can easily define at least three validation rules that ought to hold:
- You should be able to convert the jsonDate value to a ZonedTime value.
- jsonQuantity must be a positive integer.
- jsonEmail should look believably like an email address.
In Haskell, people often use Either for validation, but instead of using Either directly, I'll introduce a specialised Validation type:
newtype Validation e r = Validation (Either e r) deriving (Eq, Show, Functor)
You'll notice that this is simply a redefinition of Either. Haskell can automatically derive its Functor instance with the DeriveFunctor language extension.
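As a small sketch of what the derived Functor instance gives you, the following block maps a function over both cases. The values doubled and untouched are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Validation e r = Validation (Either e r) deriving (Eq, Show, Functor)

-- fmap applies the function to the value inside a Right case.
doubled :: Validation [String] Int
doubled = fmap (* 2) (Validation (Right 21))

-- fmap leaves a Left case, and the error it carries, unchanged.
untouched :: Validation [String] Int
untouched = fmap (* 2) (Validation (Left ["Not a number."]))
```

Here, doubled evaluates to Validation (Right 42), while untouched is still Validation (Left ["Not a number."]).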
My motivation for introducing a new type is that the way that Either is Applicative is not quite how I'd like it to be. Introducing a newtype enables you to change how a type behaves. More on that later. First, you can implement the three individual validation functions.
Date validation #
If the JSON date value is an ISO 8601-formatted string, then you can parse it as a ZonedTime. In that case, you should return the Right case of Validation. If you can't parse the string into a ZonedTime value, you should return a Left value containing a helpful error message.
validateDate :: String -> Validation [String] ZonedTime
validateDate candidate =
  case readMaybe candidate of
    Just d -> Validation $ Right d
    Nothing -> Validation $ Left ["Not a date."]
This function uses readMaybe from Text.Read to attempt to parse the candidate String. When readMaybe can read the String value, it returns a Just value with the parsed value inside; otherwise, it returns Nothing. The function pattern-matches on those two cases and returns the appropriate value in each case.
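If you haven't used readMaybe before, this minimal sketch shows the same behaviour with Int instead of ZonedTime. The name parseInt is a hypothetical helper, used here only to pin down the target type.

```haskell
import Text.Read (readMaybe)

-- readMaybe is polymorphic in its result; the type signature selects Int,
-- just as the signature of validateDate selects ZonedTime.
parseInt :: String -> Maybe Int
parseInt = readMaybe
```

With this definition, parseInt "42" returns Just 42, whereas parseInt "4x" returns Nothing, because the whole string must be readable as the target type.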
Notice that errors are represented as a list of String values, although this particular function only returns a single message in its list of error messages. The reason for that is that you should be able to collect multiple validation issues for a complex value such as ReservationJson, and keeping track of errors in a list makes that possible.
Haskell golfers may argue that this implementation is overly verbose, and it could, for instance, instead be written as:
validateDate = Validation . maybe (Left ["Not a date."]) Right . readMaybe
which is true, but not as readable. Both versions get the job done, though, as these GHCi-based ad-hoc tests demonstrate:
λ> validateDate "2017/27/06 18:30:00 UTC+2"
Validation (Left ["Not a date."])
λ> validateDate "2017-06-27 18:30:00+02:00"
Validation (Right 2017-06-27 18:30:00 +0200)
That takes care of parsing dates. On to the next validation function.
Quantity validation #
JSON numbers aren't guaranteed to be integers, so it's possible that even a well-formed Reservation JSON document could contain a quantity property of 9.7, -11.9463, or similar. When handling restaurant reservations, however, it only makes sense to handle positive integers. Even 0 is useless in this context. Thus, validation must check for two conditions, so in principle, you could write two separate functions for that. In order to keep the example simple, though, I've included both tests in the same function:
validateQuantity :: Double -> Validation [String] Int
validateQuantity candidate =
  if isInt candidate && candidate > 0
    then Validation $ Right $ round candidate
    else Validation $ Left ["Not a positive integer."]
  where
    isInt x = x == fromInteger (round x)
If candidate is both an integer and greater than zero, then validateQuantity returns Right; otherwise, it returns a Left value containing an error message. As with validateDate, you can easily test validateQuantity in GHCi:
λ> validateQuantity 4
Validation (Right 4)
λ> validateQuantity (-1)
Validation (Left ["Not a positive integer."])
λ> validateQuantity 2.32
Validation (Left ["Not a positive integer."])
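For the curious, here's a rough sketch of what the two-function alternative mentioned above could look like. Everything in it is an assumption made for illustration: the names validateWhole, validatePositive, and validateQuantity', the error messages, and the use of plain Either instead of Validation. The second check only makes sense once the first has produced an Int, so the two steps are chained sequentially with >=>.

```haskell
import Control.Monad ((>=>))

-- Hypothetical first step: accept only numbers without a fractional part.
validateWhole :: Double -> Either [String] Int
validateWhole candidate
  | candidate == fromIntegral n = Right n
  | otherwise = Left ["Not an integer."]
  where
    n = round candidate :: Int

-- Hypothetical second step: accept only integers greater than zero.
validatePositive :: Int -> Either [String] Int
validatePositive n
  | n > 0 = Right n
  | otherwise = Left ["Not positive."]

-- Kleisli composition: run the second step only if the first succeeds.
validateQuantity' :: Double -> Either [String] Int
validateQuantity' = validateWhole >=> validatePositive
```

With these definitions, validateQuantity' 4 returns Right 4, validateQuantity' 2.32 returns Left ["Not an integer."], and validateQuantity' (-1) returns Left ["Not positive."]. Since the steps depend on each other, stopping at the first failure is the desired behaviour here.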
Perhaps you can think of rules for names, but I can't, so we'll leave the name be and move on to validating email addresses.
Email validation #
It's notoriously difficult to validate SMTP addresses, so you shouldn't even try. It seems fairly safe to assume, however, that an email address must contain at least one @ character, so that's going to be all the validation you have to implement:
validateEmail :: String -> Validation [String] String
validateEmail candidate =
  if '@' `elem` candidate
    then Validation $ Right candidate
    else Validation $ Left ["Not an email address."]
Straightforward. Try it out in GHCi:
λ> validateEmail "foo"
Validation (Left ["Not an email address."])
λ> validateEmail "foo@example.org"
Validation (Right "foo@example.org")
Indeed, that works.
Applicative composition #
What you really should be doing is to validate a ReservationJson value. You have the three validation rules implemented, so now you have to compose them. There is, however, a catch: you must evaluate all rules, and return a list of all the errors you encountered. That's probably going to be a better user experience, since the user can fix every problem in one go.
That's the reason you can't use Either. While it's Applicative, it doesn't behave like you'd like it to behave in this scenario. Particularly, the problem is that it throws away all but the first Left value it finds:
λ> Right (,,) <*> Right 42 <*> Left "foo" <*> Left "bar"
Left "foo"
Notice how Left "bar" is ignored.
With the new type Validation based on Either, you can now define how it behaves as an applicative functor:
instance Monoid m => Applicative (Validation m) where
  pure = Validation . pure
  Validation (Left x) <*> Validation (Left y) = Validation (Left (mappend x y))
  Validation f <*> Validation r = Validation (f <*> r)
This instance is restricted to Left types that are Monoid instances. It has special behaviour for the case where both expressions passed to <*> are Left values. In that case, it uses mappend (from Monoid) to 'add' the two Left values together in a new Left value.
For all other cases, this instance of Applicative delegates to the behaviour defined for Either. It also uses pure from Either to implement its own pure function.
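To see the four possible combinations side by side, consider this small sketch. It repeats the type and the instance so that it compiles on its own; the values okF, failF, okX, and failX, as well as their error messages, are hypothetical and exist only for this illustration.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Validation e r = Validation (Either e r) deriving (Eq, Show, Functor)

instance Monoid m => Applicative (Validation m) where
  pure = Validation . pure
  Validation (Left x) <*> Validation (Left y) = Validation (Left (mappend x y))
  Validation f <*> Validation r = Validation (f <*> r)

-- A valid and an invalid 'function' value.
okF, failF :: Validation [String] (Int -> Int)
okF = Validation (Right (+ 1))
failF = Validation (Left ["bad function"])

-- A valid and an invalid 'argument' value.
okX, failX :: Validation [String] Int
okX = Validation (Right 41)
failX = Validation (Left ["bad argument"])

-- Both sides are Right: the function is applied.
bothOk :: Validation [String] Int
bothOk = okF <*> okX

-- Exactly one side is Left: that Left is returned (Either's behaviour).
rightFails, leftFails :: Validation [String] Int
rightFails = okF <*> failX
leftFails = failF <*> okX

-- Both sides are Left: the errors are combined with mappend.
bothFail :: Validation [String] Int
bothFail = failF <*> failX
```

Here, bothOk is Validation (Right 42), rightFails is Validation (Left ["bad argument"]), leftFails is Validation (Left ["bad function"]), and bothFail is Validation (Left ["bad function","bad argument"]). Only the last case differs from what Either would do on its own.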
Lists ([]) form a monoid, and since all the above validation functions return lists of errors, it means that you can compose them using this definition of Applicative:
validateReservation :: ReservationJson -> Validation [String] Reservation
validateReservation candidate =
  pure Reservation <*> vDate <*> vQuantity <*> vName <*> vEmail
  where
    vDate = validateDate $ jsonDate candidate
    vQuantity = validateQuantity $ jsonQuantity candidate
    vName = pure $ jsonName candidate
    vEmail = validateEmail $ jsonEmail candidate
The candidate is a ReservationJson value, but each of the validation functions works on either String or Double, so you'll have to use the ReservationJson type's access functions (jsonDate, jsonQuantity, and so on) to pull the relevant values out of it. Once you have those, you can pass them as arguments to the appropriate validation function.
Since there's no rule for jsonName, you can use pure to create a Validation value. All four resulting values (vDate, vQuantity, vName, and vEmail) are Validation [String] values; only their Right types differ.
The Reservation record constructor is a function of the type ZonedTime -> Int -> String -> String -> Reservation, so when you arrange the four v* values in the correct order between the <*> operators, you have the desired composition.
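Because pure f <*> x is, by the applicative laws, the same as f <$> x, you'll often see such compositions start with <$> instead of pure. The following sketch compares the two spellings. To stay self-contained without the time library, it uses a hypothetical two-field Booking type instead of Reservation; the names Booking, withPure, and withFmap are assumptions made for this illustration, while the type, the instance, and the two validation functions are repeated from above.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Validation e r = Validation (Either e r) deriving (Eq, Show, Functor)

instance Monoid m => Applicative (Validation m) where
  pure = Validation . pure
  Validation (Left x) <*> Validation (Left y) = Validation (Left (mappend x y))
  Validation f <*> Validation r = Validation (f <*> r)

validateQuantity :: Double -> Validation [String] Int
validateQuantity candidate =
  if isInt candidate && candidate > 0
    then Validation $ Right $ round candidate
    else Validation $ Left ["Not a positive integer."]
  where
    isInt x = x == fromInteger (round x)

validateEmail :: String -> Validation [String] String
validateEmail candidate =
  if '@' `elem` candidate
    then Validation $ Right candidate
    else Validation $ Left ["Not an email address."]

-- Hypothetical, simplified stand-in for Reservation.
data Booking = Booking
  { bookingQuantity :: Int
  , bookingEmail :: String
  } deriving (Eq, Show)

-- The style used in this article: lift the constructor with pure.
withPure :: Double -> String -> Validation [String] Booking
withPure q e = pure Booking <*> validateQuantity q <*> validateEmail e

-- The equivalent spelling: map the constructor over the first value.
withFmap :: Double -> String -> Validation [String] Booking
withFmap q e = Booking <$> validateQuantity q <*> validateEmail e
```

Both functions return the same result for the same input. For example, both withPure (-1) "foo" and withFmap (-1) "foo" evaluate to Validation (Left ["Not a positive integer.","Not an email address."]). Which spelling you prefer is mostly a matter of taste.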
Try it in GHCi:
λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" 4 "Jane Doe" "j@example.com"
Validation (Right (Reservation { reservationDate = 2017-06-30 19:00:00 +0200, reservationQuantity = 4, reservationName = "Jane Doe", reservationEmail = "j@example.com"}))
λ> validateReservation $ ReservationJson "2017/14/12 6pm" 4.1 "Jane Doe" "jane.example.com"
Validation (Left ["Not a date.","Not a positive integer.","Not an email address."])
λ> validateReservation $ ReservationJson "2017-06-30 19:00:00+02:00" (-3) "Jane Doe" "j@example.com"
Validation (Left ["Not a positive integer."])
The first ReservationJson value passed to validateReservation is valid, so the return value is a Right value.
The next ReservationJson value is about as wrong as it can be, so three different error messages are returned in a Left value. This demonstrates that Validation doesn't give up the first time it encounters a Left value, but rather collects them all.
The third example demonstrates that even a single invalid value (in this case a negative quantity) is enough to make the entire input invalid, but as expected, there's only a single error message.
Summary #
Validation may be the poster child of applicative functors, but it is a convenient way to solve the problem. In this article you saw how to validate a complex data type, collecting and reporting on all problems, if any.
In order to collect all errors, instead of immediately short-circuiting on the first error, you have to deviate from the standard Either implementation of <*>. If you go back to read Scott Wlaschin's article, you should be aware that it specifically implements its applicative functor in that way, instead of the normal behaviour of Either.
More applicative functors exist. This article series has, I think, room for more examples.