Validating or verifying emails

On separating preconditions from business rules.

My recent article Validation and business rules elicited this question:

"Regarding validation should be pure function, lets have user registration as an example, is checking the email address uniqueness a validation or a business rule? It may not be pure since the check involves persistence mechanism."

Cherif BOUCHELAGHEM

This is a great opportunity to examine some heuristics in greater detail. As always, this mostly presents how I think about problems like this, and so doesn't represent any rigid universal truth.

The specific question is easily answered, but when the topic is email addresses and validation, I foresee several follow-up questions that I also find interesting.

Uniqueness constraint #

A new user signs up for a system, and as part of the registration process, you want to verify that the email address is unique. Is that validation or a business rule?

Again, I'm going to put the cart before the horse and first use the definition to answer the question.

Validation is a pure function that decides whether data is acceptable.

Validation and business rules

Can you implement the uniqueness constraint with a pure function? Not easily. What most systems would do, I'm assuming, is to keep track of users in some sort of data store. This means that in order to check whether or not a email address is unique, you'd have to query that database.

Querying a database is non-deterministic because you could be making multiple subsequent queries with the same input, yet receive differing responses. In this particular example, imagine that you ask the database whether ann.siebel@example.com is already registered, and the answer is no, that address is new to us.

Database queries are snapshots in time. All that answer tells you is that at the time of the query, the address would be unique in your database. While that answer travels over the network back to your code, a concurrent process might add that very address to the database. Thus, the next time you ask the same question: Is ann.siebel@example.com already registered? the answer would be: Yes, we already know of that address.

Verifying that the address is unique (most likely) involves an impure action, and so according to the above definition isn't a validation step. By the law of the the excluded middle, then, it must be a business rule.

Using a different rule of thumb, Robert C. Martin arrives at the same conclusion:

"Uniqueness is semantic not syntactic, so I vote that uniqueness is a business rule not a validation rule."

Robert C. Martin

This highlights a point about this kind of analysis. Using functional purity is a heuristic shortcut to sorting verification problems. Those that are deterministic and have no side effects are validation problems, and those that are either non-deterministic or have side effects are not.

Being able to sort problems in this way is useful because it enables you to choose the right tool for the job, and to avoid the wrong tool. In this case, trying to address the uniqueness constraint with validation is likely to cause trouble.

Why is that? Because of what I already described. A database query is a snapshot in time. If you make a decision based on that snapshot, it may be the wrong decision once you reach a conclusion. Granted, when discussing user registration, the risk of several processes concurrently trying to register the same email address probably isn't that big, but in other domains, contention may be a substantial problem.

Being able to identify a uniqueness constraint as something that isn't validation enables you to avoid that kind of attempted solution. Instead, you may contemplate other designs. If you keep users in a relational database, the easiest solution is to put a uniqueness constraint on the Email column and let the database deal with the problem. Just be prepared to handle the exception that the INSERT statement may generate.

If you have another kind of data store, there are other ways to model the constraint. You can even do so using lock-free architectures, but that's out of scope for this article.

Validation checks preconditions #

Encapsulation is an important part of object-oriented programming (and functional programming as well). As I've often outlined, I base my understanding of encapsulation on Object-Oriented Software Construction. I consider contract (preconditions, invariants, and postconditions) essential to encapsulation.

I'll borrow a figure from my article Can types replace validation?:

An arrow labelled 'validation' pointing from a document to the left labelled 'Data' to a box to the right labelled 'Type'.

The role of validation is to answer the question: Does the data make sense?

This question, and its answer, is typically context-dependent. What 'makes sense' means may differ. This is even true for email addresses.

When I wrote the example code for my book Code That Fits in Your Head, I had to contemplate how to model email addresses. Here's an excerpt from the book:

Email addresses are notoriously difficult to validate, and even if you had a full implementation of the SMTP specification, what good would it do you?

Users can easily give you a bogus email address that fits the spec. The only way to really validate an email address is to send a message to it and see if that provokes a response (such as the user clicking on a validation link). That would be a long-running asynchronous process, so even if you'd want to do that, you can't do it as a blocking method call.

The bottom line is that it makes little sense to validate the email address, apart from checking that it isn't null. For that reason, I'm not going to validate it more than I've already done.

Code That Fits in Your Head, p. 102

In this example, I decided that the only precondition I would need to demand was that the email address isn't null. This was motivated by the operations I needed to perform with the email address - or rather, in this case, the operations I didn't need to perform. The only thing I needed to do with the address was to save it in a database and send emails:

public async Task EmailReservationCreated(int restaurantId, Reservation reservation)
{
    if (reservation is null)
        throw new ArgumentNullException(nameof(reservation));
 
    var r = await RestaurantDatabase.GetRestaurant(restaurantId).ConfigureAwait(false);
 
    var subject = $"Your reservation for {r?.Name}.";
    var body = CreateBodyForCreated(reservation);
    var email = reservation.Email.ToString();
 
    await Send(subject, body, email).ConfigureAwait(false);
}

This code example suggests why I made it a precondition that Email mustn't be null. Had null be allowed, I would have had to resort to defensive coding, which is exactly what encapsulation makes redundant.

Validation is a process that determines whether data is useful in a particular context. In this particular case, all it takes is to check the Email property on the DTO. The sample code that comes with Code That Fits in Your Head shows the basics, while An applicative reservation validation example in C# contains a more advanced solution.

Preconditions are context-dependent #

I would assume that a normal user registration process has little need to validate an ostensible email address. A system may want to verify the address, but that's a completely different problem. It usually involves sending an email to the address in question and have some asynchronous process register if the user verifies that email. For an article related to this problem, see Refactoring registration flow to functional architecture.

Perhaps you've been reading this with mounting frustration: How about validating the address according to the SMTP spec?

Indeed, that sounds like something one should do, but turns out to be rarely necessary. As already outlined, users can easily supply a bogus address like foo@bar.com. It's valid according to the spec, and so what? How does that information help you?

In most contexts I've found myself, validating according to the SMTP specification is a distraction. One might, however, imagine scenarios where it might be required. If, for example, you need to sort addresses according to user name or host name, or perform some filtering on those parts, etc. it might be warranted to actually require that the address is valid.

This would imply a validation step that attempts to parse the address. Once again, parsing here implies translating less-structured data (a string) to more-structured data. On .NET, I'd consider using the MailAddress class which already comes with built-in parser functions.

The point being that your needs determine your preconditions, which again determine what validation should do. The preconditions are context-dependent, and so is validation.

Conclusion #

Email addresses offer a welcome opportunity to discuss the difference between validation and verification in a way that is specific, but still, I hope, easy to extrapolate from.

Validation is a translation from one (less-structured) data format to another. Typically, the more-structured data format is an object, a record, or a hash map (depending on language). Thus, validation is determined by two forces: What the input data looks like, and what the desired object requires; that is, its preconditions.

Validation is always a translation with the potential for error. Some input, being less-structured, can't be represented by the more-structured format. In addition to parsing, a validation function must also be able to fail in a composable matter. That is, fortunately, a solved problem.

Published: Monday, 03 July 2023 05:41:00 UTC

Validating or verifying emails by Mark Seemann

Uniqueness constraint #

Validation checks preconditions #

Preconditions are context-dependent #

Conclusion #

Wish to comment?

Published

Tags