Danish CPR numbers in F# by Mark Seemann
An example of domain-modelling in F#, including a fine example of using the option type as an applicative functor.
This article is an instalment in an article series about applicative functors, although the applicative functor example doesn't appear until towards the end. This article also serves the purpose of showing an example of Domain-Driven Design in F#.
Danish personal identification numbers
As outlined in the previous article, in Denmark, everyone has a personal identification number, in Danish called CPR-nummer (CPR number).
CPR numbers have a simple format: DDMMYY-SSSS, where the first six digits indicate a person's birth date, and the last four digits are a sequence number. Some information, however, is also embedded in the sequence number. An example could be 010203-1234, which indicates a woman born February 1, 1903.
One way to model this in F# is with a single-case discriminated union:
    type CprNumber = private CprNumber of (int * int * int * int) with
        override x.ToString () =
            let (CprNumber (day, month, year, sequenceNumber)) = x
            sprintf "%02d%02d%02d-%04d" day month year sequenceNumber
This is a common idiom in F#. In object-oriented design with e.g. C# or Java, you'd typically create a class and put guard clauses in its constructor. This would prevent a user from initialising an object with invalid data (such as 401500-1234). While you can create classes in F# as well, a single-case union with a private case constructor can achieve the same degree of encapsulation.
In this case, I decided to use a quadruple (4-tuple) as the internal representation, but this isn't visible to users. This gives me the option to refactor the internal representation, if I need to, without breaking existing clients.
Creating CPR number values
Since the CprNumber case constructor is private, you can't just create new values like this:
    let cpr = CprNumber (1, 1, 1, 1118)
If you're outside the Cpr module that defines the type, this doesn't compile. This is by design, but obviously you need a way to create values. For convenience, input values for day, month, and so on, are represented as ints, which can be zero, negative, or way too large for CPR numbers. There's no way to statically guarantee that you can create a value, so you'll have to settle for a tryCreate function; i.e. a function that returns Some CprNumber if the input is valid, or None if it isn't. In Haskell, this pattern is called a smart constructor.
There are a couple of rules to check. All integer values must fall in certain ranges. Days must be between 1 and 31, months must be between 1 and 12, and so on. One way to enable succinct checks like that is with an active pattern:
    let private (|Between|_|) min max candidate =
        if min <= candidate && candidate <= max
        then Some candidate
        else None
Straightforward: return Some candidate if candidate is between min and max (both included); otherwise, None. This enables you to pattern-match input integer values to particular ranges.
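As a quick illustration of how the pattern reads at a call site, here's a small sketch that classifies a month number. The describeMonth function and its labels are made up for this example; only Between comes from the code above (without the private modifier, so that the snippet stands alone).

```fsharp
// Same active pattern as above, minus 'private' so the snippet stands alone.
let (|Between|_|) min max candidate =
    if min <= candidate && candidate <= max
    then Some candidate
    else None

// Hypothetical helper, not part of the Cpr module.
// int -> string
let describeMonth month =
    match month with
    | Between 1 3 _ -> "first quarter"
    | Between 4 6 _ -> "second quarter"
    | Between 7 9 _ -> "third quarter"
    // The last argument is an ordinary pattern, so it can also bind the value.
    | Between 10 12 m -> sprintf "fourth quarter (month %d)" m
    | _ -> "not a month"
```

The first two arguments are the inclusive bounds; the final position is where the matched value ends up, which is why most cases simply discard it with a wild card.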
Perhaps you've already noticed that years are represented with only two digits. CPR is an old system (from 1968), and back then, bits were expensive. No reason to waste bits on recording the millennium or century in which people were born. It turns out, after all, that there's a way to at least partially figure out the century in which people were born. The first digit of the sequence number contains that information:
    // int -> int -> int
    let private calculateFourDigitYear year sequenceNumber =
        let centuryDigit = sequenceNumber / 1000 // Integer division

        // Table from https://da.wikipedia.org/wiki/CPR-nummer
        match centuryDigit, year with
        | Between 0 3 _,              _ -> 1900
        | 4            , Between 0 36 _ -> 2000
        | 4            ,              _ -> 1900
        | Between 5 8 _, Between 0 57 _ -> 2000
        | Between 5 8 _,              _ -> 1800
        | _            , Between 0 36 _ -> 2000
        | _                             -> 1900
        + year
As the code comment informs the reader, there's a table that defines the century, based on the two-digit year and the first digit of the sequence number. Note that birth dates in the nineteenth century are possible. No Danes born before 1900 are alive any longer, but at the time the CPR system was introduced, one person in the system was born in 1863!
The calculateFourDigitYear function starts by pulling the first digit out of the sequence number. This is a four-digit number, so dividing by 1,000 produces the digit. I left a comment about integer division, because I often miss that detail when I read code.
The big pattern-match expression uses the Between active pattern, but it ignores the return value from the pattern. This explains the wild cards (_), I hope.
Although a pattern-match expression is often formatted over several lines of code, it's a single expression that produces a single value. Often, you see code where a let-binding binds a named value to a pattern-match expression. Another occasional idiom is to pipe a pattern-match expression to a function. In the calculateFourDigitYear function I use a language construct I've never seen anywhere else: the eight-line pattern-match expression returns an int (the century), which I simply add to year using the + operator.
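To make the table concrete, here's a sketch that evaluates the function for a few inputs. It's a variant of the code above rather than a copy: it binds the century to a name before adding the two-digit year, which should compute the same value. The sample inputs are arbitrary, chosen to hit different rows of the table.

```fsharp
let (|Between|_|) min max candidate =
    if min <= candidate && candidate <= max
    then Some candidate
    else None

// Variant of calculateFourDigitYear that names the century before adding
// the two-digit year. The table is the same as in the article.
// int -> int -> int
let calculateFourDigitYear year sequenceNumber =
    let centuryDigit = sequenceNumber / 1000 // Integer division
    let century =
        match centuryDigit, year with
        | Between 0 3 _, _ -> 1900
        | 4, Between 0 36 _ -> 2000
        | 4, _ -> 1900
        | Between 5 8 _, Between 0 57 _ -> 2000
        | Between 5 8 _, _ -> 1800
        | _, Between 0 36 _ -> 2000
        | _ -> 1900
    century + year

let examples =
    [ calculateFourDigitYear 3 1234    // century digit 1
      calculateFourDigitYear 5 4000    // century digit 4, year <= 36
      calculateFourDigitYear 50 4000   // century digit 4, year > 36
      calculateFourDigitYear 63 6221   // century digit 6, year > 57
      calculateFourDigitYear 20 9000 ] // century digit 9, year <= 36
// examples = [1903; 2005; 1950; 1863; 2020]
```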
Both calculateFourDigitYear and the Between active pattern are private functions. They're only there as support functions for the public API. You can now implement a tryCreate function:
    // int -> int -> int -> int -> CprNumber option
    let tryCreate day month year sequenceNumber =
        match month, year, sequenceNumber with
        | Between 1 12 m, Between 0 99 y, Between 0 9999 s ->
            let fourDigitYear = calculateFourDigitYear y s
            if 1 <= day && day <= DateTime.DaysInMonth (fourDigitYear, m)
            then Some (CprNumber (day, m, y, s))
            else None
        | _ -> None
The tryCreate function begins by pattern-matching a triple (3-tuple) using the Between active pattern. The month must always be between 1 and 12 (both included), the year must be between 0 and 99, and the sequenceNumber must always be between 0 and 9999 (in fact, I'm not completely sure if 0000 is valid).
Finding the appropriate range for the day is more intricate. Is 31 always valid? Clearly not, because there's no November 31, for example. Is 30 always valid? No, because there's never a February 30. Is 29 valid? This depends on whether or not the year is a leap year.
This reveals why you need calculateFourDigitYear. While you can use DateTime.DaysInMonth to figure out how many days a given month has, you need the year. Specifically, February 1900 had 28 days, while February 2000 had 29.
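The difference is easy to check directly against the Base Class Library. The value names below are only for illustration:

```fsharp
open System

// 1900 is divisible by 100 but not by 400, so it isn't a leap year; 2000 is.
let february1900 = DateTime.DaysInMonth (1900, 2)  // 28
let february2000 = DateTime.DaysInMonth (2000, 2)  // 29
let november2018 = DateTime.DaysInMonth (2018, 11) // 30
```

The two-digit year 00 alone can't distinguish these cases, which is why the century has to be derived from the sequence number first.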
Ergo, if day, month, year, and sequenceNumber all fall within their appropriate ranges, tryCreate returns a Some CprNumber value; otherwise, it returns None.
Notice how this is different from an object-oriented constructor with guard clauses. If you try to create an object with invalid input, the constructor throws an exception. If you try to create a CprNumber value, you'll receive a CprNumber option, and you, as the client developer, must handle both the Some and the None case. The compiler will enforce this.
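Here's a sketch of what that looks like from the client's side. To keep the snippet self-contained, it repeats a condensed copy of the code shown so far inside a Cpr module (with the century bound to a name before the year is added); the describe function and its messages are hypothetical client code, not part of the module.

```fsharp
open System

module Cpr =
    type CprNumber = private CprNumber of (int * int * int * int) with
        override x.ToString () =
            let (CprNumber (day, month, year, sequenceNumber)) = x
            sprintf "%02d%02d%02d-%04d" day month year sequenceNumber

    let private (|Between|_|) min max candidate =
        if min <= candidate && candidate <= max then Some candidate else None

    let private calculateFourDigitYear year sequenceNumber =
        let century =
            match sequenceNumber / 1000, year with
            | Between 0 3 _, _ -> 1900
            | 4, Between 0 36 _ -> 2000
            | 4, _ -> 1900
            | Between 5 8 _, Between 0 57 _ -> 2000
            | Between 5 8 _, _ -> 1800
            | _, Between 0 36 _ -> 2000
            | _ -> 1900
        century + year

    let tryCreate day month year sequenceNumber =
        match month, year, sequenceNumber with
        | Between 1 12 m, Between 0 99 y, Between 0 9999 s ->
            let fourDigitYear = calculateFourDigitYear y s
            if 1 <= day && day <= DateTime.DaysInMonth (fourDigitYear, m)
            then Some (CprNumber (day, m, y, s))
            else None
        | _ -> None

// Hypothetical client code, outside the Cpr module.
// Both cases must be handled before a string can be returned.
let describe day month year sequenceNumber =
    match Cpr.tryCreate day month year sequenceNumber with
    | Some cpr -> sprintf "valid: %s" (string cpr)
    | None -> "invalid"

let leapDay2000 = describe 29 2 0 4000 // "valid: 290200-4000"
let leapDay1900 = describe 29 2 0 1234 // "invalid"
```

The two sample calls differ only in the sequence number: 4000 places year 00 in 2000, a leap year, while 1234 places it in 1900, which had no February 29.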
    > let gjern = Cpr.tryCreate 11 11 11 1118;;
    val gjern : Cpr.CprNumber option = Some 111111-1118

    > gjern |> Option.map Cpr.born;;
    val it : DateTime option = Some 11.11.1911 00:00:00
As most F# developers know, F# gives you enough syntactic sugar to make this a joy rather than a chore... and the warm and fuzzy feeling of safety is priceless.
CPR data
The above FSI session uses Cpr.born, which you haven't seen yet. With the tools available so far, it's trivial to implement; all the work is already done:
    // CprNumber -> DateTime
    let born (CprNumber (day, month, year, sequenceNumber)) =
        DateTime (calculateFourDigitYear year sequenceNumber, month, day)
While the CprNumber case constructor is private, it's still available from inside of the module. The born function pattern-matches day, month, year, and sequenceNumber out of its input argument, and trivially delegates the hard work to calculateFourDigitYear.
Another piece of data you can deduce from a CPR number is the gender of the person:
    // CprNumber -> bool
    let isFemale (CprNumber (_, _, _, sequenceNumber)) = sequenceNumber % 2 = 0
    let isMale   (CprNumber (_, _, _, sequenceNumber)) = sequenceNumber % 2 <> 0
The rule is that if the sequence number is even, then the person is female; otherwise, the person is male (and if you change sex, you get a new CPR number).
    > gjern |> Option.map Cpr.isFemale;;
    val it : bool option = Some true

Since 1118 is even, this is a woman.
Parsing CPR strings
CPR numbers are often passed around as text, so you'll need to be able to parse a string representation. As described in the previous article, you should follow Postel's law. Input could include extra white space, and the middle dash could be missing.
The .NET Base Class Library contains enough utility methods working on string values that this isn't going to be particularly difficult. It can, however, be awkward to interoperate with object-oriented APIs, so you'll often benefit from adding a few utility functions that give you curried functions instead of objects with methods. Here's one that adapts Int32.TryParse:
    module private Int =
        // string -> int option
        let tryParse candidate =
            match candidate |> Int32.TryParse with
            | true, i -> Some i
            | _       -> None
Nothing much goes on here. While F# has pleasant syntax for handling out parameters, it can be inconvenient to have to pattern-match every time you'd like to try to parse an integer.
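A short usage sketch may show why the adapter pays off. The module below is a public copy of the one above, written with an explicit string annotation and a direct method call instead of the pipe; the sample strings and value names are made up.

```fsharp
open System

module Int =
    // string -> int option
    let tryParse (candidate : string) =
        match Int32.TryParse candidate with
        | true, i -> Some i
        | _ -> None

let parsed = "0109" |> Int.tryParse // Some 109
let failed = "01a9" |> Int.tryParse // None

// Because tryParse returns an option, it composes with Option.bind when the
// input is itself optional:
let fromSome = Some "63" |> Option.bind Int.tryParse // Some 63
let fromNone = None |> Option.bind Int.tryParse      // None
```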
Here's another helper function:
    module private String =
        // int -> int -> string -> string option
        let trySubstring startIndex length (s : string) =
            if s.Length < startIndex + length
            then None
            else Some (s.Substring (startIndex, length))
This one comes with two benefits: The first benefit is that it's curried, which enables partial application and piping. You'll see an example of this further down. The second benefit is that it handles at least one error condition in a type-safe manner. When trying to extract a sub-string from a string, the Substring method can throw an exception if the index or length arguments are out of range. This function checks whether it can extract the requested sub-string, and returns None if it can't.
I wouldn't be surprised if there are edge cases (for example involving negative integers) that trySubstring doesn't handle gracefully, but as you may have noticed, this is a function in a private module. I only need it to handle a particular use case, and it does that.
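The following sketch exercises both the intended use and one such edge case. The module is a public copy of the one above; tryFirstTwo and the other value names are invented for the illustration.

```fsharp
module String =
    // int -> int -> string -> string option
    let trySubstring startIndex length (s : string) =
        if s.Length < startIndex + length
        then None
        else Some (s.Substring (startIndex, length))

// Partial application gives a reusable 'first two characters' function.
let tryFirstTwo = String.trySubstring 0 2

let day = "0109636221" |> tryFirstTwo             // Some "01"
let tooShort = "0" |> tryFirstTwo                 // None
let missing = "010963" |> String.trySubstring 6 4 // None

// A negative start index slips past the length check, so Substring throws.
let negativeIndexThrows =
    try
        "0109636221" |> String.trySubstring (-1) 2 |> ignore
        false
    with :? System.ArgumentOutOfRangeException -> true
```

Since tryParse only ever calls the function with fixed, non-negative indices, the unguarded case can't be reached from the public API.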
You can now add the tryParse function:
    // string -> CprNumber option
    let tryParse (candidate : string) =
        let (<*>) fo xo = fo |> Option.bind (fun f -> xo |> Option.map f)

        let canonicalized = candidate.Trim().Replace("-", "")

        let dayCandidate            = canonicalized |> String.trySubstring 0 2
        let monthCandidate          = canonicalized |> String.trySubstring 2 2
        let yearCandidate           = canonicalized |> String.trySubstring 4 2
        let sequenceNumberCandidate = canonicalized |> String.trySubstring 6 4

        Some tryCreate
        <*> Option.bind Int.tryParse dayCandidate
        <*> Option.bind Int.tryParse monthCandidate
        <*> Option.bind Int.tryParse yearCandidate
        <*> Option.bind Int.tryParse sequenceNumberCandidate
        |> Option.bind id
The function starts by defining a private <*> operator. Readers of the applicative functor article series will recognise this as the 'apply' operator. The reason I added it as a private operator is that I don't need it anywhere else in the code base, and in F#, I'm always worried that if I add <*> at a more visible level, it could collide with other definitions of <*>, for example one for lists. This one particularly makes option an applicative functor.
The first step in parsing candidate is to remove surrounding white space and the interior dash.
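In isolation, that step looks like this. The canonicalize name is hypothetical; tryParse inlines the same expression.

```fsharp
// Hypothetical helper; in tryParse the same expression is inlined.
// string -> string
let canonicalize (candidate : string) = candidate.Trim().Replace("-", "")

let withDash = canonicalize " 010963-6221 " // "0109636221"
let withoutDash = canonicalize "0109636221" // "0109636221"

// Replace removes every dash, not only the one in the middle, so this
// step is more lenient than the DDMMYY-SSSS format strictly requires.
let manyDashes = canonicalize "01-09-63-6221" // "0109636221"
```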
The next step is to use String.trySubstring to pull out candidates for day, month, and so on. Each of these four is a string option value.
All four of these must be Some values before we can even start to attempt to turn them into a CprNumber value. If even a single value is None, tryParse should return None as well.
You may want to re-read the article on the List applicative functor for a detailed explanation of how the <*> operator works. In tryParse, you have four option values, so you apply them all using four <*> operators. Since four values are being applied, you'll need a function that takes four curried input arguments of the appropriate types. In this case, once Int.tryParse has been bound over each candidate, all four are int option values, so for the first expression in the <*> chain, you'll need an option of a function that takes four int arguments.
Lo and behold! tryCreate takes four int arguments, so the only action you need to take is to make it an option by putting it in a Some case.
The only remaining hurdle is that tryCreate returns CprNumber option, and since you're already 'in' the option applicative functor, you now have a CprNumber option option. Fortunately, bind id is always the 'flattening' combo, so that's easily dealt with.
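The mechanics may be easier to follow with a smaller function. In this sketch, tryDivide is a hypothetical two-argument stand-in for tryCreate: like tryCreate, it takes curried int arguments and returns an option. The operator is the same as in tryParse, only defined at the top level.

```fsharp
// Same 'apply' operator as in tryParse, here defined at the top level.
let (<*>) fo xo = fo |> Option.bind (fun f -> xo |> Option.map f)

// Hypothetical stand-in for tryCreate.
// int -> int -> int option
let tryDivide dividend divisor =
    if divisor = 0 then None else Some (dividend / divisor)

// Applying all arguments leaves a nested option...
let nested = Some tryDivide <*> Some 10 <*> Some 2 // Some (Some 5)
// ...which bind id flattens.
let flattened = nested |> Option.bind id           // Some 5

// A missing argument and a failing function both end up as None.
let missingArg = Some tryDivide <*> None <*> Some 2 |> Option.bind id    // None
let failedCall = Some tryDivide <*> Some 10 <*> Some 0 |> Option.bind id // None
```

Before flattening, the two failure modes are distinguishable (None versus Some None); after bind id, the caller only sees None, which is all tryParse promises.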
    > let andreas = Cpr.tryParse " 0109636221";;
    val andreas : Cpr.CprNumber option = Some 010963-6221
Since you now have both a way to parse a string, and turn a CprNumber into a string, you can write the usual round-trip property:
    [<Fact>]
    let ``CprNumber correctly round-trips`` () = Property.check <| property {
        let! expected = Gen.cprNumber
        let  actual   = expected |> string |> Cpr.tryParse
        Some expected =! actual }
This test uses Hedgehog, Unquote, and xUnit.net. The previous article demonstrates a way to test that Cpr.tryParse can handle mangled input.
Summary
This article mostly exhibited various F# design techniques you can use to achieve an even better degree of encapsulation than you can easily get with object-oriented languages. Towards the end, you saw how using option as an applicative functor enables you to compose more complex optional values from smaller values. If just a single value is None, the entire expression becomes None, but if all values are Some values, the computation succeeds.
This article is an entry in the F# Advent Calendar in English 2018.
Next: The Lazy applicative functor.
Comments
Great post, a very good read! Interestingly enough we recently made an F# implementation for the Swedish personal identification number. In fact v1.0.0 will be published any day now. Interesting to see how the problem with four-digit years is handled differently in Denmark and Sweden.
I really like the Between active pattern of your solution. We did not really take as generic an approach; instead we modeled with types for Year, Month, Day, etc. But I find your solution to be very concise and clear. Also we worked with the Result type instead of Option to be able to provide the client with helpful error messages. For our object-oriented friends we are also exposing a C#-friendly facade, which adds a bit of boilerplate code.
Viktor, thank you for your kind words. The Result (or Either) type does, indeed, provide more information when things go wrong. This can be useful when client code needs to handle different error cases in different ways. Sometimes, it may also be useful, as you write, when you want to provide more helpful error messages. Particularly when it comes to parsing or input validation, Either can be useful.
The main reason I chose to model with option in this article was that I wanted to demonstrate how to use the applicative nature of option, but I suppose I could have equally done so with Result.
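A minimal sketch of what that might look like, under assumptions: the <*> operator below is one possible 'apply' for Result, and tryMonth, tryYear, pair, and the error messages are all invented for the illustration. Note that this particular operator stops at the first Error rather than collecting all of them.

```fsharp
// Hypothetical 'apply' for Result; it propagates the first Error it meets.
let (<*>) fr xr = fr |> Result.bind (fun f -> xr |> Result.map f)

// Hypothetical validators with made-up error messages.
let tryMonth m = if 1 <= m && m <= 12 then Ok m else Error "month out of range"
let tryYear y = if 0 <= y && y <= 99 then Ok y else Error "year out of range"

// Hypothetical two-argument function to lift into Result.
let pair month year = (month, year)

let good : Result<int * int, string> =
    Ok pair <*> tryMonth 9 <*> tryYear 63   // Ok (9, 63)
let bad : Result<int * int, string> =
    Ok pair <*> tryMonth 15 <*> tryYear 63  // Error "month out of range"
let firstErrorWins : Result<int * int, string> =
    Ok pair <*> tryMonth 15 <*> tryYear 100 // Error "month out of range"
```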