Bank OCR kata in F#: user stories 3 and 4

In previous posts, I've walked through the first two user stories of the Bank OCR kata in F#. In this post I'll walk through my implementation of user stories 3 and 4.

The reason I'm collecting both user stories in a single blog post is that the requirements of user story 4 represent a change from user story 3, and since I didn't use a DVCS for this kata, I no longer have my implementation for user story 3.

The core unit test #

As with the previous user stories, I started by writing a Parameterized Test:

[<Theory>]
[<InlineData("
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|", "123456789")>]
[<InlineData("
 _     _  _  _  _  _  _  _ 
 _||_||_ |_||_| _||_||_ |_ 
 _|  | _||_||_||_ |_||_| _|", "345882865")>]
[<InlineData("
 _     _  _  _  _  _       
 _||_||_ |_||_| _||_|  ||_|
 _|  | _||_||_||_ |_|  |  |", "345882814")>]
[<InlineData("
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_||_|", "123456789")>]
[<InlineData("
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_||_|
  | _| _|  | _||_|  ||_||_|", "133456788 AMB ['133456188', '193456788']")>]
[<InlineData("
    _  _     _  _  _  _  _ 
  | _| _||_||_ |_   ||_|| |
  | _| _|  | _||_|  ||_||_|", "133456780 ERR")>]
[<InlineData("
 _     _  _  _  _  _       
 _||_||_ |_||_| _||_|  ||_|
 _|| | _||_||_||_ |_|  |  |", "345882814")>]
[<InlineData("
 _  _        _  _     _  _ 
|_||_   |  || | _| _  _||_ 
|_||_|  |  ||_|   |_| _||_|", "86110??36 ILL")>]
[<InlineData("
 _     _  _  _  _  _       
 _||_||_ |_||_| _||_|  ||_|
 _|  | _||  |_|   |_|  |  |", "345?8?814 ILL")>]
[<InlineData("
 
  |  |  |  |  |  |  |  |  |
  |  |  |  |  |  |  |  |  |", "711111111")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
  |  |  |  |  |  |  |  |  |
  |  |  |  |  |  |  |  |  |", "777777177")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
 _|| || || || || || || || |
|_ |_||_||_||_||_||_||_||_|", "200800000")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
 _| _| _| _| _| _| _| _| _|
 _| _| _| _| _| _| _| _| _|", "333393333")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
|_||_||_||_||_||_||_||_||_|
|_||_||_||_||_||_||_||_||_|", "888888888 AMB ['888886888', '888888880', '888888988']")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
|_ |_ |_ |_ |_ |_ |_ |_ |_ 
 _| _| _| _| _| _| _| _| _|", "555555555 AMB ['559555555', '555655555']")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
|_ |_ |_ |_ |_ |_ |_ |_ |_ 
|_||_||_||_||_||_||_||_||_|\r\n", "666666666 AMB ['686666666', '666566666']")>]
[<InlineData("
 _  _  _  _  _  _  _  _  _ 
|_||_||_||_||_||_||_||_||_|
 _| _| _| _| _| _| _| _| _|", "999999999 AMB ['993999999', '999959999', '899999999']")>]
[<InlineData("
    _  _  _  _  _  _     _ 
|_||_|| || ||_   |  |  ||_ 
  | _||_||_||_|  |  |  | _|", "490067715 AMB ['490067115', '490867715', '490067719']")>]
[<InlineData("
    _  _     _  _  _  _  _ 
 _| _| _||_||_ |_   ||_||_|
  ||_  _|  | _||_|  ||_| _|", "123456789")>]
[<InlineData("
 _     _  _  _  _  _  _    
| || || || || || || ||_   |
|_||_||_||_||_||_||_| _|  |", "000000051")>]
[<InlineData("
    _  _  _  _  _  _     _ 
|_||_|| ||_||_   |  |  | _ 
  | _||_||_||_|  |  |  | _|", "490867715")>]
let ParseToPrintoutReturnsCorrectResult entry expected =
    entry
    |> ParseToPrintout
    |> should equal expected

Once again the test utilizes xUnit.net's data theories feature to decouple the test code from the test data, as well as FsUnit for the assertion DSL.

The test is really simple: it's 85 lines of data and three lines of code which state that the result of piping the entry argument to the ParseToPrintout function should be equal to the expected argument.

Implementation #

The implementation may seem more daunting. At first, I'll post the function in its entirety, and afterwards I'll break it down and walk you through it.

let ParseToPrintout entry =
    let toPrintable opt =
        match opt with
        | Some(d) -> d.ToString()
        | None -> "?"
 
    let formatEntry s =
        s
        |> ParseToDigits
        |> Seq.map toPrintable
        |> String.Concat    
 
    let isLegible potentialDigits = potentialDigits |> Seq.forall Option.isSome
 
    let isValidAndLegible s =
        let potentialDigits = s |> ParseToDigits
 
        (potentialDigits |> isLegible) &&
        (potentialDigits |> skipNones |> IsValid)
 
    if entry |> isValidAndLegible
    then entry |> formatEntry
    else
        let getCandidates chars =
            let withAlternatives c =
                seq {
                    match c with
                    | ' ' -> yield! [ '|'; '_' ]
                    | '|' | '_' -> yield ' '
                    | _ -> yield! []
                    }
 
            chars
            |> Seq.mapi (fun i _ -> chars
                                    |> ReplaceAt i withAlternatives)
            |> Seq.concat
 
        let validCandidates =
            entry.ToCharArray()
            |> getCandidates
            |> Seq.map ToString
            |> Seq.filter isValidAndLegible
            |> Seq.toList
 
        let formatAlternatives alternatives =
            alternatives
            |> Seq.map formatEntry
            |> Seq.map (sprintf "'%s'")
            |> Seq.reduce (sprintf "%s, %s")
 
        match validCandidates with
        | [head] -> head |> formatEntry
        | [] -> if entry |> ParseToDigits |> isLegible
                then entry |> formatEntry |> sprintf "%s ERR"
                else entry |> formatEntry |> sprintf "%s ILL"
        | _ -> validCandidates
                |> formatAlternatives
                |> sprintf "%s AMB [%s]" (formatEntry entry)

The first thing to notice is that the ParseToPrintout function is composed of quite a few helper functions. A thing that I currently don't like so much about F# is that it's necessary to define a function before it can be used, so all the helper functions must appear before the function that defines the general flow.

It's possible to move the helper functions to other modules so that they don't clutter the implementation, but most of the functions defined here seem to me to be a very specialized part of the overall implementation. In this particular code base, I've been working under the assumption that it would be best to define a function in as narrow a scope as possible, but looking at the result, I don't think that was the right decision. In the future, I think I'll move more helper functions to a separate module.

Valid and legible #

As a start, skim past the helper functions. The actual program flow of the function starts here:

if entry |> isValidAndLegible
then entry |> formatEntry

That part of the if/then branch should be fairly easy to understand. If the entry is valid (according to the checksum function from user story 2) and legible, the result is formatted and returned.

The isValidAndLegible function determines whether the entry is... well... valid and legible. The first thing it does is to pipe the string into the ParseToDigits function, which, you'll recall, returns a sequence of integer options.

To figure out whether those potential digits are legible it invokes the isLegible function, which I'll return to shortly.

In order to figure out whether the potential digits are valid, it invokes the IsValid method from user story 2. However, in order to do so, it needs to convert the sequence of integer options to a sequence of integers. It does that by invoking the skipNones helper method, that I defined in a helper module:

let skipNones sequence = Seq.collect Option.toArray sequence

This function takes a sequence of options and returns a sequence of values by skipping all the Nones. It does that by turning each option into an array of 0 or 1 elements and then concatenate all these arrays together to form a new sequence.

This array is piped into the IsValid function, which will only return true if there are exactly nine digits and the checksum is OK.

The isLegible function is simpler because all it needs to do is to verify that all the options are Somes (instead of Nones).

Formatting the entry #

The formatEntry function pipes the entry into the ParseToDigits function, which returns a sequence of integer options. Each option is mapped into a string, replacing each None with a "?".

Finally, the sequence of strings is concatenated into a single string by piping it to the built-in String.Concat method.

Finding candidates #

While formatting the entry is fairly easy if it's valid and legible, it suddenly becomes much harder to attempt to find other candidates if the entry seems invalid. What needs to be done in this case is to iterate through each and every character of the entry and try to replace just a single character at a time to see if it produces a better entry.

According to user story 4, a blank space could be either a vertical pipe or and underscore, while an underscore and a pipe might instead be a blank space. This relationship can be formalized by the withAlternatives function:

let withAlternatives c =
    seq {
        match c with
        | ' ' -> yield! [ '|'; '_' ]
        | '|' | '_' -> yield ' '
        | _ -> yield! []
        }

This function takes as input a single character and returns a sequence of characters. If the input is ' ', the output is a list containing '|' and '_'. On the other hand, if the input is either '|' or '_', the output is a sequence with the single element ' '. For all other characters (such as line breaks), the output is an empty sequence, indicating that no replacement should be attempted.

This defines the alternatives which can be fed into the ReplaceAt helper function:

let ReplaceAt index produceReplacements sequence =
    sequence
    |> Seq.nth index
    |> produceReplacements
    |> Seq.map (fun x -> sequence |> ReplaceElementAt index x)

In this case, the ReplaceAt function takes as input a sequence of characters (the entry string) and a function providing alternatives (withAlternatives) and produces a sequence of sequence of characters where only a single character has been replaced according to the alternatives.

If that sounds difficult, perhaps these unit tests explain the behavior better:

[<Fact>]
let ProduceAllReplacementsAtReturnsCorrectResult1() =
    let actual = ReplaceAt 0 (fun c -> [ 'I'; 'i' ]) "123"
    actual |> Seq.map ToString |> should equalSequence [ "I23"; "i23" ]
 
[<Fact>]
let ProduceAllReplacementsAtReturnsCorrectResult2() =
    let actual = ReplaceAt 1 (fun c -> [ 'V' ]) "153"
    actual |> Seq.map ToString |> should equalSequence [ "1V3" ]

The first unit tests replaces the first (zero-indexed) character in the input string "123" with two alternatives, 'I' and 'i', producing the output sequence of "I23" and "i23".

The second test replaces the second (zero-indexed) character in the input string "153" with the single alternative 'V' to produce the output "1V3".

The ReplaceAt function does this by first using the Seq.nth function to pick the nth character out of the sequence, and then pipe that character into the function that produces the alternatives. This produces a sequence of replacement characters, e.g. '|' and '_' as explained above.

This sequence of replacement characters is then used to produce a sequence of strings where the element at the index is replaced with each of the replacement characters. In order to do that, the original input is piped into the ReplaceElementAt helper function:

let ReplaceElementAt index element sequence =
    let beforeIndex = sequence |> Seq.take index
    let atIndex = element |> Seq.singleton
    let afterIndex = sequence |> Seq.skip (index + 1)
    afterIndex |> Seq.append atIndex |> Seq.append beforeIndex

This function takes a sequence of elements (e.g. a string) and returns another sequence (e.g. another string) by replacing the element at the specified index with the supplied element.

All these helper functions can be combined to produce a sequence of candidate strings:

chars
|> Seq.mapi (fun i _ -> chars
                        |> ReplaceAt i withAlternatives)
|> Seq.concat

Of these candidates, a lot will be invalid, but filtering them to find only the valid candidates is fairly easy:

let validCandidates =
    entry.ToCharArray()
    |> getCandidates
    |> Seq.map ToString
    |> Seq.filter isValidAndLegible
    |> Seq.toList

This final list of candidates is a list instead of a sequence because it enables me to use list pattern matching to deal with the cases of a single, none, or multiple valid alternatives:

match validCandidates with
| [head] -> head |> formatEntry
| [] -> if entry |> ParseToDigits |> isLegible
        then entry |> formatEntry |> sprintf "%s ERR"
        else entry |> formatEntry |> sprintf "%s ILL"
| _ -> validCandidates
        |> formatAlternatives
        |> sprintf "%s AMB [%s]" (formatEntry entry)

If there's only a single valid alternative, the [head] match is triggered and the decomposed head variable is simply piped to the formatEntry function.

If there's no valid alternative, the [] match is triggered, and the final formatting is then based on whether or not the entry is legible or not.

Finally, if there are multiple candidates, each is formatted and appended as ambiguous alternatives to the original (invalid) input.

In case you'd like to take an even closer look at the code, I've attached the entire solution as a zip file: KataBankOCR.zip (299.62 KB)

Comments

Jonty #

After reading this post I decided to try out some F# and it is very interesting. I'm struggling to work out "best practices" though. It would be great to hear your thoughts on this and how to move from a C# mindset to F#. I'm thinking things like how SOLID principles apply and how to structure your code to keep it maintainable.

2012-06-21 11:16 UTC

Mark Seemann #

FWIW, I found the book Real World Functional Programming: With Examples in F# and C# an excellent introduction to functional concepts for object-oriented programmers.

Soon, I'll write a small blog post on one way in which SOLID relates to Functional Programming.

2012-06-21 12:24 UTC

Simon Skov Boisen #

Excellent series of blogposts - how do you run your xunit tests? Do you run them with the xunit gui runner or using TestDriven.net?

I second 'Real World Functional Programming: With Examples in F# and C#', it's a really great book!

One very minor change to one of your helper-functions could be to use the backward pipe operator when combining the sequences in ReplaceElementAt, that way one doesn't need to have the somewhat confusing reverse append-order. Or simply use seq { } and you won't need the atIndex. See following gist for alternatives:

https://gist.github.com/3049158

2012-07-04 19:35 UTC

Mark Seemann #

I use TestDriven.net to run the xUnit.net tests.

Thanks for the alternatives for the ReplaceElementAt function. I was never happy with the readability of my implementation. I like the seq { } alternative best, so I've updated my own source code :)

2012-07-04 19:42 UTC

Published: Monday, 04 June 2012 05:16:35 UTC

Bank OCR kata in F#: user stories 3 and 4 by Mark Seemann

The core unit test #

Implementation #

Valid and legible #

Formatting the entry #

Finding candidates #

Comments

Wish to comment?

Published

Tags