ploeh blog danish software design
Visual Value Verification
Sometimes, the most efficient way to verify the outcome of executing a piece of code is to visualise it.
Recently, I've been working my way through Real World Haskell, and although some of the exercises in the book are exasperating, others are stimulating and engaging. One of the good exercises is to use the Graham Scan algorithm to find the convex hull for a set of points.
This proved to be unexpectedly difficult for me, but I also found the exercise captivating, so I kept at it. My main problems turned out to be related to the algorithm itself, so during the exercise, I temporarily switched to F# in order to work out the kinks of my implementation. This enabled me to focus on the problem itself without also having to fight with an unfamiliar programming language.
Surprisingly, it turned out that one of my biggest problems was that I didn't have a good way to verify my implementation.
Return values #
Since I was approaching the problem with Functional Programming, it ought to be easy to unit test. After all, Functional design is intrinsically testable. My overall function to find the convex hull looks like this:
let inline hull points = // ...
In simplified form, the type of this function is (^a * ^d) list -> (^a * ^d) list
where the ^a
and ^d
generic type arguments have a whole lot of constraints that I don't want to bother you with. In practice, both ^a
and ^d
can be integers, so that the hull function gets the type (int * int) list -> (int * int) list
. In other words: you supply a list of integer points, and you get a list of integer points back.
Here's a simple example:
> hull [(3, 1); (2, 3); (2, 4); (2, 5); (3, 7); (1, 2); (1, 6)];; val it : (int * int) list = [(3, 1); (3, 7); (2, 5); (1, 6); (1, 2)]
Quick! At a glance: is this result correct or incorrect?
How about this result?
> hull [(5, -2); (5, 6); (-4, 7); (-6, 0); (-8, 0); (-2, 5); (-3, -4); (-2, -2); (-9, -7); (2, -9); (4, -2); (2, -10); (4, -10); (4, -9); (2, -10); (3, -9); (8, 2); (-8, -5); (-9, -4); (5, -6); (6, 4); (8, -10); (-5, 0); (5, 9); (-5, -4); (-6, 8); (0, -9); (7, -4); (6, 4); (-8, -5); (-7, -7); (8, -9); (7, -3); (6, 4); (-6, -8); (-4, 4); (-2, -2); (-6, -10); (0, 1); (5, -7); (-5, 4); (5, -5); (6, 4); (0, 7); (5, 5); (-1, -4); (-6, 0); (-9, 3); (5, 6); (-7, 7); (4, -10); (5, -8); (9, -1); (0, -9); (6, 6); (6, -6); (9, 8); (-10, -2); (-3, 2); (-5, -7)];; val it : (int * int) list = [(-6, -10); (2, -10); (4, -10); (8, -10); (9, -1); (9, 8); (5, 9); (-6, 8); (-7, 7); (-9, 3); (-10, -2); (-9, -7)]
(In the first example, the output is incorrect, but in the second, it's correct.)
It's easy enough to write automated unit tests once you know what the expected outcome should be. In this case, my problem was that I didn't have an easy way to calculate if a given list of points was the correct answer or not. After all, I was trying to implement a function that could be used for this purpose, but I needed to know if the function returned the correct values.
In the beginning, I tried to plot the values into Excel, in order to draw them as diagrams, but that soon turned out to be tedious and inefficient.
Then I considered Property-Based Testing, but I couldn't come up with a good set of properties that didn't involve half of the algorithm I was trying to implement.
Visual Value Verification #
The concept of a convex hull is simple, and easy to verify if you can visualise it. That's what I tried to do with Excel, but here my problem was that the process was too cumbersome.
Instead, I decided to pull in FSharp.Charting, because it enabled me to easily visualise the result of calling the hull function. This is all it takes:
open System open FSharp.Charting let inline hullChart points = let hullPoints = hull points let hullChart = let closedHull = hullPoints @ [hullPoints.Head] Chart.Line(closedHull, Name = "Hull") |> Chart.WithStyling(Color = Drawing.Color.Blue) let pointsChart = Chart.Point(points, Name = "Points") |> Chart.WithStyling(Color = Drawing.Color.Black) [hullChart; pointsChart] |> Chart.Combine |> Chart.WithYAxis(MajorGrid = ChartTypes.Grid(Enabled = false)) |> Chart.WithXAxis(MajorGrid = ChartTypes.Grid(Enabled = false))
The signature of the hullChart function is (^a * ^d) list -> FSharp.Charting.ChartTypes.GenericChart
(where, again, the ^a
and ^d
types are generic type arguments with various type constraints; think of them as numbers). It first calls the hull
function with points
. Then it creates a line chart to draw the hull, and a point chart to plot in all the input points. Finally, it combines both charts into a single chart.
With the hullChart function, it's easy to do ad-hoc testing in F# Interactive and visually inspect the results of calling the hull function with various input. At one point, I had a particular problem with my interpretation of the Graham Scan algorithm, and this was easy to see using the hullChart function, which would produce this chart:
With this chart it's easy to see, at a glance, that the calculated hull is concave, and thus not a convex hull. There's an error in the implementation. (This is the first result set that I asked you to evaluate above.)
Struggling on with the exercise, I was able to solve my immediate problem and produce a convex hull from that particular input. Did that mean that I now had the correct implementation, or could there be other errors? I needed more test results before I felt that I had confidence in my implementation.
This, on the other hand, was now easy to get.
First, I could randomly generate points like this:
let randomPoints (r : Random) = [1..r.Next(1, 100)] |> List.map (fun _ -> (r.Next(-10, 10), r.Next(-10, 10)))
For ad-hoc testing, I could now create a random set of points and show the calculated hull:
> (randomPoints (Random()) |> hullChart).ShowChart();;
Immediately, a window would pop up, enabling me to visually verify the calculated hull value. Literally, verification at a glance.
From Visual Value Verification to automated tests #
You may object to this approach because such testing isn't sustainable. Ultimately, we'd like to have a suite of automated tests that can give us a succeed or failure result.
Still, the ability to visually verify the calculated hull values enabled me to produce a set of input points, as well as calculated hull points that I knew to be correct. I could, for example, use the randomPoints function to produce 100 input sets. For each of these 100 input sets, I could visually inspect the diagrams.
Here's an example of six diagrams, instead of 100, just to give you an idea about how quickly you can verify such a set:
If all of the generated diagrams look okay, you know that for at least these 100 sets, the output of the hull function is correct. You can now capture those input values and the corresponding (correct) output values as a parametrised test. Here's an xUnit.net example with five test cases:
// No [<ClassData>] attribute in xUnit.net 2.0 :( type HullDataAttribute() = inherit Xunit.Sdk.DataAttribute () override this.GetData testMethod = // The following input data comes from randomly generated points. // The expected values come from a prototype of the hull function where // the output was subsequently visually inspected by drawing the input // points and the calculated hull on a coordinate system to verify that // the hull prototype function calculated the correct values. seq { yield [| // Points (input): [(3, 1); (3, 7); (2, 5); (2, 4); (1, 6); (2, 3); (1, 2)] // Expected: [(3, 1); (3, 7); (1, 6); (1, 2)] |] yield [| [(1, -4); (2, 5); (1, 3); (1, -3); (1, -2); (0, 4)] [(1, -4); (2, 5); (0, 4)] |] yield [| [(1, 1); (0, 3); (-2, 1); (-4, 3); (5, 2); (3, 2); (5, 5); (2, 5); (1, 3); (1, -3); (1, -2); (7, -4); (-1, 1); (-3, 0); (-5, -2); (1, -4); (0, 1); (0, 4); (3, -3); (6, 1)] [(1, -4); (7, -4); (6, 1); (5, 5); (2, 5); (-4, 3); (-5, -2)] |] yield [| [(-7, -7); (4, -7); (2, 3); (4, 4); (3, 1); (2, -1); (-3, -5); (4, -2); (-1, -7); (-6, 9); (4, 4); (-8, -2); (9, 4); (3, 0); (7, 0); (-7, 3); (0, 9); (4, -7); (-7, -6); (-1, 7); (6, 5); (7, -3); (-8, -8); (-6, -2); (3, 5); (-5, 7); (8, 1); (3, -2); (-9, -4); (-7, 8)] [(-8, -8); (4, -7); (7, -3); (9, 4); (0, 9); (-6, 9); (-7, 8); (-9, -4)] |] yield [| [(3, -3); (-9, -3); (0, 7); (3, 8); (3, -9); (1, 3); (-9, 5); (-4, 9); (-2, -10); (8, -2); (-4, 2); (-7, -9); (-5, -10); (0, 2); (9, -7); (6, -4); (4, 7); (-9, -7); (2, 1); (-3, -5); (-5, -1); (9, 6); (-3, 1); (6, -6); (-5, -4); (-6, 5); (0, 9); (-2, -9); (-6, -10); (-8, -1); (-4, -9); (8, -1); (-5, -5); (9, -6); (4, -8); (-3, 7); (2, 3); (-8, 6); (3, -4); (3, 4); (-6, -5); (-4, 3); (9, -10); (5, 4); (-1, 9); (9, 1); (-1, 7); (8, -7); (1, -1); (0, -9); (2, 1); (0, -8); (8, -3); (-8, 7); (7, 1); (-2, 8); (-4, -2); (-5, -10); (4, -6); (0, -5); (-1, -6); (5, 4); (-7, 6); (-3, 4); (4, 8); (-6, -7); (5, 2); (-9, 2); (5, -6); (4, 2); (7, 8); (7, 7)] [(-6, -10); (-5, -10); (-2, -10); (9, -10); (9, -7); (9, -6); (9, 1); (9, 6); (7, 8); (0, 9); (-1, 9); (-4, 9); (-8, 7); (-9, 5); (-9, 2); (-9, -3); (-9, -7)] |] } [<Theory; HullData>] let ``hull returns correct result`` (points : (int * int) list) (expected : (int * int) list) = let actual = hull points expected =! actual
(The =!
operator is an assertion operator from Unquote; read it as should equal - i.e. expected should equal actual.)
This gives you a deterministic test suite you can run repeatedly to protect the hull function against regressions.
Summary #
Sometimes, the nature of the problem is such that the easiest way to verify that the System Under Test (SUT) produces the correct results, is to visually verify the resulting value of exercising the SUT. We can call this Visual Value Verification (VVV).
In this article, I used the problem of finding the convex hull for a set of points as an example, but I've encountered other problems where this technique has proven useful. The most prominent that comes to mind is when implementing Conway's Game of Life; that's another problem where, just by looking at lists of numbers, you can't easily verify that your implementation is correct.
Once you've visually verified that output looks correct, you can capture the known input and output into a test case that you can run repeatedly.
Command Query Separation when Queries should have side-effects
How can you adhere to Command Query Separation if your Domain Model requires Queries to have side-effects?
Command Query Separation (CQS) can be difficult until you get the hang of it; then it's not so difficult - just like most other things in life :)
In a previous article, I covered how to retrieve server-generated IDs after Create operations. That article discussed how to prevent Commands from turning into Queries. In the present article, you'll see some examples of how to prevent Queries from turning into Commands.
Context #
This article was triggered by a viewer's question related to my Encapsulation and SOLID Pluralsight course. As I interpret it, the hypothetical scenario is some school or university exam taking software:
"If a student has not submitted a solution to an exercise yet, when and if they look at the exercise hint for the first time, flag that hint as viewed. The points granted to a student's solution will be subtracted by 5 points, if the related hint is flagged as viewed."As stated here, it sounds like a Query (reading the exercise hint) must have a side-effect. This time, we can't easily wave it away by saying that the side-effect is one that the client isn't responsible for, so it'll be OK. If the side-effect had been an audit log, we could have gotten away with that, but here the side-effect is within the Domain Model itself.
How can you implement this business requirement while still adhering to CQS? Perhaps you'd like to pause reading for a moment to reflect on this question; then you can compare your notes to mine.
Is it even worth applying CQS to this problem, or should we simply give up? After all, the Domain Model seems to inherently associate certain Queries with side-effects.
In my opinion, it's exactly in such scenarios that CQS really shines. Otherwise, you're looking at the code as a team developer, and you go: Why did the score just go down? I didn't change anything! You can waste hours when side-effects are implicit. Applying CQS makes side-effects explicit, and as the Zen of Python goes:
Explicit is better than implicit.There are various ways to address this apparently impossible problem. You don't have to use any of them, but the first key to choosing your tools is to have something to choose from.
Contextual types #
With the requirements given above, we don't know what we're building. Is it a web-based application? An app? A desktop application? Let's, for a while, imagine that we're developing an app or desktop application. In my fevered imagination, this sort of application may have all the questions and hints preloaded in memory, or in files, and continually displays the current score on the screen. There may not be further persistent storage, or perhaps the application publishes the final scores for the exam to a central server once the exam is over. Think occasionally connected clients.
In this type of scenario, the most important point is to keep the score up-to-date in memory. This can easily be done with a contextual or 'amplified' type. In this case, we can call it Scored<T>:
public sealed class Scored<T> { public readonly T Item; public readonly int Score; public Scored(T item, int score) { if (item == null) throw new ArgumentNullException(nameof(item)); this.Item = item; this.Score = score; } public Scored<T> Add(int scoreDelta) { return new Scored<T>(this.Item, this.Score + scoreDelta); } public override bool Equals(object obj) { var other = obj as Scored<T>; if (other == null) return base.Equals(obj); return object.Equals(this.Item, other.Item) && object.Equals(this.Score, other.Score); } public override int GetHashCode() { return this.Item.GetHashCode() ^ this.Score.GetHashCode(); } }
The Scored<T> class enables you to carry a score value around within a computation. In order to keep the example as simple as possible, I modelled the score as an integer, but perhaps you should consider refactoring from Primitive Obsession to Domain Modelling; that's a different story, though.
This means you can model your API in such a way that a client must supply the current score in order to retrieve a hint, and the new score is returned together with the hint:
public interface IHintQuery { Scored<Hint> Read(int hintId, int currentScore); }
The Read method is a Query, and there's no implied side-effect by calling it. Since the return type is Scored<Hint>, it should be clear to the client that the score may have changed.
An implementation could look like this:
public class HintQuery : IHintQuery { private readonly IHints hints; public HintQuery(IHints hints) { if (hints == null) throw new ArgumentNullException(nameof(hints)); this.hints = hints; } public Scored<Hint> Read(int hintId, int currentScore) { var valFromInner = this.hints.FirstById(hintId); return new Scored<Hint>(valFromInner, currentScore).Add(-5); } }
The Read method uses an injected (lower-level) Query interface to read the answer hint, packages the result in a Scored<Hint> value, and subtracts 5 points from the score.
Both the score type (int) and Scored<T> are immutable. No side-effects occur while the client reads the answer hint, but the score is nonetheless adjusted.
In this scenario, the score travels around in the memory of the application. Perhaps, after the exam is over, the final score can be sent to a central repository for record-keeping. This architecture could work well in client-side implementations, but may be less suitable in stateless web scenarios.
Pessimistic locking #
If you're developing a web-based exam-taking system, you may want to be able to use stateless web servers for scalability or redundancy reasons. In such cases, perhaps keeping the score in memory isn't a good idea.
You could still use the above model, but the client must remember to save the updated score before returning an HTTP response to the browser. Perhaps you find this unsatisfactorily fail-safe, so here's an alternative: use pessimistic locking.
Essentially, you can expose an interface like this:
public interface IHintRepository { void UnlockHint(int hintId); Hint Read(int hintId); }
If a client attempts to call the Read method without first unlocking the hint, the method will throw an exception. First, you'll have to unlock the hint using the UnlockHint method, which is clearly a Command.
This is less discoverable, because you can't tell by the type signature of the Read method that it may fail for that particular reason, but it safely protects the system from accidentally reading the hint without impacting the score.
(In type systems with Sum types, you can make the design clearer by statically modelling the return type to be one of several mutually exclusive cases: hint, no hint (hintId
doesn't refer to an existing hint), or hint is still locked.)
This sort of interface might in fact align well with a good User Experience, because you might want to ask the user if he or she is sure that (s)he wants to see the hint, given the cost. Such a user interface warning would be followed by a call to UnlockHint if the user agrees to the score deduction.
An implementation of UnlockHint would leave behind a permanent record that the answer hint was unlocked by a particular user, and that record can then subsequently be used when calculating the final score.
Summary #
Sometimes, it can be difficult to see how to both follow CQS and implement the desired business logic. In my experience, it's always possible to recast the problem in such a way that this is possible, but it may take some deliberation before it clicks.
Must you always follow CQS? Not necessarily, but if you understand what your options are, then you know what you're saying no to if you decide not to do it. That's quite a different situation from not having any idea about how to apply the principle.
In this article, I showed two options for reconciling CQS with a Domain Model where a Query seems to have side-effects.
Applications and their side effects
All applications have side-effects, but you can isolate those side-effects at the boundaries.
In my Encapsulation and SOLID Pluralsight course, I introduce Command-Query Separation (CQS) as a fundamental tool that will help you think about encapsulation. (I didn't come up with this myself, but rather picked it up from Bertrand Meyer's Object-Oriented Software Construction.)
Despite the age of the CQS principle, it's still news to many people, and I get lots of questions about it; I attempt to answer them as well as I can. Most questions are about specific situations where the inquirer can't see a way out of issuing a Query and at the same time producing a side-effect.
Perhaps the most common situation comes up when auditing comes into play. In some domains, auditing is a legal requirement. Asking any question must produce the side-effect that an audit record is created.
How can you reconcile such requirements with CQS?
Definition #
It may be helpful to first take a step back and attempt to answer the question: what's a side-effect, anyway?
Perhaps we can learn from Functional Programming (gee, who'd have thunk!?), because Functional Programming is all about taming side-effects. Apparently, Simon Peyton-Jones once said during an introduction to Haskell, that if your program has no side-effects, it cannot print a result, it cannot ask for input, it cannot interact with the network or the file system. All it does is heat up the CPU, after which someone from the audience interjected that heating up the CPU is also a side-effect, so, technically speaking, if you want your program to have no side-effects, you cannot even run it. (I only have this story on second hand, but it makes an important point.)
Clearly, there's no such thing as a truly side-effect free program, so how do we define what a side-effect is?
In strictly Functional languages, a side-effect occurs whenever a function isn't referentially transparent. This fancy term means that you can replace a function call with its return value. If a function call isn't equivalent to its return value, it's either because the function call also has a side-effect, or because some side-effect caused the return value to change.
This is closely related to the popular definition of a Query in CQS: Asking the question mustn't change the answer. This is, though, a weaker statement, because it allows a change in global state (e.g. another process updating a database record) to change the answer between two identical queries.
In a completely different context, in REST it's often helpful to distinguish between safe and idempotent requests. The term safe is closely related to a side-effect free Query. As REST in Practice states (p. 38): "By safe, we mean a GET request generates no server-side effects for which the client can be held responsible" (my emphasis). That can often be a useful distinction when thinking about CQS. A Query may cause a side-effect to happen (such as an audit record being written), but that side-effect doesn't concern the client.
Applications are never side-effect free #
All of the above is useful because there's a large set of side-effects we can ignore in practice. We can ignore that the CPU heats up. We can ignore that web servers log HTTP requests. We can (probably) ignore that audit records are written. Such side-effects don't change the answer of a Query.
There may still be cases where you need to deal explicitly with side-effects. You may wish to acknowledge to a user that a file was written. You may want to return a receipt to a client that your service received a document.
It's important to realise that at the application level, applications are all about side-effects.
- Applications may have GUIs; every time the application updates the screen, that's a side-effect.
- An application may be a REST service; it handles HTTP, which is modelled on the Request-Response pattern. Even POST requests have responses. Everything in HTTP looks like Queries, because responses are return values.
- Applications may write to a database; clearly, side-effects are involved.
Applications, on the other hand, are compositions of all relevant concerns. As I've previously described, at the boundaries, applications aren't Object-Oriented. Neither are they Functional. All applications must have a boundary where Encapsulation, or FP, or CQS, doesn't apply. The trick is to keep that boundary as thin as possible.
Thus, if you must violate CQS, do it in the boundary of the application. As an example, perhaps you're creating a REST service that enables clients to create new resources with POST requests. As a response, you want to return the address of the new resource. That violates CQS, but if you keep that violation at the entry point of the request, you've isolated the violation to the boundary of the application.
Summary #
It can be difficult to follow CQS, until you get your head around it, but it's always possible to apply it - except at the application boundary.
How do we know that we can always apply CQS? We know this from Functional Programming. Strict Functional languages like Haskell model everything as Queries (except at the boundaries), and Haskell is a Turing-complete language. If you can model everything as Queries, it should be clear that you can also separate Commands and Queries: if in doubt, eliminate the Command; FP shows how to do that.
Even if you're working in an Object-Oriented language, learn some Functional Programming. It'll teach you how to apply CQS, which is a cornerstone of Encapsulation.
Comments
Unit testing internals
FAQ: How should you unit test internals? A: Through the public API.
This question seems to come up repeatedly: I have some internal (package-private in Java) code. How do I unit test it?
The short answer is: you unit test it as you unit test all other code: through the public API of the System Under Test (SUT).
Purpose of automated testing #
Details can be interesting, but don't lose sight of the big picture. Why do you test your software with automated tests?
Automated testing (as opposed to manual testing) only serves a single purpose: it prevents regressions. Some would say that it demonstrates that the software works correctly, but that's inaccurate. Automated tests can only demonstrate that the software works correctly if the tests are written correctly, but that's a different discussion.
Assuming that all automated tests are correct, then yes: automated tests also demonstrate that the software works, but it's still regression testing. The tests were written to demonstrate that the software worked correctly once. Running the tests repeatedly only demonstrates that it still works correctly.
What does it mean that the software works correctly? When it comes to automated testing, the verification is only as good as the tests. If the tests are good, the verification is strong. If the tests are weak, the verification is weak.
Consider the purpose of writing the software you're currently being paid to produce. Is the purpose of the software to pass all tests? Hardly. The purpose of the software is to solve some problem, to make it easier to perform some task, or (if you're writing games) to entertain. The tests are proxies of the actual purpose.
It ought to be evident, then, that automated tests should be aligned with the purpose of the software. There's nothing new in this: Dan North introduced Behaviour Driven Development (BDD) in order to emphasise that testing should be done with the purpose of the software in mind. You should test the behaviour of the software, not its implementation.
Various software can have different purposes. The software typically emphasised by BDD tends to be business software that solves a business problem. Often, these are full applications. Other types of software include reusable libraries. These exist to be reusable. Common to all is that they have a reason to exist: they have externally visible behaviour that some consumer cares about.
If you want to test a piece of software to prevent regressions, you should make sure that you're testing the externally visible behaviour of the software.
Combinatorics #
In an ideal world, then, all automated testing should be done against the public interface of the SUT. If the application is a web application, testing should be done against the HTML and JavaScript. If the application is a mobile app, testing should be done somehow by automating user interaction against its GUI. In reality, these approaches to testing tend to be brittle, so instead, you can resort to subcutaneous testing.
Even if you're developing a reusable library, or a command-line executable, if you're doing something even moderately complex, you run into another problem: a combinatorial explosion of possible paths through the code. As J.B. Rainsberger explains much better than I can do, if you combine software modules (e.g. validation, business logic, data access, authentication and authorisation, caching, logging, etc) you can easily have tens of thousands distinct paths through a particular part of your software - all via a single entry point.
This isn't related to BDD, or business problems, or agile... It's a mathematical result. There's a reason the Test Pyramid looks like it does.
When you combine a procedure with four distinct paths with another with five paths, the number of possible paths isn't (4 + 5 =) nine; it's (4 * 5 =) twenty. As you combine units together, you easily reach tens of thousands distinct paths through your software. (This is basic combinatorics).
You aren't going to write tens of thousands of automated tests.
In the ideal world, we would like to only test the behaviour of software against its public interface. In the real world, we have to separate any moderately complex software into modules, and instead test the behaviour of those modules in isolation. This prevents the combinatorial explosion.
If you cut up your software in an appropriate manner, you'll get modules that are independent of each other. Many books and articles have been written about how to do this. You've probably heard about Layered Application Architecture, Hexagonal Architecture, or Ports and Adapters. The recent interest related to microservices is another (promising) attempt at factoring code into smaller units.
You still need to care about the behaviour of those modules. Splitting up your software doesn't change the overall purpose of the software (whatever that is).
Internals #
When I do code reviews, often the code is already factored into separate concerns, but internal classes are everywhere. What could be the reason for that?
When I ask, answers always fall into one of two categories:
- Members (or entire classes) are poorly encapsulated, so the developers don't want to expose these internals for fear of destabilising the system.
- Types or members are kept hidden in order to protect the code base from backwards compatibility issues.
private void CalculateAverage() { this.average = TimeSpan.FromTicks( (long)this.durations.Average(ts => ts.Ticks)); }
This CalculateAverage method is marked private because it's unsafe to call it. this.durations
can be null, in which case the method would throw an exception. It may feel like a solution to lock down the method with an access modifier, but really it only smells of poor encapsulation. Instead, refactor the code to a proper, robust design.
There are valid use cases for the private
and internal
access modifiers, but the majority of the time I see private and internal code, it merely smells of poor design. If you change the design, you could make types and members public, and feel good about it.
The other issue, concerns about compatibility, can be addressed as well. In any case, for most developers, this is essentially a theoretical issue, because most code written isn't for public consumption anyway. If you also control all consumers of your API, you may not need to worry about compatibility. If you need to change the name of a class, just let your favourite refactoring tool do this for you, and all is still good. (I'd still recommend that you should adopt a Ranger approach to your Zoo software, but that's a different topic.)
The bottom line: you don't have to hide most of your code behind access modifiers. There are good alternatives that lead to better designs.
Testing internals #
That was a long detour around reasons for testing, as well as software design in general. How, then, do you test internal code?
Through the public API. Remember: the reason for testing is to verify that the observable behaviour of the SUT is correct. If a class or member is internal, it isn't visible; it's not observable. It may exist in order to support the public API, but if you explicitly chose to not make it visible, you also stated that it can't be directly observable. You can't have it both ways. (Yes, I know that .NET developers will point the [InternalsVisibleTo] attribute out to me, but this attribute isn't a solution; it's part of the problem.)
Why would you test something that has no 'official' existence?
I think I know the answer to that question. It's because of the combinatorial explosion problem. You want the software to solve a particular problem, and to keep everything else 'below the surface'. Unfortunately, as the complexity of the software grows, you realise (explicitly or implicitly) that you can't cover the entire code base with high-level BDD-style tests. You want to unit test the internals.
The problem with doing this is that it's just as brittle as testing through a GUI. Every time you change the internals, you'll have to change your tests.
A well-designed system tends to be more stable when it comes to unit testing. A poorly designed system is often characterised by tests that are too coupled to implementation details. As soon as you decide to change something in such a system, you'll need to change the tests as well.
Recall that automated testing is regression testing. The only information we get from running tests is whether or not existing tests passed or failed. We don't learn if the tests are correct. How do we know that tests are correct? Essentially, we know because we review them, see them fail, and never touch them again. If you constantly have to fiddle with your tests, how do you know they still test the right behaviour?
Design your sub-modules well, on the other hand, and test maintenance tends to be much lower. If you have well-designed sub-modules, though, you don't have to make all types internal. Make them public. They are part of your solution.
Summary #
A system's public API should enable you to exercise and verify its observable behaviour. That's what you should care about, because that's the SUT's reason to exist. It may have internal types and members, but these are implementation details. They exist to support the system's observable behaviour, so test them through the public API.
If you can't get good test coverage of the internal parts through the public API, then why do these internal parts exist? If they exhibit no observable behaviour, couldn't you delete them?
If they do exist for a reason, but you somehow can't reach them through the public API, it's a design smell. Address that smell instead of trying to test internals.
Comments
Dear Mark,
Thank you for this excellent article. I got one question: in the "Internals" section, you state that there are valid use cases for the internal access modifier - can you name some of them for me?
I'm also a proponent of keeping nearly all types public in reusable code bases even if most of them are not considered to be part of the API that a client usually consumes - therefore I would only think about marking a class internal if it cannot protect it's invariants properly and fail fast when it's used in the wrong way. But you also mentioned that in this article, too.
When I did a Google search on the topic, I couldn't find any useful answers either. The best one is probably from Eric Lippert on a Stack Overflow question, stating that big important classes that are hard to verify in the development process should be marked internal. But one can easily counter that by not designing code in such a way.
Anyways, it would be very kind of you if you could provide some beneficial use cases for the use of internal.
Sincerely,
Kenny
Kenny, thank you for writing. The answer given by Eric Lippert does, in my opinion, still paint an appropriate picture. There's always a cost to making types or members public, and I don't always wish to pay that cost. The point that I'm trying to make in the present article is that while this cost exists, it's not so high that it justifies unit testing internals via mechanisms like Reflections or [InternalsVisibleTo]. The cost of doing that is higher than making types or members public.
Still, there are many cases where it's possible to cover internal or private code though a public API. Sometimes, I may be in doubt that the way I've modelled the internal code is the best way to do so, and then I'd rather avoid the cost of making it public. After all, making code public means that you're making a promise that it'll be around without breaking changes for a long time.
Much of my code has plenty of private or internal code. An example is Hyprlinkr, which has private helper methods for one of its central features. These private helpers only exist in order to make the feature implementation more readable, but are not meant for public consumption. They're all covered by tests that exercise the various public members of the class.
Likewise, you can consider the private DurationStatistics class from my Temporary Field code smell article. At first, you may want to keep this class private, because you're not sure that it's stable (i.e. that it isn't going to change). Later, you may realise that you'll need that code in other parts of your code base, so you move it to an internal class. If you're still not convinced that it's stable, you may feel better keeping it internal rather than making it public. Ultimately, you may realise that it is, indeed, stable, in which case you may decide to make it public - or you may decide that no one has had the need for it, so it doesn't really matter, after all.
My goal with this article wasn't to advise against internal code, but only to advise against trying to directly call such code for unit testing purposes. In my experience, when people ask how to unit test internal code, it's a problem they have because they haven't used Test-Driven Development (TDD). If you do TDD, you'll have sufficient coverage, and then it doesn't matter how you choose to organise the internals of your code base.
Public types hidden in plain sight
Instead of making types internal to a package, you can hide them in plain sight as public types.
When I review object-oriented code, I often see lots of internal (package-private in Java) classes and members. Sometimes, most of a code base is written at that access level.
When I ask for the reasoning behind this, the answer is invariably: encapsulation.
Encapsulation is one of the most misunderstood concepts in programming. If you think it's about making everything inaccessible, then there's only one logical conclusion: all your code must have this API:
public class Program { public static int Main(string[] args) { // Everything else goes here! } }
If you've seen my Pluralsight course about encapsulation, you'll know that I prefer Bertrand Meyer's view, as outlined in Object-Oriented Software Construction. Essentially, it's about design by contract (pre- and post-conditions).
Once I start to ask more pointed questions about enthusiastic use of internal-only access, it turns out that often, the underlying reason is that the developers want to be able to change the design of their internal types and members, without breaking existing clients. A noble goal.
Imagine that such a code base has existing clients. While the maintainers can't change the public types without breaking these clients, they can change all internal types as much as they want.
Unfortunately, there are disadvantages to this design strategy. You can't easily unit test such code bases. There are ways, but none of them are good.
What if, instead of making types internal, you made them public? Are there ways to prevent clients from relying on such types? Yes, plenty of ways.
Public, append-only #
My preferred approach to this problem is to make types public anyway. If my overall feeling about a type is that "it seems okay", I make it public, even if I know that there's a risk involved with this.
Consider, as an example, the private DurationStatistics class I extracted in a recent article. In that article, I kept the class private, because I wanted to discuss visibility in a separate article.
You can easily promote the DurationStatistics class to a public class:
public class DurationStatistics
Here you only see the declaration of the class, because the implementation is exactly the same as in the other article. The code is moved out of the Estimator class and put into its own file, though.
How can you be sure that DurationStatistics is 'done'? How can you be sure that you're not going to need to change it later?
You can't.
You can, however, deal with that when it occurs. You can still add new members to the class, but you can't remove members or change the type's name.
If you need to change something (thereby breaking compatibility), then don't change it. Instead, add the new type or member, but leave the old artefact in place and deprecate it. In .NET, you can do this by adorning the member or type with the [Obsolete] attribute. You can even add a message that points clients to the new, preferred way of doing things:
[Obsolete("DurationStatistics is being retired (our fault). Use Foo instead.")] public class DurationStatistics
This will cause a compiler warning in all client code using the DurationStatistics class. You should make the message as helpful and friendly as possible.
This may leave your code with compiler warnings. This is good, because you should work to remove these warnings from your own code base. The only place you should leave them in place is in your unit tests. As long as you have a deprecated type in your published code base, you should also keep it covered by unit tests. This will cause compiler warnings in your unit tests, but you can suppress such warnings explicitly there:
[Theory] [InlineData(new[] { 1, 1, 1 }, 1)] [InlineData(new[] { 2, 3, 4 }, 3)] public void AverageIsCorrect(int[] seconds, int expectedSeconds) { #pragma warning disable 618 var sut = new DurationStatistics( seconds.Select(s => TimeSpan.FromSeconds(s)).ToArray()); #pragma warning restore 618 var actual = sut.Average; Assert.Equal(expectedSeconds, actual.TotalSeconds); }
Keep the 'disable scope' as small as possible, and always remember to restore the warning after disabling it.
Perhaps you think that this approach of just making everything public and dealing with the consequences later is an unprofessional, unhygienic, and ugly way to evolve code, but it's really the logical way to produce and publish SOLID code.
Obsolete by default #
A variant of the above technique is that instead of making a type or member internal, you can deprecate it right away. This will immediately warn clients that they're doing something they aren't supposed to do.
If you ever decide to 'promote' that type to a bona fide member of your API, you can simply remove the [Obsolete] attribute.
Separate namespaces #
You can also hide public types in separate namespaces. If you don't document the types in those 'hidden' namespaces, clients will again have to do something explicit to use them.
This is a softer way of hiding public types in plain sight, because clients get no compiler warnings if they use those 'hidden' types. Still, it can be quite effective.
My experience with maintaining public software (e.g. the now-more-than-six-year-old AutoFixture project) is that the most common problem is that users don't even find the public types I want them to find! If you knew how many times I've seen people reinvent a feature already available in the API... And that's even when all the interesting stuff is in the same namespace.
Putting types in separate namespaces is, in my experience, quite an effective way of hiding them.
Separate libraries #
Ultimately, you can put your volatile types in a separate library. In the Estimation example, you can ship the Estimator class (your 'real' public API) in one library, but put DurationStatistics in another library:
The Estimation library references the Estimation.Statistics library, so it can use all the public types in that library. You can unit test the public types in the Estimation library, but you can also unit test the public types in the Estimation.Statistics library.
When you publish your API, you give clients a reference to Estimation, but not to Estimation.Statistics. You still ship Estimation.Statistics as a dependency of your API, but clients shouldn't reference it.
Specifically, if you wish to publish your API as a NuGet package, you can use the <references>
element to ensure that only the 'official' library is added to a project:
In this library, I installed the (local) Estimation NuGet package, and as you can see, only Ploeh.Samples.Estimation is referenced, whereas Ploeh.Samples.Estimation.Statistics isn't. This means that the client can easily use the official API, e.g. to create a new instance of the Estimator class:
this.estimator = new Estimator(TimeSpan.FromMinutes(1));
On the other hand, if a client developer attempts to use the public DurationStatistics class, that's not possible:
Not only is DurationStatistics not available, all Visual Studio can suggest is to create it; it can't suggest pulling in the appropriate assembly reference, which is a good thing.
The library is still there, it's just not referenced, so Estimator still works at run-time.
The trick to set up a NuGet package in this way is to use the <references>
element in the .nuspec file:
<?xml version="1.0"?> <package > <metadata> <id>Estimation</id> <version>1.0.0</version> <authors>Mark Seemann</authors> <owners>Mark Seemann</owners> <requireLicenseAcceptance>false</requireLicenseAcceptance> <description> This is an example that demonstrates how dependent libraries can hide public types in plain sight. </description> <copyright>Copyright Mark Seemann 2015</copyright> <references> <reference file="Ploeh.Samples.Estimation.dll" /> </references> </metadata> <files> <file src="..\Estimation\bin\Debug\Ploeh.Samples.Estimation.dll" target="lib\net45" /> <file src="..\Estimation\bin\Debug\Ploeh.Samples.Estimation.Statistics.dll" target="lib\net45" /> </files> </package>
Notice that both Ploeh.Samples.Estimation.dll and Ploeh.Samples.Estimation.Statistics.dll are included in the NuGet package, but that only Ploeh.Samples.Estimation.dll should be referenced when the NuGet package is added.
For good measure I should point out that in order to create these demo files, I only installed the Estimation NuGet package from my local disk (using the -Source switch), so don't go and look for it on nuget.org.
Summary #
There are plenty of ways to hide public types in plain sight. Often, one or more of these options are better than misusing the internal/package-private access modifier. This doesn't mean that you must never make a class internal, but it means that you should consider the advantages and disadvantages of all options before making such a decision.
The main disadvantage of using the internal access modifier is that you can't easily unit test such a class. The options provided here should give you plenty of alternatives that will enable you to easily unit test classes while still hiding them from client developers.
Comments
The problem with the [InternalsVisibleTo] attribute is exactly that it enables you to unit test internals, which you shouldn't.
Dear Mark,
Thanks for another great article. I totally agree with your point of view, but I can also see why Børge might be somewhat unsatisfied with the answer to not use the InternalsVisibleTo attribute in the context of automated testing.
I think that using internal to hide certain types of a reusable code base is a violation of the Open/Closed Principle, because there is no easy way to extend the internal types without touching the source code (you can use reflection to call internal methods, but that's annoying, and you simply cannot derive from an internal class or implement an internal interface.)
What are your thoughts on this argument?
With regards,
Kenny
Kenny, thank you for writing. Overall, I like that additional argument. In order to keep this discussion balanced, I think it's worth pointing out that your argument certainly applies to reusable software, but that lots of software isn't (or rather, shouldn't be) reusable. Udi Dahan has pointed out that reuse may be overrated, or even have negative consequences, and while I've been disagreeing with certain parts of that claim, in general I think there's much wisdom in that observation.
Some software is written to be reusable, but much software shouldn't be. This is the sort of software I call Zoo software. Your argument about violation of the OCP certainly applies to Wildlife Software, but only indirectly to Zoo software.
Temporary Field code smell
Temporary Field is a well-known code smell. Here's an example and remedy.
The Temporary Field code smell was described more than a decade ago, but I keep encountering it when doing code reviews. Despite its vintage, I couldn't find a good example, so I decided to provide one.
The code smell is described in Refactoring:
"Sometimes you see an object in which an instance variable is set only in certain circumstances. Such code is difficult to understand, because you expect an object to need all of its variables. [...]
"A common case of temporary field occurs when a complicated algorithm needs several variables. Because the programmer didn't want to pass around a huge parameter list (who does?), he put them in fields."
- Refactoring, Martin Fowler et al., Addison-Wesley 1999. p. 84
Unfortunately, Refactoring doesn't provide an example, and I couldn't find a good, self-contained example on the web either.
Example: estimate a duration #
In this example, a developer was asked to provide an estimate of a duration, based on a collection of previously observed durations. The requirements are these:
- There's a collection of previously observed durations. These must be used as statistics upon which to base the estimate.
- It's better to estimate too high than too low.
- Durations are assumed to be normal distributed.
- The estimate should be higher than the actual duration in more than 99% of the times.
- If there are no previous observations, a default estimate must be used as a fall-back mechanism.
It's not that bad, actually. Because the distribution is assumed to be normal, you can find a good estimate by calculating the average, and add three times the standard deviation.
Here's how our developer attempted to solve the problem:
public class Estimator { private readonly TimeSpan defaultEstimate; private IReadOnlyCollection<TimeSpan> durations; private TimeSpan average; private TimeSpan standardDeviation; public Estimator(TimeSpan defaultEstimate) { this.defaultEstimate = defaultEstimate; } public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; this.durations = durations; this.CalculateAverage(); this.CalculateStandardDeviation(); var margin = TimeSpan.FromTicks(this.standardDeviation.Ticks * 3); return this.average + margin; } private void CalculateAverage() { this.average = TimeSpan.FromTicks( (long)this.durations.Average(ts => ts.Ticks)); } private void CalculateStandardDeviation() { var variance = this.durations.Average(ts => Math.Pow( (ts - this.average).Ticks, 2)); this.standardDeviation = TimeSpan.FromTicks((long)Math.Sqrt(variance)); } }
The CalculateEstimate method directly uses the temporary field durations
, as well as implicitly the fields average
and standardDeviation
. The reason for this is to avoid passing parameters around. Both the average and the standard deviation depend on the durations
, but the standard deviation also depends on the average.
These dependencies are difficult to see in this code snippet:
this.durations = durations; this.CalculateAverage(); this.CalculateStandardDeviation();
What if I wanted to switch the order around?
this.durations = durations; this.CalculateStandardDeviation(); this.CalculateAverage();
This compiles! It also produces a result if you invoke the CalculateEstimate method. No exception is thrown, but the result returned is incorrect!
Not only is this code difficult to understand, it's also brittle. Furthermore, it's not thread-safe.
This was most likely done with the best of intentions. After all, in Clean Code you learn that methods with zero arguments are better than methods with one argument (which are better than methods with two arguments, and so on).
There are better ways to factor the code, though.
Extract Class #
In Refactoring, the suggested cure is to extract a class that contains only the temporary fields. This refactoring is called Extract Class. Your first step can be to introduce a private, nested class within the containing class. In the case of the example, you might call this class DurationStatistics. It can be used from CalculateEstimate in this way:
public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; var stats = new DurationStatistics(durations); var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3); return stats.Average + margin; }
It's now much clearer what's going on. You have some statistics based on the durations
, and those contain both the average and the standard deviation. The flow of data is also clear: you need the stats
to get the average and the standard deviation, and you need stats.StandardDeviation
in order to calculate the margin
, and the margin
before you can calculate the return value. If you attempt to move those lines of code around, it's no longer going to compile.
This solution is also thread-safe.
Here's the full code of the Estimator class, after the Extract Class refactoring:
public class Estimator { private readonly TimeSpan defaultEstimate; public Estimator(TimeSpan defaultEstimate) { this.defaultEstimate = defaultEstimate; } public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; var stats = new DurationStatistics(durations); var margin = TimeSpan.FromTicks(stats.StandardDeviation.Ticks * 3); return stats.Average + margin; } private class DurationStatistics { private readonly IReadOnlyCollection<TimeSpan> durations; private readonly Lazy<TimeSpan> average; private readonly Lazy<TimeSpan> standardDeviation; public DurationStatistics(IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) throw new ArgumentException( "Empty collection not allowed.", nameof(durations)); this.durations = durations; this.average = new Lazy<TimeSpan>(this.CalculateAverage); this.standardDeviation = new Lazy<TimeSpan>(this.CalculateStandardDeviation); } public TimeSpan Average { get { return this.average.Value; } } public TimeSpan StandardDeviation { get { return this.standardDeviation.Value; } } private TimeSpan CalculateAverage() { return TimeSpan.FromTicks( (long)this.durations.Average(ts => ts.Ticks)); } private TimeSpan CalculateStandardDeviation() { var variance = this.durations.Average(ts => Math.Pow( (ts - this.average.Value).Ticks, 2)); return TimeSpan.FromTicks((long)Math.Sqrt(variance)); } } }
As you can see, the Estimator class now only has a single read-only field, which ensures that it isn't temporary.
The original intent of not passing parameters around is still preserved, so this solution still adheres to that Clean Code principle.
The DurationStatistics class lazily calculates the average and standard deviation, and memoizes their results (or rather, that's what Lazy<T> does).
Instances of DurationStatistics have a shorter lifetime than the containing Estimator object.
The DurationStatistics class is a private, nested class, but a next step might be to pull it out to a public class in its own right. You don't have to do this. Sometimes I leave such classes as private classes, because they only exist to organise the code better; their purpose and behaviour are still narrow in scope, and associated with the containing class. In other cases, such a refactoring may uncover a great way to model a particular type of problem. If that's the case, making it a public class could make the entire code base better, because you've now introduced a reusable concept.
Summary #
Class fields are intended to be used by the class. They constitute the data part of data with behaviour. If you have temporary values, either pass them around as arguments, or extract a new class to contain them.
Comments
How about passing everything as parameters and returning the calculated value as result, would this be considered dirty code?
Just trying to get the most out of this example.
Tom, thank you for writing. Great observation about passing arguments around instead! In this case, passing the necessary arguments around would also be a good solution. It would look like this:
public class Estimator { private readonly TimeSpan defaultEstimate; public Estimator(TimeSpan defaultEstimate) { this.defaultEstimate = defaultEstimate; } public TimeSpan CalculateEstimate( IReadOnlyCollection<TimeSpan> durations) { if (durations == null) throw new ArgumentNullException(nameof(durations)); if (durations.Count == 0) return this.defaultEstimate; var average = CalculateAverage(durations); var standardDeviation = CalculateStandardDeviation(durations, average); var margin = TimeSpan.FromTicks(standardDeviation.Ticks * 3); return average + margin; } private static TimeSpan CalculateAverage( IReadOnlyCollection<TimeSpan> durations) { return TimeSpan.FromTicks((long)durations.Average(ts => ts.Ticks)); } private static TimeSpan CalculateStandardDeviation( IReadOnlyCollection<TimeSpan> durations, TimeSpan average) { var variance = durations.Average(ts => Math.Pow( (ts - average).Ticks, 2)); return TimeSpan.FromTicks((long)Math.Sqrt(variance)); } }
This refactoring also eliminates the temporary fields, and is probably even easier to understand.
To be quite honest, that's what I would have done with this particular example if it had been production code. The reason I wanted to show the Extract Class refactoring instead is that passing arguments doesn't scale well when you add more intermediate values. In this example, the maximum number of arguments you have to pass is two, which is easily within acceptable limits, but what if you have 15 intermediate values?
Passing 15 method arguments around is well beyond most people's threshold, so instead they sometimes resort to temporary fields. The point here is that this isn't necessary, because you have the alternative of extracting classes.
Why didn't I show an example with 15 intermediate values, then? Well, first, I couldn't think of a realistic example, and even if I could, it would be so complicated that no one would read the article :)
Sounds reasonable - Passing one or two around is okay but at a certain point an Extract Class would make more sense.
No, it's a good example otherwise it would be too confusing as you mentioned. Interesting post, thanks!
Ad hoc Arbitraries with FsCheck.Xunit
When using FsCheck with xUnit.net, you can define ad hoc Arbitraries in-line in your test functions.
Writing properties with FsCheck and using xUnit.net as a test host is a nice combination. Properties are written as normal functions annotated with the Property attribute:
[<Property>] let ``findNeighbors returns 8 cells`` (cell : int * int) = let actual : Set<int * int> = findNeighbors cell 8 =! actual.Count
FsCheck takes care of generating values for the cell
argument. This works well in many cases, but sometimes you need a bit more control over the range of values being generated by FsCheck.
Motivating example #
You may already have guessed it from the above code snippet, but the example for this article is a property-based approach to Conway's Game of Life. One of the properties of the game is that any live cell with more than three live neighbours dies.
This means that you have to write a property where one of the values is the number of live neighbours. That number must be a (randomly generated) number between 4 and 8 (including both ends), because the maximum number of neighbours in a cell grid is eight. How do you get FsCheck to give you a number between 4 and 8?
Crude solution: conditional property #
There are various solutions to the problem of getting fine control over certain values generated by FsCheck, but the ones I first learned turn out to be problematic.
One attempt is to use conditional properties, where you supply a boolean expression, and FsCheck will throw away all properties that fail the boolean test:
[<Property>] let ``Any live cell with more than three live neighbors dies`` (cell : int * int) (neighborCount : int) = (3 < neighborCount && neighborCount <= 8) ==> lazy let neighborCells = findNeighbors cell |> pickRandom neighborCount let actual = calculateNextState (cell :: neighborCells) cell Dead =! actual
The ==>
operator is an FsCheck-specific operator that indicates that FsCheck should disregard all properties where the input arguments don't satisfy the boolean condition on the left side. The use of lazy ensures that FsCheck will only attempt to evaluate the property when the condition is true.
That's simple and tidy, but unfortunately doesn't work. If you attempt to run this property, you'll get a result like this:
Test 'Ploeh.Katas.GameOfLifeProperties.Any live cell with more than three live neighbors dies' failed: Arguments exhausted after 34 tests.
The reason is that FsCheck generates random integers for neighborCount
, and most of these values (statistically) fall outside of the range 4-8. After a (default) maximum of 1000 attempts, FsCheck gives up, because at that time, it's only managed to find 34 values that satisfy the condition, but it wants to find 100.
What a disappointment.
Crude solution: custom Arbitrary #
Another apparent solution is to define a custom Arbitrary for FsCheck.Xunit. The mechanism is to define a static class with your custom rule, and register that with the Property attribute. The class must have a static method that returns an Arbitrary<'a>.
In this particular example, you'll need to define a custom Arbitrary that only picks random numbers between 4 and 8. That's easy, but there's a catch: if you change the way int
values are generated, you're also going to impact the generated cell
values, because a cell here is an int * int
tuple.
Since you only need a small number, you can cheat and customize byte values instead:
type ByteBetween1and8 = static member Byte () = Gen.elements [1uy .. 8uy] |> Arb.fromGen [<Property(Arbitrary = [| typeof<ByteBetween1and8> |])>] let ``Any live cell with more than three live neighbors dies`` (cell : int * int) (neighborCount : byte) = neighborCount > 3uy ==> lazy let neighborCells = findNeighbors cell |> pickRandom (int neighborCount) let actual = calculateNextState (cell :: neighborCells) cell Dead =! actual
The ByteBetween1and8 type is a static class with a single static member that returns Arbitrary<byte>. By using the Arbitrary property of the Property attribute (!), you can register this custom Arbitrary with the Property.
This 'solution' side-steps the issue by using a substitute data type instead of the desired data type. There are several problems with this:
- You have to convert the byte value back to an integer in order to use it with the System Under Test:
int neighborCount
. 3uy
is less readable than3
.- A newcomer will wonder why
neighborCount
is a byte instead of an int. - You can't generalise this solution. It works because F# has more than one number type, but if you need to generate strings, you're out of luck: there's only one (normal) type of string in .NET.
This way works, but is hardly elegant or safe.
Ad hoc, in-line Arbitraries #
There's a safe way to define ad hoc, in-line Arbitraries with FsCheck.Xunit:
[<Property>] let ``Any live cell with > 3 live neighbors dies`` (cell : int * int) = let nc = Gen.elements [4..8] |> Arb.fromGen Prop.forAll nc (fun neighborCount -> let liveNeighbors = cell |> findNeighbors |> shuffle |> Seq.take neighborCount |> Seq.toList let actual : State = calculateNextState (cell :: liveNeighbors |> shuffle |> set) cell Dead =! actual)
Using Prop.forAll enables you to execute your property with a custom Arbitrary, but mixed with the 'normally' generated values that the function receives via its arguments. In this example, nc
is an Arbitrary<int>. Notice how it's explicitly used to populate the neighborCount
value of the property, whereas the cell
value arrives via normal means.
This is type-safe, because nc
is an Arbitrary<int>, which means that neighborCount
is statically inferred to be an int
.
If you need more than a single ad hoc Arbitrary, you can always create Arbitraries for each of them, and then use Gen.map2, Gen.map3, and so on, to turn those individual Arbitraries into a single Arbitrary of a tuple. You can then use that tuple with Prop.forAll.
Summary #
FsCheck is a well-designed library that you can combine in lots of interesting ways. In this article you learned how to use Prop.forAll to evaluate a property with a mix of normal, arbitrarily generated values, and an ad hoc, in-line Arbitrary.
Addendum 2016-03-01: You can write such properties slightly better using the backward pipe operator.
When x, y, and z are great variable names
A common complaint against Functional Programming is the terse naming: x and y for variables, and f for functions. There are good reasons behind these names, though. Learn to love them here.
One of the facets of Function Programming that bothered my when I first started to look into it, is that often in examples, variables and functions have terribly terse names like x
, y
, f
, and so on. I wasn't alone, feeling like that, either:
"Functional programmer: (noun) One who names variables "x", names functions "f", and names code patterns "zygohistomorphic prepromorphism"" - James IryIn this article, I'm not going to discuss zygohistomorphic prepromorphism, but I am going to discuss names like
x
and f
.
Descriptive names #
When I started my Functional Programming journey, I came from a SOLID Object-Oriented background, and I had read and internalised Clean Code - or so I thought.
Readable code should have descriptive names, and f
and x
hardly seem descriptive.
For a while, I thought that the underlying reason for those 'poor' names was that the people writing all that Functional example code were academics with little practical experience in software development. It seems I'm not the only person who had that thought.
It may be true that Functional Programming has a root in mathematics, and that it has grown out of academia rather than industry, but there are good reasons that some names seem rather generic.
Generics #
In statically typed Functional languages like F# or Haskell, you rarely declare the types of functions and arguments. Instead, types are inferred, based on usage or implementation. It often turns out that functions are more generic than you first thought when you started writing it.
Here's a simple example. When I did the Diamond kata with Property-Based Testing, I created this little helper function along the way:
let isTwoIdenticalLetters x = let hasIdenticalLetters = x |> Seq.distinct |> Seq.length = 1 let hasTwoLetters = x |> Seq.length = 2 hasIdenticalLetters && hasTwoLetters
As the name of the function suggests, it tells us if x
is a string of two identical letters. It returns true for strings such as "ff", "AA", and "11", but false for values like "ab", "aA", and "TTT".
Okay, so there's already an x
there, but this function works on any string, so what else should I have called it? In C#, I'd probably called it text
, but that's at best negligibly better than x
.
Would you say that, based on the nice, descriptive name isTwoIdenticalLetters, you understand what the function does?
That may not be the case.
Consider the function's type: seq<'a> -> bool when 'a : equality
. What!? That's not what we expected! Where's the string
?
This function is more generic than I had in mind when I wrote it. System.String implements seq<char>, but this function can accept any seq<'a> (IEnumerable<T>), as long as the type argument 'a
supports equality comparison.
So it turns out that text
would have been a bad argument name after all. Perhaps xs
would have been better than x
, in order to indicate the plural nature of the argument, but that's about as much meaning as we can put into it. After all, this all works as well:
> isTwoIdenticalLetters [1; 1];; val it : bool = true > isTwoIdenticalLetters [TimeSpan.FromMinutes 1.; TimeSpan.FromMinutes 1.];; val it : bool = true > isTwoIdenticalLetters [true; true; true];; val it : bool = false
That function name is misleading, so you'd want to rename it:
let isTwoIdenticalElements x = let hasIdenticalLetters = x |> Seq.distinct |> Seq.length = 1 let hasTwoLetters = x |> Seq.length = 2 hasIdenticalLetters && hasTwoLetters
That's better, but now the names of the values hasIdenticalLetters and hasTwoLetters are misleading as well. Both are boolean values, but they're not particularly about letters.
This may be more honest:
let isTwoIdenticalElements x = let hasIdenticalElements = x |> Seq.distinct |> Seq.length = 1 let hasTwoElements = x |> Seq.length = 2 hasIdenticalElements && hasTwoElements
This is better, but now I'm beginning to realize that I've been thinking too much about strings and letters, and not about the more general question this function apparently answers. A more straightforward (depending on your perspective) implementation may be this:
let isTwoIdenticalElements x = match x |> Seq.truncate 3 |> Seq.toList with | [y; z] -> y = z | _ -> false
This may be slightly more efficient, because it doesn't have to traverse the sequence twice, but most importantly, I think it looks more idiomatic.
Notice the return of 'Functional' names like y
and z
. Although terse, these are appropriate names. Both y
and z
are values of the generic type argument 'a
. If not y
and z
, then what would you call them? element1
and element2
? How would those names be better?
Because of F#'s strong type inference, you'll frequently experience that if you use as few type annotations as possible, the functions often turn out to be generic, both in the technical sense of the word, but also in the English interpretation of it.
Likewise, when you create higher-order functions, functions passed in as arguments are often generic as well. Such a function could sometimes be any function that matches the required type, which means that f
is often the most appropriate name for it.
Scope #
Another problem I had with the Functional naming style when I started writing F# code was that names were often short. Having done Object-Oriented Programming for years, I'd learned that names should be sufficiently long to be descriptive. As Code Complete explains, teamMemberCount
is better than tmc
.
Using that argument, you'd think that element1
and element2
are better names than y
and z
. Let's try:
let isTwoIdenticalElements x = match x |> Seq.truncate 3 |> Seq.toList with | [element1; element2] -> element1 = element2 | _ -> false
At this point, the discussion becomes subjective, but I don't think this change is helpful. Quite contrary, these longer names only seem to add more noise to the code. Originally, the distance between where y
and z
are introduced and where they're used was only a few characters. In the case of z
, that distance was 9 characters. After the rename, the distance between where element2
is introduced and used is now 16 characters.
There's nothing new about this. Remarkably, I can find support for my dislike of long names in small scopes in Clean Code (which isn't about Functional Programming at all). In the last chapter about smells and heuristics, Robert C. Martin has this to say about scope and naming:
"The length of a name should be related to the length of the scope. You can use very short variable names for tiny scopes, but for big scopes you should use longer names.
"Variable names like
i
andj
are just fine if their scope is five lines long."
Do you use variable names like i
in for
loops in C# or Java? I do, so I find it appropriate to also use short names in small functions in F# and Haskell.
Well-factored Functional code consists of small, concise functions, just as well-factored SOLID code consists of small classes with clear responsibilities. When functions are small, scopes are small, so it's only natural that we encounter many tersely named variables like x
, y
, and f
.
It's more readable that way.
Summary #
There are at least two good reasons for naming values and functions with short names like f
, x
, and y
.
- Functions are sometimes so generic that we can't say anything more meaningful about such values.
- Scopes are small, so short names are more readable than long names.
Comments
Dave, thank you for writing. FWIW, I don't think there's anything wrong with longer camel-cased names when a function or a value is more explicit. As an example, I still kept the name of the example function fairly long and explicit: isTwoIdenticalElements
.
When I started with F#, I had many years of experience with writing C# code, and in the beginning, my F# code was more verbose than it is today. What I'm trying to explain with this article isn't that the short names are terse for the sake of being terse, but rather because sometimes, the functions and values are so generic that they could be anything. When that happens, f
and x
are good names. When functions and values are less generic, the names still ought to be more descriptive.
Type Driven Development: composition
When you develop a system with an outside-in technique like Type Driven Development, you'll eventually have all the required building blocks. Now you need to compose them into an application. This post shows an example.
In my article about Type Driven Development, I demonstrated how to approach a problem in an iterative fashion, using the F# type system to do outside-in development, and in a follow-up article, I showed you how to implement one of the inferred methods. In this article, you'll see how to compose all the resulting building blocks into an application.
Building blocks #
In the first article, you learned that apart from the functions defined in that article itself, you'd need four other functions:
- ReadyData -> bool
- unit -> Timed<MessageHandler option>
- NoMessageData -> bool
- unit -> Timed<'a>
As you can see in my Type-Driven Development with F# Pluralsight course, some of the other implementations turn out to have function arguments themselves. It's not quite enough with only those four functions. Still, the final number of implementation functions is only 9.
Here are all the building blocks (excluding the types and functions related to Timed<'a>):
- run : (PollingConsumer -> PollingConsumer) -> PollingConsumer -> PollingConsumer
- transition : (ReadyData -> bool) -> (unit -> Timed<MessageHandler option>) -> (NoMessageData -> bool) -> (unit -> Timed<'a>) -> PollingConsumer -> PollingConsumer
- shouldIdle : TimeSpan -> DateTimeOffset -> NoMessageData -> bool
- idle : TimeSpan -> unit -> Timed<unit>
- shouldPoll : (TimeSpan list -> TimeSpan) -> DateTimeOffset -> ReadyData -> bool
- poll : (unit -> 'a option) -> ('a -> 'b) -> (unit -> DateTimeOffset) -> unit -> Timed<MessageHandler option>
- calculateExpectedDuration : TimeSpan -> TimeSpan list -> TimeSpan
- simulatedPollForMessage : Random -> unit -> unit option
- simulatedHandle : Random -> unit -> unit
Composition #
When the Polling Consumer starts, it can start by figuring out the current time, and calculate some derived values from that and some configuration values, as previously explained.
let now' = DateTimeOffset.Now let stopBefore' = now' + TimeSpan.FromMinutes 1. let estimatedDuration' = TimeSpan.FromSeconds 2. let idleDuration' = TimeSpan.FromSeconds 5.
The estimatedDuration'
value is a TimeSpan containing a (conservative) estimate of how long time it takes to handle a single message. It's only used if there are no already observed message handling durations, as the algorithm then has no statistics about the average execution time for each message. This value could come from a configuration system, or a command-line argument. In a recent system, I just arbitrarily set it to 2 seconds.
Given these initial values, we can compose all other required functions:
let shouldPoll' = shouldPoll (calculateExpectedDuration estimatedDuration') stopBefore' let r' = Random() let handle' = simulatedHandle r' let pollForMessage' = simulatedPollForMessage r' let poll' = poll pollForMessage' handle' Clocks.machineClock let shouldIdle' = shouldIdle idleDuration' stopBefore' let idle' = idle idleDuration' let transition' = transition shouldPoll' poll' shouldIdle' idle' let run' = run transition'
The composed run'
function has the type PollingConsumer -> PollingConsumer
. The input PollingConsumer value is the start state, and the function will return another PollingConsumer when it's done. Due to the way the run
function is implemented, this return value is always going to be StoppedState.
Execution #
All that's left to do is to execute the run'
function with a ReadyState as the initial state:
let result' = run'(ReadyState([] |> Timed.capture Clocks.machineClock))
A ReadyState value is required as input, and the ReadyState case constructor takes a ReadyData value as input. ReadyData is an alias for Timed<TimeSpan list>. An the beginning, the Polling Consumer hasn't observed any messages, so the TimeSpan list should be empty.
The empty TimeSpan list
must be converted to Timed<TimeSpan list>
, which can be done by piping it into Timed.capture
, using the machine clock.
When you execute the run' function, it may produce output like this:
Polling Sleeping Polling Sleeping Polling Handling Polling Sleeping Polling Sleeping Polling Sleeping Polling Handling Polling Sleeping Polling Sleeping Polling Sleeping Polling Sleeping Polling Handling Polling Handling Polling Sleeping Real: 00:00:59.392, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
This is because I set up some of the functions to print to the console so that we can see what's going on.
Notice that the state machine ran for 59 seconds and 392 milliseconds, exiting just before the minute was up.
Summary #
Once you have all the appropriate building blocks, you can compose your desired system and run it. Notice how much this resembles the Composition Root pattern, only with functions instead of objects.
If you want to see more details about this example, or get access to the full source code, you can watch my Type-Driven Development with F# Pluralsight course. Please be aware that only certain subscription levels will give you source code access.
Type Driven Development: implementing shouldIdle
Type Driven Development is an outside-in technique. Once you have the overall behaviour defined, you need to implement the details. Here's an example.
In my article about Type Driven Development, I demonstrated how to approach a problem in an iterative fashion, using the F# type system to do outside-in development. With the overall behaviour in place, there's still work to be done.
From type inference of the higher-order functions' arguments, we know that we still need to implement functions with these signatures:
- ReadyData -> bool
- unit -> Timed<MessageHandler option>
- NoMessageData -> bool
- unit -> Timed<'a>
NoMessageData -> bool
function. If you want to see how to implement the other three functions, you can watch my Type-Driven Development with F# Pluralsight course.
The NoMessageData -> bool
function is defined as the shouldIdle
argument to the transitionFromNoMessage higher-order function. The purpose of the shouldIdle function is to determine whether there's enough remaining time to idle.
Development #
Since we know the signature of the function, we can start by declaring it like this:
let shouldIdle (nm : NoMessageData) : bool =
Although it doesn't have to be called shouldIdle, in this case, I think the name is appropriate.
In order to determine if there's enough time left to idle, the function must know what time it is right now. Recall that, by design, PollingConsumer states are instantaneous, while transitions take time. The time a transition starts and stops is captured by a Timed<'a> value.
The nm
argument has the type NoMessageData
, which is an alias for Timed<TimeSpan list>
. The Timed
part contains information about when the transition into the No message state started and stopped. Since being in a state has no duration, nm.Stopped
represents the time when shouldIdle executes. That's part of the solution, so we can start typing:
let shouldIdle (nm : NoMessageData) : bool = nm.Stopped
This doesn't yet compile, because nm.Stopped
is a DateTimeOffset value, but the function is declared as returning bool.
If we imagine that we add the idle duration to the current time, it should gives us the time it'd be when idling is done. That time should be less than the time the Polling Consumer must exit:
let shouldIdle (nm : NoMessageData) : bool = nm.Stopped + idleDuration < stopBefore
This still doesn't compile because idleDuration
and stopBefore
are undefined, but this is easy to fix: promote them to arguments:
let shouldIdle idleDuration stopBefore (nm : NoMessageData) : bool = nm.Stopped + idleDuration < stopBefore
If you paid attention to the previous article, you'll notice that this is exactly the same technique I used for the transitionFromNoMessage function. Apparently, we're not done with outside-in development yet.
Type inference #
The function now compiles, and has the type TimeSpan -> DateTimeOffset -> NoMessageData -> bool
.
Once again, I've used the trick of promoting an undefined value to a function argument, and let type inference take care of the rest. This also works here. Since nm.Stopped
is a DateTimeOffset value, and we're adding something to it with the +
operator, idleDuration
has to be of a type that supports adding to DateTimeOffset. The only thing you can add to DateTimeOffset is a TimeSpan, so idleDuration
is inferred to be a TimeSpan value.
When you add a TimeSpan value to a DateTimeOffset value, you get another DateTimeOffset value back, so the type of the expression nm.Stopped + idleDuration
is DateTimeOffset. The entire return expression compares that DateTimeOffset value with the <
operator, which requires that both the left-hand and the right-hand expressions have the same type. Ergo must stopBefore
also be a DateTimeOffset value.
While we set out to implement a function with the type NoMessageData -> bool
, we eventually created a function with the type TimeSpan -> DateTimeOffset -> NoMessageData -> bool
, which isn't quite what we need.
Partial application #
The extra arguments can be removed again with partial function application. When the Polling Consumer application starts, it can easily calculate when it ought to stop:
let now' = DateTimeOffset.Now let stopBefore' = now' + TimeSpan.FromMinutes 1.
This assumes that the Polling Consumer should run for a maximum of 1 minute.
Likewise, we can create an idle duration value:
let idleDuration' = TimeSpan.FromSeconds 5.
Here, the value is hard-coded, but it could have gone in a configuration file instead, or be passed in as a command-line argument.
Given these values, we can now partially apply the function:
let shouldIdle' = shouldIdle idleDuration' stopBefore'
Since we're not supplying the third NoMessageData argument for the function, the return value of this partial application is a new function with the type NoMessageData -> bool
- exactly what we need.
Summary #
In this article, you saw how to approach the implementation of one of the functions identified with the outside-in Type Driven Development technique. If you want to see the other three functions implemented, a much more detailed discussion of the technique, as well as the entire code base with commit messages, you can watch my Type-Driven Development with F# Pluralsight course.
Comments
Philippe, thank you for writing. That's a great observation, and one that I must admit that I hadn't considered myself!
At least, in this case encapsulation is still intact because pre- and post-conditions are preserved. You can't leave the system in an incorrect state.
The reason I described the option using Scored<T> before the pessimistic locking alternative is that I like the first option best. Among other benefits, it doesn't suffer from temporal coupling.
Hi Mark, those are all nice solutions!
I think there are also other options, for example sending a "Excercice hint viewed" notification which could then be handled by a subscriber calling a command.
But this is at the cost of some indirection, so it's nice to have other choices.
Loïc, thank you for writing. I'm sure there are other alternatives than the ones I've outlined. The purpose of the article wasn't to provide an exhaustive list of options, but rather explain that it is possible to adhere to the CQS, even though sometimes it seems difficult.
Specifically, are you suggesting to send a notification when the Query is made? Isn't that a side-effect?
There are some alternatives way in which I would consider handling this if I'm being honest. We always want to retrieve the hint. We singularly want to reduce the person's score by 5 points if they have not seen this hint before. This depreciation in points is idempotent and should only be executed if the hint hasn't been viewed before. Contextual information associated to the returned hint, such as last time viewed by current user, would inform the triggering of the command.
I think this is OK, because we care whether a user has viewed a hint. A hint having been viewed by a user means something, so returning it from the query feels valid. Acting up on this accordingly also feels valid, but the command itself becomes nicely idempotent as it understand the single-hit decrease in the points.