ploeh blog https://blog.ploeh.dk danish software design en-us Mark Seemann Mon, 23 Feb 2026 13:20:48 UTC Mon, 23 Feb 2026 13:20:48 UTC TDD as induction https://blog.ploeh.dk/2026/02/23/tdd-as-induction/ Mon, 23 Feb 2026 13:20:00 UTC <div id="post"> <p> <em>A metaphor.</em> </p> <p> In the mid 2010s I was working with a Danish software development organisation, effectively acting as a lead developer. Because of a shortage of salaried employees, we needed to hire freelancers, and after I had exhausted my local network, I turned to international contacts. One (excellent) addition to the team was <a href="https://mikehadlow.com/">Mike Hadlow</a>, who worked out of England. </p> <p> On his first day, we had him clone the repository and run the tests. About five minutes later, we received a message from him (paraphrasing from memory): "Guys, I have three failing tests. Is this expected?" </p> <p> No, we didn't expect that. The team had used test-driven development (TDD) for the code. It had hundreds of tests, all of them deterministic. Or so we thought. </p> <p> It didn't take long to figure out that three tests failed on Mike's computer because it, naturally, was configured with the UK English locale, whereas so far, everyone had been running with the Danish locale. In Danish, like many other languages, comma is the <a href="https://en.wikipedia.org/wiki/Decimal_separator">decimal separator</a> and period the thousands separator. As readers of this article will know, in English, it's the other way around. </p> <p> The three tests failed because they expected Danish formatting rules to be in effect. </p> <p> I don't remember the specifics, but once we had identified the root cause, fixing it was easy. Be more explicit in the arrange phase, or be less explicit in the assertion phase. </p> <p> The lesson was that even tests written with TDD make implicit assumptions about the environment. 
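</p> <p> The first of those two remedies, being more explicit in the arrange phase, can be sketched in Haskell. This is only an illustration, not the code from that project: <code>Locale</code>, <code>danish</code>, <code>british</code>, and <code>formatOneDecimal</code> are hypothetical names invented for the example, and a 'locale' is reduced to nothing but a decimal separator. </p> <p> <pre>import Numeric (showFFloat)

-- Hypothetical stand-in for a locale: only the decimal separator matters here.
newtype Locale = Locale { decimalSeparator :: Char }

danish, british :: Locale
danish = Locale ','
british = Locale '.'

-- Render a number with one decimal, using the separator of the given locale.
formatOneDecimal :: Locale -&gt; Double -&gt; String
formatOneDecimal locale x = map swap (showFFloat (Just 1) x "")
  where
    swap '.' = decimalSeparator locale
    swap c = c

main :: IO ()
main = do
  -- Arrange: the locale is stated explicitly, so the outcome no longer
  -- depends on how the machine running the test is configured.
  print (formatOneDecimal danish 12.3 == "12,3")
  print (formatOneDecimal british 12.3 == "12.3")</pre> </p> <p> Because the locale is an argument rather than something read from the environment, both comparisons should hold on any machine, regardless of its regional settings.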
</p> <h3 id="43ae8708afdc4c3696978bd0f262f947"> Horizontal scaling <a href="#43ae8708afdc4c3696978bd0f262f947">#</a> </h3> <p> A decade earlier, a colleague taught me that the most difficult scale-out was going from one to two. This was in the early <a href="https://en.wikipedia.org/wiki/2000s">noughties</a>, and the challenge of the day was scaling out servers. Already back then, we were running into the problem of stagnating CPU clock speed improvements. For decades, computers had become faster each year, so if you had performance issues, often you could wait a year or two and buy a faster machine. </p> <p> In the early 2000s, this stopped being the rule, and chip manufacturers instead started to add more processors to a single chip. This solved some problems, but not all. Another attempt to address performance problems was to scale out instead of up. Instead of buying a faster, more expensive computer, you'd buy another computer like the one you already had, and somehow distribute the workload. If you could make that work, that made better economic sense than buying more expensive equipment only to decommission the old machine. </p> <p> The problem, however, was that at the time, most software was designed with the implicit assumption that it would run on one machine only. Not client software, perhaps, but certainly database servers, and often application servers, too. Going from one to two machines was not a trivial undertaking. </p> <p> On the other hand, once you had done the hard work of enabling, say, a web site to run on two servers, it would typically be trivial to make it run on three, or four. </p> <h3 id="e508a8a372d94d949d4d22fa9a508215"> Two as many <a href="#e508a8a372d94d949d4d22fa9a508215">#</a> </h3> <p> The notion that the most difficult scale-out is going from one to two made such a deep impression that it's been with me ever since. It seems to generalise to other fields, too. 
That going from the singular to the plural is where you find most barriers. Once you've enabled having two of something, then the actual number seems to be of lesser importance. </p> <p> It took me a long time to come to terms with the notion that the number two is only a 'representation' for any plural number. One reason, I think, is that my thinking may have been tainted by an innocuous phrase that my mother often uttered: "En, to, mange" or, in English, <em>one, two, many</em>. </p> <p> As any 'real' software tester will tell you, it's actually nought, one, many. It took me many years of test-driven development (TDD) to finally accept that when testing for plurality, it was often good enough to test with collections of two values. In my early TDD years, I would often insist on adding a test case for the 'three' case, but over the years I learned that this extra step didn't enable me to move forward. In the parlance of the <a href="https://blog.cleancoder.com/uncle-bob/2013/05/27/TheTransformationPriorityPremise.html">transformation priority premise</a>, adding such a test case led to no transformation. </p> <p> Once I, grudgingly, accepted that <em>two is many</em>, I started noticing other patterns and connections. </p> <h3 id="cddd9fe6d7eb4dbf82f5bd47c9faf48e"> TDD and inductive reasoning <a href="#cddd9fe6d7eb4dbf82f5bd47c9faf48e">#</a> </h3> <p> Much has already been said about TDD, particularly example-driven development, as a sort of <a href="https://en.wikipedia.org/wiki/Inductive_reasoning">inductive reasoning</a>. You start with one example, and implement the simplest thing that could possibly work. You add another example, and the System Under Test becomes slightly more sophisticated. After <a href="/2018/11/12/what-to-test-and-not-to-test">enough</a> iterations, you have a working solution. </p> <p> This looks like inductive reasoning, in that you are generalising from the specific to the general. 
</p> <p> Such an analogy calls for criticism, because inductive reasoning in general suffers from <a href="https://en.wikipedia.org/wiki/Problem_of_induction">fundamental epistemological problems</a>. How do we know that we can safely generalise from finite examples? </p> <p> We can, because TDD is not a process of uncovering some natural law. The problem of induction, typically, is that in natural science, researchers attempt to uncover underlying relationships; cause and effect. Their area of study, however, is the result of natural processes. Or, if a researcher studies economics, perhaps a result of complex social interactions. In scientific settings, the object of study is <em>not</em> man-made, and you can't ask anyone for the correct answer. </p> <p> With TDD, the situation is different. You <em>can</em> consult the source code. In fact, if TDD is done right and you made no mistakes, the System Under Test (SUT) should be the generalisation of all the examples. </p> <p> Of course, to err is human, so you could have made mistakes, but with TDD we are on much more solid ground than is usually the case in epistemology. </p> <p> This seems to suggest that TDD has more in common with <a href="https://en.wikipedia.org/wiki/Formal_science">formal science</a> than with <a href="https://en.wikipedia.org/wiki/Natural_science">natural</a> or <a href="https://en.wikipedia.org/wiki/Social_science">social science</a>. </p> <h3 id="bea84041894a49dbb25730d576b23079"> Tests as statements <a href="#bea84041894a49dbb25730d576b23079">#</a> </h3> <p> Consider a test following the <a href="https://xp123.com/3a-arrange-act-assert/">Arrange Act Assert</a> pattern. As the last step indicates, a test is an assertion. It's a claim that if things are arranged just so, and a particular action is taken, posterior state will have certain verifiable properties. We might consider such a construction a formal statement. 
Formal, in the sense that it's expressed in a formal (programming) language, and a statement because its truth value is either true (i.e. <em>passed</em>) or false (i.e. either <em>failed</em>, crashed, or hanging). </p> <p> Excluding property-based testing from the discussion, a test is still an example. We shouldn't infer a system's general behaviour from a single example, but when viewed collectively, we may, as discussed above, engage in inductive reasoning. For the rest of this article, however, that is not what I have in mind. Rather, I want to talk about an independent kind of generalisation; a different dimension, if you will. </p> <p> <img src="/content/binary/behaviour-adaptability-coordinate-system.png" alt="Coordinate system with behaviour along the x-axis and adaptability along the y-axis." width="400"> </p> <p> So far, I have discussed how we may infer a system's behaviour from examples. The more examples you provide, the more you trust the induction. </p> <p> In the rest of this article, I will discuss how replicating a test to multiple environments tends to demonstrate increased adaptability. In this light, a single test is a statement about one single example, but the statement is now assumed to be universal. It should hold in all circumstances described by it. </p> <p> What does that mean? </p> <h3 id="f9862fac1636481f823587149d52ef0a"> Tests are the first clients <a href="#f9862fac1636481f823587149d52ef0a">#</a> </h3> <p> As I wrote a long time ago, in <a href="/2011/11/10/TDDimprovesreusability">an otherwise too confrontational article</a>, unit tests are the first clients of the SUT's APIs. Only once tests pass do you put the SUT to use in its intended context. The function/class/module/component that you test-drove now becomes part of the overall solution. The View Model correctly helps render the user interface. The Domain Model makes the right decision. A security component correctly rejects unauthorised users. 
</p> <p> When you integrate a test-driven unit in a larger system, any test (even a manual test) of that system is a secondary test. Often, you simply verify that the composition of smaller elements works as intended. Occasionally, an integration test reveals that the unit doesn't work in the new context. </p> <p> This is expected. It's the reason integration testing is important. </p> <p> When unit tests succeed, but integration tests fail, the reason is usually that the unit tests are too parochial. Integration test failures reveal that the unit has to handle situations that you hadn't thought of. Sometimes, the problem is that input is more varied than you initially thought. Other times, like the above story about Danish and UK locales, it turns out that the test made implicit assumptions that ought to be explicit. </p> <p> While this error-discovery process is normal, in my experience, once you've addressed bugs that only manifest in a new context, additional contexts tend to unearth few new problems. You find most defects in the first context, which is the automated test environment. You find a few more once you move the code to a new execution context. After that, however, error discovery tends to dry up. </p> <p> <img src="/content/binary/execution-context-vs-errors-chart.png" alt="Bar chart showing execution context on the horizontal axis and errors found on the vertical axis. The execution context labelled '1' has the highest bar; the bar labelled '2' has a bar only a tenth the size, and the labels '3', '4', and '5' have no bars." width="350"> </p> <p> In my desire to make a point, I'm deliberately simplifying things. It is not, however, my intention to mislead anyone. In reality, you do sometimes find new errors in the third or fourth context. Some errors, as everyone knows, only manifest in production, and only in certain mysterious circumstances. 
In other words, the above chart is deceptive in the sense that it seems to claim that the third, fourth, etc. contexts reveal no additional bugs. This is not the case. </p> <p> That said, in my experience the relationship is clearly non-linear, and for a long time, I wondered about that. </p> <h3 id="60ee11b7ed99492ba8a6373be092fcd8"> Mathematical induction <a href="#60ee11b7ed99492ba8a6373be092fcd8">#</a> </h3> <p> Although the following is, at best, an imperfect metaphor, this reminds me of <a href="https://en.wikipedia.org/wiki/Mathematical_induction">mathematical induction</a>. You start with the statement that a particular example (implemented as a test) works in a single environment (typically a developer machine). Call this statement <em>P(1)</em>. </p> <p> <img src="/content/binary/single-dev-box-exercising-system.png" alt="A box labelled 'Dev box' with an arrow pointing to a box labelled 'System'." width="400"> </p> <p> Already when you synchronise your code with coworkers' code, the example or use case now executes on multiple other machines; <em>P(2), P(3)</em>, etc. </p> <p> <img src="/content/binary/multiple-dev-boxes-exercising-system.png" alt="Multiple, overlapping boxes labelled 'Dev boxes' with an arrow pointing to a box labelled 'System'." width="400"> </p> <p> As the initial anecdote about locale-dependent tests shows, you may already find a problem here. In many cases, however, the development machines are sufficiently identical that any single test is effectively running in the same context. In this sense, you may still be establishing that the first statement, <em>P(1)</em>, holds. </p> <p> If so, you may discover problems in execution contexts that differ from developer machines to a larger degree. </p> <p> <img src="/content/binary/dev-boxes-ci-cd-and-production-exercising-system.png" alt="Boxes labelled respectively 'Dev boxes', 'CI/CD', and 'Production', each with an arrow pointing to a box labelled 'System'." 
width="400"> </p> <p> Sometimes with mathematical induction, you need to establish more than a single base case. You may, for example, first prove <em>P(1)</em> and <em>P(2)</em>. The induction step then assumes <em>P(n-2)</em> and <em>P(n-1)</em> in order to prove <em>P(n)</em>. </p> <p> Although the metaphor is flawed in more than one way, the non-linear relationship between environments and defect discovery reminds me of this kind of induction. Experience indicates that if an example works in the first and second context, it typically <a href="/2023/07/17/works-on-most-machines">works in new contexts</a>. </p> <h3 id="5fc0809d33ab424495e8ac4e904133f2"> Implicit assumptions <a href="#5fc0809d33ab424495e8ac4e904133f2">#</a> </h3> <p> This induction-like relationship sometimes falls apart, as the opening anecdote illustrates. Sometimes, as that anecdote shows, the problem is not with the implementation, but with the test. In mathematics, it may turn out that a proof makes implicit assumptions, and that it doesn't hold as universally as first believed. An example is that <a href="https://en.wikipedia.org/wiki/Euler_characteristic">Euler believed that the characteristic of all polyhedra was constant</a>, but failed to take non-convex shapes into account. </p> <p> In the same way, tests may inadvertently assume that some property is universal. Later, you may discover that such an assumption, for example about locale, is not as universal as you thought. </p> <p> This explains why my <a href="/dippp">DIPPP</a> coauthor <a href="https://blogs.cuttingedge.it/steven/about/">Steven van Deursen</a> correctly insisted that <a href="/2019/01/21/some-thoughts-on-anti-patterns">Ambient Context should be classified as an anti-pattern</a>. Otherwise, it's too easy to forget essential pre-conditions, which makes it easier to introduce bugs that only appear in certain contexts. 
</p> <p> This is one of many reasons I prefer <a href="https://www.haskell.org/">Haskell</a> over most other programming languages. Haskell APIs don't make implicit assumptions about execution context. Or, rather, they have deterministic behaviour according to 'standards' which are often English; e.g. a decimal number like <code>12.3</code> always renders as <code>"12.3"</code>, and never as <code>"12,3"</code>, as it would in German, Danish, etc. </p> <p> Even so, as <a href="http://conal.net/blog/posts/notions-of-purity-in-haskell">Conal Elliott complains</a>, <a href="https://hackage-content.haskell.org/package/base/docs/System-Info.html#v:os">some APIs</a> are not as deterministic as one might hope. </p> <p> The bottom line is that when writing tests, one has to carefully and explicitly state all relevant assumptions as part of the test. </p> <h3 id="4bc5500cbcaf4242a5b2764704e387de"> Conclusion <a href="#4bc5500cbcaf4242a5b2764704e387de">#</a> </h3> <p> As imperfect a metaphor as it is, I find comfort in comparing defect discovery using automated tests with induction. After decades of test-driven development, I've wondered if there's a deeper reason that <a href="/2023/07/17/works-on-most-machines">if test-driven code works on one machine, it tends to work on most machines</a>, and that the relationship seems to be distinctly non-linear. </p> <p> An automated test, if it properly describes all relevant context, is effectively a statement that a particular example always behaves the same. We may, then, choose to believe that if it works in one context, and we've seen it work in one additional, arbitrary context, it seems likely that it will work in most other contexts. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. 
Mark Seemann https://blog.ploeh.dk/2026/02/23/tdd-as-induction Critiquing tests https://blog.ploeh.dk/2026/02/16/critiquing-tests/ Mon, 16 Feb 2026 13:10:00 UTC <div id="post"> <p> <em>Two attempts to measure the quality of automated test suites.</em> </p> <p> While test-driven development remains, in my view, the <a href="/2025/10/20/epistemology-of-software">most scientific</a> approach to software testing, I realize that it's still a minority practice. Furthermore, with the rise of AI, it's becoming increasingly common to <a href="/2026/01/26/ai-generated-tests-as-ceremony">let LLMs generate tests</a>. </p> <p> Being practical about it, we need to explore how to critique tests; how to measure or evaluate the quality of tests we never wrote, and that we never saw fail. </p> <p> I'm aware of two technical measurements, as well as a handful of heuristics that we may apply, but I think we may need more. Thus, this overview is only preliminary. </p> <h3 id="1ee1a5c5c0e24ddcab4825bcbf8b82c9"> Code coverage <a href="#1ee1a5c5c0e24ddcab4825bcbf8b82c9">#</a> </h3> <p> The notion of <a href="https://martinfowler.com/bliki/TestCoverage.html">code coverage</a> has long, with good reason, been dismissed as 'not really helpful'. And indeed, <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">code coverage is a useless target measure</a>, because it's too easy to game. </p> <p> Perhaps we should reevaluate that position, now that it looks as though tests will increasingly be written by <a href="https://en.wikipedia.org/wiki/Large_language_model">LLMs</a>. While human developers will game simple incentives, who knows what LLMs will do? In any case, as test generation becomes automated, we need no longer care that much about agents 'gaming' the system. </p> <p> After all, when people game an incentive system, the problem is two-fold. First, direct outcomes may have adverse effects. 
In the context of testing, tests written to attain a certain level of test coverage may be of poor quality, requiring too much maintenance. Second, there's <a href="https://en.wikipedia.org/wiki/Opportunity_cost">opportunity cost</a>. The time spent writing poor tests could, perhaps, be spent doing something more valuable. </p> <p> The first concern is still relevant when asking LLMs to generate tests, but the second concern may be of less importance. Assuming that LLM-generated tests are relatively inexpensive, the least we may ask of such tests is a high coverage ratio. </p> <p> This is not much of a quality measure, but rather a minimum bar. If you ask an LLM to generate tests, and all it can do is to achieve 30% coverage, that really isn't impressive. In the end, it's up to you to determine <a href="/2018/11/12/what-to-test-and-not-to-test">what to test and not to test</a>, but for LLM-generated tests, I would expect high coverage. </p> <p> After all, <a href="/2025/11/10/100-coverage-is-not-that-trivial">reaching 100% coverage is not that trivial</a>, so expecting high coverage means <em>something</em>. </p> <p> The next technique may also indirectly reveal problems with path coverage, but is less available. Most mainstream languages or programming platforms come with some kind of coverage tool, whereas mutation testing is rarer. </p> <h3 id="2581e18ec1bb4b749c4c3cb5a3a4bacd"> Mutation testing <a href="#2581e18ec1bb4b749c4c3cb5a3a4bacd">#</a> </h3> <p> <a href="https://en.wikipedia.org/wiki/Mutation_testing">Mutation testing</a> is the process of changing (mutating) particular code parts of the System Under Test and then running tests to see if any of them fail. If, for example, you can change a greater-than operator to a greater-than-or-equal operator, and no tests fail, this indicates that the tests don't cover an edge case. 
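</p> <p> To make the greater-than example concrete, here is a minimal, hand-rolled sketch in Haskell. It involves no mutation-testing tool, and every name in it (<code>qualifiesForFreeShipping</code>, <code>mutant</code>, <code>weakSuite</code>, <code>strongSuite</code>, <code>passes</code>) is hypothetical. The 'mutant' is written by hand, whereas a real tool would generate it. </p> <p> <pre>-- Hypothetical System Under Test: free shipping strictly above a threshold.
qualifiesForFreeShipping :: Int -&gt; Bool
qualifiesForFreeShipping total = total &gt; 100

-- The mutant a tool might generate: &gt; replaced with &gt;=.
mutant :: Int -&gt; Bool
mutant total = total &gt;= 100

-- Each test case pairs an input with the expected result.
weakSuite, strongSuite :: [(Int, Bool)]
weakSuite = [(50, False), (150, True)]
strongSuite = weakSuite ++ [(100, False)]

-- A suite passes when the function agrees with every expectation.
passes :: (Int -&gt; Bool) -&gt; [(Int, Bool)] -&gt; Bool
passes f = all (\(input, expected) -&gt; f input == expected)

main :: IO ()
main = do
  -- The original implementation passes both suites.
  print (passes qualifiesForFreeShipping strongSuite)
  -- The mutant survives the weak suite, which has no boundary case...
  print (passes mutant weakSuite)
  -- ...but the suite that includes the boundary value rejects it.
  print (passes mutant strongSuite)</pre> </p> <p> The surviving mutant in the second check is the signal of interest: it points at the missing test case for the boundary value <em>100</em>.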
</p> <p> As I understand it, originally mutation testing mostly targeted <a href="https://en.wikipedia.org/wiki/Relational_operator">relational operators</a>, replacing <code>&gt;=</code> with <code>&gt;</code> or perhaps even <code>&lt;</code>, replacing <code>==</code> with <code>!=</code> and so on. <a href="/2025/04/10/characterising-song-recommendations">The last time</a> I used <a href="https://stryker-mutator.io/">Stryker</a> for C#, however, it went much further than that, by, for example, trying to remove filter expressions from query pipelines, and so on. </p> <p> Mutation testing overlaps code coverage in that it also identifies uncovered branches, but it can flush out additional problems. Even so, mutation testing is not always an option. The first problem is that, if you want to automate the process, the solution is language-specific. If, for example, you want to mutate <a href="https://en.wikipedia.org/wiki/Equivalence_relation">equality relations</a>, in most languages you'd look for the <code>==</code> operator. Even so, in C# you need to change that to <code>!=</code>, while in <a href="https://www.haskell.org/">Haskell</a> the opposite operator is <code>/=</code>. And in <a href="https://fsharp.org/">F#</a>, the operator to look for is <code>=</code>, to be replaced with <code>&lt;&gt;</code>. </p> <p> That said, you might think that you could write a simple search-and-replace script to get the job done, but consider that a character like <code>&lt;</code> may have multiple meanings in a code base. In C# and <a href="https://www.java.com">Java</a>, for example, <code>&lt;</code> and <code>&gt;</code> are also used to indicate generic type arguments, and in Haskell those characters are also used for compound operators such as <code>&gt;&gt;=</code>. </p> <p> A mutation-testing tool must know about the language it targets. 
To be on the safe side, it's probably best to at least have a parser so that you can manipulate <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax trees</a>. </p> <p> Then, for each mutation, the tool needs to run the test suite in question, keeping track of which mutations cause test failures, and which ones don't. I'm not saying that this is impossibly difficult, but it's also not entirely trivial. </p> <p> Another problem with mutation testing is that it takes time. Consider changing <em>every</em> relational operator in your code base. How many do you have? Thousands? Then consider how much time it takes to run the test suite. Now multiply those two numbers. </p> <p> And this is only for single mutations. If you want to also test combinations of mutations, the number is now exponential rather than linear. For most code bases, this is impractical. You can see how code coverage is a practical alternative. </p> <h3 id="532ac39edef14dbd9b79495a8c3d76f9"> Heuristics <a href="#532ac39edef14dbd9b79495a8c3d76f9">#</a> </h3> <p> In addition to code coverage and mutation testing, if I were given a unit test suite and had to evaluate its quality (but prevented from <a href="/2025/11/03/empirical-characterization-testing">treating each test as a Characterization Test</a>), I'd also consider the following. </p> <p> As a rule of thumb, tests should have a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <em>1</em>. In many languages, you can get a report of cyclomatic complexity. If such a report finds tests with a cyclomatic complexity greater than <em>1</em>, this bears investigation. <a href="/2020/12/07/branching-tests">Unless it's a parametrised test</a>, it probably shouldn't contain loops or branching. </p> <p> Even simpler than cyclomatic complexity, you may consider something as basic as the size of each test. How many lines of code is it? What's the line width? 
Does it fit into <a href="/2019/11/04/the-80-24-rule">a reasonably-sized box</a>? </p> <p> Furthermore, measure the running time of the new tests. In itself, this doesn't tell you anything about correctness, but if some tests are suspiciously slow, this <em>could</em> be because a test is awaiting some other event, and suspending its thread while doing that. Such tests are not only slow, but may also be incorrect because using timeouts or similar for thread synchronization tends to be faulty in non-deterministic ways. </p> <p> While we are on the topic of non-determinism, try running the tests multiple times, and make sure that the results are consistent over several runs. </p> <p> Finally, if you have the choice, favour tests written in the language with the most powerful type system. For example, if the System Under Test (SUT) is written in JavaScript, you can target it from tests written in a selection of languages. I'd rather see LLM-generated tests in <a href="https://www.typescriptlang.org/">TypeScript</a> than in JavaScript, because the TypeScript type checker can catch errors that may go unnoticed in JavaScript. I haven't kept up with that ecosystem, but perhaps <a href="https://www.purescript.org/">PureScript</a> is an even better choice than TypeScript. </p> <p> Likewise, if the SUT is a .NET application, I'd trust LLM-generated tests written in <a href="https://fsharp.org/">F#</a> over tests written in C#. </p> <p> Not all ecosystems give you such a choice, but if possible, favour tests written in a language with a powerful type checker. </p> <p> Additionally, run linters or static code analysis on the tests, and treat warnings as errors. And be sure to scan the code for pragmas that suppress warnings. </p> <p> There's quite a bit to look after. Perhaps a <a href="/ref/checklist-manifesto">checklist</a> would be helpful. 
</p> <h3 id="2e7c965daa4a4d2fadd7e832a30ca21d"> Conclusion <a href="#2e7c965daa4a4d2fadd7e832a30ca21d">#</a> </h3> <p> Using LLMs to generate tests will almost certainly become increasingly common. This raises the fundamental question: How do we know that the tests do what we want them to? </p> <p> While you could go systematically through each test and apply the process for <a href="/2025/11/03/empirical-characterization-testing">empirical Characterization Testing</a>, I doubt most people have the patience or discipline. As a next-best solution, we may look for ways to critique the tests, or rather, measure their quality. </p> <p> For the time being, I can think of two tools for this purpose: Code coverage and mutation testing. Neither is particularly reassuring, so this seems to me to be a field where more research and development would be beneficial. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/02/16/critiquing-tests Simplifying assertions with lenses https://blog.ploeh.dk/2026/02/09/simplifying-assertions-with-lenses/ Mon, 09 Feb 2026 13:28:00 UTC <div id="post"> <p> <em>Get ready for some cryptic infix operators.</em> </p> <p> In a <a href="/2025/12/22/test-specific-eq">previous article</a> I left you with a remaining problem: A test with an assertion weaker than warranted. In this article, you'll see a few tests like that, and how using lenses may improve the situation. </p> <h3 id="bd06a1fffee14b6db64d27d4fe0aa959"> Weak tests <a href="#bd06a1fffee14b6db64d27d4fe0aa959">#</a> </h3> <p> The previous article already showed an example of a test I wasn't fully happy with. For convenience, I'll repeat it here. 
</p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;Just&nbsp;&lt;$&gt;&nbsp;Pair &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;finchEq&nbsp;$&nbsp;samaritan&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;16&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;finchEq&nbsp;$&nbsp;cheater&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;13&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;expected</pre> </p> <p> Another test exhibits the same problem, but since it's simpler, we'll start with that. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Age&nbsp;finch&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.age&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;finchEq&nbsp;$&nbsp;samaritan&nbsp;{&nbsp;Galapagos.finchRoundsLeft&nbsp;=&nbsp;3&nbsp;} &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;cellFinchEq&nbsp;actual&nbsp;@?=&nbsp;Just&nbsp;expected</pre> </p> <p> As you read on, you'll see what makes those tests awkward, but in short, they only compare the <code>Finch</code> part of a cell, rather than comparing entire cells. The reason is that full comparisons make the tests more complicated, and less readable. 
</p> <h3 id="758686058a8247d79d1972203ac1e8de"> Replacing Pair with both <a href="#758686058a8247d79d1972203ac1e8de">#</a> </h3> <p> The problem is one that I rarely run into, because, as I outlined in the previous article (and many times before), if a test is difficult to write, I usually consider a simpler design. Because of <a href="https://www.haskell.org/">Haskell</a>'s awkward copy-and-update syntax, I tend to avoid nested record types. (This also applies to <a href="https://fsharp.org/">F#</a>.) Even so, it helps to know that when you run into nested records, lenses may be a proper response. </p> <p> Since I prefer to avoid nested data types, I don't use lenses much, but when I have to, I tend to use the <a href="https://hackage.haskell.org/package/lens">lens</a> package, only because I'm of the impression that it's comprehensive and current. </p> <p> Even so, I only rarely use it, so whenever I decide to pull it in, I need to get reacquainted with it. While I was spelunking the documentation, I came across the <a href="https://hackage-content.haskell.org/package/lens/docs/Control-Lens-Combinators.html#v:both">both</a> function, and realized that it solves essentially the same problem as <code>Pair</code> from the previous article. So, to get an easy start, I decided to replace <code>Pair</code> with <code>both</code>, before proceeding with my actual pursuit. 
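</p> <p> For readers who are as rusty with the library as I was, here is what <code>both</code> does in isolation. This is only a small sketch that assumes the <code>lens</code> package is available; the <code>Cell</code> type and its fields are hypothetical stand-ins, unrelated to the <code>Galapagos</code> module. </p> <p> <pre>import Control.Lens (both, (%~), (&amp;))

-- Hypothetical stand-in for a nested value; only the label is of interest.
data Cell = Cell { label :: String, seed :: Int }

main :: IO ()
main = do
  let actual = (Cell "samaritan" 0, Cell "cheater" 1)
  -- 'both' targets the two components of a pair, and '%~' applies a
  -- function to every target, so this projects each Cell to its label.
  print (actual &amp; both %~ label)</pre> </p> <p> The expression <code>actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;label</code> maps <code>label</code> over both elements of the tuple, so the result is a pair of strings rather than a pair of <code>Cell</code> values.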
</p> <p> The <code>"Groom two finches"</code> test then looked like this: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Just&nbsp;$&nbsp;finchEq&nbsp;$&nbsp;samaritan&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;16&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;Just&nbsp;$&nbsp;finchEq&nbsp;$&nbsp;cheater&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;13&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellFinchEq)&nbsp;@?=&nbsp;expected</pre> </p> <p> Notice that <code>actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellFinchEq</code> replaces <code>cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual</code>. In isolation, this is hardly more readable, but on the other hand, I believe that <a href="/2015/08/03/idiomatic-or-idiosyncratic">people often mistake unfamiliarity with things being hard to understand</a>. If I imagine that all developers working with this code base are familiar with the lens library, <code>actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellFinchEq</code> may be perfectly legible. </p> <h3 id="936f0c369c614a52a3934904707b6476"> Strengthening assertions the hard way <a href="#936f0c369c614a52a3934904707b6476">#</a> </h3> <p> Consider the <code>"Age finch"</code> test. The <code>samaritan</code> <code>Finch</code> value has <code>finchRoundsLeft = 4</code>. 
After each round of the <a href="https://en.wikipedia.org/wiki/Cellular_automaton">cellular automaton</a>, the <code>age</code> function decreases the value by one. </p> <p> If I wanted to make that explicit, and also compare the actual <code>CellState</code> to the expected <code>CellState</code>, I could do it with standard Haskell language features, but the test starts to become awkward. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Age&nbsp;finch&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.age&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;cellStateEq&nbsp;$&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;Galapagos.cellFinch&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(\f&nbsp;-&gt;&nbsp;f&nbsp;{&nbsp;Galapagos.finchRoundsLeft&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.finchRoundsLeft&nbsp;f&nbsp;-&nbsp;1&nbsp;})&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.cellFinch&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;cellStateEq&nbsp;actual&nbsp;@?=&nbsp;expected</pre> </p> <p> This is clunky for a number of reasons: The <code>Galapagos.cellFinch</code> field returns the finch found in that cell, but since the cell may also be empty, the return value is a <code>Maybe Finch</code>. This means that any modification must be done with a projection; either <code>fmap</code> or, as shown here, <code>&lt;$&gt;</code>. Inside the lambda expression, I need to query <code>Galapagos.finchRoundsLeft</code> to get the current value, and then use copy-and-update syntax to bind the new value to <code>Galapagos.finchRoundsLeft</code>. 
And then this entire expression must be bound to <code>Galapagos.cellFinch</code> in order to update <code>cell</code>. </p> <p> To summarize, both <code>Galapagos.finchRoundsLeft</code> and <code>Galapagos.cellFinch</code> have to appear twice. </p> <p> The other test, <code>"Groom two finches"</code>, involves two cells, so that's just double the cumber. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;cell1&nbsp;{&nbsp;Galapagos.cellFinch&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(\f&nbsp;-&gt;&nbsp;f&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;16&nbsp;})&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.cellFinch&nbsp;cell1&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;cell2&nbsp;{&nbsp;Galapagos.cellFinch&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(\f&nbsp;-&gt;&nbsp;f&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;13&nbsp;})&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.cellFinch&nbsp;cell2&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellStateEq)&nbsp;@?=&nbsp;(expected&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellStateEq)</pre> </p> <p> This demonstrates why I originally took a shortcut. 
Even without trying it out in practice, I have enough experience with Haskell (and F#) to predict exactly this situation. Fortunately, there's a way out. </p> <h3 id="b19743aec3304f2eb007fa25e823d142"> Setting an inner value <a href="#b19743aec3304f2eb007fa25e823d142">#</a> </h3> <p> Not being well-versed in the lens library, I found it prudent to proceed in small steps. My next move was to update <code>finchRoundsLeft</code> in the above <code>"Age finch"</code> test. While I quickly found the <a href="https://hackage-content.haskell.org/package/lens/docs/Control-Lens-Operators.html#v:-45--126-">-~</a> operator, I then had to figure out how to define an <code>ASetter</code> for <code>finchRoundsLeft</code>. </p> <p> All documentation points to making use of <a href="https://hackage-content.haskell.org/package/lens/docs/Control-Lens-Combinators.html#v:makeLenses">makeLenses</a>, but that comes with requirements that I couldn't fulfil. I couldn't change the existing definition of <code>Finch</code>, so I couldn't name the fields according to the required naming convention. I tried to use <a href="https://hackage-content.haskell.org/package/lens/docs/Control-Lens-Combinators.html#v:makeLensesWith">makeLensesWith</a> from another module, but I couldn't make it work. It's possible that you can make it work if you know what you are doing, but I didn't. 
</p> <p> In the end, I just wrote an explicit setter function for <code>finchRoundsLeft</code>: </p> <p> <pre><span style="color:#2b91af;">setRoundsLeft</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Functor</span>&nbsp;f &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">=&gt;</span>&nbsp;(<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Rounds</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;f&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Rounds</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Finch</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;f&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Finch</span> setRoundsLeft&nbsp;f&nbsp;x&nbsp;= &nbsp;&nbsp;(\r&nbsp;-&gt;&nbsp;x&nbsp;{&nbsp;Galapagos.finchRoundsLeft&nbsp;=&nbsp;r&nbsp;})&nbsp;&lt;$&gt; &nbsp;&nbsp;f&nbsp;(Galapagos.finchRoundsLeft&nbsp;x)</pre> </p> <p> This enabled me to rewrite the <code>"Age finch"</code> test to this: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Age&nbsp;finch&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.age&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;cellStateEq&nbsp;$&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;Galapagos.cellFinch&nbsp;=&nbsp;(setRoundsLeft&nbsp;-~&nbsp;1)&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.cellFinch&nbsp;cell 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;cellStateEq&nbsp;actual&nbsp;@?=&nbsp;expected</pre> </p> <p> Granted, it's not much of an improvement, but it gave me an idea of how to proceed. </p> <h3 id="c4fb295fe4e14813bcfe2f30f8d71465"> Composing setters <a href="#c4fb295fe4e14813bcfe2f30f8d71465">#</a> </h3> <p> Not only did I need a setter for <code>finchRoundsLeft</code>, I also needed one for <code>cellFinch</code>. Again, not being able to identify a way to do this in an easier way, I wrote another explicit setter for that purpose: </p> <p> <pre><span style="color:#2b91af;">setFinch</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Functor</span>&nbsp;f &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">=&gt;</span>&nbsp;(<span style="color:#2b91af;">Maybe</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Finch</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;f&nbsp;(<span style="color:#2b91af;">Maybe</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Finch</span>)) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">CellState</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;f&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">CellState</span> setFinch&nbsp;f&nbsp;x&nbsp;= &nbsp;&nbsp;(\finch&nbsp;-&gt;&nbsp;x&nbsp;{&nbsp;Galapagos.cellFinch&nbsp;=&nbsp;finch&nbsp;})&nbsp;&lt;$&gt;&nbsp;f&nbsp;(Galapagos.cellFinch&nbsp;x)</pre> </p> <p> Armed with that I could finally rewrite <code>"Age finch"</code> to something nice. 
</p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Age&nbsp;finch&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.age&nbsp;cell &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cellStateEq&nbsp;$&nbsp;cell&nbsp;&amp;&nbsp;setFinch&nbsp;.&nbsp;_Just&nbsp;.&nbsp;setRoundsLeft&nbsp;-~&nbsp;1 &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;cellStateEq&nbsp;actual&nbsp;@?=&nbsp;expected</pre> </p> <p> Likewise, with the addition of <code>setHP</code>, I could also rewrite <code>"Groom two finches"</code>: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;cell1&nbsp;&amp;&nbsp;setFinch&nbsp;.&nbsp;_Just&nbsp;.&nbsp;setHP&nbsp;.~&nbsp;16 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;cell2&nbsp;&amp;&nbsp;setFinch&nbsp;.&nbsp;_Just&nbsp;.&nbsp;setHP&nbsp;.~&nbsp;13 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(actual&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellStateEq)&nbsp;@?=&nbsp;(expected&nbsp;&amp;&nbsp;both&nbsp;%~&nbsp;cellStateEq)</pre> </p> <p> That's not too bad, if I may say so. 
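</p> <p> To see why such hand-written setters compose with ordinary function composition, and what <code>_Just</code> contributes, here is a self-contained sketch. It uses simplified stand-ins for the article's types: <code>Finch</code>, <code>Cell</code>, <code>roundsLeft</code>, <code>hp</code>, and <code>finch</code> are hypothetical names, not the real Galapagos definitions. It also assumes that the <code>lens</code> combinator from the lens package is an acceptable alternative to writing the <code>Functor</code> plumbing by hand. I haven't tried that against the real code base. </p> <p> <pre>import Control.Lens (_Just, lens, (&amp;), (-~), (.~))

-- Simplified, hypothetical stand-ins for the article's record types.
data Finch = Finch { finchHP :: Int, finchRoundsLeft :: Int }
  deriving (Eq, Show)

newtype Cell = Cell { cellFinch :: Maybe Finch }
  deriving (Eq, Show)

-- Hand-written, in the same style as setRoundsLeft above.
roundsLeft :: Functor f =&gt; (Int -&gt; f Int) -&gt; Finch -&gt; f Finch
roundsLeft f x = (\r -&gt; x { finchRoundsLeft = r }) &lt;$&gt; f (finchRoundsLeft x)

-- The same shape, built from a getter and a setter with the lens combinator.
hp :: Functor f =&gt; (Int -&gt; f Int) -&gt; Finch -&gt; f Finch
hp = lens finchHP (\x h -&gt; x { finchHP = h })

finch :: Functor f =&gt; (Maybe Finch -&gt; f (Maybe Finch)) -&gt; Cell -&gt; f Cell
finch = lens cellFinch (\c m -&gt; c { cellFinch = m })

main :: IO ()
main = do
  let cell = Cell (Just (Finch 10 4))
  print (cell &amp; finch . _Just . roundsLeft -~ 1)
  print (cell &amp; finch . _Just . hp .~ 16)
  print (Cell Nothing &amp; finch . _Just . hp .~ 16)</pre> </p> <p> The first expression decrements the rounds left from 4 to 3, the second sets the hit points to 16, and the third returns the empty cell unchanged, because <code>_Just</code> only targets cells that actually contain a finch.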
</p> <h3 id="db1e27a7e16f421297bb6103167a19cd"> Combinator golf <a href="#db1e27a7e16f421297bb6103167a19cd">#</a> </h3> <p> Sometimes I get carried away. It's really nothing to worry about, but purely to play with options in order to learn, I decided to address the duplication in the above assertion. Notice that it goes <code>&amp;&nbsp;both&nbsp;%~&nbsp;cellStateEq</code> twice. That's not something that should bother me, and in any case, if you apply the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a>, it's too early to refactor. </p> <p> Even so, I wanted that little bit of extra exercise, so I pulled in <a href="https://hackage.haskell.org/package/base/docs/Data-Function.html#v:on">on</a> and rewrote the assertion. All the other code is identical to the previous listing. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;cell1&nbsp;&amp;&nbsp;setFinch&nbsp;.&nbsp;_Just&nbsp;.&nbsp;setHP&nbsp;.~&nbsp;16 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;cell2&nbsp;&amp;&nbsp;setFinch&nbsp;.&nbsp;_Just&nbsp;.&nbsp;setHP&nbsp;.~&nbsp;13 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(<span style="color:#2b91af;">(@?=)</span>&nbsp;`on`&nbsp;(both&nbsp;%~&nbsp;cellStateEq))&nbsp;actual&nbsp;expected</pre> </p> <p> To be clear, I do, myself, consider this last edit frivolous. 
I wouldn't recommend it, and wouldn't use it in a code base shared with other people, but I still find it enjoyable. </p> <h3 id="db2a8bac20c74e6da40a00c0f94cd13e"> Conclusion <a href="#db2a8bac20c74e6da40a00c0f94cd13e">#</a> </h3> <p> Nested data structures present problems in functional programming, particularly in Haskell, where the record syntax leaves something to be desired. Updating a value nested inside another value is, with plain vanilla code, awkward. </p> <p> This kind of situation is the main use case for lenses. In this article, you saw how I refactored awkward tests with the <em>lens</em> package. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/02/09/simplifying-assertions-with-lenses Code that fits in a context window https://blog.ploeh.dk/2026/02/02/code-that-fits-in-a-context-window/ Mon, 02 Feb 2026 12:17:00 UTC <div id="post"> <p> <em>AI-friendly code?</em> </p> <p> On what's left of software-development social media, I see people complaining that as the size of a software system grows, <a href="https://en.wikipedia.org/wiki/Large_language_model">large language models</a> (LLMs) have an increasingly hard time advancing the system without breaking something else. Some people speculate that the <a href="https://en.wikipedia.org/wiki/Context_window">context window</a> size limit may have something to do with this. </p> <p> As a code base grows, an LLM may be unable to fit all of it, as well as the surrounding discussion, into the context window. Or so I gather from what I read. </p> <p> This doesn't seem too different from limitations of the human brain. To be more precise, a brain is not a computer, and while they share similarities, there are also significant differences. 
</p> <p> Even so, a major hypothesis of mine is that what makes programming difficult for humans is that <a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">our short-term memory is shockingly limited</a>. Based on that notion, a few years ago I wrote <a href="/2021/06/14/new-book-code-that-fits-in-your-head">a book called Code That Fits in Your Head</a>. </p> <p> In the book, I describe a broad set of heuristics and practices for working with code, based on the hypothesis that working memory is limited. One of the most important ideas is the notion of Fractal Architecture. Regardless of the abstraction level, the code is composed of only a few parts. As you look at one part, however, you find that it's made from a few smaller parts, and so on. </p> <p> <img src="/content/binary/decayed-rusty-lace.svg" alt="A so-called 'hex-flower', rendered with aesthetics in mind." width="400"> </p> <p> I wonder if those notions wouldn't be useful for LLMs, too. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/02/02/code-that-fits-in-a-context-window AI-generated tests as ceremony https://blog.ploeh.dk/2026/01/26/ai-generated-tests-as-ceremony/ Mon, 26 Jan 2026 09:05:00 UTC <div id="post"> <p> <em>On epistemological soundness of using LLMs to generate automated tests.</em> </p> <p> For decades, software development <a href="https://x.com/hillelogram/status/1445435617047990273">thought leaders</a> have tried to convince the industry that test-driven development (TDD) should be the norm. <a href="/2025/10/20/epistemology-of-software">I think so too</a>. Even so, the majority of developers don't use TDD. If they write tests, they add them after having written production code. 
</p> <p> With the rise of <a href="https://en.wikipedia.org/wiki/Large_language_model">large language models</a> (LLMs, so-called AI), many developers see new opportunities: Let LLMs write the tests. </p> <p> Is this a good idea? </p> <p> After having thought about this for some time, I've come to the interim conclusion that it seems to be missing the point. It's tests as ceremony, rather than <a href="/2025/10/20/epistemology-of-software">tests as an application of the scientific method</a>. </p> <h3 id="1ff6fbf8c9e14618bc1a831b92ebbb66"> How do you know that LLM-generated code works? <a href="#1ff6fbf8c9e14618bc1a831b92ebbb66">#</a> </h3> <p> People who are enthusiastic about using LLMs for programming often emphasise the amount of code they can produce. <a href="/2025/09/22/its-striking-so-quickly-the-industry-forgets-that-lines-of-code-isnt-a-measure-of-productivity">It's striking so quickly the industry forgets that lines of code isn't a measure of productivity</a>. We already had trouble with the amount of code that existed back when humans wrote it. Why do we think that accelerating this process is going to be an improvement? </p> <p> When people wax lyrical about all the code that LLMs generated, I usually ask: <em>How do you know that it works?</em> To which the most common answer seems to be: I looked at the code, and it's fine. </p> <p> This is where the discussion becomes difficult, because it's hard to respond to this claim without risking offending people. For what it's worth, I've personally looked at much code and deemed it correct, only to later discover that it contained defects. How do people think that bugs make it past code review and into production? </p> <p> It's as if some variant of <a href="https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect">Gell-Mann amnesia</a> is at work. 
Whenever a bug makes it into production, you acknowledge that it 'slipped past' vigilant efforts of quality assurance, but as soon as you've fixed the problem, you go back to believing that code-reading can prevent defects. </p> <p> To be clear, I'm a big proponent of code reviews. To <a href="/2020/05/25/wheres-the-science">the degree that any science is done in this field</a>, research indicates that it's one of the better ways of catching bugs early. My own experience supports this to a degree, but an effective code review is a concentrated effort. It's not a cursory scan over dozens of code files, followed by LGTM. </p> <p> The world isn't black or white. There are stories of LLMs producing near-ready forms-over-data applications. Granted, this type of code is often repetitive, but uncomplicated. It's conceivable that if the code looks reasonable and smoke tests indicate that the application works, it most likely does. Furthermore, not all software is born equal. In <a href="/2018/11/12/what-to-test-and-not-to-test">some systems, errors are catastrophic, whereas in others, they're merely inconveniences</a>. </p> <p> There's little doubt that LLM-generated software is part of our future. This, in itself, may or may not be fine. We still need, however, to figure out how that impacts development processes. What does it mean, for example, related to software testing? </p> <h3 id="f4fc01e761264964bf73e5f4001e489c"> Using LLMs to generate tests <a href="#f4fc01e761264964bf73e5f4001e489c">#</a> </h3> <p> Since automated tests, such as unit tests, are written in a programming language, the practice of automated testing has always been burdened with the obvious question: If we write code to test code, how do we know that the test code works? <a href="http://en.wikipedia.org/wiki/Quis_custodiet_ipsos_custodes%3F">Who watches the watchmen?</a> Is it going to be <a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down">turtles all the way down</a>? 
</p> <p> The answer, as argued in <a href="/2025/10/20/epistemology-of-software">Epistemology of software</a>, is that seeing a test fail is an example of the scientific method. It corroborates the (often unstated, implied) hypothesis that a new test, of a feature not yet implemented, should fail, thereby demonstrating the need for adding code to the System Under Test (SUT). This doesn't <em>prove</em> that the test is correct, but increases our rational belief that it is. </p> <p> When using LLMs to generate tests for existing code, you skip this step. How do you know, then, that the generated test code is correct? That all tests pass is hardly a useful criterion. Looking at the test code may catch obvious errors, but again: Those people who already view automated tests as a chore to be done with aren't likely to perform a thorough code reading. And even a proper review may fail to unearth problems, such as <a href="/2019/10/14/tautological-assertion">tautological assertions</a>. </p> <p> Rather, using LLMs to generate tests may lull you into a false sense of security. After all, now you have tests. </p> <p> What is missing from this process is an understanding of why tests work in the first place. Tests work best when you have seen them fail. </p> <h3 id="a78b57c393f941a9a879e7a19ccf61cc"> Toward epistemological soundness <a href="#a78b57c393f941a9a879e7a19ccf61cc">#</a> </h3> <p> Is there a way to take advantage of LLMs when writing tests? This is clearly a field where we have yet to discover better practices. Until then, here are a few ideas. </p> <p> When writing tests after production code, you can still apply <a href="/2025/11/03/empirical-characterization-testing">empirical Characterization Testing</a>. In this process, you deliberately temporarily sabotage the SUT to see a test fail, and then revert that change. When using LLM-generated tests, you can still do this. 
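</p> <p> As a toy illustration of that sabotage step, consider the following sketch. Everything in it is hypothetical: <code>discounted</code> stands in for an existing SUT, and <code>sabotaged</code> simulates the temporary edit. In practice you would edit the real function, watch the test fail, and revert. The checks are parametrised over the implementation only so that both variants fit in one file. </p> <p> <pre>-- Hypothetical SUT: the price after subtracting a percentage discount.
discounted :: Int -&gt; Int -&gt; Int
discounted percent price = price - (price * percent) `div` 100

-- Deliberately sabotaged variant that ignores the discount.
sabotaged :: Int -&gt; Int -&gt; Int
sabotaged _ price = price

-- A meaningful check, parametrised over the implementation under test.
discountTest :: (Int -&gt; Int -&gt; Int) -&gt; Bool
discountTest sut = sut 10 200 == 180

-- A tautological check: it only compares the SUT with itself.
tautologicalTest :: (Int -&gt; Int -&gt; Int) -&gt; Bool
tautologicalTest sut = sut 10 200 == sut 10 200

main :: IO ()
main = do
  print (discountTest discounted)    -- True
  print (discountTest sabotaged)     -- False: the check is able to fail
  print (tautologicalTest sabotaged) -- True: the sabotage goes unnoticed</pre> </p> <p> Only the check that fails under sabotage gives any reason to believe that it exercises the behaviour in question.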
</p> <p> Obviously, this requires more work, and takes more time, than 'just' asking an LLM to generate tests, run them, and check them in, but it would put you on epistemologically safer ground. </p> <p> Another option is to ask LLMs to follow TDD. On what's left of technical social media, I see occasional noises indicating that people are doing this. Again, however, I think the devil is in the details. What is the actual process when asking an LLM to follow TDD? </p> <p> Do you ask the LLM to write a test, then review the test, run it, and see it fail? Then stage the code changes? Then ask the LLM to pass the test? Then verify that the LLM <em>did not</em> change the test while passing it? Review the additional code change? Commit and repeat? If so, this sounds epistemologically sound. </p> <p> If, on the other hand, you let it go in a fast loop where the only observation your human brain can keep up with is that test status oscillates between red and green, then you're back to where we started: This is essentially ex-post tests with extra ceremony. </p> <h3 id="22986a515b8c4deba31dcc59501465c1"> Cargo-cult testing <a href="#22986a515b8c4deba31dcc59501465c1">#</a> </h3> <p> These days, most programmers have heard about <a href="https://en.wikipedia.org/wiki/Cargo_cult_programming">cargo-cult programming</a>, where coders perform ceremonies hoping for favourable outcomes, confusing cause and effect. </p> <p> Having LLMs write unit tests strikes me as a process with little epistemological content. Imagine, for the sake of argument, that the LLM never produces code in a high-level programming language. Instead, it goes straight to machine code. Assuming that you don't read machine code, how much would you trust the generated system? Would you trust it more if you asked the LLM to write tests? What does a test program even indicate? You may be given a program that ostensibly tests the system, but how do you know that it isn't a simulation? 
A program that only looks as though it runs tests, but is, in fact, unrelated to the actual system? </p> <p> You may find that a contrived thought experiment, but this is effectively the definition of <a href="https://en.wikipedia.org/wiki/Vibe_coding">vibe coding</a>. You don't inspect the generated code, so the language becomes functionally irrelevant. </p> <p> Without human engagement, tests strike me as mere ceremony. </p> <h3 id="a30e7891e7494761ab593f851cb5dd81"> Ways forward <a href="#a30e7891e7494761ab593f851cb5dd81">#</a> </h3> <p> It would be naive of me to believe that programmers will stop using LLMs to generate code, including unit tests. Are there techniques we can apply to put software development back on more solid footing? </p> <p> As always when new technology enters the picture, we've yet to discover efficient practices. Meanwhile, we may attempt to apply the knowledge and experience we have from the old ways of doing things. </p> <p> I've already outlined a few techniques to keep you on good epistemological footing, but I surmise that people who already find writing tests a chore aren't going to take the time to systematically apply the techniques for empirical Characterization Testing. </p> <p> Another option is to turn the tables. Instead of writing production code and asking LLMs to write tests, why not write tests, and ask LLMs to implement the SUT? This would entail a mostly <a href="/2025/09/15/greyscale-box-test-driven-development">black-box approach to TDD</a>, but still seems scientific to me. </p> <p> For some reason I've never understood, however, most people dislike writing tests, so this is probably unrealistic, too. As a supplement, then, we should explore <a href="/2026/02/16/critiquing-tests">ways to critique tests</a>. </p> <h3 id="ae3876d0b61846dcb5b702ed38c49a69"> Conclusion <a href="#ae3876d0b61846dcb5b702ed38c49a69">#</a> </h3> <p> It may seem alluring to let LLMs relieve you of the burden it is to write automated tests. 
If, however, you don't engage with the tests it generates, you can't tell what guarantees they give. If so, what benefits do the tests provide? Does automated testing become mere ceremony, intended to give you a nice warm feeling with little real protection? </p> <p> I think that there are ways around this problem, some of which are already in view, but some of which we have probably yet to discover. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/01/26/ai-generated-tests-as-ceremony Filtering as domain logic https://blog.ploeh.dk/2026/01/19/filtering-as-domain-logic/ Mon, 19 Jan 2026 21:03:00 UTC <div id="post"> <p> <em>Performance and correctness are two independent concerns with overlapping solutions.</em> </p> <p> How do you design, implement, maintain, and test complex filter logic as part of out-of-process (e.g. database) queries? </p> <p> One option is to implement parts of the filtering logic twice: Once as an easily-testable in-memory implementation to ensure correctness, and another, possibly simpler, query using the query language (usually, <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>) of the data source. </p> <p> Does this not imply duplication of effort? Yes, to a degree it does. Should you always do this? No, only when warranted. As usual, I present this idea as an option you may consider; a tool for your software design tool belt. You decide if it's useful in your particular context. </p> <h3 id="35c355ddcb294fe8b1969866c0ccbfc0"> Motivation <a href="#35c355ddcb294fe8b1969866c0ccbfc0">#</a> </h3> <p> When extracting data from a data source, an application usually needs <em>some</em> of the data, but not all of it. If the software system in question has a certain size, the subset required for an operation is only a minuscule fraction of the entire database. 
For example, a user may want to see his or her latest order in a web shop, but the entire system contains millions of orders. Another example could be a system for managing help desk requests: Each supporter may need a dashboard of open cases assigned to him or her, but the system holds millions of tickets, and most of them are closed. </p> <p> If a data store supports server-side querying, for example with SQL or <a href="https://en.wikipedia.org/wiki/Cypher_(query_language)">Cypher</a>, it's reasonable to let the data store itself do the filtering. </p> <p> As anyone who has worked professionally with SQL can attest, SQL queries can become complicated. When this happens, you may become concerned with the correctness of a query. Does it include all the data it should? Does it exclude irrelevant data? If you later change a query, how can you verify that it still works as intended? How do you even version it? </p> <p> Automated testing can address several of these concerns, but testing against a real database, while possible, tends to be cumbersome and slow. Do alternatives, or augmentations, exist? </p> <h3 id="a19c1ec099474e6c8bbf932ec8a9db21"> How it works <a href="#a19c1ec099474e6c8bbf932ec8a9db21">#</a> </h3> <p> If a server-side query threatens to become too complicated, consider shifting some of the work to clients. You may retain some filtering logic in the server-side query, but only enough to keep performance good, and simple enough that you are no longer concerned about its correctness. </p> <p> Implement the difficult filtering logic in a client-side library. Since you implement this part in a programming language of your choice, you can use any tool or technique available in that context to <a href="/2025/10/20/epistemology-of-software">ensure correctness</a>: Test-driven development, static code analysis, type checking, property-based testing, <a href="/2025/11/10/100-coverage-is-not-that-trivial">code coverage</a>, mutation testing, etc. 
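</p> <p> To make the idea concrete, here is a minimal sketch based on the help desk example above. All names (<code>Ticket</code>, <code>Status</code>, <code>dashboard</code>) are hypothetical, and the SQL in the comment only indicates what a simplified server-side query might look like. The point is that the client-side filter is a pure function that's easy to test, and that it doesn't rely on the server having filtered anything. </p> <p> <pre>data Status = Open | Pending | Closed
  deriving (Eq, Show)

data Ticket = Ticket
  { ticketId :: Int
  , assignee :: String
  , status :: Status
  } deriving (Eq, Show)

-- Assumed coarse server-side query, kept simple for performance only, e.g.
--   SELECT id, assignee, status FROM tickets WHERE assignee = ?
-- It may return more rows than the dashboard should show.

-- Client-side filter: the part of the Domain Model that decides what is
-- actually relevant. It re-checks the assignee instead of trusting the server.
dashboard :: String -&gt; [Ticket] -&gt; [Ticket]
dashboard supporter = filter relevant
  where
    relevant t = assignee t == supporter &amp;&amp; status t /= Closed

main :: IO ()
main = do
  let fromServer =
        [ Ticket 1 &quot;ann&quot; Open
        , Ticket 2 &quot;ann&quot; Closed
        , Ticket 3 &quot;bob&quot; Open
        , Ticket 4 &quot;ann&quot; Pending
        ]
  print (map ticketId (dashboard &quot;ann&quot; fromServer)) -- prints [1,4]</pre>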
</p> <p> Using a funnel as a symbol of filtering, this diagram depicts the idea: </p> <p> <img src="/content/binary/double-funnel-architecture.png" alt="Two upside-down funnels connect the database with the application." width="150"> </p> <p> Normally a funnel is only useful when the widest part faces up, but on the other hand, we usually depict application architectures with the database under the application. You have to imagine data being 'sucked up' through the funnels. </p> <p> In reality, the two filters will differ, but have overlapping functionality. </p> <p> <img src="/content/binary/server-and-client-side-filters.png" alt="Two sets labelled server-side filter and client-side filter, with substantial intersection." width="200"> </p> <p> If based on a relational database, the server-side query will still hold table joins and column projections that are effectively irrelevant to the client-side Domain Model. On the other hand, while the server-side query may apply a rough filter, the more detailed selection of what is, and is not, included happens in the client. </p> <p> The server-side query is defined using the query language of the data store, such as SQL or Cypher. The client-side query is part of the application code base, and written in the same programming language. </p> <h3 id="7deab4b58fc947a981c90692eb1c74c3"> When to use it <a href="#7deab4b58fc947a981c90692eb1c74c3">#</a> </h3> <p> Use this pattern if a server-side query becomes so complicated that you are concerned about its correctness, or if correctness is an essential part of a Domain Model's contract. </p> <p> While it is <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">conceptually possible to load the entire data store's data into memory</a>, this is often prohibitively expensive in terms of time and memory. It is often necessary to retain some filtering logic (e.g. one or more SQL <code>WHERE</code> clauses) on the server to pare down data to acceptable sizes. 
This implies a degree of duplicated logic, since the client-side filter shouldn't assume that any filtering has been applied. </p> <p> Duplication comes with its own set of problems, even if this looks like <a href="/2026/01/05/coupling-from-a-big-o-perspective">the benign kind</a>. Alternatives include keeping all logic on the database server, which is viable if the logic is simple, or can be sufficiently simplified. Another alternative is to perform all filtering in the client, which may be an attractive solution if the entire data set is small. </p> <h3 id="cf56cc6f96e94c11a32e4183079dba28"> Encapsulation <a href="#cf56cc6f96e94c11a32e4183079dba28">#</a> </h3> <p> If a Domain Model is composed of <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>, data must be supplied as normal input arguments. In a more object-oriented style, data may arrive as <a href="http://xunitpatterns.com/indirect%20input.html">indirect input</a>. In both object-oriented and <a href="/2022/10/24/encapsulation-in-functional-programming">functional architecture, encapsulation is important</a>. This entails being explicit about invariants and pre- and postconditions; i.e. <em>contracts</em>. </p> <p> To enforce preconditions, a Domain Model must ensure that input is correct. While it could choose to reject input if it contains 'too much' data, a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a> should instead pare the data down to size. This implies that filtering should be part of a Domain Model's contract. </p> <p> This further implies that a Domain Model becomes less vulnerable to changes in data access code. </p> <h3 id="8bf1dfdee19345ec83a378beb74ffe39"> Implementation details <a href="#8bf1dfdee19345ec83a378beb74ffe39">#</a> </h3> <p> Server-side filtering (with e.g. SQL) is often difficult to test with sufficient rigour. 
The point of moving the complex filtering logic to the Domain Model is that this makes it easier to test, and thereby to maintain. </p> <p> If no filtering takes place on the server, however, the entire data set of the system would have to be transmitted to, and filtered on, the client. This is usually too expensive, so some filtering must still take place at the data source. The whole point of this exercise is that the 'correct' filtering is too complicated to maintain as a server-side query, so whatever filtering still takes place on the server only happens for performance reasons, and can be simpler, as long as it's wider. </p> <p> Specifically, the simplified server-side query can (and probably should) be wider, in the sense that it returns <em>more</em> data than is required for the correctness of the overall system. The client, receiving more data than strictly required, can perform more sophisticated (and testable) filtering. </p> <p> The simplified filtering on the server must not, on the other hand, narrow the result set. If relevant data is left out at the source, the client has no chance to restore it, or even know that it exists. </p> <h3 id="03f340a6bd8d40999aaafe98dfb622fa"> Motivating example <a href="#03f340a6bd8d40999aaafe98dfb622fa">#</a> </h3> <p> The code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> contains an example. When a user attempts to make a restaurant reservation, the system must look at existing reservations on the same date to check whether it has a free table. Many restaurants operate with seating windows, and the logic involved in figuring out if a time slot is free is easy to get wrong. On top of that, the decision logic needs to take opening hours and last seating into account. The book, as well as the article <a href="/2020/01/27/the-maitre-d-kata">The Maître d' kata</a>, has more details. 
</p>
<p> Based on information about seating duration, opening hours, and so on, it seems as though it should be possible to form an exact SQL query that <em>only</em> returns existing reservations that overlap the new reservation. Even so, this struck me as error-prone. Instead, I decided to make input filtering part of the Domain Model. </p>
<p> The Domain Model in question, an immutable class named <code>MaitreD</code>, uses the <code>WillAccept</code> method to decide whether to accept a reservation request. Apart from the <code>candidate</code> reservation, it also takes as parameters <code>existingReservations</code> as well as the current time. </p>
<p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#74531f;">WillAccept</span>(
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">DateTime</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">now</span>,
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">Reservation</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">existingReservations</span>,
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Reservation</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">candidate</span>)</pre> </p>
<p> The function filters <code>existingReservations</code> so that only the relevant reservations are considered: </p>
<p> <pre><span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">seating</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">Seating</span>(SeatingDuration,&nbsp;<span style="font-weight:bold;color:#1f377f;">candidate</span>.At);
<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">relevantReservations</span>&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">existingReservations</span>.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">seating</span>.<span style="font-weight:bold;color:#74531f;">Overlaps</span>);</pre> </p>
<p> As implied by this code snippet, a specialized Domain Model named <code>Seating</code> contains the actual filtering logic: </p>
<p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Overlaps</span>(<span style="color:#2b91af;">Reservation</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherReservation</span>)
{
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">if</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">otherReservation</span>&nbsp;<span style="color:blue;">is</span>&nbsp;<span style="color:blue;">null</span>)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">throw</span>&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">otherReservation</span>));
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">other</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">Seating</span>(SeatingDuration,&nbsp;<span style="font-weight:bold;color:#1f377f;">otherReservation</span>.At);
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Overlaps</span>(<span style="font-weight:bold;color:#1f377f;">other</span>);
}

<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Overlaps</span>(<span style="color:#2b91af;">Seating</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">other</span>)
{
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">if</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">other</span>&nbsp;<span style="color:blue;">is</span>&nbsp;<span style="color:blue;">null</span>)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">throw</span>&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">other</span>));
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;Start&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">other</span>.End&nbsp;&amp;&amp;&nbsp;<span style="font-weight:bold;color:#1f377f;">other</span>.Start&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;</span>&nbsp;End;
}</pre> </p>
<p> Notice how the core implementation, the overload that takes another <code>Seating</code> object, implements a <a href="https://en.wikipedia.org/wiki/Binary_relation">binary relation</a>. To extrapolate from <a href="/ref/ddd">Domain-Driven Design</a>, whenever you arrive at 'proper' mathematics to describe the application domain, it's usually a sign that you've arrived at something fundamental. </p>
<p> The <code>Overlaps</code> functions are <code>public</code> and easy to unit test in their own right. Even so, in the code base that accompanies Code That Fits in Your Head, there are no tests that directly exercise these functions, since they only grew out of refactoring the implementation of <code>MaitreD.WillAccept</code>, which is covered by many tests. Since the <code>Overlaps</code> functions only emerged as a result of test-driven development, they <a href="/2021/09/13/unit-testing-private-helper-methods">might as well have been private helper methods</a>, but I later needed them for verifying some unrelated test outcomes.
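</p> <p> A direct test of the relation could look like the following sketch. The <code>Seating</code> record here is a simplified stand-in written for this example, not the class from the book's code base, and it assumes that <code>End</code> is <code>Start</code> plus the seating duration. It uses plain checks instead of a test framework in order to stay self-contained: </p>

```csharp
using System;

// Simplified, hypothetical stand-in for the book's Seating class.
// Assumption: a seating runs from At to At + SeatingDuration.
public sealed record Seating(TimeSpan SeatingDuration, DateTime At)
{
    public DateTime Start => At;
    public DateTime End => At + SeatingDuration;

    // The same binary relation as in the article: two seatings overlap
    // when each one starts strictly before the other one ends.
    public bool Overlaps(Seating other)
    {
        if (other is null)
            throw new ArgumentNullException(nameof(other));
        return Start < other.End && other.Start < End;
    }
}

public static class OverlapChecks
{
    public static void Main()
    {
        var duration = TimeSpan.FromHours(2);
        var six = new Seating(duration, new DateTime(2026, 1, 19, 18, 0, 0));
        var seven = new Seating(duration, new DateTime(2026, 1, 19, 19, 0, 0));
        var eight = new Seating(duration, new DateTime(2026, 1, 19, 20, 0, 0));

        Check(six.Overlaps(seven), "18:00 and 19:00 seatings overlap");
        Check(seven.Overlaps(six), "the relation is symmetric");
        Check(!six.Overlaps(eight), "back-to-back seatings don't overlap");
        Check(six.Overlaps(six), "a seating overlaps itself");
    }

    private static void Check(bool condition, string description)
    {
        if (!condition)
            throw new InvalidOperationException("Failed: " + description);
        Console.WriteLine("OK: " + description);
    }
}
```

<p> The third check captures the boundary case: since both comparisons are strict, a seating that ends at 20:00 doesn't overlap one that starts at 20:00.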
</p>
<p> The filtering performed in <code>WillAccept</code> will throw away any reservations that don't overlap. Even if <code>existingReservations</code> contained the entire data set from the database, it would still be correct. Given, however, that there could be hundreds of thousands of reservations, it seems prudent to perform some coarse-grained filtering in the database. </p>
<p> The <code>ReservationsController</code> that calls <code>WillAccept</code> first queries the database, getting all the reservations on the relevant date. </p>
<p> <pre><span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservations</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Repository
&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>.At)
&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);</pre> </p>
<p> Now that I write this description, I realize this query, while wide in one sense, could actually be too narrow. None of my test restaurants have a last seating after midnight, but I wouldn't rule that out in certain cultures. If so, it's easy to widen the coarse-grained query to include reservations for the day before (for breakfast restaurants, perhaps) and the day after, assuming that no seating lasts more than 24 hours. </p>
<p> All that said, the point is that <code>ReadReservations(restaurant.Id, reservation.At)</code> (which is an extension method) performs a simple, coarse-grained query for reservations that may be relevant to consider, given the candidate reservation. This query should return a 'gross' data set that contains all relevant, but also some irrelevant, reservations, thereby keeping the query simple.
And indeed, the actual database interaction is this parametrised query: </p>
<p> <pre><span style="color:blue;">SELECT</span>&nbsp;[PublicId]<span style="color:gray;">,</span>&nbsp;[At]<span style="color:gray;">,</span>&nbsp;[Name]<span style="color:gray;">,</span>&nbsp;[Email]<span style="color:gray;">,</span>&nbsp;[Quantity]
<span style="color:blue;">FROM</span>&nbsp;[dbo]<span style="color:gray;">.</span>[Reservations]
<span style="color:blue;">WHERE</span>&nbsp;[RestaurantId]&nbsp;<span style="color:gray;">=</span>&nbsp;@RestaurantId&nbsp;<span style="color:gray;">AND</span>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@Min&nbsp;<span style="color:gray;">&lt;=</span>&nbsp;[At]&nbsp;<span style="color:gray;">AND</span>&nbsp;[At]&nbsp;<span style="color:gray;">&lt;=</span>&nbsp;@Max</pre> </p>
<p> This range query is simple enough that a few integration tests should be sufficient to give you confidence that it works correctly. </p>
<h3 id="83c3068d8d9e4aebaa4e5a8b3c0a1686"> Consequences <a href="#83c3068d8d9e4aebaa4e5a8b3c0a1686">#</a> </h3>
<p> The main benefit from a design like this is that it shifts some of the burden of correctness to the Domain Model, which is easier to test, maintain, and version than is typically the case for query languages. An added advantage is improved separation of concerns. </p>
<p> In practice, server-side filtering tends to mix two independent concerns: Performance and correctness. Filtering is important for performance, because the alternative is to transmit all rows to the client. Filtering is also important for correctness, because the code making use of the data should only consider data relevant for its purpose. Exclusive server-side filtering performs both of these tasks, thereby mixing concerns. Moving filtering for correctness to a Domain Model can make explicit that these are two separate concerns.
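</p> <p> The following sketch separates the two concerns in miniature. All names in it (<code>TwoStageFilter</code>, <code>CoarseRange</code>, <code>Relevant</code>) are hypothetical, reservations are reduced to their start times, and the choice of a whole calendar day as the coarse range is an assumption, since the article doesn't show how <code>@Min</code> and <code>@Max</code> are calculated. An in-memory array plays the role of the database: </p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoStageFilter
{
    // Performance concern: a deliberately simple, wide range that covers
    // the candidate's whole calendar day (an assumption for this sketch).
    public static (DateTime Min, DateTime Max) CoarseRange(DateTime at)
    {
        var min = at.Date;
        var max = min.AddDays(1).AddTicks(-1);
        return (min, max);
    }

    // Correctness concern: the exact overlap rule, applied in memory.
    public static IEnumerable<DateTime> Relevant(
        IEnumerable<DateTime> existing,
        DateTime candidate,
        TimeSpan seatingDuration)
    {
        return existing.Where(e =>
            candidate < e + seatingDuration && e < candidate + seatingDuration);
    }
}

public static class Program
{
    public static void Main()
    {
        var duration = TimeSpan.FromHours(2);
        var candidate = new DateTime(2026, 1, 19, 19, 0, 0);
        var stored = new[]
        {
            new DateTime(2026, 1, 18, 19, 0, 0),  // previous day
            new DateTime(2026, 1, 19, 12, 0, 0),  // same day, lunch
            new DateTime(2026, 1, 19, 18, 30, 0), // same day, overlaps
        };

        // Stand-in for the server: apply only the coarse range.
        var (min, max) = TwoStageFilter.CoarseRange(candidate);
        var coarse = stored.Where(at => min <= at && at <= max).ToList();

        // The fine filter gives the same answer for the coarse rows
        // as for the entire data set.
        var fromCoarse = TwoStageFilter.Relevant(coarse, candidate, duration).ToList();
        var fromAll = TwoStageFilter.Relevant(stored, candidate, duration).ToList();

        Console.WriteLine(coarse.Count);                      // 2
        Console.WriteLine(fromCoarse.SequenceEqual(fromAll)); // True
    }
}
```

<p> In this sketch, the coarse range only decides how many rows travel to the client, while <code>Relevant</code> alone decides which of them count. Widening the range changes the first number, but not the final result.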
</p>
<p> While a Domain Model can implement in-memory filtering, it can only deal with data that is too wide; that is, it can identify and remove superfluous data. If, on the other hand, the dataset passed to the Domain Model lacks relevant records, the Domain Model can't detect that. The above discussion about the reservation system contains a concrete example of such a problem. Thus, Domain-based filtering does not relieve developers of the burden of ensuring that any server-side filtering is sufficiently permissive. </p>
<p> Another consequence of this design is that as server-side queries become more coarse-grained, this could increase potential cache hit ratios. If you somehow cache queries, when queries become more general, there will be less variation, and thus caches will need fewer entries that will statistically be hit more often. This applies to CQRS-style architectures, too. </p>
<p> Consider the restaurant reservation example, above. Since queries are only distinguished by date, you can easily cache query results by date, and all reservation requests for a given date may go through that cache. If, as a counter-example, all filtering took place in the database, a query for a reservation at 18:00 would be different from a query for 18:30, and so on. This would make a hypothetical cache bigger, and decrease the frequency of cache hits. </p>
<h3 id="b94bae149c32415fa8708498c1c4cdbb"> Test evidence <a href="#b94bae149c32415fa8708498c1c4cdbb">#</a> </h3>
<p> When I originally decided that <code>WillAccept</code> should perform in-memory filtering, my motivation was one of correctness. I was concerned whether I could get the seating overlap detection correct without comprehensive testing, and I thought that it would be easier to test a function doing in-memory filtering than to drive all of this via integration tests involving a real <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> instance.
(Not that I don't know how to do this. The code base accompanying the book has examples of tests that exercise the database. These tests are, however, more work to write and maintain, and they execute slower.) </p>
<p> As discussed in <a href="/2026/01/05/coupling-from-a-big-o-perspective">Coupling from a big-O perspective</a>, I much later realized that I actually had no test coverage of edge cases related to querying the database. It was only after attempting to write such a test that I realized that the design had the consequence that a marginal error in the database query had no impact on the correctness of the overall system. Here's that test: </p>
<p> <pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">AttemptEdgeCaseBooking</span>()
{
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">twentyFour7</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">Restaurant</span>(
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;247,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#a31515;">&quot;24/7&quot;</span>,
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">MaitreD</span>(
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">opensAt</span>:&nbsp;<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromHours</span>(0),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">lastSeating</span>:&nbsp;<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromHours</span>(0),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">seatingDuration</span>:&nbsp;<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromDays</span>(1),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">tables</span>:&nbsp;<span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">Standard</span>(1)));
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">db</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">FakeDatabase</span>();
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">now</span>&nbsp;=&nbsp;<span style="color:#2b91af;">DateTime</span>.Now;
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">sut</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ReservationsController</span>(
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">SystemClock</span>(),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">InMemoryRestaurantDatabase</span>(<span style="font-weight:bold;color:#1f377f;">twentyFour7</span>),
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">db</span>);
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">r1</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Some</span>.Reservation.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">now</span>.<span style="font-weight:bold;color:#74531f;">AddDays</span>(3).Date);
&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">Post</span>(<span style="font-weight:bold;color:#1f377f;">twentyFour7</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">r1</span>.<span style="font-weight:bold;color:#74531f;">ToDto</span>());
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">r2</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Some</span>.Reservation.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">now</span>.<span style="font-weight:bold;color:#74531f;">AddDays</span>(2).Date);
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ar</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">Post</span>(<span style="font-weight:bold;color:#1f377f;">twentyFour7</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">r2</span>.<span style="font-weight:bold;color:#74531f;">ToDto</span>());
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">IsAssignableFrom</span>&lt;<span style="color:#2b91af;">CreatedAtActionResult</span>&gt;(<span style="font-weight:bold;color:#1f377f;">ar</span>);
&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;More&nbsp;assertions&nbsp;could&nbsp;go&nbsp;here.</span>
}</pre> </p>
<p> This test is an attempt to cover the edge case related to how the system queries the database, just like the Moq-based test shown in <a href="/2025/09/15/greyscale-box-test-driven-development">Greyscale-box test-driven development</a>.
The idea is to create a reservation that just barely touches a reservation the following day, and thereby trigger a test failure when a change is made to the query, similar to how the Moq-based test fails. Even with a custom restaurant, I can't, however, get this test to fail, because of the Domain-based filtering, which keeps the system working correctly. </p> <p> It was then that I realized that what I had inadvertently done was to strengthen the contract of <code>WillAccept</code>, compared to a more stereotypical design. Who knew test-driven development could lead to better encapsulation? </p> <h3 id="1ba293a2ce314f8291ea0310de461932"> Conclusion <a href="#1ba293a2ce314f8291ea0310de461932">#</a> </h3> <p> Some queries may become so complicated that they are difficult to maintain. Bugs creep in, you address them, only to reanimate regressions. When this happens, consider moving the complicated parts of data filtering to the client, preferably to a Domain Model. This enables you to test the filtering logic with as much rigour as is required. </p> <p> For small databases, you may read the entire dataset into memory, but usually you will need to retain some coarse-grained filtering on the database server. </p> <p> This design, while more complicated than letting a query language like SQL handle all filtering, can lead to better encapsulation and separation of concerns. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/01/19/filtering-as-domain-logic Two regimes of Git https://blog.ploeh.dk/2026/01/12/two-regimes-of-git/ Mon, 12 Jan 2026 08:23:00 UTC <div id="post"> <p> <em>Using Git for CI is not the same as Tactical Git.</em> </p> <p> <a href="https://git-scm.com/">Git</a> is such a versatile tool that when discussing it, interlocutors may often talk past each other. 
One person's use is so different from the way the next person uses it that every discussion is fraught with risk of misunderstandings. This happens to me a lot, because I use Git in two radically different ways, depending on context. </p> <p> Should you rebase? Merge? Squash? Cherry-pick? </p> <p> Often, being more explicit about a context can help address confusion. </p> <p> I know of at least two ways of using Git that differ so much from each other that I think we may term them two different regimes. The rules I follow in one regime don't all apply in the other, and vice versa. </p> <p> In this article I'll describe both regimes. </p> <h3 id="d0a8a4be292646719815e959011a0e9d"> Collaboration <a href="#d0a8a4be292646719815e959011a0e9d">#</a> </h3> <p> Most people use Git because it facilitates collaboration. Like other source-control systems, it's a way to share a code base with coworkers, or open-source contributors. <a href="https://en.wikipedia.org/wiki/Continuous_integration">Continuous Integration</a> is a subset in this category, and to my knowledge still the best way to collaborate. </p> <p> When I work in this regime, I follow one dominant rule: Once history is shared with others, it should be considered immutable. When you push to a shared instance of the repository, other people may pull your changes. Changing the history after having shared it is going to confuse most Git clients. It's much easier to abstain from editing shared history. </p> <p> What if you shared something that contains an error? Then fix the error and push that update, too. Sometimes, you can use <a href="https://git-scm.com/docs/git-revert">git revert</a> for this. </p> <p> A special case is reserved for mistakes that involve leaking security-sensitive data. If you accidentally share a password, a revert doesn't rectify the problem. The data is still in the history, so this is a singular case where I know of no better remedy than rewriting history. 
That is, however, quite bothersome, because you now need to communicate to every other collaborator that this is going to happen, and that they may be best off making a new clone of the repository. If there's a better way to address such situations, I don't know of it, but would be happy to learn. </p> <p> Another consequence of the Collaboration regime follows from the way pull requests are typically implemented. In GitHub, sending a pull request is a two-step process: First you push a branch, and then you click a button to send the pull request. I usually use the GitHub web user interface to review my own pull-request branch before pushing the button. Occasionally I spot an error. At this point I consider the branch 'unshared', so I may decide to rewrite the history of that branch and force-push it. Once, however, I've clicked the button and sent the pull request, I consider the branch shared, and the same rules apply: Rewriting history is not allowed. </p> <p> One implication of this is that the set of Git actions you need to know is small: You can effectively get by with <a href="https://git-scm.com/docs/git-add">git add</a>, <a href="https://git-scm.com/docs/git-commit">commit</a>, <a href="https://git-scm.com/docs/git-pull">pull</a>, <a href="https://git-scm.com/docs/git-push">push</a>, and possibly a few more. </p> <p> Many of the 'advanced' Git features, such as <a href="https://git-scm.com/docs/git-rebase">rebase</a> and squash, allow you to rewrite history, so aren't allowed in this regime. </p> <h3 id="513088e0595e493a94a6aa96fa0ae92d"> Tactical Git <a href="#513088e0595e493a94a6aa96fa0ae92d">#</a> </h3> <p> As far as I can tell, Git wasn't originally created for this second use case, but it turns out that it's incredibly useful for local management of code files. This is what I've previously described as <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">Tactical Git</a>. 
</p>
<p> Once you realize that you have a version-control system at your fingertips, the opportunities are manifold. You can perform experiments in a branch that only exists on your machine. You may, for example, test alternative API design ideas, implementations, etc. There's no reason to litter the code base with commented-out code because you're afraid that you'll need something later. Just commit it on a local branch. If the experiment later turns out not to be to your liking, commit it anyway, but then check out <code>master</code>. You'll leave the experiment on your local machine, and it's there if you need it later. </p>
<p> You can even use failed experiments as evidence that a particular idea has undesirable consequences. Have you ever been in a situation where a coworker suggests a new way of doing things? You may have previously responded that you've already tried that, and it didn't work. How well did that answer go over with your coworker? </p>
<p> He or she probably wasn't convinced. </p>
<p> What if, however, you've <em>kept that experiment on your own machine?</em> Now you can say: "Not only have I already tried this, but I'm happy to share the relevant branch with you." </p>
<p> You can see an example of that in listing 8.10 in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>. This code listing is based on a side-branch never merged into <code>master</code>. If you have the book, you also have access to the entire Git repository, and you can check for yourself that commit <code>0bb8068</code> is a dead-end branch named <code>explode-maitre-d-arguments</code>. </p>
<p> Under the Tactical Git regime, you can also go back and edit mistakes when working on code that you haven't yet shared. I use <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a>, so I tend to check in small commits often.
Sometimes, as I'm working with the code, I notice that I made a mistake a few commits ago. Since I'm a neat freak, I often use interactive rebase to go back and correct my mistakes before sharing the history with anyone else. I don't do that to look perfect, but rather to leave behind a legible trail of changes. If I already know that I made a mistake before I've shared my code with anyone else, there's no reason to burden others with both the mistake and its rectification. </p> <p> In general, I aim to leave as nice a Git history as possible. This is not only for my collaborators' sake, but for my own, too. Legible Git histories and micro-commits make it easier to troubleshoot later, as <a href="/2020/10/05/fortunately-i-dont-squash-my-commits">this story demonstrates</a>. </p> <p> The toolset useful for Tactical Git is different than for collaboration. You still use <code>add</code> and <code>commit</code>, of course, but I also use (interactive) <code>rebase</code> often, as well as <a href="https://git-scm.com/docs/git-stash">stash</a> and <a href="https://git-scm.com/docs/git-branch">branch</a>. Only rarely do I need <a href="https://git-scm.com/docs/git-cherry-pick">cherry-pick</a>, but it's useful when I do need it. </p> <h3 id="47b93db51f6a422296ebbd3fb0439401"> Conclusion <a href="#47b93db51f6a422296ebbd3fb0439401">#</a> </h3> <p> When discussing good Git practices, it's easy to misunderstand each other because there's more than one way to use Git. I know of at least two radically different modes: Collaboration and Tactical Git. The rules that apply under the Collaboration regime should not all be followed slavishly when in the Tactical Git regime. Specifically, the rule about rewriting history is almost turned on its head. Under the Collaboration regime, do not rewrite Git history; under the Tactical Git regime, rewriting history is encouraged. 
</p> </div> <div id="comments"> <hr> <h2 id="comments-header"> Comments </h2> <div class="comment" id="b4e8f2a1d6c94e3a7b5f9d2c8e1a4f6b"> <div class="comment-author">Carlos Schults <a href="#b4e8f2a1d6c94e3a7b5f9d2c8e1a4f6b">#</a></div> <div class="comment-content"> <p> Hi, Mark. Thanks for the article. Regarding the issue of secrets being added to the source code, wouldn't it be better to rotate the secrets (i.e., change the password, revoke the API key, etc.), instead of changing shared history? Unless that can't be done for some reason, of course. </p> </div> <div class="comment-date">2026-01-12 16:52 UTC</div> </div> <div class="comment" id="3aeed7b08dc244f2976333678c664f1a"> <div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3aeed7b08dc244f2976333678c664f1a">#</a></div> <div class="comment-content"> <p> Thank you for writing. Honestly, that option hadn't crossed my mind, but whenever possible, that sounds like the best alternative. </p> </div> <div class="comment-date">2026-01-13 06:21 UTC</div> </div> <div class="comment" id="6e2d978c6a7f450095fd2c0707f5bf4a"> <div class="comment-author"><a href="https://github.com/harshvchawla">harshvchawla</a> <a href="#6e2d978c6a7f450095fd2c0707f5bf4a">#</a></div> <div class="comment-content"> <blockquote> A special case is reserved for mistakes that involve leaking security-sensitive data. If you accidentally share a password, a revert doesn't rectify the problem. The data is still in the history, so this is a singular case where I know of no better remedy than rewriting history. That is, however, quite bothersome, because you now need to communicate to every other collaborator that this is going to happen, and that they may be best off making a new clone of the repository. If there's a better way to address such situations, I don't know of it, but would be happy to learn.
</blockquote> <p> I recently learnt about https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository but even the page mentions how it comes with its own burden </p> </div> <div class="comment-date">2026-02-01 06:33 UTC</div> </div> </div> <hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2026/01/12/two-regimes-of-git Coupling from a big-O perspective https://blog.ploeh.dk/2026/01/05/coupling-from-a-big-o-perspective/ Mon, 05 Jan 2026 11:45:00 UTC <div id="post"> <p> <em>Don't repeat yourself (DRY) implies O(1) edits.</em> </p> <p> Here's a half-baked idea: We may view coupling in software through the lens of <a href="https://en.wikipedia.org/wiki/Big_O_notation">big-O notation</a>. Since this isn't yet a fully-formed idea of mine, this is one of those articles I write in order to learn from the process of having to formulate the idea to other people. </p> <h3 id="27a38591f54342e9845fe2e67fa2b893"> Widening the scope of big-O analysis <a href="#27a38591f54342e9845fe2e67fa2b893">#</a> </h3> <p> Big-O analysis is usually described in terms of functions on ℝ (the real numbers), such as O(n), O(lg n), O(n<sup>3</sup>), O(2<sup>n</sup>) and so on. This is somewhat ironic because when analysing algorithm efficiency, <em>n</em> is usually an integer (i.e. <em>n</em> ∈ ℕ). That, however, suits me fine, because it establishes precedent for what I have in mind. </p> <p> Usually, big-O analysis is applied to algorithms, and usually by measuring an abstract notion of an 'instruction step'. You can, however, also apply such analysis to other aspects of resource utilization. Even within the confines of algorithm analysis, you may instead of instruction count be concerned with memory consumption. In other words, you may analyze an algorithm in order to determine that it uses O(n<sup>2</sup>) memory. 
</p> <p> With that in mind, nothing prevents you from widening the scope further. While I tend to be disinterested in the small-scale performance optimizations involved with algorithms, I have a keen eye on how it applies to software architecture. In modern computers, CPU cycles are fast, but network hops are still noticeable to human perception. For example, the well-known <em>n+1 problem</em> really just implies O(n) network calls. Given that a single network hop may already (depending on topology and distance) be observable, even moderate numbers of <em>n</em> (e.g. 100) may be a problem. </p> <p> What I have in mind for this article is to once more transplant the thinking behind big-O notation to a new area. Instead of instructions or network calls, let O(...) indicate the number of edits you have to make in a code base in order to make a change. If we want to be more practical about it, we may measure this number in how many methods or functions we need to edit, or, even more coarsely, the number of files we need to change. </p> <h3 id="09584e8476bd4ef39a204a8069c12d0e"> Don't Repeat Yourself <a href="#09584e8476bd4ef39a204a8069c12d0e">#</a> </h3> <p> In this view, the old <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY principle</a> implies O(1) edits. You create a single point in your code base responsible for a given behaviour. If you need to make changes, you edit a single part of the code base. This seems obvious. </p> <p> What the big-O perspective implies, however, is that a small constant number of edits may be fine, too. For instance, 'dual' coupling, where two code blocks change together, is not that uncommon. This could for example be where you model messages on an internal queue. Every time you add a new message type, you'll need to define both how to send it (i.e. what data it contains and how it serializes) and how to handle it. 
If you are using a statically typed language, you can use a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> or <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a> to keep track of all message types, which means that the type checker will remind you if you forget one or the other. </p> <p> In big-O notation, we simplify all constants to <em>1</em>, so even if you have systematic, but constant, coupling like this, we would still consider it O(1). In other words, if your architecture contains <em>some</em> coupling that remains constant, we may deem it O(1) and perhaps benign. </p> <p> This also suggests why we have a heuristic like the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a>. Dual duplication is still O(1), and as long as the coupling stays constant, there's no evidence that it's growing. Only once you make the third copy does evidence begin to suggest that the coupling is O(n) rather than O(1). </p> <h3 id="e0f1270bc4b24a97bc373f61052011dd"> Small values of 1 <a href="#e0f1270bc4b24a97bc373f61052011dd">#</a> </h3> <p> Big-O notation is concerned with comparing orders of magnitude, which is why specific constants are simplified to <em>1</em>. The number <em>1</em> is a stand-in for any constant value, <em>1</em>, <em>2</em>, <em>10</em>, or even six billion. When editing source code, however, the actual number of edits does matter. In the following sections, I'll give concrete examples where '1' is small. </p> <h3 id="4020dbeed059499e9d345d3657a57e2f"> Test-specific equality <a href="#4020dbeed059499e9d345d3657a57e2f">#</a> </h3> <p> The first example we may consider is <a href="http://xunitpatterns.com/test-specific%20equality.html">test-specific equality</a>. My <a href="/2010/06/29/IntroducingAutoFixtureLikeness">first treatment</a> related to this topic was in 2010, and <a href="/2012/06/21/TheResemblanceidiom">again in 2012</a>. 
Since then, I've come to view the need for test-specific equality as a test smell. If you are doing test-driven development (which you <a href="/2025/10/20/epistemology-of-software">chiefly should</a>), giving your objects or values <a href="/2021/05/03/structural-equality-for-better-tests">sane equality semantics makes testing much easier</a>. And a well-known benefit of test-driven development (TDD) is that <a href="/2011/11/10/TDDimprovesreusability">code that is easy to test is easy to use</a>. </p> <p> Still, if you must work with mutable objects (as in naive object-oriented design), you can't give objects structural equality. And as I recently rediscovered, functional programming doesn't entirely shield you from this kind of problem either. Functions, for example, don't have clear equality semantics in practice, so when bundling data and behaviour (does that <a href="/2018/01/22/function-isomorphisms">sound familiar</a>?), data structures can't have structural equality. </p> <p> Still, TDD suggests that you should reconsider your API design when that happens. Sometimes, however, part of an API is locked. I recently described such a situation, which prompted me to write <a href="/2025/12/22/test-specific-eq">test-specific Eq instances</a>. 
In short, the <a href="https://www.haskell.org/">Haskell</a> data type <code>Finch</code> was not an <code>Eq</code> instance, so I added this test-specific data type to improve testability: </p> <p> <pre><span style="color:blue;">data</span>&nbsp;FinchEq&nbsp;=&nbsp;FinchEq &nbsp;&nbsp;{&nbsp;feqID&nbsp;::&nbsp;Int &nbsp;&nbsp;,&nbsp;feqHP&nbsp;::&nbsp;Galapagos.HP &nbsp;&nbsp;,&nbsp;feqRoundsLeft&nbsp;::&nbsp;Galapagos.Rounds &nbsp;&nbsp;,&nbsp;feqColour&nbsp;::&nbsp;Galapagos.Colour &nbsp;&nbsp;,&nbsp;feqStrategyExp&nbsp;::&nbsp;Exp&nbsp;} &nbsp;&nbsp;<span style="color:blue;">deriving</span>&nbsp;(<span style="color:#2b91af;">Eq</span>,&nbsp;<span style="color:#2b91af;">Show</span>)</pre> </p> <p> Later, I also introduced a second test-specific data structure, <code>CellStateEq</code> to address the equivalent problem that <code>CellState</code> isn't an <code>Eq</code> instance. This means that I have two representations of essentially the same kind of data. If I, much later, learn that I need to add, remove, or modify a field of, say, <code>Finch</code>, I would also need to edit <code>FinchEq</code>. </p> <p> There's a clear edit-time coupling with constant value <em>2</em>. When I edit one, I also need to edit the other. In big-O perspective, we could say that the specific value of <em>1</em> is <em>2</em>, or <em>1~2</em>, and so the edits required to maintain this part of the code base is of the order O(1). </p> <h3 id="fca4412f1ed3446694a8e027a1b2666d"> Maintaining Fake objects <a href="#fca4412f1ed3446694a8e027a1b2666d">#</a> </h3> <p> Another interesting example is the one that originally elicited this chain of thought. 
In <a href="/2025/09/15/greyscale-box-test-driven-development">Greyscale-box test-driven development</a> I showed an example of how using interactive white-box testing with <a href="http://xunitpatterns.com/Configurable%20Test%20Double.html#Dynamically%20Generated%20Test%20Double">Dynamically Generated Test Doubles</a> (AKA Dynamic Mocks) leads to <a href="http://xunitpatterns.com/Fragile%20Test.html">Fragile Tests</a>. More on this later, but I also described how using <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Objects</a> and <a href="/2019/02/18/from-interaction-based-to-state-based-testing">state-based testing</a> doesn't have the same problem. </p> <p> In <a href="https://bsky.app/profile/ladeak.net/post/3lyvldkwf6c2h">a response on Bluesky</a> Laszlo (<a href="https://bsky.app/profile/ladeak.net">@ladeak.net</a>) pointed out that this seemed to imply a three-way coupling that I had, frankly, overlooked. </p> <p> You can review the full description of the example in the article <a href="/2025/09/15/greyscale-box-test-driven-development">Greyscale-box test-driven development</a>, but in summary it proceeds like this: We wish to modify an implementation detail related to how the system queries its database. Specifically, we wish to change an inclusive integer-based upper bound to an exclusive bound. Thus, we change the relevant part of the <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> <code>WHERE</code> clause from </p> <p> <pre><span style="color:maroon;">@Min&nbsp;&lt;=&nbsp;[At]&nbsp;AND&nbsp;[At]&nbsp;&lt;=&nbsp;@Max&quot;</span></pre> </p> <p> to </p> <p> <pre><span style="color:maroon;">@Min&nbsp;&lt;=&nbsp;[At]&nbsp;AND&nbsp;[At]&nbsp;&lt;&nbsp;@Max&quot;</span></pre> </p> <p> Specifically, the single-character edit removes <code>=</code> from the rightmost <code>&lt;=</code>. </p> <p> Since this modification changes the implied contract, we also need to edit the calling code. 
That's another single-line edit that changes </p> <p> <pre><span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>.<span style="font-weight:bold;color:#74531f;">AddDays</span>(1).<span style="font-weight:bold;color:#74531f;">AddTicks</span>(-1);</pre> </p> <p> to </p> <p> <pre><span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>.<span style="font-weight:bold;color:#74531f;">AddDays</span>(1);</pre> </p> <p> What I had overlooked was that I should also have changed the single test-specific Fake object used for state-based testing. Since I changed the contract of the <code>IReservationsRepository</code> interface, and since <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">Fakes are Test Doubles with contracts</a>, it follows that the <code>FakeDatabase</code> class must also change. </p> <p> This I had overlooked because no tests based on <code>FakeDatabase</code> failed. More on that in <a href="/2026/01/19/filtering-as-domain-logic">a future post</a>, but the required edit is easy enough. 
Change </p> <p> <pre>.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">r</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;=</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">r</span>.At&nbsp;&amp;&amp;&nbsp;<span style="font-weight:bold;color:#1f377f;">r</span>.At&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;=</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>());</pre> </p> <p> to </p> <p> <pre>.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">r</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;=</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">r</span>.At&nbsp;&amp;&amp;&nbsp;<span style="font-weight:bold;color:#1f377f;">r</span>.At&nbsp;<span style="font-weight:bold;color:#74531f;">&lt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>());</pre> </p> <p> Again, the edit involves deleting a single <code>=</code> character. </p> <p> <img src="/content/binary/three-way-dependency.png" alt="A box labelled 'Calling code' with arrows to two other boxes: One labelled FakeDatabase, and another labelled SqlReservationsRepository." width="500"> </p> <p> Still, in this example, not only two, but three files are coupled. With the perspective of big-O notation, however, we may say that <em>1~3</em>, and the order of edits required to maintain this part of the code base remains O(1). Later in this article, I will discuss the maintenance burden of dynamic mocks, which I consider to be O(n). Thus, even if I have a three-way coupling, I don't expect the coupling to grow over time. 
That's the point: I prefer O(1) over O(n). </p> <h3 id="8d57760e1eb94d01881585eefa6d0a71"> Large values of 1 <a href="#8d57760e1eb94d01881585eefa6d0a71">#</a> </h3> <p> As I'm sure practical programmers know, big-O notation has limitations. First, as <a href="https://doc.cat-v.org/bell_labs/pikestyle">Rob Pike observed</a>, "<em>n</em> is usually small". More germane to this discussion </p> <blockquote> <p> "algorithms have big constants." </p> <footer><cite><a href="https://doc.cat-v.org/bell_labs/pikestyle">Notes on Programming in C</a>, Rob Pike, 1989</cite></footer> </blockquote> <p> In this context, this implies that the constant we're deliberately ignoring when we label something O(1) could, in theory, be significant. We don't write O(2,000,000), but if we did, it would look like more, wouldn't it? Even if it doesn't depend on <em>n</em>. </p> <p> It looks to me that when we discuss source code edits, <em>5</em> or <em>6</em> could already be considered large. </p> <h3 id="aabab4ba51e245d289e78f11b446ea9b"> Layers <a href="#aabab4ba51e245d289e78f11b446ea9b">#</a> </h3> <p> Although software design thought leaders have denounced layered software architecture more than a decade ago, I don't entirely agree with that position. That, however, is a topic for a different article. In any case, I still regularly see examples of design that involves a <em>UI DTO</em>, a <em>Domain Model</em>, and a <em>Data Access layer</em>. </p> <p> As I enumerated in 2012, a simple operation, such as adding a <em>label</em> field, involves at least six steps. 
</p> <blockquote> <ol> <li>"A Label column must be added to the database schema and the DbTrack class.</li> <li>"A Label property must be added to the Track class.</li> <li>"The mapping from DbTrack to Track must be updated.</li> <li>"A Label property must be added to the TopTrackViewModel class.</li> <li>"The mapping from Track to TopTrackViewModel must be updated.</li> <li>"The UI must be updated."</li> </ol> <footer><cite><a href="/2012/02/09/IsLayeringWorththeMapping">Is Layering Worth the Mapping?</a>, 2012</cite></footer> </blockquote> <p> People often complain about all the seemingly redundant work involved with such layering, and I don't blame them. At least, if there's no clear motivation for a design like that, and no evident benefit, it looks like redundant work. While you can make good use of separating concerns across layers, that's outside the scope of this article. In the naive way most often employed, it seems like mindless ceremony. </p> <p> Even so, how would we denote the above enumeration in terms of big-O notation? Adding a <em>label</em> field is an O(1) edit. </p> <p> How so? Adding, changing, or deleting a field in a particular database table always entails the same number of steps (six) as outlined above. If, in addition to the Track table you want to add, say, an Album table, you create it according to the three-layer model. This again means that every edit of <em>that</em> table involves six steps. It's still O(1), with <em>1~6</em>, but already it hurts. </p> <p> Apparently, six may be a 'large constant'. </p> <h3 id="3e44a59d433c4b03acdff65a491c51ac"> Linear edits <a href="#3e44a59d433c4b03acdff65a491c51ac">#</a> </h3> <p> So far, we've exclusively examined multiple examples of O(1) edits. Some of them, particularly the layered-architecture example, may seem counterintuitive at first. If a change that requires editing six different 'blocks' of code (not counting tests!) 
is still O(1), then does <em>anything</em> constitute O(n), or any other kind of relationship? </p> <p> To be realistic, I don't think we're in an analytical regime that allows us fine distinctions like identifying any kind of code organization to be, say O(lg n) or O(n lg n). On the other hand, examples of O(n) abound. </p> <p> Every time you run into the <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun Surgery</a> anti-pattern, you are looking at O(n) edits. As a simple example, consider poorly-factored logging, as for example shown initially in <a href="/2020/03/23/repeatable-execution">Repeatable execution</a>. In such situations, you have <em>n</em> classes that log. If you need to change how logging is done, you must change <em>n</em> classes. </p> <p> More generally, the main (unstated) goal of the DRY principle is to turn O(n) edits into O(1) edits. </p> <p> Every junior developer already knows this. Notwithstanding, there's a category of code where even senior programmers routinely forget this. </p> <h3 id="587527167f7c44be89d51536ccae34f6"> Linear test coupling <a href="#587527167f7c44be89d51536ccae34f6">#</a> </h3> <p> When it comes to automated testing, many developers treat test code stepmotherly. The most common mistake is the misguided notion that copy-and-paste code is fine in test code. <a href="/2025/12/01/treat-test-code-like-production-code">It's not</a>. Duplicated test code means that when you make a change in the System Under Test, <em>n</em> tests break, and you will have to fix each one individually, a clear O(n) edit (where <em>n</em> is the number of tests). </p> <p> A more subtle example of an O(n) test maintenance burden can be found in test code that uses dynamic mocks. When you use a configurable mock object, each test contains isolated configuration code related to that specific test. </p> <p> Let's look at an example. 
Consider the <a href="https://github.com/moq/moq4">Moq</a>-based tests from <a href="/2019/02/25/an-example-of-interaction-based-testing-in-c">An example of interaction-based testing in C#</a>. One test contains this <a href="https://xp123.com/3a-arrange-act-assert/">Arrange</a> phase: </p> <p> <pre>readerTD &nbsp;&nbsp;&nbsp;&nbsp;.Setup(r&nbsp;=&gt;&nbsp;r.Lookup(user.Id.ToString())) &nbsp;&nbsp;&nbsp;&nbsp;.Returns(<span style="color:#2b91af;">Result</span>.Success&lt;<span style="color:#2b91af;">User</span>,&nbsp;<span style="color:#2b91af;">IUserLookupError</span>&gt;(user)); readerTD &nbsp;&nbsp;&nbsp;&nbsp;.Setup(r&nbsp;=&gt;&nbsp;r.Lookup(otherUser.Id.ToString())) &nbsp;&nbsp;&nbsp;&nbsp;.Returns(<span style="color:#2b91af;">Result</span>.Success&lt;<span style="color:#2b91af;">User</span>,&nbsp;<span style="color:#2b91af;">IUserLookupError</span>&gt;(otherUser));</pre> </p> <p> Another test arranges the same two <a href="https://martinfowler.com/bliki/TestDouble.html">Test Doubles</a>, but configures the second differently. </p> <p> <pre>readerTD &nbsp;&nbsp;&nbsp;&nbsp;.Setup(r&nbsp;=&gt;&nbsp;r.Lookup(user.Id.ToString())) &nbsp;&nbsp;&nbsp;&nbsp;.Returns(<span style="color:#2b91af;">Result</span>.Success&lt;<span style="color:#2b91af;">User</span>,&nbsp;<span style="color:#2b91af;">IUserLookupError</span>&gt;(user)); readerTD &nbsp;&nbsp;&nbsp;&nbsp;.Setup(r&nbsp;=&gt;&nbsp;r.Lookup(otherUserId)) &nbsp;&nbsp;&nbsp;&nbsp;.Returns(<span style="color:#2b91af;">Result</span>.Error&lt;<span style="color:#2b91af;">User</span>,&nbsp;<span style="color:#2b91af;">IUserLookupError</span>&gt;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">UserLookupError</span>.InvalidId));</pre> </p> <p> Yet more tests arrange the System Under Test (SUT) in other combinations. Refer to the article for the full example. </p> <p> Such tests don't contain duplication per se, but each test is coupled to the SUT's dependencies. 
When you change one of the interfaces, you break O(n) tests, and you have to fix each one individually. </p> <p> As suggested earlier in the article, this is the reason to favour Fake Objects. While an interface change may still break the tests, the effort to correct them is O(1) edits. </p> <h3 id="aecc149a7055454bae3e287fd5688c02"> Two kinds of coupling <a href="#aecc149a7055454bae3e287fd5688c02">#</a> </h3> <p> The big-O perspective on coupling suggests that there are two kinds of coupling: O(1) coupling and O(n) coupling. We can find duplication in both categories. </p> <p> In the O(1) case, duplication is somehow limited. It may be that you are following the rule of three. This allows two copies of a piece of code to exist. It may be that you've made a particular architectural decision, such as using Fake Objects for testing (triplication), or using layered architecture (sextuplication). In these cases, there's a fixed number of edits that you have to make, and in principle, you should know where to make them. </p> <p> I tend to be less concerned about this kind of coupling because it's manageable. In many cases, you may be able to lean on the compiler to guide you through the task of making a change. In other cases, you could have a checklist. Consider the above example of layered architecture. A checklist would enumerate the six separate steps you need to perform. Once you've checked off all six, you're done. </p> <p> It may be slow, tedious work, but it's generally safe, because you are unlikely to forget a spot. </p> <p> The O(n) case is where real trouble lies. This is the case when you copy and paste a snippet of code every time you need it somewhere new. When, later, you discover that there's a bug in the original 'source', you need to find all the places it occurs. Typical copy-paste code is often slightly modified after paste, so a naive search-and-replace strategy is likely to miss some instances. 
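</p> <p> As a hedged illustration of such drift, consider this small Haskell sketch. All names and numbers are hypothetical and not taken from any real code base. Three pasted copies of the same 25% surcharge rule have been reshaped after pasting, so a text search for <code>1.25</code> finds only the first one. The last two definitions sketch the O(1) alternative: a single definition, <code>withSurcharge</code>, that every call site could share. </p> <p> <pre>module Main where

-- Hypothetical copy-paste drift: three copies of the same 25% surcharge rule.
invoiceTotal :: Double -&gt; Double
invoiceTotal net = net * 1.25

-- Pasted, then reshaped. A search for "1.25" misses it.
receiptTotal :: Double -&gt; Double
receiptTotal amount = amount + amount * 0.25

-- Pasted again, now expressed as a percentage.
quoteTotal :: Double -&gt; Double
quoteTotal net = net * (1 + 25 / 100)

-- The O(1) alternative: one definition to edit when the rule changes.
surchargeRate :: Double
surchargeRate = 0.25

withSurcharge :: Double -&gt; Double
withSurcharge net = net * (1 + surchargeRate)

-- Apply every variant to the same input to compare the results.
main :: IO ()
main = mapM_ (print . ($ 100)) [invoiceTotal, receiptTotal, quoteTotal, withSurcharge]</pre> 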
</p> <p> Of course, if you've copied a whole method, function, class, or module, you may still be able to find it by name, but if you've only copied an unnamed block of code, that will not work either. </p> <h3 id="9c636a02c64d4c14bb54a0425fc52f0c"> Not all edits are equally difficult <a href="#9c636a02c64d4c14bb54a0425fc52f0c">#</a> </h3> <p> To be fair, we should acknowledge that not all edits are equally difficult. There are kinds of changes you can automate. Most modern code editors come with refactoring support. In the case of testing with dynamic mocks, for example, you can rename methods, rearrange parameter lists, or remove a parameter. </p> <p> Even so, some edits are harder. Changing the return type of a method tends to break calling code in most <a href="/2019/12/16/zone-of-ceremony">high-ceremony languages</a>. Likewise, changing a primitive parameter (an integer, a Boolean, a string) to a complex object is non-trivial, as is adding a parameter with no obvious good default value. This is when O(n) coupling hurts. </p> <h3 id="d5b66e9ae9d1444e930554768d276e76"> Limitations <a href="#d5b66e9ae9d1444e930554768d276e76">#</a> </h3> <p> So far, we've considered O(1) and O(n) edits. Are there O(lg n) edits, O(n<sup>2</sup>), or even O(2<sup>n</sup>) edits? </p> <p> I can't rule it out, and if the reader can furnish some convincing examples, I'd be keen to learn about them. To be honest, though, I'm not sure it's that helpful. One could perhaps construct an example where inheritance creates a quadratic growth of subclasses, because someone is trying to model two independent features in a single inheritance tree. This, however, is just bad design, and we don't need the big-O lens to tell us that. </p> <h3 id="93726a996b024da48ae4ae861b52bc49"> Conclusion <a href="#93726a996b024da48ae4ae861b52bc49">#</a> </h3> <p> As a thought experiment, one may adopt big-O notation as a viewpoint on code organisation. 
This seems particularly valuable when distinguishing between benign and malignant duplication. Duplication usually entails coupling. For a 'code architect', one of the most important tasks is to reduce, or at least control, coupling. </p> <p> Some coupling is of the order O(1). Hidden in this notation is a constant, which may indicate that a change can be made with a single edit, two edits, six edits, and so on. Even if the actual number is 'large', you can put tools in place to minimize risk: A simple checklist may be enough, or perhaps you can leverage a static type system. </p> <p> Other coupling is of the order O(n). Here, a single change must be made in O(n) different places, where <em>n</em> tends to grow over time, and there's no clear way to systematically find and identify them all. This kind of coupling strikes me as more dangerous than O(1) coupling. </p> <p> When I sometimes seem to have a cavalier attitude to duplication, it's likely because I've already subconsciously identified a particular duplication as of the order O(1). </p> </div><hr> Mark Seemann https://blog.ploeh.dk/2026/01/05/coupling-from-a-big-o-perspective Git integration is ten years away https://blog.ploeh.dk/2025/12/29/git-integration-is-ten-years-away/ Mon, 29 Dec 2025 09:03:00 UTC <div id="post"> <p> <em>We'll get commercial nuclear fusion earlier.</em> </p> <p> Although, as I've <a href="/2023/07/24/is-software-getting-worse">described earlier</a>, I tend to be conservative about updating my laptop, I tend to make exceptions for <a href="https://visualstudio.microsoft.com/">Visual Studio</a> and <a href="https://code.visualstudio.com/">Visual Studio Code</a>. I was recently perusing the "what's new" notes after updating one or the other, and among all the new AI capabilities that I'm not interested in, I noticed something else: 'improved Git integration.' 
</p> <p> As I reflected on that, a thought occurred to me. It seems to me that I've seen these update notes for at least a decade. Improved Git integration. </p> <p> I'm not even exaggerating. Git support for Visual Studio was <a href="https://www.hanselman.com/blog/git-support-for-visual-studio-git-tfs-and-vs-put-into-context">announced in 2013</a>. It has, indeed, been around for a long time, and I've been blissfully ignoring it throughout. Even so, it struck me when reading release notes in 2025, that the product in question had improved Git integration. </p> <p> Is it not done yet? </p> <p> Apparently not. </p> <p> It wasn't done ten years ago? Is there any reason to believe that it's done now? Or are we witnessing some reverse <a href="https://en.wikipedia.org/wiki/Lindy_effect">Lindy effect</a>? The longer something has been in development, the longer you may expect it to be in development yet? </p> <p> Sarcasm aside, you don't need Git integration in your development environment. Do yourself a favour and learn the <a href="/2024/05/20/fundamentals">fundamentals</a> of Git. It takes a few hours to learn the basics, a few days to become more comfortable with it, but from then, no 'integration' need hold you back. You don't have to wait for the next update. <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">Use Git tactically</a> today. </p> </div><hr> Mark Seemann https://blog.ploeh.dk/2025/12/29/git-integration-is-ten-years-away Test-specific Eq https://blog.ploeh.dk/2025/12/22/test-specific-eq/ Mon, 22 Dec 2025 08:15:00 UTC <div id="post"> <p> <em>Adding Eq instances for better assertions.</em> </p> <p> Most well-written unit tests follow some variation of the <a href="https://xp123.com/3a-arrange-act-assert/">Arrange Act Assert</a> pattern. 
In the Assert phase, you may write a sequence of assertions that verify different aspects of what 'success' means. Even so, it boils down to this: You check that the <em>expected</em> outcome is equal to the <em>actual</em> outcome. Some testing frameworks like to turn the order around, but the idea remains the same. After all, <a href="https://en.wikipedia.org/wiki/Symmetric_relation">equality is symmetric</a>. </p> <p> The ideal assertion is one that simply checks that <em>actual is equal to expected</em>. Some languages allow custom infix operators, in which case it's natural to define this fundamental assertion as an operator, such as <a href="https://hackage-content.haskell.org/package/tasty-hunit/docs/Test-Tasty-HUnit.html#v:-64--63--61-">@?=</a>. </p> <p> Since this is <a href="https://www.haskell.org/">Haskell</a>, however, the <code>@?=</code> operator comes with type constraints. Specifically, what we compare must be an <code>Eq</code> instance. In other words, the type in question must support the <code>==</code> operator. What do you do when a type is no <code>Eq</code> instance? </p> <h3 id="cd497aa35e354cddb387a4da5c0d65b2"> No Eq <a href="#cd497aa35e354cddb387a4da5c0d65b2">#</a> </h3> <p> In a recent article you saw how <a href="/2025/12/15/tautological-assertions-are-not-always-caused-by-aliasing">a complicated test induced a tautological assertion</a>. The main reason that the test was complicated was that the values involved were not <code>Eq</code> instances. </p> <p> This got me thinking: Might <a href="http://xunitpatterns.com/test-specific%20equality.html">test-specific equality</a> help? </p> <p> The easiest way to find out is to try. In this article, you'll see how that experiment turns out. First, however, you need a quick introduction to the problem space. 
The task at hand was to implement a <a href="https://en.wikipedia.org/wiki/Cellular_automaton">cellular automaton</a>, ostensibly modelling <a href="https://en.wikipedia.org/wiki/Darwin%27s_finches">Galápagos finches</a> meeting. When two finches encounter each other, they play out a game of <a href="https://en.wikipedia.org/wiki/Prisoner%27s_dilemma">Prisoner's Dilemma</a> according to a strategy implemented in a domain-specific language. </p> <p> Specifically, a finch is modelled like this: </p> <p> <pre><span style="color:blue;">data</span>&nbsp;Finch&nbsp;=&nbsp;Finch &nbsp;&nbsp;{&nbsp;finchID&nbsp;::&nbsp;Int, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">finchHP</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">HP</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">finchRoundsLeft</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Rounds</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;The&nbsp;colour&nbsp;is&nbsp;used&nbsp;for&nbsp;visualisation,&nbsp;but&nbsp;has&nbsp;no&nbsp;semantic&nbsp;significance. </span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">finchColour</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Colour</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;The&nbsp;current&nbsp;strategy. </span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">finchStrategy</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Strategy</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;The&nbsp;expression&nbsp;that&nbsp;is&nbsp;evaluated&nbsp;to&nbsp;produce&nbsp;the&nbsp;strategy. </span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">finchStrategyExp</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Exp</span> &nbsp;&nbsp;}</pre> </p> <p> The <code>Finch</code> data type is not an <code>Eq</code> instance. 
The reason is that <code>Strategy</code> is effectively a free monad over this functor: </p> <p> <pre><span style="color:blue;">data</span>&nbsp;EvalOp&nbsp;a &nbsp;&nbsp;=&nbsp;ErrorOp&nbsp;Error &nbsp;&nbsp;|&nbsp;MeetOp&nbsp;(FinchID&nbsp;-&gt;&nbsp;a) &nbsp;&nbsp;|&nbsp;GroomOp&nbsp;(Bool&nbsp;-&gt;&nbsp;a) &nbsp;&nbsp;|&nbsp;IgnoreOp&nbsp;(Bool&nbsp;-&gt;&nbsp;a)</pre> </p> <p> Since <code>EvalOp</code> is a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum</a> of functions, it can't be an <code>Eq</code> instance, and this then applies transitively to <code>Finch</code>, as well as the <code>CellState</code> container that keeps track of each cell in the cellular grid: </p> <p> <pre><span style="color:blue;">data</span>&nbsp;CellState&nbsp;=&nbsp;CellState &nbsp;&nbsp;{&nbsp;cellFinch&nbsp;::&nbsp;Maybe&nbsp;Finch, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">cellRNG</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">StdGen</span> &nbsp;&nbsp;}</pre> </p> <p> An important part of working with this particular code base is that the API is given, and must not be changed. </p> <p> Given these constraints and data types, is there a way to improve test assertions? </p> <h3 id="3b73b7b375c24f40ac22d6274d806a60"> Smelly tests <a href="#3b73b7b375c24f40ac22d6274d806a60">#</a> </h3> <p> The lack of <code>Eq</code> instances makes it difficult to write simple assertions. 
The worst test I wrote is probably this, making use of a predefined example <code>Finch</code> value named <code>flipflop</code>: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Cell&nbsp;1&nbsp;reproduces&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;flipflop)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;6)&nbsp;<span style="color:green;">--&nbsp;seeded&nbsp;to&nbsp;reprod </span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.reproduce&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;Sanity&nbsp;check&nbsp;on&nbsp;first&nbsp;finch.&nbsp;Unfortunately,&nbsp;CellState&nbsp;is&nbsp;no&nbsp;Eq </span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;instance,&nbsp;so&nbsp;we&nbsp;can&#39;t&nbsp;just&nbsp;compare&nbsp;the&nbsp;entire&nbsp;record.&nbsp;Instead, </span>&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;using&nbsp;HP&nbsp;as&nbsp;a&nbsp;sample: </span>&nbsp;&nbsp;&nbsp;&nbsp;(Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">fst</span>&nbsp;actual))&nbsp;@?=&nbsp;Just&nbsp;20 &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;New&nbsp;finch&nbsp;should&nbsp;have&nbsp;HP&nbsp;from&nbsp;params: </span>&nbsp;&nbsp;&nbsp;&nbsp;(Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual))&nbsp;@?=&nbsp;Just&nbsp;14 &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;New&nbsp;finch&nbsp;should&nbsp;have&nbsp;lifespan&nbsp;from&nbsp;params: </span>&nbsp;&nbsp;&nbsp;&nbsp;(Galapagos.finchRoundsLeft&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span 
style="color:blue;">snd</span>&nbsp;actual))&nbsp;@?= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Just&nbsp;23 &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;New&nbsp;finch&nbsp;should&nbsp;have&nbsp;same&nbsp;colour&nbsp;as&nbsp;parent: </span>&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Galapagos.finchColour&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual))&nbsp;@?= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.finchColour&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;cell1 &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;More&nbsp;assertions,&nbsp;described&nbsp;by&nbsp;their&nbsp;error&nbsp;messages: </span>&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;(Galapagos.finchID&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">fst</span>&nbsp;actual))&nbsp;/= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(Galapagos.finchID&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual)))&nbsp;@? &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#a31515;">&quot;Finches&nbsp;have&nbsp;same&nbsp;ID,&nbsp;but&nbsp;they&nbsp;should&nbsp;be&nbsp;different.&quot;</span> &nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#2b91af;">(/=)</span>&nbsp;`on`&nbsp;Galapagos.cellRNG)&nbsp;cell2&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual)&nbsp;@? &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#a31515;">&quot;New&nbsp;cell&nbsp;2&nbsp;should&nbsp;have&nbsp;an&nbsp;updated&nbsp;RNG.&quot;</span></pre> </p> <p> As you can tell from the <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">apologies</a>, all these assertions leave something to be desired. The first assertion uses <code>finchHP</code> as a proxy for the entire finch in <code>cell1</code>, which is not supposed to change. Instead of an assertion for each of the first finch's attributes, the test 'hopes' that if <code>finchHP</code> didn't change, then neither did the other values.
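</p> <p> To see why a single-field proxy is a weak check, consider a minimal sketch. The <code>Finch</code> type below is a hypothetical, cut-down stand-in (not the real <code>Galapagos.Finch</code>), and <code>sameHP</code>, <code>original</code>, and <code>corrupted</code> are names invented for illustration only. </p>

```haskell
-- Hypothetical, cut-down stand-in for the article's Finch type.
data Finch = Finch
  { finchHP :: Int
  , finchRoundsLeft :: Int
  , finchStrategy :: Bool -> Bool -- a function field, so no derived Eq
  }

-- Proxy comparison: looks at a single field only.
sameHP :: Finch -> Finch -> Bool
sameHP x y = finchHP x == finchHP y

original :: Finch
original = Finch { finchHP = 20, finchRoundsLeft = 23, finchStrategy = id }

-- Imagine a buggy function under test that changed the lifespan.
corrupted :: Finch
corrupted = original { finchRoundsLeft = 0 }

-- The proxy check passes although the two values differ.
proxyPasses :: Bool
proxyPasses = sameHP original corrupted

-- Only an explicit look at the other field reveals the difference.
roundsDiffer :: Bool
roundsDiffer = finchRoundsLeft original /= finchRoundsLeft corrupted
```

<p> Under these assumptions, <code>proxyPasses</code> is <code>True</code> even though the two values differ in <code>finchRoundsLeft</code>, which is the kind of change a proxy assertion can't detect.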
</p> <p> The test then proceeds to verify various fields of the new finch in <code>cell2</code>, checking them one by one, since the lack of <code>Eq</code> makes it impossible to simply check that the actual value is equal to the expected value. </p> <p> In comparison, the test you saw in the previous article is almost pretty. It uses another example <code>Finch</code> value named <code>cheater</code>. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Cell&nbsp;1&nbsp;does&nbsp;not&nbsp;reproduce&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;1)&nbsp;<span style="color:green;">--&nbsp;seeded:&nbsp;no&nbsp;repr. </span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.reproduce&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">--&nbsp;Sanity&nbsp;check&nbsp;that&nbsp;cell&nbsp;1&nbsp;remains,&nbsp;sampling&nbsp;on&nbsp;strategy: </span>&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Galapagos.finchStrategyExp&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">fst</span>&nbsp;actual))&nbsp;@?= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.finchStrategyExp&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;cell1 &nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual))&nbsp;@?=&nbsp;Nothing</pre> </p> <p> The apparent simplicity is mostly because at that time, I'd almost given up on more thorough testing. In this test, I chose <code>finchStrategyExp</code> as a proxy for each value, and 'hoped' that if these properties behaved as expected, other attributes would, too. 
</p> <p> Given that I was following test-driven development and thus engaging in <a href="/2025/09/15/greyscale-box-test-driven-development">grey-box testing</a>, I had reason to believe that the implementation was correct if the test passes. </p> <p> Still, those tests exhibit more than one code smell. Could test-specific equality be the answer? </p> <h3 id="0d24dbb0a6c144ff9cdbf7b146f2329a"> Test utilities for finches <a href="#0d24dbb0a6c144ff9cdbf7b146f2329a">#</a> </h3> <p> The fundamental problem is that the <code>finchStrategy</code> field prevents <code>Finch</code> from being an <code>Eq</code> instance. Finding a way to compare <code>Strategy</code> values seems impractical. A more realistic course of action might be to compare all other fields. One option is to introduce a test-specific type with proper <code>Eq</code> and <code>Show</code> instances. </p> <p> <pre><span style="color:blue;">data</span>&nbsp;FinchEq&nbsp;=&nbsp;FinchEq &nbsp;&nbsp;{&nbsp;feqID&nbsp;::&nbsp;Int &nbsp;&nbsp;,&nbsp;feqHP&nbsp;::&nbsp;Galapagos.HP &nbsp;&nbsp;,&nbsp;feqRoundsLeft&nbsp;::&nbsp;Galapagos.Rounds &nbsp;&nbsp;,&nbsp;feqColour&nbsp;::&nbsp;Galapagos.Colour &nbsp;&nbsp;,&nbsp;feqStrategyExp&nbsp;::&nbsp;Exp&nbsp;} &nbsp;&nbsp;<span style="color:blue;">deriving</span>&nbsp;(<span style="color:#2b91af;">Eq</span>,&nbsp;<span style="color:#2b91af;">Show</span>)</pre> </p> <p> This data type only exists in the test code base. It has all the fields of <code>Finch</code>, except <code>finchStrategy</code>. </p> <p> While I could use it as-is, it quickly turns out that a helper function to turn a <code>CellState</code> value into a <code>FinchEq</code> value would also be useful. 
</p> <p> <pre><span style="color:#2b91af;">finchEq</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">Finch</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">FinchEq</span> finchEq&nbsp;f&nbsp;=&nbsp;FinchEq &nbsp;&nbsp;{&nbsp;feqID&nbsp;=&nbsp;Galapagos.finchID&nbsp;f &nbsp;&nbsp;,&nbsp;feqHP&nbsp;=&nbsp;Galapagos.finchHP&nbsp;f &nbsp;&nbsp;,&nbsp;feqRoundsLeft&nbsp;=&nbsp;Galapagos.finchRoundsLeft&nbsp;f &nbsp;&nbsp;,&nbsp;feqColour&nbsp;=&nbsp;Galapagos.finchColour&nbsp;f &nbsp;&nbsp;,&nbsp;feqStrategyExp&nbsp;=&nbsp;Galapagos.finchStrategyExp&nbsp;f &nbsp;&nbsp;} <span style="color:#2b91af;">cellFinchEq</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">CellState</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Maybe</span>&nbsp;<span style="color:blue;">FinchEq</span> cellFinchEq&nbsp;=&nbsp;<span style="color:blue;">fmap</span>&nbsp;finchEq&nbsp;.&nbsp;Galapagos.cellFinch</pre> </p> <p> Finally, the System Under Test (the <code>reproduce</code> function) takes a tuple as input, and returns a tuple of the same type as output. To avoid some code duplication, it's practical to introduce a data type that can map over both components. </p> <p> <pre><span style="color:blue;">newtype</span>&nbsp;Pair&nbsp;a&nbsp;=&nbsp;Pair&nbsp;(a,&nbsp;a) &nbsp;&nbsp;<span style="color:blue;">deriving</span>&nbsp;(<span style="color:#2b91af;">Eq</span>,&nbsp;<span style="color:#2b91af;">Show</span>,&nbsp;<span style="color:#2b91af;">Functor</span>)</pre> </p> <p> This <code>newtype</code> wrapper makes it possible to map both the first and the second component of a pair (a two-tuple) using a single projection, since <code>Pair</code> is a <code>Functor</code> instance. </p> <p> That's all the machinery required to rewrite the two tests shown above. 
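</p> <p> The difference between mapping over a plain tuple and mapping over <code>Pair</code> can be sketched in isolation. The following uses only <code>base</code> and, as an assumption made for clarity, spells out by hand a <code>Functor</code> instance that behaves like the derived one; <code>tupleMapped</code> and <code>pairMapped</code> are illustrative names. </p>

```haskell
-- Same shape as the article's Pair, but with a hand-written Functor
-- instance instead of a derived one.
newtype Pair a = Pair (a, a)
  deriving (Eq, Show)

instance Functor Pair where
  fmap f (Pair (x, y)) = Pair (f x, f y)

-- Mapping over a plain tuple only touches the second component.
tupleMapped :: (Int, Int)
tupleMapped = (+ 1) <$> (1, 2)

-- Mapping over a Pair touches both components.
pairMapped :: Pair Int
pairMapped = (+ 1) <$> Pair (1, 2)
```

<p> Here <code>tupleMapped</code> is <code>(1, 3)</code>, because the built-in tuple functor only maps the second component, whereas <code>pairMapped</code> is <code>Pair (2, 3)</code>.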
</p> <h3 id="3aed5cd44da14648bff6b7113018ff2f"> Improving the first test <a href="#3aed5cd44da14648bff6b7113018ff2f">#</a> </h3> <p> The first test may be rewritten as this: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Cell&nbsp;1&nbsp;reproduces&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;flipflop)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;6)&nbsp;<span style="color:green;">--&nbsp;seeded&nbsp;to&nbsp;reprod </span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.reproduce&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;Just&nbsp;$&nbsp;finchEq&nbsp;$&nbsp;flipflop &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;Galapagos.finchID&nbsp;=&nbsp;-1142203427417426925&nbsp;<span style="color:green;">--&nbsp;From&nbsp;Character.&nbsp;Test </span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;Galapagos.finchHP&nbsp;=&nbsp;14 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;Galapagos.finchRoundsLeft&nbsp;=&nbsp;23 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;(cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;Pair&nbsp;(cellFinchEq&nbsp;cell1,&nbsp;expected) &nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#2b91af;">(/=)</span>&nbsp;`on`&nbsp;Galapagos.cellRNG)&nbsp;cell2&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual)&nbsp;@? &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#a31515;">&quot;New&nbsp;cell&nbsp;2&nbsp;should&nbsp;have&nbsp;an&nbsp;updated&nbsp;RNG.&quot;</span></pre> </p> <p> That's still a bit of code. 
If you're used to C# or <a href="https://www.java.com">Java</a> code, you may not bat an eyelid over a fifteen-line code block (that even has a few <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">blank lines</a>), but fifteen lines of Haskell code is still significant. </p> <p> There are compound reasons for this. One is that the <code>Galapagos</code> module is a qualified import, which makes the code more verbose than it otherwise could have been. It doesn't help that I follow a strict rule of <a href="/2019/11/04/the-80-24-rule">staying within an 80-character line width</a>. </p> <p> That said, this version of the test has stronger assertions than before. Notice that the first assertion compares two <code>Pair</code>s of <code>FinchEq</code> values. This means that all five comparable fields of each finch are compared against the expected value. Since each <code>Pair</code> holds two finches, that's ten comparisons in all. The previous test only made five comparisons on the finches. </p> <p> The second assertion remains as before. It's there to ensure that the System Under Test (SUT) remembers to update its pseudo-random number generator. </p> <p> Perhaps you wonder about the expected values. For the <code>finchID</code>, hopefully the comment gives a hint. I originally set this value to <code>0</code>, ran the test, observed the actual value, and used what I had observed. I could do that because I was refactoring an existing test that exercised an existing SUT, following the rules of <a href="/2025/11/03/empirical-characterization-testing">empirical Characterization Testing</a>. </p> <p> The <code>finchID</code> values are in practice randomly generated numbers. These are notoriously awkward in test contexts, so I could also have excluded that field from <code>FinchEq</code>. 
Even so, I kept the field, because it's important to be able to verify that the new finch has a different <code>finchID</code> than the parent that begat it. </p> <h3 id="46f55878cf9d4b4a9a6c02c11986fd9c"> Derived values <a href="#46f55878cf9d4b4a9a6c02c11986fd9c">#</a> </h3> <p> Where do the magic constants <code>14</code> and <code>23</code> come from? Although we could use comments to explain their source, another option is to use <a href="http://xunitpatterns.com/Derived%20Value.html">Derived Values</a> to explicitly document their origin: </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Cell&nbsp;1&nbsp;reproduces&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;flipflop)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;6)&nbsp;<span style="color:green;">--&nbsp;seeded&nbsp;to&nbsp;reprod </span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.reproduce&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;Just&nbsp;$&nbsp;finchEq&nbsp;$&nbsp;flipflop &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;Galapagos.finchID&nbsp;=&nbsp;-1142203427417426925&nbsp;<span style="color:green;">--&nbsp;From&nbsp;Character.&nbsp;Test </span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;Galapagos.finchHP&nbsp;=&nbsp;Galapagos.startHP&nbsp;Galapagos.defaultParams &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;Galapagos.finchRoundsLeft&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.lifespan&nbsp;Galapagos.defaultParams &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">do</span> 
&nbsp;&nbsp;&nbsp;&nbsp;(cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;Pair&nbsp;(cellFinchEq&nbsp;cell1,&nbsp;expected) &nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:#2b91af;">(/=)</span>&nbsp;`on`&nbsp;Galapagos.cellRNG)&nbsp;cell2&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual)&nbsp;@? &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#a31515;">&quot;New&nbsp;cell&nbsp;2&nbsp;should&nbsp;have&nbsp;an&nbsp;updated&nbsp;RNG.&quot;</span></pre> </p> <p> We now learn that the <code>finchHP</code> value originates from the <code>startHP</code> value of the <code>defaultParams</code>, and similarly for <code>finchRoundsLeft</code>. </p> <p> To be honest, I'm not sure that this is an improvement. It makes the test more abstract, and if we want tests to serve as executable documentation, concrete example values may be easier to understand. Besides, this gets uncomfortably close to duplicating the actual implementation code contained in the SUT. </p> <p> This variation only serves as an exploration of alternatives. I would strongly consider rolling this change back, and instead adding some comments to the magic numbers. </p> <h3 id="21b8d21e41c14aa39050ee49a72e0b11"> Improving the second test <a href="#21b8d21e41c14aa39050ee49a72e0b11">#</a> </h3> <p> The second test benefits even more. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Cell&nbsp;1&nbsp;does&nbsp;not&nbsp;reproduce&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;1)&nbsp;<span style="color:green;">--&nbsp;seeded:&nbsp;no&nbsp;repr. 
</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.reproduce&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;Pair&nbsp;(cellFinchEq&nbsp;cell1,&nbsp;Nothing)</pre> </p> <p> Not only is it shorter, the assertion is much stronger. It achieves the ideal of verifying that the actual value is equal to the expected value, comparing five data fields on each of the two finches. </p> <h3 id="18c1f690d06d4722a99c0e18e2c66bf2"> Comparing cells <a href="#18c1f690d06d4722a99c0e18e2c66bf2">#</a> </h3> <p> The <code>reproduce</code> function uses the pseudo-random number generators embedded in the <code>CellState</code> data type to decide whether a finch reproduces in a given round. Thus, the number generators change in deterministic, but by human cognition unpredictable, ways. It makes sense to exclude the generators from the assertions, apart from the above assertion that verifies the change itself. </p> <p> Other functions in the <code>Galapagos</code> module also work on <code>CellState</code> values, but are entirely deterministic; that is, they don't make use of the pseudo-random number generators. One such function is <code>groom</code>, which models what happens when two finches meet and play out their game of Prisoner's Dilemma by deciding to groom the other for parasites, or not. 
The function has this type: </p> <p> <pre><span style="color:#2b91af;">groom</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Params</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;(<span style="color:blue;">CellState</span>,&nbsp;<span style="color:blue;">CellState</span>)&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;(<span style="color:blue;">CellState</span>,&nbsp;<span style="color:blue;">CellState</span>)</pre> </p> <p> By specification, this function has no random behaviour, which means that we expect the number generators to stay the same. Even so, due to the lack of an <code>Eq</code> instance, comparing cells is difficult. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;when&nbsp;right&nbsp;cell&nbsp;is&nbsp;empty&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;flipflop)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">fst</span>&nbsp;actual))&nbsp;@?= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;cell1 &nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;Galapagos.finchHP&nbsp;&lt;$&gt;&nbsp;Galapagos.cellFinch&nbsp;(<span style="color:blue;">snd</span>&nbsp;actual))&nbsp;@?=&nbsp;Nothing</pre> </p> <p> Instead of comparing cells, this test only considers the contents of each cell, and it only compares a single field, <code>finchHP</code>, as a proxy for comparing the more complete data structure. 
</p> <p> With <code>FinchEq</code> we have a better way of comparing two finches, but we don't have to stop there. We may introduce another test-utility type that can compare cells. </p> <p> <pre><span style="color:blue;">data</span>&nbsp;CellStateEq&nbsp;=&nbsp;CellStateEq &nbsp;&nbsp;{&nbsp;cseqFinch&nbsp;::&nbsp;Maybe&nbsp;FinchEq &nbsp;&nbsp;,&nbsp;cseqRNG&nbsp;::&nbsp;StdGen &nbsp;&nbsp;} &nbsp;&nbsp;<span style="color:blue;">deriving</span>&nbsp;(<span style="color:#2b91af;">Eq</span>,&nbsp;<span style="color:#2b91af;">Show</span>)</pre> </p> <p> A helper function also turns out to be useful. </p> <p> <pre><span style="color:#2b91af;">cellStateEq</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Galapagos</span>.<span style="color:blue;">CellState</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">CellStateEq</span> cellStateEq&nbsp;cs&nbsp;=&nbsp;CellStateEq &nbsp;&nbsp;{&nbsp;cseqFinch&nbsp;=&nbsp;cellFinchEq&nbsp;cs &nbsp;&nbsp;,&nbsp;cseqRNG&nbsp;=&nbsp;Galapagos.cellRNG&nbsp;cs &nbsp;&nbsp;}</pre> </p> <p> We can now rewrite the test to compare both cells in their entirety (minus the <code>finchStrategy</code>). </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;when&nbsp;right&nbsp;cell&nbsp;is&nbsp;empty&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;flipflop)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;Nothing&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(cellStateEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;cellStateEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;(cell1,&nbsp;cell2)</pre> </p> <p> Again, the test is both simpler and stronger. 
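</p> <p> The way the two projections compose can be sketched with stand-in types. The assumptions are significant: an <code>Int</code> plays the role of <code>StdGen</code> so that the example needs nothing outside <code>base</code>, and the finch is reduced to one comparable field and one function field. None of these simplified definitions are the real <code>Galapagos</code> types, and <code>cell</code> and <code>advanced</code> are invented example values. </p>

```haskell
-- Hypothetical stand-ins: an Int plays the role of StdGen, and the
-- finch has only one comparable field plus one function field.
data Finch = Finch
  { finchHP :: Int
  , finchStrategy :: Bool -> Bool -- prevents a derived Eq instance
  }

data CellState = CellState
  { cellFinch :: Maybe Finch
  , cellRNG :: Int
  }

-- Test-specific projections with derived equality.
newtype FinchEq = FinchEq { feqHP :: Int }
  deriving (Eq, Show)

data CellStateEq = CellStateEq
  { cseqFinch :: Maybe FinchEq
  , cseqRNG :: Int
  }
  deriving (Eq, Show)

finchEq :: Finch -> FinchEq
finchEq = FinchEq . finchHP

-- The cell projection reuses the finch projection for its contents.
cellStateEq :: CellState -> CellStateEq
cellStateEq cs = CellStateEq
  { cseqFinch = finchEq <$> cellFinch cs
  , cseqRNG = cellRNG cs
  }

cell :: CellState
cell = CellState (Just (Finch 20 id)) 0

-- Same finch, but the generator stand-in has advanced.
advanced :: CellState
advanced = cell { cellRNG = 1 }
```

<p> In this sketch, comparing only <code>cseqFinch</code> can't tell <code>cell</code> and <code>advanced</code> apart, whereas comparing the full <code>CellStateEq</code> values can.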
</p> <h3 id="13df780b2ce74b40b198e8b05fc0f721"> A fly in the ointment <a href="#13df780b2ce74b40b198e8b05fc0f721">#</a> </h3> <p> Introducing <code>FinchEq</code> and <code>CellStateEq</code> allowed me to improve most of the tests, but a few annoying issues remain. The most illustrative example is this test of the core <code>groom</code> behaviour, which lets two example <code>Finch</code> values named <code>samaritan</code> and <code>cheater</code> interact. </p> <p> <pre>testCase&nbsp;<span style="color:#a31515;">&quot;Groom&nbsp;two&nbsp;finches&quot;</span>&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;cell1&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;samaritan)&nbsp;(mkStdGen&nbsp;0) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cell2&nbsp;=&nbsp;Galapagos.CellState&nbsp;(Just&nbsp;cheater)&nbsp;(mkStdGen&nbsp;1) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;actual&nbsp;=&nbsp;Galapagos.groom&nbsp;Galapagos.defaultParams&nbsp;(cell1,&nbsp;cell2) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;expected&nbsp;=&nbsp;Just&nbsp;&lt;$&gt;&nbsp;Pair &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(&nbsp;finchEq&nbsp;$&nbsp;samaritan&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;16&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,&nbsp;finchEq&nbsp;$&nbsp;cheater&nbsp;{&nbsp;Galapagos.finchHP&nbsp;=&nbsp;13&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;) &nbsp;&nbsp;<span style="color:blue;">in</span>&nbsp;(cellFinchEq&nbsp;&lt;$&gt;&nbsp;Pair&nbsp;actual)&nbsp;@?=&nbsp;expected</pre> </p> <p> This test ought to compare cells with <code>CellStateEq</code>, but only compares finches. The practical reason is that defining the <code>expected</code> value as a pair of cells entails embedding the expected finches in their respective cells. This is possible, but awkward, due to the nested nature of the data types. </p> <p> It's possible to do something about that, too, but that's the topic for <a href="/2026/02/09/simplifying-assertions-with-lenses">another article</a>. 
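</p> <p> To make that awkwardness concrete, here is a sketch using simplified, hypothetical versions of the test-specific types, with an <code>Int</code> standing in for <code>StdGen</code> and made-up example values. Changing a single field of the finch inside a cell takes a record update nested inside a mapping over <code>Maybe</code>, nested inside another record update. </p>

```haskell
-- Hypothetical stand-ins; the generator is an Int so that the sketch
-- needs nothing outside base.
data FinchEq = FinchEq { feqID :: Int, feqHP :: Int }
  deriving (Eq, Show)

data CellStateEq = CellStateEq
  { cseqFinch :: Maybe FinchEq
  , cseqRNG :: Int
  }
  deriving (Eq, Show)

input :: CellStateEq
input = CellStateEq (Just (FinchEq 1 20)) 0

-- Expected cell: same as input, except that the finch has 16 HP.
-- The update has to go through the record, the Maybe, and the finch.
expected :: CellStateEq
expected =
  input { cseqFinch = (\f -> f { feqHP = 16 }) <$> cseqFinch input }
```

<p> Here <code>expected</code> equals <code>CellStateEq (Just (FinchEq 1 16)) 0</code>; the value is simple, but the expression that produces it is not.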
</p> <h3 id="f6ff371e3106437195a3debe12c930a5"> Conclusion <a href="#f6ff371e3106437195a3debe12c930a5">#</a> </h3> <p> If a test is difficult to write, it may be a symptom that the System Under Test (SUT) has an API which is difficult to use. When doing test-driven development you may want to reconsider the API. Is there a way to model the desired data and behaviour in such a way that the tests become simpler? If so, the API may improve in general. </p> <p> Sometimes, however, you can't change the SUT API. Perhaps it's already given. Perhaps improving it would be a breaking change. Or perhaps you simply can't think of a better way. </p> <p> An alternative to changing the SUT API is to introduce test utilities, such as types with test-specific equality. This is hardly better than improving the SUT API, but may be useful in those situations where the best option is unavailable. </p> </div> Mark Seemann https://blog.ploeh.dk/2025/12/22/test-specific-eq