Pytest is fast

One major attraction of Python. A recent realization.

Ever since I became aware of the distinction between statically and dynamically typed languages, I've struggled to understand the attraction of dynamically typed languages. As regular readers may have noticed, this is a bias that doesn't sit well with me. Clearly, there are advantages to dynamic languages that I fail to notice. Is it a question of mindset? Or is it a combination of several small advantages?

In this article, I'll discuss another potential benefit of at least one dynamically typed language, Python.

Fast feedback #

Rapid feedback is a cornerstone of modern software engineering. I've always considered the feedback from the compiler an important mechanism, but I've recently begun to realize that it comes with a price. While a good type system keeps you honest, compilation takes time, too.

Since I've been so entrenched in the camp of statically typed languages (C#, F#, Haskell), I've tended to regard compilation as a mandatory step. And since the compiler needs to run anyway, you might as well take advantage of it. Use the type system to make illegal states unrepresentable, and all that.

Even so, I've noticed that compilation time isn't a fixed size. This observation surely borders on the banal, but with sufficient cognitive bias, it can, apparently, take years to come to even such a trivial conclusion. After initial years with various programming languages, my formative years as a programmer were spent with C#. As it turns out, the C# compiler is relatively fast.

This is probably a combination of factors. Since C# is a one of the most popular languages, it has a big and skilled engineering team, and it's my impression that much effort goes into making it as fast and efficient as possible.

I also speculate that, since the C# type system isn't as powerful as F#'s or Haskell's, there's simply work that it can't do. When you can't expression certain constraints or relationships with the type system, the compiler can't check them, either.

That said, the C# compiler seems to have become slower over the years. This could be a consequence of all the extra language features that accumulate.

The F# compiler, in comparison, has always taken longer than the C# compiler. Again, this may be due to a combination of a smaller engineering team and that it actually can check more things at compile time, since the type system is more expressive.

This, at least, seems to fit with the observation that the Haskell compiler is even slower than F#. The language is incredibly expressive. There's a lot of constraints and relationships that you can model with the type system. Clearly, the compiler has to perform extra work to check that your types line up.

You're often left with the impression that if it compiles, it works. The drawback is that getting Haskell code to compile may be a non-trivial undertaking.

One thing is that you'll have to wait for the compiler. Another is that if you practice test-driven development (TDD), you'll have to compile the test code, too. Only once the tests are compiled can you run them. And TDD test suites should run in 10 seconds or less.

Skipping compilation with pytest #

A few years ago I had to learn a bit of Python, so I decided to try Advent of Code 2022 in Python. As the puzzles got harder, I added unit tests with pytest. When I ran them, I was taken aback at how fast they ran.

There's no compilation step, so the test suite runs immediately. Obviously, if you've made a mistake that a compiler would have caught, the test fails, but if the code makes sense to the interpreter, it just runs.

For various reasons, I ran out of steam, as one does with Advent of Code, but I managed to write a good little test suite. Until day 17, it ran in 0.15-0.20 seconds on my little laptop. To be honest, though, once I added tests for day 17, feedback time jumped to just under two seconds. This is clearly because I'd written some inefficient code for my System Under Test.

I can't really blame a test framework for being slow, when it's really my own code that slows it down.

A counter-argument is that a compiled language is much faster than an interpreted one. Thus, one might think that a faster language would counter poor implementations. Not so.

TDD with Haskell #

As I've already outlined, the Haskell compiler takes more time than C#, and obviously it takes more time than a language that isn't compiled at all. On the other hand, Haskell compiles to native machine code. My experience with it is that once you've compiled your program, it's fast.

In order to compare the two styles, I decided to record compilation and test times while doing Advent of Code 2024 in Haskell. I set up a Haskell code base with Stack and HUnit, as I usually do. As I worked through the puzzles, I'd be adding and running tests. Every time I recorded the time it took, using the time command to measure the time it took for stack test to run.

I've plotted the observations in this chart:

Scatter plot of more than a thousand compile-and-test times, measured in seconds.

The chart shows more than a thousand observations, with the first to the left, and the latest to the right. The times recorded are the entire time it took from I started a test run until I had an answer. For this, I used the time command's real time measurement, rather than user or sys time. What matters is the feedback time; not the CPU time.

Each measurement is in seconds. The dashed orange line indicates the linear trend.

It's not the first time I've written Haskell code, so I knew what to expect. While you get the occasional fast turnaround time, it easily takes around ten seconds to compile even an empty code base. It seems that there's a constant overhead of that size. While there's an upward trend line as I added more and more code, and more tests, actually running the tests takes almost no time. The initial 'average' feedback time was around eight seconds, and 1100 observations later, the trends sits around 11.5 seconds. At this time, I had more than 200 test cases.

You may also notice that the observations vary quite a bit. You occasionally see sub-second times, but also turnaround times over thirty seconds. There's an explanation for both.

The sub-second times usually happen if I run the test suite twice without changing any code. In that case, the Haskell Stack correctly skips recompiling the code and instead just reruns the tests. This only highlights that I'm not waiting for the tests to execute. The tests are fast. It's the compiler that causes over 90% of the delay.

(Why would I rerun a test suite without changing any code? This mostly happens when I take a break from programming, or if I get distracted by another task. In such cases, when I return to the code, I usually run the test suite in order to remind myself of the state in which I left it. Sometimes, it turns out, I'd left the code in a state were the last thing I did was to run all tests.)

The other extremes have a different explanation.

IDE woes #

Why do I have to suffer through those turnaround times over twenty seconds? A few times over thirty?

The short answer is that these represent complete rebuilds. Most of these are caused by problems with the IDE. For Haskell development, I use Visual Studio Code with the Haskell extension.

Perhaps it's only my setup that's messed up, but whenever I change a function in the System Under Test (SUT), I can. not. make. VS Code pick up that the API changed. Even if I correct my tests so that they still compile and run successfully from the command line, VS Code will keep insisting that the code is wrong.

This is, of course, annoying. One of the major selling points of statically type languages is that a good IDE can tell you if you made mistakes. Well, if it operates on an outdated view of what the SUT looks like, this no longer works.

I've tried restarting the Haskell Language Server, but that doesn't work. The only thing that works, as far as I've been able to discover, is to close VS Code, delete .stack-work, recompile, and reopen VS Code. Yes, that takes a minute or two, so not something I like doing too often.

Deleting .stack-work does trigger a full rebuild, which is why we see those long build times.

Striking a good balance #

What bothers me about dynamic languages is that I find discoverability and encapsulation so hard. I can't just look at the type of an operation and deduce what inputs it might take, or what the output might look like.

To be honest, if you give me a plain text file with F# or Haskell, I can't do that either. A static type system doesn't magically surface that kind of information. Instead, you may rely on an IDE to provide such information at your fingertips. The Haskell extension, for example, gives you a little automatic type annotation above your functions, as discussed in the article Pendulum swing: no Haskell type annotation by default, and shown in a figure reprinted here for your convenience:

Screen shot of a Haskell function in Visual Studio Code with the function's type automatically displayed above it by the Haskell extension.

If this is to work well, this information must be immediate and responsive. On my system it isn't.

It may, again, be that there's some problem with my particular tool chain setup. Or perhaps a four-year-old Lenovo X1 Carbon is just too puny a machine to effectively run such a tool.

On the other hand, I don't have similar issues with C# in Visual Studio (not VS Code). When I make changes, the IDE quickly responds and tells me if I've made a mistake. To be honest, even here, I feel that it was faster and more responsive a decade ago, but compared to Haskell programming, the feedback I get with C# is close to immediate.

My experience with F# is somewhere in between. Visual Studio is quicker to pick up changes in F# code than VS Code is to reflect changes in Haskell, but it's not as fast as C#.

With Python, what little IDE integration is available is usually not trustworthy. Essentially, when suggesting callable operations, the IDE is mostly guessing, based on what it's already seen.

But, good Lord! The tests run fast.

Conclusion #

My recent experiences with both Haskell and Python programming is giving me a better understanding of the balances and trade-offs involved with picking a language. While I still favour statically typed languages, I'm beginning to see some attractive qualities on the other side.

Particularly, if you buy the argument that TDD suites should run in 10 seconds or less, this effectively means that I can't do TDD in Haskell. Not with the hardware I'm running. Python, on the other hand, seems eminently well-suited for TDD.

That doesn't sit too well with me, but on the other hand, I'm glad. I've learned about a benefit of a dynamically typed language. While you may consider all of this ordinary and trite, it feels like a small breakthrough to me. I've been trying hard to see past my own limitations, and it finally feels as though I've found a few chinks in the armour of my biases.

I'll keep pushing those envelopes to see what else I may learn.

Comments

Daniel Tartaglia #

An interesting insight, but if you consider that the compiler is effectively an additional test suit that is verifying the types are being used correctly, that extra compilation time is really just a whole suite of tests that you didn't have to write. I can't help but wonder how long it would take to manually implement all the tests that would be required to satisfy those checks in Python, and how much slower the Python test suite would then be.

Like you, I have a strong bias for typesafe languages (or at least moderately typesafe ones). The way I've always explained it is as follows: Developers tend to work faster when writing with dynamic typed languages because they don't have to explain as much to a compiler. This literally means less code to write. However, because the developer hasen't fully explained themself, any follow-on developer does not have as much context to work with.

After all, whether the language requires it or not, the developers need to define and consider types. The only question is, do they have to write it down

2025-01-01 01:26 UTC

Mark Seemann #

Daniel, thank you for writing. I'm well aware that a type checker is a 'first line of defence', and I agree that if we truly had to replicate everything that a type checker does, as tests, it would take a long time. It would take a long time to write all those tests, and it would probably also take a long time to execute them all.

That said, I think that any sane proponent of dynamically typed languages would counter that that's an unreasonable demand. After all, in most cases, it's hardly the case that the code was written by a monkey with a typewriter, but rather by a well-meaning human who did his or her best to write correct code.

In the end, however, it's all a question about context. How important is correctness, after all? Dan North once kindly pointed out to me that in many cases, the software owner doesn't even know what he or she wants. It's only through a series of iterations that we learn what a business system is supposed to do. Until we reach that point, correctness is, at best, a secondary priority. On the other hand, you should really test your outer space proble software.

But you're right. The types are still there, either way.

The last word in this debate are hardly said yet, but you may also find my recent article series Implementation and usage mindsets interesting.

2025-01-07 06:53 UTC

Published: Monday, 30 December 2024 16:01:00 UTC

Pytest is fast by Mark Seemann