FAQ: How should you unit test internals? A: Through the public API.

This question seems to come up repeatedly: I have some internal (package-private in Java) code. How do I unit test it?

The short answer is: you unit test it as you unit test all other code: through the public API of the System Under Test (SUT).

Purpose of automated testing #

Details can be interesting, but don't lose sight of the big picture. Why do you test your software with automated tests?

Automated testing (as opposed to manual testing) only serves a single purpose: it prevents regressions. Some would say that it demonstrates that the software works correctly, but that's inaccurate. Automated tests can only demonstrate that the software works correctly if the tests are written correctly, but that's a different discussion.

Assuming that all automated tests are correct, then yes: automated tests also demonstrate that the software works, but it's still regression testing. The tests were written to demonstrate that the software worked correctly once. Running the tests repeatedly only demonstrates that it still works correctly.

What does it mean that the software works correctly? When it comes to automated testing, the verification is only as good as the tests. If the tests are good, the verification is strong. If the tests are weak, the verification is weak.

Consider the purpose of writing the software you're currently being paid to produce. Is the purpose of the software to pass all tests? Hardly. The purpose of the software is to solve some problem, to make it easier to perform some task, or (if you're writing games) to entertain. The tests are proxies of the actual purpose.

It ought to be evident, then, that automated tests should be aligned with the purpose of the software. There's nothing new in this: Dan North introduced Behaviour Driven Development (BDD) in order to emphasise that testing should be done with the purpose of the software in mind. You should test the behaviour of the software, not its implementation.

Various software can have different purposes. The software typically emphasised by BDD tends to be business software that solves a business problem. Often, these are full applications. Other types of software include reusable libraries. These exist to be reusable. Common to all is that they have a reason to exist: they have externally visible behaviour that some consumer cares about.

If you want to test a piece of software to prevent regressions, you should make sure that you're testing the externally visible behaviour of the software.

Combinatorics #

In an ideal world, then, all automated testing should be done against the public interface of the SUT. If the application is a web application, testing should be done against the HTML and JavaScript. If the application is a mobile app, testing should be done somehow by automating user interaction against its GUI. In reality, these approaches to testing tend to be brittle, so instead, you can resort to subcutaneous testing.

Even if you're developing a reusable library, or a command-line executable, if you're doing something even moderately complex, you run into another problem: a combinatorial explosion of possible paths through the code. As J.B. Rainsberger explains much better than I can do, if you combine software modules (e.g. validation, business logic, data access, authentication and authorisation, caching, logging, etc) you can easily have tens of thousands distinct paths through a particular part of your software - all via a single entry point.

This isn't related to BDD, or business problems, or agile... It's a mathematical result. There's a reason the Test Pyramid looks like it does.

When you combine a procedure with four distinct paths with another with five paths, the number of possible paths isn't (4 + 5 =) nine; it's (4 * 5 =) twenty. As you combine units together, you easily reach tens of thousands distinct paths through your software. (This is basic combinatorics).

You aren't going to write tens of thousands of automated tests.

In the ideal world, we would like to only test the behaviour of software against its public interface. In the real world, we have to separate any moderately complex software into modules, and instead test the behaviour of those modules in isolation. This prevents the combinatorial explosion.

If you cut up your software in an appropriate manner, you'll get modules that are independent of each other. Many books and articles have been written about how to do this. You've probably heard about Layered Application Architecture, Hexagonal Architecture, or Ports and Adapters. The recent interest related to microservices is another (promising) attempt at factoring code into smaller units.

You still need to care about the behaviour of those modules. Splitting up your software doesn't change the overall purpose of the software (whatever that is).

Internals #

When I do code reviews, often the code is already factored into separate concerns, but internal classes are everywhere. What could be the reason for that?

When I ask, answers always fall into one of two categories:

  • Members (or entire classes) are poorly encapsulated, so the developers don't want to expose these internals for fear of destabilising the system.
  • Types or members are kept hidden in order to protect the code base from backwards compatibility issues.
The first issue is easy to deal with. Consider a recent example:

private void CalculateAverage()
    this.average =
            (long)this.durations.Average(ts => ts.Ticks));

This CalculateAverage method is marked private because it's unsafe to call it. this.durations can be null, in which case the method would throw an exception. It may feel like a solution to lock down the method with an access modifier, but really it only smells of poor encapsulation. Instead, refactor the code to a proper, robust design.

There are valid use cases for the private and internal access modifiers, but the majority of the time I see private and internal code, it merely smells of poor design. If you change the design, you could make types and members public, and feel good about it.

The other issue, concerns about compatibility, can be addressed as well. In any case, for most developers, this is essentially a theoretical issue, because most code written isn't for public consumption anyway. If you also control all consumers of your API, you may not need to worry about compatibility. If you need to change the name of a class, just let your favourite refactoring tool do this for you, and all is still good. (I'd still recommend that you should adopt a Ranger approach to your Zoo software, but that's a different topic.)

The bottom line: you don't have to hide most of your code behind access modifiers. There are good alternatives that lead to better designs.

Testing internals #

That was a long detour around reasons for testing, as well as software design in general. How, then, do you test internal code?

Through the public API. Remember: the reason for testing is to verify that the observable behaviour of the SUT is correct. If a class or member is internal, it isn't visible; it's not observable. It may exist in order to support the public API, but if you explicitly chose to not make it visible, you also stated that it can't be directly observable. You can't have it both ways. (Yes, I know that .NET developers will point the [InternalsVisibleTo] attribute out to me, but this attribute isn't a solution; it's part of the problem.)

Why would you test something that has no 'official' existence?

I think I know the answer to that question. It's because of the combinatorial explosion problem. You want the software to solve a particular problem, and to keep everything else 'below the surface'. Unfortunately, as the complexity of the software grows, you realise (explicitly or implicitly) that you can't cover the entire code base with high-level BDD-style tests. You want to unit test the internals.

The problem with doing this is that it's just as brittle as testing through a GUI. Every time you change the internals, you'll have to change your tests.

A well-designed system tends to be more stable when it comes to unit testing. A poorly designed system is often characterised by tests that are too coupled to implementation details. As soon as you decide to change something in such a system, you'll need to change the tests as well.

Recall that automated testing is regression testing. The only information we get from running tests is whether or not existing tests passed or failed. We don't learn if the tests are correct. How do we know that tests are correct? Essentially, we know because we review them, see them fail, and never touch them again. If you constantly have to fiddle with your tests, how do you know they still test the right behaviour?

Design your sub-modules well, on the other hand, and test maintenance tends to be much lower. If you have well-designed sub-modules, though, you don't have to make all types internal. Make them public. They are part of your solution.

Summary #

A system's public API should enable you to exercise and verify its observable behaviour. That's what you should care about, because that's the SUT's reason to exist. It may have internal types and members, but these are implementation details. They exist to support the system's observable behaviour, so test them through the public API.

If you can't get good test coverage of the internal parts through the public API, then why do these internal parts exist? If they exhibit no observable behaviour, couldn't you delete them?

If they do exist for a reason, but you somehow can't reach them through the public API, it's a design smell. Address that smell instead of trying to test internals.


Dear Mark,

Thank you for this excellent article. I got one question: in the "Internals" section, you state that there are valid use cases for the internal access modifier - can you name some of them for me?

I'm also a proponent of keeping nearly all types public in reusable code bases even if most of them are not considered to be part of the API that a client usually consumes - therefore I would only think about marking a class internal if it cannot protect it's invariants properly and fail fast when it's used in the wrong way. But you also mentioned that in this article, too.

When I did a Google search on the topic, I couldn't find any useful answers either. The best one is probably from Eric Lippert on a Stack Overflow question, stating that big important classes that are hard to verify in the development process should be marked internal. But one can easily counter that by not designing code in such a way.

Anyways, it would be very kind of you if you could provide some beneficial use cases for the use of internal.


2015-10-03 08:14 UTC

Kenny, thank you for writing. The answer given by Eric Lippert does, in my opinion, still paint an appropriate picture. There's always a cost to making types or members public, and I don't always wish to pay that cost. The point that I'm trying to make in the present article is that while this cost exists, it's not so high that it justifies unit testing internals via mechanisms like Reflections or [InternalsVisibleTo]. The cost of doing that is higher than making types or members public.

Still, there are many cases where it's possible to cover internal or private code though a public API. Sometimes, I may be in doubt that the way I've modelled the internal code is the best way to do so, and then I'd rather avoid the cost of making it public. After all, making code public means that you're making a promise that it'll be around without breaking changes for a long time.

Much of my code has plenty of private or internal code. An example is Hyprlinkr, which has private helper methods for one of its central features. These private helpers only exist in order to make the feature implementation more readable, but are not meant for public consumption. They're all covered by tests that exercise the various public members of the class.

Likewise, you can consider the private DurationStatistics class from my Temporary Field code smell article. At first, you may want to keep this class private, because you're not sure that it's stable (i.e. that it isn't going to change). Later, you may realise that you'll need that code in other parts of your code base, so you move it to an internal class. If you're still not convinced that it's stable, you may feel better keeping it internal rather than making it public. Ultimately, you may realise that it is, indeed, stable, in which case you may decide to make it public - or you may decide that no one has had the need for it, so it doesn't really matter, after all.

My goal with this article wasn't to advise against internal code, but only to advise against trying to directly call such code for unit testing purposes. In my experience, when people ask how to unit test internal code, it's a problem they have because they haven't used Test-Driven Development (TDD). If you do TDD, you'll have sufficient coverage, and then it doesn't matter how you choose to organise the internals of your code base.

2015-10-03 11:45 UTC

Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.


Tuesday, 22 September 2015 11:56:00 UTC


"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Tuesday, 22 September 2015 11:56:00 UTC