What to log, and how to log it.

When I visit software organisations to help them make their code more maintainable, I often see code like this:

public ILog Log { get; }
 
public ActionResult Post(ReservationDto dto)
{
    Log.Debug($"Entering {nameof(Post)} method...");
    if (!DateTime.TryParse(dto.Date, out var _))
    {
        Log.Warning("Invalid reservation date.");
        return BadRequest($"Invalid date: {dto.Date}.");
    }
 
    Log.Debug("Mapping DTO to Domain Model.");
    Reservation reservation = Mapper.Map(dto);
 
    if (reservation.Date < DateTime.Now)
    {
        Log.Warning("Invalid reservation date.");
        return BadRequest($"Invalid date: {reservation.Date}.");
    }
 
    Log.Debug("Reading existing reservations from database.");
    var reservations = Repository.ReadReservations(reservation.Date);
    bool accepted = maîtreD.CanAccept(reservationsreservation);
    if (!accepted)
    {
        Log.Warning("Not enough capacity");
        return StatusCode(
            StatusCodes.Status500InternalServerError,
            "Couldn't accept.");
    }
 
    Log.Info("Adding reservation to database.");
    Repository.Create(reservation);
    Log.Debug($"Leaving {nameof(Post)} method...");
    return Ok();
}

Logging like this annoys me. It adds avoidable noise to the code, making it harder to read, and thus, more difficult to maintain.

Ideal #

The above code ought to look like this:

public ActionResult Post(ReservationDto dto)
{
    if (!DateTime.TryParse(dto.Date, out var _))
        return BadRequest($"Invalid date: {dto.Date}.");
 
    Reservation reservation = Mapper.Map(dto);
 
    if (reservation.Date < Clock.GetCurrentDateTime())
        return BadRequest($"Invalid date: {reservation.Date}.");
 
    var reservations = Repository.ReadReservations(reservation.Date);
    bool accepted = maîtreD.CanAccept(reservationsreservation);
    if (!accepted)
    {
        return StatusCode(
            StatusCodes.Status500InternalServerError,
            "Couldn't accept.");
    }
 
    Repository.Create(reservation);
    return Ok();
}

This is more readable. The logging statements are gone from the code, thereby amplifying the essential behaviour of the Post method. The noise is gone.

Wait a minute! you might say, You can't just remove logging! Logging is important.

Yes, I agree, and I didn't. This code still logs. It logs just what you need to log. No more, no less.

Types of logging #

Before we talk about the technical details, I think it's important to establish some vocabulary and context. In this article, I use the term logging broadly to describe any sort of action of recording what happened while software executed. There's more than one reason an application might have to do that:

  • Instrumentation. You may log to support your own work. The first code listing in this article is a typical example of this style of logging. If you've ever had the responsibility of having to support an application that runs in production, you know that you need insight into what happens. When people report strange behaviours, you need those logs to support troubleshooting.
  • Telemetry. You may log to support other people's work. You can write status updates, warnings, and errors to support operations. You can record Key Performance Indicators (KPIs) to support 'the business'.
  • Auditing. You may log because you're legally obliged to do so.
  • Metering. You may log who does what so that you can bill users based on consumption.
Regardless of motivation, I still consider these to be types of logging. All are cross-cutting concerns in that they're independent of application features. Logging records what happened, when it happened, who or what triggered the event, and so on. The difference between instrumentation, telemetry, auditing, and metering is only what you choose to persist.

Particularly when it comes to instrumentation, I often see examples of 'overlogging'. When logging is done to support future troubleshooting, you can't predict what you're going to need, so it's better to log too much data than too little.

It'd be even better to log only what you need. Not too little, not too much, but just the right amount of logging. Obviously, we should call this Goldilogs.

Repeatability #

How do you know what to log? How do you know that you've logged everything that you'll need, when you don't know your future needs?

The key is repeatability. Just like you should be able to reproduce builds and repeat deployments, you should also be able to reproduce execution.

If you can replay what happened when a problem manifested itself, you can troubleshoot it. You need to log just enough data to enable you to repeat execution. How do you identify that data?

Consider a line of code like this:

int z = x + y;

Would you log that?

It might make sense to log what x and y are, particularly if these values are run-time values (e.g. entered by a user, the result of a web service call, etc.):

Log.Debug($"Adding {x} and {y}.");
int z = x + y;

Would you ever log the result, though?

Log.Debug($"Adding {x} and {y}.");
int z = x + y;
Log.Debug($"Result of addition: {z}");

There's no reason to log the result of the calculation. Addition is a pure function; it's deterministic. If you know the inputs, you can always repeat the calculation to get the output. Two plus two is always four.

The more your code is composed from pure functions, the less you need to log.

Log only impure actions #

In principle, all code bases interleave pure functions with impure actions. In most procedural or object-oriented code, no attempt is made of separating the two:

A box of mostly impure (red) code with vertical stripes of green symbolising pure code.

I've here illustrated impure actions with red and pure functions with green. Imagine that this is a conceptual block of code, with execution flowing from top to bottom. When you write normal procedural or object-oriented code, most of the code will have some sort of local side effect in the form of a state change, a more system-wide side effect, or be non-deterministic. Occasionally, arithmetic calculation or similar will form small pure islands.

While you don't need to log the output of those pure functions, it hardly makes a difference, since most of the code is impure. It would be a busy log, in any case.

Once you shift towards functional-first programming, your code may begin to look like this instead:

A box of mostly pure (green) code with a few vertical stripes of red symbolising impure code.

You may still have some code that occasionally executes impure actions, but largely, most of the code is pure. If you know the inputs to all the pure code, you can reproduce that part of the code. This means that you only need to log the non-deterministic parts: the impure actions. Particularly, you need to log the outputs from the impure actions, because these impure output values become the inputs to the next pure block of code.

This style of architecture is what you'll often get with a well-designed F# code base, but you can also replicate it in C# or another object-oriented programming language. I'd also draw a diagram like this to illustrate how Haskell code works if you model interactions with free monads.

This is the most generally applicable example, so articles in this short series show a Haskell code base with free monads, as well as a C# code base.

In reality, you can often get away with an impureim sandwich:

A box with a thin red slice on top, a thick green middle, and a thin red slice at the bottom.

This architecture makes things simpler, including logging. You only need to log the inital and the concluding impure actions. The rest, you can always recompute.

I could have implemented the comprehensive example code shown in the next articles as impureim sandwiches, but I chose to use free monads in the Haskell example, and Dependency Injection in the C# example. I did this in order to offer examples from which you can extrapolate a more complex architecture for your production code.

Examples #

I've produced two equivalent example code bases to show how to log just enough data. The first is in Haskell because it's the best way to be sure that pure and impure code is properly separated.

Both example applications have the same externally visible behaviour. They showcase a focused vertical slice of a restaurant reservation system. The only feature they support is the creation of a reservation.

Clients make reservations by making an HTTP POST request to the reservation system:

POST /reservations HTTP/1.1
Content-Type: application/json

{
  "id""84cef648-1e5f-467a-9d13-1b81db7f6df3",
  "date""2021-12-21 19:00:00",
  "email""mark@example.com",
  "name""Mark Seemann",
  "quantity": 4
}

This is an attempt to make a reservation for four people at December 21, 2021 at seven in the evening. Both code bases support this HTTP API.

If the web service accepts the reservation, it'll write the reservation as a record in a SQL Server database. The table is defined as:

CREATE TABLE [dbo].[Reservations] (
    [Id]         INT                NOT NULL IDENTITY,
    [Guid]       UNIQUEIDENTIFIER   NOT NULL UNIQUE,
    [Date]       DATETIME2          NOT NULL,
    [Name]       NVARCHAR (50)      NOT NULL,
    [Email]      NVARCHAR (50)      NOT NULL,
    [Quantity]   INT                NOT NULL
    PRIMARY KEY CLUSTERED ([Id] ASC)

Both implementations of the service can run on the same database.

The examples follow in separate articles:

Readers not comfortable with Haskell are welcome to skip directly to the C# article.

Log metadata #

In this article series, I focus on run-time data. The point is that there's a formal method to identify what to log: Log the inputs to and outputs from impure actions.

I don't focus on metadata, but apart from run-time data, each log entry should be accompanied by metadata. As a minimum, each entry should come with information about the time it was observed, but here's a list of metadata to consider:

  • Date and time of the log entry. Make sure to include the time zone, or alternatively, log exclusively in UTC.
  • The version of the software that produced the entry. This is particularly important if you deploy new versions of the software several times a day.
  • The user account or security context in which the application runs.
  • The machine ID, if you consolidate server farm logs in one place.
  • Correlation IDs, if present.
I don't claim that this list is complete, but I hope it can inspire you to add the metadata that you need.

Conclusion #

You only need to log what happens in impure actions. In a normal imperative or object-oriented code base, this is almost a useless selection criterion, because most of what happens is impure. Thus, you need to log almost everything.

There's many benefits to be had from moving towards a functional architecture. One of them is that it simplifies logging. Even a functional-first approach, as is often seen in idiomatic F# code bases, can simplify your logging efforts. The good news is that you can adopt a similar architecture in object-oriented code. You don't even have to compromise the design.

I've worked on big C# code bases where we logged all the impure actions. It was typically less than a dozen impure actions per HTTP request. When there was a problem in production, I could usually reproduce what caused it based on the logs.

You don't have to overlog to be able to troubleshoot your production code. Log the data that matters, and only that. Log the impure inputs and outputs.

Next: Repeatable execution in Haskell.



Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 23 March 2020 08:17:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 23 March 2020 08:17:00 UTC