Unit Testing methods with the same implementation

Thursday, 09 October 2014 18:45:00 UTC

How do you comprehensibly unit test two methods that are supposed to have the same behaviour?

Occasionally, you may find yourself in situations where you need to have two (or more) public methods with the same behaviour. How do you properly cover them with unit tests?

The obvious answer is to copy and paste all the tests, but that leads to test duplication. People with a superficial knowledge of DAMP (Descriptive And Meaningful Phrases) would argue that test duplication isn't a problem, but it is, simply because it's more code that you need to maintain.

Another approach may be to simply ignore one of those methods, effectively not testing it, but if you're writing Wildlife Software, you have to have mechanisms in place that can protect you from introducing breaking changes.

This article outlines one approach you can use if your unit testing framework supports Parameterized Tests.

Example problem

Recently, I was working on a class with a complex AppendAsync method:

public Task AppendAsync(T @event)
{
    // Lots of stuff going on here...
}

To incrementally define the behaviour of that AppendAsync method, I had used Test-Driven Development to implement it, ending with 13 test methods. None of those test methods were simple one-liners; they ranged from 3 to 18 statements, and averaged 8 statements per test.

After having fully implemented the AppendAsync method, I wanted to make the containing class implement IObserver<T>, with the OnNext method implemented like this:

public void OnNext(T value)
{
    this.AppendAsync(value).Wait();
}

That was my plan, but how could I provide appropriate coverage with unit tests of that OnNext implementation, given that I didn't want to copy and paste those 13 test methods?

Parameterized Tests with varied behaviour

In this code base, I was already using xUnit.net with its nice support for Parameterized Tests; in fact, I was using AutoFixture.Xunit with attribute-based test input, so my tests looked like this:

[TheoryAutoAtomData]
public void AppendAsyncFirstEventWritesPageBeforeIndex(
    [Frozen(As = typeof(ITypeResolver))]TestEventTypeResolver dummyResolver,
    [Frozen(As = typeof(IContentSerializer))]XmlContentSerializer dummySerializer,
    [Frozen(As = typeof(IAtomEventStorage))]SpyAtomEventStore spyStore,
    AtomEventObserver<XmlAttributedTestEventX> sut,
    XmlAttributedTestEventX @event)
{
    sut.AppendAsync(@event).Wait();
 
    var feed = Assert.IsAssignableFrom<AtomFeed>(
        spyStore.ObservedArguments.Last());
    Assert.Equal(sut.Id, feed.Id);
}

This is the sort of test I needed to duplicate, but instead of calling sut.AppendAsync(@event).Wait(); I needed to call sut.OnNext(@event);. The only variation in the test should be in the exercise SUT phase of the test. How can you do that, while maintaining the terseness of using attributes for defining test cases? The problem with using attributes is that you can only supply primitive values as input.

First, I briefly experimented with applying the Template Method pattern, and while I got it working, I didn't like having to rely on inheritance. Instead, I devised this little, test-specific type hierarchy:

public interface IAtomEventWriter<T>
{
    void WriteTo(AtomEventObserver<T> observer, T @event);
}
 
public enum AtomEventWriteUsage
{
    AppendAsync = 0,
    OnNext
}
 
public class AppendAsyncAtomEventWriter<T> : IAtomEventWriter<T>
{
    public void WriteTo(AtomEventObserver<T> observer, T @event)
    {
        observer.AppendAsync(@event).Wait();
    }
}
 
public class OnNextAtomEventWriter<T> : IAtomEventWriter<T>
{
    public void WriteTo(AtomEventObserver<T> observer, T @event)
    {
        observer.OnNext(@event);
    }
}
 
public class AtomEventWriterFactory<T>
{
    public IAtomEventWriter<T> Create(AtomEventWriteUsage use)
    {
        switch (use)
        {
            case AtomEventWriteUsage.AppendAsync:
                return new AppendAsyncAtomEventWriter<T>();
            case AtomEventWriteUsage.OnNext:
                return new OnNextAtomEventWriter<T>();
            default:
                throw new ArgumentOutOfRangeException("Unexpected value.");
        }
    }
}

This is test-specific code that I added to my unit tests, so there's no test-induced damage here. These types are defined in the unit test library, and are not available in the SUT library at all.

Notice that I defined an interface to abstract how to use the SUT for this particular test purpose. There are two small classes implementing that interface: one using AppendAsync, and one using OnNext. However, this small polymorphic type hierarchy can't be supplied from attributes, so I also added an enum, and a concrete factory to translate from the enum to the writer interface.

This test-specific type system enabled me to refactor my unit tests to something like this:

[Theory]
[InlineAutoAtomData(AtomEventWriteUsage.AppendAsync)]
[InlineAutoAtomData(AtomEventWriteUsage.OnNext)]
public void WriteFirstEventWritesPageBeforeIndex(
    AtomEventWriteUsage usage,
    [Frozen(As = typeof(ITypeResolver))]TestEventTypeResolver dummyResolver,
    [Frozen(As = typeof(IContentSerializer))]XmlContentSerializer dummySerializer,
    [Frozen(As = typeof(IAtomEventStorage))]SpyAtomEventStore spyStore,
    AtomEventWriterFactory<XmlAttributedTestEventX> writerFactory,
    AtomEventObserver<XmlAttributedTestEventX> sut,
    XmlAttributedTestEventX @event)
{
    writerFactory.Create(usage).WriteTo(sut, @event);
 
    var feed = Assert.IsAssignableFrom<AtomFeed>(
        spyStore.ObservedArguments.Last());
    Assert.Equal(sut.Id, feed.Id);
}

Notice that this is the same test case as before, but because of the two occurrences of the [InlineAutoAtomData] attribute, each test method is being executed twice: once with AppendAsync, and once with OnNext.

This gave me coverage, and protection against breaking changes of both methods, without making any test-induced damage.

The entire source code in this article is available here. The commit just before the refactoring is this one, and the commit with the completed refactoring is this one.


Comments

I'm not familiar with xUnit, but does it now allow to use something along the lines of nUnit's TestCaseSource to provide the different implementations? From an OOD your implementation is flawless, but I just believe that for tests, simplicity is key so that readability is not affected and intent is instantly captured.
2014-10-29 14:34 UTC

Ricardo, thank you for writing. xUnit.net's equivalent to [TestCaseSource] is to combine the [Theory] attribute with one of the built-in [PropertyData], [ClassData], etc. attributes. However, in this particular case, there's a couple of reasons that I didn't find those attractive:

  • When supplying test data in this way, you can write any code you'd like, but you have to supply values for all the parameters for all the test cases. In the above example, I wanted to vary the System Under Test (SUT), while I didn't care about varying any of the other parameters. The above approach enabled my to do so in a fairly DRY manner.
  • In general, I don't like using [PropertyData], [ClassData] (or NUnit's [TestCaseSource]) unless I truly have a set of reusable test cases. Having a member or type encapsulating a set of test cases implies that these test cases are reusable - but they rarely are.
Perhaps a better approach would be to use something like Exude, but while I wish such semantics were built into xUnit.net or NUnit directly, I didn't want to take a dependency on Exude only for a couple of tests.

2014-10-30 12:26 UTC

AtomEventStore

Tuesday, 07 October 2014 16:35:00 UTC

Announcing AtomEventStore: an open source, server-less .NET Event Store.

Grean has kindly open-sourced a library we've used internally for several projects: AtomEventStore. It's pretty nice, if I may say so.

It comes with quite a bit of documentation, and is also available on NuGet.

The original inspiration came from Yves Reynhout's article Your EventStream is a linked list.


Faking a continuously Polling Consumer with scheduled tasks

Thursday, 25 September 2014 06:33:00 UTC

How to (almost) continuously poll a queue when you can only run a task at discrete intervals.

In a previous article, I described how you set up Azure Web Jobs to run only a single instance of a background process. However, the shortest time interval you can currently configure is to run a scheduled task every minute. If you want to use this trick to set up a Polling Consumer, it may seem limiting that you'll have to wait up to a minute before the scheduled task can pull new messages off the queue.

The problem

The problem is this: a Web Job is a command-line executable you can schedule. The most frequent schedule you can set up is once per minute. Thus, once per minute, the executable can start and query a queue in order to see if there are new messages since it last looked.

If there are new messages, it can pull these messages off the queue and handle them one by one, until the queue is empty.

If there are no new messages, the executable exits. It will then be up to a minute before it will run again. This means that if a message arrives just after the executable exits, it will sit in the queue up to a minute before being handled.

At least, that's the naive implementation.

A more sophisticated approach

Instead of exiting immediately, what if the executable was to wait for a small period, and then check again? This means that you'd be able to increase the polling frequency to run much faster than once per minute.

If the executable also keeps track of when it started, it can gracefully exit slightly before the minute is up, enabling the task scheduler to start a new process soon after.

Example: a recursive F# implementation

You can implement the above strategy in any language. Here's an F# version. The example code below is the main 'loop' of the program, but a few things have been put into place before that. Most of these are configuration values pulled from the configuration system:

  • timeout is a TimeSpan value that specifies for how long the executable should run before exiting. This value comes from the configuration system. Since the minimum schedule frequency for Azure Web Jobs is 1 minute, it makes sense to set this value to 1 minute.
  • stopBefore is essentially DateTimeOffset.Now + timeout.
  • estimatedDuration is a TimeSpan containing your (conservative) estimate of how long time it takes to handle a single message. It's only used if there are no messages to be handled in the queue, as the algorithm then has no statistics about the average execution time for each message. This value comes from the configuration system. In a recent system, I just arbitrarily set it to 2 seconds.
  • toleranceFactor is a decimal used as a multiplier in order to produce a margin for when the executable should exit, so that it can exit before stopBefore, instead of idling for too long. This value comes from the configuration system. You'll have to experiment a bit with this value. When I originally deployed the code below, I had it set to 2, but it seems like the Azure team changed how Web Jobs are initialized, so currently I have it set to 5.
  • idleTime is a TimeSpan that controls how long the executable should sit idly waiting, if there are no messages in the queue. This value comes from the configuration system. In a recent system, I set it to 5 seconds, which means that you may experience up to 5 seconds delay from a message arrives on an empty queue, until it's picked up and handled.
  • dequeue is the function that actually pulls a message off the queue. It has the signature unit -> (unit -> unit) option. That looks pretty weird, but it means that it's a function that takes no input arguments. If there's a message in the queue, it returns a handler function, which is used to handle the message; otherwise, it returns None.
Obviously, there's quite a bit of code behind that dequeue function (it's actually the heart of the system), but exactly what it does, and how it does it, isn't particularly important in this context. If you want to see an example of how to implement a background process in F#, see my Pluralsight course on Functional Architecture with F#.

let rec handleUntilTimedOut (durations : TimeSpan list) =
    let avgDuration =
        match durations with
        | [] -> estimatedDuration
        | _ ->
            (durations |> List.sumBy (fun x -> x.Ticks)) /
                (int64 durations.Length)
            |> TimeSpan.FromTicks
    let margin =
        (decimal avgDuration.Ticks) * toleranceFactor
        |> int64
        |> TimeSpan.FromTicks
 
    if DateTimeOffset.Now + margin < stopBefore then
        match dequeue() with
        | Some handle ->
            let before = DateTimeOffset.Now
            handle()
            let after = DateTimeOffset.Now
            let duration = after - before
            let newDurations = duration :: durations
            handleUntilTimedOut newDurations
        | _ ->
            if DateTimeOffset.Now + idleTime < stopBefore then
                Async.Sleep (int idleTime.TotalMilliseconds)
                |> Async.RunSynchronously
                handleUntilTimedOut durations
 
handleUntilTimedOut []

As you can see, the first thing the handleUntilTimedOut function does is to attempt to calculate the average duration of handling a message. Calculating the average is easy enough if you have at least one observation, but if you have no observations at all, you can't calculate the average. This is the reason the algorithm needs the estimatedDuration to get started.

Based on the the average (or estimated) duration, the algorithm next calculates a safety margin. This margin is intended to give the executable a chance to exit in time: if it can see that it's getting too close to the timeout, it'll exit instead of attempting to handle another message, since handling another message may take so much time that it may push it over the timeout limit.

If it decides that, based on the margin, it still have time to handle a message, it'll attempt to dequeue a message.

If there's a message, it'll measure the time it takes to handle the message, and then append the measured duration to the list of already observed durations, and recursively call itself.

If there's no message in the queue, the algorithm will idle for the configured period, and then call itself recursively.

Finally, the handleUntilTimedOut [] expression kicks off the polling cycle with an empty list of observed durations.

Observations

When appropriately configured, a typical Azure Web Job log sequence looks like this:

  • 1 minute ago (55 seconds running time)
  • 2 minutes ago (55 seconds running time)
  • 3 minutes ago (56 seconds running time)
  • 4 minutes ago (55 seconds running time)
  • 5 minutes ago (56 seconds running time)
  • 6 minutes ago (58 seconds running time)
  • 7 minutes ago (55 seconds running time)
  • 8 minutes ago (55 seconds running time)
  • 9 minutes ago (58 seconds running time)
  • 10 minutes ago (56 seconds running time)
Notice that all Web Jobs complete within 1 minute, leaving time for the next scheduled job to start. In all of those 55ish seconds, the job can continuously pull messages of the queue, if any are present.

Summary

If you have a situation where you need to run a scheduled job (as opposed to a continuously running service or daemon), but you want it to behave like it was a continuously running service, you can make it wait until just before a new scheduled job is about to start, and then exit. This is an approach you can use anywhere you find yourself in that situation - not only on Azure.


Decommissioning Decoraptors

Monday, 25 August 2014 10:08:00 UTC

How to release objects from within a Decoraptor.

In my article about the Decoraptor design pattern, I deliberately ignored the question of decommissioning: what if the short-lived object created inside of the Decoraptor contains one or more disposable objects?

A Decoraptor creates instances of object graphs, and such object graphs may contain disposable objects. These disposable objects may be deeply buried inside of the object graph, and the root object itself may not implement IDisposable.

This is a know problem, and the solution is known as well: the object that composes the object graph is the only object with enough knowledge to dispose of any disposable objects within the object graph that it created, so an Abstract Factory should also be able to release objects again:

public interface IFactory<T>
{
    T Create();
 
    void Release(T item);
}

This is similar to advice I've already given in conjunction with designing DI-Friendly frameworks.

Example: refactored Decoraptor

With the new version of the IFactory<T> interface, you can refactor the Observeraptor class from the Decoraptor article:

public class Observeraptor<T> : IObserver<T>
{
    private readonly IFactory<IObserver<T>> factory;
 
    public Observeraptor(IFactory<IObserver<T>> factory)
    {
        this.factory = factory;
    }
 
    public void OnCompleted()
    {
        var obs = this.factory.Create();
        obs.OnCompleted();
        this.factory.Release(obs);
    }
 
    public void OnError(Exception error)
    {
        var obs = this.factory.Create();
        obs.OnError(error);
        this.factory.Release(obs);
    }
 
    public void OnNext(T value)
    {
        var obs = this.factory.Create();
        obs.OnNext(value);
        this.factory.Release(obs);
    }
}

As you can see, each method implementation now performs three steps:

  1. Use the factory to create an instance of IObserver<T>
  2. Delegate the method call to the new instance
  3. Release the short-lived instance
This informs the factory that the client is done with the instance, and it can let it go out of scope, as well as dispose of any disposable objects that object graph may contain.

Some people prefer letting the factory return something that implements IDisposable, so that the Decoraptor can employ a using statement in order to manage the lifetime of the returned object. However, IObserver<T> itself doesn't derive from IDisposable, so if you want to do something like that, you'd need to invent some sort of Scope<T> class, which would implement IDisposable; that's another alternative.

Example: refactored factory

Refactoring the Decoraptor itself to take decommissioning into account is fairly easy, but you also need to implement the Release method on any factory implementations. Furthermore, the factory should be thread-safe, because often, the entire point of introducing a Decoraptor in the first place is to deal with Services that aren't thread-safe.

Implementing a thread-safe Composer isn't too difficult, but certainly increases the complexity of it:

public class SqlMeterFactory : IFactory<IObserver<MeterRecord>>
{
    private readonly
        ConcurrentDictionary<IObserver<MeterRecord>, MeteringContext>
            observers;
 
    public SqlMeterFactory()
    {
        this.observers = new
            ConcurrentDictionary<IObserver<MeterRecord>, MeteringContext>();
    }
 
    public IObserver<MeterRecord> Create()
    {
        var ctx = new MeteringContext();
        var obs = new SqlMeter(ctx);
 
        this.observers[obs] = ctx;
 
        return obs;
    }
 
    public void Release(IObserver<MeterRecord> item)
    {
        MeteringContext ctx;
        if (this.observers.TryRemove(item, out ctx))
            ctx.Dispose();
    }
}

Fortunately, instead of having to handle the complexity of a thread-safe implementation, you can delegate that part to a ConcurrentDictionary.

Disposable graphs

The astute reader may have been wondering about this: shouldn't SqlMeter implement IDisposable? And if so, wouldn't it be easier to simply dispose of it directly, instead of having to deal with a dictionary of root objects as keys, and disposable objects as values?

The short answer is that, in this particular example, this would indeed have been a more correct, as well as a simpler, solution. Consider the SqlMeter class:

public class SqlMeter : IObserver<MeterRecord>
{
    private readonly MeteringContext ctx;
 
    public SqlMeter(MeteringContext ctx)
    {
        this.ctx = ctx;
    }
 
    public void OnNext(MeterRecord value)
    {
        this.ctx.Records.Add(value);
        this.ctx.SaveChanges();
    }
 
    public void OnCompleted() { }
 
    public void OnError(Exception error) { }
}

Since SqlMeter has a class field of the type MeteringContext, and MeteringContext, via deriving from DbContext, is a disposable object, SqlMeter itself also ought to implement IDisposable.

Yes, indeed, it should, and the only reason I didn't do that was for the sake of the example. If SqlMeter had implemented IDisposable, you'd be tempted to implement SqlMeterFactory.Release by simply attempting to downcast the incoming item from IObserver<MeterRecord> to IDisposable, and then, if the cast succeeds, invoke its Dispose method.

However, the only reason we know that SqlMeter ought to implement IDisposable is because it has a concrete dependency. Concrete types can implement IDisposable, but interfaces should not, since IDisposable is an implementation detail. (I do realize that some interfaces defined by the .NET Base Class Library derives from IDisposable, but I consider those Leaky Abstractions. As Nicholas Blumhardt put it: "an interface [...] generally shouldn't be disposable. There's no way for the one defining an interface to foresee all possible implementations of it - you can always come up with a disposable implementation of practically any interface.")

The point of this discussion is that as soon as a class has an abstract dependency (as opposed to a concrete dependency), the class may or may not be relying on something that is disposable. That's not enough to warrant the class itself to implement the IDisposable interface. For this reason, when an Abstract Factory creates an object graph, the root object of that graph should rarely implement IDisposable, and that's the reason it's not enough to attempt a downcast.

This was the reason that I intentionally didn't implement IDisposable on SqlMeter, even though I should have: to demonstrate how a Release method should deal with object graphs where the disposable objects are deeply buried as leaf nodes in the graph. A Release method may not always be able to traverse the graph, so this is the reason that the factory must also have the decommissioning responsibility: only the Composer knows what it composed, so only it knows how to safely dismantle the graph again.

Summary

Throwing IDisposable into the mix always makes things much more complicated, so if you can, try to implement your classes in such a way that they aren't disposable. This is often possible, but if you can't avoid it, you must add a Release method to your Abstract Factory and handle the additional complexity of decommissioning object graphs.

Some (but not all!) DI Containers already do this, so that may be one reason to use a DI Container.


Decoraptor

Sunday, 24 August 2014 13:04:00 UTC

A Dependency Injection design pattern

This article describes a Dependency Injection design pattern called Decoraptor. It can be used as a solution to the problem:

How can you address lifetime mismatches without introducing a Leaky Abstraction?

By adding a combination of a Decorator and Adapter between the Client and its Service.

Decoraptor

Sometimes, when a Client wishes to talk to a Service (Component in the figure; in practice often an interface), the Client may have a long lifetime, whereas, because of technical constraints in the implementation, the Service must only have a short lifetime. If not addressed, this may lead to a Captive Dependency: the long-lived Client holds onto the Service for too long.

One solution to this is a Decoraptor - a combination of a Decorator and an Adapter. A Decoraptor is a long-lived object that adapts an Abstract Factory, and uses the factory to create a short-lived instance of the interface that it, itself, implements.

How it works

A Decoraptor is an Adapter that looks almost like a Decorator. It implements an interface (Component in the above figure) by adapting an Abstract Factory. The factory creates other instances of the implemented interface, so the Decoraptor implements each operation defined by the interface by

  1. Invoking the factory to create a new instance of the same interface
  2. Delegating the operation to the new instance
This enables the Decoraptor to resolve the lifetime mismatch between a long-lived Client and a short-lived Service. The Decoraptor itself is as long-lived as its Client, but ensures that each Service instance is as short-lived as a single operation.

While not strictly a Decorator, a Decoraptor is closely related because it ultimately delegates all implementation to another implementation of the interface it, itself, implements. The only purpose of a Decoraptor is to match two otherwise incompatible lifetimes.

When to use it

While a Decoraptor is a fairly simple piece of infrastructure code, it still adds complexity to a code base, so it should only be used if necessary. The simplest solution to the Captive Dependency problem is to align the Client's lifetime to the Service's lifetime; in other words, make the Client's lifetime as short as the Service's lifetime. This doesn't require a Decoraptor: it only requires you to compose the Client/Service object graph every time you want to create a new instance of the Service.

Another alternative to a Decoraptor is to consider whether it's possible to refactor the Service so that it can have a longer lifetime. Often, the reason why a Service should have a short life span is because it isn't thread-safe. Sometimes, making a Service thread-safe isn't that difficult; in such cases, this may be a better solution (but sometimes, making something thread-safe is extremely difficult, in which case Decoraptor may be a much simpler solution).

Still, there may be cases where creating a new Client every time you want to use it isn't desirable, or even possible. The most common concern is often performance, although in my experience, that's rarely a real problem. More substantial is a concern that sometimes, due to technical constraints, it isn't possible to make the Client shorter-lived. In these cases, a Decoraptor is a good solution.

A Decoraptor is a better alternative to injecting an Abstract Factory directly into the Client, because doing that is a Leaky Abstraction. The Client is programming against an interface, and, according to the Dependency Inversion Principle,

"clients [...] own the abstract interfaces"

- Agile Principles, Patterns, and Practices, chapter 11

Therefore, it would be a Leaky Abstraction if the Client were to define the interface based on the requirements of a particular implementation of that interface. A Decoraptor is a better alternative, because it enables the Client to define the interface it needs, and the Service to implement the interface as it can, and the Decoraptor's single responsibility is to make those two ends meet.

Implementation details

Let the Client define the interface it needs, without taking lifetime issues into consideration. Implement the interface in a separate class, again disregarding specific lifetime issues.

Define an Abstract Factory that creates new instances of the interface required by the Client. Create a new class that implements the interface, but takes the factory as a dependency. This is the Decoraptor. In the Decoraptor, implement each method by invoking the factory to create the short-lived instance of the interface, and delegate the method implementation to that instance.

Create a class that implements the Abstract Factory by creating new instances of the short-lived Service.

Inject the factory into the Decoraptor, and inject the Decoraptor into the long-lived Client. The Decoraptor has the same lifetime as the Client, but each Service instance only exists for the duration of a single method call.

Motivating example

When using Passive Attributes with ASP.NET Web API, the Filter containing all the behaviour must be added to the overall collection of Filters:

var filter = new MeteringFilter(observer);
config.Filters.Add(filter);

Due to the way ASP.NET Web API works, this configuration happens during Application_Start, so there's only going to be a single instance of MeteringFilter around. In other work, the MeteringFilter is a long-lived Client; it effectively has Singleton lifetime scope.

MeteringFilter depends on IObserver<MeterRecord> (See the article about Passive Attributes for the full code of MeteringFilter). The observer injected into the MeteringFilter object will have the same lifetime as the MeteringFilter object. The Filter will be used from concurrent threads, and while MeteringFilter itself is thread-safe (it has immutable state), the injected observer may not be.

To be more concrete, this question was recently posted on Stack Overflow:

"The filter I'm implementing has a dependency on a repository, which itself has a dependency on a custom DbContext. [...] I'm not sure how to implement this, while taking advantage of the DI container's lifetime management capabilities (so that a new DbContext will be used per request)."
It sounds like the DbContext in question is probably an Entity Framework DbContext, which makes sense: last time I looked, DbContext wasn't thread-safe. Imagine that the IObserver<MeterRecord> implementation looks like this:

public class SqlMeter : IObserver<MeterRecord>
{
    private readonly MeteringContext ctx;
 
    public SqlMeter(MeteringContext ctx)
    {
        this.ctx = ctx;
    }
 
    public void OnNext(MeterRecord value)
    {
        this.ctx.Records.Add(value);
        this.ctx.SaveChanges();
    }
 
    public void OnCompleted() { }
 
    public void OnError(Exception error) { }
}

The SqlMeter class here takes the place of the Service in the more abstract language used above. In the Stack Overflow question, it sounds like there's a Repository between the Client and the DbContext, but I think this example captures the essence of the problem.

This is exactly the issue outlined above. Due to technical constraints, the MeteringFilter must be a Singleton, while (again due to technical constraints) each Observer must be either Transient or scoped to each request.

Refactoring notes

In order to resolve a problem like the example above, you can introduce a Decoraptor, which sits between the Client (MeteringFilter) and the Service (SqlMeter). The Decoraptor relies on an Abstract Factory to create new instances of SqlMeter:

public interface IFactory<T>
{
    T Create();
}

In this case, the Abstract Factory is a generic interface, but it could also be a non-generic interface that only creates instances of IObserver<MeterRecord>. Some people prefer delegates instead, so would use e.g. Func<IObserver<MeterRecord>>. You could also define an Abstract Base Class instead of an interface, if that's more to your liking.

This particular example also disregards decommissioning concerns. Effectively, the Abstract Factory will create instances of SqlMeter, but since these instances simply go out of scope, the contained MeteringContext isn't being disposed of in a deterministic manner. In a future article, I will remedy this situation.

Example: generic Decoraptor for Observers

In order to resolve the lifetime mismatch between MeteringFilter and SqlMeter you can introduce a generic Decoraptor:

public class Observeraptor<T> : IObserver<T>
{
    private readonly IFactory<IObserver<T>> factory;
 
    public Observeraptor(IFactory<IObserver<T>> factory)
    {
        this.factory = factory;
    }
 
    public void OnCompleted()
    {
        this.factory.Create().OnCompleted();
    }
 
    public void OnError(Exception error)
    {
        this.factory.Create().OnError(error);
    }
 
    public void OnNext(T value)
    {
        this.factory.Create().OnNext(value);
    }
}

Notice that Observeraptor<T> can adapt any IObserver<T> because it depends on the generic IFactory<IObserver<T>> interface. In each method defined by IObserver<T>, it first uses the factory to create an instance of a short-lived IObserver<T>, and then delegates the implementation to that object.

Since MeteringFilter depends on IObserver<MeterRecord>, you also need an implementation of IFactory<IObserver<MeterRecord>> in order to compose the MeteringFilter object graph:

public class SqlMeterFactory : IFactory<IObserver<MeterRecord>>
{
    public IObserver<MeterRecord> Create()
    {
        return new SqlMeter(new MeteringContext());
    }
}

This implementation simply creates a new instance of SqlMeter and MeteringContext every time the Create method is invoked; hardly surprising, I should hope.

You now have all building blocks to correctly compose MeteringFilter in Application_Start:

var factory = new SqlMeterFactory();
var observer = new Observeraptor<MeterRecord>(factory);
var filter = new MeteringFilter(observer);
config.Filters.Add(filter);

Because SqlMeterFactory creates new instances of SqlMeter every time MeteringFilter invokes IObserver<MeterRecord>.OnNext, the lifetime requirements of the DbContext are satisfied. Moreover, MeteringFilter only depends on IObserver<MeterRecord>, so is perfectly shielded from that implementation detail, so no Leaky Abstraction was introduced.

Known examples

Questions related to the issue of lifetime mismatches are regularly being asked on Stack Overflow, and I've had a bit of success recommending Decoraptor as a solution, although at that time, I had yet to come up with the name:

In addition, I've also previously touched upon the subject in relation to using a Virtual Proxy for lazy dependencies.


CQS versus server generated IDs

Monday, 11 August 2014 19:40:00 UTC

How can you both follow Command Query Separation and assign unique IDs to Entities when you save them? This post examines some options.

In my Encapsulation and SOLID Pluralsight course, I explain why Command Query Separation (CQS) is an important component of Encapsulation. When first exposed to CQS, many people start to look for edge cases, so a common question is something like this:

What would you do if you had a service that saves to a database and returns the ID (which is set by the database)? The Save operation is a Command, but if it returns void, how do I get the ID?
This is actually an exercise I've given participants when I've given the course as a workshop, so before you read my proposed solutions, consider taking a moment to see if you can come up with a solution yourself.

Typical, CQS-violating design

The problem is that in many code bases, you occasionally need to save a brand-new Entity to persistent storage. Since the object is an Entity (as defined in Domain-Driven Design), it must have a unique ID. If your system is a concurrent system, you can't just pick a number and expect it to be unique... or can you?

Instead, many programmers rely on their underlying database to add a unique ID to the Entity when it save it, and then return the ID to client. Accordingly, they introduce designs like this:

public interface IRepository<T>
{
    int Create(T item);
 
    // other members
}

The problem with this design, obviously, is that it violates CQS. The Create method ought to be a Command, but it returns a value. From the perspective of Encapsulation, this is problematic, because if we look only at the method signature, we could be tricked into believing that since it returns a value, the Create method is a Query, and therefore has no side-effects - which it clearly has. That makes it difficult to reason about the code.

It's also a leaky abstraction, because most developers or architects implicitly rely on a database implementation to provide the ID. What if an implementation doesn't have the ability to come up with a unique ID? This sounds like a Liskov Substitution Principle violation just waiting to happen!

The easiest solution

When I wrote "you can't just pick a number and expect it to be unique", I almost gave away the easiest solution to this problem:

public interface IRepository<T>
{
    void Create(Guid id, T item);
 
    // other members
}

Just change the ID from an integer to a GUID (or UUID, if you prefer). This does enable clients to come up with a unique ID for the Entity - after all, that's the whole point of GUIDs.

Notice how, in this alternative design, the Create method returns void, making it a proper Command. From an encapsulation perspective, this is important because it enables you to immediately identify that this is a method with side-effects - just by looking at the method signature.

But wait: what if you do need the ID to be an integer? That's not a problem; there's also a solution for that.

A solution with integer IDs

You may not like the easy solution above, but it is a good solution that fits in a wide variety in situations. Still, there can be valid reasons that using a GUID isn't an acceptable solution:

  • Some databases (e.g. SQL Server) don't particularly like if you use GUIDs as table keys. You can, but integer keys often perform better, and they are also shorter.
  • Sometimes, the Entity ID isn't only for internal use. Once, I worked on a customer support ticket system, and my suggestion of using a GUID as an ID wasn't met with enthusiasm. When a customer calls in about an existing support case, it isn't reasonable to ask him or her to read an entire GUID; it's simply too error-prone.
In some cases, you just need some human-readable integers as IDs. What can you do?

Here's my modified solution:

public interface IRepository<T>
{
    void Create(Guid id, T item);
 
    int GetHumanReadableId(Guid id);
 
    // other members
}

Notice that the Create method is the same as before. The client must still supply a GUID for the Entity, and that GUID may or may not be the 'real' ID of the Entity. However, in addition to the GUID, the system also associates a (human-readable) integer ID with the Entity. Which one is the 'real' ID isn't particularly important; the important part is that there's another method you can use to get the integer ID associated with the GUID: the GetHumanReadableId method.

The Create method is a Command, which is only proper, because creating (or saving) the Entity mutates the state of the system. The GetHumanReadableId method, on the other hand, is a Query: you can call it as many times as you like: it doesn't change the state of the system.

If you want to store your Entities in a database, you can still do that. When you save the Entity, you save it with the GUID, but you can also let the database engine assign an integer ID; it might even be a monotonically increasing ID (1, 2, 3, 4, etc.). At the same time, you could have a secondary index on the GUID.

When a client invokes the GetHumanReadableId method, the database can use the secondary index on the GUID to quickly find the Entity and return its integer ID.

Performance

"But," you're likely to say, "I don't like that! It's bad for performance!"

Perhaps. My first impulse is to quote Donald Knuth at you, but in the end, I may have to yield that my proposed design may result in two out-of-process calls instead of one. Still, I never promised that my solution wouldn't involve a trade-off. Most software design decisions involve trade-offs, and so does this one. You gain better encapsulation for potentially worse performance.

Still, the performance drawback may not involve problems as such. First, while having to make two round trips to the database may perform worse than a single one, it may still be fast enough. Second, even if you need to know the (human-readable) integer ID, you may not need to know it when you create it. Sometimes you can get away with saving the Entity in one action, and then you may only need to know what the ID is seconds or minutes later. In this case, separating reads and writes may actually turn out to be an advantage.

Must all code be designed like this?

Even if you disregard your concerns about performance, you may find this design overly complex, and more difficult to use. Do I really recommend that all code should be designed like that?

No, I don't recommend that all code should be designed according to the CQS principle. As Martin Fowler points out, in Object-Oriented Software Construction, Bertrand Meyer explains how to design an API for a Stack with CQS. It involves a Pop Command, and a Top Query: in order to use it, you would first have to invoke the Top Query to get the item on the top of the stack, and then subsequently invoke the Pop Command to modify the state of the stack.

One of the problems with such a design is that it isn't thread-safe. It's also more unwieldy to use than e.g. the standard Stack<T>.Pop method.

My point with all of this isn't to dictate to you that you should always follow CQS. The point is that you can always come up with a design that does.

In my career, I've met many programmers who write a poorly designed class or interface, and then put a little apology on top of it:

public interface IRepository<T>
{
    // CQS is impossible here, because we need the ID
    int Create(T item);
 
    // other members
}

Many times, we don't know of a better way to do things, but it doesn't mean that such a way doesn't exist; we just haven't learned about it yet. In this article, I've shown you how to solve a seemingly impossible design challenge with CQS.

Knowing that even a design problem such as this can be solved with CQS should put you in a better position to make an informed decision. You may still decide to violate CQS and define a Create method that returns an integer, but if you've done that, it's because you've weighed the alternatives; not because you thought it impossible.

Summary

It's possible to apply CQS to most problems. As always, there are trade-offs involved, but knowing how to apply CQS, even if, at first glance, it seems impossible, is an important design skill.

Personally, I tend to adhere closely to CQS when I do Object-Oriented Design, but I also, occasionally, decide to break that principle. When I do, I do so realising that I'm making a trade-off, and I don't do it lightly.


Comments

How would you model a REST End Point, for example:

POST /api/users

This should return the Location of the created url. Would you suggest, the location be the status of the create user command.

GET /api/users/createtoken/abc123

UI Developers do not like this pattern. It creates extra work for them.
2014-08-12 4:24 UTC
Steve Byrne
Would raising an event of some kind from the Repository be a valid solution in this case?

(Bear in mind I'm pretty ignorant when it comes to CQRS but I figured that it would get around not being able to return anything directly from a command)

But then again I suppose you'd still need the GetHumanReadableId method for later on...
2014-08-14 00:18 UTC

Tony, thank you for writing. What you suggest is one way to model it. Essentially, if a client starts with:

POST /api/users

One option for a response is, as you suggest, something like:

HTTP/1.1 202 Accepted
Location: /api/status/1234

A client can poll on that URL to get the status of the task. When the resource is ready, it will include a link to the user resource that was finally created. This is essentially a workflow just like the RESTbucks example in REST in Practice. You'll also see an example of this approach in my Pluralsight course on Functional Architecture, and yes, this makes it more complicated to implement the client. One example is the publicly available sample client code for that course.

While it puts an extra burden on the client developer, it's a very robust implementation because it's based on asynchronous workflows, which not only makes it horizontally scalable, but also makes it much easier to implement robust occasionally connected clients.

However, this isn't the only RESTful option. Here's an alternative. The request is the same as before:

POST /api/users

But now, the response is instead:

HTTP/1.1 201 Created
Location: /api/users/8765

{ "name" : "Mark Seemann" }

Notice that this immediately creates the resource, as well as returns a representation of it in the response (consider the JSON in the example a placeholder). This should be easier on the poor UI Developers :)

In any case, this discussion is entirely orthogonal to CQS, because at the boundaries, applications aren't Object-Oriented - thus, CQS doesn't apply to REST API design. Both POST, PUT, and DELETE verbs imply the intention to modify a resource's state, yet the HTTP specification describes how such requests should return proper responses. These verbs violate CQS because they both involve state mutation and returning a response.

2014-08-15 17:57 UTC

Steve, thank you for writing. Yes, raising an event from a Command is definitely valid, since a Command is all about side-effects, and an event is also a side effect. The corollary of this is that you can't raise events from Queries if you want to adhere to CQS. In my Pluralsight course on encapsulation, I briefly touch on this subject in the Encapsulation module (module 2), in the clip called Queries, at 1:52.

(BTW, in this article, and the course as well, I'm discussing CQS (Command Query Separation), not CQRS (Command Query Responsibility Segregation). Although these ideas are related, it's important to realise that they aren't synonymous. CQS dates back to the 1980s, while CQRS is a more recent invention, from the 2000s - the earliest dated publication that's easy to find is from 2010, but it discusses CQRS as though people already know what it is, so the idea must be a bit older than that.)

2014-08-16 7:37 UTC
Kenny Pflug

Mark, thank you for this post and sorry for my late response, but I've got some questions about it.

First of all, I would not have designed the repository the way you showed at the beginning of your post - instead I would have opted for something like this:

public interface IRepository<T>
{
    T CreateEntity();

    void AttachEntity(T entity);
}

The CreateEntity method behaves like a factory in that it creates a new instance of the specified entity without an ID assigned (the entity is also not attached to the repository). The AttachEntity method takes an existing entity and assigns a valid ID to it. After a call to this method, the client can get the new ID by using the corresponding get method on the entity. This way the repository acts like a service facade as it provides creational, querying and persistance services to the client for a single type of entities.

What do you think about this? From my point of view, the AttachEntity method is clearly a command, but what about the CreateEntity method? I would say it is a query because it does not change the state of the underlying repository when it is called - but is this really the case? If not, could we say that factories always violate CQS? What about a CreateEntity implementation that attaches the newly created entity to the repository?

2014-08-25 7:55 UTC

Kenny, thank you for writing. FWIW, I wouldn't have designed a Repository like I show in the beginning of the article either, but I've seen lots of Repositories designed like that.

Your proposed solution looks like it respects CQS as well, so that's another alternative to my suggested solutions. Based on your suggested method signatures, at least, CreateEntity looks like a Query. If a factory only creates a new object and returns it, it doesn't change the observable state of the system (no class fields mutated, no files were written to disk, no bytes were sent over the network, etc.).

If, on the other hand, the CreateEntity method also attaches the newly created entity to the repository, then it would violate CQS, because that would change the observable state of the system.

Your design isn't too far from the one I suggest in this very article - I just don't include the CreateEntity method in my interface definition, because I consider the creation of an Entity (in memory) to be a separate concern from being able to persist it.

Then, if you omit the CreateEntity method, you'll see that your AttachEntity method plays the same role is my Create method; the only difference is that I keep the ID separate from the Entity, while you have the ID as a writable property on the Entity. Instead of invoking a separate Query (GetHumanReadableId) to figure out what the server-generated ID is, you invoke a Query attached to the Entity (its ID property).

The reason I didn't chose this design is because it messes with the invariants of the Entity. If you consider an Entity as defined in Domain-Driven Design, an Entity is an object that has a long-lasting identity. This is most often implemented by giving the Entity an Id property; I do that too, but make sure that the id is mandatory by requiring clients to supply it through the constructor. As I explain in my Pluralsight course about encapsulation, a very important part about encapsulation is to make sure that it's impossible (or at least difficult) to put an object into an invalid state. If you allow an Entity to be put into a state where it has no ID, I would consider that an invalid state.

2014-08-26 15:10 UTC
Kenny Pflug

Thank you for your comprehensive answer, Mark. I also like the idea of injecting the ID of an entity directly into its constructor - but I haven't actually applied this 'pattern' in my projects yet. Although it's a bit off-topic, I have to ask: the ID injection would result in something like this:

public abstract class Entity<T>
{
    private readonly T _id;
    
    protected Entity(T id)
    {
        _id = id;
    }

    public T ID { get { return _id; } }

    public override int GetHashCode()
    {
        return _id.GetHashCode();
    }

    public override bool Equals(object other)
    {
        var otherEntity = other as Entity;
        if (otherEntity == null)
            return false;

        return _id == otherEntity._id;
    }
}

I know it is not perfect, so please think about it as a template.

Anyway, I prefer this design but often this results in unnecessarily complicated code, e.g. when the user is able to create an entity object temporarily in a dialog and he or she can choose whether to add it or discard of it at the end of the dialog. In this case I would use a DTO / View Model etc. that only exists because I cannot assign a valid ID to an entity yet.

Do you have a pattern that circumvents this problem? E.g. clone an existing entity with an temporary / invalid ID and assign a valid one to it (which I would find very reasonable)?

2014-08-26 22:30 UTC

Kenny, thank you for writing. Yes, you could model it like that Entity<T> class, although there are various different concerns mixed here. Whether or not you want to override Equals and GetHashCode is independent of the discussion about protecting invariants. This may already be clear to you, but I just wanted to get that out of the way.

The best answer to your question, then, is to make it simpler, if at all possible. The first question you should ask yourself is: can the ID really be any type? Can you have an Entity<object>? Entity<Random>? Entity<UriBuilder>? That seems strange to me, and probably not what you had in mind.

The simplest solution to the problem you pose is simply to settle on GUIDs as IDs. That would make an Entity look like this:

public class Foo
{
    private readonly Guid id;

    public Foo(Guid id)
    {
        if (id == Guid.Empty)
            throw new ArgumentException("Empty GUIDs not allowed.""id");
        this.id = id;
    }

    public Guid Id
    {
        get { return this.id; }
    }

    // other members go here...
}

Obviously, this makes the simplification that the ID is a GUID, which makes it easy to create a 'temporary' instance if you need to do that. You'll also note that it protects its invariants by guarding against the empty GUID.

(Although not particularly relevant for this discussion, you'll also notice that I made this a concrete class. In general, I favour composition over inheritance, and I see no reason to introduce an abstract base class only in order to take and expose an ID - such code is unlikely to change much. There's no behaviour here because I also don't override Equals or GetHashCode, because I've come to realise that doing so with the implementation you suggest is counter-productive... but that, again, is another discussion.)

Then what if you can't use a GUID. Well, you could still resort to the solution that I outline above, and still use a GUID, and then have another Query that can give you an integer that corresponds to the GUID.

However, ultimately, this isn't how I tend to architect my systems these days. As Greg Young has pointed out in a white paper draft (sadly) no longer available on the internet, you can't really expect to make Distributed Domain-Driven Design work with the 'classic' n-layer architecture. However, once you separate reads and writes (AKA CQRS) you no longer have the sort of problem you ask about, because your write model should be modelling commands and events, instead of Entities. FWIW, in my Functional Architecture with F# Pluralsight course, I also touch on some of those concepts.

2014-08-29 17:37 UTC

Why DRY?

Thursday, 07 August 2014 20:11:00 UTC

Code duplication is often harmful - except when it isn't. Learn how to think about the trade-offs involved.

Good programmers know that code duplication should be avoided. There's a cost associated with duplicated code, so we have catchphrases like Don't Repeat Yourself (DRY) in order to remind ourselves that code duplication is evil.

It seems to me that some programmers see themselves as Terminators: out to eliminate all code duplication with extreme prejudice; sometimes, perhaps, without even considering the trade-offs involved. Every time you remove duplicated code, you add a level of indirection, and as you've probably heard before, all problems in computer science can be solved by another level of indirection, except for the problem of too many levels of indirection.

Removing code duplication is important, but it tends to add a cognitive overhead. Therefore, it's important to understand why code duplication is harmful - or rather: when it's harmful.

Rates of change

Imagine that you copy a piece of code and paste it into ten other code bases, and then never touch that piece of code again. Is that harmful?

Probably not.

Here's one of my favourite examples. When protecting the invariants of objects, I always add Guard Clauses against nulls:

if (subject == null)
    throw new ArgumentNullException("subject");

In fact, I have a Visual Studio code snippet for this; I've been using this code snippet for years, which means that I have code like this Guard Clause duplicated all over my code bases. Most likely, there are thousands of examples of such Guard Clauses on my hard drive, with the only variation being the name of the parameter. I don't mind, because, in my experience, these two lines of code never change.

Yet many programmers see that as a violation of DRY, so instead, they introduce something like this:

Guard.AgainstNull(subject, "subject");

The end result of this is that you've slightly increased the cognitive overhead, but what have you gained? As far as I can tell: nothing. The code still has the same number of Guard Clauses. Instead of idiomatic if statements, they are now method calls, but it's hardly DRY when you have to repeat those calls to Guard.AgainstNull all over the place. You'd still be repeating yourself.

The point here is that DRY is a catchphrase, but shouldn't be an excuse for avoiding thinking explicitly about any given problem.

If the duplicated code is likely to change a lot, the cost of duplication is likely to be high, because you'll have to spend time making the same change in lots of different places - and if you forget one, you'll be introducing bugs in the system. If the duplicated code is unlikely to change, perhaps the cost is low. As with all other risk management, you conceptually multiply the risk of the adverse event happening with the cost of the damage associated with that event. If the product is low, don't bother addressing the risk.

The Rule of Three

It's not a new observation that unconditional elimination of duplicated code can be harmful. The Rule of Three exists for this reason:

  1. Write a piece of code.
  2. Write the same piece of code again. Resist the urge to generalise.
  3. Write the same piece of code again. Now you are allowed to consider generalising it.
There are a couple of reasons why this is a valuable rule of thumb. One is that the fewer examples you have of the duplication, the less evidence you have that the duplication is real, instead of simply a coincidence.

Another reason is that even if the duplication is 'real' (and not coincidental), you may not have enough examples to enable you to make the correct refactoring. Often, even duplicated code comes with small variations:

  • The logic is the same, but a string value differs.
  • The logic is almost the same, but one duplicate performs an extra small step.
  • The logic looks similar, but operates on two different types of object.
  • etc.
How should you refactor? Should you introduce a helper method? Should the helper method take a method argument? Should you extract a class? Should you add an interface? Should you apply the Template Method pattern? Or the Strategy pattern?

If you refactor too prematurely, you may perform the wrong refactoring. Often, people introduce helper methods, and then when they realize that the axis of variability was not what they expected, they add more and more parameters to the helper method, and more and more complexity to its implementation. This leads to ripple effects. Ripple effects lead to thrashing. Thrashing leads to poor maintainability. Poor maintainability leads to low productivity.

This is, in my experience, the most important reason to follow the Rule of Three: wait, until you have more facts available to you. You don't have to take the rule literally either. You can wait until you have four, five, or six examples of the duplication, if the rate of change is low.

The parallel to statistics

If you've ever taken a course in statistics, you would have learned that the less data you have, the less confidence you can have in any sort of analysis. Conversely, the more samples you have, the more confidence can you have if you are trying to find or verify some sort of correlation.

The same holds true for code duplication, I believe. The more samples you have of duplicated code, the better you understand what is truly duplicated, and what varies. The better you understand the axes of variability, the better a refactoring you can perform in order to get rid of the duplication.

Summary

Code duplication is costly - but only if the code changes. The cost of code duplication, thus, is C*p, where C is the cost incurred, when you need to change the code, and p is the probability that you'll need to change the code. In my experience, for example, the Null Guard Clause in this article has a cost of duplication of 0, because the probability that I'll need to change it is 0.

There's a cost associated with removing duplication - particularly if you make the wrong refactoring. Thus, depending on the values of C and p, you may be better off allowing a bit of duplication, instead of trying to eradicate it as soon as you see it.

You may not be able to quantify C and p (I'm not), but you should be able to estimate whether these values are small or large. This should help you decide if you need to eliminate the duplication right away, or if you'd be better off waiting to see what happens.


Comments

Markus Bullmann

I would add one more aspect to this article: Experience.

The more experience you gain the more likely you will perceive parts of your code which could possibly benefit from a generalization; but not yet because it would be premature.

I try to keep these places in mind and try to simplify the potential later refactoring. When your code meets the rule of three, you’re able to adopt your preparations to efficiently refactor your code. This leads to better code even if the code isn't repeated for the third time.

Needless to say that experience is always crucial but I like to show people, who are new to programing, which decisions are based on experience and which are based on easy to follow guidelines like the rule of three.

2014-08-20 22:03 UTC

Markus, thank you for writing. Yes, I agree that experience is always helpful.

2014-08-21 8:48 UTC

Encapsulation and SOLID Pluralsight course

Wednesday, 06 August 2014 12:19:00 UTC

My latest Pluralsight course is now available. This time it's about fundamental programming techniques.

Most programmers I meet know about encapsulation and Object-Oriented Design (OOD) - at least, until I start drilling them on specifics. Apparently, the way most people have been taught OOD is at odds with my view on the subject. For a long time, I've wanted to teach OOD the way I think it should be taught. My perspective is chiefly based on Bertrand Meyer's Object-Oriented Software Construction and Robert C. Martin's SOLID principles (presented, among other sources, in Agile Principles, Patterns, and Practices in C#). My focus is on Command Query Separation, protection of invariants, and favouring Composition over Inheritance.

Course screenshot

In my new Pluralsight course, I'm happy to be able to teach OOD the way I think it should be taught. The course is aimed at professional developers with a couple of years of experience, but it's based on decades of experience (mine and others'), so I hope that even seasoned programmers can learn a thing or two from watching it.

What you will learn

How do you make your code easily usable for other people? By following the actionable, measurable rules laid forth by Command/Query Separation as well as Postel’s law.

Learn how to write maintainable software that can easily respond to changing requirements, using object-oriented design principles. First, you'll learn about the fundamental object-oriented design principle of Encapsulation, and then you'll learn about the five SOLID principles - also known as the principles of object-oriented design. There are plenty of code examples along the way; they are in C#, but written in such a way that they should be easily understandable to readers of Java, or other curly-brace-based languages.

Is OOD still relevant?

Functional Programming is becoming increasingly popular, and regular readers of this blog will have noticed that I, myself, write a fair amount of F# code. In light of this, is OOD still relevant?

There's a lot of Object-Oriented code out there, or at least, code written in a nominally Object-Oriented language, and it's not going to go away any time soon. Getting better at maintaining and evolving Object-Oriented code is, in my opinion, still important.

Some of the principles I cover in this course are also relevant in Functional Programming.


Comments

I'm really enjoying your presentation! It is easy to follow and understand your explanations. I have a bit of a tangent question about your TryRead method in the Encapsulation module. You mention that your final implementation has a race condition in it. Can you explain the race condition?
2014-08-06 06:55 UTC

Dan, thank you for writing. You are talking about this implementation, I assume:

public bool TryRead(int id, out string message)
{
    message = null;
    var path = this.GetFileName(id);
    if (!File.Exists(path))
        return false;
    message = File.ReadAllText(path);
    return true;
}

This implementation isn't thread-safe, because File.Exists(path) may return true, and then, before File.ReadAllText(path) is invoked, another thread or process could delete the file, causing an exception to be thrown when File.ReadAllText(path) is invoked.

It's possible to make the the TryRead method thread-safe (I know of at least two alternatives), but I usually leave that as an exercise for the reader :)

2014-08-10 10:51 UTC
Bart van Nierop

The only safe implementation that I am aware of would be something along the lines of:

public bool TryRead(int id, out string message)
{
    
    var path = this.GetFileName(id);
    try
    {
        message = File.ReadAllText(path);
        return true;
    }
    catch
    {
        message = null;
        return false;
    }
}

Am I missing something?

2014-08-13 16:43 UTC

Bart, no, you aren't missing anything; that looks about right :) The other alternative is just a variation on your solution.

2014-08-13 17:56 UTC

Drain

Wednesday, 23 July 2014 14:25:00 UTC

A Drain is an filter abstraction over an Iterator, with the purpose of making out-of-process queries more efficient.

For some years now, the Reused Abstractions Principle has pushed me towards software architectures based upon fewer and fewer abstractions, where each abstraction, on the other hand, is reused over and over again.

In my Pluralsight course about A Functional Architecture with F#, I describe how to build an entire mainstream application based on only two abstractions:

More specifically, I use Reactive Extensions for Commands, and the Seq module for Queries (but if you're on C#, you can use LINQ instead).

The problem

It turns out that these two abstractions are enough to build an entire, mainstream system, but in practice, there's a performance problem. If you have only Iterators, you'll have to read all your data into memory, and then filter in memory. If you have lots of data on storage, this is obviously going to be prohibitively slow, so you'll need a way to select only a subset out of your persistent store.

This problem should be familiar to every student of software development. Pure abstractions tend not to survive contact with reality (there are examples in both Object-Oriented, Functional, and Logical or Relational programming), but we should still strive towards keeping abstractions as pure as possible.

One proposed solution to the Query Problem is to use something like IQueryable, but unfortunately, IQueryable is an extremely poor abstraction (and so are F# query expressions, too).

In my experience, the most important feature of IQueryable is the ability to filter before loading data; normally, you can perform projections in memory, but when you read from persistent storage, you need to select your desired subset before loading it into memory.

Inspired by talks by Bart De Smet, in my Pluralsight course, I define custom filter interfaces like:

type IReservations =
    inherit seq<Envelope<Reservation>>
    abstract Between : DateTime -> DateTime -> seq<Envelope<Reservation>>

or

type INotifications =
    inherit seq<Envelope<Notification>>
    abstract About : Guid -> seq<Envelope<Notification>>

Both of these interfaces derive from IEnumerable<T> and add a single extra method that defines a custom filter. Storage-aware implementations can implement this method by returning a new sequence of only those items on storage that match the filter. Such a method may

  • make a SQL query against a database
  • make a query against a document database
  • read only some files from the file system
  • etc.
For more details, examples, and full source code, see my Pluralsight course.

Generalized interface

The custom interfaces shown above follow a common template: the interface derives from IEnumerable<T> and adds a single 'filter' method, which filters the sequence based on the input argument(s). In the above examples, IReservations define a Between method with two arguments, while INotifications defines an About method with a single argument.

In order to generalize, it's necessary to find a common name for the interface and its single method, as well as deal with variations in method arguments.

All the really obvious names like Filter, Query, etc. are already 'taken', so I hit a thesaurus and settled on the name Drain. A Drain can potentially drain a sequence of elements to a smaller sequence.

When it comes to variations in input arguments, the solution is to use generics. The Between method that takes two arguments could also be modelled as a method taking a single tuple argument. Eventually, I've arrived at this general definition:

module Drain =
    type IDrainable<'a, 'b> =
        inherit seq<'a>
        abstract On : 'b -> seq<'a>
 
    let on x (d : IDrainable<'a, 'b>) = d.On x

As you can see, I decided to name the extra method On, as well as add an on function, which enables clients to use a Drain like this:

match tasks |> Drain.on id |> Seq.toList with

In the above example, tasks is defined as IDrainable<TaskRendition, string>, and id is a string, so the result of draining on the ID is a sequence of TaskRendition records.

Here's another example:

match statuses |> Drain.on(id, conversationId) |> Seq.toList with

Here, statuses is defined as IDrainable<string * Guid, string * string> - not the most well-designed instance, I admit: I should really introduce some well-named records instead of those tuples, but the point is that you can also drain on multiple values by using a tuple (or a record type) as the value on which to drain.

In-memory implementation

One of the great features of Drains is that an in-memory implementation is easy, so you can add this function to the Drain module:

let ofSeq areEqual s =
    { new IDrainable<'a, 'b> with
        member this.On x = s |> Seq.filter (fun y -> areEqual y x)
        member this.GetEnumerator() = s.GetEnumerator()
        member this.GetEnumerator() = 
            (this :> 'a seq).GetEnumerator() :> System.Collections.IEnumerator }

This enables you to take any IEnumerable<T> (seq<'a>) and turn it into an in-memory Drain by supplying an equality function. Here's an example:

let private toDrainableTasks (tasks : TaskRendition seq) =
    tasks
    |> Drain.ofSeq (fun x y -> x.Id = y)

This little helper function takes a sequence of TaskRendition records and defines the equality function as a comparison on each TaskRendition record's Id property. The result is a drain that you can use to select one or more TaskRendition records based on their IDs.

I use this a lot for unit testing.

Empty Drains

It's also easy to define an empty drain, by adding this value to the Drain module:

let empty<'a, 'b> = Seq.empty<'a> |> ofSeq (fun x (y : 'b) -> false)

Here's a usage example:

let mappedUsers = Drain.empty<UserMappedstring>

Again, this can be handy when unit testing.

Other implementations

While the in-memory implementation is useful when unit testing, the entire purpose of the Drain abstraction is to enable various implementations to implement the On method to perform a custom selection against a well-known data source. As an example, you could imagine an implementation that translates the input arguments of the On method into a SQL query.

If you want to see examples of this, my Pluralsight course demonstrates how to implement IReservations and INotifications with various data stores - I trust you can extrapolate from those examples.

Summary

You can base an entire mainstream application on the two abstractions of Iterator and Observer. However, the problem when it comes to Iterators is that conceptually, you'll need to iterate over all potentially relevant elements in your system - and that may be millions of records!

However impure it is to introduce a third interface into the mix, I still prefer to introduce a single generic interface, instead of multiple custom interfaces, because once you and your co-workers understand the Drain abstraction, the cognitive load is still quite low. A Drain is an Iterator with a twist, so in the end, you'll have a system built on 2½ abstractions.


Comments

A diff-output from the A Functional Architecture with F# master branch, after applying the Drain abstraction, is available here. Notice how Drain cuts the maintenance of multiple homogenous abstractions, and makes the code cleaner and easier to reason about.
2014-07-28 09:00 UTC

Hire me

Tuesday, 22 July 2014 08:30:00 UTC

July-October 2014 I have some time available, if you'd like to hire me.

Since I became self-employed in 2011, I've been as busy as always, but it looks like I have some time available in the next months. If you'd like to hire me for small or big tasks, please contact me. See here for details.


Page 1 of 28

"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!