How can you both follow Command Query Separation and assign unique IDs to Entities when you save them? This post examines some options.

In my Encapsulation and SOLID Pluralsight course, I explain why Command Query Separation (CQS) is an important component of Encapsulation. When first exposed to CQS, many people start to look for edge cases, so a common question is something like this:

What would you do if you had a service that saves to a database and returns the ID (which is set by the database)? The Save operation is a Command, but if it returns void, how do I get the ID?
This is actually an exercise I've given participants when I've given the course as a workshop, so before you read my proposed solutions, consider taking a moment to see if you can come up with a solution yourself.

Typical, CQS-violating design #

The problem is that in many code bases, you occasionally need to save a brand-new Entity to persistent storage. Since the object is an Entity (as defined in Domain-Driven Design), it must have a unique ID. If your system is a concurrent system, you can't just pick a number and expect it to be unique... or can you?

Instead, many programmers rely on their underlying database to add a unique ID to the Entity when it saves it, and then return the ID to the client. Accordingly, they introduce designs like this:

public interface IRepository<T>
{
    int Create(T item);
 
    // other members
}

The problem with this design, obviously, is that it violates CQS. The Create method ought to be a Command, but it returns a value. From the perspective of Encapsulation, this is problematic, because if we look only at the method signature, we could be tricked into believing that since it returns a value, the Create method is a Query, and therefore has no side-effects - which it clearly has. That makes it difficult to reason about the code.

It's also a leaky abstraction, because most developers or architects implicitly rely on a database implementation to provide the ID. What if an implementation doesn't have the ability to come up with a unique ID? This sounds like a Liskov Substitution Principle violation just waiting to happen!

The easiest solution #

When I wrote "you can't just pick a number and expect it to be unique", I almost gave away the easiest solution to this problem:

public interface IRepository<T>
{
    void Create(Guid id, T item);
 
    // other members
}

Just change the ID from an integer to a GUID (or UUID, if you prefer). This does enable clients to come up with a unique ID for the Entity - after all, that's the whole point of GUIDs.

Notice how, in this alternative design, the Create method returns void, making it a proper Command. From an encapsulation perspective, this is important because it enables you to immediately identify that this is a method with side-effects - just by looking at the method signature.

But wait: what if you do need the ID to be an integer? That's not a problem; there's also a solution for that.

A solution with integer IDs #

You may not like the easy solution above, but it is a good solution that fits in a wide variety in situations. Still, there can be valid reasons that using a GUID isn't an acceptable solution:

  • Some databases (e.g. SQL Server) don't particularly like if you use GUIDs as table keys. You can, but integer keys often perform better, and they are also shorter.
  • Sometimes, the Entity ID isn't only for internal use. Once, I worked on a customer support ticket system, and my suggestion of using a GUID as an ID wasn't met with enthusiasm. When a customer calls on the phone about an existing support case, it isn't reasonable to ask him or her to read an entire GUID aloud; it's simply too error-prone.
In some cases, you just need some human-readable integers as IDs. What can you do?

Here's my modified solution:

public interface IRepository<T>
{
    void Create(Guid id, T item);
 
    int GetHumanReadableId(Guid id);
 
    // other members
}

Notice that the Create method is the same as before. The client must still supply a GUID for the Entity, and that GUID may or may not be the 'real' ID of the Entity. However, in addition to the GUID, the system also associates a (human-readable) integer ID with the Entity. Which one is the 'real' ID isn't particularly important; the important part is that there's another method you can use to get the integer ID associated with the GUID: the GetHumanReadableId method.

The Create method is a Command, which is only proper, because creating (or saving) the Entity mutates the state of the system. The GetHumanReadableId method, on the other hand, is a Query: you can call it as many times as you like: it doesn't change the state of the system.

If you want to store your Entities in a database, you can still do that. When you save the Entity, you save it with the GUID, but you can also let the database engine assign an integer ID; it might even be a monotonically increasing ID (1, 2, 3, 4, etc.). At the same time, you could have a secondary index on the GUID.

When a client invokes the GetHumanReadableId method, the database can use the secondary index on the GUID to quickly find the Entity and return its integer ID.

Performance #

"But," you're likely to say, "I don't like that! It's bad for performance!"

Perhaps. My first impulse is to quote Donald Knuth at you, but in the end, I may have to yield that my proposed design may result in two out-of-process calls instead of one. Still, I never promised that my solution wouldn't involve a trade-off. Most software design decisions involve trade-offs, and so does this one. You gain better encapsulation for potentially worse performance.

Still, the performance drawback may not involve problems as such. First, while having to make two round trips to the database may perform worse than a single one, it may still be fast enough. Second, even if you need to know the (human-readable) integer ID, you may not need to know it when you create it. Sometimes you can get away with saving the Entity in one action, and then you may only need to know what the ID is seconds or minutes later. In this case, separating reads and writes may actually turn out to be an advantage.

Must all code be designed like this? #

Even if you disregard your concerns about performance, you may find this design overly complex, and more difficult to use. Do I really recommend that all code should be designed like that?

No, I don't recommend that all code should be designed according to the CQS principle. As Martin Fowler points out, in Object-Oriented Software Construction, Bertrand Meyer explains how to design an API for a Stack with CQS. It involves a Pop Command, and a Top Query: in order to use it, you would first have to invoke the Top Query to get the item on the top of the stack, and then subsequently invoke the Pop Command to modify the state of the stack.

One of the problems with such a design is that it isn't thread-safe. It's also more unwieldy to use than e.g. the standard Stack<T>.Pop method.

My point with all of this isn't to dictate to you that you should always follow CQS. The point is that you can always come up with a design that does.

In my career, I've met many programmers who write a poorly designed class or interface, and then put a little apology on top of it:

public interface IRepository<T>
{
    // CQS is impossible here, because we need the ID
    int Create(T item);
 
    // other members
}

Many times, we don't know of a better way to do things, but it doesn't mean that such a way doesn't exist; we just haven't learned about it yet. In this article, I've shown you how to solve a seemingly impossible design challenge with CQS.

Knowing that even a design problem such as this can be solved with CQS should put you in a better position to make an informed decision. You may still decide to violate CQS and define a Create method that returns an integer, but if you've done that, it's because you've weighed the alternatives; not because you thought it impossible.

Summary #

It's possible to apply CQS to most problems. As always, there are trade-offs involved, but knowing how to apply CQS, even if, at first glance, it seems impossible, is an important design skill.

Personally, I tend to adhere closely to CQS when I do Object-Oriented Design, but I also, occasionally, decide to break that principle. When I do, I do so realising that I'm making a trade-off, and I don't do it lightly.


Comments

How would you model a REST End Point, for example:

POST /api/users

This should return the Location of the created url. Would you suggest, the location be the status of the create user command.

GET /api/users/createtoken/abc123

UI Developers do not like this pattern. It creates extra work for them.
2014-08-12 4:24 UTC
Steve Byrne #
Would raising an event of some kind from the Repository be a valid solution in this case?

(Bear in mind I'm pretty ignorant when it comes to CQRS but I figured that it would get around not being able to return anything directly from a command)

But then again I suppose you'd still need the GetHumanReadableId method for later on...
2014-08-14 00:18 UTC

Tony, thank you for writing. What you suggest is one way to model it. Essentially, if a client starts with:

POST /api/users

One option for a response is, as you suggest, something like:

HTTP/1.1 202 Accepted
Location: /api/status/1234

A client can poll on that URL to get the status of the task. When the resource is ready, it will include a link to the user resource that was finally created. This is essentially a workflow just like the RESTbucks example in REST in Practice. You'll also see an example of this approach in my Pluralsight course on Functional Architecture, and yes, this makes it more complicated to implement the client. One example is the publicly available sample client code for that course.

While it puts an extra burden on the client developer, it's a very robust implementation because it's based on asynchronous workflows, which not only makes it horizontally scalable, but also makes it much easier to implement robust occasionally connected clients.

However, this isn't the only RESTful option. Here's an alternative. The request is the same as before:

POST /api/users

But now, the response is instead:

HTTP/1.1 201 Created
Location: /api/users/8765

{ "name" : "Mark Seemann" }

Notice that this immediately creates the resource, as well as returns a representation of it in the response (consider the JSON in the example a placeholder). This should be easier on the poor UI Developers :)

In any case, this discussion is entirely orthogonal to CQS, because at the boundaries, applications aren't Object-Oriented - thus, CQS doesn't apply to REST API design. Both POST, PUT, and DELETE verbs imply the intention to modify a resource's state, yet the HTTP specification describes how such requests should return proper responses. These verbs violate CQS because they both involve state mutation and returning a response.

2014-08-15 17:57 UTC

Steve, thank you for writing. Yes, raising an event from a Command is definitely valid, since a Command is all about side-effects, and an event is also a side effect. The corollary of this is that you can't raise events from Queries if you want to adhere to CQS. In my Pluralsight course on encapsulation, I briefly touch on this subject in the Encapsulation module (module 2), in the clip called Queries, at 1:52.

(BTW, in this article, and the course as well, I'm discussing CQS (Command Query Separation), not CQRS (Command Query Responsibility Segregation). Although these ideas are related, it's important to realise that they aren't synonymous. CQS dates back to the 1980s, while CQRS is a more recent invention, from the 2000s - the earliest dated publication that's easy to find is from 2010, but it discusses CQRS as though people already know what it is, so the idea must be a bit older than that.)

2014-08-16 7:37 UTC
Kenny Pflug #

Mark, thank you for this post and sorry for my late response, but I've got some questions about it.

First of all, I would not have designed the repository the way you showed at the beginning of your post - instead I would have opted for something like this:

public interface IRepository<T>
{
    T CreateEntity();

    void AttachEntity(T entity);
}

The CreateEntity method behaves like a factory in that it creates a new instance of the specified entity without an ID assigned (the entity is also not attached to the repository). The AttachEntity method takes an existing entity and assigns a valid ID to it. After a call to this method, the client can get the new ID by using the corresponding get method on the entity. This way the repository acts like a service facade as it provides creational, querying and persistance services to the client for a single type of entities.

What do you think about this? From my point of view, the AttachEntity method is clearly a command, but what about the CreateEntity method? I would say it is a query because it does not change the state of the underlying repository when it is called - but is this really the case? If not, could we say that factories always violate CQS? What about a CreateEntity implementation that attaches the newly created entity to the repository?

2014-08-25 7:55 UTC

Kenny, thank you for writing. FWIW, I wouldn't have designed a Repository like I show in the beginning of the article either, but I've seen lots of Repositories designed like that.

Your proposed solution looks like it respects CQS as well, so that's another alternative to my suggested solutions. Based on your suggested method signatures, at least, CreateEntity looks like a Query. If a factory only creates a new object and returns it, it doesn't change the observable state of the system (no class fields mutated, no files were written to disk, no bytes were sent over the network, etc.).

If, on the other hand, the CreateEntity method also attaches the newly created entity to the repository, then it would violate CQS, because that would change the observable state of the system.

Your design isn't too far from the one I suggest in this very article - I just don't include the CreateEntity method in my interface definition, because I consider the creation of an Entity (in memory) to be a separate concern from being able to persist it.

Then, if you omit the CreateEntity method, you'll see that your AttachEntity method plays the same role is my Create method; the only difference is that I keep the ID separate from the Entity, while you have the ID as a writable property on the Entity. Instead of invoking a separate Query (GetHumanReadableId) to figure out what the server-generated ID is, you invoke a Query attached to the Entity (its ID property).

The reason I didn't chose this design is because it messes with the invariants of the Entity. If you consider an Entity as defined in Domain-Driven Design, an Entity is an object that has a long-lasting identity. This is most often implemented by giving the Entity an Id property; I do that too, but make sure that the id is mandatory by requiring clients to supply it through the constructor. As I explain in my Pluralsight course about encapsulation, a very important part about encapsulation is to make sure that it's impossible (or at least difficult) to put an object into an invalid state. If you allow an Entity to be put into a state where it has no ID, I would consider that an invalid state.

2014-08-26 15:10 UTC
Kenny Pflug #

Thank you for your comprehensive answer, Mark. I also like the idea of injecting the ID of an entity directly into its constructor - but I haven't actually applied this 'pattern' in my projects yet. Although it's a bit off-topic, I have to ask: the ID injection would result in something like this:

public abstract class Entity<T>
{
    private readonly T _id;
    
    protected Entity(T id)
    {
        _id = id;
    }

    public T ID { get { return _id; } }

    public override int GetHashCode()
    {
        return _id.GetHashCode();
    }

    public override bool Equals(object other)
    {
        var otherEntity = other as Entity;
        if (otherEntity == null)
            return false;

        return _id == otherEntity._id;
    }
}

I know it is not perfect, so please think about it as a template.

Anyway, I prefer this design but often this results in unnecessarily complicated code, e.g. when the user is able to create an entity object temporarily in a dialog and he or she can choose whether to add it or discard of it at the end of the dialog. In this case I would use a DTO / View Model etc. that only exists because I cannot assign a valid ID to an entity yet.

Do you have a pattern that circumvents this problem? E.g. clone an existing entity with an temporary / invalid ID and assign a valid one to it (which I would find very reasonable)?

2014-08-26 22:30 UTC

Kenny, thank you for writing. Yes, you could model it like that Entity<T> class, although there are various different concerns mixed here. Whether or not you want to override Equals and GetHashCode is independent of the discussion about protecting invariants. This may already be clear to you, but I just wanted to get that out of the way.

The best answer to your question, then, is to make it simpler, if at all possible. The first question you should ask yourself is: can the ID really be any type? Can you have an Entity<object>? Entity<Random>? Entity<UriBuilder>? That seems strange to me, and probably not what you had in mind.

The simplest solution to the problem you pose is simply to settle on GUIDs as IDs. That would make an Entity look like this:

public class Foo
{
    private readonly Guid id;

    public Foo(Guid id)
    {
        if (id == Guid.Empty)
            throw new ArgumentException("Empty GUIDs not allowed.""id");
        this.id = id;
    }

    public Guid Id
    {
        get { return this.id; }
    }

    // other members go here...
}

Obviously, this makes the simplification that the ID is a GUID, which makes it easy to create a 'temporary' instance if you need to do that. You'll also note that it protects its invariants by guarding against the empty GUID.

(Although not particularly relevant for this discussion, you'll also notice that I made this a concrete class. In general, I favour composition over inheritance, and I see no reason to introduce an abstract base class only in order to take and expose an ID - such code is unlikely to change much. There's no behaviour here because I also don't override Equals or GetHashCode, because I've come to realise that doing so with the implementation you suggest is counter-productive... but that, again, is another discussion.)

Then what if you can't use a GUID. Well, you could still resort to the solution that I outline above, and still use a GUID, and then have another Query that can give you an integer that corresponds to the GUID.

However, ultimately, this isn't how I tend to architect my systems these days. As Greg Young has pointed out in a white paper draft (sadly) no longer available on the internet, you can't really expect to make Distributed Domain-Driven Design work with the 'classic' n-layer architecture. However, once you separate reads and writes (AKA CQRS) you no longer have the sort of problem you ask about, because your write model should be modelling commands and events, instead of Entities. FWIW, in my Functional Architecture with F# Pluralsight course, I also touch on some of those concepts.

2014-08-29 17:37 UTC
Vladimir #

It's a great topic. There's another approach used for this problem in some ORMs, particularly in NHibernate: Hi/Lo algorithm.

It is possible to preload a batch of IDs and distribute them among new objects. I've written a blog post about it: link

2014-11-15 14:52 UTC
Irfan Charania #

In the proposed solution that uses both integer IDs and Guids, would you expect child tables to use GUIDs as foreign keys also?

If the aggregate is passed in its entirety into a stored procedure, then scope_identity() can come into play. When a merge/replication scenario is not of concern, then you would be able to make use of integer IDs everywhere internally.

2017-10-13 3:51 UTC

Irfan, thank you for writing. To the degree you choose to use GUIDs as described in this article, it's in order to obey the CQS principle. Thus, it's entirely a concern of the client(s). How you relate tables inside of a relational database is an implementation detail.

If by 'aggregate' you mean an Aggregate Root as described in DDD, then the root is the root of a tree. Translated to typical relational database design, it'd be a row in a table. By definition, any children of such a root don't have individual identity, so I don't see how it'd make sense to associate a GUID with the children; you'd never query them individually, because if you do, then the Aggregate Root is no longer an Aggregate Root.

How you associate the children with the root is an implementation detail. Again, in typical, normalised database design, you'd typically find children via foreign key relationship. Since this is an implementation detail, you can use any key type you'd like for that purpose; most likely, that'll be some sort of integer.

2017-10-14 8:06 UTC
Ben Bedinghaus #

Hi Mark. I noticed in a comment to one of Kenny's questions you said you would not design a repository the way it was originally designed in this example. How would you design a repository layer? I've always found this type of repository design to not quite feel right but see it all over the place.
Thanks,
Ben

2018-06-05 16:12 UTC

Ben, thank you for writing. To be clear, the Repository pattern is one of the most misunderstood and misrepresented design patterns that I can think of. It was originally described in Martin Fowler's book Patterns of Enterprise Application Architecture, but if you look up the pattern description there, it has nothing to do with how people normally use it. It's a prime example of design patterns being ambiguous and open not only to interpretation, but to semantic shift.

Regardless of the interpretation, I don't use the Repository design pattern. Following the Dependency Inversion Principle, "clients […] own the abstract interfaces" (APPP, chapter 11), so if I need to, say, read some data from a database, I define a data reader interface in the client library. Following other SOLID principles as well, I'd typically end up with most interfaces having only a single member. These days, though, if I do Dependency Injection at all, I mostly use partial application of higher-order functions.

2018-06-06 6:32 UTC

Mark, when you get to choose the database design, do most of your tables include both a (supplied) Guid ID and a (database-generated) int ID?

2020-12-03 01:22 UTC

Tyson, thank you for writing. There are a couple of options.

You could just use a GUID as the table key. DBAs typically dislike that option because, if you use that as the table's clustered index, it'll force the database to physically rearrange the table every time you insert a row (since GUIDs aren't, in general, increasing). Notice, however, that this argument is entirely concerned with performance. Using a GUID as a table key has no impact on database design in general. It is, after all, just an integer.

Thus, if the table in question is likely to remain small (i.e. have few rows) and with only the occasional insert, I see no reason to have the standard INT IDENTITY column.

For busier tables, you can have both columns. For example, the code base I'm currently working with defines its Reservations table like this:

CREATE TABLE [dbo].[Reservations] (
    [Id]           INT              NOT NULL IDENTITY,
    [At]           DATETIME2 (7)    NOT NULL,
    [Name]         NVARCHAR (50)    NOT NULL,
    [Email]        NVARCHAR (50)    NOT NULL,
    [Quantity]     INT              NOT NULL,
    [PublicId]     UNIQUEIDENTIFIER NOT NULL UNIQUE,
    [RestaurantId] INT              NOT NULL
    PRIMARY KEY CLUSTERED ([Id] ASC)
)

The Id column exclusively exists to keep the database happy. The code that interacts with the database never uses it; it uses PublicId.

To be honest, I used to have a decent grasp on database design, SQL Server, and related performance considerations, but it's about a decade back. I'm sure you could produce some cases where such a design would be detrimental, but on the other hand, I believe that the majority of code out there isn't even near the extremes where you need to compromise on the design.

2020-12-03 7:27 UTC
Zach Robinson #

Hello, Mark! Thank you for the encyclopedia of knowledge and experience you have shared and continue to share with us through this blog and your books!

I've recently read Domain Modeling Made Functional by Scott Wlaschin and am currently reading your new book, Code That Fits in Your Head. I've taken quite a liking to the idea that not just entities, but also events should be modeled within a domain. In addition to common events such as queries/commands, any expected error or success cases can also be wrapped in a type. This manner of extensive typing seems to help improve the "What you see is all there is" factor when learning/understanding a code base. Given that we have some expected events that occur when interacting with a database through commands and queries, it might make sense to encapsulate that domain model in the return types of these methods.

The query portion is straightforward. As you've discussed in your book, we can use Maybe<T> to indicate a successful or unsuccessful query. We might also be so crazy as to use an Either<T, TException> to propagate some exception message/data to be logged/returned at the bread-end of our impure/pure workflow.

Would it violate CQS to do the same for commands? We're doing a side effect on the database, no question. We're also probably returning some data that is relevant to the incoming command parameter within the resulting message from either the IOSuccess or IOFailure. A pattern I've been using for a web API project is something similar to the following:

public sealed record IOSuccess(string Value);
public abstract record Failure(string Value);
public sealed record ValidationFailure(string Value) : Failure(Value);
public sealed record IOFailure(string Value) : Failure(Value);

public sealed record GetBalloonQuery(BalloonId Id);
public sealed record UpdateBalloonCommand(Balloon Balloon);
public sealed record PopBalloonCommand(BalloonId Id, IPointyObject Needle);

public interface IBalloonRepository
{
  Task<Either<IReadOnlyCollection<Balloon>, IOFailure>> GetBalloonsAsync();
  Task<Either<Balloon, IOFailure>> GetBalloonAsync(GetBalloonQuery query);
  Task<Either<IOSuccess, IOFailure>> UpdateBalloonAsync(UpdateBalloonCommand command);
  Task<Either<IOSuccess, IOFailure>> DeleteBalloonAsync(PopBalloonCommand command);
}
        

I think this fits quite well with the idea that we can XXX out the name of the method and the types will tell the full story, no question at all. I also like that this seems to follow from Domain Driven Design, where we are being super explicit about the domain event because of the specificity of the object parameters we're sending down the wire.

Does this idea hold any merit when looking to design APIs that follow the CQS philosophy? Where might you direct a greenhorn developer to learn more about this topic?

2022-03-31 04:30 UTC

Zach, thank you for writing.

As I write above, it's up to you if you want to follow CQS. If you do, however, you have to play by the rules, and the rules are that Commands don't return data.

We may have to make some concessions to the language in which we work. In C#, for example, we have to accept that Task (without a generic type argument) is 'asynchronous void', even though it looks like a return value.

If you want to be explicit about errors, then yes, there's always a risk that a state mutation may fail. Being explicit about that, too, might compel you to model a Command as having a return type like Task<Either<MyError, Unit>> (customarily, for Either values, errors go to the left, contrary to the isomorphic Result, where they go to the right). I don't, however, design side-effecting actions like that in C#.

Even so, in none of these variations does a Command return data in the happy-path scenario.

The API you suggest violate the rule that Commands mustn't return data. You are, of course, free to design APIs as you like, but CQS it isn't. As a counterexample, UpdateBalloonAsync returns an IOSuccess in the happy-path scenario, and that value again contains a string.

If you removed the string Value from IOSuccess it'd be isomorphic to unit. Transitively, then, the return type of UpdateBalloonAsync would be isomorphic to Task<Either<IOFailure, Unit>> (again, errors go to the left, because 'success is right'). If you allow Commands to throw exceptions, that again would be isomorphic to Task, with the implied understanding that the Command may throw an exception.

Where would I direct someone who want to learn more about the topic?

That's not that easy to answer. As Bertrand Meyer writes:

"Side-effect-producing functions, which have been elevated by some languages (C seems to be the most extreme example) to the status of an institution, conflict with the classical notion of function in mathematics."

As far as I can tell, that 'institution' continues with most mainstream languages like C#, Java, JavaScript, etcetera. This implies that you're unlikely to find strong treatment of the topic of CQS in 'traditional' object-oriented material.

As the above quote also implies by its use of the word function, you'll find that this distinction between pure functions and impure actions is more prevalent and explicit in functional programming.

2022-04-02 8:22 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 11 August 2014 19:40:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 11 August 2014 19:40:00 UTC