Is it a violation of Command Query Separation to update an ID property when saving an Entity to a database?

In my Encapsulation and SOLID course on Pluralsight, I explain how the elusive object-oriented quality encapsulation can be approximated by the actionable principles of Command Query Separation (CQS) and Postel's law.

One of the questions that invariably arise when people first learn about CQS is how to deal with (database) server-generated IDs. While I've already covered that question, I recently came upon a variation of the question:

"The Create method is a command, then if the object passed to this method have some changes in their property, does it violate any rule? I mean that we can always get the new id from the object itself, so we don't need to return another integer. Is it good or bad practice?"
In this article, I'll attempt to answer this question.

Returning an ID by mutating input

I interpret the question like this: an Entity (as described in DDD) can have a mutable Id property. A Create method could save the Entity in a database, and then update the input value's Id property with the newly created record's ID.

As an example, consider this User class:

public class User
{
    public int Id { getset; }
 
    public string FirstName { getset; }
 
    public string LastName { getset; }
}

In order to create a new User in your database, you could define an API like this:

public interface IUserRepository
{
    void Create(User user);
}

An implementation of IUserRepository based on a relational database could perform an INSERT into the appropriate database, get the ID of the created record, and update the User object's Id property.

This test snippet demonstrates that behaviour:

var u = new User { FirstName = "Jane", LastName = "Doe" };
Assert.Equal(0, u.Id);
 
repository.Create(u);
Assert.NotEqual(0, u.Id);

When you create the u object, by not assigning a value to the Id property, it will have the default value of 0. Only after the Create method returns does Id hold a proper value.

Evaluation

Does this design adhere to CQS? Yes, it does. The Create method doesn't return a value, but rather changes the state of the system. Not only does it create a record in your database, but it also changes the state of the u object. Nowhere does CQS state that an operation can't change more than a single part of the system.

Is it good design, then? I don't think that it is.

First, the design violates another part of encapsulation: protection of invariants. The invariants of any Entity is that is has an ID. It is the single defining feature of Entities that they have enduring and stable identities. When you change the identity of an Entity, it's no longer the same Entity. Yet this design allows exactly that:

u.Id = 42;
// ...
u.Id = 1337;

Second, such a design puts considerable trust in the implicit protocol between any client and the implementation of IUserRepository. Not only must the Create method save the User object in a data store, but it must also update the Id.

What happens, though, if you replay the method call:

repository.Create(new User { FirstName = "Ada", LastName = "Poe" });
repository.Create(new User { FirstName = "Ada", LastName = "Poe" });

This will result in duplicated entries, because the repository can't detect whether this is a replay, or simply two new users with the same name. You may find this example contrived, but in these days of cloud-based storage, it's common to apply retry strategies to clients.

If you use one of the alternatives I previously outlined, you will not have this problem.

Invariants

Even if you don't care about replays or duplicates, you should still consider the invariants of Entities. As a minimum, you shouldn't be able to change the ID of an Entity. For a class like User, it'll also make client developers' job easier if it can provide some useful guarantees. This is part of Postel's law applied: be conservative in what you send. In this case, at least guarantee that no values will be null. Thus, a better (but by no means perfect) User class could be:

public class User
{
    public User(int id)
    {
        if (id <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(id),
                "The ID must be a (unique) positive value.");;
 
        this.Id = id;
        this.firstName = "";
        this.lastName = "";
    }
 
    public int Id { get; }
 
    private string firstName;        
    public string FirstName
    {
        get { return this.firstName; }
        set
        {
            if (value == null)
                throw new ArgumentNullException(nameof(value));
            this.firstName = value;
        }
    }
 
    private string lastName;
    public string LastName
    {
        get { return this.lastName; }
        set
        {
            if (value == null)
                throw new ArgumentNullException(nameof(value));
            this.lastName = value;
        }
    }
}

The constructor initialises all class fields, and ensure that Id is a positive integer. It can't ensure that the ID is unique, though - at least, not without querying the database, which would introduce other problems.

How do you create a new User object and save it in the database, then?

One option is this:

public interface IUserRepository
{
    void Create(string firstName, string lastName);
 
    User Read(int id);
 
    void Update(User user);
 
    void Delete(int id);
}

The Create method doesn't use the User class at all, because at this time, the Entity doesn't yet exist. It only exists once it has an ID, and in this scenario, this only happens once the database has stored it and assigned it an ID.

Other methods can still use the User class as either input or output, because once the Entity has an ID, its invariants are satisfied.

There are other problems with this design, though, so I still prefer the alternatives I originally sketched.

Conclusion

Proper object-oriented design should, as a bare minimum, consider encapsulation. This includes protecting the invariants of objects. For Entities, it means that an Entity must always have a stable and enduring identity. Well-designed Entities guarantee that identity values are always present and immutable. This requirement tend to be at odds with most popular Object-Relational Mappers (ORM), which require that ID fields are externally assignable. In order to use ORMs, most programmers compromise on encapsulation, ending up with systems that are relational in nature, but far from object-oriented.


Comments

>" This includes protecting the invariants of objects. For Entities, it means that an Entity must always have a stable and enduring identity."

Great points.

However, how would you get from void Create(string firstName, string lastName) to User Read(int id) especially if an entity doesn't have a natural key?

2016-05-06 18:26 UTC

Vladimir, thank you for writing. Exactly how you get the ID depends on the exact requirements of your application. As an example, if you're developing a web application, you may simply be creating the user, and then that's the end of the HTTP request. Meanwhile, asynchronously, a background process is performing the actual database insertion and subsequently sends an email to the user with a link to click in order to verify the email. In that link is the generated ID, so when your server receives the next request, you already know the ID.

That may not always be a practical strategy, but I started to describe this technique, because my experience with CQRS and REST tells me that you have lots of options for communicating state and identity. Still, if you need an ID straight away, you can always pass a GUID as a correlation ID, so that you can later find the Entity you created. That's the technique I originally described.

2016-05-06 19:04 UTC

Thank you for the reply.

BTW, ORMs don't require you to make Ids externally assignable. You can have ORMs assign Ids for you internally, and that's a preferred design in most cases. Also, you can make the Id setters non-public and thus protect entities' encapsulation. There still are issues, of course. Namely, entities don't have established identities until they are saved to the DB or added to the Context/Session. But this issue isn't that big comparing to what you described in the post.

2016-05-06 19:25 UTC
Ethan Nelson

I was skimming through the comments and had this nagging suspicion that the entire topic of, "how do I get my ID with CQS", was a false dichotomy. The premise that the "ID" stored in the database is the "identity" of the object or entity I think can be challenged.

I prefer to look at the "ID" as purely database implementation details. Assuming the repository is SQL, the key type preferred by db admins will be the smallest-reasonable monotonically increasing ID... namely INT. But switch to Amazon DynamoDB and that conclusion flies out the window. With their totally managed and distributed indexes and hashing algorithms... GUID's would be fantastic.

What I WISH developers would do is take the time to think about their Domain and form their own understanding of what truly defines the identity of the entity. Whatever decision they make, it translates to a "natural key" for persistence.

Natural keys are great because a developer has promised, "these combinations of properties uniquely identify entities of type T". The result is unique indexes that can have dramatic performance benefits... outside the application benefit of being able to construct your repository like this:

public interface IRepository<T>
{
    void Create(T item);
}

As for the "ID" field? Who cares?... that's up to the database. The developer already knows exactly how to find this T. I wouldn't even include it on the domain model object.

This would give db admins options. We can look at the natural key size, evaluate how it is queried and order it appropriately. We can decide if the nature of typical joins merit clustering (as is common with Header -> Detail style records). If needed, we can optimize lookup of large keys through stored procedures and computed persisted checksum columns. Nullability and default values become part of design discussions on the front end. Finally, if we feel like it... we can assign a surrogate key, or use a GUID. The developer doesn't care, our dba's do.

Thinking back to my last 6 or so development projects, I've never had to "get the id". The next operation is always a matter of query or command on unique criteria I was responsible for creating in the first place. This approach makes my dba's happy.

2016-05-10 15:06 UTC

Ethan, thank you for your comment; I agree with 90 percent of what you wrote, which is also the reason I prefer the options I previously outlined.

Entities don't always have 'natural IDs', though, and even when they have, experience has taught me not to use them. Is a person's email address a natural ID? It is, but people change email addresses.

Here in Denmark, for many years many public software systems have used the Danish personal identification numbers as natural keys, as they were assumed to be nationally unique and stable. These numbers encode various information about a person, such as birth-date and sex. A few inhabitants, however, get sex change operations, and then desire a new number that corresponds to their new sex. Increasingly, these requests are being granted. New ID, same person.

I've seen such changes happen so many times during my career that I've become distrustful of 'natural IDs'. If I need to identify an Entity, I usually attach a GUID to it.

2016-05-10 20:04 UTC
Ethan Nelson

With regard to entity identification, I'm not sure stability matters. Who cares if the email address changes... as long as it is unique? Uniqueness is sufficient for the application to identify the entity. I agree from a strictly database standpoint; I'm not thrilled about assigning a primary key to a natural key due to the stability issue across relationships. But that does not invalidate the key for use as an identity in my mind. Both indexes can exist, both be unique, and each with their own purpose.

Also, it is difficult for me to envision a scenario where there is no natural key. If there are two rows where the only difference is an auto-incrementing integer, or different guids... what is the meaning of one row versus the next? That meaning cannot, by definition, be contained in the surrogate keys... they add nothing to the business meaning of the entity.

I feel I've derailed the discussion into database modeling theory, and we are talking about CQS with "create the entity" as a stimulating use case. Suffice it to say, IMHO, if (a, b) is unique today according to business rule, than it should be my unique identity from a developer perspective. By leaving row identifiers up to DBA's and dropping them from my domain model entirely, I minimize impact to those technical implementations when the business rules change such that (a, b, c) becomes the new unique, and, preserve CQS in the process.

2016-05-10 23:41 UTC
Endy Tjahjono

SQL Server has a feature called 'sequence' (from 2012 and above if I'm not mistaken). We can use sequence to generate unique int ID instead of using identity column. The sequence can be called without inserting row to the table. We can create a service that call this sequence to generate the next running number. The UI code can call this running number generator service to get a new ID, and then send the new entity complete with ID to the create service.

2017-02-06 07:19:00 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or Google Plus, or somewhere else with a permalink. Ping me with the link, and I may add it as a comment.

Published

Friday, 06 May 2016 17:36:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!