# Tuesday, May 31, 2011

My recent series of blog posts about Poka-yoke Design generated a few responses (I would have been disappointed had this not been the case). Quite a few of these reactions relate to various serialization or translation technologies usually employed at application boundaries: serialization, XML (de)hydration, UI validation, etc. Note that such translation happens not only at the perimeter of the application, but also at the persistence layer. ORMs are also a translation mechanism.

Common to most of the comments is that lots of serialization technologies require the presence of a default constructor. As an example, the XmlSerializer requires a default constructor and public writable properties. Most ORMs I’ve investigated seem to have the same kind of requirements. Windows Forms and WPF Controls (UI is also an application boundary) also must have default constructors. Doesn’t that break encapsulation? Yes and no.

Objects at the Boundary

It certainly would break encapsulation if you were to expose your (domain) objects directly at the boundary. Consider a simple XML document like this one:

<name>
  <firstName>Mark</firstName>
  <lastName>Seemann</lastName>
</name>

Whether or not we have a formal contract (XSD), we might stipulate that both the firstName and lastName elements are required. However, despite such a contract, I can easily create a document that breaks it:

<name>
  <firstName>Mark</firstName>
</name>

We can’t enforce the contract as there’s no compilation step involved. We can validate input (and output), but that’s a different matter. Exactly because there’s no enforcement it’s very easy to create malformed input. The same argument can be made for UI input forms and any sort of serialized byte sequence. This is why we must treat all input as suspect.
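
As a minimal sketch of what validating such input could look like, the following checks a document for the two required elements. It is written in Python rather than .NET purely for brevity, and the names `REQUIRED` and `missing_elements` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: the informal contract requires both elements.
REQUIRED = ("firstName", "lastName")


def missing_elements(xml_text):
    """Return the required child elements that are absent from a <name> document."""
    root = ET.fromstring(xml_text)
    return [tag for tag in REQUIRED if root.find(tag) is None]


valid = "<name><firstName>Mark</firstName><lastName>Seemann</lastName></name>"
broken = "<name><firstName>Mark</firstName></name>"

print(missing_elements(valid))   # []
print(missing_elements(broken))  # ['lastName']
```

Nothing stops the second document from arriving; the check only detects the problem at run time, after the fact.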

This isn’t a new observation at all. In Patterns of Enterprise Application Architecture, Martin Fowler described this as a Data Transfer Object (DTO). However, despite the name we should realize that DTOs are not really objects at all. This is nothing new either. Back in 2004 Don Box formulated the Four Tenets of Service Orientation. (Yes, I know that they are not in vogue any more and that people wanted to retire them, but some of them still make tons of sense.) Particularly the third tenet is germane to this particular discussion:

Services share schema and contract, not class.

Yes, and that means they are not objects. A DTO is a representation of such a piece of data mapped into an object-oriented language. That still doesn’t make them objects in the sense of encapsulation. It would be impossible. Since all input is suspect, we can hardly enforce any invariants at all.

Often, as Craig Stuntz points out in a comment to one of my previous posts, even if the input is invalid, we want to capture what we did receive in order to present a proper error message (this argument also applies on machine-to-machine boundaries). This means that any DTO must have very weak invariants (if any at all).
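
A rough sketch of a DTO with deliberately weak invariants might look like this. It is Python, not the .NET code discussed here, and `NameDto`, `read_name` and `describe_problems` are hypothetical names chosen for the example. Every field is optional so that whatever was received can be captured first and complained about afterwards.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameDto:
    # Every field is optional: the DTO records what arrived, valid or not.
    first_name: Optional[str] = None
    last_name: Optional[str] = None


def read_name(xml_text: str) -> NameDto:
    """Capture the input as-is; findtext returns None for a missing element."""
    root = ET.fromstring(xml_text)
    return NameDto(root.findtext("firstName"), root.findtext("lastName"))


def describe_problems(dto: NameDto) -> List[str]:
    """Build error messages from whatever was actually received."""
    problems = []
    if not dto.first_name:
        problems.append("firstName is required.")
    if not dto.last_name:
        problems.append("lastName is required.")
    return problems


dto = read_name("<name><firstName>Mark</firstName></name>")
print(dto)                     # NameDto(first_name='Mark', last_name=None)
print(describe_problems(dto))  # ['lastName is required.']
```

Note that `NameDto` happily holds an incomplete name. That is the point: it protects nothing, so it encapsulates nothing.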

DTOs don’t break encapsulation because they aren’t objects at all.

Don’t be fooled by your tooling. The .NET framework very, very much wants you to treat DTOs as objects. Code generation ensues.

However, the strong typing provided by such auto-generated classes gives a false sense of security. You may think that you get rapid feedback from the compiler, but there are many possible ways you can get run-time errors (most notably when you forget to update the auto-generated code based on new schema versions).

An even more problematic result of representing input and output as objects is that it tricks lots of developers into dealing with them as though they represent the real object model. The result is invariably an anemic domain model.

More and more, this line of reasoning is leading me towards the conclusion that the DTO mental model that we have gotten used to over the last ten years is a dead end.

What Should Happen at the Boundary

Given that we write object-oriented code and that data at the boundary is anything but object-oriented, how do we deal with it?

One option is to stick with what we already have. To bridge the gap we must then develop translation layers that can translate the DTOs to properly encapsulated domain objects. This is the route I take with the samples in my book. However, this is a solution that more and more I’m beginning to think may not be the best. It has issues with maintainability. (Incidentally, that’s the problem with writing a book: at the time you’re done, you know so much more than you did when you started out… Not that I’m denouncing the book – it’s just not perfect…)
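
To illustrate the shape of such a translation layer (this is not the code from the book, just a small Python sketch with hypothetical names `NameDto`, `Name`, `to_domain` and `to_dto`), the DTO accepts anything while the domain object refuses to exist in an invalid state:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameDto:
    # Boundary representation: no invariants, anything goes.
    first_name: Optional[str] = None
    last_name: Optional[str] = None


class Name:
    """Domain object: an instance can only exist if both parts are present."""

    def __init__(self, first_name: str, last_name: str):
        if not first_name:
            raise ValueError("first_name is required")
        if not last_name:
            raise ValueError("last_name is required")
        self._first_name = first_name
        self._last_name = last_name

    @property
    def first_name(self) -> str:
        return self._first_name

    @property
    def last_name(self) -> str:
        return self._last_name

    def full_name(self) -> str:
        return f"{self._first_name} {self._last_name}"


def to_domain(dto: NameDto) -> Name:
    """Translate inbound data; raises ValueError if the DTO is incomplete."""
    return Name(dto.first_name, dto.last_name)


def to_dto(name: Name) -> NameDto:
    """Translate back to the boundary representation for output."""
    return NameDto(name.first_name, name.last_name)


name = to_domain(NameDto("Mark", "Seemann"))
print(name.full_name())  # Mark Seemann
```

Even in this tiny example every property appears in three places (the DTO, the domain object and the mapping functions), which hints at where the maintenance burden comes from.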

Another option is to stop treating data as objects and start treating it as the structured data that it really is. It would be really nice if our programming language had a separate concept of structured data… Interestingly, while C# has nothing of the kind, F# has tons of ways to model data structures without behavior. Perhaps that’s a more honest approach to dealing with data… I will need to experiment more with this…
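
The idea of data without behavior can be approximated in other languages too. The sketch below uses a Python `NamedTuple` as a loose stand-in for something like an F# record (it is not F#, and `NameData` and `format_name` are hypothetical names): the value is immutable, compares structurally, and all behavior lives in plain functions outside it.

```python
from typing import NamedTuple


class NameData(NamedTuple):
    # Plain structured data: fields only, no methods of its own.
    first_name: str
    last_name: str


def format_name(data: NameData) -> str:
    """Behavior lives in a separate function that takes the data as input."""
    return f"{data.last_name}, {data.first_name}"


a = NameData("Mark", "Seemann")
b = NameData("Mark", "Seemann")

print(a == b)          # True: equality is by value, not by identity
print(format_name(a))  # Seemann, Mark
```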

A third option is to look towards dynamic types. In his article Cutting Edge: Expando Objects in C# 4.0, Dino Esposito outlines a dynamic approach towards consuming structured data that shortcuts auto-generated code and provides a lightweight API to structured data. This also looks like a promising approach… It doesn’t provide compile-time feedback, but that’s only a false sense of security anyway. We must resort to unit tests to get rapid feedback, but we’re all using TDD already, right?
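
The following is not the code from that article, and it does not use ExpandoObject. It is only a Python sketch of the same general idea, with a hypothetical `DynamicXml` wrapper: members are resolved against the data at run time, so no generated class is needed, and a missing element surfaces as a run-time error that a unit test can catch.

```python
import xml.etree.ElementTree as ET


class DynamicXml:
    """Exposes the child elements of an XML node as attributes, resolved at run time."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, i.e. for element names.
        child = self._element.find(name)
        if child is None:
            raise AttributeError(f"<{self._element.tag}> has no <{name}> element")
        # Leaf elements yield their text; nested elements yield another wrapper.
        return DynamicXml(child) if len(child) else child.text


doc = DynamicXml(ET.fromstring(
    "<name><firstName>Mark</firstName><lastName>Seemann</lastName></name>"))

print(doc.firstName)  # Mark
print(doc.lastName)   # Seemann
```

Accessing `lastName` on the malformed document from earlier would raise an `AttributeError` instead of silently producing a half-populated object.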

In summary, my entire series about encapsulation relates to object-oriented programming. Although there are lots of technologies available to represent boundary data as ‘objects’, they are false objects. Even if we use an object-oriented language at the boundary, the code has nothing to do with object orientation. Thus, the Poka-yoke Design rules don’t apply there.

Now go back and reread this post, but replace ‘DTO’ with ‘Entity’ (or whatever your ORM calls its representation of a relational table row) and you should begin to see the contours of why ORMs are problematic.

Tuesday, May 31, 2011 4:53:26 PM (Romance Daylight Time, UTC+02:00)
Absolutely. I spend a lot of time in this space, and while I enjoy using (and maintaining) tools to make this transition simple, whenever there is the first hint of a problem I always advise "add a dedicated DTO". A similar question (using interfaces in WCF messages) came up on stackoverflow last week. But ultimately, any boundary talks "data", not behaviour. It also emphasises why BinaryFormatter (implementation-aware) sucks so much for ***any*** kind of data exchange.
Tuesday, May 31, 2011 6:37:16 PM (Romance Daylight Time, UTC+02:00)
Arg, had a longer comment but it got eaten when I clicked Save Comment.

Anyway the gist of it: great post, I completely agree that data objects aren't domain objects, and thinking of them as structured data is very liberating. Using Entities/DTOs/data objects as a composable part of a domain object is a nice way to create that separation which still allows you to (potentially directly) use ORM entities while avoiding having an anemic domain model. So, for example, CustomerDomain would/could contain a CustomerEntity, instead of trying to add behavior to CustomerEntity (which might be a generated class).
Phil Sandler
Tuesday, May 31, 2011 8:22:42 PM (Romance Daylight Time, UTC+02:00)
Phil, when you write 'Entity' I immediately think about the definition provided in "Domain-Driven Design". Those Entities are definitely Domain Objects which should have lots of behavior, but I'm getting the feeling that you're thinking about something else?
Wednesday, June 01, 2011 12:09:49 AM (Romance Daylight Time, UTC+02:00)
Hey Mark,

Yep, I meant "entity" more in the sense of how it is defined in some of the popular ORMs, like LLBLGen and Entity Framework. Essentially, an instance that corresponds (more or less) directly to a row in a table or view.
Phil Sandler
Wednesday, June 01, 2011 7:44:13 AM (Romance Daylight Time, UTC+02:00)
Yes, we might compose real domain objects from DTOs, but I'm not sure that is the best solution. When we map non-object-oriented structures to objects, they tend to drag along a lot of baggage which hurts OO modeling.

For instance, generating DTOs from a database schema provides an entire static structure (relationships, etc.) that tends to constrain us when we subsequently attempt to define an object model. I rather prefer being able to work unconstrained and then subsequently figure out how to persist it.
Wednesday, June 01, 2011 10:29:15 AM (Romance Daylight Time, UTC+02:00)
I like this article. As far as I understand you, for every entity you have (a domain object) you need a separate DTO for different representations (XML, SQL, etc.). So you need a third object which translates every DTO into a specific entity. Am I correct? If so, isn't that process too complex?
Wednesday, June 01, 2011 10:41:14 AM (Romance Daylight Time, UTC+02:00)
Yes, that's what I've been doing for the last couple of years, but the downside is the maintenance overhead. That's why I, in the last part of the post, discuss alternatives. Currently I think that using dynamic types looks most promising.

Keep in mind that in most cases, a DTO is not an end-goal in itself. Rather, the end-goal is often the wire-representation of the DTO (XML, JSON, etc.). Wouldn't it be better if we could just skip the DTO altogether and use a bit of convention-based (and testable) dynamic code to make that translation directly?
Wednesday, June 01, 2011 11:13:46 AM (Romance Daylight Time, UTC+02:00)
I guess in WCF service scenarios there is really no other way than to create a DTO layer and compose our domain objects from those DTOs when passing them from/to the client. Since I'm doing a lot of Silverlight development at the moment, this came up quite early.
Florian Hötzinger
Wednesday, June 01, 2011 12:13:08 PM (Romance Daylight Time, UTC+02:00)
@Mark: Yes, that would be quite a better way. Here comes the great article, you've mentioned, about the ExpandoObject and the dynamic way of dealing with json, xml and other representations of data. However, in case of SQL at this moment you still need some DTOs I guess. Maybe if we have some wrapper over the traditional IDataReader or some other mechanism to access the data, it will be possible again.
Wednesday, June 01, 2011 4:34:07 PM (Romance Daylight Time, UTC+02:00)
Hey Mark,

It's a difficult problem to solve perfectly--every solution has downsides. I have gone down the road of mapping DTOs (or data entities) via a translation layer and that can be painful as well. As you said, the tradeoff of using data objects as a composable part of the Domain Object is that you can't model your domain with complete freedom (which DDD purists would likely see as non-negotiable). The upside is that you have the potential for less maintenance.

Phil Sandler
Wednesday, June 01, 2011 8:43:25 PM (Romance Daylight Time, UTC+02:00)
Florian, in WCF it is possible to drop down to the message-level and write directly to the message. Even so, I'm not sure I'd go that route, but it's good to know that the option exists.
Wednesday, June 01, 2011 8:46:13 PM (Romance Daylight Time, UTC+02:00)
Boyan, I was specifically thinking about dropping down to IDataReader. For one example, please refer to Tomas Petricek's article Dynamic in F#: Reading data from SQL database.
Thursday, June 02, 2011 4:37:55 AM (Romance Daylight Time, UTC+02:00)
Hi Mark,

There is lots to argue about in this article (nothing "wrong", just semantics). The main point I would like to pick up on is about your point "What should happen at domain boundaries".

I would contest that rather than "translation" layers, it's semantically better to think of it as an "interpretation object". When an app accepts input, there are very few assumptions that should be automatically attached to the unit of input. One of the fundamental ones and often the easiest to break is the assumption that the input is in any way valid, useful or complete.

The semantic concept (abstraction) that is wrapped around the unit of input (the object that encapsulates the input data) needs to have a rich interface that can convey abstractions such as "Cannot interpret this input", "Can partially interpret the input but it contains some rubbish", "Input contains SQL injection attack", "Input is poorly formed and has a missing .DTD file" etc.

My argument is that the semantic concept of "translation" while obviously closely related to "interpretation" is semantically a "transformation process". A->B kind of idea, while "interpretation", at least in my head, is not so deterministic and seeks to extract domain abstractions from the input that are then exposed to the application as high level abstractions rather than low level "translated data/safe data/sanitized data".

Thanks,

hotsleeper
Thursday, June 02, 2011 8:51:27 AM (Romance Daylight Time, UTC+02:00)
Makes sense. I mainly used the term 'translation layer' as I assumed that most people would then immediately know what I meant :)
Thursday, June 02, 2011 11:50:15 AM (Romance Daylight Time, UTC+02:00)
What's the processing error that is currently preventing comments? I can't get a usefully long one successfully submitted.
Arved Sandstrom
Thursday, June 02, 2011 12:07:26 PM (Romance Daylight Time, UTC+02:00)
Sorry about that. This blog is running on dasBlog which was last updated two years ago. Migrating the blog to a better engine is on my to-do list, but it'll be a couple of months at least before I get there.
Saturday, June 04, 2011 5:51:30 PM (Romance Daylight Time, UTC+02:00)
FWIW, the maintenance cost is effectively zero if the DTO types are codegened.

Dynamic types are also an option -- anyone who has used Rails/ActiveRecord is familiar with their upsides and downsides. In short, maintaining them is free, but using them costs more.

My preferred approach (for now) is to use EF entities as boundary objects for the DB. They cost almost nothing to maintain -- a couple of mouse clicks when we change DB schemata -- since we use "Database First" modeling for internal reasons.

You only get into encapsulation trouble with this if you try to use the EF entities as business objects -- something many people do. My $0.02 is that it's usually wrong to put any kind of behavior on an EF entity.
Saturday, June 04, 2011 11:13:04 PM (Romance Daylight Time, UTC+02:00)
I agree on the rule of never putting any behavior on an EF entity (although, still, I used to put a translation method on most of them: ToDomainObject(), but still...).

My personal experience with EF was that even with code generation, it was far from frictionless, but YMMV...
Sunday, June 05, 2011 1:18:45 PM (Romance Daylight Time, UTC+02:00)
Hi Mark,

Depends! If we're building a framework that is used by others then there is a big difference between the applications boundary to the "outside world" (DTO's) and persistence.

A quote taken from NHDay - Loosely Coupled Complexity - CQRS (see minute 2:10) What is wrong with a design with DTO's? "Nothing." End of presentation. But then continues into CQRS :)

Especially the translation-layer when requests come into our application, and we try to rebuild the OO model to help save new objects with an ORM can be cumbersome.

The point is that the database follows from the domain design, not other way round like EF or ORM class generators make you try to believe.
Sunday, June 05, 2011 1:45:42 PM (Romance Daylight Time, UTC+02:00)
The point being made here is that despite the name, DTOs aren't 'objects'. It's certainly possible to create a complex application entirely with DTOs - it's just not object-oriented.

There may be nothing wrong with that, but if you decide that you need object-orientation to solve a business problem, you must transform DTOs into proper objects, because it's impossible to make OOD with DTOs.
Sunday, June 05, 2011 8:59:33 PM (Romance Daylight Time, UTC+02:00)
The "everything is an ________" mantra gets us in trouble every time, whether it's strings in TCL, objects in Ruby, etc. Objects are a programming construct in practice, and the problem is we're trying to treat data as objects... which it really isn't. I echo your sentiments to "stop treating data as objects and start treating it as the structured data that it really is."
Wednesday, June 08, 2011 4:14:18 PM (Romance Daylight Time, UTC+02:00)
Great article!

But why not think this further? What's a boundary? Is a boundary where data is exchanged between processes? Or processes on different machines? Processes implemented using different platforms? Or is a boundary where data moves between .NET AppDomains? Or between threads?

The past year I've felt great relief by very strictly applying this rule: data that moves around is, well, just data.

That does not mean I'm back to procedural programming. I appreciate the benefits of object oriented languages.

But just because I can combine data and functions into one "thing", I should not always do it.

If data is state of a behavior, then data+function makes sense.

But if data is pushed around between "behavioral objects" then data should be just data (maybe spiced with some convenience functions).

So what I've come to doing is "data flow design" or "behavioral design". And that plays very nicely with async programming and distributing code across processes.
Thursday, June 09, 2011 11:47:16 AM (Romance Daylight Time, UTC+02:00)
Yes, we can take it further, but I don't believe that one approach necessarily rules out the other. However, it depends on how you decide to pass around structured data.

If you do it in a request/response style (even mapped into internal code), I'd say that you'd be doing procedural programming.

However, if you do it as Commands, passing structured data to void methods, you'd basically be heading in the direction of Pipes and Filters architecture. That's actually a very good place to be.