Serialization with and without Reflection by Mark Seemann
An investigation of alternatives.
I recently wrote a tweet that caused more responses than usual:
"A decade ago, I used .NET Reflection so often that I know most the the API by heart.
"Since then, I've learned better ways to solve my problems. I can't remember when was the last time I used .NET Reflection. I never need it.
"Do you?"
Most people who read my tweets are programmers, and some are, perhaps, not entirely neurotypical, but I intended the last paragraph to be a rhetorical question. My point, really, was to point out that if I tell you it's possible to do without Reflection, one or two readers might keep that in mind and at least explore options the next time the urge to use Reflection arises.
A common response was that Reflection is useful for (de)serialization of data. These days, the most common case is going to and from JSON, but the problem is similar if the format is XML, CSV, or another format. In a sense, even reading to and from a database is a kind of serialization.
In this little series of articles, I'm going to explore some alternatives to Reflection. I'll use the same example throughout, and I'll stick to JSON, but you can easily extrapolate to other serialization formats.
Table layouts #
As always, I find the example domain of online restaurant reservation systems to be so rich as to furnish a useful example. Imagine a multi-tenant service that enables restaurants to take and manage reservations.
When a new reservation request arrives, the system has to make a decision on whether to accept or reject the request. The layout, or configuration, of tables plays a role in that decision.
Such a multi-tenant system may have an API for configuring the restaurant; essentially, entering data into the system about the size and policies regarding tables in a particular restaurant.
Most restaurants have 'normal' tables where, if you reserve a table for three, you'll have the entire table for a duration. Some restaurants also have one or more communal tables, typically bar seating where you may get a view of the kitchen. Quite a few high-end restaurants have tables like these, because it enables them to cater to single diners without reserving an entire table that could instead have served two paying customers.
In Copenhagen, on the other hand, it's also not uncommon to have a special room for larger parties. I think this has something to do with the general age of the buildings in the city. Most establishments are situated in older buildings, with all the trappings, including load-bearing walls, cellars, etc. As part of a restaurant's location, there may be a big cellar room, second-story room, or other room that's not practical for the daily operation of the place, but which works for parties of, say, 15-30 people. Such 'private dining' rooms can be used for private occasions or company outings.
A maître d'hôtel may wish to configure the system with a variety of tables, including communal tables, and private dining tables as described above.
One way to model such requirements is to distinguish between two kinds of tables: Communal tables, and 'single' tables, and where single tables come with an additional property that models the minimal reservation required to reserve that table. A JSON representation might look like this:
{ "singleTable": { "capacity": 16, "minimalReservation": 10 } }
This may represent a private dining table that seats up to sixteen people, and where the maître d'hôtel has decided to only accept reservations for at least ten guests.
A singleTable
can also be used to model 'normal' tables without special limits. If the restaurant has a table for four, but is ready to accept a reservation for one person, you can configure a table for four, with a minimum reservation of one.
Communal tables are different, though:
{ "communalTable": { "capacity": 10 } }
Why not just model that as ten single tables that each seat one?
You don't want to do that because you want to make sure that parties can eat together. Some restaurants have more than one communal table. Imagine that you only have two communal tables of ten seats each. What happens if you model this as twenty single-person tables?
If you do that, you may accept reservations for parties of six, six, and six, because 6 + 6 + 6 = 18 < 20. When those three groups arrive, however, you discover that you have to split one of the parties! The party getting separated may not like that at all, and you are, after all, in the hospitality business.
Exploration #
In each article in this short series, I'll explore serialization with and without Reflection in a few languages. I'll start with Haskell, since that language doesn't have run-time Reflection. It does have a related facility called generics, not to be confused with .NET or Java generics, which in Haskell are called parametric polymorphism. It's confusing, I know.
Haskell generics look a bit like .NET Reflection, and there's some overlap, but it's not quite the same. The main difference is that Haskell generic programming all 'resolves' at compile time, so there's no run-time Reflection in Haskell.
If you don't care about Haskell, you can skip that article.
- Serializing restaurant tables in Haskell
- Serializing restaurant tables in F#
- Serializing restaurant tables in C#
As you can see, the next article repeats the exercise in F#, and if you also don't care about that language, you can skip that article as well.
The C# article, on the other hand, should be readable to not only C# programmers, but also developers who work in sufficiently equivalent languages.
Descriptive, not prescriptive #
The purpose of this article series is only to showcase alternatives. Based on the reactions my tweet elicited I take it that some people can't imagine how serialisation might look without Reflection.
It is not my intent that you should eschew the Reflection-based APIs available in your languages. In .NET, for example, a framework like ASP.NET MVC expects you to model JSON or XML as Data Transfer Objects. This gives you an illusion of static types at the boundary.
Even a Haskell web library like Servant expects you to model web APIs with static types.
When working with such a framework, it doesn't always pay to fight against its paradigm. When I work with ASP.NET, I define DTOs just like everyone else. On the other hand, if communicating with a backend system, I sometimes choose to skip static types and instead working directly with a JSON Document Object Model (DOM).
I occasionally find that it better fits my use case, but it's not the majority of times.
Conclusion #
While some sort of Reflection or metadata-driven mechanism is often used to implement serialisation, it often turns out that such convenient language capabilities are programmed on top of an ordinary object model. Even isolated to .NET, I think I'm on my third JSON library, and most (all?) turned out to have an underlying DOM that you can manipulate.
In this article I've set the stage for exploring how serialisation can work, with or (mostly) without Reflection.
If you're interested in the philosophy of science and epistemology, you may have noticed a recurring discussion in academia: A wider society benefits not only from learning what works, but also from learning what doesn't work. It would be useful if researchers published their failures along with their successes, yet few do (for fairly obvious reasons).
Well, I depend neither on research grants nor salary, so I'm free to publish negative results, such as they are.
Not that I want to go so far as to categorize what I present in the present articles as useless, but they're probably best applied in special circumstances. On the other hand, I don't know your context, and perhaps you're doing something I can't even imagine, and what I present here is just what you need.
Comments
Haskell (the language) does not provide primitives to access to data representation per-se, during the compilation GHC (the compiler) erase a lot of information (more or less depending on the profiling flags) in order to provide to the run-time system (RTS) a minimal "bytecode".
That being said, provide three ways to deal structurally with values:
Template Haskell is low-level as it implies to deal with the AST, it is also harder to debug as, in order to be evaluated, the main compilation phase is stopped, then the Template Haskell code is ran, and finally the main compilation phase continue. It also causes compilation cache issues.
Generics take type's structure and generate a representation at type-level, the main idea is to be able to go back and forth between the type and its representation, so you can define so behavior over a structure, the good thing being that, since the representation is known at compile-time, many optimizations can be done. On complex types it tends to slow-down compilation and produce larger-than-usual binaries, it is generraly the way libraries are implemented.
Typeable is a purely value-level type representation, you get only on type whatever the type structure is, it is generally used when you have "dynamic" types, it provides safe ways to do coercion.
Haskell tends to push as much things as possible in compile-time, this may explain this tendency.
Thank you for writing. I was already aware of Template Haskell and Generics, but Typeable is new to me. I don't consider the first two equivalent to Reflection, since they resolve at compile time. They're more akin to automated code generation, I think. Reflection, as I'm used to it from .NET, is a run-time feature where you can inspect and interact with types and values as the code is executing.
I admit that I haven't had the time to more than browse the documentation of Typeable, and it's so abstract that it's not clear to me what it does, how it works, or whether it's actually comparable to Reflection. The first question that comes to my mind regards the type class
Typeable
itself. It has no instances. Is it one of those special type classes (like Coercible) that one doesn't have to explicitly implement?I don't know the .NET ecosystem well, but I guess you can borrow information at run-time we have at compile-time with TemplateHaskell and Generics, I think you are right then.
You can derive
Typeable
as any othe type classes:It's pretty "low-level",
typeRep
gives you aTypeRep a
(a
being the type represented) which is a representation of the type with primitive elements (More details here).Then, you'll be able to either pattern match on it, or cast it (which it not like casting in Java for example, because you are just proving to the compiler that two types are equivalent).