IQueryable is Tight Coupling by Mark Seemann
From time to time I encounter people who attempt to express an API in terms of IQueryable<T>. That's almost always a bad idea. In this post, I'll explain why.
In short, the IQueryable<T> interface is one of the best examples of a Header Interface that .NET has to offer. It's almost impossible to fully implement it.
Please note that this post is about the problematic aspects of designing an API around the IQueryable<T> interface. It's not an attack on the interface itself, which has its place in the BCL. It's also not an attack on all the wonderful LINQ methods available on IEnumerable<T>.
You can say that IQueryable<T> is one big Liskov Substitution Principle (LSP)violation just waiting to happen. In the next two section, I will apply Postel's law to explain why that is.
Consuming IQueryable<T> #
The first part of Postel's law applied to API design states that an API should be liberal in what it accepts. In other words, we are talking about input, so an API that consumes IQueryable<T> would take this generalized shape:
IFoo SomeMethod(IQueryable<Bar> q);
Is that a liberal requirement? It most certainly is not. Such an interface demands of any caller that they must be able to supply an implementation of IQueryable<Bar>. According to the LSP we must be able to supply any implementation without changing the correctness of the program. That goes for both the implementer of IQueryable<Bar> as well as the implementation of SomeMethod.
At this point it's important to keep in mind the purpose of IQueryable<T>: it's intended for implementation by query providers. In other words, this isn't just some sequence of Bar instances which can be filtered and projected; no, this is a query expression which is intended to be translated into a query somewhere else - most often some dialect of SQL.
That's quite a demand to put on the caller.
It's certainly a powerful interface (or so it would seem), but is it really necessary? Does SomeMethod really need to be able to perform arbitrarily complex queries against a data source?
In one recent discussion, it turns out that all the developer really wanted to do was to be able to select based on a handful of simple criteria. In another case, the developer only wanted to do simple paging.
Such requirements could be modeled much simpler without making huge demands on the caller. In both cases, we could provide specialized Query Objects instead, or perhaps even simpler just a set of specialized queries:
IFoo FindById(int fooId); IFoo FindByCorrelationId(int correlationId);
Or, in the case of paging:
IEnumerable<IFoo> GetFoos(int page);
This is certainly much more liberal in that it requires the caller to supply only the required information in order to implement the methods. Designing APIs in terms of Role Interfaces instead of Header Interfaces makes the APIs much more flexible. This will enable you to respond to change.
Exposing IQueryable<T> #
The other part of Postel's law states that an API should be conservative in what it sends. In other words, a method must guarantee that the data it returns conforms rigorously to the contract between caller and implementer.
A method returning IQueryable<T> would take this generalized shape:
IQueryable<Bar> GetBars();
When designing APIs, a huge part of the contract is defined by the interface (or base class). Thus, the return type of a method specifies a conservative guarantee about the returned data. In the case of returning IQueryable<Bar> the method thus guarantees that it will return a complete implementation of IQueryable<Bar>.
Is that conservative?
Once again invoking the LSP, a consumer must be able to do anything allowed by IQueryable<Bar> without changing the correctness of the program.
That's a big honking promise to make.
Who can keep that promise?
Current Implementations #
Implementing IQueryable<T> is a huge undertaking. If you don't believe me, just take a look at the official Building an IQueryable provider series of blog posts. Even so, the interface is so flexible and expressive that with a single exception, it's always possible to write a query that a given provider can't translate.
Have you ever worked with LINQ to Entities or another ORM and received a NotSupportedException? Lots of people have. In fact, with a single exception, it's my firm belief that all existing implementations violate the LSP (in fact, I challenge my readers to refer me to a real, publicly available implementation of IQueryable<T> that can accept any expression I throw at it, and I'll ship a free copy of my book to the first reader to do so).
Furthermore, the subset of features that each implementation supports varies from query provider to query provider. An expression that can be translated by the Entity framework may not work with Microsoft's OData query provider.
The only implementation that fully implements IQueryable<T> is the in-memory implementation (and referring to this one does not earn you a free book). Ironically, this implementation can be considered a Null Object implementation and goes against the whole purpose of the IQueryable<T> interface exactly because it doesn't translate the expression to another language.
Why This Matters #
You may think this is all a theoretical exercise, but it actually does matter. When writing Clean Code, it's important to design an API in such a way that it's clear what it does.
An interface like this makes false guarantees:
public interface IRepository { IQueryable<T> Query<T>(); }
According to the LSP and Postel's law, it would seem to guarantee that you can write any query expression (no matter how complex) against the returned instance, and it would always work.
In practice, this is never going to happen.
Programmers who define such interfaces invariably have a specific ORM in mind, and they implicitly tend to stay within the bounds they know are safe for that specific ORM. This is a leaky abstraction.
If you have a specific ORM in mind, then be explicit about it. Don't hide it behind an interface. It creates the illusion that you can replace one implementation with another. In practice, that's impossible. Imagine attempting to provide an implementation over an Event Store.
The cake is a lie.
Comments
If you have the following query you get different results with the EF LINQ provider and the LINQ to Objects provider
var r = from x in context.Xs select new { x.Y.Z }
dependant on the type of join between the sets or if Y is null
makes in-memory testing difficult
P.S. Kudos by the way. I love every single piece of it.
Also, having a generic repository as an intermediate step between an ORM and a specific repository lets you unit test specific repositories and/or use them with multiple persistence mechanisms (which is sometimes desirable). Despite the leaks, this "pattern" does have some uses in my book when used VERY carefully (although it is misused in 99 cases out of 100 when you see it).
Everybody who wants to implement the repositories using EF would use EntityRepository and everybody who wants to use other ORMs can create his own derived abstraction from IRepository.
But the consumer codes in my library just use the IRepository inteface (Ex. A CRUD ViewModel base class).
It it correct?
See http://bartdesmet.net/blogs/bart/archive/2008/08/15/the-most-funny-interface-of-the-year-iqueryable-lt-t-gt.aspx and also take a look at the section 9min 30 secs into this talk http://channel9.msdn.com/Events/PDC/PDC10/FT10
Who on Earth defines an API that *takes* an IQueryable? Nobody I know, and
I've never had the need to. So the first half of the post is kinda
irrelevant ;)
-- Implementing IQueryable{T} is a huge undertaking.
That's why NOBODY has to. The ORM you use will (THEY will take the huge
undertaking and they have, go look for EF and NHibernate and even RavenDB
who do provide such an API), as well as Linq to Objects for your testing
needs.
So, the IRepository.Query{T} is doing *exactly* what its definition
promises: "Mediates between the domain and data mapping layers using a
collection-like interface for accessing domain objects." (
http://martinfowler.com/eaaCatalog/repository.html)
In the ORM-implemented case, it goes against the data mapping layer, in the
Linq to Objects case, it does directly over the collection of objects.
It's the PERFECT abstraction to make code that goes against a DB/repository
unit-testable. (not that it frees you from having full integration
end-to-end tests that go against the *real* repository to make sure you're
not using any unsupported constructs against the in-memory version).
IQueryable{T} got rid of an entire slew of useless interfaces to abstract
the real repository. We should wholeheartedly embrace it and move forward,
instead of longing for the days when we had to do all that manually ;)
That's the thing, if this interface is used in the read side of a CQRS
system, there's no event store there. Views need flexible queryability. No?
Saying this:
public IQueryable{Customer} GetActiveCustomers() {
return CustomersDb.Where(x => x.IsActive).ToList().AsQueryable();
}
tells consumers that the return value of the method is a representation of how the results will be materialized when the query is executed. Saying this, OTOH:
public IEnumerable{Customer} GetActiveCustomers() {
return CustomersDb.Where(x => x.IsActive).ToList();
}
explicitly tells consumers that you are returning the *results* of a query, and it's none of the consuming code's business to know how it got there.
Materialization of those query results is what you seemed to be focused on, and that's not IQueryable's task. That's the task of the specific LINQ provider. IQueryable's job is only to *describe* queries, not actually execute them.
A repository interface should be explicit about the operations it provides over the set of data, and should not open the door to arbitrary querying that is not part of the applications overall design / architecture. YAGNI
I know that this may fly in the face of those who are used to arbitrary late-bound SQL style query flexibility and code-gen'd ORMs, but this type of setup is not one that's conducive to performance or scaling in the long term. ORMs may be good to get something up and running without a lot of effort (or thought into how the data is used), but they rapidly show their cracks / leaks over time. This is why you see sites like StackOverflow as they reach scale adopt MicroORMs that are closer to the metal like Dapper instead of using EF, L2S or NHibernate. (Other great Micro ORMs are Massive, OrmLite, and PetaPoco to name just a few.)
Is it work to be explicit in your contracts around data access? Absolutely... but this is a better long-term design decision when you factor in testability and performance goals. How do you test the perf characteristics of an API that doesn't have well-defined operations? Well... you can't really.
Also -- consider stores that are *not* SQL based as Mark alludes to. We're using Riak, where querying capabilities are limited in scope and there are performance trade-offs to certain types of operations on secondary indices. We accept this tradeoff b/c the operational characteristics of Riak are unparalleled and it has great latency characteristics when retrieving data. We need to be explicit about how we allow data to be retrieved in our data access layer because some things are a no-go in our underlying store (for instance, listing keys is extremely expensive at the moment in Riak).
Bind yourself to IQueryable and you've handcuffed yourself. IQueryable is simply not a fit in a lot of cases, and in others it's total overkill... data access is not one size fits all. Embrace the polyglot!
However, I don't yet see how to design an easy to use and read API with query objects. Maybe you could provide a repository implementation that works without IQueryable but is still readable and easy to use?
However, exposing IQueryable in an API is so powerful that it's worth this minor annoyance.
I like the idea of the OData or Web API queries, but would rather they just expose the query parameters and let us build adapters from the query to the datastore as relevant.
The whole idea of web service interfaces is to expose a technology-agnostic interoperable API to the outside world.
Exposing an IQueryable/OData endpoint effectively couples your services to using OData indefinitely as you wont be able to feasibly determine what 'query space' existing clients are binded to, i.e. what existing queries/tables/views/properties you need to freeze/support indefinitely. This is an issue when exposing any implementation at the surface area of your API, as it limits your ability to add your own custom logic on it, e.g. Authorization, Caching, Monitoring, Rate-Limiting, etc. And because OData is really slow, you'll hit performance/scaling problems with it early. The lack of control over the endpoint, means you're effectively heading for a rewrite: https://coldie.net/?tag=servicestack
Lets see how feasible is would be to move off an oData provider implementation by looking at an existing query from Netflix's OData api:
http://odata.netflix.com/Catalog/Titles?$filter=Type%20eq%20'Movie'%20and%20(Rating%20eq%20'G'%20or%20Rating%20eq%20'PG-13')
This service is effectively now coupled to a Table/View called 'Titles' with a column called 'Type'.
And how it would be naturally written if you weren't using OData:
http://api.netflix.com/movies?ratings=G,PG-13
Now if for whatever reason you need to replace the implementation of the existing service (e.g. a better technology platform has emerged, it runs too slow and you need to move this dataset over to a NoSQL/Full-TextIndexing-backed sln) how much effort would it take to replace the OData impl (which is effectively binded to an RDBMS schema and OData binary impl) to the more intuitive impl-agnostic query below? It's not impossible, but as it's prohibitively difficult to implement an OData API for a new technology, that rewriting + breaking existing clients would tend to be the preferred option.
Letting internal implementations dictate the external facing url structure is a sure way to break existing clients when things need to change. This is why you should expose your services behind Cool URIs, i.e. logical permanent urls (that are unimpeded by implementation) that do not change, as you generally don't want to limit the technology choices of your services.
It might be a good option if you want to deliver adhoc 'exploratory services' on it, but it's not something I'd ever want external clients binded to in a production system. And if I'm only going to limit its use to my local intranet what advantages does it have over just giving out a read-only connection string? which will allow others use with their favourite Sql Explorer, Reporting or any other tools that speaks SQL.
The testable interface abstraction allows for one-liners in the ORM-bound implementation, as well as the testable fake. Doesn't get any simpler than that.
That hardly seems like complicated stuff to me.
Mario, thank you for writing. It occasionally happens that I change my mind, as I learn new skills and gain more experience, but in this case, I haven't changed my mind. I indirectly use IQueryable<T> when I occasionally have to query a relational database with the Entity Framework or LINQ to SQL (which happens extremely rarely these days), but I never design my own interfaces around IQueryable<T>; my reasons for introducing an interface is normally to reduce coupling, but introducing any interface that exposes or depends on IQueryable<T> doesn't reduce coupling.
In Agile Principles, Patterns, and Practices in C#, Robert C. Martin explains in chapter 11 that "clients [...] own the abstract interfaces" - in other words: the client, which invokes methods on the interface, should define the methods that the interface should expose, based on what it needs - not based on what an arbitrary implementation might be able to implement. This is the design philosophy I always follow. A client shouldn't have to need to be able to perform arbitrary queries against a data access layer. If it does, it's such a Leaky Abstraction anyway that the interface isn't going to help; in such cases, just let the client talk directly to the ORM or the ADO.NET objects. That's much easier to understand.
The same argument also goes against designing interfaces around Expression<Func<T, bool>>. It may seem flexible, but the fact is that such an expression can be arbitrarily complex, so in practice, it's impossible to guarantee that you can translate every expression to all SQL dialects; and what about OData? Or queries against MongoDB?
Still, I rarely use ORMs at all. Instead, I increasingly rely on simple abstractions like Drain when designing my systems.