Drain by Mark Seemann
A Drain is a filter abstraction over an Iterator, with the purpose of making out-of-process queries more efficient.
For some years now, the Reused Abstractions Principle has pushed me towards software architectures based upon fewer and fewer abstractions, where each abstraction, on the other hand, is reused over and over again.
In my Pluralsight course about A Functional Architecture with F#, I describe how to build an entire mainstream application based on only two abstractions:
More specifically, I use Reactive Extensions for Commands, and the Seq module for Queries (but if you're on C#, you can use LINQ instead).The problem #
It turns out that these two abstractions are enough to build an entire, mainstream system, but in practice, there's a performance problem. If you have only Iterators, you'll have to read all your data into memory, and then filter in memory. If you have lots of data on storage, this is obviously going to be prohibitively slow, so you'll need a way to select only a subset out of your persistent store.
This problem should be familiar to every student of software development. Pure abstractions tend not to survive contact with reality (there are examples in both Object-Oriented, Functional, and Logical or Relational programming), but we should still strive towards keeping abstractions as pure as possible.
One proposed solution to the Query Problem is to use something like IQueryable, but unfortunately, IQueryable is an extremely poor abstraction (and so are F# query expressions, too).
In my experience, the most important feature of IQueryable is the ability to filter before loading data; normally, you can perform projections in memory, but when you read from persistent storage, you need to select your desired subset before loading it into memory.
Inspired by talks by Bart De Smet, in my Pluralsight course, I define custom filter interfaces like:
type IReservations = inherit seq<Envelope<Reservation>> abstract Between : DateTime -> DateTime -> seq<Envelope<Reservation>>
or
type INotifications = inherit seq<Envelope<Notification>> abstract About : Guid -> seq<Envelope<Notification>>
Both of these interfaces derive from IEnumerable<T> and add a single extra method that defines a custom filter. Storage-aware implementations can implement this method by returning a new sequence of only those items on storage that match the filter. Such a method may
- make a SQL query against a database
- make a query against a document database
- read only some files from the file system
- etc.
Generalized interface #
The custom interfaces shown above follow a common template: the interface derives from IEnumerable<T> and adds a single 'filter' method, which filters the sequence based on the input argument(s). In the above examples, IReservations define a Between method with two arguments, while INotifications defines an About method with a single argument.
In order to generalize, it's necessary to find a common name for the interface and its single method, as well as deal with variations in method arguments.
All the really obvious names like Filter, Query, etc. are already 'taken', so I hit a thesaurus and settled on the name Drain. A Drain can potentially drain a sequence of elements to a smaller sequence.
When it comes to variations in input arguments, the solution is to use generics. The Between method that takes two arguments could also be modelled as a method taking a single tuple argument. Eventually, I've arrived at this general definition:
module Drain = type IDrainable<'a, 'b> = inherit seq<'a> abstract On : 'b -> seq<'a> let on x (d : IDrainable<'a, 'b>) = d.On x
As you can see, I decided to name the extra method On, as well as add an on function, which enables clients to use a Drain like this:
match tasks |> Drain.on id |> Seq.toList with
In the above example, tasks
is defined as IDrainable<TaskRendition, string>, and id
is a string, so the result of draining on the ID is a sequence of TaskRendition records.
Here's another example:
match statuses |> Drain.on(id, conversationId) |> Seq.toList with
Here, statuses
is defined as IDrainable<string * Guid, string * string> - not the most well-designed instance, I admit: I should really introduce some well-named records instead of those tuples, but the point is that you can also drain on multiple values by using a tuple (or a record type) as the value on which to drain.
In-memory implementation #
One of the great features of Drains is that an in-memory implementation is easy, so you can add this function to the Drain module:
let ofSeq areEqual s = { new IDrainable<'a, 'b> with member this.On x = s |> Seq.filter (fun y -> areEqual y x) member this.GetEnumerator() = s.GetEnumerator() member this.GetEnumerator() = (this :> 'a seq).GetEnumerator() :> System.Collections.IEnumerator }
This enables you to take any IEnumerable<T> (seq<'a>) and turn it into an in-memory Drain by supplying an equality function. Here's an example:
let private toDrainableTasks (tasks : TaskRendition seq) = tasks |> Drain.ofSeq (fun x y -> x.Id = y)
This little helper function takes a sequence of TaskRendition records and defines the equality function as a comparison on each TaskRendition record's Id property. The result is a drain that you can use to select one or more TaskRendition records based on their IDs.
I use this a lot for unit testing.
Empty Drains #
It's also easy to define an empty drain, by adding this value to the Drain module:
let empty<'a, 'b> = Seq.empty<'a> |> ofSeq (fun x (y : 'b) -> false)
Here's a usage example:
let mappedUsers = Drain.empty<UserMapped, string>
Again, this can be handy when unit testing.
Other implementations #
While the in-memory implementation is useful when unit testing, the entire purpose of the Drain abstraction is to enable various implementations to implement the On method to perform a custom selection against a well-known data source. As an example, you could imagine an implementation that translates the input arguments of the On method into a SQL query.
If you want to see examples of this, my Pluralsight course demonstrates how to implement IReservations and INotifications with various data stores - I trust you can extrapolate from those examples.
Summary #
You can base an entire mainstream application on the two abstractions of Iterator and Observer. However, the problem when it comes to Iterators is that conceptually, you'll need to iterate over all potentially relevant elements in your system - and that may be millions of records!
However impure it is to introduce a third interface into the mix, I still prefer to introduce a single generic interface, instead of multiple custom interfaces, because once you and your co-workers understand the Drain abstraction, the cognitive load is still quite low. A Drain is an Iterator with a twist, so in the end, you'll have a system built on 2½ abstractions.
P.S. 2018-06-20. While this article is a decent attempt to generalise the query side of a fundamentally object-oriented approach to software architecture, I've later realised that dependency injection, even when disguised as partial application, isn't functional. The problem is that querying a database, reading files, and so on, is essentially non-deterministic, even when no side effects are incurred. The functional approach is to altogether reject the notion of dependencies.
Comments