Implementation of the C# IO container by Mark Seemann
Implementation details of the C# IO container.
This article is part of an article series about the IO container in C#. In the previous articles, you've seen how a type like IO<T>
can be used to distinguish between pure functions and impure actions.
The point of the article series is to illustrate the concept that Haskell uses to impose the functional interaction law at compile time. The implementation details really aren't important. Still, I believe that I know my readership well enough that a substantial fraction would be left unsatisfied if I didn't share the implementation details.
I consider this an appendix to the article series. It's not really what this is all about, but here it is, nonetheless.
Constructor #
Based on the public API already published, the constructor implementation hardly comes as a surprise.
private readonly Func<T> item; public IO(Func<T> item) { this.item = item; }
The IO<T>
class is little more than a wrapper around a lazily evaluated function, with the restriction that while you can put a Func<T>
object into an IO
object, you can never get it out again. Thus, the item
is a private
class field instead of a public property.
SelectMany #
The SelectMany
method is a little more tricky:
public IO<TResult> SelectMany<TResult>(Func<T, IO<TResult>> selector) { return new IO<TResult>(() => selector(item()).item()); }
To guarantee referential transparency, we don't want the method to trigger evaluation of the lazy value, so the selector
has to run inside a new lazy computation. This produces a lazy IO
value that the method then has to unwrap inside another lazy computation. Such a translation from Func<IO<TResult>>
to a new IO
object with a Func<TResult>
inside it is reminiscent of what in Haskell is known as a traversal.
UnsafePerformIO #
Finally, the UnsafePerformIO
method isn't part of the API, but as explained in the previous article, this is the special method that the hypothetical parallel-world framework calls on the IO<Unit>
returned by Main
methods.
internal T UnsafePerformIO() { return item(); }
Since only the framework is supposed to call this method, it's internal
by design. The only thing it does is to force evaluation of the lazy item
.
Conclusion #
Most of the implementation of IO<T>
is straightforward, with the single exception of SelectMany
, which has to jump through a few hoops to keep the behaviour lazy until it's activated.
Once more I want to point out that the purpose of this article series is to explain how a type system like Haskell's guarantees referential transparency. C# could do the same, but it'd require that all impure actions in all libraries in the entire .NET ecosystem were to return IO
values. That's not going to happen, but something similar already has happened. Read on in the next article.
Comments
In your previous post, you said
After several days, I finally feel like I fully understand this.
The concept of lazy has serveral slightly different definitions depending on the context. Haskell is a lazily evaluated language in the sense that its evaluation strategy is call by need. In C#, both
Lazy<T>
andFunc<T>
are lazy in the sense that neither actually contains aT
, but both could produce aT
if asked to do so. The difference is the presence or absence of caching. I remember all this by saying thatLazy<T>
is lazy with caching andFunc<T>
is lazy without caching. SoLazy<T>
is to call by need asFunc<T>
is to call by name.Therefore,
Lazy<T>
is the correct choice if we want to model or emulate the evaluation strategy of Haskell in C#. What about Haskell'sIO<T>
? Is it lazy with caching or lazy without caching? My guess was lazy without caching, but I finally installed the ghc Haskell compiler and compiled Haskell on my machine for the first time in order to test this. I think this example shows that Haskell'sIO<T>
is lazy without caching.I think this would be equivalent C# code in this parallel world that you have created.
What makes me think that I fully understand this now is that I think I see where you are going. I think you already knew all this and decided to model Haskell's
IO<T>
usingLazy<T>
anyway becauseTask<T>
is also lazy with caching just likeLazy<T>
, and your next post will discuss usingTask<T>
as a surrogate for Haskell'sIO<T>
. I think you want your C# implementation ofIO<T>
to be more like C#'sTask<T>
than Haskell'sIO<T>
.Thank you for including such a gem for me to think about...and enough motivation to finally put Haskell on my machine!
Tyson, thank you for writing. You've almost turned my blog into a peer-reviewed journal, and you've just pointed out a major blunder of mine 👍
I think I was mislead by the name
Lazy
, my attention was elsewhere, and I completely forgot about the memoisation that bothLazy<T>
andTask<T>
employ. It does turn out to be problematic in this context. Take the following example:Notice that this example reuses
getTime
twice. We'd like anyIO<T>
value to represent an impure computation, so evaluating it twice with a 2 millisecond delay in between ought to yield two different results.Due to the memoisation built into
Lazy<T>
, the first value is reused. That's not the behaviour we'd like to see.While this is a rather big error on my part, it's fortunately only of a technical nature (I think). As you imply, the correct implementation is to use
Func<T>
rather thanLazy<T>
:This addresses the above reuse bug. With this implementation, the above
Main
method prints two different values, even though it reusesgetTime
.Haskell
IO
doesn't memoise values, so thisFunc
-based implementation better emulates the Haskell behaviour, which is actually what I wanted to do all along.This mistake of mine is potentially so confusing that I think that it's best if I go back and edit the other posts in this articles series. Before I do that, though, I'd like to get your feedback.
Does this look better to you, or do you see other problems with it?
You're welcome. I am certainly a peer, and I benefit greatly from closly reviewing your posts. They always give me so much to think about. I am happy that we both benefit from this :)
Yes, your
Func<T>
-based implementation better emulates Haskell'sIO<T>
. My guess was that you had usedLazy<T>
with its caching behavior in mind. I do think it is a minor issue. I can't think of any code that I would write on purpose that would depend on this difference.I think editing the previous posts depends on exactly how you want to suggesst
Task<T>
as anIO<T>
surrogate.From a purely teaching perspective, I think I prefer to first implement Haskell's
IO<T>
in C# usingFunc<T>
, then suggest this implemention is essentialy the same asTask<T>
, then point out the caching difference for those that are still reading. It would be a shame to lose some readers eariler by pointing out the difference too soon. I wouldn't expect you lost any readres in your current presentation that includes the caching difference but without mentioning it.Func<T>
,Lazy<T>
, andTask<T>
are all lazy (in the sense that none contain aT
but all could produce one if requested. Here are their differences:Func<T>
is synchronous without caching,Lazy<T>
is synchronous with caching, andTask<T>
is asynchronous with caching.Func<T>
andTask<T>
. The nested typeFunc<Task<T>>
can asynchronously produce aT
without caching. Maybe this would be helpful in this article series.Overall though, I don't know of any other potential changes to consider.
For any reader following the discussion after today (July 24, 2020), it may be slightly confusing. Based on Tyson Williams' feedback, I've edited the article series with the above implementation. The previous incarnation of the article series had this implementation of
IO<T>
:As this discussion reveals, the memoisation performed by
Lazy<T>
causes problems. After thinking it through, I've decided to retroactively change the articles in the series. This is something that I rarely do, but in this case I think is the best way forward, as theLazy
-based implementation could be confusing to new readers.Readers interested in the history of these articles can peruse the Git log.