Implementation details of the C# IO container.

This article is part of an article series about the IO container in C#. In the previous articles, you've seen how a type like IO<T> can be used to distinguish between pure functions and impure actions.

The point of the article series is to illustrate the concept that Haskell uses to impose the functional interaction law at compile time. The implementation details really aren't important. Still, I believe that I know my readership well enough that a substantial fraction would be left unsatisfied if I didn't share the implementation details.

I consider this an appendix to the article series. It's not really what this is all about, but here it is, nonetheless.

Constructor #

Based on the public API already published, the constructor implementation hardly comes as a surprise.

private readonly Func<T> item;
 
public IO(Func<T> item)
{
    this.item = item;
}

The IO<T> class is little more than a wrapper around a lazily evaluated function, with the restriction that while you can put a Func<T> object into an IO object, you can never get it out again. Thus, the item is a private class field instead of a public property.

SelectMany #

The SelectMany method is a little more tricky:

public IO<TResult> SelectMany<TResult>(Func<TIO<TResult>> selector)
{
    return new IO<TResult>(() => selector(item()).item());
}

To guarantee referential transparency, we don't want the method to trigger evaluation of the lazy value, so the selector has to run inside a new lazy computation. This produces a lazy IO value that the method then has to unwrap inside another lazy computation. Such a translation from Func<IO<TResult>> to a new IO object with a Func<TResult> inside it is reminiscent of what in Haskell is known as a traversal.

UnsafePerformIO #

Finally, the UnsafePerformIO method isn't part of the API, but as explained in the previous article, this is the special method that the hypothetical parallel-world framework calls on the IO<Unit> returned by Main methods.

internal T UnsafePerformIO()
{
    return item();
}

Since only the framework is supposed to call this method, it's internal by design. The only thing it does is to force evaluation of the lazy item.

Conclusion #

Most of the implementation of IO<T> is straightforward, with the single exception of SelectMany, which has to jump through a few hoops to keep the behaviour lazy until it's activated.

Once more I want to point out that the purpose of this article series is to explain how a type system like Haskell's guarantees referential transparency. C# could do the same, but it'd require that all impure actions in all libraries in the entire .NET ecosystem were to return IO values. That's not going to happen, but something similar already has happened. Read on in the next article.

Next: Task asynchronous programming as an IO surrogate.


Comments

In your previous post, you said

Haskell is a lazily evaluated language. That's an important piece of the puzzle, so I've modelled the IO<T> example so that it only wraps Lazy values. That emulates Haskell's behaviour in C#.

After several days, I finally feel like I fully understand this.

The concept of lazy has serveral slightly different definitions depending on the context. Haskell is a lazily evaluated language in the sense that its evaluation strategy is call by need. In C#, both Lazy<T> and Func<T> are lazy in the sense that neither actually contains a T, but both could produce a T if asked to do so. The difference is the presence or absence of caching. I remember all this by saying that Lazy<T> is lazy with caching and Func<T> is lazy without caching. So Lazy<T> is to call by need as Func<T> is to call by name.

Therefore, Lazy<T> is the correct choice if we want to model or emulate the evaluation strategy of Haskell in C#. What about Haskell's IO<T>? Is it lazy with caching or lazy without caching? My guess was lazy without caching, but I finally installed the ghc Haskell compiler and compiled Haskell on my machine for the first time in order to test this. I think this example shows that Haskell's IO<T> is lazy without caching.

-- output: xx
main = 
  let io = putStr "x"
  in do { io ; io }

I think this would be equivalent C# code in this parallel world that you have created.

// output: x
static IO<Unit> MainIO(string[] args) {
  var io = Console.Write("x");
  return from _1 in io
         from _2 in io
         select Unit.Instance;
}

What makes me think that I fully understand this now is that I think I see where you are going. I think you already knew all this and decided to model Haskell's IO<T> using Lazy<T> anyway because Task<T> is also lazy with caching just like Lazy<T>, and your next post will discuss using Task<T> as a surrogate for Haskell's IO<T>. I think you want your C# implementation of IO<T> to be more like C#'s Task<T> than Haskell's IO<T>.

Thank you for including such a gem for me to think about...and enough motivation to finally put Haskell on my machine!

2020-07-13 20:46 UTC

Tyson, thank you for writing. You've almost turned my blog into a peer-reviewed journal, and you've just pointed out a major blunder of mine 👍

I think I was mislead by the name Lazy, my attention was elsewhere, and I completely forgot about the memoisation that both Lazy<T> and Task<T> employ. It does turn out to be problematic in this context. Take the following example:

static IO<Unit> Main(string[] args)
{
    IO<DateTime> getTime = Clock.GetLocalTime();
    return
        getTime.SelectMany(t1 =>
        Console.WriteLine(t1.Ticks.ToString()).Select(u1 =>
        {
            Thread.Sleep(2);
            return Unit.Instance;
        }).SelectMany(u2 =>
        getTime).SelectMany(t2 =>
        Console.WriteLine(t2.Ticks.ToString())));
}

Notice that this example reuses getTime twice. We'd like any IO<T> value to represent an impure computation, so evaluating it twice with a 2 millisecond delay in between ought to yield two different results.

Due to the memoisation built into Lazy<T>, the first value is reused. That's not the behaviour we'd like to see.

While this is a rather big error on my part, it's fortunately only of a technical nature (I think). As you imply, the correct implementation is to use Func<T> rather than Lazy<T>:

public sealed class IO<T>
{
    private readonly Func<T> item;
 
    public IO(Func<T> item)
    {
        this.item = item;
    }
 
    public IO<TResult> SelectMany<TResult>(Func<TIO<TResult>> selector)
    {
        return new IO<TResult>(() => selector(item()).item());
    }
 
    internal T UnsafePerformIO()
    {
        return item();
    }
}

This addresses the above reuse bug. With this implementation, the above Main method prints two different values, even though it reuses getTime.

Haskell IO doesn't memoise values, so this Func-based implementation better emulates the Haskell behaviour, which is actually what I wanted to do all along.

This mistake of mine is potentially so confusing that I think that it's best if I go back and edit the other posts in this articles series. Before I do that, though, I'd like to get your feedback.

Does this look better to you, or do you see other problems with it?

2020-07-19 14:32 UTC
Tyson, thank you for writing. You've almost turned my blog into a peer-reviewed journal, and you've just pointed out a major blunder of mine 👍

You're welcome. I am certainly a peer, and I benefit greatly from closly reviewing your posts. They always give me so much to think about. I am happy that we both benefit from this :)

While this is a rather big error on my part, it's fortunately only of a technical nature (I think). ...

Haskell IO doesn't memoise values, so this Func-based implementation better emulates the Haskell behaviour, which is actually what I wanted to do all along.

This mistake of mine is potentially so confusing that I think that it's best if I go back and edit the other posts in this articles series. Before I do that, though, I'd like to get your feedback.

Does this look better to you, or do you see other problems with it?

Yes, your Func<T>-based implementation better emulates Haskell's IO<T>. My guess was that you had used Lazy<T> with its caching behavior in mind. I do think it is a minor issue. I can't think of any code that I would write on purpose that would depend on this difference.

I think editing the previous posts depends on exactly how you want to suggesst Task<T> as an IO<T> surrogate.

From a purely teaching perspective, I think I prefer to first implement Haskell's IO<T> in C# using Func<T>, then suggest this implemention is essentialy the same as Task<T>, then point out the caching difference for those that are still reading. It would be a shame to lose some readers eariler by pointing out the difference too soon. I wouldn't expect you lost any readres in your current presentation that includes the caching difference but without mentioning it.

Func<T>, Lazy<T>, and Task<T> are all lazy (in the sense that none contain a T but all could produce one if requested. Here are their differences:

  • Func<T> is synchronous without caching,
  • Lazy<T> is synchronous with caching, and
  • Task<T> is asynchronous with caching.
We are missing a type that is asynchronous without caching. We can create such behavior though using Func<T> and Task<T>. The nested type Func<Task<T>> can asynchronously produce a T without caching. Maybe this would be helpful in this article series.

Overall though, I don't know of any other potential changes to consider.

2020-07-19 21:19 UTC

For any reader following the discussion after today (July 24, 2020), it may be slightly confusing. Based on Tyson Williams' feedback, I've edited the article series with the above implementation. The previous incarnation of the article series had this implementation of IO<T>:

public sealed class IO<T>
{
    private readonly Lazy<T> item;
 
    public IO(Lazy<T> item)
    {
        this.item = item;
    }
 
    public IO<TResult> SelectMany<TResult>(Func<TIO<TResult>> selector)
    {
        var res = new Lazy<IO<TResult>>(() => selector(item.Value));
        return new IO<TResult>(
            new Lazy<TResult>(() => res.Value.item.Value));
    }
 
    internal T UnsafePerformIO()
    {
        return item.Value;
    }
}

As this discussion reveals, the memoisation performed by Lazy<T> causes problems. After thinking it through, I've decided to retroactively change the articles in the series. This is something that I rarely do, but in this case I think is the best way forward, as the Lazy-based implementation could be confusing to new readers.

Readers interested in the history of these articles can peruse the Git log.

2020-07-24 8:22 UTC


Wish to comment?

You can add a comment to this post by sending me a pull request. Alternatively, you can discuss this post on Twitter or somewhere else with a permalink. Ping me with the link, and I may respond.

Published

Monday, 13 July 2020 06:02:00 UTC

Tags



"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!
Published: Monday, 13 July 2020 06:02:00 UTC