A first stab at the Brainfuck kata by Mark Seemann
I almost gave up, but persevered and managed to produce something that works.
As I've previously mentioned, a customer hired me to swing by to demonstrate test-driven development and tactical Git. To make things interesting, we agreed that they'd give me a kata at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.
The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.
Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.
After a few other katas, I ran into the Brainfuck kata one Thursday. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is available on GitHub.
Understanding the problem #
I had heard about Brainfuck before, but never tried to write an interpreter (or a program, for that matter).
The kata description lacks examples, so I decided to search for them elsewhere. The wikipedia article comes with some examples of small programs (including Hello, World), so ultimately I used that for reference instead of the kata page.
I'm happy I wasn't making my first pass on this problem in front of an audience. I spent the first 45 minutes just trying to understand the examples.
You might find me slow, since the rules of the language aren't that complicated. I was, however, confused by the way the examples were presented.
As the wikipedia article explains, in order to add two numbers together, one can use this idiom:
[->+<]
The article then proceeds to list a small, complete program that adds two numbers. This program adds numbers this way:
[ Start your loops with your cell pointer on the loop counter (c1 in our case) < + Add 1 to c0 > - Subtract 1 from c1 ] End your loops with the cell pointer on the loop counter
I couldn't understand why this annotated 'walkthrough' explained the idiom in reverse. Several times, I was on the verge of giving up, feeling that I made absolutely no progress. Finally, it dawned on me that the second example is not an explanation of the first example, but rather a separate example that makes use of the same idea, but expresses it in a different way.
Most programming languages have more than one way to do things, and this is also the case here. [->+<]
adds two numbers together, but so does [<+>-]
.
Once you understand something, it can be difficult to recall why you ever found it confusing. Now that I get this, I'm having trouble explaining what I was originally thinking, and why it confused me.
This experience does, however, drive home a point for educators: When you introduce a concept and then provide examples, the first example should be a direct continuation of the introduction, and not some variation. Variations are fine, too, but should follow later and be clearly labelled.
After 45 minutes I had finally cracked the code and was ready to get programming.
Getting started #
The kata description suggests starting with the +
, -
, >
, and <
instructions to manage memory. I briefly considered that, but on the other hand, I wanted to have some test coverage. Usually, I take advantage of test-driven development, and I while I wasn't sure how to proceed, I wanted to have some tests.
If I were to exclusively start with memory management, I would need some way to inspect the memory in order to write assertions. This struck me as violating encapsulation.
Instead, I thought that I'd write the simplest program that would produce some output, because if I had output, I would have something to verify.
That, on the other hand, meant that I had to consider how to model input and output. The Wikipedia article describes these as
"two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding)."
Knowing that you can model the console's input and output streams as polymorphic objects, I decided to model the output as a TextWriter. The lowest-valued printable ASCII character is space, which has the byte value 32
, so I wrote this test:
[Theory] [InlineData("++++++++++++++++++++++++++++++++.", " ")] // 32 increments; ASCII 32 is space public void Run(string program, string expected) { using var output = new StringWriter(); var sut = new BrainfuckInterpreter(output); sut.Run(program); var actual = output.ToString(); Assert.Equal(expected, actual); }
As you can see, I wrote the test as a [Theory]
(parametrised test) from the outset, since I predicted that I'd add more test cases soon. Strictly speaking, when following the red-green-refactor checklist, you shouldn't write more code than absolutely necessary. According to YAGNI, you should avoid speculative generality.
Sometimes, however, you've gone through a process so many times that you know, with near certainty, what happens next. I've done test-driven development for decades, so I occasionally allow my experience to trump the rules.
The Brainfuck program in the [InlineData]
attribute increments the same data cell 32 times (you can count the plusses) and then outputs its value. The expected
output is the space character, since it has the ASCII code 32
.
What's the simplest thing that could possibly work? Something like this:
public sealed class BrainfuckInterpreter { private readonly StringWriter output; public BrainfuckInterpreter(StringWriter output) { this.output = output; } public void Run(string program) { output.Write(' '); } }
As is typical with test-driven development (TDD), the first few tests help you design the API, but not the implementation, which, here, is deliberately naive.
Since I felt pressed for time, having already spent 45 minutes of my two-hour time limit getting to grips with the problem, I suppose I lingered less on the refactoring phase than perhaps I should have. You'll notice, at least, that the BrainfuckInterpreter
class depends on StringWriter rather than its abstract parent class TextWriter
, which was the original plan.
It's not a disastrous mistake, so when I later discovered it, I easily rectified it.
Implementation outline #
To move on, I added another test case:
[Theory] [InlineData("++++++++++++++++++++++++++++++++.", " ")] // 32 increments; ASCII 32 is space [InlineData("+++++++++++++++++++++++++++++++++.", "!")] // 33 increments; ASCII 32 is ! public void Run(string program, string expected) { using var output = new StringWriter(); var sut = new BrainfuckInterpreter(output); sut.Run(program); var actual = output.ToString(); Assert.Equal(expected, actual); }
The only change is the addition of the second [InlineData]
attribute, which supplies a slightly different Brainfuck program. This one has 33 increments, which corresponds to the ASCII character code for an exclamation mark.
Notice that I clearly copied and pasted the comment, but forgot to change the last 32
to 33
.
In my eagerness to pass both tests, and because I felt the clock ticking, I made another classic TDD mistake: I took too big a step. At this point, it would have been enough to iterate over the program's characters, count the number of plusses, and convert that number to a character. What I did instead was this:
public sealed class BrainfuckInterpreter { private readonly StringWriter output; public BrainfuckInterpreter(StringWriter output) { this.output = output; } public void Run(string program) { var imp = new InterpreterImp(program, output); imp.Run(); } private sealed class InterpreterImp { private int programPointer; private readonly byte[] data; private readonly string program; private readonly StringWriter output; internal InterpreterImp(string program, StringWriter output) { data = new byte[30_000]; this.program = program; this.output = output; } internal void Run() { while (!IsDone) InterpretInstruction(); } private bool IsDone => program.Length <= programPointer; private void InterpretInstruction() { var instruction = program[programPointer]; switch (instruction) { case '+': data[0]++; programPointer++; break; case '.': output.Write((char)data[0]); programPointer++; break; default: programPointer++; break; } } }
With only two test cases, all that code isn't warranted, but I was more focused on implementing an interpreter than on moving in small steps. Even with decades of TDD experience, discipline sometimes slips. Or maybe exactly because of it.
Once again, I was fortunate enough that this implementation structure turned out to work all the way, but the point of the TDD process is that you can't always know that.
You may wonder why I decided to delegate the work to an inner class. I did that because I expected to have to maintain a programPointer
over the actual program
, and having a class that interprets one program has better encapsulation. I'll remind the reader than when I use the word encapsulation, I don't necessarily mean information hiding. Usually, I think in terms of contracts: Invariants, pre-, and postconditions.
With this design, the program
is guaranteed to be present as a class field, since it's readonly
and assigned upon initialisation. No defensive coding is required.
Remaining memory-management instructions #
While I wasn't planning on making use of the Devil's advocate technique, I did leave one little deliberate mistake in the above implementation: I'd hardcoded the data pointer as 0
.
This made it easy to choose the next test case, and the next one after that, and so on.
At the two-hour mark, I had these test cases:
[Theory] [InlineData("++++++++++++++++++++++++++++++++.", " ")] // 32 increments; ASCII 32 is space [InlineData("+++++++++++++++++++++++++++++++++.", "!")] // 33 increments; ASCII 32 is ! [InlineData("+>++++++++++++++++++++++++++++++++.", " ")] // 32 increments after >; ASCII 32 is space [InlineData("+++++++++++++++++++++++++++++++++-.", " ")] // 33 increments and 1 decrement; ASCII 32 [InlineData(">+<++++++++++++++++++++++++++++++++.", " ")] // 32 increments after movement; ASCII 32 public void Run(string program, string expected) { using var output = new StringWriter(); var sut = new BrainfuckInterpreter(output); sut.Run(program); var actual = output.ToString(); Assert.Equal(expected, actual); }
And this implementation:
private sealed class InterpreterImp { private int programPointer; private int dataPointer; private readonly byte[] data; private readonly string program; private readonly StringWriter output; internal InterpreterImp(string program, StringWriter output) { data = new byte[30_000]; this.program = program; this.output = output; } internal void Run() { while (!IsDone) InterpretInstruction(); } private bool IsDone => program.Length <= programPointer; private void InterpretInstruction() { var instruction = program[programPointer]; switch (instruction) { case '>': dataPointer++; programPointer++; break; case '<': dataPointer--; programPointer++; break; case '+': data[dataPointer]++; programPointer++; break; case '-': data[dataPointer]--; programPointer++; break; case '.': output.Write((char)data[dataPointer]); programPointer++; break; default: programPointer++; break; } } }
I'm only showing the inner InterpreterImp
class, since I didn't change the outer BrainfuckInterpreter
class.
At this point, I had used my two hours, but I think that I managed to leave my imaginary audience with a sketch of a possible solution.
Jumps #
What remained was the jumping instructions [
and ]
, as well as input.
Perhaps I could have kept adding small [InlineData]
test cases to my single test method, but I thought I was ready to take on some of the small example programs on the Wikipedia page. I started with the addition example in this manner:
// Copied from https://en.wikipedia.org/wiki/Brainfuck const string addTwoProgram = @" ++ Cell c0 = 2 > +++++ Cell c1 = 5 [ Start your loops with your cell pointer on the loop counter (c1 in our case) < + Add 1 to c0 > - Subtract 1 from c1 ] End your loops with the cell pointer on the loop counter At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1 but we cannot output this value to the terminal since it is not ASCII encoded To display the ASCII character ""7"" we must add 48 to the value 7 We use a loop to compute 48 = 6 * 8 ++++ ++++ c1 = 8 and this will be our loop counter again [ < +++ +++ Add 6 to c0 > - Subtract 1 from c1 ] < . Print out c0 which has the value 55 which translates to ""7""!"; [Fact] public void AddTwoValues() { using var input = new StringReader(""); using var output = new StringWriter(); var sut = new BrainfuckInterpreter(input, output); sut.Run(addTwoProgram); var actual = output.ToString(); Assert.Equal("7", actual); }
I got that test passing, added the next example, got that passing, and so on. My final implementation looks like this:
public sealed class BrainfuckInterpreter { private readonly TextReader input; private readonly TextWriter output; public BrainfuckInterpreter(TextReader input, TextWriter output) { this.input = input; this.output = output; } public void Run(string program) { var imp = new InterpreterImp(program, input, output); imp.Run(); } private sealed class InterpreterImp { private int instructionPointer; private int dataPointer; private readonly byte[] data; private readonly string program; private readonly TextReader input; private readonly TextWriter output; internal InterpreterImp(string program, TextReader input, TextWriter output) { data = new byte[30_000]; this.program = program; this.input = input; this.output = output; } internal void Run() { while (!IsDone) InterpretInstruction(); } private bool IsDone => program.Length <= instructionPointer; private void InterpretInstruction() { WrapDataPointer(); var instruction = program[instructionPointer]; switch (instruction) { case '>': dataPointer++; instructionPointer++; break; case '<': dataPointer--; instructionPointer++; break; case '+': data[dataPointer]++; instructionPointer++; break; case '-': data[dataPointer]--; instructionPointer++; break; case '.': output.Write((char)data[dataPointer]); instructionPointer++; break; case ',': data[dataPointer] = (byte)input.Read(); instructionPointer++; break; case '[': if (data[dataPointer] == 0) MoveToMatchingClose(); else instructionPointer++; break; case ']': if (data[dataPointer] != 0) MoveToMatchingOpen(); else instructionPointer++; break; default: instructionPointer++; break; } } private void WrapDataPointer() { if (dataPointer == -1) dataPointer = data.Length - 1; if (dataPointer == data.Length) dataPointer = 0; } private void MoveToMatchingClose() { var nestingLevel = 1; while (0 < nestingLevel) { instructionPointer++; if (program[instructionPointer] == '[') nestingLevel++; if (program[instructionPointer] == ']') nestingLevel--; } instructionPointer++; } private void MoveToMatchingOpen() { var nestingLevel = 1; while (0 < nestingLevel) { instructionPointer--; if (program[instructionPointer] == ']') nestingLevel++; if (program[instructionPointer] == '[') nestingLevel--; } instructionPointer++; } } }
As you can see, I finally discovered that I'd been too concrete when using StringWriter
. Now, input
is defined as a TextReader, and output
as a TextWriter
.
When TextReader.Read encounters the end of the input stream, it returns -1
, and when you cast that to byte
, it becomes 255
. I admit that I haven't read through the Wikipedia article's ROT13 example code to a degree that I understand how it decides to stop processing, but the test passes.
I also realised that the Wikipedia article used the term instruction pointer, so I renamed programPointer
to instructionPointer
.
Assessment #
Due to the switch/case
structure, the InterpretInstruction
method has a cyclomatic complexity of 12, which is more than I recommend in my book Code That Fits in Your Head.
It's not uncommon that switch/case
code has high cyclomatic complexity, and this is also a common criticism of the measure. When each case
block is as simple as it is here, or delegates to helper methods such as MoveToMatchingClose
, you could reasonably argue that the code is still maintainable.
Refactoring lists switch statements as a code smell and suggests better alternatives. Had I followed the kata description's additional constraints to the letter, I should also have made it easy to add new instructions, or rename existing ones. This might suggest that one of Martin Fowler's refactorings might be in order.
That is, however, an entirely different kind of exercise, and I thought that I'd already gotten what I wanted out of the kata.
Conclusion #
At first glance, the Brainfuck language isn't difficult to understand (but onerous to read). Even so, it took me so long time to understand the example code that I almost gave up more than once. Still, once I understood how it worked, the interpreter actually wasn't that hard to write.
In retrospect, perhaps I should have structured my code differently. Perhaps I should have used polymorphism instead of a switch statement. Perhaps I should have written the code in a more functional style. Regular readers will at least recognise that the code shown here is uncharacteristically imperative for me. I do, however, try to vary my approach to fit the problem at hand (use the right tool for the job, as the old saw goes), and the Brainfuck language is described in so imperative terms that imperative code seemed like the most fitting style.
Now that I understand how Brainfuck works, I might later try to do the kata with some other constraints. It might prove interesting.