REST implies Content Negotiation

Monday, 22 June 2015 09:10:00 UTC

If you're building REST APIs, you will eventually have to deal with Content Negotiation.

Some REST services support both JSON and XML, in which case it's evident that Content Negotiation is required. These days, though, more and more services forego XML, and serve only JSON. Does that mean that Content Negotiation is no longer relevant?

If you're building level 3 REST APIs this isn't true. You'll still need Content Negotiation in order to be able to evolve your service without breaking existing clients.

In a level 3 API, the distinguishing feature is that the service exposes Hypermedia Controls. In practice, it means that representations also contain links, like in this example:

{
    "users": [
        {
            "displayName""Jane Doe",
            "links": [
                { "rel""user""href""/users/1234" },
                { "rel""friends""href""/users/1234/friends" }
            ]
        },
        {
            "displayName""John Doe",
            "links": [
                { "rel""user""href""/users/5678" },
                { "rel""friends""href""/users/5678/friends" }
            ]
        }
    ]
}

This representation gives you a list of users. Notice that each user has an array of links. One type of link, identified by the relationship type "user", will take you to the user resource. Another link, identified by the relationship type "friends", will take you to that particular user's friends: another array of users.

When you follow the "user" link for the first user, you'll get this representation:

{
    "displayName""Jane Doe",
    "address": {
        "street""Any Boulevard 42",
        "zip""12345 Anywhere"
    }
}

This user representation is richer, because you also get the user's address. (In a real API, this representation should also have links, but I omitted them here in order to make the example more concise.)

Breaking changes

As long as you have no breaking changes, all is good. Things change, though, and now you discover that a user can have more than one address:

{
    "displayName""Jane Doe",
    "addresses": [
        {
            "street""Any Boulevard 42",
            "zip""12345 Anywhere"
        },
        {
            "street""Some Avenue 1337",
            "zip""67890 Nowhere"
        }
    ]
}

This is a breaking change. If the service returns this representation to a client that expects only a single "address" property, that client will break.

How can you introduce this new version of a user representation without breaking old clients? You'll need to keep both the old and the new version around, and return the old version to the old clients, and the new version to new clients.

Versioning attempt: in the URL

There are various suggested ways to version REST APIs. Some people suggest including the version in the URL, so that you'd be able to access the new, breaking representation at "/users/v2/1234", while the original representation is still available at "/users/1234". There are several problems with this suggestion; the worst is that it doesn't work well with Hypermedia Controls (links, if you recall).

Consider the list of users in the first example, above. Each user has some links, and one of the links is a "user" link, which will take the client to the original representation. This can never change: if you change the "user" links to point to the new representation, you will break old clients. You also can't remove the "user" links, for the same reason.

The only way out of this conundrum is to add another link for the new clients:

{
    "users": [
        {
            "displayName""Jane Doe",
            "links": [
                { "rel""user""href""/users/1234" },
                { "rel""userV2""href""/users/v2/1234" },
                { "rel""friends""href""/users/1234/friends" }
            ]
        },
        {
            "displayName""John Doe",
            "links": [
                { "rel""user""href""/users/5678" },
                { "rel""userV2""href""/users/v2/5678" },
                { "rel""friends""href""/users/5678/friends" }
            ]
        }
    ]
}

In this way, old clients can still follow "user" links, and updated clients can follow "userV2" links.

It may work, but isn't nice. Every time you introduce a new version, you'll have to add new links in all the places where the old link appears. Imagine the 'link bloat' when you have more than two versions of a resource:

{
    "users": [
        {
            "displayName""Jane Doe",
            "links": [
                { "rel""user""href""/users/1234" },
                { "rel""userV2""href""/users/v2/1234" },
                { "rel""userV3""href""/users/v3/1234" },
                { "rel""userV4""href""/users/v4/1234" },
                { "rel""userV5""href""/users/v5/1234" },
                { "rel""friends""href""/users/1234/friends" },
                { "rel""friendsV2""href""/users/1234/V2/friends" }
            ]
        },
        {
            "displayName""John Doe",
            "links": [
                { "rel""user""href""/users/5678" },
                { "rel""userV2""href""/users/v2/5678" },
                { "rel""userV3""href""/users/v3/5678" },
                { "rel""userV4""href""/users/v4/5678" },
                { "rel""userV5""href""/users/v5/5678" },
                { "rel""friends""href""/users/5678/friends" },
                { "rel""friendsV2""href""/users/5678/V2/friends" }
            ]
        }
    ]
}

Remember: we're talking about Level 3 REST APIs here. Clients must follow the links provided; clients are not expected to assemble URLs from templates, as will often be the case in Level 1 and 2.

Versioning through Content Negotiation

As is often the case, the root cause of the above problem is coupling. The attempted solution of putting the version in the URL couples the identity (the URL) of the resource to its representation. That was never the intent of REST.

Instead, you should use Content Negotiation to version representations. If clients don't request a specific version, the service should return the original representation.

New clients that understand the new representation can explicitly request it in the Accept header:

GET /users/1234 HTTP/1.1
Accept: application/vnd.ploeh.user+json; version=2

Jim Liddell has a more detailed description of this approach to versioning.

It enables you to keep the user list stable, without link bloat. It'll simply remain as it was. This means less work for you as API developer, and fewer bytes over the wire.

Disadvantages

The most common criticisms against using Content Negotiation for versioning is that it makes the API less approachable. The argument goes that with version information in the URL, you can still test and explore the API using your browser. Once you add the requirement that HTTP headers should be used, you can no longer test and explore the API with your browser.

Unless the API in question is an anonymously accessible, read-only API, I think that this argument misses the mark.

Both Level 2 and 3 REST APIs utilise HTTP verbs. Unless the API is a read-only API, a client must use POST, PUT, and DELETE as well as GET. This is not possible from a browser.

Today, many APIs are secured with modern authentication and authorisation formats like OAuth, where the client has to provide a security token in the Authorization header. This is not possible from a browser.

Most APIs will already be impossible to test from the browser; it's irrelevant that versioning via the Accept header also prevents you from using the browser to explore and test the API. Get friendly with curl or Fiddler instead.

Summary

If you want to build a true REST API, you should seriously consider using Content Negotiation for versioning. In this way, you prevent link bloat, and effectively decouple versioning from the identity of each resource.

If you've said REST, you must also say Content Negotiation. Any technology you use to build or consume REST services must be able to play that game.


Comments

For that example, it's only a breaking change because you also removed the old "address" property. It might have been easier to leave that in and populate it with the first address from the collection. Older clients could continue to use that, and newer ones could use the new "addresses" field. Obviously at some point you will need to deal with versioning, but it can be better to avoid it until absolutely necessary.

2015-06-30 13:37 UTC
Nik Petroff

I've been catching up on REST, and discovered the 5 levels of media type just a few minutes before reading your post. This keeps it approachable, you can still use your browser to explore, and the higher levels progressively enhance the API for smarter clients.

An example can be found here where Ali Kheyrollahi exposes Greg Young's CQRS sample though a REST inferface.

2015-07-01 11:37 UTC

stringf

Friday, 08 May 2015 05:40:00 UTC

stringf is an F# function that invokes any ToString(string) method.

F# comes with the built-in string function, which is essentially an adapter over Object.ToString. That often comes in handy, because it lets you compose functions without having to resort to lambda expressions all the time.

Instead of writing this (to produce the string "A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z"):

['A' .. 'Z'] |> List.map (fun x -> x.ToString()) |> String.concat ", "

You can write this:

['A' .. 'Z'] |> List.map string |> String.concat ", "

That's nice, but some .NET types provide an overloaded ToString function taking a format string as argument:

and so on.

While these methods can be useful, it would be nice to be able to use them as functions, like the string function. The problem, it seems, is that this ToString overload is part of no interface or base class.

Statically typed duck typing

That's not a problem for F#, because it enables us to do statically typed duck typing!

In this case, you can define a stringf function:

let inline stringf format (x : ^a) = 
    (^a : (member ToString : string -> string) (x, format))

You can use the function like this:

> DateTimeOffset.Now |> stringf "o";;
val it : string = "2015-05-07T14:24:09.8422893+02:00"
> DateTimeOffset.Now |> stringf "T";;
val it : string = "14:24:18"
> DateTime.Now |> stringf "t";;
val it : string = "14:24"
> TimeSpan.FromDays 42. |> stringf "c";;
val it : string = "42.00:00:00"
> 0.42m |> stringf "p0";;
val it : string = "42 %"

Perhaps it would be better to define more domain-specific functions like percent, roundTrippable, time, etc., but it's interesting that such a function as stringf is possible.


Functional design is intrinsically testable

Thursday, 07 May 2015 06:13:00 UTC

TDD with Functional Programming doesn't lead to test-induced damage. Here's why.

Over the years, there's been much criticism of Test-Driven Development (TDD). Perhaps David Heinemeier Hansson best condensed this criticism by claiming that TDD leads to test-induced design damage. This isn't a criticism you can just brush away; it hits a sore point.

Personally, I don't believe that TDD has to lead to test-induced damage (not even in Object-Oriented Programming), but I'm the first to admit that it's not a design methodology.

In this article, though, you're going to learn about the fundamental reason that TDD with Functional Programming doesn't lead to test-induced damage.

In Functional Programming, the ideal function is a Pure function. A Pure function is a function that always returns the same value given the same input, and has no side-effects.

Isolation

The first characteristic of a Pure function means that an ideal function can't depend on any implicit knowledge about the external world. Only the input into the function can influence the evaluation of the function.

This is what Jessica Kerr calls Isolation. A function has the property of Isolation when the only information it has about the external word is passed into it via arguments.

You can think about Isolation as the dual of Encapsulation.

In Object-Oriented Programming, Encapsulation is a very important concept. It means that while an object contains state, the external world doesn't know about that state, unless the object explicitly makes it available.

In Functional Programming, a function is Isolated when it knows nothing about the state of the external world, unless it's explicitly made available to it.

A Pure function, the ideal of Functional Programming, is Isolated.

Unit testing

Why is this interesting?

It's interesting if you start to think about what unit testing means. There are tons of conflicting definitions of what exactly constitutes a unit test, but most experts seem to be able to agree on this broad definition:

A unit test is an automated test that tests a unit in isolation from its dependencies.
Notice the use of the word Isolation in that definition. In order to unit test, you'll have to be able to isolate the unit from its dependencies. This is the requirement that tends to lead to Test-Induced Damage in Object-Oriented Programming. While there's nothing about Encapsulation that explicitly states that it's forbidden to isolate an object from its dependencies, it offers no help on the matter either. Programmers are on their own, because this concern isn't ingrained into Object-Oriented Programming.

Venn diagram showing that while there's an intersection between Encapsulation and Isolation, it's only here that Object-Oriented Programming is also testable.

You can do TDD with Object-Oriented Programming, and as long as you stay within the intersection of Encapsulation and Isolation, you may be able to stay clear of test-induced damage. However, that zone of testability isn't particularly big, so it's easy to stray. You have to be very careful and know what you're doing. Not surprisingly, many books and articles have been written about TDD, including quite a few on this blog.

The best of both worlds

In Functional Programming, on the other hand, Isolation is the ideal. An ideal function is already isolated from its dependencies, so no more design work is required to make it testable.

Stacked Venn diagram that show that an ideal function is a subset of isolated functions, which is again a subset of testable functions.

Ideal Functional design is not only ideal, but also perfectly testable, so there's no conflict. This is the underlying reason that TDD doesn't lead to test-induced damage with Functional Programming.

Summary

Isolation is an important quality of Functional Programming. An ideal function is Isolated, and that means that it's intrinsically testable. You don't have to tweak any design principles in order to make a function testable - in fact, if a function isn't testable, it's a sign that it's poorly designed. Thus, TDD doesn't lead to Test-Induced Damage in Functional Programming.

If you want to learn more about this, as well as see lots of code examples, you can watch my Test-Driven Development with F# Pluralsight course.


Test-Driven Development with F# Pluralsight course

Wednesday, 06 May 2015 07:21:00 UTC

Test-Driven Development and Functional Programming is a match made in heaven. Learn how and why in my new Pluralsight course.

A common criticism against Test-Driven Development (TDD) is that it leads to Test-Induced Damage. However, it doesn't have to be that way, and it turns out that with Functional Programming (FP), the design ideals of FP coincide with TDD.

Course screenshot

In my new Pluralsight course on Test-Driven Development with F#, you'll learn how the intersection between TDD and F# presents opportunities for better design and better testability.


Introduction to Property based Testing with F# Pluralsight course

Friday, 17 April 2015 06:23:00 UTC

My latest Pluralsight course is an introduction to Property-Based Testing with F#.

Soon after the release of my Unit Testing with F# Pluralsight course, it gives me great pleasure to announce my new course, which is an Introduction to Property-based Testing with F#.

In this course, you'll get an introduction to the concept of Property-Based Testing, and see a comprehensive example that demonstrates how to incrementally implement a feature that otherwise would have been hard to address in an iterative fashion.


C# will eventually get all F# features, right?

Wednesday, 15 April 2015 08:32:00 UTC

C# will never get the important features that F# has. Here's why.

The relationship between C# and F# is interesting, no matter if you look at it from the C# or the F# perspective:

There's nothing wrong with this. F# is a great language, so obviously it makes sense to look there for inspiration. Some features also go the other way, such as F# Query Expressions, which were inspired by LINQ.

It's not some hidden cabal that I'm trying to expose, either. Mads Torgersen has been quite open about this relationship.

Why care about F#, then?

A common reaction to all of this seems to be that if C# eventually gets all the best F# features, there's no reason to care about F#. Just stick with C#, and get those features in the future.

The most obvious answer to such statements is that F# already has those features, while you'll have to wait for a long time to get them in C#. While C# 6 gets a few features, they are hardly killer features. So perhaps you'll get the good F# features in C#, but they could be years away, and some features might be postponed to later versions again.

In my experience, that argument mostly falls on deaf ears. Many programmers are content to wait, perhaps because they feel that the language choice is out of their hands anyway.

What F# features could C# get?

Often, when F# enthusiasts attempt to sell the language to other programmers, they have a long list of language features that F# has, and that (e.g.) C# doesn't have. However, in the future, C# could hypothetically have those features too:

  • Records. C# could have those as well, and they're being considered for C# 7. Implementation-wise, F# records compile to immutable classes anyway.
  • Discriminated Unions. Nothing in principle prevents C# from getting those. After all, F# Discriminated Unions compile to a class hierarchy.
  • Pattern matching Also being considered for C# 7.
  • No nulls. It's a common myth that F# doesn't have nulls. It does. It's even a keyword. It's true that F# doesn't allow its Functional data types (records, unions, tuples, etc.) to have null values, but it's only a compiler trick. At run-time, these types can have null values too, and you can provide null values via Reflection. C# could get such a compiler trick as well.
  • Immutability. F#'s immutability 'feature' is similar to how it deals with nulls. Lots of F# can be mutable (everything that interacts with C# code), but the special Functional data types can't. Again, it's mostly in how these specific data types are implemented under the hood that provides this feature, and C# could get that as well.
  • Options. These are really just specialised Discriminated Unions, so C# could get those too.
  • Object Expressions. Java has had those for years, so there's no reason C# couldn't get them as well.
  • Partial Function Application. You can already do this in C# today, but the syntax for it is really awkward. Thus, there's no technical reason C# can't have that, but the C# language designers would have to come up with a better syntax.
  • Scripting. F# is great for scripting, but as the success of scriptcs has shown, nothing prevents C# from being a scripting language either.
  • REPL. A REPL is a really nice tool, but scriptcs already comes with a REPL, again demonstrating that C# could have that too.
This list in no way implies that C# will get any or all of these features. While I'm an MVP, I have no inside insight; I'm only speculating. My point is that I see no fundamental reason C# couldn't eventually get those features.

What F# features can C# never get?

There are a few F# features that many people point to as their favourite, that C# is unlikely to get. A couple of them are:

  • Type Providers. Someone that I completely trust on this issue told me quite authoritatively that "C# will never get Type Providers", and then laughed quietly. While I don't know enough about the technical details of Type Providers to be able to evaluate that statement, I trust this person completely on this issue.
  • Units of Measure. Here, I simply have to confess ignorance. While I haven't seen talk about units of measure for C#, I have no idea whether it's doable or not.
These are some loved features of F# that look unlikely to be ported to C#, but there's one quality of F# that I'm absolutely convinced will never make it to C#, and this is one of the killer features of F#: it's what you can't do in the language.

In a recent article, I explained how less is more when it comes to language features. Many languages come with redundant features (often for historical reasons), but the fewer redundant features a language has, the better.

The F# compiler doesn't allow circular dependencies. You can't use a type or a function before you've defined it. This may seem like a restriction, but is perhaps the most important quality of F#. Cyclic dependencies are closely correlated with coupling, and coupling is the deadliest maintainability killer of code bases.

In C# and most other languages, you can define dependency cycles, and the compiler makes it easy for you. In F#, the compiler makes it impossible.

Studies show that F# projects have fewer and smaller cycles, and that there are types of cycles (motifs) you don't see at all in F# code bases.

The F# compiler protects you from making cycles. C# will never be able to do that, because it would be a massive breaking change: if the C# compiler was changed to protect you as well, most existing C# code wouldn't be able to compile.

Microsoft has never been in the habit of introducing breaking changes, so I'm quite convinced that this will never happen.

Summary

C# could, theoretically, get a lot of the features that F# has, but not the 'feature' that really matters: protection against coupling. Since coupling is one of the most common reasons for code rot, this is one of the most compelling reasons to switch to F# today.

F# is a great language, not only because of the features it has, but even more so because if the undesirable traits it doesn't have.


Comments

I would say C# won't get non-nullable reference types either, even in the form of a compiler trick. It would either introduce too much of breaking changes or be very limited and thus not especially usefull.

2015-04-18 15:36 UTC

Vladimir, thank you for writing. You're probably correct. Many years ago, I overheard Anders Hejlsberg say that it wouldn't be possible to introduce non-nullable reference types into the .NET platform without massive breaking changes. I can't say I ever understood the reasoning behind this (nor was it ever explained to me), but when Anders Hejlsberg tells you that, you sort of have to accept it :)

FWIW, there's a bit of discussion about non-nullable reference types in the C# Design Meeting Notes for Jan 21, 2015, but I have to admit that I didn't follow the link to Eric Lippert's blog :$

2015-04-18 18:57 UTC

Less is more: language features

Monday, 13 April 2015 08:16:00 UTC

Many languages have redundant features; progress in language design includes removing those features.

(This post is also available in Japanese.)

There are many programming languages, and new ones are being introduced all the time. Are these languages better than previous languages? Obviously, that's impossible to answer, since there's no clear measurement of what constitutes a 'better' programming language.

Still, if you look at a historical trend, it looks as though one way to make a better language is to identify a redundant language feature, and design a new language that doesn't have that feature.

"perfection is attained not when there is nothing more to add, but when there is nothing more to remove." - Antoine de Saint Exupéry

In this article, you'll see various examples of language features that have already proven to be redundant, and other features where we are seeing strong indications that they are redundant.

Limitless ways to shoot yourself in the foot.

When the first computers were created, programs had to be written in machine code or assembly language. In machine code, you can express everything the CPU can do, because the machine code is written it terms of a CPU's instruction set. While it's possible to write correct programs in machine code, you can also write a lot of incorrect programs, including programs that crash horribly, or perhaps even destroy the machine on which they are running.

You're most likely used to writing code in a higher-level language, but even so, if you share my experience with this, you'll agree that it takes a lot of trial and error to get things right. For every correct program, there are many incorrect variations. The set of incorrect programs is much bigger than the set of correct programs.

With machine code, you have limitless ways to create incorrect programs. Yes: you can express everything the CPU can execute, but most of it will be incorrect. Thus, the set of valid programs is a small subset of the set of all possible instructions.

The set of all valid programs, inside the much larger set of all possible instructions.

Early computer programmers quickly discovered that writing in machine code was extremely error-prone, and the code was also as unreadable as it could be. One way to address this problem was to introduce assembly language, but it didn't solve the underlying problems: most assembly languages were just 'human-readable' one-to-one mappings over machine code, so you still have unlimited ways to express incorrect programs.

High-level languages

After having suffered with machine code and assembly language for some time, programmers figured out how to express programs in high-level languages. The first programming languages aren't in much use today, but a good example of a 'low-level' high-level language still in use today is C.

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of all possible instructions in a high-level language.

In C, you can express almost all valid programs. There are still tons of ways to write incorrect programs that will crash the program (or the computer), but since the language is an abstraction over machine code, there are instructions you can't express. Most of these are invalid instruction sequences anyway, so it's for the better.

Notice what happened: moving from machine code to a high-level programming language removes a feature of the language. In machine code, you can express anything; in a high-level language, there are things you can't express. You're okay with that, because the options that were taken away from you were bad for you.

GOTO

In 1968, Edsger Dijkstra published his (in)famous paper Go To Statement Considered Harmful. In it, he argued that the GOTO statement was an bad idea, and that programs would be 'better' without the GOTO statement. This sparked a decade-long controversy, but these days we've come to understand that Dijkstra was right. Some of the most popular languages in use today (e.g. Java and JavaScript) don't have GOTO at all.

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of all possible instructions in a high-level language without GOTO.

Since GOTO has turned out to be an entirely redundant language feature, a language without it doesn't limit your ability to express valid programs, but it does limit your ability to express invalid programs.

Do you notice a pattern?

Take something away, and make an improvement.

There's nothing new about this; Robert C. Martin told us that years ago.

You can't just arbitrarily take away anything, because that may constrain you in such a way that there are valid programs you can no longer write:

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of a language that takes away the wrong features.

You have to take away the right features.

Exceptions

Today, everyone seems to agree that errors should be handled via some sort of exception mechanisms; at least, everyone can agree that error codes are not the way to go (and I agree). We need something richer than error codes, and something that balances usefulness with robustness.

The problem with exceptions is that this mechanism is really only GOTO statements in disguise - and we've already learned that GOTO is considered harmful.

A better approach is to use a sum type to indicate either success or failure in a composable form.

Pointers

As Robert C. Martin also pointed out, in older languages, including C and C++, you can manipulate pointers, but as soon as you introduce polymorphism, you no longer need 'raw' pointers. Java doesn't have pointers; JavaScript doesn't do pointers; C# does allow you to use pointers, but I've personally never needed them outside of interop with the Windows API.

As these languages have demonstrated, you don't need access to pointers in order to be able to pass values by reference. Pointers can be abstracted away.

Numbers

Most strongly typed languages give you an opportunity to choose between various different number types: bytes, 16-bit integers, 32-bit integers, 32-bit unsigned integers, single precision floating point numbers, etc. That made sense in the 1950s, but is rarely important these days; we waste time worrying about the micro-optimization it is to pick the right number type, while we lose sight of the bigger picture. As Douglas Crockford explains, in JavaScript there's only a single number type, which is a great idea - just too bad it's the wrong single number type.

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of a language with a single, sane number type.

With the modern computer resources we have today, a programming language that would restrict you to a single, sane number type would be superior to the mess we have today.

Null pointers

Nulls are one of the most misunderstood language constructs. There's nothing wrong with the concept of a value that may or may not be present. Many great programming languages have this concept. In Haskell, it's called Maybe; in F#, it's called option; in T-SQL it's called null. What's common in all these languages is that it's an opt-in language feature: you can declare a value as being 'nullable', but by default, it isn't (nullable).

However, due to Tony Hoare's self-admitted billion-dollar mistake, mainstream languages have null pointers: C, C++, Java, C#. The problem isn't the concept of 'null', but rather that everything can be null, which makes it impossible to distinguish between the cases where null is an appropriate and expected value, from the cases where null is a defect.

Design a language without null pointers, and you take away the ability to produce null pointer exceptions.

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of a language without null pointers.

If Tony Hoare is correct that null pointers cost billions of dollars in the last few decades, getting rid of that source of defects can save you lots of money. From Turing-complete languages like T-SQL, Haskell, and F# we know that you can express all valid programs without null pointers, while a huge source of defects is removed.

Mutation

A central concept in Procedural, Imperative, and Object-Oriented languages is that you can change the value of a variable while executing your program. That's the reason it's called a variable. This seems intuitive, since a CPU contains registers, and what you actually do when you execute a program is that you move values in and out of those registers. It's also intuitive because the purpose of most programs is to change the state of the world: Store a record in a database. Send an email. Repaint the screen. Print a document. Etc.

However, it turns out that mutation is also a large source of defects in software, because it makes it much harder to reason about code. Consider a line of C# code like this:

var r = this.mapper.Map(rendition);

When the Map method returns, has rendition been modified? Well, if you follow Command Query Separation, it shouldn't, but the only way you can be sure is to review the implementation of the Map method. What if that method exhibits the same problem, by calling into other methods that could mutate the state of the application? There's nothing in C# (or Java, or JavaScript, etc.) that prevents this from happening.

In a complicated program with a big call stack, it's impossible to reason about the code, because everything could mutate, and once you have tens or hundreds of variables in play, you can no longer keep track of them. Did the isDirty flag change? Where? What about the customerStatus?

Imagine taking away the ability to mutate state:

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of the set of possible instructions in a language without mutation.

Most languages don't completely take away this ability, but as Haskell (a Turing complete language) demonstrates, it's possible to write any program without implicit state mutation.

At this point, many people might object that Haskell is too difficult and unintuitive, but to me, that kind of argumentation is reminiscent of the resistance to removing GOTO. If you are used to relying on GOTO, you have to learn alternative ways to model the same behaviour without GOTO. Likewise, if you are used to relying on state mutation, you have to learn alternative ways to model the same behaviour without state mutation.

Reference Equality

In Object-Oriented languages like C# and Java, the default equality comparison is Reference Equality. If two variables point to the same memory address, the variables are considered equal. If two variables have all identical constituent values, but point to two different memory addresses, they are considered different. That's not intuitive, and many software defects are caused by this.

What if you take away Reference Equality from a language?

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of the set of possible instructions in a language without Reference Equality.

What if all data structures instead had Structural Equality?

This one I'm not entirely sure about, but in my experience, I almost never need Reference Equality. For Basic Correctness you never need it, but I wonder if you may need the occasional reference comparison in order to implement some performance optimizations... Still, it wouldn't hurt to switch the defaults so that Structural Equality was the default, and there was a special function you could invoke to compare two variables for Reference Equality.

Inheritance

Even in 2015, inheritance is everywhere, although it's more than 20 years ago the Gang of Four taught us to favor object composition over class inheritance. It turns out that there's nothing you can do with inheritance that you can't also do with composition with interfaces. The converse doesn't hold in languages with single inheritance: there are things you can do with interfaces (such as implement more than one), that you can't do with inheritance. Composition is a superset of inheritance.

This isn't just theory: in my experience, I've been able to avoid inheritance in the way I've designed my code for many years. Once you get the hang of it, it's not even difficult.

Interfaces

Many strongly typed languages (for example Java and C#) have interfaces, which is a mechanism to implement polymorphism. In this way, you can bundle various operations together as methods on an interface. However, one of the consequences of SOLID is that you should favour Role Interfaces over Header Interfaces, and as I've previously explained, the logical conclusion is that all interfaces should only have a single member.

When interfaces only have a single member, the interface declaration itself tends to mostly be in the way - the only thing that matters is the operation. In C#, you could just use a delegate instead, and even Java has lambdas now.

There's nothing new about this. Functional languages have used functions as the basic compositional unit for years.

In my experience, you can model everything with single-member interfaces, which also implies that you can model everything with functions. Again, this isn't really surprising, since functional languages like Haskell are Turing complete too.

It's possible to get by without interfaces. In fact, Robert C. Martin finds them harmful.

Reflection

If you've ever done any sort of meta-programming in .NET or Java, you probably know what Reflection is. It's a set of APIs and language or platform features that enable you to inspect, query, manipulate, or emit code.

Meta-programming is an indispensable tool, so I'd be sorry to lose it. However, Reflection isn't the only way to enable meta-programming. Some languages are homoiconic, which means that a program in such languages is structured as data, which itself can be queried and manipulated like any other data. Such languages don't need Reflection as a feature, because meta-programming is baked into the language, so to speak.

In other words: Reflection is a language feature aimed at the goal of meta-programming. If meta-programming can be achieved via homoiconicity instead, it would imply that Reflection is a redundant feature.

Cyclic Dependencies

While it may be true that null pointers are the biggest single source of software defects, in my experience the greatest single source of unmaintainable code is coupling. One of the most problematic types of coupling is cyclic dependencies. In languages like C# and Java, cyclic dependencies are almost impossible to avoid.

Here's one of my own mistakes that I only discovered because I started to look for it: in the otherwise nice and maintainable AtomEventStore, there's an interface called IXmlWritable:

public interface IXmlWritable
{
    void WriteTo(XmlWriter xmlWriter, IContentSerializer serializer);
}

As you can tell, the WriteTo method takes an IContentSerializer argument.

public interface IContentSerializer
{
    void Serialize(XmlWriter xmlWriter, object value);
 
    XmlAtomContent Deserialize(XmlReader xmlReader);
}

Notice that the Deserialize method returns an XmlAtomContent value. How is XmlAtomContent defined? Like this:

public class XmlAtomContent : IXmlWritable

Notice that it implements IXmlWritable. Oh, dear!

Although I'm constantly on the lookout for these kinds of things, that one slipped right past me.

In F# (and, I believe, OCaml), on the other hand, this wouldn't even have compiled!

While F# has a way to introduce small cycles within a module using the and and rec keywords, there are no accidental cycles. You have to explicitly use those keywords to enable limited cycles - and there's no way to define cycles that span modules or libraries.

What an excellent protection against tightly coupled code! Take away the ability to (inadvertently) introduce Cyclic Dependencies, and get a better language!

The set of all valid programs, inside the much larger set of all possible instructions, with the overlay of the set of possible instructions in a language that disallows cycles.

This has even been shown to work in the wild: Scott Wlaschin examined C# and F# projects for cycles, and found that F# projects have fewer and smaller cycles than C# projects. This analysis was later enhanced and corroborated by Evelina Gabasova.

Summary

What I've tried to illustrate in this article is that there are many ways in which you could make a better language by taking away a particular feature from an existing language. Take away a redundant feature, and you'll still have a Turing complete language that can do (close to) anything, but with fewer options for shooting yourself in the foot.

Perhaps the ultimate programming language is a language without:

  • GOTO
  • Exceptions
  • Pointers
  • Lots of specialized number types
  • Null pointers
  • Mutation
  • Reference equality
  • Inheritance
  • Interfaces
  • Reflection
  • Cyclic dependencies

Have I identified all possible redundant features? Most likely not, so here's a great opportunity for language designers to define an even better language, by finding something new to take away!


Comments

The idea that Javascript is a superset of "valid programs", but that C is not, could do with more explanation.

My background is in server code (Haskell/Java/Python/Javascript/Rust), web code (Javascript) and embedded code (Rust/C) (with some ARM assembly for when Rust/C can't produce the needed program unaided). This makes the idea that I could express programs in Javascript, which I couldn't in C, very interesting.

I think that Rust is going to be very important in embedded development, in a few years.

P.S. I hope I have commented in the correct way.

2015-04-14 8:07 UTC

Chris, thank you for writing. Where in this article do I claim that there are programs you could express in JavaScript, but not in C?

2015-04-14 14:41 UTC

Thanks for the post Mark. Some great points; I like especially the idea of disallowing cyclic dependencies. That'd be awesome on a legacy Java project I'm working on now!

Having working in Scala and a little Haskell, I can say I love the Maybe type. One thing I have trouble visualising still is Exceptions. I think I need to find some more examples on doing without exceptions in real applications.

I guess there's somewhat of a middle ground between having the "safest" language, vs the most performant language. In a lot of cases you don't need awesome performance so the safest language is the better one... but in some you do. You occasionally need control over how your bits are packed.

Let's hope for the future!

2015-04-16 6:24 UTC

Lachlan, thank you for writing. When it comes to working without exceptions, the point is to replace them with something stronger. Scott Wlashin's post and presentation about Railway Oriented Programming is a great place to start. You can see another example in my No Mocks presentation, and in one of my up-coming Pluralsight courses.

When it comes to the balance between safe and performant, there will always be room for languages that sacrifice safety for speed, but in my experience, this shouldn't be a common concern. As soon as you're doing any sort of significant I/O, the cost of that tends to be so much higher than any potential imperfections in pure CPU processing.

Apart from that, I don't see why, in principle, a safe language couldn't also be performant. These two traits don't seem to me to be intrinsically mutually exclusive.

2015-04-17 8:55 UTC

As one of less-is-more advocates, I feel your article makes sense instinctively. However, one thing keeps bugging me: How do you define "valid programs"?

I think I know what you try to mean with it, but once we try to be more specific, validity can only be defined in terms of a certain framework of semantics. If we have such a framework, then we can derive a programming language from it, which would be a language that only accepts valid programs and reject others. Problem solved.

In reality we don't have such a framework. We don't know what is a valid program precisely. There are programs that doesn't seem to make sense, and there are programs that are obviously useful, but there are huge gray area inbetween. It's actually a moving target---the context, the environment, the user, and other outside factors will greatly affect what's valid or not. But the fact that we don't know what valid programs shakes the ground of this whole discussion, doesn't it? Or it can end up tautology---"valid programs are defined in terms of this language, so this language fit the best to cover the valid programs."

2015-05-15 03:09 UTC

Shiro Kawai, thank you for writing. In this article, I deliberately left the definition of validity vague. Why do we write software? If we exclude Katas and the like (which we do for training), software is ultimately written in order to solve problems; in order to be used. A valid program is a program that's useful.

This doesn't change the discussion, which is pragmatic. From experience, we know that we don't need GOTO; from experience, we know that we don't need pointers; from experience, we know that we don't need null pointers; from experience, we know that we don't need inheritance; etc.

2015-05-15 5:47 UTC

Hi, thanks for the reply. I believe I get your intent. What I wanted was to point out a danger of this kind of argument. Because when you write software to solve problems, you need to articulate the problem, and the way you frame the problem is inevitably influenced by the frame of your language of choice. In other words, when you think you tighten the circle of the language features, it's not necessarily that you get it close to supposed set of "valid programs", but you may just as well start ignoring useful programs outside of your langauge circle and you don't even realize that. You just think "ok, there may be some programs that falls outside but it's not that important." How do you verify that the excluded gray area isn't that important?

(I wrote "you", but actually this is something I constanly try to remind myself, since I tend to be dragged toward feature-minimalism. I love Scheme.)

For example, you talk single-member inheritance and you brought up Haskell, but you didn't mention type classes. Isn't type classe sort of interface, in a sense that it defines a protocol consists of multiple functions? Do you say it's actually redundant? Or will you dismiss programs that can be expressed well using type classes "unimportant"?

Another controversial (and probably obscure) feature is change-class in CL. This one is such a beast that it's difficult to cope with many modern features (obviously it invalidates strict typecheckers). However, in the context where the feature is needed, I don't know if there's a replacement. If you get rid of it, you just have to give up the kind of software that require the feature. That's a reasonable trade-off, but what you do is that you cut off a part of "valid programs" in order to tighten the circle.

There's no solid circle of valid programs, and as you tighten the circle of the language, you actually shape the circle of valid programs that suit to the circle of language you draw. That's not necessarily a bad thing, but the language designers need to be aware of it, that's what I wanted to say.

2015-05-15 7:40 UTC

It's a good point that a language shapes how you approach problems. In this discussion, I assume that all languages (both existing and hypothetical) are Turing complete, but even under that assumption, there will be problems that are difficult, or perhaps even impossible, to address in a particular language.

The question is whether that isn't true for all languages?

In a sense, the most 'powerful' languages are all the dialects of machine code. By definition, they should be able to express everything a particular CPU can do, so by corollary, they should be the most complete languages available. Despite being complete, these dialects don't have inheritance, exceptions, Reflection, interfaces, or a lot of other things. Such features are features that have been added to some languages. Logically, I don't see how 'removing' such a feature constrains one's ability to express 'all valid programs'.

To be fair, that argument of mine doesn't apply to all the language features I've described. For example, CPU instruction sets most certainly allow immutability.

What I was aiming at with my article was never a formal discussion about how a hypothetical language would, or wouldn't, be able to express 'all valid programs'. The purpose was more to point out that decades of experience should by now have taught the overall programmer community that there are many language features that we can do without.

As I pointed out, you can't arbitrarily remove features. Some features are extremely important within a particular language, while it may be irrelevant in another language. While I don't know much about Type Classes, it seems to be an example of this: it's very important in Haskell, but doesn't exist at all in C# or F#.

This is also the reason why I don't think we'll ever have a single, perfect programming language. In the future, we'll also have many different languages, and they'll be good at different types of problems.

Just as it is today, it'll be important to know more than a single programming language, because, exactly as you wrote, a problem may 'fall outside' a given language, but then, if you know another language, you may realise that you can use it to solve that particular problem.

2015-05-16 14:29 UTC

Unit Testing with F# Pluralsight course

Thursday, 02 April 2015 13:38:00 UTC

My latest Pluralsight course is an introduction to unit testing with F#.

Perhaps you already know all about unit testing. Perhaps you already know all about F#. But do you know how to write unit tests in F#?

Unit testing and F# Venn diagram.

My new Pluralsight course explains how to write unit tests with F#. If you already know F# and unit testing on .NET, it's quite straightforward. This is my first beginner-level course on Pluralsight, so regular readers of this blog may find it too basic.

Still, if you don't know what Unquote is and can do for you, you may want to consider watching module four, which introduces this great assertion library, and provides many examples.

This entire course will, together with some of my existing Pluralsight courses, serve as a basis for more courses on F# and Test-Driven Development.


POSTing JSON to an F# Web API

Thursday, 19 March 2015 16:02:00 UTC

How to write an ASP.NET Web API service that accepts JSON in F#.

It seems that many people have problems with accepting JSON as input to a POST method when they attempt to implement an ASP.NET Web API service in F#.

It's really quite easy, with one weird trick :)

You can follow my recipe for creating a pure F# Web API project to get started. Then, you'll need to add a Data Transfer Record and a Controller to accept your data:

[<CLIMutable>]
type MyData = { MyText : string; MyNumber : int }
 
type MyController() =
    inherit ApiController()
    member this.Post(myData : MyData) = this.Ok myData

That's quite easy; there's only one problem with this: the incoming myData value is always null.

The weird trick

In addition to routes etc. you'll need to add this to your Web API configuration:

GlobalConfiguration.Configuration.Formatters.JsonFormatter.SerializerSettings.ContractResolver <-
    Newtonsoft.Json.Serialization.CamelCasePropertyNamesContractResolver()

You add this in your Application_Start method in your Global class, so you only have to add it once for your entire project.

The explanation

Why does this work? Part of the reason is that when you add the [<CLIMutable>] attribute to your record, it causes the record type to be compiled with auto-generated internal mutable fields, and these are named by appending an @ character - in this case, the field names become MyText@ and MyNumber@.

Apparently, the default JSON Contract Resolver (whatever that is) goes for those fields, even though they're internal, but the CamelCasePropertyNamesContractResolver doesn't. It goes for the properly named MyText and MyNumber writeable public properties that the compiler also generates.

As the name implies, the CamelCasePropertyNamesContractResolver converts the names to camel case, so that the JSON properties become myText and myNumber instead, but I only find this appropriate anyway, since this is the convention for JSON.

Example HTTP interaction

You can now start your service and make a POST request against it:

POST http://localhost:49378/my HTTP/1.1
Content-Type: application/json

{
    "myText": "ploeh",
    "myNumber": 42
}

This request creates this response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{"myText":"ploeh","myNumber":42}

That's all there is to it.

You can also receive XML instead of JSON using a similar trick.


Property Based Testing without a Property Based Testing framework

Monday, 23 February 2015 20:03:00 UTC

Sometimes, you don't need a Property-Based Testing framework to do Property-Based Testing.

In my previous post, I showed you how to configure FsCheck so that it creates char values exclusively from the list of the upper-case letters A-Z. This is because the only valid input for the Diamond kata is the set of these letters.

By default, FsCheck generates 100 random values for each property, and runs each property with those 100 values. My kata code has 9 properties, so that means 900 function calls (taking just over 1 second on my Lenovo X1 Carbon).

However, why would we want to select 100 random values from a set of 26 valid values? Why not simply invoke each property (which is a function) with those 26 values?

That's not so hard to do, but if there's a way to do it with FsCheck, I haven't figured it out yet. It's fairly easy to do with xUnit.net, though.

What you'll need to do is to change the Letters type to an instance class implementing seq<obj[]> (IEnumerable<object[]> for the single C# reader still reading):

type Letters () =    
    let letters = seq {'A' .. 'Z'} |> Seq.cast<obj> |> Seq.map (fun x -> [|x|])
    interface seq<obj[]> with
        member this.GetEnumerator () = letters.GetEnumerator()
        member this.GetEnumerator () =
            letters.GetEnumerator() :> Collections.IEnumerator

This is simply a class that enumerates the char values 'A' to 'Z' in ascending order.

You can now use xUnit.net's Theory and ClassData attributes to make each Property execute exactly 26 times - one for each letter:

[<TheoryClassData(typeof<Letters>)>]
let ``Diamond is as wide as it's high`` (letter : char) =
    let actual = Diamond.make letter
 
    let rows = split actual
    let expected = rows.Length
    test <@ rows |> Array.forall (fun x -> x.Length = expected) @>

Instead of 900 tests executing in just over 1 second, I now have 234 tests executing in just under 1 second. A marvellous speed improvement, and, in general, a triumph for mankind.

The point is that if the set of valid input values (the domain) is small enough, you may consider simply using all of them, in which case you don't need a Property-Based Testing framework. However, I still think this is probably a rare occurrence, so I'll most likely reach for FsCheck again next time I need to write some tests.


Page 1 of 31

"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!