REST lesson learned: Avoid user supplied data in URI segments by Mark Seemann
Be careful with user-supplied data in URI segments.
This is a lesson about design of REST APIs that I learned the hard way: if you built service URIs from dynamic data, be careful with the data you allow in URI segments.
Some HTTP frameworks (e.g. the ASP.NET Web API) let you define URI templates or routes that handle incoming requests - e.g.
which means that a request to
http://foo.ploeh.dk/api/fnaah/sgryt would map the value "fnaah" to controller, and the value "sgryt" to id.
If you are building a level 2 or less service, you may even publish your URI templates to consumers. If you are building a level 3 RESTful API, you may not be publishing your URI templates, but you can still use them internally.
Avoid user-supplied data in URI segments.
Be careful with the source of data you allow to populate URI segments of your URI template. At one point, I was involved with designing a REST API that (among other things) included a 'tag cloud' feature. If you wanted to see the contents of a specific tag, you could navigate to it.
Tags were all user-defined strings, and they had no internal ID, so our first attempt was to simply treat the value of the tag as the ID. That seemed reasonable, because we wanted the tag resource to list all the resources with that particular tag value. Thus, we modeled the URI template for tag resources like the above route template.
That worked well for a URL like
http://foo.ploeh.dk/api/tags/rock because it would simply match the value "rock" to the id variable, and we could then list all resources tagged with "rock".
However, some user had defined a tag with the value of "sticky & sweet" (notice the ampersand character), which meant that when you wanted to see all resources with this link, you would have to navigate to
http://foo.ploeh.dk/api/tags/sticky & sweet. However, that sort of URL is considered dangerous, and IIS will (by default) refuse to handle it.
Can you get around this by URL encoding the value? No, it's part of the request path, not part of any query string, so that's not going to work. The issue isn't that the URL is invalid, but that the server considers it to be dangerous. Even if you URL encode it, the server will decode it before handling it, and that would you leave you at square one.
You can either change the URI template so that the URL instead becomes
http://foo.ploeh.dk/api/tags?id=sticky%20%26%20sweet. This URL encodes the query string (the part of the URL that comes after the ?), but gives you an ugly URL.
Another alternative is to be very strict about input validation, and only allow users to create values that are safe when used as URI segments. However, that's putting an unreasonable technical limitation on an application feature. If a user wants to tag a resource with "sticky & sweet", the service should allow it.
In the end, we used a third alternative: we assigned an internal ID to all tags and mapped back and forth so that the URL for the "sticky & sweet" tag became
http://foo.ploeh.dk/api/tags/1234. Yes: that makes it impossible to guess the URL, but we were building a level 3 RESTful API, so clients are expected to follow links - not guess the URL.