Byte Rot: Media type: how much can you cram into a single token?

[Level C4]

Introduction

This post discusses the problems associated with using a single token as the media type (usually the main value of the Content-Type header in an HTTP response or the Accept header in a request) to describe all attributes of the content.

Motivation and background

This has been bugging me for a while. Recently, though, I engaged in a discussion on Twitter with Glenn Block (@gblock) and the rest of the REST enthusiast community on the options for versioning RESTful services. There are generally two camps: those advocating content negotiation for versioning (putting the version number in the Content-Type header) and those preferring to stick to classic resource-based versioning (including the version number in the URL). Regardless of which one is better, a media type as a single token lacks the richness required to fully describe content, and adding version information to it is not possible given the current state of media types.

One of the main problems with media types is that their current implementation in various systems is key-based, i.e. it involves matching all or none of the media type. As we will see, this causes considerable problems in consuming media types effectively.
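To make the all-or-nothing matching concrete, here is a minimal sketch of a key-based lookup. The `PARSERS` table and the `pick_parser` function are hypothetical names made up for illustration; they are not taken from any particular framework.

```python
import json

# Hypothetical registry keyed on the full media type string.
PARSERS = {
    "application/json": json.loads,
}


def pick_parser(media_type):
    """Return the parser registered for exactly this media type, or None."""
    return PARSERS.get(media_type.strip().lower())


# An exact key matches.
known = pick_parser("application/json")

# HAL is JSON underneath, but the key differs, so the lookup finds nothing.
unknown = pick_parser("application/hal+json")
```

The second lookup fails even though the registered JSON parser could read the content, which is the kind of lost information this post is about.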

Media Type

Media types have been described in various RFCs (the main one being RFC 2046), and historically they have been limited to what are known as MIME types. RFC 4288 defines the procedure for registering media types, describing a formal process which needs to be followed to register one publicly.

Registering a media type for a public API is all well and good but as described by this book, use of private APIs far exceeds use of public ones and registering all media types exposed within private APIs is impractical and unwarranted.

Also, with the popularity of REST-based APIs, more and more service endpoints are going to be exposed. If all such services were to define new media types, we would have an explosion of media types rendering current implementation of content negotiation

Media type is a case of an extreme semantic mix-up. A single token is used to express many different facets of the content. In fact, the semantic space spanned by all these axes contains many useful points, yet the industry currently uses only a very sparse set of points, defined as media type values. The rest of this space is unusable, which makes this a very inefficient solution.

We will now have a look at facets/axes.

1- Human-legibility

This is the lowest and least specific level of semantic definition of a media type. It is very simple: either the content can be read by a human (for example text/plain, application/xml or application/json) or the data is meant for machine comprehension or rendering (for example image/png or video/mpeg).

Having this information separate to the actual media type can help tools such as Fiddler to decide whether they can display text of the content whose media type is unknown to the tool. Media types initially used "text" to denote such information (e.g. text/xml or text/javascript) but these have been replaced with

2- Formatting

This is the most common and important axis of media type information: it tells tools/clients which parser/interpreter/renderer to use for consuming the content. text/plain, application/xml, application/json, image/png or video/mpeg are all examples of this use of the media type.

There are several known vendor-specific media types in this space such as application/vnd.ms-excel.

3- Schema

This is a further specialisation of the formatting. Common examples include application/rss+xml or application/hal+json. Basically these mean that, in terms of formatting, they are the same as their parent (application/xml or application/json) yet they additionally follow a more specific schema. Use of the + sign is, as far as I know, not canonical and is merely a convention followed by the industry to add a schema to the established formats. Comprehension of this convention is crucial to interpreting the media type correctly without needing a dictionary of all possible values; however, I believe most tools we have at the moment lack such features.
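As a rough illustration of what comprehending the + convention could look like, here is a small sketch that splits a media type into its schema part and its underlying format. The function names `split_suffix` and `base_format` are my own for this example, and the sketch assumes that whatever follows the last + names the parent format.

```python
def split_suffix(media_type):
    """Split 'type/subtype+suffix' into (type, subtype, suffix or None)."""
    top_level, _, subtype = media_type.strip().lower().partition("/")
    name, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No '+' present: the whole subtype is the format itself.
        return top_level, subtype, None
    return top_level, name, suffix


def base_format(media_type):
    """Return the underlying format name, e.g. 'json' for hal+json."""
    _, subtype, suffix = split_suffix(media_type)
    return suffix if suffix else subtype


hal = split_suffix("application/hal+json")
plain = split_suffix("application/json")
```

With this, a client that only understands JSON can ask for `base_format(...)` and treat application/hal+json and application/json alike, without keeping a dictionary of every schema-specific value.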

4- Domain/Vendor specific

This is where we see most of the expansion in the media type space. Basically you could output your own media type via your private API. Since you will be the main consumer of the API, integration could be easy but it is very common for private APIs to go public - especially if they are successful. An example of such media types can be found here.

5- Versioning

Versioning is the highest-order aspect of a media type and is normally added to domain-specific media types. This is a popular solution to the Web API versioning problem.

For example, you could have application/mydomain.customer.1.1 as opposed to application/mydomain.customer or application/mydomain.customer.1.0.
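To show what a client has to do to get the version back out of such a token, here is a minimal sketch. It assumes the version is always a trailing run of dot-separated numbers on the subtype, which is only the naming scheme of the illustrative example above and not a standard; `split_version` is a hypothetical helper.

```python
import re

# Assumption: a version is a trailing '.<digits>(.<digits>)*' on the subtype.
_VERSIONED = re.compile(r"^(?P<name>.+?)\.(?P<version>\d+(?:\.\d+)*)$")


def split_version(media_type):
    """Return (subtype name, version tuple or None) for a media type."""
    _, _, subtype = media_type.strip().lower().partition("/")
    match = _VERSIONED.match(subtype)
    if match is None:
        return subtype, None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


versioned = split_version("application/mydomain.customer.1.1")
unversioned = split_version("application/mydomain.customer")
```

Every client and server has to agree on this private parsing rule, and a key-based matcher still sees three unrelated media types.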

So where is the problem?

Basically information gets lost.

The first problem is that clients might be interested in a lower order of these aspects of the media type, while in order to consume the resource they are forced to comprehend the higher orders and extract the axes they care about. For example, a tool such as Fiddler might only be interested in whether it can display the content to the end user as plain text. A client capable of consuming XML and deserialising it to objects is only interested in knowing whether the content is XML, yet that content might be represented with a media type which is essentially XML but has a different value. On the other hand, if a server uses HAL to send domain objects/view models to the client, it has to use either the standard application/hal+json or the domain-level name of the media type (with or without a version).

Another problem is that the content negotiation process becomes more complex. In the absence of a standard for defining multi-axial media types, most systems implement a dictionary-based rule for content negotiation, so maintaining the list of possible content types becomes a burdensome task.

A solution

Basically I believe we can solve this by keeping the common media types but using media type extensions (parameters) in the Content-Type header (or in the Accept header). For example:

Content-Type: application/xml; human-legible=true; domain-name=customer; domain-version=1.1

This will ensure that existing clients and servers will not break while new clients and servers can use new extensions for content negotiation and more loosely coupled resource consumption. I will try to expand upon this idea in another post.
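Here is a minimal sketch of how a client could read such a header, using the domain-name and domain-version parameters from the example above. `parse_content_type` is a hypothetical helper written for this post; it deliberately ignores edge cases such as semicolons inside quoted parameter values.

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (media type, parameter dict)."""
    media_type, *raw_params = header_value.split(";")
    params = {}
    for raw in raw_params:
        key, sep, value = raw.partition("=")
        if sep:
            # Parameter names are case-insensitive; drop optional quotes.
            params[key.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params


header = "application/xml; domain-name=customer; domain-version=1.1"
media_type, params = parse_content_type(header)

# An existing client only looks at the media type and keeps working.
is_xml = media_type == "application/xml"

# A newer client can additionally inspect the axes it cares about.
domain_version = params.get("domain-version")
```

The point of the design is that each axis lives in its own token, so a client can ignore the ones it does not understand instead of failing to match the whole value.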

Conclusion

Cramming as much information as possible into a single token and then trying to parse that one token is not a good idea, especially when it comes to the media type, which is the communication bridge in the loosely coupled world of HTTP clients and servers.

The media type token value covers 5 different aspects of the resource. Separating these concerns by breaking the aspects out into their own tokens can result in more robust and decoupled systems.


Monday, 22 October 2012
