Byte Rot: ContentNegotiation

Saturday, 8 December 2012

5 levels of media type

[Level C4] HTTP spec defines use of media type in the value of serveral headers including content-type and Accept. Server and client can engage in the process of content negotiation to decide on the best suitable media type.

With the rising popularity of the REST systems and adoption of pure HTTP APIs, we are using media type not only for delivering content formatting information but also metadata for domain level application data. Advanced use of the media type and its controversies has been discussed before.

IANA is responsible for registering internet media types. RFC 4288 defines the process of media type registration through IANA.

As can be seen below, registration of media types increased by 2000 dot-com boom and then declined. This trend was reversed again by the REST awareness and resurgence of web 2.0 around 2006. Recently, we see a slight decline in the registrations - partly explained by the use of private media types.

Source: IANA - http://www.iana.org/assignments/contact-people.html

While there will always be a need for registering new formats, media type has been used to described not only the format but also the schema and domain level description of the application data. Use of media type for versioning resources is a controversial yet fairly popular trend.

The problem with creating new media types for describing anything other than format is that you you will be requiring the clients to understand the new media type - hence the client burden of the media type. Such clients could be very well capable of handling the format (and all they need might be to use understand the format) but unable to comprehend the media type. For example an XmlSerializer is capable of handling the XML format and that is all it cares about.

One such attempt to conserve the formatting information of the media types yet provide higher level constructs is the use of + in the second part of the media types such as application/rss+xml which combines formatting with schema (see below). But as I explained before many systems use a dictionary-based media type processing and cannot separate format and schema information and also this is rather a convention and not a canonical implementation.

I will review the logical levels of media type and then propose a solution for the current issues we are experiencing in the industry.

5 levels of media type

Media type deals with different levels of information. Any solution needs to take into account backward-compatibility, interoperability and extensibility. Media type can provide information at different levels. As the level goes up, the number of clients able to comprehend and interact with that level diminishes.

Lowest level of information is whether the content is human-readable. This was initially envisaged in the text/* media type but there again, it was mixing human-readability with the formatting. As we know text/xml and text/javascript later were converted to application/xml and application/javascript.

Next level is formatting, i.e. how a parser/processor can read and understand the media type. This is the most important aspect of a media type. Examples are application/xml, image/* and video/*.

Schema is a common superset of the formatting. Here we define different schema commonly within the same format. Examples of this are application/rss+xml, application/atom+xml and application/collection+json.

Domain level is where we have a lot of new interest. Many companies are using private and public APIs for exposing their data and services. As such, a recent trend is to take the schema to the next level where it defines a domain ~~object~~ model. As we have described before here, domain model is part of the server's public domain and could be in the format of command or query messages.

Including version information for a domain ~~object~~ model is the highest level of a media type. This is useful by only a subset of the clients capable of version coherence and version content negotiation.

In fact clients each can use the media type at a particular level:

So forcing he higher levels of media type upon clients will reduce interoperability.

Solution

I propose using additional properties for preserving the interoperability and backward-compatibility yet allowing rich higher level information. Currently HTTP 1.1 allows for custom additional parameters to be defined.

For example, if I am using application/atom+xml for passing customer domain ~~object~~ model in a CRM API, I can keep the format in the value of the content-type header and include the rest of the information (Note: we have to replace / with _ since it is not allowed in the parameter value according to HTTP spec):

content-type: application%2fxml;schema=application%2fatom+xml;is-text=true;domain-model=MyDomain.Customer;version=1.0.2.0

Or alternatively use the schema level information as the main value:

content-type: application%2fatom+xml;format=application%2fxml;is-text=true;domain-model=MyDomain.Customer;version=1.0.2.0

Please note the values of parameters need to be UrlEncoded.

Compared to single-token approach, this will preserve the interoperability and backward-compatibility while allowing for extensibility.

Difference of single-token vs. 5LMT. Please note the the values need to be UrlEncoded so application/xml will appear as application%2fxml

Conclusion

Registering media types should be done mainly for new formats. The problem with using a single token is that by setting the token at a any level, lower levels need to be inferred - as such client needs to understand the exact media type.

5-level media type compared to the single token approach provides a more robust solution for extensibility and interoperability of clients and servers in private and public APIs.

Monday, 22 October 2012

Media type: how much can you cram into a single token?

[Level C4]

Introduction

This post discusses the problems associated with the use of a single token as media type (usually as the main value of the Content-Type header in HTTP response or Accept header in request) to describe all attributes of the content.

Motivation and background

This has been bugging me for a while. But recently I engaged in a discussion on twitter with Glenn Block @gblock and the rest of the REST enthusiast community on the options in versioning RESTful services. There are generally 2 camps: those advocating using Content Negotiation for versioning (putting version number in Content-Type header) and those preferring to stick to classic resource based versioning (including version number in the URL). Regardless of which one is better, MediaType lacks the richness required to express a media type and adding version information to a media type is not possible considering current status of the media type.

One of the main problems associated with the use of media type is its current implementation in various systems is key based, i.e. it involves matching all or none of the media type. As we will see this causes considerable problems in effective consumption of media types.

Media Type

Media type has been described in various RFCs (main one being RFC 2046) while historically these have been limited what is known as MIME types. RFC 4288 defines the procedure for registering the media types describing a formal process which needs to be followed to publicly register.

Registering a media type for a public API is all well and good but as described by this book, use of private APIs far exceeds use of public ones and registering all media types exposed within private APIs is impractical and unwarranted.

Also with popularity of REST-based APIs, there are going to be more and more service endpoints exposed. If all such services are to define new media types, we would have an explosion of media types rendering current implementation of content negotiation

Media type is a case of an extreme semantic mix-up. A single token has been used to express many different facets of a media type. In fact the semantic space with all its axes will contain many useful points yet industry currently uses a very sparse set of points defined as media type values. Rest of this space is unusable - as such a very inefficient solution.

We will now have a look at facets/axes.

1- Human-illegibility

This is the lowest and least specific level of semantic definition of a media type. It is very simple: content of a media type can be read by a human (for example text/plain, application/xml or application/json) or the data is meant for the machine comprehension or rendering (for example image/png or video/mpeg)

Having this information separate to the actual media type can help tools such as Fiddler to decide whether they can display text of the content whose media type is unknown to the tool. Media types initially used "text" to denote such information (e.g. text/xml or text/javascript) but these have been replaced with

2- Formatting

This is the most common and important axis of a media type information which informs the tools/clients which parser/interpreter/renderer to use for consuming such content. text/plain, application/xml, application/json, image/png or video/mpeg are all examples of such use of the media type.

There are several known vendor-specific media types in this space such as application/vnd.ms-excel.

3- Schema

This is a further specialisation of the formatting. Common examples include application/rss+xml or application/hal+json. Basically these mean that in terms of formatting, they are the same as their parent (application/xml or application/json) yet they follow a superset schema. Use of + sign - as far as I know - is not canonical and is merely a convention followed by the industry to add schema to the established formats. Comprehension of this convention would be crucial to correct interpretation of the media type without the need for having a dictionary of all possible values, however, I believe most tools we have at the moment lack such features.

4- Domain/Vendor specific

This is where we see most of the expansion in the media type space. Basically you could output your own media type via your private API. Since you will be the main consumer of the API, integration could be easy but it is very common for private APIs to go public - especially if they are successful. An example of such media types can be found here.

5- Versioning

Versioning is the highest aspect of a media type which is normally added to Domain-specific media types. This is a popular solution to the Web API versioning problem.

For example, you could have application/mydomain.customer.1.1 as opposed to application/mydomain.customer or application/mydomain.customer.1.0

So where is the problem?

Basically information gets lost.

First problem is that clients might be interested in a lower order of these aspects of media type while in order to consume the resource, they are forced to comprehend higher order and extract the axes they are interested in. For example, a tool such as fiddler could be only interested in only whether it could display the information for the end user as plain text. A client capable of consuming XML and deserialising to objects is only interested at knowing whether it is XML while it might be represented with a media type which is essentially XML but has a different value. On the other hand, if a server uses HAL to send domain objects/view models to the client, either it has to use the standard application/hal+json or use the domain level name of the media type (with or without a version).

Another problem is that the content negotiation process will become more complex. In the lack of a standard in defining multi-axial media types, most systems implement a dictionary based rule on content negotiation as such maintaining list of possible content types becomes a burdensome task.

A solution

Basically I believe we can solve this by keeping the common media types but use media type extensions in the Content-Type header (or in the Accept header). For example:

Content-Type: application/xml; human-illegible=true; domain-name=customer; domain-version=1.1

This will ensure that existing clients and servers will not break while new clients and servers can use new extensions for content negotiation and more loosely coupled resource consumption. I will try to expand upon this idea in another post.

Conclusion

Cramming as much as information into a single token and then try parsing that one token is not a good idea especially when it comes to media type which is the communication bridge between loosely coupled world of HTTP clients and servers.

Media type token value covers 5 different aspects of the resource and separating the concerns of breaking these aspects into their own tokens can result in more robust and decoupled systems.