
Saturday, 24 November 2012

Introducing Client-Server Domain Separation


[Level C4]

If you have followed my posts on REST and its client-server implications, you already know I have a thing for the client-server relationship.

I have been thinking about Client-Server Domain Separation (CSDS) for a while, and I think it is time to do a brain dump.

TL;DR

So here is the definition of CSDS - if you don't want to read the whole post. CSDS defines a single constraint, which is just an expansion of REST's client-server constraint:
Client and server must define and live in their own bounded context
This will lead to 1) cleaner separation of concerns between clients and servers and 2) adoption of APIs as the building blocks of complex systems in an SOA world. CSDS is also not compatible with HATEOAS - as we will see. If you want to find out how such a seemingly trivial constraint can have such an impact, read the rest.

Background

REST defines a set of constraints that will lead to better architecture and design - well, that is the claim, but I personally do believe in it. One of those constraints is client-server. As far as REST is concerned, client and server are decoupled entities - these ideas were successfully used in the design of HTTP. Yet considering the limitations of clients back in the day the REST dissertation was written, I think we need to revisit this constraint.

I suppose it all started with smartphones. We now have more processing power in our pockets than the Apollo spacecraft that first landed on the moon. Native apps allow for developing pretty complex applications, while the HTML/JS app has become a reality (better browsers, adoption of HTML5, better JavaScript runtimes and development tools).

The dilemma we are faced with now is to decide where to implement a functionality (in other words, where to put the business logic): client or server. Back in the late 90s and early 2000s we did not have a choice - we had to implement most of the functionality on the servers and use client-side code mainly for limited validation. We lived in a time of server dominance. Now we have the liberty to implement a sizeable chunk of functionality in both places, but getting the balance right is difficult and has led to two main opposing camps: Single Page Application followers and server domination supporters. I am more inclined towards the first, and shall explain below why CSDS leans more towards the SPA side rather than server domination - although not as a matter of taste but as a matter of principle: the question is whose concern a functionality is.

Other changes in the industry have contributed to the need to define client and server. Nowadays, it is only incidental that the client code in an HTML/JS application is served from the server - we can package up JavaScript files with the application, as in PhoneGap or Windows 8 Metro applications. In mash-up applications, there is no single server defining the flow, so HATEOAS is meaningless.

Introduction

Here I briefly reiterate what I explained in two related posts.

First of all, in order to decide where a functionality belongs - server or client - we need to understand whose concern it is. I talked about server-concern, client-concern and mixed-concern and explained their anti-patterns, each with an example.

In this post I tried to define client and server - as they stand now. So I am going to go back to the same definitions - with the client definition slightly changed.

Server is responsible for defining a domain (the server domain) and maintaining its state and consistency/integrity. Server is usually very complex, but a good server hides its complexity behind its API. Server should not expose its internals to the outside world.

Server in CSDS


Client is responsible for using the services of server(s) to provide value to the end user - either directly if it is a client device, or indirectly if the client itself is a server. Client can also maintain state, but that is not its primary function.

We also touched on Application. For me, application => value => user => client. Application, use and usability are mainly a client concern. Having said that, the server will define a secondary level of API which uses the underlying basic APIs and presents a more useful representation of its state. As such, the server is not completely oblivious to the user/value. For example, a high-street bank could have an API returning the 10 most recent transactions on your account, defined as a resource at /account/{id}/transaction/mostrecent, since this is a very common query. This is instead of, or in addition to, providing an API which allows the client to define date ranges and the number of transactions returned. The application sitting on a client device has the liberty to show only 6 of them if its usability mandates such a restriction.
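
To make this concrete, here is a minimal sketch of such a secondary-level resource as an ASP.NET Web API action - names such as Transaction and TransactionRepository are illustrative, not a real bank API:

public class TransactionController : ApiController
{
    // GET /account/{id}/transaction/mostrecent
    // Server decides what "most recent" means (here the last 10);
    // the client application is free to display fewer, e.g. only 6.
    public IEnumerable<Transaction> GetMostRecent(int id)
    {
        return TransactionRepository.Instance.GetByAccount(id)
            .OrderByDescending(t => t.Date)
            .Take(10);
    }
}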

CSDS definition

So CSDS can be seen as a superset style on top of REST, with a single constraint added to REST's client-server constraint. This is similar to, for example, HATEOAS which builds upon the hypermedia constraint. So the constraint is:
Client and server must define and live within their own bounded context.
In other words, the decision of where to put a functionality comes down to ascertaining whose concern it is. Is that it? Yes, that is it. Yet this is going to have quite a big impact as well as important repercussions.

First of all, since each defines its own boundary, the client's domain is separate from the server's domain. Their interaction is only through the API. "Domain objects" in the diagram above are usually regarded as view-models, which are translated versions of the corresponding models in these two different domains - what DDD calls a context map.

By keeping client and server in their own bounded contexts, the internals of each can be changed independently. By separating the domains, we achieve the client-server decoupling which is the goal of the client-server constraint - as Fielding puts it.

So here are some of the aspects and implications of CSDS:

Client has full coherence of server's public domain

This means that the client is free to have full coherence of the server's public domain, including all its public API, domain objects and schemata. It is able to call, discover and make full use of the public API in any order or fashion it needs.

Server is responsible for versioning its public domain

In CSDS, the client building a dependency on top of the public domain is not regarded as harmful and is actually seen as essential. Server already knows that by changing the public domain it will be breaking clients; as such, server is responsible for versioning its public domain.

Server has got no clue about the client

In CSDS, server has no reliance on its knowledge about the client calling it. Of course, in HTTP, it can use the User-Agent header for statistical purposes. Or in the case of OAuth, it can know the name of the application and perhaps even limit the scope of the public API accordingly, but this is an authorization concern - authentication and authorization of the calls are server concerns. In other words, it should not make any assumptions about the client, the client device or its capabilities.

Server has got no clue about the client - one of the clients could be a server itself (not shown; could not find the original Visio to add the server :( )

CSDS, HATEOAS and hypermedia

CSDS is not compatible with HATEOAS. Why? Well, HATEOAS talks about hypermedia (a server concern, but part of the public domain) as the engine of the application. What application? Server has got no clue about it. When I am listening to Spotify, I can tweet the song I am listening to. Publishing this tweet is no different from doing it from a Twitter client, a TuneIn radio client, etc. Server does not know what application is using it (although, as we said, it could know the name of the application in OAuth as a string) or where in the application this tweet happens. As such, it cannot be the engine of the application. Also, in a mash-up application no single server could be the engine - there are multiple servers.

CSDS regards hypermedia as an important aspect of REST; it is a semantic web of interconnected resources. Client will have full coherence of the axes of such relationships and can effectively use them to navigate the semantic web - since they are part of the public domain. But for it to become the engine of the application is server dominance.

Server has a lot to worry about

CSDS acknowledges the utmost complexity of the server. Reliable storage, big data, high availability (HA), sharding, resilience, redundancy, etc are all server concerns. Implementing the right server-side architecture is not easy; as such, the server is best to focus on its own concerns rather than dominating the client by implementing the client's concerns too.

CSDS leads to a cleaner SOA, especially when client itself is a server

Recent server-side challenges and trends in achieving a scalable and highly available architecture have sharpened the focus on achieving the right balance in the client-server separation.

Listening to Adrian Cockcroft's talk in Cambridge on Netflix's architecture and having read Daniel Jacobson's book, I have a lot of appreciation for what these guys are doing, and I think this will become a roadmap for a cleaner and more decoupled SOA. Adrian explained how, at Netflix, they have used a web of micro-SOA services communicating through REST APIs to create a resilient architecture, whereby they even send chaos monkeys and gorillas to bring down servers or even whole server zones. I believe this is only possible by separating the domains of each micro-SOA service. So a lot of kudos to them, and it is a place to watch.

Sunday, 28 October 2012

How limitation can be source of goodness - NoSQL, REST and more

[Level C3]

It just dawned on me. I came to a startling realisation that limitation/restriction/constraint - words with negative connotations - can generate creativity and lead to goodness. This is more or less saying "less is more", but looking at it from another angle.

Story of Twitter

I do not know about you, but I think Twitter is one of the biggest inventions of recent decades, somewhere along the lines of Gutenberg's printing press or Priestley's soda water. Regardless of what most think about the revolution of social networking, I believe Twitter is not of the same breed - it is not a compressed Facebook.

Twitter is centred around a stupidly simple idea: you have 140 characters to express yourself. No more. Yes, it also has re-tweet, follow, favourite, etc, but these are features; if you remove them, Twitter will still be more or less Twitter, although its usefulness will be limited. But if you remove the 140-character limitation, suddenly it is not Twitter anymore. Twitlonger is not Twitter. With all due respect to @daltonc, app.net with a limitation much different from 140 characters will not make a new Twitter.

But why? Because by limiting you to only 140 characters, you are forced to express yourself more succinctly. It makes you think really hard about what you want to say and remove all that matters less. Limitation leads you towards the ethos of Twitter. You cannot write a line in the terms and conditions of Twitter asking users to write only intelligent tweets, but Twitter has achieved this by enforcing its 140-character limitation.

Why NoSQL matters

They say behind every successful man there is a powerful woman. And I would say behind every inflexible and crippling architecture in the enterprise there is a big legacy database. A database becomes legacy soon after the first release, since other layers change but the database cannot keep up. What I mean by database is an RDBMS (SQL Server, Oracle, you name it).

We have cut down our processes; we do agile, we do lean, we have continuous integration, we do unit testing and TDD, we do BDD and continuous deployment, we do ... we have built a process to minimise the risk and impact of change. Yet it is still so difficult to change, and the hardest thing to change is the database.

The crux of this issue is the fact that business logic creeps into the database. I recently witnessed how a complex calculation had to be done with inaccurate rounding since it had to match the database's calculation logic - yes, there was calculation in the database.

We all believe that we should not put business logic in the database. But why do we keep doing it? Because we can. The problem is that best practices and disciplines are difficult to enforce when the project is late, we have a critical bug to fix or we need to make a quick change to the system, and the easiest solution is to change the stored procedure.

As far as SQL Server is concerned, we can have intra-model logic (calculated fields), domain-wide logic (stored procedures and user-defined functions) and cross-boundary logic (Service Broker) - and you can even deploy compiled code (SQL CLR). And since we can, we will.

And here is why I think NoSQL is useful - apart from all the hype around it. You simply cannot put business logic in it, so you don't. And this will lead to better design. So I like NoSQL not because of what I can do but because of what I cannot do. I would agree with Stonebraker's article and accept that NoSQL technologies could be low-tech compared to hi-tech RDBMS, but I cannot abuse them in the same way. NoSQL fully focuses on storage and retrieval, and does not replicate all those features that should not be implemented in a database (XML manipulation, message bus, logic, etc).

A table in SQL can translate to 5-6 Redis data structures so that you can effectively query and access the data. So there is more work to be done, but I like it, since it makes me really think about what I need to store and what I need to query back (remember Twitter?).
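
As a sketch of what this translation might look like - using the StackExchange.Redis client purely for illustration - a single Car table could become a hash per row plus several index structures:

using StackExchange.Redis;

public class RedisCarStore
{
    private readonly IDatabase _db =
        ConnectionMultiplexer.Connect("localhost").GetDatabase();

    public void Add(int id, string make, int buildYear, double price)
    {
        // 1) the row itself, as a hash
        _db.HashSet("car:" + id, new[]
        {
            new HashEntry("Make", make),
            new HashEntry("BuildYear", buildYear),
            new HashEntry("Price", price)
        });
        _db.SetAdd("car:ids", id);                             // 2) set of all ids
        _db.SetAdd("car:make:" + make.ToLowerInvariant(), id); // 3) index per make (WHERE Make = ?)
        _db.SortedSetAdd("car:by-price", id, price);           // 4) ORDER BY Price
        _db.SortedSetAdd("car:by-year", id, buildYear);        // 5) range queries on BuildYear
    }
}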

Lessons from REST

Regardless of all the bloated hype and endless controversies in interpreting REST, it works. It just simply works. So we have a set of constraints and if you follow them it will lead to goodness. Example? HTTP.

REST is an absolute example of how constraints can lead to a better design.

Limitation in creative arts

I am a big fan of minimalism, and Philip Glass is one of my favourite contemporary composers. In minimalism, a compact musical idea is repeated to create rhythm, melody and ultimately harmony - usually through layers of repetitive chords. For me, a lover of minimal music, the Satyagraha opera is the pinnacle of minimal musical expression through a limited set of musical material.

Apart from musical material, limitation in the number of instruments is also an important aspect. The string quartet is one of the most expressive forms of music - and I just love it, be it Beethoven or Shostakovich. The rock counterpart of the string quartet is probably the rock trio (singer as an instrument?), where some of the best music ever was produced (from Jimi Hendrix and Cream to Rage Against The Machine and Nirvana).

In social and political terms, we experienced an explosion of modern and beautiful art in the eastern bloc during the oppression of communist governments. Composers and film directors had to find their own language to express their art. Since they could no longer look outside for inspiration, they turned inside, and a new era of creativity and great art flourished: Andrei Tarkovsky, Istvan Szabo, Andrzej Wajda, Larisa Shepitko and many more. Shostakovich arguably produced his best works during the fierce Stalinist oppression. It is interesting that when the restrictions were lifted, the artists were no longer able to produce works of the same quality. Wajda's Man of Iron seemed like just a shabby copy of Man of Marble. And Tarkovsky's two last films, made outside Russia, did not feel like the previous ones.

The same oppression created the new wave of Iranian cinema, with the likes of Makhmalbaf, Mehrjui, Kiarostami and others.

Now, should we create an oppressive government so that we get great artistic output?! No, but perhaps we can have a 60's-style drug revolution which does the same :)


Monday, 22 October 2012

Media type: how much can you cram into a single token?

[Level C4]

Introduction

This post discusses the problems associated with using a single token as the media type (usually as the main value of the Content-Type header in an HTTP response or the Accept header in a request) to describe all attributes of the content.

Motivation and background

This has been bugging me for a while, but recently I engaged in a discussion on Twitter with Glenn Block (@gblock) and the rest of the REST enthusiast community on the options for versioning RESTful services. There are generally 2 camps: those advocating using content negotiation for versioning (putting the version number in the Content-Type header) and those preferring to stick to classic resource-based versioning (including the version number in the URL). Regardless of which one is better, the media type lacks the richness required to express all its facets, and adding version information to a media type is not practical given the current state of media type processing.

One of the main problems associated with the use of media types is that their current implementation in various systems is key-based, i.e. it involves matching all or none of the media type. As we will see, this causes considerable problems in the effective consumption of media types.

Media Type

The media type has been described in various RFCs (the main one being RFC 2046), while historically these have been limited to what is known as MIME types. RFC 4288 defines the procedure for registering media types, describing a formal process which needs to be followed to register them publicly.

Registering a media type for a public API is all well and good, but as described by this book, the use of private APIs far exceeds the use of public ones, and registering all media types exposed within private APIs is impractical and unwarranted.

Also, with the popularity of REST-based APIs, there are going to be more and more service endpoints exposed. If all such services are to define new media types, we will have an explosion of media types, rendering the current implementation of content negotiation unmanageable.

The media type is a case of extreme semantic mix-up. A single token has been used to express many different facets of a media type. In fact, the semantic space with all its axes contains many useful points, yet the industry currently uses a very sparse set of points defined as media type values. The rest of this space is unusable - as such, a very inefficient solution.

We will now have a look at these facets/axes.

1- Human-illegibility

This is the lowest and least specific level of the semantic definition of a media type. It is very simple: the content of a media type either can be read by a human (for example text/plain, application/xml or application/json) or is meant for machine comprehension or rendering (for example image/png or video/mpeg).

Having this information separate from the actual media type can help tools such as Fiddler decide whether they can display the text of a content whose media type is unknown to the tool. Media types initially used "text" to denote such information (e.g. text/xml or text/javascript), but these have largely been replaced with their application/* equivalents (application/xml and application/javascript).

2- Formatting

This is the most common and important axis of media type information; it informs tools/clients which parser/interpreter/renderer to use for consuming the content. text/plain, application/xml, application/json, image/png and video/mpeg are all examples of such use of the media type.

There are several known vendor-specific media types in this space such as application/vnd.ms-excel.

3- Schema

This is a further specialisation of the formatting. Common examples include application/rss+xml or application/hal+json. Basically these mean that, in terms of formatting, they are the same as their parent (application/xml or application/json) yet they follow a superset schema. Use of the + sign - as far as I know - is not canonical and is merely a convention followed by the industry to add schema to the established formats. Comprehension of this convention is crucial to correct interpretation of the media type without the need for a dictionary of all possible values; however, I believe most tools we have at the moment lack such features.
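
A tool that did comprehend the convention would need very little code to fall back to the base format - a sketch, not how any current tool actually behaves:

// e.g. application/hal+json -> application/json
string mediaType = "application/hal+json";
int plus = mediaType.IndexOf('+');
string baseFormat = plus < 0
    ? mediaType
    : mediaType.Substring(0, mediaType.IndexOf('/') + 1) + mediaType.Substring(plus + 1);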

4- Domain/Vendor specific

This is where we see most of the expansion in the media type space. Basically, you could output your own media type via your private API. Since you will be the main consumer of the API, integration could be easy, but it is very common for private APIs to go public - especially if they are successful. An example of such media types can be found here.

5- Versioning

Versioning is the highest aspect of a media type and is normally added to domain-specific media types. This is a popular solution to the Web API versioning problem.

For example, you could have application/mydomain.customer.1.1 as opposed to application/mydomain.customer or application/mydomain.customer.1.0

So where is the problem?

Basically information gets lost.

The first problem is that clients might be interested in a lower order of these aspects of the media type, while in order to consume the resource they are forced to comprehend the higher orders and extract the axes they are interested in. For example, a tool such as Fiddler could be interested only in whether it can display the information for the end user as plain text. A client capable of consuming XML and deserialising it to objects is only interested in knowing whether the content is XML, while the content might be represented with a media type which is essentially XML but has a different value. On the other hand, if a server uses HAL to send domain objects/view models to the client, it either has to use the standard application/hal+json or use the domain-level name of the media type (with or without a version).

Another problem is that the content negotiation process becomes more complex. In the absence of a standard for defining multi-axial media types, most systems implement a dictionary-based rule for content negotiation; as such, maintaining the list of possible content types becomes a burdensome task.

A solution

Basically, I believe we can solve this by keeping the common media types but using media type extensions in the Content-Type header (or in the Accept header). For example:
Content-Type: application/xml; human-illegible=true; domain-name=customer; domain-version=1.1
This will ensure that existing clients and servers will not break, while new clients and servers can use the new extensions for content negotiation and more loosely coupled resource consumption. I will try to expand upon this idea in another post.
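
Since these extensions are ordinary media type parameters, existing types such as System.Net.Http.Headers.MediaTypeHeaderValue can already carry them. A minimal sketch (the parameter names being the hypothetical extensions proposed above):

var mediaType = MediaTypeHeaderValue.Parse(
    "application/xml; human-illegible=true; domain-name=customer; domain-version=1.1");

// the base token is still plain application/xml, so existing clients keep working
Console.WriteLine(mediaType.MediaType); // application/xml

// higher-order axes travel as parameters and can be consumed independently
var domainVersion = mediaType.Parameters
    .FirstOrDefault(p => p.Name == "domain-version");
Console.WriteLine(domainVersion == null ? "n/a" : domainVersion.Value); // 1.1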

Conclusion

Cramming as much information as possible into a single token and then trying to parse that one token is not a good idea, especially when it comes to the media type, which is the communication bridge in the loosely coupled world of HTTP clients and servers.

The media type token value covers 5 different aspects of the resource, and separating the concerns by breaking these aspects into their own tokens can result in more robust and decoupled systems.

Monday, 6 August 2012

CacheCow.Client, using the benefits of HTTP Caching on the client

[Level T2]

Browsers are very sophisticated HTTP machines. We often fail to remember how much of the HTTP spec is implemented by the browsers.

As I have said before, ASP.NET Web API is a very powerful server-side framework, but there is a client-side burden in using it or generally in implementing a RESTful system - although Web API does not restrict you to a RESTful style.

Because of this client burden, we need more and more client-side libraries to implement the features that browsers have had for such a long time - one of which is HTTP caching. If you use HttpClient out of the box, it will not do any caching, even if the resources are cacheable. Also, all of the work for conditional GET or PUT calls (using If-None-Match, etc), cache validation (if there is must-revalidate) or checking whether your cache is stale has to be done in your own code.

CacheCow is an HTTP caching library for the client and server in ASP.NET Web API that does all of the above - see my earlier post on that. Storage of the cache is abstracted behind ICacheStore, and for now we can use the in-memory implementation (see below). The features of the client library include:

  • Caching GET responses according to their caching headers
  • Verifying cached items for their staleness
  • Validating cached items if the must-revalidate parameter of the Cache-Control header is set. It will use the ETag or Expires, whichever exists
  • Making conditional PUTs for resources that are cached, based on their ETag or Expires header, whichever exists

Today I released v0.1.3 of CacheCow.Client on NuGet. This library implements advanced HTTP caching with little or no configuration or hassle. All you have to do is to add the CachingHandler as a delegating handler to your HttpClient:

var client = new HttpClient(new CachingHandler()
       { 
           InnerHandler = new HttpClientHandler()
       });

This code creates an HttpClient that implements caching and stores the cache in memory. By implementing ICacheStore, you can store the cache in your own custom repository. CacheCow is going to have persistent cache stores such as FileCacheStore, SqlCeCacheStore and SqliteCacheStore as a minimum. FileCacheStore will be similar to the browser implementation of cache storage. Each of these cache stores will be implemented and released under its own NuGet package. To use an alternative cache store, you need to pass the store as a constructor parameter.
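
Once an alternative store exists, wiring it in would look something like the sketch below - MyCustomStore being a hypothetical ICacheStore implementation:

var client = new HttpClient(new CachingHandler(new MyCustomStore())
       {
           InnerHandler = new HttpClientHandler()
       });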

Usage

So in order to use CacheCow.Client, use the package manager in Visual Studio to download it and add a reference to it:

PM> Install-Package CacheCow.Client

This will also download and add a reference to the ASP.NET Web API client package, if you have not already added one. Make sure you use v0.1.3 or above (as of the time of reading this).

After this, you just need to create an HttpClient as above and add the CachingHandler as a delegating handler. That's it - you are ready to call services and cache the responses!

Sample

I am working on a sample project, but for now it is easiest to use the code below to call my CarManager Azure website, which implements HTTP caching. The code can be pasted from this GitHub gist.

CacheCow.Client adds a special header to the response which helps with debugging its various features. The header's name is x-cachecow and it carries various flags about the operations performed on the request/response. So in the code below, we will use this header to demonstrate the features of this library.

var client = new HttpClient(new CachingHandler()
                    {
                        InnerHandler = new HttpClientHandler()
                    }
 );
var initialResponse = client.GetAsync(
      "http://carmanager.azurewebsites.net/api/Car/5").Result;
var initialResponseHeader = initialResponse.Headers.Single(
       x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(initialResponse.Headers.ETag.Tag);
Console.WriteLine(initialResponseHeader);

And we will see this to be printed:
"02e677a7799e484fb49447f8a600247d"
0.1.3.0;did-not-exist=true
As you can probably figure out, we have the ETag and the CacheCowHeader: the first value is the version, and did-not-exist means that the item did not exist in the cache - which is understandable, as this is the first call.

Now let's try this again:

var secondResponse = client.GetAsync("http://carmanager.azurewebsites.net/api/Car/5").Result;
var secondResponseHeader = secondResponse.Headers.Single(
      x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(secondResponseHeader);

And what will print is:
0.1.3.0;did-not-exist=false;cache-validation-applied=true;retrieved-from-cache=true
So in fact it existed in the cache, was retrieved from the cache, and cache validation was applied. Cache validation is the process by which the client makes a conditional call to retrieve/update a resource only if a condition is met (see the Background section in this post). For example, in GET calls it will send the ETag in an If-None-Match header so that the server returns the full resource only if it has changed.

If you call a PUT on a resource that is cached, CacheCow.Client will use its ETag or Expires value to make a conditional PUT, unless you set the UseConditionalPut property to false.
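
Opting out is a one-liner - a sketch based on the description above:

var handler = new CachingHandler()
       {
           InnerHandler = new HttpClientHandler(),
           UseConditionalPut = false // PUTs will no longer be conditional
       };
var client = new HttpClient(handler);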

Bypassing caching

There are some cases where you might not want the result to be cached or retrieved from the cache, regardless of the caching logic. All you have to do is to set the CacheControl header of the request to no-cache or no-store:

var nocacheRequest = new HttpRequestMessage(HttpMethod.Get, 
  "http://carmanager.azurewebsites.net/api/Car/5");
nocacheRequest.Headers.CacheControl = new CacheControlHeaderValue()
 {
  NoCache = true
 };
var nocacheResponse = client.SendAsync(nocacheRequest).Result;
var nocacheResponseHeader = nocacheResponse.Headers.FirstOrDefault(
 x => x.Key == CacheCowHeader.Name);
Console.WriteLine(nocacheResponseHeader);

This will print an empty header since we have bypassed the caching.

Last but not least

Thanks for trying out and using CacheCow. Please send me your feedback and bug reports - just ping me on Twitter or use GitHub's issue tracker.

Sunday, 5 August 2012

Hierarchical routing for ASP.NET Web API: RESTful resource organisation


Introduction and motivation

[Level T3] A question keeps popping up on various forums (StackOverflow, ASP.NET Forums, etc): how to define routing in ASP.NET Web API. More importantly, there does not seem to be any consensus on how to approach this - well, probably because ASP.NET Web API is still in preview (RC).

In this post, I want to have a fresh look at routing in ASP.NET Web API and present a hierarchical model for organising resources.

Background

Resources are an important part of the RESTful architectural style. In REST, all server operations (the API) are defined as interactions with resources - in HTTP terms, this means verb interactions with URLs. This is in contrast with the RPC style, where server operations are defined as method calls.

ASP.NET Web API's routing mirrors ASP.NET MVC routing - similar to some other aspects of Web API which use ASP.NET MVC as the baseline, since it is a familiar model for developers.

Routing was first introduced to ASP.NET by MVC. Later on, the routing code was integrated into System.Web.dll.

Routing in ASP.NET MVC is very flat, since all routes are defined from the root. This was one of the reasons MVC Areas were introduced: to add another layer on top of the root routes. This can work for MVC, but RESTful resource organisation usually requires a more nested structure, while using conventional routing can result in a configuration burden as well as a performance penalty.

How routing works

MVC (and Web API) routes are generally designed for a handful of routes (in most cases fewer than 100). You are expected to use the default route for most cases and add a few more routes for the occasional cases where the pattern is different. This in fact works in most MVC applications, but RESTful resource design requires a richer and more nested structure, as we will see below.

Route definition in ASP.NET Web API - as you have all probably used and know - is based on adding routes to the RouteCollection:

var routes = GlobalConfiguration.Configuration.Routes; // in web hosted environment
routes.MapHttpRoute(
 name: "DefaultApi",
 routeTemplate: "api/{controller}/{id}",
 defaults: new { id = RouteParameter.Optional }
);

Each mapping will add a new HttpRoute object to the collection. HttpRoute itself is an implementation of IHttpRoute:

public interface IHttpRoute
{
    string RouteTemplate { get; }

    IDictionary<string, object> Defaults { get; }

    IDictionary<string, object> Constraints { get; }

    IDictionary<string, object> DataTokens { get; }

    HttpMessageHandler Handler { get; }

    IHttpRouteData GetRouteData(string virtualPathRoot, HttpRequestMessage request);

    IHttpVirtualPathData GetVirtualPath(HttpRequestMessage request, IDictionary<string, object> values);
}

The most important method is GetRouteData, where the matching takes place. Basically, if an implementation of IHttpRoute returns a non-null IHttpRouteData, then the route has matched.

Now how does the matching work? Well, the HttpRouteCollection loops through all the routes and calls GetRouteData on each; the first one to return a non-null value is the matched route for a given URL:

// snippet from HttpRouteCollection
foreach (IHttpRoute route in _collection)
{
    IHttpRouteData routeData = route.GetRouteData(_virtualPathRoot, request);
    if (routeData != null)
    {
        return routeData;
    }
}

Did you notice something fundamental above? If you have 1000 routes and your given URL matches the 1000th route, GetRouteData (which involves complex and heavy string processing, if you look at the Web API source code) has to be called for the first 999 routes and fail until it reaches your matched route. With 100 routes this is probably not an issue, but with the numbers going up, performance will take a hit.

RESTful organisation of resources

This StackOverflow question is one of many on Web API routing according to the REST style. One of the main challenges is that we, as Microsoft developers, are used to designing our APIs in an RPC fashion. This habit has been ingrained in our psyche since remoting days, then Web Services and more recently WCF. So it is only natural to fall into the same habit when designing our REST API.

Martin Fowler in his article Steps towards the glory of REST talks about 3 levels of REST implementation. We will be talking about level 1 and briefly about level 2. Level 3, the most noble constraint of REST, focuses on hypermedia, which is beyond the topic of our discussion.

So at level 1, we design the server to expose its API as resources. In the case of HTTP and URLs, this will look like the directory/file structure on a disk - and as such hierarchical. If we mix REST with DDD concepts, each aggregate root of the publicly exposed domain (which is called the server domain; see the post on Client-Server) will be exposed at the root (considering the API root is /api/):

/api/{AggregateRoot}/{id}

An example for cars would be /api/Car/1243. As such, we would have a CarController that receives the id through the URL. Now all of this is easily achievable using the default route, very much in the good old MVC fashion.

However, the picture gets more complicated when we want to expose the tyres of the car. Let's say a car has Front/Rear Left/Right tyres, namely FL, FR, RL and RR. One approach would be to expose the tyres as an aggregate root and have an ID for each tyre:

/api/Tyre/1234567

Well, this will work, but a tyre is really not an aggregate root, since it only has meaning as part of the car. So ideally we should expose tyres only as part of a car:

/api/Car/1243/Tyre/FR

So we would need a TyreController to handle this scenario, and in this case we need a route similar to the one below:

/api/Car/{carId}/{controller}/{id}

Now this can be written in the generic form of:

/api/{parent}/{parentId}/{controller}/{id}

But as you can see, in this case we need to define carId as parentId. Also, not all of the domain has similar constraints for IDs, so this approach imposes a heavy limitation on our routes. On the other hand, we might have more than one nested level, where this becomes even more complex.

In addition to entity levels, we also have operations. Operations can also be defined as resources, for example in the case of a puncture:

POST /api/Car/1243/Tyre/FR/Puncture

This will create a puncture in the tyre. When we say create, we do not mean that a physical record necessarily needs to be created against the tyre. In fact, the mapping between REST resources and domain models is usually loose. The point is that every operation needs to be exposed as a resource and all operations are to be performed using HTTP verbs. For example, repairing the puncture can be represented as:

DELETE /api/Car/1243/Tyre/FR/Puncture

So the operations complicate the routing even more. If the domain is big and complex, we could end up with many routes. As such, performance will be hampered and we end up with the burden of defining all these routes at the root level - see the sketch below.
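
To make the burden concrete, here is a sketch of what the car/tyre/puncture resources above would need when registered conventionally at the root - route names are illustrative, and order matters since the first match wins:

var routes = GlobalConfiguration.Configuration.Routes;

// /api/Car/1243/Tyre/FR/Puncture - every operation needs its own route
routes.MapHttpRoute(
    name: "CarTyrePuncture",
    routeTemplate: "api/Car/{carId}/Tyre/{tyreId}/{controller}");

// /api/Car/1243/Tyre/FR - nested entities
routes.MapHttpRoute(
    name: "CarTyre",
    routeTemplate: "api/Car/{carId}/{controller}/{id}",
    defaults: new { id = RouteParameter.Optional });

// /api/Car/1243 - aggregate roots fall through to the default route
routes.MapHttpRoute(
    name: "DefaultApi",
    routeTemplate: "api/{controller}/{id}",
    defaults: new { id = RouteParameter.Optional });

Every nested level or operation multiplies such entries at the root.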


So what is the solution? The solution is to turn Web API's flat routing into a hierarchical one.







Tuesday, 31 July 2012

Using range header for retrieving range of IEnumerable<T> in ASP.NET Web API


Introduction

[Level T3] In this post, we talk about using HTTP's Range header to request ranges of entities.

Background

The HTTP spec defines a series of headers that a client can use to request partial content. These operations are optional (in most cases the spec uses the word SHOULD), but most servers implement them and browsers have increasingly been using them. If you have ever resumed downloading a big file from the internet, then you have used this feature (in fact, all browsers use ranges if supported by the server). In this case, the client keeps requesting chunks and builds up the file until it is fully downloaded.

So here is how it works in a nutshell:

  1. Server can optionally inform clients, while serving a resource, that it supports partial content. It does that by sending an Accept-Ranges header whose value is the unit it supports, normally bytes. In our case, our server sends back a custom unit that we call x-entity.
  2. Client, either informed of the partial content feature by the server's Accept-Ranges header or simply trying its luck, sends a request with a Range header with the value [unit]=[from]-[to], for example bytes=1024-2047. In this example, the client asks for the second KB of the file. The Range header can specify multiple ranges, for example bytes=500-600,601-999.
  3. Server will return the range requested and include a Content-Range header with the value [unit] [from]-[to]/[TotalCount], for example bytes 1024-2047/12345678. It also returns status code 206 (Partial Content) to inform the client that the content is partial. If the server does not support the range specified, it sends back status code 416.
The spec does allow for custom units, so a server can implement its own and inform the client of the unit using the Accept-Ranges header. Now, the idea is that in ASP.NET Web API we normally build many actions that return IEnumerable<T>. What if we could use the range to specify the range of the enumerable to be returned? Hmmm....

This feature can be useful for pagination on the client, so that instead of the API implementing a range parameter everywhere, we just use HTTP's built-in features and encapsulate the implementation in a reusable component - in this case a filter.

So in the code to follow, we define a custom range unit and call it x-entity. The "x-" prefix is a common naming convention on the web for custom tokens that are not part of the canonical tokens defined in RFC specs.
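
On the client side, System.Net.Http.Headers.RangeHeaderValue can express such a request with a custom unit - a sketch below, although bear in mind the RC issue mentioned later whereby the unit ends up being sent as bytes:

var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Get,
    "http://localhost:50714/api/Car");
request.Headers.Range = new RangeHeaderValue(2, 5) { Unit = "x-entity" }; // items 3 to 6
var response = client.SendAsync(request).Result;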

Implementing range in ASP.NET Web API

So where is the best place to implement this? We have these requirements:

  • Access to request headers to read Range header
  • Access to response header to set Accept-Range
  • Access to content headers to set Content-Range header
  • Access to content so that it can filter IEnumerable<T>
DelegatingHandler might look promising, but by the time it accesses the content, it has already been turned into a stream by the MediaTypeFormatters.

MediaTypeFormatter (MTF) is an interesting option. I actually created a RangeMediaTypeFormatterWrapper that would wrap the MTFs, intercept the content and, if it was of type IEnumerable<T>, apply the filtering. Initially it seems an MTF does not have access to the request headers, but in here we had an interesting discussion and it turns out it can access the request using GetPerRequestFormatterInstance. However, it also needs access to the response headers.


So Glenn Block suggested filters, and after some thought, this seems to be the right approach considering the current limitations of MTF. The only drawback is that the filter has to be explicitly defined on the action - which in some cases can in fact be a blessing. In any case, the filter approach, as you will see, is clean and does everything in the same place.

Filters in ASP.NET Web API are not much different from MVC's. You get two methods: before (OnActionExecuting) and after (OnActionExecuted) the action, where you can change values in the request, response or action arguments, or simply examine them.

Using the code

You can get the source code from GitHub. As you can see, we have a single controller called CarController. The project runs on port 50714 on my machine, so all examples will include this port - yours could be different. So download the project, build and run it. I created a client project to implement all the steps below using HttpClient, but there is an issue with the ASP.NET Web API implementation of the Range header whereby, regardless of the unit set in the Range header, bytes is always sent to the server.

As you can see, we have a simple action with the EnableRange filter defined on it:

[EnableRange]
public IEnumerable<Car> Get()
{
 return CarRepository.Instance.Get();
}

So now we use Fiddler (or a similar tool capable of sending HTTP requests, such as Google's Postman) to send this request:

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714

We will get all the cars in our repository in JSON format. But note the Accept-Ranges header with the value x-entity:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Expires: -1
Accept-Ranges: x-entity
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 18:59:56 GMT
Content-Length: 1125

[{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,...

So this should tell the client that it can use the Range header. Now let's send a Range header requesting the 3rd to the 6th item (a total of 4 items):

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714
Range: x-entity=2-5

And here is the response:

HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Content-Range: x-entity 2-5/10
Expires: -1
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 19:00:19 GMT
Content-Length: 447

[{"Id":3,"Make":"Toyota","Model":"Yaris","BuildYear":2003,"Price":3750.0,...

Note the Content-Range header above, and also the fact that we got the entities we requested in JSON (not shown fully above). It tells us that the server has sent back items from index 2 to index 5 and that the total number of items is 10. Also note the 206 response.

Server can send back * if the number of items is not known at the time of serving the request. I have used this option since I do not want to run a Count() on an IEnumerable<T>. It is very likely that the data is being retrieved from a database, and we do not want to load the whole table into memory. So my approach is to try casting the value to ICollection using the as keyword. If the cast succeeds, I get the count; otherwise I set the count to *.
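
The count logic then comes down to a couple of lines - a sketch of the approach just described:

// avoid enumerating the sequence just to count it; only report a total
// if the value is already a materialised collection
var collection = value as ICollection;
var totalCount = collection != null ? collection.Count.ToString() : "*";
// e.g. Content-Range: x-entity 2-5/10  or  x-entity 2-5/*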

Another option in the spec is that the to value of the range is optional, so the client can send a Range header with the value 2-. In this case, we must skip the first 2 items and return the rest:

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714
Range: x-entity=2-

In this case, our server returns this response:

HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Content-Range: x-entity 2-9/10
Expires: -1
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 21:18:05 GMT
Content-Length: 897

[{"Id":3,"Make":"Toyota","Model":"Yaris","BuildYear":2003,"Price":3750.0, ....

Notes on implementation

The crux of the implementation is to call Skip() and Take() on the IEnumerable<T>. Our code has to be able to work with all types, hence cannot be generic at compile time. On the other hand, filters (and attributes as a whole) cannot use generics. As such, we just have to use reflection to do this:

// _elementType holds the T of the IEnumerable<T> being filtered;
// Skip and Take are static extension methods on System.Linq.Enumerable
var t = typeof(Enumerable);
var skipMethod = t.GetMethods().Where(m => m.Name == "Skip" && m.GetParameters().Count() == 2)
 .First().MakeGenericMethod(_elementType);
var takeMethod = t.GetMethods().Where(m => m.Name == "Take" && m.GetParameters().Count() == 2)
 .First().MakeGenericMethod(_elementType);
...
value = skipMethod.Invoke(null, new object[] { value, from });
if (to.HasValue)
 value = takeMethod.Invoke(null, new object[] { value, to - from + 1 });

It is also useful to note that the return value is not directly accessible in the filter; we have to resort to using the Content property, casting it to ObjectContent and using its Value property.

Conclusion

The Range header defined in the HTTP spec is useful for retrieving partial content. We can use custom units, so we defined the x-entity unit to enable selecting a range of entities (commonly used in pagination scenarios) and implemented it using a filter.

Tuesday, 10 July 2012

What does REST's Client-Server mean now?

Introduction

[Level C4] In "What does coupling mean ...", we reviewed three client-server patterns (with their anti-patterns) based on the concern. In this post, we will carefully examine the new meaning of client-server web applications. This will serve as a primer on my new work to define Client Server Domain Separation (CSDS).

Motivation

As I explained in the coupling post, I was challenged by the recent emergence of attempts to define the client-server relationship, one of which is ROCA. There seems to be a recent trend in the REST-aware community, driven by disgruntled developers who believe too much control has been shifted to the client side. As such, they are trying to come up with practices/styles to move some of the control back to the server.

Background

REST has a strong emphasis on separation of client and server. Fielding in his PhD dissertation outlines the constraint:
Separation of concerns is the principle behind the client-server constraints. By separating the user interface concerns from the data storage concerns, we improve the portability of the user interface across multiple platforms and improve scalability by simplifying the server components
As we saw in the coupling post, we need to understand whether a functionality is a client concern, a server concern or a mixed concern. Implementing the functionality on the wrong side of the wire can be tolerated initially but finally takes its toll. Below we will try to define the building blocks of our discussion.

When we talk about the domain below, we refer to the concerns implemented/exposed in the client or server.

Server

Server is responsible for defining a domain (the server domain) and maintaining its state and consistency/integrity. Server usually has very complex components, yet it hides its complexity behind its services.

Figure 1 - Server exposes a public domain hiding its complexity

A service is composed of an API and domain objects. In a RESTful world, the API is HTTP REST. Domain objects are representative of the server's domain model. While the server can have a complex domain, we refer to the server domain as only the publicly available domain.

Server should not have knowledge of the client type. Although such information can be shared with the server (for example through the User-Agent header), it should not be used for anything other than auditing or statistics.

Domain objects sent to the client are usually looked down upon and treated as second-class citizens. They are sometimes called DTOs (Data Transfer Objects) or ViewModels. While this is OK in a server development scenario where the focus is to decide how much of the server's whole domain to expose, these models are not to be mistaken for Value Objects, since they are entities (they have identity, according to DDD).

A domain object can be fully rendered HTML (markup) in its semantic form (i.e. no display semantics such as <b> or <i>). Documents are domain objects; for example, a blog domain has a post domain object.

Server itself can be a client of one or several servers. IFTTT is a beautiful example of this.

Client

Client is responsible for using the services of server(s) to provide value to the user. In the process, it defines a domain which is usually different from the server domain, although there is always an overlap. Client has a life of its own, able to maintain some level of functionality with no server access.


Figure 2 - Client and server domains - now and before
Please note that we did not mention the user in the definition of the server. Server is abstracted away from the user by the client. Client can still provide some of its value to the user even when the server is down. Funnily enough, I lost connectivity for half an hour while I was typing this blog and I carried on typing. When re-connected, Blogger saved my work.

So let's bring an example to highlight a few important points:

I love listening to online radio while working. I use TuneIn on my Android to listen to music based on my mood. TuneIn is a great app that allows you to search online radios and podcasts. So one of its functionalities is the directory service of radio stations. This functionality is provided by TuneIn servers (which maintain its state and consistency/integrity). It defines a server model that consists of name, URL, style, icon, etc. Client does not know how the directory is created or how often the information gets updated.
On the other hand, when I click on a station to listen, I connect to the radio server. The domain of this server has music streaming, current artist, etc. For the radio server, it is all the same whether you listen to music in your browser or in the native client on your phone. Client does not know how the music is stored, chosen, etc.
Now if I really like a song, I can share it on Twitter. The Twitter server, while it knows the name of the application I am using for security reasons (OAuth), does not really care. Publishing my tweet is all the same to it.

So as you can see, Twitter's server domain is fully concerned with users, their tweets, re-tweeting, etc, while the TuneIn client only cares about publishing a tweet - so their domains have a tiny (yet important) overlap.

Some of the client functionality has nothing to do with the server. Playing music on the device is fully a client concern, so it does not exist in the server domain. For example, the online radio's streaming servers would not know if the data they are sending will even be heard by the user (e.g. the speaker could be on mute or, worse, the client could use the data for illegal dumping of the songs).

Let's have a look at Figure 2. I think our TuneIn example fully described the "Now" diagram, so let's focus on the "classic" case. This is the classic client (a web application running in the browser - see below) where all the logic is served by the server. In extreme cases, the server even generates client scripts apart from hosting the static logic (JavaScript). Client's domain is fully engulfed by the server, meaning the server is aware of all the client logic.

Does the classic client look to you like the REST utopia? Having read Fielding's dissertation snippet above, which one do you think represents REST better?

Web Application

What is a web application? This definition has changed drastically over the last few years with the emergence of diverse client devices capable of running advanced JavaScript or native code. Web applications can be found in the forms below (not particularly ordered, and probably with some missing):
  • Single Page Application (SPA) running on various devices including top-end mobile devices
  • Native rich clients on desktop/laptop
  • Native client apps on mobile devices
  • Bundled HTML/Javascript apps running on mobile devices (PhoneGap) or desktop (Windows 8)
  • Traditional web applications run by browsers
  • Browser as an application to display markup
Figure 3 - Server cannot really see behind the cloud to tell which client is using it


A web application is the usage abstraction of the client. While the server domain used to pretty much define the web application, the client is becoming more and more important in defining it.

Now, what does web mean in "web application"? Nowadays, almost every application is a web application: the client uses one or more cloud services to enrich the experience. Unless it is a simple drawing or editing tool, most applications are web applications.

Some of the forms deserve more attention. Bundled HTML/JavaScript apps remind us that in some cases it is in fact incidental that the server hosts the files. Some JavaScript files are hosted on CDNs and downloaded and cached for a long time.

Also, not all logic is JavaScript. Microsoft RIA Services (regardless of whether I like it or not - and I don't!) sends the server's domain rules to the client very much like JavaScript. Generating JavaScript or binaries is equally bad, as it breaches the client's independence.

Web Application and REST

Basing your web application on REST will help achieve better separation as well as make efficient use of the web.

Having said that, today's world demands more and more from the computing industry. The server push model (gaining popularity in the node.js/WebSocket world) requires a stateful server, which is a REST no-no, but today's virtualisation and cloud elasticity have made scalability a much smaller problem.

For most web applications, however, following all REST constraints is the best practice, as very few applications require the server push model.

So what do I think of the new trend?

Well, I think the trend cannot resist the wind of change. The computing industry has been pushed to provide more value and has to do it cheaper, faster and richer. Separating client and server will help to achieve this more effectively.

With regard to ROCA, I must say it is a worthwhile effort to understand client and server interactions, and it contains useful common-sense practices. However, I cannot subscribe to it since:

  • must-server advocates the "classic" model (engulfed client) and ignores the client-server separation prescribed by REST.
  • It does not respect the client domain (must-no-duplication) and its rules and logic.
  • Single Page Applications are not ROCA-compliant (see discussions).
  • Not all clients are browsers; in fact, fewer and fewer clients are browsers. ROCA is heavily targeted at browser applications, with many of the constraints directly related to HTML, CSS or JavaScript.
  • It does not fully appreciate the inherent complexity of the client domain (must-jslimits and mustnot-jsengine).
  • A ROCA client will not be able to provide any useful offline feature.
  • It includes lower-end non-browser clients yet does not appreciate upper-end clients (must-non-browser).

Conclusion

Client and server have their own domains - as REST prescribes. Server defines a domain and maintains its state and integrity. It hides its complexity behind its services and, other than for authentication and authorization, does not need to know anything about the client.

Client, through the usage of the services of server(s), provides value to the user. It defines its own domain that overlaps with the domain(s) of the server(s).

The web application now spans a plethora of different devices and technologies. As such, defining a style requires considering all such scenarios.

We will talk more about Client-Server Domain Separation (CSDS) in the upcoming posts.

Monday, 25 June 2012

Introducing PocoHttp: Consuming HTTP data services


Introduction

[Level T2] PocoHttp is a non-opinionated Open Source .NET library for seamless consumption of HTTP data services using a familiar IQueryable<T> interface. Version 0.1 of the library is now available on GitHub - a NuGet package is to follow soon.

This post explains the background and motivation for this library as well as its usage and possible future directions.

Motivation and background

ASP.NET Web API exposes the full richness of the HTTP spec. As such, many new avenues have been opened for creating HTTP client-server applications. Having said that, with the HTTP spec aiming to be Turing-Complete, there is a client burden involved in implementing clients capable of deep protocol coherency.

ASP.NET Web API has made it easy to achieve protocol coherency as part of the HTTP pipeline - modelled as Russian Dolls (see Part 6 of the series).

With a lot of WCF services (which commonly were serving the domain's value objects) soon to be moved to Web API, there is a requirement for abstracting away HTTP-level aspects for seamless consumption of such services. PocoHttp is designed to provide an IQueryable<T> interface which is familiar to developers who have been working with ORMs such as Entity Framework, NHibernate, etc.

WCF Data Services are able to provide this seamless communication, but they:
  • Can only generate AtomPub payloads
  • As such provide no content negotiation or ability to generate plain XML or JSON (they honour the Accept header but return an OData-specific format)
  • Use the OData query syntax, which is deemed by many in the community not RESTful, since it assumes non-HTTP syntax coherence in the form of query string parameters on the client
  • Fully expose the data to the outside world
Hence there is a need for a .NET client library to take advantage of new HTTP features exposed in System.Net.Http to be able to consume HTTP data services whether developed in ASP.NET Web API or in any other platform. PocoHttp rises up to that challenge.

Minimal example

Before going much further into the details, it might be useful to present a minimal example of how to use it. The example below is from the PocoHttp samples (which host a minimal server too) available on GitHub.

var pocoClient = new PocoClient()
{
    BaseAddress = new Uri("http://localhost:12889/api/")
};

var list = pocoClient.Context<Car>()  // calling "/api/Cars"
    .Take(1)                          // getting the first item
    .ToList();
Console.WriteLine(list[0]);

As can be seen above, pocoClient is initialised with a BaseAddress (note the trailing slash, which is mandatory for correct URI composition) and then a generic context of the Car type is queried. PocoClient assumes (using the naming convention setup) that Car should be exposed at BaseAddress + "Cars", so it makes an HTTP request and turns the result into Car objects.

PocoHttp's Grammar model

Basically, PocoHttp translates IQueryable<T> queries into HTTP semantics (query string parameters in the case of GET, or a query criteria object in the body in the case of POST) using a grammar and a set of naming convention settings - and then turns the result into .NET types using the media type formatting provided by ASP.NET Web API.



Currently, two built-in grammars are provided: OData and Pagination. The OData implementation does not cover the full feature set of OData (see below). New grammars can be implemented to translate the queries into other syntaxes, such as MongoDB queries or other custom syntaxes.

Normally, a grammar modifies the request by adding query string parameters. While it is not recommended, a grammar can turn the request into an HTTP POST and send a payload containing the criteria. The reason this is not recommended is that using the POST method for GETting resources is not REST-friendly, and responses to POST cannot be cached - while caching the results of calls to data services can be very useful.

OData implementation

The current implementation of the OData grammar supports the features below (a translation sketch follows the list):
  • Where: the current implementation supports direct property expressions. Complex property expressions (such as Car.Make.StartsWith("M")) are not yet supported
  • AND, OR, greater than, less than, less than or equal, greater than or equal, not equal in Where expressions
  • Take
  • Skip
  • OrderBy
  • OrderByDescending
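As a rough illustration, here is how a LINQ query might be translated under the OData grammar. The mappings in the comments are a sketch based on standard OData query options, not lifted from the library:

var expensiveCars = pocoClient.Context<Car>()
    .Where(car => car.Price > 20000)  // -> $filter=(Price gt 20000)
    .OrderBy(car => car.Make)         // -> $orderby=Make
    .Skip(10)                         // -> $skip=10
    .Take(5)                          // -> $top=5
    .ToList();

// Resulting request (illustrative):
// GET /api/Cars?$filter=(Price gt 20000)&$orderby=Make&$skip=10&$top=5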

Exposing an HTTP service in ASP.NET Web API (so that it can be consumed with PocoHttp) is simple: you just need to return IQueryable<T> from your action and decorate the action with the [Queryable] attribute.
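A minimal sketch of such an action is shown below; the CarsController and its in-memory data source are illustrative assumptions:

public class CarsController : ApiController
{
    // hypothetical in-memory data source for illustration
    private static readonly List<Car> _cars = new List<Car>();

    [Queryable]
    public IQueryable<Car> Get()
    {
        return _cars.AsQueryable();
    }
}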

Having said that, while this feature is currently available in ASP.NET Web API RC, the future of OData support in Web API is a bit unclear: the Queryable attribute has been removed from the ASP.NET Web API source code and it might not ship with the RTM. As far as we can find out from the ASP.NET team, OData support will be implemented using the OData libraries and might ship out of band after the RTM. So the team is committed to full OData support but the timeline is unclear. For now, you can happily use the Queryable attribute in ASP.NET Web API RC.

As for PocoHttp, I am committed to implementing the full OData query syntax in future releases.

Using PocoHttp against existing AtomPub OData services

Nothing prevents us from consuming classic OData services that return AtomPub - we just add an AtomPub media type formatter to the formatters. This allows you to take advantage of all the niceties of HttpClient while using your existing OData services.
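Something along these lines should do it - note that both the Formatters collection and the AtomPubMediaTypeFormatter type below are assumptions for illustration, not the actual PocoHttp API:

var odataClient = new PocoClient()
{
    BaseAddress = new Uri("http://server/DataService.svc/")
};
// hypothetical: register an AtomPub-capable formatter for deserialisation
odataClient.Formatters.Add(new AtomPubMediaTypeFormatter());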

I will write a post on this soon.

Disclaimer

As I said initially, this is a non-opinionated framework. You can expose your full domain model through an IQueryable<T> interface out to the client, but that is not necessarily a good thing. Exposing the bowels of your server and data to the outside world is an anti-pattern and could be a costly mistake. For example, you could easily expose your database to the outside world, allowing this query:

pocoClient.Context<Car>()
 .Where(car => car.Make.EndsWith("L"))

to turn into this SQL statement, which can bring down the server by forcing a scan of every record:
SELECT * FROM CAR WHERE MAKE LIKE '%L'

Also, free-form LINQ queries enable the client to select data using criteria involving columns that have no indices defined on them.

Pagination grammar

The Pagination grammar is a simple non-OData syntax which is REST-friendlier. An example of the syntax is /api/Cars?skip=100&count=20.

The Pagination vocabulary only supports 4 constructs (a client-side sketch follows the list):

  1. Skip: normally represented by skip
  2. Take: normally represented by count
  3. OrderBy: normally represented by order
  4. OrderByDescending: normally represented by orderDesc
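For instance, the query below would map onto the example URI above (a sketch; how the Pagination grammar is selected on the client is not shown here):

var cars = pocoClient.Context<Car>()
    .Skip(100)   // -> skip=100
    .Take(20)    // -> count=20
    .ToList();

// Resulting request (illustrative): GET /api/Cars?skip=100&count=20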

All of the above representations can be changed by the client. A typical server implementation supporting these parameters would be:

public IEnumerable<Car> Get(int skip, int count)
{
    return _repository.Skip(skip).Take(count);
}
As can be seen, this server implementation only supports skip and count and ignores OrderBy and OrderByDescending.

Overriding naming convention and routing

There are times when you might want to override the default route:
pocoClient.Context<Person>()
 .Where(p => p.Name == "Ali"); // calls /api/Persons?$filter=(Name eq 'Ali')
while you might want it to go to /api/Employees.

You have two ways to achieve this:

  1. Decorate your entity with the EntityUriAttribute - in this case [EntityUri("Employees")] (see the sketch after the code below)
  2. Pass the full URI as below when creating the context:

pocoClient.Context<Person>("http://server/api/Employees")
    .Where(p => p.Name == "Ali");
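For option 1, the decoration would look something like this (the Person class body is illustrative):

[EntityUri("Employees")] // route Person requests to /api/Employees
public class Person
{
    public string Name { get; set; }
}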

Conclusion

In this post, we have introduced PocoHttp, an emerging open-source client library for consuming HTTP data services.

PocoHttp can be used to consume ASP.NET Web API services where IQueryable<T> is returned, existing OData services, or any other HTTP data service built on other platforms - provided a grammar is implemented for the query syntax.

Using the CachingHandler in ASP.NET Web API


Introduction


[NOTE: Please also see updated post and the new framework called CacheCow
This class has been removed from WebApiContrib code and NO LONGER SUPPORTED]


[Level T3] Caching is an important concept in HTTP and comprises a sizeable chunk of the spec. ASP.NET Web API exposes the full goodness of the HTTP spec, and caching can be implemented as a message handler as explained in Parts 6 and 7. I have implemented a CachingHandler and contributed the code to WebApiContrib on GitHub.

This post has two sections: first a primer on HTTP caching, and then how to use the handler. The code uses ASP.NET Web API RC with .NET 4 (VS 2010 or 2012).

Background

NOTE: This topic is fairly advanced and complex. You do not necessarily need to know all of this in order to use the CachingHandler and you are welcome to skip it, but more in-depth knowledge of HTTP can go a long way. 

Caching is a very important feature in HTTP. RFC 2616 covers this topic extensively in section 13. A review of the whole spec is beyond the scope of this post but we will briefly touch on the subject.

First of all, let's get this straight: this is not about putting an object in memory as we do with HttpRuntime.Cache. In fact, the server does not store anything (more on this below); it only tells the client (or mid-stream cache or proxy servers) what can be cached and for how long, and validates the caching.

Basically, HTTP provides semantics and mechanism for the origin server, client and mid-stream proxy/cache servers to effectively reduce traffic by validating the version of the resource they have against the server and retrieve the resource only if it has changed. This process is usually referred to as cache validation.

In HTTP 1.0, the server would return a Last-Modified header with the resource. A client/user agent could use this value and send it in the If-Modified-Since header in subsequent GET requests. If the resource had not changed, the server would respond with a 304 (Not Modified); otherwise the resource would be sent back with a new Last-Modified header. This is also useful in PUT scenarios where a client sends a PUT request to update a resource only if it has not changed: an If-Unmodified-Since header is sent. If the resource has not changed, the server fulfils the request and sends a 2xx response (typically 200 OK or 204 No Content); otherwise a 412 (Precondition Failed) is sent.
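For example, a conditional GET under this scheme looks like this (an illustrative exchange):

GET /api/Cars HTTP/1.1
Host: localhost:8031
If-Modified-Since: Thu, 21 Jun 2012 23:35:46 GMT

HTTP/1.1 304 Not Modified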

For many reasons (including the fact that HTTP dates do not have milliseconds) it was felt that this mechanism was not adequate. In HTTP 1.1, the ETag (Entity Tag) was introduced: an opaque ID in the form of a quoted string that is returned with the resource. ETags can be strong (refer to RFC 2616 for more info) or weak, in which case they start with W/, such as W/"12345". An ETag works like a version ID for a resource - if two ETags (for the same resource) are equal, those two versions of the resource are the same.

The ETag for the same resource can differ according to various headers. For example, a client can send an Accept-Language header of de-DE or en-GB and the server will send different ETags. The server can declare this variability for each resource using the Vary header.

Cache validation for GET and PUT requests is similar, but HTTP 1.1 uses If-Match (for PUT) and If-None-Match (for GET) instead.
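For instance, an optimistic-concurrency PUT with ETags looks like this (an illustrative exchange):

PUT /api/Car/1 HTTP/1.1
Host: localhost:8031
If-Match: "54e9a75f2dbb4edca672f7a2c4a73dca"
Content-Type: application/json; charset=utf-8

{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,"Price":175.0}

If the ETag no longer matches (the resource has changed on the server in the meantime), the server responds with:

HTTP/1.1 412 Precondition Failed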

Well... complex? Yeah, pretty much - and I have not even covered some other aspects and the edge cases. But don't worry! You will be abstracted away from a lot of it if you use the CachingHandler.

Using CachingHandler right out of the box

OK, using the CachingHandler is straightforward, especially if you go with the default settings. You can have a look at the CarManager sample in the WebApiContrib project. CarManager.Web demonstrates the server-side setup of caching and CarManager.CachingClient demonstrates making calls to the server and testing various caching scenarios.

All you have to do is create an ASP.NET Web API project and add the CachingHandler as a delegating handler, as we learnt in Part 6:

var cachingHandler = new CachingHandler(); // default settings - see "Configuring CachingHandler" below
GlobalConfiguration.Configuration.MessageHandlers.Add(cachingHandler);

And you are done! Now let's try the server with some scenarios. I would suggest that you use the CarManager sample; otherwise create a simple ApiController and implement GET, PUT and POST on it.

Now let's make some requests and look at the responses. I would suggest using Fiddler or Chrome's Postman to send requests and view the responses.

So if we send a GET request:

GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031

We get back this response (or similar; some headers removed and body truncated for clarity):

HTTP/1.1 200 OK
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Vary: Accept
Cache-Control: no-transform, must-revalidate, max-age=604800, private
Last-Modified: Thu, 21 Jun 2012 23:35:46 GMT
Content-Type: application/json; charset=utf-8

[{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,"Price":175.0....

So we see the ETag header here along with important caching headers. Now if we make another GET call, we get back the same response, with the ETag and Last-Modified staying the same.

Generating the same ETag is fine (showing our CachingHandler is doing something) but we have not yet seen any caching. That is where the client has some work to do. It has to use If-None-Match with the ETag to conditionally ask for the resource: if it matches on the server, it will get back a 304; if not, the server will return the new resource:
GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031
If-None-Match: "54e9a75f2dbb4edca672f7a2c4a73dca"
Here we get back a 304 (Not Modified) as expected:
HTTP/1.1 304 Not Modified
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Server: Microsoft-IIS/8.0
Date: Sun, 24 Jun 2012 07:34:29 GMT

A typical server that can use CachingHandler

CachingHandler makes a few RESTful assumptions about the server for effective caching. Some of these assumptions can be overridden, but this is generally not recommended. The assumptions are:

  • HTTP verbs (POST/GET/PUT/DELETE for CRUD operations) are used to modify resources - not RPC-style actions (such as POST /api/Cars/Add). This is the most fundamental assumption.
  • All resources are modified through the same HTTP pipeline that implements caching. If a resource is modified outside the pipeline, the cache state needs to be updated by the same process.
  • Resources are organised in a natural, cache-friendly manner: invalidation of related resources can be done with minimal setup (more details below).

Defining cache state

As we said, this has nothing to do with HttpRuntime.Cache! Unfortunately, the ASP.NET implementation creates real confusion between the HTTP cache (where the resource gets cached on the client or mid-stream cache servers) and server-side output caching (where the rendered output gets cached on the server).

Cache state is a collection of data that keeps track of each resource along with its last-modified date and ETag. It might initially seem that a single such piece of information exists per resource. But as we touched upon above, a resource can have different representations, each of which needs to be stored separately on the client while they will most likely be invalidated together. 

For example, the resource /api/disclaimer can exist in multiple languages; as such, the client has to cache each representation separately, but when the disclaimer changes, all such representations need to be invalidated. This requires a storage of some sort to keep track of all this data. The current implementation comes with an in-memory storage, but in a web farm scenario this needs to be a persistent store.

Cache state storage and cache invalidation

So we do not need to store the cached resources on the server, but we DO need to store ETags and various state on the server. If we only have a single server, this state can be stored in memory. In a web farm (or even web garden) scenario, a persisted store is needed. This store is called the Entity Tag Store and is represented by a simple interface:

public interface IEntityTagStore
{
    bool TryGetValue(EntityTagKey key, out TimedEntityTagHeaderValue eTag); // get the ETag state for a key, if present
    void AddOrUpdate(EntityTagKey key, TimedEntityTagHeaderValue eTag);     // insert or update the ETag for a key
    bool TryRemove(EntityTagKey key);                                       // remove a single entry
    int RemoveAllByRoutePattern(string routePattern); // remove all entries matching a route pattern; returns the number removed
    void Clear();                                     // empty the store
}

We need an implementation of this interface so that cache management can be abstracted away from the controllers and instead done in the DelegatingHandlers.
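To make the contract concrete, here is a minimal sketch of what an in-memory implementation might look like. The shipped InMemoryEntityTagStore may well differ, and the assumption that EntityTagKey exposes the RoutePattern it was built from is mine:

public class SimpleEntityTagStore : IEntityTagStore
{
    private readonly ConcurrentDictionary<EntityTagKey, TimedEntityTagHeaderValue> _store =
        new ConcurrentDictionary<EntityTagKey, TimedEntityTagHeaderValue>();

    public bool TryGetValue(EntityTagKey key, out TimedEntityTagHeaderValue eTag)
    {
        return _store.TryGetValue(key, out eTag);
    }

    public void AddOrUpdate(EntityTagKey key, TimedEntityTagHeaderValue eTag)
    {
        _store.AddOrUpdate(key, eTag, (k, existing) => eTag); // last writer wins
    }

    public bool TryRemove(EntityTagKey key)
    {
        TimedEntityTagHeaderValue removed;
        return _store.TryRemove(key, out removed);
    }

    public int RemoveAllByRoutePattern(string routePattern)
    {
        // assumes EntityTagKey exposes a RoutePattern property (an assumption)
        var matches = _store.Keys.Where(k => k.RoutePattern == routePattern).ToList();
        return matches.Count(TryRemove);
    }

    public void Clear()
    {
        _store.Clear();
    }
}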

Introducing some concepts (you may skip and come back to it later)

This store keeps track of the ETag (and the related state stored in TimedEntityTagHeaderValue) based on an entity tag key, which is calculated from the resource URI and the content of the important headers (whose list appears in the Vary header). An in-memory implementation for a single server is provided out of the box with the CachingHandler - I will be creating a SQL Server implementation of it very soon; watch this space.

It is important to note that a change in the resource will most likely invalidate all forms of the resource, so all permutations of the important headers will be invalidated. Invalidation is therefore usually performed at the resource level. Sometimes several related resources can be represented by a RoutePattern. By default, the URI of a resource is its RoutePattern.

Also, in some cases a change in one resource will invalidate linked resources. For example, a POST to /api/cars to add a car will invalidate /api/cars/fastest and /api/cars/mostExpensive. In this case, "/api/cars/*" can be defined as the linked RoutePattern (since /api/cars itself does not match /api/cars/*).

Some assumptions in CachingHandler

  1. Re-emphasising that resources can only change through the HTTP API (using the PUT, POST and DELETE verbs). If resources are to be changed outside the API, it is the responsibility of the application to use IEntityTagStore to invalidate the cache for those resources.
  2. If no Vary header is defined by the application, CachingHandler creates weak ETags.
  3. A change in the resource invalidates all forms of the resource (all permutations of important header values).

CarManager sample

I have considered a pretty complex and interrelated routing and caching requirement for the sample, to demonstrate what is possible. The resources available are:

  1. /api/Car/{id}: GET, PUT and DELETE
  2. /api/Cars: GET and POST
  3. /api/Cars/MostExpensive: GET
  4. /api/Cars/Fastest: GET
So creating a car invalidates the cache for 2, 3 and 4. Updating the car with id=1 invalidates 2, 3 and 4 in addition to the car itself at /api/Car/1. Deleting a car invalidates 2, 3 and 4.

The client sample (CarManager.CachingClient) shows how to call the server with headers to validate the cache.

Configuring CachingHandler

The best place to start is the CarManager.Web sample, which gives an idea of how various setups can be used to configure complex caching requirements. Basically, the points below can be used to configure the CachingHandler.

Constructor

A list of request header names can be passed to define the vary headers. If no vary header is passed, the system only generates weak ETags. Optionally, an implementation of IEntityTagStore can also be passed; otherwise the default InMemoryEntityTagStore will be used.
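Something like the below, then - treating the exact overload shapes as assumptions based on the description above:

// vary on Accept-Language and use the default in-memory store
var cachingHandler = new CachingHandler("Accept-Language");

// or: a custom store plus vary headers (parameter order is an assumption)
var store = new InMemoryEntityTagStore();
var cachingHandlerWithStore = new CachingHandler(store, "Accept-Language");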


EntityTagKeyGenerator

This is an opportunity to provide linked resources for a resource (such as the "/api/cars/*" pattern discussed above).

Other customisation points

CachingHandler provides properties in the form of functions, with default implementations that can be changed to override the default behaviour. We will cover these in an upcoming post on "Extending CachingHandler".


Conclusion

CachingHandler is a server-side DelegatingHandler which can be used to abstract caching away from individual ApiControllers so that controller logic can be written without having to worry about caching.

The code is hosted on GitHub (currently sitting at my fork, waiting to be merged!) and comes with both a client and a server sample - CarManager.Web and CarManager.CachingClient.