Saturday, 22 September 2012

Take your Web API service consumption up to 11 with CacheCow.Client

ASP.NET Web API is here and a lot of teams have already started building software using it. If you have followed this and other blogs on webapibloggers.com, you probably have seen many various possibilities and avenues this framework brings for designing and building clean and scalable services.

ASP.NET Web API exposes all the goodness of HTTP. Caching is an important feature of HTTP and ASP.NET Web API allows for building services and clients that take advantage of this feature. I, along with a few friends in the Web API community, have been busy building caching extensions for Web API in a project called CacheCow which is hosted on GitHub.

CacheCow has two separate components: server and client. These will be used independently by service providers (server) and their consumers (client).

Server component allows for easy handling of HTTP caching scenarios on server by generating ETag, responding to validation of cache (see earlier posts on this subject, especially this and for full list here), cache invalidation and storage of cache metadata. Storage of cache metadata is possible in various stores, currently in-memory and SQL Server have been implemented and RavenDB and Redis is on the pipeline [UPDATE: RavenDB is implemented and NuGet package is available here]. Since storage has been abstracted away, any storage mechanism can be plugged in without making any server changes.

Client component looks after making cache-aware requests, cache validation and cache storage. Currently in-memory and file-based storage is available but other stores such as Redis, SQL Server, MongoDB and RavenDB are in the pipeline. Since storage has been abstracted away, any storage mechanism can be plugged in without making any client changes. One of important features of storage is total and per-site quota.

It is important to note that while clients can be browsers or native Apps (WPF, Silverlight, iOS, Android, etc), arguably more often than not they will be server components themselves. For example, an ASP.NET web site can call services of an ASP.NET Web API server. Also middleware components could similarly use resources exposed by Web API. As such it is very crucial that cache storage solutions are performant, scalable and configurable.

In this post, I will look into CacheCow.Client a little but more. For more info, you can read previous posts on the topic in this blog.

CacheCow.Client alternatives

The only alternative to CacheCow.Client (that I am aware of) is using WinINET caching. Internet Explorer also uses this so the cache store will be the same. This is basically windows' HTTP request stack which has been exposed in .NET Framework since v 2.0 through WebRequest:

RequestCachePolicy policy = 
        new RequestCachePolicy( RequestCacheLevel.Default);
WebRequest request = WebRequest.Create(uri);
request.CachePolicy = policy;
WebResponse response = request.GetResponse();

As you can see, we can define a cache policy which will be applied to the request and according to the policy, Internet Explorer cache is used. Cache policy has a few possible values that are defined here. Notable values include:

  • CacheOnly: retrieves the request only from cache
  • BypassCache: does not use cache at all and goes straight to the server
  • CacheIfAvailable: retrieves from local or intermediate cache if resource available otherwise retrieve from server
  • Default: Similar to previous but current cache policy takes effect 

This same mechanism is now exposed in HttpClient but basically is built on the top of WebRequest. Henrik fully covers this feature in his blog here.


Basically in order to use WinINET caching with the new Web API stack, you need to create an HttpClient but provide WebRequestHandler as the MessageHandler:

HttpClient client = new HttpClient(new WebRequestHandler()
                           {
                               CachePolicy = new RequestCachePolicy( RequestCacheLevel.Default)
                           });
// this is a sample. It is not advised to use .Result since can lead to deadlock!
var httpResponseMessage = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;
var httpResponseMessage2 = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;

Using this feature, you can enable caching with little coding on the client.

Why I would choose CacheCow.Client rather than WinINET

Because it goes to 11! As we saw, it is very easy to get started with caching in HttpClient. But as we noted, it is very likely that HttpClient could be used in a server context hence having a reliable and scalable solution is very important in production.

Here are a few advantages of CacheCow.Client over WinINET (or rather disadvantages of WinINET):

1. Caching will be shared with Internet Explorer

In a production scenario, you need an implementation which is predictable and reliable. If someone uses Internet Explorer on the machine, storage area for your application's resources will be taken by just simple browsing. This can lead Internet Explorer to flush application resources in order to store resources for the browsing session. 

2. You have little control over quota

With CacheCow.Client, you can define a global and a per-site quota for storage of resources while such feature is not accessible (although there could be some registry entries for changing these variables) in WinINET caching. Also these variables could be overwritten by installation of a newer version of Internet Explorer.

3. Cache is local to the machine and cannot be shared across servers

In a production scenario, it is desirable to be able to store caches in a central store so network traffic and requests could be limited while with WinINET caching, each server will use its own local cache store.

4. WinINET is file-based

With WinINET, cache is stored in a file location while for a high-throughput production environment, robust caching using solutions such as Redis is required. CacheCow client by abstracting the storage can use any number of storage mechanisms such as Redis, MongoDB, RavenDB, etc.

5. CachePolicy is global for the HttpClient instance

Sometimes you might need to bypass caching. With WinINET, this has to be done with changing policy at the client level which applies across all requests for that HttpClient while CacheCow.Client respects will not use cached resources if you set CacheControl header of the request to no-cache. This basically recommended implementation based on HTTP specification (RFC2616).

6. With WinINET you do not know if request was retrieved from cache

With WinINET, there is no way to tell if response was retrieved from the cache or origin server. CacheCow.Client provides x-cachecow header which provides various information which can be used for debugging and troubleshooting scenarios.

Introducing CacheCow.Client.FileCacheStore

Last week I finished first version of a persistent cache store which is file based. This is available using NuGet and (package name is CacheCow.Client.FileCacheStore) and the code available at GitHub.

Using this persistent store is very easy. After getting the package from NuGet, create an HttpCient while as a delegating handler, pass CachingHandler (covered before here) while setting the store to a new instance of FileStore. While creating a FileStore, you need to specify a folder for storing the cached resources:

var httpClient = new HttpClient(
 new CachingHandler(
 new FileStore("c:\\Cache"))
{
 InnerHandler = new HttpClientHandler()
});

That is all you have to do! Now all your requests will store cacheable resources in a file-based persistent store. 

Currently for quota it uses default values but I am in the process of exposing values so you can configure quota.

CacheCow roadmap

After exposing quota settings, I will be working on CacheCow.Client.RedisCacheStore for a high throughput production level cache storage.

Please keep me posted by your comments, feedback and raising bug/issues on the GitHub page. You are awesome!

Monday, 17 September 2012

Server-side Async: Careful with that Axe, Eugene

[Level T3]

In a previous post, I talked about the dangers lurking in doing server-side async operations in .NET 4.0. As you know, .NET 4.5 provides a much better syntax allowing async/await keywords to take your TPL Task-Soups to a much more readable and organised code. But even so, async will make debugging your application more difficult and bugs could take much longer to be reproduced, isolated and fixed.

Task-Soup

In .NET 4.0, when we add up continuations to create a chained task, we could end up with a few problems:

  1. We could end up with an unobserved exception problem. This is nicely described by Ayende here
  2. Nested lambda expressions could create unexpected problems with closure of variables
  3. The code becomes hard to read.
On the third note, I will just bring an example from my own code in CacheCow. What is it that we are actually returning here?

return response.Then(r =>
{
 if (r.Content != null)
 {
  TraceWriter.WriteLine("SerializeAsync - before load",
   TraceLevel.Verbose);

  return r.Content.LoadIntoBufferAsync()
   .Then(() =>
   {
    TraceWriter.WriteLine("SerializeAsync - after load", TraceLevel.Verbose);
    var httpMessageContent = new HttpMessageContent(r);
    // All in-memory and CPU-bound so no need to async
    return httpMessageContent.ReadAsByteArrayAsync();
   })
   .Then( buffer =>
      {
       TraceWriter.WriteLine("SerializeAsync - after ReadAsByteArrayAsync", TraceLevel.Verbose);
       return Task.Factory.FromAsync(stream.BeginWrite, stream.EndWrite,
        buffer, 0, buffer.Length, null, TaskCreationOptions.AttachedToParent);                                                        
      }
     );

   ;
 }

Even looking at brackets gives me headache.

Is Async worth it at all?

Now we talk a lot about Async operations and its role in improving scalability. But really, is it worth it? How much scalability would it bring? Would it help or hinder?

The answer to these questions is yes, it does help. The more IO you do on your server-side actions, the more you benefit from improvement from scalability. So it is highly advisable to implement your ApiController actions as Async by returning Task or Task<T>

The truth is, it will help even with your non-IO-bound operations although it is not advisable to use Async in such scenarios. You can test it for yourself, create a sync and an async controller to do exactly the same operation and use a benchmarking tool to compare the performance.

I have a CarManager sample on GitHub which I use for testing CacheCow.Server and it contains two simple  controllers: CarController and CarAsyncController. All these do is to use an in-memory repository and their GET only looking up the dictionary by its key:

// sync version
public Car Get(int id)
{
 return _carRepository.Get(id);
}


// async version (on another controller)
public Task<Car> GetAsync(int id)
{
 return Task.Factory.StartNew(() => _carRepository.Get(id));
}

So if you use a benchmarking tool such as Apache Benchamrk ab.exe, you could see a slight increase in throughput using the async controller. In my case, there was a 10% increase in throughput using async.

My ordeal with a bug

Development of CacheCow has been marred by existent of a problem which as we will see, turns out to be not in my code. I have been battling with this for a few weeks (on and off) and could not progress CacheCow development because of that.

OK, here is how my story begins; I think the Sherlock Holmes nature of this troubleshooting could be amusing for others too. After realising that using simple ContinueWith will not flow the context (see previous post) I was tasked with changing all such cases with Then in the TaskHelpers which checks existence of SynchronizationContext and flows the context if it exists.

On the other hand, lostdev, one of CacheCow's most loyal users, informed me of an occasional null reference exception in CacheCow.Server. Now, I had already fixed a bug related to null reference exception when the a resource was being retrieved for the first time. I attributed the problem to the fix I had made and reported that the problem is fixed in the current version.

So I started developing file-based cache storage for CacheCow.Client (which will have its own post very soon) and replaced all ContinueWith cases with Then.

And then I started to experience deadlocks in CacheCow.Client when I was using file-based caching and sending concurrent GET requests to the server. As soon as I would remove FileStore, and replace with InMemoryCacheStore, it would work. So I started searching through the client code, debug, look at the threads, debug again, change code, debug... to no avail. As soon as I was using file-based caching it would start to appear so it had to be on the client.

Then I noticed a strange thing: I could only run 4 concurrent calls and rest would be blocked. Why? Then I started playing with the maxconnection property of the system.net configuration:

  <system.net>
 <connectionManagement>
   <add address = "*" maxconnection = "N" />
 </connectionManagement>
  </system.net>

and interestingly, by setting the N to a high number, I would get more concurrent connections - but only up to the number defined. Hmmm... so the requests do not quite finish. OK, I fired up Sysinternals' TcpView but unfortunately these connections did not show up (and I do not know why).

I was getting nowhere until I accidentally loaded an earlier version of the server code. To my surprise, I did not get the deadlock but this error which @Tugberk separately reported earlier but attributed to order of handlers:

[NullReferenceException: Object reference not set to an instance of an object.]
System.Web.Http.WebHost.HttpControllerHandler.EndProcessRequest(IAsyncResult result) +112
System.Web.Http.WebHost.HttpControllerHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +10
System.Web.CallHandlerExecutionStep.OnAsyncHandlerCompletion(IAsyncResult ar) +129

OK, so it is probably happening on the server but the continuation code gets deadlocked on unhandled exception. I am close! So it was time to go to bed and I was positive that I would nail it the day after.

It was funny that I woke up the day after and with my in-bed reading on tweets, stumbled on @Tugberk's tweet on issue he had just created. That sounds exceedingly similar, so we just doubled checked our scenarios and it turned out that an HttpResponseMessage with empty RequestMessage property is not handled in Web API and a null reference exception is thrown at the end of the response clean-up code. And the reason I was seeing it only with file-based cache store was that the part of server-side code to return such responses was being triggered only using file-based store (since it was capable of persisting caches and was trying to validate the cache).

So as you can see, a seemingly unrelated problem can really confuse the nature of the bugs in async scenarios.

Conclusion

First of all, always use request.CreateResponse() instead of using new HttpResponseMessage. I googled for cases of new HttpResponseMessage and found +3000 entries. This is really dangerous and I think this is a bug in Web API and needs to be fixed. If you are using new, make sure you set the RequestMessage property.

And in general, be careful with doing server-side async operations. It is really a powerful axe but with it you are not quite sure what a slightly off swing could bring. Careful with that axe Eugene.

Monday, 10 September 2012

Going away

There are feelings that are so complex they don't have a name - unless we make one. They can be described, but not easily. Now "Going away" is one of those feelings.

Have you ever been to a place, on a holiday, and enjoying the places you visit and yet you cannot stop thinking about the fact that this is probably the last time you see this place? That is why you start taking pictures, videos, you want to register everything since you know your memory won't. It is likely that you would not go back to those pictures but having them is like you "own" those memories.

* * *

Beaumaris (map) is a tiny town in the remotest corner of the north west of Wales - in the Anglesey island. Once visiting Wales (we love Wales), on our hunt for nice places to eat, we found ourselves in there. Passing sailing boats on a green road with the Irish Sea on the right and stony walls on the left, we approached a town boasting a rich history - later found out about its Castle. We had no idea what we will find there.

Memory of that night was not captured on any picture. It did not need to. It felt like a surreal experience of two invisible tourists/observers entering an almost dream-like banquet. It was almost fictional. It reminds me of the opening (and closing) scene of the Russian Ark.

We parked in the corner of the square overlooking the Irish Sea and found our way through a small entrance to the main street. Ye Olde Bull's Head Inn was easy to find: a very old traditional pub with a classy hotel on the top. We ordered drinks and sat down in very old wooden chair and benches. While walking we were careful of our head: like all old pubs ceiling is low and wooden bars are visible.

And there was a banquet: a reunion of army forces of some sort - many of them American. Now out of  all places, why Beaumaris? Had they travelled all the way from the ocean to meet at such a remote place? I had no answer. I did chat to a few officers walking around there but my inquiry did not produce much more information. I have not been able to find any reference to this reunion on internet.

After 40 minutes, we were ushered to the brasserie where food was served. Suddenly atmosphere changed. It was a modern extension to the old building and an ethereal ambient light had made the chic contemporary design even more magical. Starter, food and dessert were excellent. Probably among best foods we ever had.

On leaving, I felt like "going away". But it did not matter. I had absorbed all those beautiful yet surreal moments.
* * *

I am visiting my family in Iran now. I do go back almost every year although now most my friends and my brother and sister are abroad.

My mother-in-law is ill. Medically speaking she is terminal. But everything is possible. Isn't it?

This is a chance to spend time with her and my parents. Things are not easy. Seeing suffering and not being able to cure is not easy - although there are things we can do. My wife is doing most of them so really there is not much for me to do. I occasionally help them out and I am just there, in case. My mother-in-law is a brave woman. She has been fighting and keeps fighting. This is so important for someone battling with a fierce illness. My wife is also very brave. She is fighting her emotions trying to help medically as much as she can - she is a doctor, a real one not like me who chickened out to do what he likes to do.

My holiday is running out - and the feeling of "going away" is coming back.

* * *

We visited Beaumaris again only after two years. This time there were no banquets but the place had no less magic: it was exactly as we had seen in that evening two years ago. Ye Olde pub with its crisp beers and the food excellent like before. 

I felt really special. It felt like an honour to visit the place again. I was fortunate enough to visit a "gone away" place.

And with my mother-in-law, I think it is possible - and perhaps definite. In this world. Or in the next.

Friday, 24 August 2012

Server-side TPL Async: Don't risk learning these lessons the hard way

[Level T2]

There has been more than a few times that I have felt I know all about TPL. Only to realise sometime later I was wrong, very wrong. Now you might read this and say to yourself "Come on, this is basic stuff. I know it well, thank you vety much". Well, it is possible that you could be right but I advise you carry on reading; what follows can surprise you.

None of what I am gonna talk about is new or it is being blogged about for the first time. Brad Wilson has an excellent series on the topic here but this is to serve as a digest of his posts targeted at a broader audience in addition to a few other points.

While this post is not directly related to ASP.NET Web API, most examples (and cases) are related to day-to-day scenarios we encounter in ASP.NET Web API.

Remember this covers pre-async/await keywords in .NET 4.5 and what you need to do if you are using .NET 4.0 and not async/await. Using async/await will cover you for some of the problems described below but not all.

Don't fire and forget

Tasks are ideal for decoupling pieces of functionality. For example, I can perform a database operation and at the same time audit the operation by outputing a log entry, writing to a file, etc. Using tasks I can de-couple these operations so that my database task returns without having to wait for audit to finish. This makes sense since database operation is high priority but audit is low priority:

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"));
}

In fact, let's say we do not even care if audit is successful or not so we just fire and forget, it most audit will fail which is low priority. OK, it all seems innocent?

No! This innocent operation can bring down your application. Reason for it is that all async exceptions must be observed even if you do not care about them. If you don't, they will haunt you when you least expect them, at the time finalizer for task is run by GC. Such an unhandled exception will kill your app.

The link above talks about various ways of observing an exception. The most practical is to use a continuation and access the .Exception property of the task (just accessing the property is enough, does not need to do anything with the exception itself).

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"))
      .ContinueWith(t => t.Exception); // fire and forget!
}

Another option which is more of a safe-guard against accidental unobserved exception, is to register to UnobservedTaskException on TaskScheduler:

 TaskScheduler.UnobservedTaskException +=
  (e, sender) => LogException(e);

So we register a handler to handle unobserved exceptions and this way they will be "observed". If you need to read more on this, have a look at Jon Skeet's post here.

This problem has made Ayende Rahien to run for the hills.

Respect SynchronizationContext

Uncle Jeffrey Richter tells us that

By default, the CLR automatically causes the first thread's execution context to flow to any helper threads.

And then we also learn that we can use ExecutionContext.SuppressFlow() to suppress flow of the thread context.

Now, what happens when we use ContinueWith()? It turns out unlike standard thread switches, context does not flow (I do not have a reference, if you do please let me know). This will help with improving performance of asynchronous task as we know context switching is expensive (and big part of it is context flow).

So why is it important? It is important because so many developers are used to HttpContext.Current. This context is stored in the thread storage area and passed along at the time of context switching. So if the context does not flow, HttpContext.Current will be null.

SynchronizationContext is a similar (but not same) concept. It is about a state that can be shared and used by different threads at the time of switching. I cannot explain this better than Stephen here. So using Post on SynchronizationContext ensures that the execution of continuation will happen in the same context and not necessarily by the same thread.

So basically the idea is that if you are in a Task pipeline (best example being MessageHandlers in ASP.NET Web API), you need to take responsibility for passing the context along the pipeline.

This is a snippet from ASP.NET Web API Source code that displays the steps. First of all you check to see if current context is null, if it is not then you have to use Post() to flow the context:

SynchronizationContext syncContext = SynchronizationContext.Current;

    TaskCompletionSource<Task<TOuterResult>> tcs = new TaskCompletionSource<Task<TOuterResult>>();

    task.ContinueWith(innerTask =>
    {
        if (innerTask.IsFaulted)
        {
            tcs.TrySetException(innerTask.Exception.InnerExceptions);
        }
        else if (innerTask.IsCanceled || cancellationToken.IsCancellationRequested)
        {
            tcs.TrySetCanceled();
        }
        else
        {
            if (syncContext != null)
            {
                syncContext.Post(state =>
                {
                    try
                    {
                        tcs.TrySetResult(continuation(task));
                    }
                    catch (Exception ex)
                    {
                        tcs.TrySetException(ex);
                    }
                }, state: null);
            }
            else
            {
                tcs.TrySetResult(continuation(task));
            }
        }
    }, runSynchronously ? TaskContinuationOptions.ExecuteSynchronously : TaskContinuationOptions.None);

    return tcs.Task.FastUnwrap();

There is a horrifying fact here. Most of the DelegatingHandler code out there (including some of mine) in various samples around internet do not respect this. Of course, looking at ASP.NET Web API source code reveals that they do indeed take care of this in their TaskHelper implementations and Brad tried to make us aware of it in his blog series. But I think we have not taken enough attention of the implications of ignoring SynchronizationContext.

Now my suggestion is to use the TaskHelpers and its extensions in the ASP.NET Web API (it is open source) or use the one provided in Brad's post. In any case,

Don't use Task for CPU-bound operations

Overhead of asynchronous operations is not negligible. You should only use async if you are doing an IO-bound operation (calling another web service/API, reading a file, reading a lot of data from database or running a slow query). I personally think even for normal IO operations, sync is more performant and scalable.

As we have talked about it here, the point about asynchronous programming on server-side is releasing the thread to be able to serve another request. Tasks are normally served by the CLR thread pool. If server already needs managed threads for its operations, it will be using CLR thread pool too. This means that by doing async operations you could be stealing threads needed for server's normal operations. A classic example is ASP.NET, so you should be careful to use async only if needed.

ContinueWith is Evil!

I think by now you should know why standard ContinueWith can be evil. First of all, it does not flow the context. Also it makes it easy for unboserved exceptions to creep into your code. My suggestion is to use .Then() from ASP.NET Web API's TaskHelpers.

Performance comparison

I think it is still early days - but I must say I would love to do a benchmark to quantify overhead of server-side asynchronous programming. Well if I do, this place will be where the result will first appear :)

So. Do I think I know all about TPL now? Hardly!

Monday, 6 August 2012

CacheCow.Client, using the benefits of HTTP Caching on the client

[Level T2]

Browsers are very sophisticated HTTP machines. We often fail to remember how much of the HTTP spec is implemented by the browsers.

As I have said before, ASP.NET Web API is a very powerful server-side framework but there is a client-side burden in using it or generally implementing a RESTful system - although Web API does not restrict you to a RESTful style.

Because of the client burden, we need more and more client-side libraries to implement lacking features that browser have had for such a long time - one of which is HTTP caching. If you use HttpClient out of the box, it will not implement any caching even though the resources are cacheable. Also all of the work for conditional GET or PUT calls (using if-none-match, etc) or cache validation (if there is must-revalidate) or checking whether your cache is stale has to be done in your own code.

CacheCow is an HTTP caching library for client and server in ASP.NET Web API that does all of above - see my earlier post on that. Storage of the cache is abstracted in ICacheStore and for now we can use in memory implementation (see below). So the features in the client library include:

  • Caching GET responses according to their caching headers
  • Verifying cached items for their staleness
  • Validating cached items if must-revalidate parameter of Cache-Control header is set to true. It will use ETag or Expires whichever exists
  • Making conditional PUT for resources that are cached based on their ETag or expires header, whichever exists

Today I released v0.1.3 of the CacheCow.Client on NuGet. This library would implement advanced HTTP caching with little or no configuration or hassle. All you have to do is to add the CachingHandler as a delegating handler to your HttpClient:

var client = new HttpClient(new DelegatingHandler()
       { 
           InnerHandler = new HttpClientHandler()
       });

This code will create an HttpClient that implements caching and stores the cache in memory. By implementing ICacheStore, you can store the cache in your custom repository. CacheCow is going to have persistent cache stores such as FileCacheStore, SqlCeCacheStore and SqliteCacheStore as a minimum. FileCacheStore will be similar to browser implementation of cache storage. Each of these cache stores will be implemented and released under its own NuGet package. To add an alternative cache store, you need to pass the store as a constructor parameter.

Usage

So in order to use, CacheCow.Client, use package manager in Visual Studio to download and add reference to it:

PM> Install-Package CacheCow.Client

This will also download and add reference to ASP.NET Web API client package, if you have not already added a reference to. Make sure try v0.1.3 or above (by the time of reading this).

After this you just need to create an HttpClient as above and add the CachingHandler as a delegating handler. That's it, you are ready to call services and cache the responses!

Sample

I am working on a sample project but for now, it is easiest to use the code below to call my CarManager Azure website which implements HTTP Caching. The code can be pasted from this GitHub gist.

CacheCow.Client adds a special header to the response which helps with debugging its various features. The header's name is x-cachecow and has a various flags on the operations done on the request/response. So in the code below, we will use this header to demonstrate the features of this library.

var client = new HttpClient(new CachingHandler()
                    {
                        InnerHandler = new HttpClientHandler()
                    }
 );
var initialResponse = client.GetAsync(
      "http://carmanager.azurewebsites.net/api/Car/5").Result;
var initialResponseHeader = initialResponse.Headers.Single(
       x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(initialResponse.Headers.ETag.Tag);
Console.WriteLine(initialResponseHeader);

And we will see this to be printed:
"02e677a7799e484fb49447f8a600247d"
0.1.3.0;did-not-exist=true
As you can probably figure out, we have the ETag and the CacheCowHeader: first value is the version and did-not-exist means that item did not exist in the cache - which is understandable as this is the first call.

Now let's try this again:

var secondResponse = client.GetAsync("http://carmanager.azurewebsites.net/api/Car/5").Result;
var secondResponseHeader = secondResponse.Headers.Single(
      x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(secondResponseHeader);

And what will print is:
0.1.3.0;did-not-exist=false;cache-validation-applied=true;retrieved-from-cache=true
So in fact, it existed in the cache, retrieved from the cache and cache validation was applied. Cache validation is the process by which client makes conditional call to retrieve/update a resource only if the condition is met (see Background section in this post). For example, in GET calls it will send the ETag with a if-none-match header to retrieve

If you call a PUT on a resource that is cached, CacheCow.Client will use its ETag or Expires value to make a conditional PUT, unless you set UseConditionalPut property to false.

By-passing caching

There are some cases where you might not want the result be cached or retrieved from the cache regardless of the caching logic. All you have to do is to set the CacheControl header to no-cache or no-store:

var nocacheRequest = new HttpRequestMessage(HttpMethod.Get, 
  "http://carmanager.azurewebsites.net/api/Car/5");
nocacheRequest.Headers.CacheControl = new CacheControlHeaderValue()
 {
  NoCache = true
 };
var nocacheResponse = client.SendAsync(nocacheRequest).Result;
var nocacheResponseHeader = nocacheResponse.Headers.FirstOrDefault(
 x => x.Key == CacheCowHeader.Name);
Console.WriteLine(nocacheResponseHeader);

This will print an empty header since we have by passed the caching.

Last but not least

Thanks for trying out and using CacheCow. Please send me your feedbacks and bugs. Just ping me on twitter or use GitHub's issue tracker.

Sunday, 5 August 2012

Hierarchical routing for ASP.NET Web API: RESTful resource organisation


Introduction and motivation

[Level T3] A question keeps popping up on various forums (StackOverflow, ASP.NET Forums, etc) on how to define routing in ASP.NET Web API. And more important is that there does not seem to be any consensus on how to approach this - well probably since ASP.NET Web API is still in preview (RC).

In this post, I want to have a fresh look at routing in ASP.NET Web API and present a hierarchical model for organising resources.

Background

Resources are important part of RESTful architectual style. In REST all server operations (API) are defined as interactions with resources - in HTTP terms it would mean Verb interactions with URLs. This is in contrast with RPC style where server operations are defined as method calls. 

ASP.NET Web API's routing mirrors ASP.NET MVC routing - similar to some other aspects of Web API which uses ASP.NET MVC as the baseline since it is a familiar model for developers.

Routing was first introduced to ASP.NET by MVC. Later on routing code was integrated into system.web.dll

Routing in ASP.NET MVC is very flat since all routes are defined from the root. This was one of the reasons MVC Areas were introduced to add another layer to top root routes. This can work for MVC but RESTful resource organisation usually requires more nested structure while using conventionally routing can result in configuration burden as well performance penalty.

How routing works

MVC (and Web API) routes are generally designed for a handful (and in most cases less than 100) routes. You were expected to use the default route for most cases and add a few more routes for occasional cases where the pattern was different. This in fact worked in most MVC applications but RESTful resource design requires richer and a more nested structure as we will see below.

Route definition in ASP.NET Web API - as you all have probably used and know - is based on adding route to the RouteCollection:

var routes = GlobalConfiguration.Configuration.Routes; // in web hosted environment
routes.MapHttpRoute(
 name: "DefaultApi",
 routeTemplate: "api/{controller}/{id}",
 defaults: new { id = RouteParameter.Optional }
);

Each mapping will add a new HttpRoute object to the collection. HttpRoute itself is an implementation of IHttpRoute:

public interface IHttpRoute
{
    string RouteTemplate { get; }

    IDictionary<string, object> Defaults { get; }

    IDictionary<string, object> Constraints { get; }

    IDictionary<string, object> DataTokens { get; }

    HttpMessageHandler Handler { get; }

    IHttpRouteData GetRouteData(string virtualPathRoot, HttpRequestMessage request);

    IHttpVirtualPathData GetVirtualPath(HttpRequestMessage request, IDictionary<string, object> values);
}

Most important method is GetRouteData where the matching takes place. Basically if an implementation of IHttpRoute returns a non-null IHttpRouteData then route has matched.

Now how the matching work? Well, RouteCollection will loop through all the routes and calls GetRouteData and the first one to return a non-null will be the matched route for a given URL:

// snippet from HttpRouteCollection
foreach (IHttpRoute route in _collection)
{
    IHttpRouteData routeData = route.GetRouteData(_virtualPathRoot, request);
    if (routeData != null)
    {
        return routeData;
    }
}

Did you notice something fundamental above? If you have 1000 routes and your given URL matches the 1000th route, GetRouteData (which involves complex and heavy string processing if you look at the Web API source code) has to be called for the first 999 routes and fail until it reaches your matched route. With 100, this probably not an issue but with numbers going up, performance will take a hit.

RESTful organisation of resources

This StackOverflow question is one of many questions on the Web API routing according to REST style. One of the main challenges is that, we as Microsoft developers, have used to design our API in an RPC fashion. This habit has been ingrained in our psyche since remoting days and then Web Services and more recently WCF. So it is only natural to fall into the same habit when designing or REST API.

Martin Fowler on his article Steps towards the glory of REST talks about 3 levels of REST implementation. We will be talking about the level 1 and briefly about 2. Level 3 which is the most noble constraint of REST focuses on hypermedia which is beyond the topic of our discussion.

So at the level 1, we design the server to expose its API as resources. In case of HTTP and URLs, this will look like the directory/file structure on the disk - and as such hierarchical. If we mix REST with DDD concepts, each aggregate root of the publicly exposed domain (which is called Server domain, see the post on Client-Server) will be exposed at the root (considering API root is /api/):

/api/{AggregateRoot}/{id}

An example for cars would be /api/Car/1243. As such, we would have a CarController that receives the id through the URL. Now all of this is easily achievable using the default route very much in the good old MVC fashion.

However, the picture gets more complicated when we want to expose the tyres of the car. Let's say a car has Front/Rear Left/Right tyres as such FL, FR, RL and RR. One approach would be to expose the tyres as the aggregate root and have an ID for each tyre:

/api/Tyre/1234567

Well, this will work but tyre is really not a aggregate root since it really would only have a meaning as part of the car. So ideally we should expose them only as part of a car:

/api/Car/1243/Tyre/FR

So we would need a TyreController to handle the scenario and in this case we need a route similar to below:

/api/Car/{carId}/{controller}/{id}

Now this can be written in the generic form of:

/api/{parent}/{parentId}/{controller}/{id}

But as you can see in this case we need to define carId as parentId. Also not all of the domain has similar constraints for ID so this approach will impose a heavy limitation on our routes. On the other hand, we might have more than one nested levels where this can be even more complexed.

In addition to entity levels, we also have operations. Operations also can be defined as a resource, for example in case of a puncture:

POST /api/Car/1243/Tyre/FR/Puncture

This will create a puncture in the tyre. When we say create, we do not mean that necessarily a physical record needs to be created against the tyre. In fact mapping between REST resources and domain models is usually loose. The point is every operation needs to exposed as a resource and all operations to be performed using HTTP verbs. For example, repairing puncture can be represented as:

DELETE /api/Car/1243/Tyre/FR/Puncture

So the operations will complicate the routing even more. If the domain is big and complex, we could end up with many routes. As such, the performance will be hampered and we end up with the burden of defining all these routes at the root level.


So what is the solution? Solution is to turn Web API's flat routing into a hierarchical one. 







Tuesday, 31 July 2012

Using range header for retrieving range of IEnumerable<T> in ASP.NET Web API


Introduction

[Level T3] In this post, we talk about using HTTP's Range header to achieve requesting data ranges for entities.

Background

HTTP spec defines a series headers that can be used for a client to request for a partial content. These operations are optional (in most cases spec uses word SHOULD) but most servers implement them and browsers have increasingly been using them. If you have ever resumed downloading a big file from internet, then you have used this feature (in fact all browsers use range if supported by server). In this case, client keeps requesting chunks and builds up the file until it is fully downloaded.

So here is how it works in a nutshell:

  1. Server can optionally informs clients while serving a resource that it supports partial content. It does that by sending Accept-Range header with a value of the unit it supports, normally bytes. In our case, our server sends back a custom unit that we call x-entity.
  2. Client, either informed by the server on partial content feature based on Accept-Range header or just simply tries its luck, sends a request with Range header with value of [unit]=[from]-[to] for example bytes=1024-2047. In this example, client asks for the second KB of the file. Range header can specify multiple ranges for example bytes=500-600,601-999
  3. Server will return the range requested and include a Content-Range header with value [units] [from]-[to]/[TotalCount]. For example bytes 1024-2047/12345678. It also returns status code 206 (partial content) to inform the client that the content is partial. If server does not support the range specified, it will send back status code 416.
Spec does consider using custom units so server can implement its own custom units and inform the client of the unit using Accept-Range header. Now the idea is that in ASP.NET Web API, we normally build many actions that return IEnumerable<T>. What if we could use the range to specify range of the enumerable to be returned? Hmmm....

This feature can be useful in case of pagination on the client so that instead of API implementing a range parameter all the time, we just use HTTP's built-in features and encapsulate the implementation in a reusable component, in this case a filter.

So in the code to follow, we define a custom range unit and call it x-entity. "x-" prefix is a common naming convention on the web to specify custom tokens that are not part of the canonical tokens defined in RFC specs.

Implementing range in ASP.NET Web API

So where is the best place to implement this? We have these requirements:

  • Access to request headers to read Range header
  • Access to response header to set Accept-Range
  • Access to content headers to set Content-Range header
  • Access to content so that it can filter IEnumerable<T>
DelegatingHandler might look promising but by the time it accesses the content, it is already turned into stream by MediaTypeFormatters.

MediaTypeFormatter is an interesting option. I actually created a RangeMediaTypeFormatterWrapper that would wrap the MTFs and intercept the content and if it was of type IEnumerable<T>, it would apply the filtering. Initially it seems MTF does not have access to request headers but in here we had an interesting discussion and it turns out it can access request using GetPerRequestFormatterInstance. But it also needs access to response headers.


So Glenn Block suggested filters and after some thoughts, it seems to be the right approach considering current limitations of MTF. The only drawback is that it has to be explicitly defined on the action - which can in some cases be a blessing in fact. In any case, filter approach as you will see is clean and does everything in the same place.

Filters in ASP.NET Web API is not much different from MVC. You get two methods: before (OnActionExecuting) and after (OnActionExecuted) the action where you can change values in request, response, action arguments or simply examine values.

Using the code

You can get the source code from GitHub. As you can see, we have a single controller called CarController. Project is running on port 50714 on my machine so all examples will include this port - yours could be different. So download the project, build and run it. I created a client project to implement all steps below using HttpClient but there is an issue with ASP.NET Web API implementation of the Range header that regardless of the unit set in the range header, always bytes is sent to the server.

As you can see, we have a simple action with EnableRange filter defined on it:

[EnableRange]
public IEnumerable<Car> Get()
{
 return CarRepository.Instance.Get();
}

So now we use fiddler (or similar tool capable of sending HTTP requests, such as Google's Postman) to send this request:

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714

We will get all the cars in our repository in JSON format. But note the Accept-Range header with value of  x-entity:

HTTP/1.1 200 OK
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Expires: -1
Accept-Ranges: x-entity
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 18:59:56 GMT
Content-Length: 1125

[{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,...

So this should tell the client that it can use Range header. Now let's send a range header requesting 3rd item to 6th item (total of 4 items):

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714
Range: x-entity=2-5

And here is the response:

HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Content-Range: x-entity 2-5/10
Expires: -1
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 19:00:19 GMT
Content-Length: 447

[{"Id":3,"Make":"Toyota","Model":"Yaris","BuildYear":2003,"Price":3750.0,...

Note the Content-Range header above and also the fact that we got the entities we requested in JSON (not shown fully above). So it tells us that it has sent back items from index 2 to index 5 and total of items is 10. Also note the 206 response.

Server can send back * if number of items is not known at the time of serving the request. I have used this option since I do not want to run a Count() on an IEnumerable<T>. It is very likely that the data is being retrieved from database and we do not want to load the whole table into memory. So my approach is to try cast the value into ICollection using as keyword. If it case OK then I get the count, otherwise I set the count to *.

Another option in the spec is that to in the range is optional so the client can send a range header with value 2-*. In this case, we must skip the first 2 items and return the rest:

GET http://localhost:50714/api/Car HTTP/1.1
User-Agent: Fiddler
Host: localhost:50714
Range: x-entity=2-

In this case, our server returns this response:

HTTP/1.1 206 Partial Content
Cache-Control: no-cache
Pragma: no-cache
Content-Type: application/json; charset=utf-8
Content-Range: x-entity 2-9/10
Expires: -1
Server: Microsoft-IIS/8.0
Date: Tue, 31 Jul 2012 21:18:05 GMT
Content-Length: 897

[{"Id":3,"Make":"Toyota","Model":"Yaris","BuildYear":2003,"Price":3750.0, ....

Notes on implementation

The crux if the implementation is to call Skip() and Take() on IEnumerable<T>. Our code has to be able to work with all types hence cannot be generic. On the other hand, filters (and attributes as a whole) cannot use generics. As such we just have to use reflection to do this:

[EnableRange]
var skipMethod = t.GetMethods().Where(m => m.Name == "Skip" && m.GetParameters().Count() == 2)
 .First().MakeGenericMethod(_elementType);
var takeMethod = t.GetMethods().Where(m => m.Name == "Take" && m.GetParameters().Count() == 2)
 .First().MakeGenericMethod(_elementType);
...
value = skipMethod.Invoke(null, new object[] { value,  from});
if(to.HasValue)
 value = takeMethod.Invoke(null, new object[] { value, to - from + 1 });

Also it is useful to note that the return value is not accessible in the filter and we have to resort to using Content and casting it to ObjectContent and use the Value property.

Conclusion

Range header defined in HTTP spec is useful in retrieving partial content. We can use custom units and we defined x-entity unit to enable selecting a range of entities (commonly used in pagination scenarios) and implemented it using a filter.