Saturday, 20 October 2012

How PayPal is helping Iranian government's internet censorship

[Level N]

I had difficulty believing what I read on Hacker News around the same time my account was blocked and closed by PayPal as I tried to pay $3 for an internet proxy to combat the filtering which has rendered the internet pretty much useless in Iran. Did he read my email? I do not know, but it left me frustrated and hopeless at one of the most difficult times in my life.

As some of you might know, I have been going through rough times over the last 18 months. My mother-in-law passed away 3 weeks ago after a long battle, most of which my wife spent by her side - which is a consolation.

So when I visited her and the family in Iran (that's where I am from) around 2 months ago, I realised that internet censorship is so bad that the internet has become practically unusable. Sites such as Twitter, which I am addicted to, are obviously blocked since they played an important role in Iran's suppressed Green Revolution - arguably the first Twitter Revolution in the world. In fact, even this blog that you are reading, and anything else hosted on Google's Blogspot, is filtered - Iran has had the highest number of bloggers in the Middle East, and a number of them are in prison. Even Gmail gets its share and is blocked from time to time. According to some reports, 40% of Iranians (30 million) use the internet, the second highest rate in the Middle East after Israel, and as far as I know most of them use internet proxies, anti-filters, anonymisers or VPNs to bypass the censorship. So even if we say only half use anti-censorship tools, we are talking about 15 million people. I had actually set up a dedicated PC in the UK for my wife to use over a Remote Desktop (RDP) connection, since these proxies are sometimes found by the government and blocked. But the slow speed of the internet (intentionally kept low) makes it almost useless, since each refresh of the screen takes a few seconds.

So how does PayPal come into this? Most of these companies only accept PayPal. And PayPal blocks any account if it realises the IP is from Iran - regardless of the amount, who the receiver is, how long the account has been in use or the behaviour of the account. Why? That is a very good question, but maybe because someone might be trying to purchase nuclear equipment or fund terrorism...! Honestly, is that not silly? Paying $3 for an anti-censorship filter is illegal because you are connecting from Iran? Anyone who wants to use their PayPal account for illegal activities will surely use a proxy first to mask their IP. This only affects ordinary people like me who are trying to pay for proxies - with the sanctions, you certainly cannot buy anything that ships to Iran anyway.

Now, of all people, I am among those who would least wish to help Iran's government. My family was struck by this very government when my uncle, working as a political analyst for the British Embassy, was arrested by the authorities and charged with spying back in 2009. He was released after months, but due to constant pressure and persecution from the authorities he had to flee Iran; he has now resumed his work at the Foreign Office in the UK.

So where does this leave me?

Well, it leaves my account blocked and closed. Having come back, I cannot use my account anymore. The emails I have sent have been met with utter disinterest and a "we don't care" attitude. And I have a really hard time believing that PayPal's CEO reads complaint emails. I think I might have been rash in my tone in some emails, but my frustration was extreme because of the injustice.

But I am only one among many. The lives of many millions of Iranians have been affected by sanctions. Ordinary people suffer at the hands of a brutal government, yet they find no consolation in the way they are treated outside Iran - especially by PayPal. For them, the internet is the only way out of the oppression, and blocking the purchase of anti-censorship accounts is standing side-by-side with the Iranian regime. Does this make you happier, Mr. David Marcus?


Saturday, 22 September 2012

Take your Web API service consumption up to 11 with CacheCow.Client

ASP.NET Web API is here and a lot of teams have already started building software with it. If you have followed this and other blogs on webapibloggers.com, you have probably seen the many possibilities and avenues this framework opens up for designing and building clean and scalable services.

ASP.NET Web API exposes all the goodness of HTTP. Caching is an important feature of HTTP and ASP.NET Web API allows for building services and clients that take advantage of this feature. I, along with a few friends in the Web API community, have been busy building caching extensions for Web API in a project called CacheCow which is hosted on GitHub.

CacheCow has two separate components: server and client. These will be used independently by service providers (server) and their consumers (client).

The server component allows for easy handling of HTTP caching scenarios on the server: generating ETags, responding to cache validation requests (see earlier posts on this subject, especially this one, and the full list here), cache invalidation and storage of cache metadata. Cache metadata can be kept in various stores; currently in-memory and SQL Server implementations exist, and RavenDB and Redis are in the pipeline [UPDATE: RavenDB is implemented and the NuGet package is available here]. Since storage has been abstracted away, any storage mechanism can be plugged in without any server changes.
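As a minimal sketch of the wiring (the parameterless constructor here is an assumption - consult the CacheCow README for the exact signature), the server component plugs in as a standard Web API message handler:

// a minimal sketch: CacheCow.Server's CachingHandler is a DelegatingHandler,
// so it registers like any other ASP.NET Web API message handler
// (constructor arguments assumed; the cache store is swappable via the abstraction)
GlobalConfiguration.Configuration.MessageHandlers.Add(
    new CacheCow.Server.CachingHandler());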

The client component looks after making cache-aware requests, cache validation and cache storage. Currently in-memory and file-based stores are available, but other stores such as Redis, SQL Server, MongoDB and RavenDB are in the pipeline. Since storage has been abstracted away, any storage mechanism can be plugged in without any client changes. One of the important features of the storage layer is total and per-site quotas.

It is important to note that while clients can be browsers or native apps (WPF, Silverlight, iOS, Android, etc.), arguably more often than not they will be server components themselves. For example, an ASP.NET web site can call the services of an ASP.NET Web API server. Middleware components could similarly use resources exposed by Web API. As such, it is crucial that cache storage solutions are performant, scalable and configurable.

In this post, I will look into CacheCow.Client a little bit more. For more info, you can read the previous posts on the topic in this blog.

CacheCow.Client alternatives

The only alternative to CacheCow.Client (that I am aware of) is using WinINET caching. Internet Explorer also uses this, so the cache store will be the same. This is basically Windows' HTTP stack, which has been exposed in the .NET Framework since v2.0 through WebRequest:

// RequestCachePolicy and RequestCacheLevel live in System.Net.Cache
RequestCachePolicy policy =
        new RequestCachePolicy(RequestCacheLevel.Default);
WebRequest request = WebRequest.Create(uri);
request.CachePolicy = policy; // the WinINET cache is consulted according to this policy
WebResponse response = request.GetResponse();

As you can see, we can define a cache policy which will be applied to the request, and according to the policy, the Internet Explorer cache is used. The cache policy has a few possible values which are defined here. Notable values include:

  • CacheOnly: retrieves the response only from the cache
  • BypassCache: does not use the cache at all and goes straight to the server
  • CacheIfAvailable: retrieves from the local or an intermediate cache if the resource is available, otherwise retrieves from the server
  • Default: similar to the previous, but the current cache policy takes effect

This same mechanism is now exposed in HttpClient, but it is basically built on top of WebRequest. Henrik fully covers this feature in his blog here.


Basically in order to use WinINET caching with the new Web API stack, you need to create an HttpClient but provide WebRequestHandler as the MessageHandler:

HttpClient client = new HttpClient(new WebRequestHandler()
{
    CachePolicy = new RequestCachePolicy(RequestCacheLevel.Default)
});
// this is a sample; it is not advisable to use .Result since it can lead to deadlocks
var httpResponseMessage = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;
var httpResponseMessage2 = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;

Using this feature, you can enable caching with little coding on the client.

Why I would choose CacheCow.Client rather than WinINET

Because it goes to 11! As we saw, it is very easy to get started with caching in HttpClient. But as we noted, it is very likely that HttpClient will be used in a server context, hence having a reliable and scalable solution is very important in production.

Here are a few advantages of CacheCow.Client over WinINET (or rather disadvantages of WinINET):

1. Caching will be shared with Internet Explorer

In a production scenario, you need an implementation which is predictable and reliable. If someone uses Internet Explorer on the machine, the storage area for your application's resources will be consumed by simple browsing. This can lead Internet Explorer to flush your application's resources in order to store resources for the browsing session.

2. You have little control over quota

With CacheCow.Client, you can define a global and a per-site quota for the storage of resources, while no such feature is accessible in WinINET caching (although there could be some registry entries for changing these variables). These variables could also be overwritten by the installation of a newer version of Internet Explorer.

3. Cache is local to the machine and cannot be shared across servers

In a production scenario, it is desirable to store caches in a central store so that network traffic and requests can be reduced, while with WinINET caching each server uses its own local cache store.

4. WinINET is file-based

With WinINET, the cache is stored in a file location, while a high-throughput production environment requires robust caching using solutions such as Redis. CacheCow.Client, by abstracting the storage, can use any number of storage mechanisms such as Redis, MongoDB, RavenDB, etc.

5. CachePolicy is global for the HttpClient instance

Sometimes you might need to bypass caching. With WinINET, this has to be done by changing the policy at the client level, which applies across all requests for that HttpClient, while CacheCow.Client will not use cached resources for a request whose CacheControl header is set to no-cache. This per-request behaviour is the recommended implementation according to the HTTP specification (RFC 2616).
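As a sketch (assuming a client already wired up with CacheCow's CachingHandler, as shown below), bypassing the cache for a single request looks like this:

// sketch: bypass the cache for this request only; other requests through
// the same HttpClient keep using the cache
var request = new HttpRequestMessage(HttpMethod.Get,
    "http://carmanager.softxnet.co.uk/api/car/3");
request.Headers.CacheControl = new CacheControlHeaderValue { NoCache = true };
var response = client.SendAsync(request).Result; // sample only - avoid .Result in real code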

6. With WinINET you do not know if request was retrieved from cache

With WinINET, there is no way to tell whether a response was retrieved from the cache or from the origin server. CacheCow.Client provides the x-cachecow header, which carries various pieces of information that can be used for debugging and troubleshooting.

Introducing CacheCow.Client.FileCacheStore

Last week I finished the first version of a persistent, file-based cache store. It is available on NuGet (the package name is CacheCow.Client.FileCacheStore) and the code is available on GitHub.

Using this persistent store is very easy. After getting the package from NuGet, create an HttpClient and pass a CachingHandler (covered before here) as a delegating handler, setting its store to a new instance of FileStore. When creating a FileStore, you need to specify a folder for storing the cached resources:

var httpClient = new HttpClient(
    new CachingHandler(new FileStore("c:\\Cache"))
    {
        InnerHandler = new HttpClientHandler()
    });

That is all you have to do! Now all your requests will store cacheable resources in a file-based persistent store. 

Currently the quota uses default values, but I am in the process of exposing these settings so that you can configure it.

CacheCow roadmap

After exposing the quota settings, I will be working on CacheCow.Client.RedisCacheStore for high-throughput, production-level cache storage.

Please keep me posted with your comments and feedback, and raise bugs/issues on the GitHub page. You are awesome!

Monday, 17 September 2012

Server-side Async: Careful with that Axe, Eugene

[Level T3]

In a previous post, I talked about the dangers lurking in doing server-side async operations in .NET 4.0. As you know, .NET 4.5 provides a much better syntax, allowing the async/await keywords to turn your TPL task-soups into much more readable and organised code. But even so, async will make debugging your application more difficult, and bugs can take much longer to reproduce, isolate and fix.

Task-Soup

In .NET 4.0, when we add up continuations to create a chained task, we could end up with a few problems:

  1. We could end up with an unobserved exception problem. This is nicely described by Ayende here.
  2. Nested lambda expressions could create unexpected problems with closures over variables.
  3. The code becomes hard to read.
On the third point, here is an example from my own code in CacheCow. What is it that we are actually returning here?

return response.Then(r =>
{
    if (r.Content != null)
    {
        TraceWriter.WriteLine("SerializeAsync - before load",
            TraceLevel.Verbose);

        return r.Content.LoadIntoBufferAsync()
            .Then(() =>
            {
                TraceWriter.WriteLine("SerializeAsync - after load", TraceLevel.Verbose);
                var httpMessageContent = new HttpMessageContent(r);
                // All in-memory and CPU-bound so no need to async
                return httpMessageContent.ReadAsByteArrayAsync();
            })
            .Then(buffer =>
            {
                TraceWriter.WriteLine("SerializeAsync - after ReadAsByteArrayAsync", TraceLevel.Verbose);
                return Task.Factory.FromAsync(stream.BeginWrite, stream.EndWrite,
                    buffer, 0, buffer.Length, null, TaskCreationOptions.AttachedToParent);
            });
    }
    // ... (excerpt truncated: the null-content branch is omitted)
});

Even looking at the brackets gives me a headache.
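For contrast, here is a rough sketch of the same chain flattened with .NET 4.5's async/await (names such as TraceWriter, HttpMessageContent and stream are taken from the snippet above; this is not the actual CacheCow code):

// sketch only: the same pipeline with async/await
if (response.Content != null)
{
    TraceWriter.WriteLine("SerializeAsync - before load", TraceLevel.Verbose);
    await response.Content.LoadIntoBufferAsync();

    var httpMessageContent = new HttpMessageContent(response);
    // all in-memory and CPU-bound so no need for async
    var buffer = await httpMessageContent.ReadAsByteArrayAsync();

    await stream.WriteAsync(buffer, 0, buffer.Length);
}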

Is Async worth it at all?

Now we talk a lot about async operations and their role in improving scalability. But really, is it worth it? How much scalability would it bring? Would it help or hinder?

The answer to these questions is yes, it does help. The more IO you do in your server-side actions, the more you benefit from the improved scalability. So it is highly advisable to implement your ApiController actions as async by returning Task or Task<T>.

The truth is, it will help even with your non-IO-bound operations, although it is not advisable to use async in such scenarios. You can test it for yourself: create a sync and an async controller to do exactly the same operation and use a benchmarking tool to compare the performance.

I have a CarManager sample on GitHub which I use for testing CacheCow.Server, and it contains two simple controllers: CarController and CarAsyncController. All these do is use an in-memory repository, and their GET actions only look the entry up in the dictionary by its key:

// sync version
public Car Get(int id)
{
 return _carRepository.Get(id);
}


// async version (on another controller)
public Task<Car> GetAsync(int id)
{
 return Task.Factory.StartNew(() => _carRepository.Get(id));
}

So if you use a benchmarking tool such as Apache Benchmark (ab.exe), you can see a slight increase in throughput using the async controller. In my case, there was a 10% increase in throughput using async.
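For example, a couple of ab runs along these lines (host and routes assumed to follow the default route for each controller) would give comparable throughput numbers:

ab -n 5000 -c 50 http://localhost/api/car/1
ab -n 5000 -c 50 http://localhost/api/carasync/1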

My ordeal with a bug

Development of CacheCow was marred by the existence of a problem which, as we will see, turned out not to be in my code. I battled with it for a few weeks (on and off) and could not progress CacheCow development because of it.

OK, here is how my story begins; I think the Sherlock Holmes nature of this troubleshooting could be amusing for others too. After realising that a simple ContinueWith will not flow the context (see the previous post), I was tasked with replacing all such cases with Then from the TaskHelpers, which checks for the existence of a SynchronizationContext and flows the context if one exists.

On the other hand, lostdev, one of CacheCow's most loyal users, informed me of an occasional null reference exception in CacheCow.Server. Now, I had already fixed a bug related to a null reference exception that occurred when a resource was being retrieved for the first time. I attributed the problem to the fix I had made and reported that the problem was fixed in the current version.

So I started developing file-based cache storage for CacheCow.Client (which will have its own post very soon) and replaced all ContinueWith cases with Then.

And then I started to experience deadlocks in CacheCow.Client when I was using file-based caching and sending concurrent GET requests to the server. As soon as I removed FileStore and replaced it with InMemoryCacheStore, it would work. So I searched through the client code, debugged, looked at the threads, debugged again, changed code, debugged... to no avail. It would only appear when I was using file-based caching, so it had to be on the client.

Then I noticed a strange thing: I could only run 4 concurrent calls and the rest would be blocked. Why? So I started playing with the maxconnection property of the system.net configuration:

<system.net>
  <connectionManagement>
    <add address="*" maxconnection="N" />
  </connectionManagement>
</system.net>

Interestingly, by setting N to a high number, I would get more concurrent connections - but only up to the number defined. Hmmm... so the requests were not quite finishing. OK, I fired up Sysinternals' TcpView, but unfortunately these connections did not show up (and I do not know why).

I was getting nowhere until I accidentally loaded an earlier version of the server code. To my surprise, I did not get the deadlock, but instead this error, which @Tugberk had separately reported earlier but attributed to the order of handlers:

[NullReferenceException: Object reference not set to an instance of an object.]
System.Web.Http.WebHost.HttpControllerHandler.EndProcessRequest(IAsyncResult result) +112
System.Web.Http.WebHost.HttpControllerHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +10
System.Web.CallHandlerExecutionStep.OnAsyncHandlerCompletion(IAsyncResult ar) +129

OK, so it is probably happening on the server, but the continuation code gets deadlocked on the unhandled exception. I was close! So it was time to go to bed, and I was positive that I would nail it the day after.

It was funny that I woke up the day after and, with my in-bed reading of tweets, stumbled upon @Tugberk's tweet about an issue he had just created. It sounded exceedingly similar, so we double-checked our scenarios, and it turned out that an HttpResponseMessage with an empty RequestMessage property is not handled in Web API, and a null reference exception is thrown at the end of the response clean-up code. The reason I was seeing it only with the file-based cache store was that the part of the server-side code that returns such responses was only being triggered with the file-based store (since it was capable of persisting caches and was trying to validate the cache).

So as you can see, a seemingly unrelated problem can really obscure the nature of bugs in async scenarios.

Conclusion

First of all, always use request.CreateResponse() instead of newing up an HttpResponseMessage. I googled for cases of new HttpResponseMessage and found over 3,000 entries. This is really dangerous; I think this is a bug in Web API and needs to be fixed. If you do use new, make sure you set the RequestMessage property.
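As a sketch (the car lookup reuses the repository example from earlier; names are illustrative):

// preferred: CreateResponse wires up the RequestMessage property for you
public HttpResponseMessage Get(int id)
{
    var car = _carRepository.Get(id);
    return Request.CreateResponse(HttpStatusCode.OK, car);
}

// if you really must new up an HttpResponseMessage, set RequestMessage yourself
var response = new HttpResponseMessage(HttpStatusCode.OK)
{
    RequestMessage = Request // prevents the null reference in the response clean-up code
};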

And in general, be careful when doing server-side async operations. It is a really powerful axe, but with it you are never quite sure what a slightly off swing might bring. Careful with that axe, Eugene.

Monday, 10 September 2012

Going away

There are feelings so complex they don't have a name - unless we make one. They can be described, but not easily. "Going away" is one of those feelings.

Have you ever been somewhere on holiday, enjoying the places you visit, and yet you cannot stop thinking that this is probably the last time you will see them? That is why you start taking pictures and videos: you want to record everything, since you know your memory won't. It is likely that you will never go back to those pictures, but having them is like "owning" those memories.

* * *

Beaumaris (map) is a tiny town in the remotest corner of north-west Wales, on the island of Anglesey. Once, visiting Wales (we love Wales) on our hunt for nice places to eat, we found ourselves there. Passing sailing boats on a green road, with the Irish Sea on the right and stone walls on the left, we approached a town boasting a rich history - we later found out about its castle. We had no idea what we would find there.

The memory of that night was not captured in any picture. It did not need to be. It felt like the surreal experience of two invisible tourists/observers entering an almost dream-like banquet. It was almost fictional. It reminds me of the opening (and closing) scene of Russian Ark.

We parked in the corner of the square overlooking the Irish Sea and found our way through a small entrance to the main street. Ye Olde Bull's Head Inn was easy to find: a very old traditional pub with a classy hotel on top. We ordered drinks and sat down on very old wooden chairs and benches. While walking we were careful of our heads: as in all old pubs, the ceiling is low and the wooden beams are exposed.

And there was a banquet: a reunion of armed forces of some sort - many of them American. Now, out of all places, why Beaumaris? Had they travelled all the way across the ocean to meet in such a remote place? I had no answer. I did chat to a few officers walking around, but my inquiry did not produce much more information. I have not been able to find any reference to this reunion on the internet.

After 40 minutes, we were ushered into the brasserie where food was served. Suddenly the atmosphere changed. It was a modern extension to the old building, and an ethereal ambient light made the chic contemporary design even more magical. The starter, main course and dessert were excellent - probably among the best food we have ever had.

On leaving, I felt that "going away" feeling. But it did not matter. I had absorbed all those beautiful yet surreal moments.
* * *

I am visiting my family in Iran now. I do go back almost every year, although now most of my friends, and my brother and sister, are abroad.

My mother-in-law is ill. Medically speaking she is terminal. But everything is possible. Isn't it?

This is a chance to spend time with her and my parents. Things are not easy. Seeing suffering and not being able to cure it is not easy - although there are things we can do. My wife is doing most of them, so really there is not much for me to do. I occasionally help out and I am just there, in case. My mother-in-law is a brave woman. She has been fighting and keeps fighting. This is so important for someone battling a fierce illness. My wife is also very brave. She is fighting her emotions while trying to help medically as much as she can - she is a doctor, a real one, not like me who chickened out to do what he likes to do.

My holiday is running out - and the feeling of "going away" is coming back.

* * *

We visited Beaumaris again only two years later. This time there were no banquets, but the place had no less magic: it was exactly as we had seen it that evening two years before. The Ye Olde pub with its crisp beers, and the food excellent like before.

I felt really special. It felt like an honour to visit the place again. I was fortunate enough to visit a "gone away" place.

And with my mother-in-law, I think it is possible - and perhaps definite. In this world. Or in the next.

Friday, 24 August 2012

Server-side TPL Async: Don't risk learning these lessons the hard way

[Level T2]

There have been more than a few times that I have felt I knew all about TPL, only to realise sometime later that I was wrong, very wrong. Now you might read this and say to yourself, "Come on, this is basic stuff. I know it well, thank you very much". Well, it is possible that you are right, but I advise you to carry on reading; what follows might surprise you.

None of what I am going to talk about is new or being blogged about for the first time. Brad Wilson has an excellent series on the topic here, but this serves as a digest of his posts targeted at a broader audience, in addition to a few other points.

While this post is not directly related to ASP.NET Web API, most examples (and cases) are related to day-to-day scenarios we encounter in ASP.NET Web API.

Remember, this covers the world before the async/await keywords of .NET 4.5 - that is, what you need to do if you are using .NET 4.0 without async/await. Using async/await will cover you for some of the problems described below, but not all.

Don't fire and forget

Tasks are ideal for decoupling pieces of functionality. For example, I can perform a database operation and at the same time audit the operation by outputting a log entry, writing to a file, etc. Using tasks I can decouple these operations so that my database task returns without having to wait for the audit to finish. This makes sense, since the database operation is high priority but the audit is low priority:

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"));
}

In fact, let's say we do not even care whether the audit succeeds, so we just fire and forget; at worst, the audit fails, which is low priority. It all seems innocent, right?

No! This innocent operation can bring down your application. The reason is that all task exceptions must be observed, even if you do not care about them. If you don't observe them, they will haunt you when you least expect it - at the time the finalizer for the task is run by the GC. Such an unobserved exception will kill your app.

The link above talks about various ways of observing an exception. The most practical is to use a continuation and access the .Exception property of the task (just accessing the property is enough; you do not need to do anything with the exception itself).

private void DoDbStuff()
{
   CallDatabase();
   // doing audit entry asynchronously not to bog down database operation
   Task.Factory.StartNew(()=> AuditEntry("Database stuff was done"))
      .ContinueWith(t => t.Exception); // fire and forget!
}

Another option, which is more of a safeguard against accidental unobserved exceptions, is to register for the UnobservedTaskException event on TaskScheduler:

TaskScheduler.UnobservedTaskException += (sender, e) =>
{
    LogException(e.Exception);
    e.SetObserved(); // mark the exception as observed so it will not crash the process
};

So we register a handler for unobserved exceptions, and this way they will be "observed". If you need to read more on this, have a look at Jon Skeet's post here.

This problem has made Ayende Rahien run for the hills.

Respect SynchronizationContext

Uncle Jeffrey Richter tells us that

By default, the CLR automatically causes the first thread's execution context to flow to any helper threads.

And then we also learn that we can use ExecutionContext.SuppressFlow() to suppress flow of the thread context.

Now, what happens when we use ContinueWith()? It turns out that, unlike standard thread switches, the context does not flow (I do not have a reference; if you do, please let me know). This helps improve the performance of asynchronous tasks, as we know context switching is expensive (and a big part of it is context flow).

So why is this important? It is important because so many developers are used to HttpContext.Current. This context is stored in the thread's storage area and passed along at the time of a thread switch. So if the context does not flow, HttpContext.Current will be null.

SynchronizationContext is a similar (but not the same) concept. It is about state that can be shared and used by different threads at the time of switching. I cannot explain this better than Stephen does here. Using Post on the SynchronizationContext ensures that the continuation executes in the same context - though not necessarily on the same thread.

So basically the idea is that if you are in a task pipeline (the best example being MessageHandlers in ASP.NET Web API), you need to take responsibility for passing the context along the pipeline.

This is a snippet from the ASP.NET Web API source code that shows the steps. First of all, you check whether the current context is null; if it is not, you have to use Post() to flow the context:

SynchronizationContext syncContext = SynchronizationContext.Current;

TaskCompletionSource<Task<TOuterResult>> tcs = new TaskCompletionSource<Task<TOuterResult>>();

task.ContinueWith(innerTask =>
{
    if (innerTask.IsFaulted)
    {
        tcs.TrySetException(innerTask.Exception.InnerExceptions);
    }
    else if (innerTask.IsCanceled || cancellationToken.IsCancellationRequested)
    {
        tcs.TrySetCanceled();
    }
    else
    {
        if (syncContext != null)
        {
            syncContext.Post(state =>
            {
                try
                {
                    tcs.TrySetResult(continuation(task));
                }
                catch (Exception ex)
                {
                    tcs.TrySetException(ex);
                }
            }, state: null);
        }
        else
        {
            tcs.TrySetResult(continuation(task));
        }
    }
}, runSynchronously ? TaskContinuationOptions.ExecuteSynchronously : TaskContinuationOptions.None);

return tcs.Task.FastUnwrap();

There is a horrifying fact here. Most of the DelegatingHandler code out there (including some of mine), in various samples around the internet, does not respect this. Of course, looking at the ASP.NET Web API source code reveals that they do indeed take care of this in their TaskHelpers implementations, and Brad tried to make us aware of it in his blog series. But I think we have not paid enough attention to the implications of ignoring SynchronizationContext.

Now, my suggestion is to use the TaskHelpers and its extensions from ASP.NET Web API (it is open source), or the ones provided in Brad's post. In any case:

Don't use Task for CPU-bound operations

The overhead of asynchronous operations is not negligible. You should only use async if you are doing an IO-bound operation (calling another web service/API, reading a file, reading a lot of data from a database or running a slow query). I personally think that even for normal IO operations, sync is more performant and scalable.

As we have talked about here, the point of asynchronous programming on the server side is releasing the thread so it can serve another request. Tasks are normally served by the CLR thread pool. If the server already needs managed threads for its operations, it will be using the CLR thread pool too. This means that by doing async operations you could be stealing threads needed for the server's normal operations. A classic example is ASP.NET, so you should be careful to use async only if needed.

ContinueWith is Evil!

I think by now you should know why the standard ContinueWith can be evil. First of all, it does not flow the context. It also makes it easy for unobserved exceptions to creep into your code. My suggestion is to use .Then() from ASP.NET Web API's TaskHelpers, as sketched below.
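A sketch of the difference (assuming the Then extension from the Web API TaskHelpers mentioned above, and a hypothetical Log helper):

// plain ContinueWith: the continuation may run on a pool thread with no
// SynchronizationContext, so HttpContext.Current can be null here
task.ContinueWith(t => Log(HttpContext.Current.User.Identity.Name));

// Then (from TaskHelpers): posts the continuation back to the captured
// SynchronizationContext when one exists, so HttpContext.Current flows
task.Then(() => Log(HttpContext.Current.User.Identity.Name));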

Performance comparison

I think it is still early days - but I must say I would love to do a benchmark to quantify the overhead of server-side asynchronous programming. Well, if I do, this is where the results will first appear :)

So. Do I think I know all about TPL now? Hardly!

Monday, 6 August 2012

CacheCow.Client, using the benefits of HTTP Caching on the client

[Level T2]

Browsers are very sophisticated HTTP machines. We often fail to remember how much of the HTTP spec is implemented by the browsers.

As I have said before, ASP.NET Web API is a very powerful server-side framework but there is a client-side burden in using it or generally implementing a RESTful system - although Web API does not restrict you to a RESTful style.

Because of the client burden, we need more and more client-side libraries to implement the features that browsers have had for such a long time - one of which is HTTP caching. If you use HttpClient out of the box, it will not do any caching, even if the resources are cacheable. All of the work for conditional GET or PUT calls (using If-None-Match, etc.), cache validation (if there is must-revalidate) or checking whether your cache is stale has to be done in your own code.
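To make that concrete, here is a sketch of the kind of conditional GET you would otherwise have to hand-roll (the ETag value is a placeholder for one stored from an earlier response):

// sketch: manual conditional GET using a previously stored ETag
var request = new HttpRequestMessage(HttpMethod.Get, uri);
request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue("\"previously-stored-etag\""));
var response = client.SendAsync(request).Result; // sample only
if (response.StatusCode == HttpStatusCode.NotModified)
{
    // 304: the server confirms our copy is still valid - serve it from our own store
}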

CacheCow is an HTTP caching library for the client and server in ASP.NET Web API that does all of the above - see my earlier post on it. Storage of the cache is abstracted behind ICacheStore, and for now we can use the in-memory implementation (see below). The features of the client library include:

  • Caching GET responses according to their caching headers
  • Verifying cached items for staleness
  • Validating cached items if the must-revalidate parameter of the Cache-Control header is set. It will use the ETag or Expires header, whichever exists
  • Making conditional PUTs for resources that are cached, based on their ETag or Expires header, whichever exists

Today I released v0.1.3 of CacheCow.Client on NuGet. This library implements advanced HTTP caching with little or no configuration or hassle. All you have to do is add the CachingHandler as a delegating handler to your HttpClient:

var client = new HttpClient(new CachingHandler()
{
    InnerHandler = new HttpClientHandler()
});

This code will create an HttpClient that implements caching and stores the cache in memory. By implementing ICacheStore, you can store the cache in your own custom repository. CacheCow is going to have persistent cache stores such as FileCacheStore, SqlCeCacheStore and SqliteCacheStore as a minimum. FileCacheStore will be similar to a browser's implementation of cache storage. Each of these cache stores will be implemented and released under its own NuGet package. To use an alternative cache store, you pass the store as a constructor parameter, as sketched below.
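For example (MyCustomStore is hypothetical - any ICacheStore implementation works):

// sketch: plugging a custom ICacheStore into the CachingHandler
var client = new HttpClient(new CachingHandler(new MyCustomStore())
{
    InnerHandler = new HttpClientHandler()
});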

Usage

So in order to use CacheCow.Client, use the Package Manager in Visual Studio to download it and add a reference to it:

PM> Install-Package CacheCow.Client

This will also download and add a reference to the ASP.NET Web API client package, if you have not already added one. Make sure you get v0.1.3 or above (as of the time you read this).

After this you just need to create an HttpClient as above and add the CachingHandler as a delegating handler. That's it, you are ready to call services and cache the responses!

Sample

I am working on a sample project but for now, it is easiest to use the code below to call my CarManager Azure website which implements HTTP Caching. The code can be pasted from this GitHub gist.

CacheCow.Client adds a special header to the response which helps with debugging its various features. The header's name is x-cachecow and it carries various flags describing the operations performed on the request/response. So in the code below, we will use this header to demonstrate the features of this library.

var client = new HttpClient(new CachingHandler()
{
    InnerHandler = new HttpClientHandler()
});
var initialResponse = client.GetAsync(
      "http://carmanager.azurewebsites.net/api/Car/5").Result;
var initialResponseHeader = initialResponse.Headers.Single(
       x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(initialResponse.Headers.ETag.Tag);
Console.WriteLine(initialResponseHeader);

And we will see this printed:

"02e677a7799e484fb49447f8a600247d"
0.1.3.0;did-not-exist=true

As you can probably figure out, we have the ETag and the CacheCowHeader: the first value is the version, and did-not-exist means that the item did not exist in the cache - which is understandable, as this is the first call.

Now let's try this again:

var secondResponse = client.GetAsync("http://carmanager.azurewebsites.net/api/Car/5").Result;
var secondResponseHeader = secondResponse.Headers.Single(
      x => x.Key == CacheCowHeader.Name).Value.First();
Console.WriteLine(secondResponseHeader);

And what will be printed is:

0.1.3.0;did-not-exist=false;cache-validation-applied=true;retrieved-from-cache=true

So it did in fact exist in the cache, it was retrieved from the cache, and cache validation was applied. Cache validation is the process by which the client makes a conditional call to retrieve/update a resource only if the condition is met (see the Background section in this post). For example, for GET calls it will send the ETag in an If-None-Match header so that the resource is only returned if it has changed.

If you call a PUT on a resource that is cached, CacheCow.Client will use its ETag or Expires value to make a conditional PUT, unless you set the UseConditionalPut property to false.
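A sketch of turning that off (assuming UseConditionalPut is set on the CachingHandler):

// sketch: disable conditional PUT so that full, unconditional PUTs are always sent
var client = new HttpClient(new CachingHandler()
{
    UseConditionalPut = false,
    InnerHandler = new HttpClientHandler()
});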

By-passing caching

There are some cases where you might not want the result be cached or retrieved from the cache regardless of the caching logic. All you have to do is to set the CacheControl header to no-cache or no-store:

var nocacheRequest = new HttpRequestMessage(HttpMethod.Get,
    "http://carmanager.azurewebsites.net/api/Car/5");
nocacheRequest.Headers.CacheControl = new CacheControlHeaderValue()
{
    NoCache = true
};
var nocacheResponse = client.SendAsync(nocacheRequest).Result;
var nocacheResponseHeader = nocacheResponse.Headers.FirstOrDefault(
    x => x.Key == CacheCowHeader.Name);
Console.WriteLine(nocacheResponseHeader);

This will print an empty header since we have bypassed the caching.

Last but not least

Thanks for trying out and using CacheCow. Please send me your feedback and bug reports - just ping me on Twitter or use GitHub's issue tracker.

Sunday, 5 August 2012

Hierarchical routing for ASP.NET Web API: RESTful resource organisation


Introduction and motivation

[Level T3] A question keeps popping up on various forums (StackOverflow, ASP.NET Forums, etc.) on how to define routing in ASP.NET Web API. More importantly, there does not seem to be any consensus on how to approach it - probably because ASP.NET Web API is still in preview (RC).

In this post, I want to have a fresh look at routing in ASP.NET Web API and present a hierarchical model for organising resources.

Background

Resources are an important part of the RESTful architectural style. In REST, all server operations (the API) are defined as interactions with resources - in HTTP terms, this means verb interactions with URLs. This is in contrast to the RPC style, where server operations are defined as method calls.

ASP.NET Web API's routing mirrors ASP.NET MVC routing - similar to some other aspects of Web API, which uses ASP.NET MVC as a baseline since it is a familiar model for developers.

Routing was first introduced to ASP.NET by MVC. Later on, the routing code was integrated into System.Web.dll.

Routing in ASP.NET MVC is very flat, since all routes are defined from the root. This was one of the reasons MVC Areas were introduced: to add another layer on top of the root routes. This can work for MVC, but RESTful resource organisation usually requires a more nested structure, while using conventional routing for it can result in a configuration burden as well as a performance penalty.

How routing works

MVC (and Web API) routing is generally designed for a handful of routes (in most cases fewer than 100). You were expected to use the default route for most cases and add a few more routes for the occasional cases where the pattern was different. This in fact worked for most MVC applications, but RESTful resource design requires a richer and more nested structure, as we will see below.

Route definition in ASP.NET Web API - as you have probably all used and know - is based on adding routes to the route collection:

var routes = GlobalConfiguration.Configuration.Routes; // in web hosted environment
routes.MapHttpRoute(
 name: "DefaultApi",
 routeTemplate: "api/{controller}/{id}",
 defaults: new { id = RouteParameter.Optional }
);

Each mapping will add a new HttpRoute object to the collection. HttpRoute itself is an implementation of IHttpRoute:

public interface IHttpRoute
{
    string RouteTemplate { get; }

    IDictionary<string, object> Defaults { get; }

    IDictionary<string, object> Constraints { get; }

    IDictionary<string, object> DataTokens { get; }

    HttpMessageHandler Handler { get; }

    IHttpRouteData GetRouteData(string virtualPathRoot, HttpRequestMessage request);

    IHttpVirtualPathData GetVirtualPath(HttpRequestMessage request, IDictionary<string, object> values);
}

The most important method is GetRouteData, where the matching takes place. Basically, if an implementation of IHttpRoute returns a non-null IHttpRouteData, the route has matched.

Now how does the matching work? Well, the route collection loops through all the routes, calling GetRouteData on each, and the first one to return a non-null result is the matched route for a given URL:

// snippet from HttpRouteCollection
foreach (IHttpRoute route in _collection)
{
    IHttpRouteData routeData = route.GetRouteData(_virtualPathRoot, request);
    if (routeData != null)
    {
        return routeData;
    }
}

Did you notice something fundamental above? If you have 1000 routes and your given URL matches the 1000th route, GetRouteData (which involves complex and heavy string processing, if you look at the Web API source code) has to be called and fail for the first 999 routes before it reaches your matched route. With 100 routes this is probably not an issue, but as the numbers go up, performance takes a hit.

RESTful organisation of resources

This StackOverflow question is one of many on Web API routing according to the REST style. One of the main challenges is that we, as Microsoft developers, are used to designing our APIs in an RPC fashion. This habit has been ingrained in our psyche since the Remoting days, then Web Services, and more recently WCF. So it is only natural to fall into the same habit when designing our REST API.

Martin Fowler, in his article Steps toward the glory of REST, talks about 3 levels of REST implementation. We will be talking about level 1 and briefly about level 2. Level 3, which is the noblest constraint of REST, focuses on hypermedia and is beyond the scope of this discussion.

So at level 1, we design the server to expose its API as resources. In the case of HTTP and URLs, this will look like the directory/file structure on a disk - and as such, hierarchical. If we mix REST with DDD concepts, each aggregate root of the publicly exposed domain (which is called the server domain; see the post on Client-Server) will be exposed at the root (considering the API root is /api/):

/api/{AggregateRoot}/{id}

An example for cars would be /api/Car/1243. As such, we would have a CarController that receives the id through the URL. All of this is easily achievable using the default route, very much in the good old MVC fashion.

However, the picture gets more complicated when we want to expose the tyres of the car. Let's say a car has front/rear left/right tyres, denoted FL, FR, RL and RR. One approach would be to expose the tyre as an aggregate root and have an ID for each tyre:

/api/Tyre/1234567

Well, this will work, but a tyre is really not an aggregate root, since it only has meaning as part of the car. So ideally we should expose tyres only as part of a car:

/api/Car/1243/Tyre/FR

So we would need a TyreController to handle this scenario, and in this case we need a route similar to the one below:

/api/Car/{carId}/{controller}/{id}

Now this can be written in the generic form of:

/api/{parent}/{parentId}/{controller}/{id}

But as you can see, in this case we need to define carId as parentId. Also, not all of the domain has similar constraints for IDs, so this approach imposes a heavy limitation on our routes. On top of that, we might have more than one level of nesting, where this becomes even more complex.
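As a sketch, the concrete route and the matching controller for the tyre example might look like this (names and the Tyre type are assumptions for illustration):

// sketch: one conventional route per nested resource - this adds up quickly
routes.MapHttpRoute(
    name: "CarTyres",
    routeTemplate: "api/Car/{carId}/Tyre/{id}",
    defaults: new { controller = "Tyre" }
);

public class TyreController : ApiController
{
    // carId and id are bound from the route values, e.g. GET /api/Car/1243/Tyre/FR
    public Tyre Get(int carId, string id)
    {
        // look the tyre up as part of its parent car (GetTyre is hypothetical)
        return _carRepository.Get(carId).GetTyre(id);
    }
}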

In addition to entity levels, we also have operations. Operations can also be defined as resources, for example in the case of a puncture:

POST /api/Car/1243/Tyre/FR/Puncture

This will create a puncture on the tyre. When we say create, we do not mean that a physical record necessarily needs to be created against the tyre. In fact, the mapping between REST resources and domain models is usually loose. The point is that every operation needs to be exposed as a resource and all operations are performed using HTTP verbs. For example, repairing the puncture can be represented as:

DELETE /api/Car/1243/Tyre/FR/Puncture

So operations will complicate the routing even more. If the domain is big and complex, we could end up with many routes. As such, performance will be hampered and we end up with the burden of defining all these routes at the root level.


So what is the solution? The solution is to turn Web API's flat routing into a hierarchical one.