Tuesday, 19 March 2013

CacheCow Series - Part 0: Getting started and caching basics

[Level T2CacheCow is an implementation of HTTP Caching for ASP.NET Web API. It implements HTTP caching spec (defined in RFC 2616 chapter 13) for both server and client, so you don't have to. You just plug-in the CacheCow component in server and client and the rest is taken care of. You do not have to know details of HTTP caching although it is useful to know some basics.

It is also a flexible and extensible library that instead of pushing a resource organisation approach down your throat, allows you to define your caching strategies and resource organisation using a functional approach. It stays opinionated only about caching where the opinion is HTTP spec. We shall delve into these customisation points in the future posts. But first let's to caching 101.

Caching

Caching is a very overloaded and confused term in ASP.NET. Let's review the cachings available in ASP.NET (I promise not to make it too boring):

HttpRuntime.Cache

This class is used to cache managed objects in SERVER memory using a simple dictionary like key-value approach. You might have used HttpContext.Current.Cache but it is basically nothing but a reference to HttpRuntime.Cache. So how is this different to using a plain dictionary?
  1. You can define expiry so you do not run out of memory and your cache can be refreshed. This is the most important feature.
  2. It is thread-safe
  3. You can never run out of memory with this even with no expiry. As long as it hits a configurable limit, it purges half of the cache.
  4. And also: it stores the data in the unmanaged memory of the worker process and not in heap so does not lengthen GC sweeps.
This cache is useful for keeping objects around without having to acquire them all the time (could be complex calculation or IO cost such as database). No, this is not HTTP caching.

Output cache

In this approach output of a page or a route can be cached on the SERVER so the server code does not get executed. You can define expiry and let the output cached based on parameters. Output cache behind the scene uses HttpRuntime.Cache.

You guessed it right: this is also not HTTP caching.

HTTP Caching

In HTTP Caching, a cacheable resource (or HTTP response in simple terms) gets cached on the CLIENT (and not server). Server is responsible for defining cache policy, expiry and maintaining resource metadata (e.g. ETag, LastModified). Client on the other hand is responsible for actually caching the response, respecting cache expiry directives and validating expired resources.

The beauty of the HTTP caching is that your server will not even be hit when client has cached a resource - unlike other two types of caching discussed. This helps with minimising the scalability, reducing network traffic and allowing for offline client implementation. I believe HTTP caching is the ultimate caching design.

One of the fundamental differences of HTTP caching with other types of caching is that a STALE (or expired) resource is not necessarily unusable and does not get removed automatically. Instead client call server and check if the stale resource is still good to use. If it is, server responds with status 304 (NotModified) otherwise it sends back the new response.

HTTP Caching mechanism can also be used for solving concurrency problems. When a client wants to update a resource (usually using PUT), it can send a conditional call and ask for update to run if and only if the version of the server is the same as the version of the client (based on its ETag or LastModified). This is especially useful with modern distributed systems.

So as you can see, client and server engage in a dance using various headers and things can get really complex. Good new is you don't have to worry about it. CacheCow implements all of this for you out of the box.

CacheCow consists of CacheCow.Server and CacheCow.Component. These two components can be used independently, i.e. each will work regardless the other is used or not - this is the beauty of HTTP Caching as a mixed concern which I explained here.

CacheCow on the server

How easy is it to use CacheCow on the server? You just need to get the CacheCow.Server Nuget package:

PM> Install-Package CacheCow.Server

And then 2 lines of code to be exact (actually 3 but we will see why):

var cachecow = new CachingHandler();
GlobalConfiguration.Configuration.MessageHeandlers.Add(cachecow);

So with this line of code all your resources become cacheable. By default expiry is immediate so client has to validate and double check if it can re-use the cache resource but all of this can be configured (which we will talk about in future posts). Also server now is able to respond to conditional GET or PUT requests.

CacheCow on the server requires a database (technically cache store) for storing cache metadata - but we did not define a store. CacheCow comes with an in-memory database that can be used if you do not specify one. This is good enough for development, testing and small single server scenarios but as you can imagine not scalable. What does it take to define a database? Well, a single line hence the 3rd line and you can choose from SQL Server, Memcached,  RavenDb (Redis to come soon) or build your own by implementing an interface against another database (MySql for example).

So in this way we will have 3 lines (although you may combine first two lines):

var db = new SqlServerEntityTagStore();
var cachecow = new CachingHandler(db);
GlobalConfiguration.Configuration.MessageHeandlers.Add(cachecow);

With these three lines, you have added full feature HTTP caching (ETag generation, responding to conditinal GET and PUT, etc) to your API.

CacheCow on the client

Browsers that we use everyday (and usually even forget to look at as an application) are really amazing HTTP consumption machines. They implement cache storage (file-based) and are capable of handling all client scenarios of HTTP caching.

Client component of CacheCow just does the same but is useful when you want to use HttpClient for consuming HTTP services. Plugging it into your HttpClient is as simple as the server side. First get it:
PM> Install-Package CacheCow.Client
Then

var client = new HttpClient(new CachingHandler()
       { 
           InnerHandler = new HttpClientHandler()
       });


In a similar fashion, client component of CacheCow requires a storage - and this time for the actual responses (i.e. resources). By default, an in-memory storage is used which is OK for testing or single server and small API but for persistent stores you would use from stores such as File, SQL Server, Redis, MongoDB, Memcached aleady implemented or just implement the ICacheStore interface on the top of an alternative storage.

Constructor of the client CachingHandler accepts an implementation of ICacheStore which is your storage of choice.

So now cacheable resource will be cached and CacheCow will do conditional GET and PUT for you.

In the next post, we will look into server component of CacheCow.

3 comments:

  1. Can i set expire date for specific route pattern?

    ReplyDelete
    Replies
    1. Yes you can. There are methods to do it. If you have questions, best place to ask is the github repo

      Delete
  2. Hi... since this is Part 0, is there Part 1,2,3... of this excellent blog post?

    ReplyDelete