Byte Rot: Caching

Showing posts with label Caching. Show all posts

Friday, 14 April 2017

Future of CacheCow and the birth of CacheCore [Update: No CacheCore!]

CacheCow is my most popular OSS project which came to being back in 2012. It started its life as part of WebApiContrib but the need for a full-fledged project supporting different storage options soon led me to create CacheCow - which needed both client and server components.

With the availability of .NET Core which brings its completely new HTTP pipeline, the question has been when or how CacheCow will move to the .NET Core. On the client-side, HttpClient still is the king of making HTTP requests meaning CacheCow.Client will work when the long awaited .NET Standard 2.0 comes along allowing us to reference older .NET libraries. On the server-side, however, it is clear that CacheCow.Server has no place of existence since the pipeline in its entirety has been changed. So what should we do? Create a completely new project for both client-side and server-side or maintain CacheCow.Client (while migrating it to .NET Standard to support the new .NET) and create a new project for the server-side?

I have been thinking hard about this and for the reasons I will explain, I will be creating a completely new project called CacheCore (other contenders were Cache-vNext, CacheDnx and also recently CacheStandard) which will contain both client and server elements. [UPDATE: Please view newer announcement here]

If you would like to know the details (REST discussions, lessons learned including some gory confessions below) read the rest, otherwise feel free to watch the space as things will start to happen.

I am under no illusion that this will require quite some effort. Apart from the learning curve and niggles with tooling, I find the most frustrating aspect to be trying to google anything Core related: internet is now full of discarded evolutionary artifacts in the form of blogs, stackoverflow questions, tutorials and even MSDN documentation - each documenting the journey not the current state. If you think that is a small issue, ask anyone picking .NET Core for the first time. I wish we had a big giant flush and could have flushed all that to /dev/null wiping the history clean - never mind all those many many hours lost. OK, rant over - promise.

CacheCore.Server

As I said, I have confessions to make and one is just coming. When I designed server components of CacheCow as an API middleware, the idea was that they would be used for services that are purely RESTful in the sense that all changes to the state of the resource would be going through the API to carry out the state change. Initially there does not seem anything extra-ordinary about this but I gradually learnt that cache coherency is a very big responsibility for a mere middleware to take on.

First of all, there are many services out there that the underlying data could change without a request passing through the API. What is worse is that even if all state change is via API calls, the change to a resource could invalidate other resources. For example, a PUT request to /cars/123 will invalidate the /cars/123 which is fine, but what about ‘/cars’? So I started thinking about resources in terms of collection and instance and CacheCow.Server started to infer collection and instance resources based on a convention - hence I used Route Pattern concept so the application could configure the cache invalidation, so here route pattern would be /cars/*.

But the problem did not stop there. A change to /cars/123/contracts/456 could invalidate all these URLS: /cars/123/contracts, /cars/123 and possibly /cars - hence CacheCow now needs to walk up the tree and invalidate all those resources. And now to the next level of headaches: a POST /orders/1234 could invalidate customer/987 as there is no apparent connection unless the application tells us - which made me introduce the concept of Linked Route Patterns so the application could configure these relationships. And configuring was of course a pain, and frankly I think except me and a handful other people really did not quite get what I was on about.

Now, I believe it is too much of a responsibility for an HTTP middleware to do cache coherency. As such CacheCore.Server will be a lot simpler: no more entity tag storage, application will decide to use ETag or LastModifieDate for cache coherency and will be responsible for providing these values - although I will provide some helpers. One key difference in this implementation would be a set of tools fitting different scenarios rather than a single HTTP Caching god-class.

To explain this aspect further, HTTP caching is a spectrum of primitives that help you build more scalable (caching) and consistent (concurrency) systems - some of which are basic and used by many, while others have remained obscure and seldom used. Caching and expiry on resources are better known while from my experience, conditional PUT to achieve optimistic concurrency is rarely used - even conditional GET is rarely used by HTTP clients other than browsers. As such, CacheCore will come with three filters starting from the most basic to the most advanced:

BasicCacheFilter: This is the simplest filter which covers returning Cache-Control headers according to expiry configuration, reading the ETag or LastModified from the returned model (or inferring them by using reflection) and handling conditional GET for you. As long as you have a property called ETag or LastModified (or LastModifiedDate, etc) on the model you return from your API, this will work. For conditional GETs to this filter you would not save on any pressure your “database”: API calls will result on retrieval of data to the API so the filter can find the ETag or LastModified and accordingly respond to conditional GET requests.

LookupCacheFilter: This filter improves on the BasicCacheFilter by allowing the application to provide a callback mechanism for the application to look up ETag or LastModified without having to load the full model. Caching almost always gets used on resources where the operation is expensive either in IO or computation costs and this approach helps you to replace loading the full model with a light-weight lookup call. For example, let’s say the resource is /cars/123 and you keep a LastModifiedDate on your cars database and use hash of the LastModifiedDate as the ETag (you could use LastModifiedDate to do cache validation on the date but HTTP date’s precision is sadly up to a second which might not be enough for you). In this case, the filter will enquire the application for ETag or LastModified of the resource and you can call your database and read that value for car:id=123 without loading the whole car - which is going to be a lighter database call. So this filter will do all BasicCacheFilter (and in more efficient way) and will even do conditional PUT for you. What is the problem with this one? Consistency: in terms of conditional PUT, validation is not atomic, e.g. you look up the ETag and you find the condition is met and proceed to update meanwhile data could have changed between the lookup and update (same could also apply to conditional GET but has less serious impact). This is not a problem for everyone hence I think this filter hits the sweet spot for simplicity and effectiveness.

StrongConsistencyCacheFilter: This is basically the same as above but maintains airtight consistency by allowing the application to implement atomic conditional GET and PUT - which means application has to do more.

I have plans for these to be GET or PUT specific since actions are usually designed as such.
Now you might ask, why CacheCore is a filter and not a middleware? If you remember, CacheCow.Server was a DelegatingHandler (akin to an ASP.NET Core middleware). Well, here is another lesson learnt: caching is a highly localised concern, it is a mistake to implement it as a global HTTP intermediary.

CacheCore.Client

Considering the client story in .NET Core for HTTP has not been drastically changed, it is fair to assume CacheCow.Client can still be used.

That is true, however, there are a few reasons I would like to start afresh. First of all, CacheCow’s inception and the main of the codebase was designed when .NET yet did not have an await keyword. This resulted in a .ContinueWith() soup which was hard to read and difficult to maintain. On the other hand, some interfaces supported async while others did not, resulting in breaking the async all the way rule. Also I had in mind for the storage to be clever about how much storage it uses per site and implement LRU while many underlying storages did not provide the primitive to do so - and frankly in this 5 years I have never needed it.

I think it is time to get rid of these shortcomings hence there will be a new client project too.

Future of CacheCow.Server and CacheCow.Client

It would be naive to think everyone will move to .NET Core straightaway. In fact, with .NET Standard 2.0, Microsoft has shown to have realised there needs to be a better interoperability between the classic .NET and the .NET Core. Apart from interoperability, I think people will carry on using and building .NET APIs for another few years.

Fore these reasons, I will carry on supporting CacheCow and releasing bug fixes, etc. Thanks for helping it improve by using it, reporting issues and sending bug fixes.

Saturday, 19 April 2014

CacheCow 0.5 Released

[Level T1]

OK, finally CacheCow 0.5 (actually 0.5.1 as 0.5 was released by mistake and pulled out quickly) is out. So here is the list of changes, some of which are breaking. Breaking changes, however, are very easy to fix and if you do not need new features, you can happily stay at 0.4 stable versions. If you have moved to 0.5 and you see issues, please let me know ASAP. For more information, see my other post on Alpha release.

So here are the features:

Azure Caching for CacheCow Server and Client

Thanks to Ugo Lattanzi, we have now CacheCow storage in Azure Caching. This is both client and server. Considering calls to Azure Cache takes around 1ms (based on some ad hoc tests I have done, do not quote this as a proper benchmark), this makes a really good option if you have deployed your application in Azure. You just need to specify the cacheName, otherwise "default" cache will be used.

Complete re-design of cache invalidation

I have now put some sense into cache invalidation. Point is that strong ETag is generated for a particular representation of the resource while cache invalidation happens on the resource including all its representation. For example, if you send application/xml representation of the resource, ETag is generated for this particular representation. As such, application/json representation will get its own ETag. However, on PUT both need to be invalidated. On the other hand, in case of PUT or DELETE on a resource (let's say /api/customer/123) the collection resource (/api/customer) needs to be invalidated since the collection will be different.

But how we would find out if a resource is collection or single? I have implemented some logic that infers whether the item is collection or not - and this is based on common and conventional routes design in ASP.NET Web API. If your routing is very custom this will not work.

Another aspect is hierarchical routes. When a resource is invalidated, its parent resource will be invalidated as well. For example in case of PUT /api/customer/123/order/456 , customer resource will change too - if orders are returned as part of customer. So in this case, not only the order collection resource but the customer needs to be invalidated. This is currently done in CacheCow.Server and will work for conventional routes.

Using MemoryCache for InMemory stores (both server and client)

Previously I have been using dictionaries for InMemory stores. The problem with Dictionary is that it just grows until system runs out of memory while MemoryCache limits its use of memory when system deals with a memory pressure.

Dependency of CachingHandler to HttpConfiguration

As I announced before, you need to pass the configuration to he CachingHandler on server. This should be an easy change but a breaking one.

Future roadmap and 0.6

I think we are now in a point where CacheCow requires a re-write for the features I want to introduce. One of such features is enabling Content-based ETag generation and cache invalidation. Currently Content-based ETag generation is there and can be used now but without content-based invalidation is not much use especially that in fact this now generates a weak ETag. Considering the changes required for achieving this, almost a re-write is required.

Please keep the feedbacks coming. Thanks for using and supporting CacheCow!

Sunday, 1 December 2013

CacheCow 0.5.0-alpha released for community review: new features and breaking changes

[Level T3]

CacheCow 0.5 was long time coming but with everything that was happening, I am already 1 month behind. So I have now released it as alpha so I can get some feedback from teh community.

As some of you know, a few API improvements had been postponed due to breaking changes. On the other hand, there were a few areas I really needed to nail down - the most important one being resource organisation. While resource organisation can be anything, most common approach is to stick with the standards Web API routing, i.e. "/api/{controller}/{id}" where api/{controller} is a collection resource and api/{controller}/{id} is an instance resource.

I will go through some of the new features and changes and really appreciate if you spend a bit of time reviewing and sending me feedbacks.

No more dependency on WebHost package

OK, this has been asked by Darrel and Tugberk and only made sense. So in version 0.4 I added dependency to HttpConfiguration but in order not to break the compatibility, it would use GlobalConfiguration.Configuration in WebHost by default. This has been removed now so an instance of HttpConfiguration needs to be passed in. This is a breaking change but fixing it should be a minimal change in your code from:

var handler = new CachingHandler(entityTagStore);
config.MessageHandlers.Add(handler);

You would write:

var handler = new CachingHandler(config, entityTagStore);
config.MessageHandlers.Add(handler);

CacheKeyGenerator is gone

CacheKeyGenerator was providing a flexibility for generating CacheKey from resource and Vary headers. Reality is, there is no need to provide extensibility points. This is part of the framework which needs to be internal. At the end of the day, Vary headers and URL of the resource are important for generating the CacheKey and that is implemented inside CacheKey. Exposing CacheKeyGenerator had lead to some misuses and confusions surrounding its use cases and had to go.

RoutePatternProvider is reborn

A lot of work has been focused on optimising RoutePatternProvider and possible use cases. First of all, signature has been changed to receive the whole request rather than bits and pieces (URL and value of Vary headers as a SelectMany result).

So the work basically looked at the meaning of Cache Key. Basically we have two connected concepts: representation and resource. A resource can have many representations (XML, image, JSON, different encoding or languages). In terms of caching, representations are identified by a strong ETag while resource will have a weak ETag since it cannot be guaranteed that all its representations are byte-by-byte equal - and in fact they are not.

A resource and its representations

It is important to note that each representation needs to be cached separately, however, once the resource is changed, the cache for all representations are invalidated. This was the fundamental area missing in CacheCow implementation and has been added now. Basically, IEntityTagStore has a new method: RemoveResource. This method is pivotal and gets called when a resource is changed.

The change in RoutePatternProvider is not just a delegate, it is now an interface: IRoutePatternProvider. Reason for the change to delegate is that we have a new method in there too: GetLinkedRoutePatterns. So what is a RoutePattern?

RoutePattern is basically similar to a route and its definition conventions in ASP.NET, however, it is meant for resources and their relationships. I will write a separate post on this but basically RoutePattern is in relation to collection resources and instance resources. Let's imagine a car API hosted at http://server/api. In this case, http://server/api/car refers to all cars while http://server/api/car/123 refers to a car with Id of 123. In order to represent this, we use * sign for collection resources and + sign for instance:

http://server/api/car         collection         http://server/api/car/*
http://server/api/car/123     instance           http://server/api/car/+

Now normally:

POST to collection invalidates collection
POST to instance invalidates both instance and collection
PUT or PATCH to instance invalidates both instance and collection
DELETE to instance invalidates both instance and collection

So in brief, change to collection invalidates collection and change to instance invalidates both.

In a hierarchical resource structure this can get even more complex. If /api/parent/123 gets deleted, /api/parent/123/* (collection) and /api/parent/123/+ will most likely get invalidated since they do not exist any more. However, implementing this without making some major assumptions is not possible hence its implementation is left to the CacheCow users to tailor for their own API.

However, since a flat resource organisation (defining all resources using "/api/{controller}/{id}") is quite common, I have implemented IRoutePatternProvider for flat structures in ConventionalRoutePatternProvider which is the default.

Prefixing SQL Server Script names with Server_ and Client_

The fact was that SQL Server EntityTagStore (on the server package) and CacheStore (on the client package) had some stored procedure name clashes preventing use of the same database for both client and server component. This was requested by Ben Foster and has been done now.

Please provide feedback

I hope this makes sense and waiting for your feedbacks. Please ping me on twitter or ask your questions in Github by raising issue.

Happy coding ...

Sunday, 13 October 2013

CacheCow update - Moving to MemoryCache for in-memory stores on both Client and Server

As you might know, CacheCow is a framework that implements HTTP Caching for ASP.NET Web API both for the client and server. If you are familiar with it, feel free to jump to the section "What's the update?".

HTTP Caching defined by HTTP Specification (currently at version 1.1 according to RFC 2616 although the work HTTP 2.0 is very close to finish) is an important feature of HTTP resulting in scalability of web as we know it.

It is important to remember that resources are cached on the client, and not on the server. In ASP.NET we can use Output Caching and System.Web.Caching.Cache to cache items on the Server but HTTP caching is different in the fact that resources get stored on the client. A resource is

either not cacheable (identified by no-store value in the Cache-Control header)
or can be only cached by the client (identified by private value in the Cache-Control header)
or can be cached by the client and all intermediaries (identified by public value in the Cache-Control header)

Currently CacheCow.Client looks after the storage of the items in an implementation of ICacheSore interface. Currently these implementations exist:

In-memory
Memcached
Redis
SQL Server
File-based

On the other hand, server also needs to store cache metadata. This is normally information such as Last-Modified and ETag of the resources. You configure CacheCow.Server to use one of several implementations of IEntityTagStore on the server. Currently these implementations exist:

In-memory
Memcached
RavenDB
MongoDB
SQL Server

So what is the change?

In the latest release of CacheCow which stands at 0.4.12, in-memory implementation of ICacheStore and IEntityTagStore have been changed from ConcurrentDictionary<TKey, TValue> based to MemoryCache based. The problem with dictionary-based implementation is that the store just grows and the items will never be freed. MemoryCache, on the other hand, is designed to be able to keep its memory usage to a threshold and expel old or least frequently used items out of the cache.

To use in-memory stores, all you have to do is to use the CachingHandler with default constructor which results in-memory stores to be used, as they are default. The good thing with these implementations are ability to set a maximum memory limit. This is achieved using app.config to web.config of your application:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.runtime.caching>
    <memoryCache>
      <namedCaches>
        <add name="TheName" cacheMemoryLimitMegabytes="40" pollingInterval="00:05:00" />
      </namedCaches>
    </memoryCache>
  </system.runtime.caching>
</configuration>

Above configuration will set the memory usage limit to 40MB and this limit will be checked every 5 minutes. CacheCow uses these names for its MemoryCache objects:

CacheCow.Client: "###InMemoryCacheStore_###"
CacheCow.Server uses 2 MemoryCache objects:

Storing ETag and LastModified: "###_InMemoryEntityTagStore_ETag_###"
Storing route patterns: "###_InMemoryEntityTagStore_RoutePattern_###"

You can use the configuration to limit the memory used by these MemoryCache objects. Since MemoryCache is in-memory and is in the same AppDomain as the application, it provides the fastest and most efficient storage.

Use in-memory storage wherever you can and surely it provides better performance compared to the likes of SQL Server.

The only caveat with using these implementations is that even though you may set a memory limit, memory can increase above your threshold. This is not an issue and is related to garbage collection and using GC.Collect() the memory will return back to the actual usage.

Tuesday, 19 March 2013

CacheCow Series - Part 0: Getting started and caching basics

[Level T2] CacheCow is an implementation of HTTP Caching for ASP.NET Web API. It implements HTTP caching spec (defined in RFC 2616 chapter 13) for both server and client, so you don't have to. You just plug-in the CacheCow component in server and client and the rest is taken care of. You do not have to know details of HTTP caching although it is useful to know some basics.

It is also a flexible and extensible library that instead of pushing a resource organisation approach down your throat, allows you to define your caching strategies and resource organisation using a functional approach. It stays opinionated only about caching where the opinion is HTTP spec. We shall delve into these customisation points in the future posts. But first let's to caching 101.

Caching

Caching is a very overloaded and confused term in ASP.NET. Let's review the cachings available in ASP.NET (I promise not to make it too boring):

HttpRuntime.Cache

This class is used to cache managed objects in SERVER memory using a simple dictionary like key-value approach. You might have used HttpContext.Current.Cache but it is basically nothing but a reference to HttpRuntime.Cache. So how is this different to using a plain dictionary?

You can define expiry so you do not run out of memory and your cache can be refreshed. This is the most important feature.
It is thread-safe
You can never run out of memory with this even with no expiry. As long as it hits a configurable limit, it purges half of the cache.
And also: it stores the data in the unmanaged memory of the worker process and not in heap so does not lengthen GC sweeps.

This cache is useful for keeping objects around without having to acquire them all the time (could be complex calculation or IO cost such as database). No, this is not HTTP caching.

Output cache

In this approach output of a page or a route can be cached on the SERVER so the server code does not get executed. You can define expiry and let the output cached based on parameters. Output cache behind the scene uses HttpRuntime.Cache.

You guessed it right: this is also not HTTP caching.

HTTP Caching

In HTTP Caching, a cacheable resource (or HTTP response in simple terms) gets cached on the CLIENT (and not server). Server is responsible for defining cache policy, expiry and maintaining resource metadata (e.g. ETag, LastModified). Client on the other hand is responsible for actually caching the response, respecting cache expiry directives and validating expired resources.

The beauty of the HTTP caching is that your server will not even be hit when client has cached a resource - unlike other two types of caching discussed. This helps with minimising the scalability, reducing network traffic and allowing for offline client implementation. I believe HTTP caching is the ultimate caching design.

One of the fundamental differences of HTTP caching with other types of caching is that a STALE (or expired) resource is not necessarily unusable and does not get removed automatically. Instead client call server and check if the stale resource is still good to use. If it is, server responds with status 304 (NotModified) otherwise it sends back the new response.

HTTP Caching mechanism can also be used for solving concurrency problems. When a client wants to update a resource (usually using PUT), it can send a conditional call and ask for update to run if and only if the version of the server is the same as the version of the client (based on its ETag or LastModified). This is especially useful with modern distributed systems.

So as you can see, client and server engage in a dance using various headers and things can get really complex. Good new is you don't have to worry about it. CacheCow implements all of this for you out of the box.

CacheCow consists of CacheCow.Server and CacheCow.Component. These two components can be used independently, i.e. each will work regardless the other is used or not - this is the beauty of HTTP Caching as a mixed concern which I explained here.

CacheCow on the server

How easy is it to use CacheCow on the server? You just need to get the CacheCow.Server Nuget package:

PM> Install-Package CacheCow.Server

And then 2 lines of code to be exact (actually 3 but we will see why):

var cachecow = new CachingHandler();
GlobalConfiguration.Configuration.MessageHeandlers.Add(cachecow);

So with this line of code all your resources become cacheable. By default expiry is immediate so client has to validate and double check if it can re-use the cache resource but all of this can be configured (which we will talk about in future posts). Also server now is able to respond to conditional GET or PUT requests.

CacheCow on the server requires a database (technically cache store) for storing cache metadata - but we did not define a store. CacheCow comes with an in-memory database that can be used if you do not specify one. This is good enough for development, testing and small single server scenarios but as you can imagine not scalable. What does it take to define a database? Well, a single line hence the 3rd line and you can choose from SQL Server, Memcached, RavenDb (Redis to come soon) or build your own by implementing an interface against another database (MySql for example).

So in this way we will have 3 lines (although you may combine first two lines):

var db = new SqlServerEntityTagStore();
var cachecow = new CachingHandler(db);
GlobalConfiguration.Configuration.MessageHeandlers.Add(cachecow);

With these three lines, you have added full feature HTTP caching (ETag generation, responding to conditinal GET and PUT, etc) to your API.

CacheCow on the client

Browsers that we use everyday (and usually even forget to look at as an application) are really amazing HTTP consumption machines. They implement cache storage (file-based) and are capable of handling all client scenarios of HTTP caching.

Client component of CacheCow just does the same but is useful when you want to use HttpClient for consuming HTTP services. Plugging it into your HttpClient is as simple as the server side. First get it:

PM> Install-Package CacheCow.Client

Then

var client = new HttpClient(new CachingHandler()
       { 
           InnerHandler = new HttpClientHandler()
       });

In a similar fashion, client component of CacheCow requires a storage - and this time for the actual responses (i.e. resources). By default, an in-memory storage is used which is OK for testing or single server and small API but for persistent stores you would use from stores such as File, SQL Server, Redis, MongoDB, Memcached aleady implemented or just implement the ICacheStore interface on the top of an alternative storage.

Constructor of the client CachingHandler accepts an implementation of ICacheStore which is your storage of choice.

So now cacheable resource will be cached and CacheCow will do conditional GET and PUT for you.

In the next post, we will look into server component of CacheCow.

Monday, 18 March 2013

CacheCow 0.4 released: new features and a breaking change

Version 0.4 is out and with it a couple of features such as attribute-based cache control and cache refresh (see below). For the first time I felt that I have got to write a few notes about this version, least of which because of one breaking change in this release - although it is not likely to break your code as I explain further. Changes have been in the server components.

I and other contributors have been working on CacheCow for the last 8 months. I thought with a couple of posts I have explained the usage of CacheCow. But I now feel that with concept counts increasing, I need to start a series on CacheCow. Before doing that I am going to explain new concepts and the breaking change.

Breaking change

The breaking change was a change in the signature of CacheControlHeaderProvider from Func<HttpRequestMessage, CacheControlHeaderValue> to Func<HttpRequestMessage, HttpConfiguration, CacheControlHeaderValue> to accept HttpConfiguration.

If you have provided your own CacheControlHeaderProvider, you need to provide HttpConfiguration as well - which should be very easy to fix whether web-host or self-host.

Cache Control Policy

So defining cache policy against resource have been by setting the value of CacheControlHeaderProvider which you would define whether a resource is cacheable and if it is, what is the expiry (and other related stuff):

public Func<HttpRequestMessage, HttpConfiguration, CacheControlHeaderValue> CacheControlHeaderProvider { get; set; }

So by default CacheCow sets Func to return a header value for private caching with immediate expiry for all resources:

CacheControlHeaderProvider = (request, cfg) => new CacheControlHeaderValue()
{
 Private = true,
 MustRevalidate = true,
 NoTransform = true,
 MaxAge = TimeSpan.Zero
};

Immediate expiry actually means that the client can use the expired resource as long as it validates the resource using a conditional GET - as explained before here.

But what if you want to individualise cache policy for each resource? We could use per-route handlers but that is not ideal and generally it depends on the resource organisation approach. I have explained in my previous post that resource organisation is one of the areas that needs to be looked at. But this is not within the scope of CacheCow. We are looking into solving this as part of another project while ASP.NET team are also looking into this. So I have decoupled the resource organisation project from CacheCow.

Having said that, in the meantime, I am going to provide some help with doing cache policy set up less painful. This means that CacheCow will come with a few pre-defined functions that help you with defining your cache control policy.

Good news! Cache policy definition using attributes

So now you can define your cache policy against your actions or controllers or both - although action attribute always takes precedence over controller. Using the popular ASP.NET Web API sample:

    [HttpCacheControlPolicy(true, 100)]
    public class ValuesController : ApiController
    {

        public IEnumerable<string> Get()
        {
            return new[] { "cache", "cow" };
        }

        [HttpCacheControlPolicy(true, 120)]
        public string Get(int id)
        {
            return "cache cow... mooowwwww";
        }

So GET call to the first action (/api/Values) will have a max-age of 100 while GET to the second action (e.g. /api/Values/1) will return a max-age of 120.

In order to set this up, all you have to do is to set the CacheControlHeaderProvider property of your CachingHandler to GetCacheControl method of an instance of AttributeBasedCacheControlPolicy:

cachingHandler.CacheControlHeaderProvider = new AttributeBasedCacheControlPolicy(
 new CacheControlHeaderValue()
  {
   NoStore = true
  }).GetCacheControl;

So in above we have passed a default caching policy of no-caching. This table defines which attribute value (or default provided in the constructor) is used:

Cache Refresh Policy

CacheCow works best when HTTP API is actually a REST API. In other words, it uses uniform interface (i.e. HTTP Verbs) to modify resources and this means that the caching handler will get the opportunity to invalidate and remove the cache when POST, PUT, DELETE or PATCH is used.

Problem is commonly HTTP API sits on the top of a legacy system where it has not control over modifications of resources and acts as a data provider. In such a case, the API will not be notified on resource changes and application will be responsible for removing cache metadata directly on the EntityTagStore used. And this is not always possible.

I am providing a solution for defining a time based cache refresh policy using attributes in a very similar fashion to Cache Control Policy - even the above table applies. Removal of items from cache store on the server happens upon the first request after the refresh interval has passed not immediately after interval. So we add the refresh policy provider:

cachingHandler.CacheRefreshPolicyProvider = new AttributeBasedCacheRefreshPolicy(TimeSpan.FromSeconds(5 * 60 * 60))
    .GetCacheRefreshPolicy;

We have defined 5 hour refresh policy as default. And we override using controller or action attributes.

Future posts

As promised, next few posts will be a CacheCow walk-through.

Sunday, 10 February 2013

CacheCow project - Current status and future roadmap

For those of you who might not know, CacheCow is an open source project that I started around 8 months ago to address the server-side and client-side caching in the new Microsoft HTTP stack - also known as ASP.NET Web API. I have been fortunate to get regular help, feedback and contributions from geek friends (Tugberk, Alex Zeitler, Sayed Hashimi) and I am pleased to announce that version 0.3 is out now. As part of this release, cache storage implementation for Memcached has been released.

There are currently 9 packages available - have a look here.

CacheCow.Server

This is the server side component of CacheCow which works as a DelegatingHandler and looks at the request/response and implements server-side caching according to HTTP spec. This includes inspecting headers, generating ETag and adding headers, responding to conditional GET and PUT, etc. The idea is that you should only have to declare your caching strategy against the handler and it should take care of the rest.

Storing ETag and other important headers such as LatsModified (which I collectively call Cache Metadata) against the resources requires a persistent or in-memory storage. CacheCow.Server by default comes with an In-Memory implementation of the storage. However, these other storages are currently available:

MongoDb
RavenDB
SQL Server

CacheCow.Client

As part of ASP.NET Web API, a new shiny HttpClient has been developed which uses the same HTTP pipeline model as the server. While you might be consuming Web API using JavaScript on the browser, I see that more often than not, this will be consumed by HttpClient on the server as more and more APIs move to Web API from WCF and classic (ASMX) web services.

As such, I believe the client side story of the caching is an important - and somehow neglected - one. So you will find a client-side CachingHandler in CacheCow.Client. Now on the client instead of storing Cache Metadata, we need to store actual responses. There are even more storage implementations available on the client:

Redis
File-based storage
SQL Server
Memcached

Future roadmap

There is a lot of work there to be done. [If you would like to contribute, please let me know]

Basically the plan is to implement below in the next 4-6 months:

Turning all storage API to asynchronous. So IEntityTageStore and ICacheStore will turn completely async.
Implementing total and per-domain storage quota for the CacheCow.Client. Part of this work has already been done but need to be consolidated.
Implement all storages for both client and server:

Server-side:

Redis
Memcached

Client-side

RavenDb
MongoDB

Implement resource organisation so declaring cache plan for a resource is easy (currently is not)

So watch this space. If you would like to contribute, please message me in twitter.

Saturday, 22 September 2012

Take your Web API service consumption up to 11 with CacheCow.Client

ASP.NET Web API is here and a lot of teams have already started building software using it. If you have followed this and other blogs on webapibloggers.com, you probably have seen many various possibilities and avenues this framework brings for designing and building clean and scalable services.

ASP.NET Web API exposes all the goodness of HTTP. Caching is an important feature of HTTP and ASP.NET Web API allows for building services and clients that take advantage of this feature. I, along with a few friends in the Web API community, have been busy building caching extensions for Web API in a project called CacheCow which is hosted on GitHub.

CacheCow has two separate components: server and client. These will be used independently by service providers (server) and their consumers (client).

Server component allows for easy handling of HTTP caching scenarios on server by generating ETag, responding to validation of cache (see earlier posts on this subject, especially this and for full list here), cache invalidation and storage of cache metadata. Storage of cache metadata is possible in various stores, currently in-memory and SQL Server have been implemented and RavenDB and Redis is on the pipeline [UPDATE: RavenDB is implemented and NuGet package is available here]. Since storage has been abstracted away, any storage mechanism can be plugged in without making any server changes.

Client component looks after making cache-aware requests, cache validation and cache storage. Currently in-memory and file-based storage is available but other stores such as Redis, SQL Server, MongoDB and RavenDB are in the pipeline. Since storage has been abstracted away, any storage mechanism can be plugged in without making any client changes. One of important features of storage is total and per-site quota.

It is important to note that while clients can be browsers or native Apps (WPF, Silverlight, iOS, Android, etc), arguably more often than not they will be server components themselves. For example, an ASP.NET web site can call services of an ASP.NET Web API server. Also middleware components could similarly use resources exposed by Web API. As such it is very crucial that cache storage solutions are performant, scalable and configurable.

In this post, I will look into CacheCow.Client a little but more. For more info, you can read previous posts on the topic in this blog.

CacheCow.Client alternatives

The only alternative to CacheCow.Client (that I am aware of) is using WinINET caching. Internet Explorer also uses this so the cache store will be the same. This is basically windows' HTTP request stack which has been exposed in .NET Framework since v 2.0 through WebRequest:

RequestCachePolicy policy = 
        new RequestCachePolicy( RequestCacheLevel.Default);
WebRequest request = WebRequest.Create(uri);
request.CachePolicy = policy;
WebResponse response = request.GetResponse();

As you can see, we can define a cache policy which will be applied to the request and according to the policy, Internet Explorer cache is used. Cache policy has a few possible values that are defined here. Notable values include:

CacheOnly: retrieves the request only from cache
BypassCache: does not use cache at all and goes straight to the server
CacheIfAvailable: retrieves from local or intermediate cache if resource available otherwise retrieve from server
Default: Similar to previous but current cache policy takes effect

This same mechanism is now exposed in HttpClient but basically is built on the top of WebRequest. Henrik fully covers this feature in his blog here.

Basically in order to use WinINET caching with the new Web API stack, you need to create an HttpClient but provide WebRequestHandler as the MessageHandler:

HttpClient client = new HttpClient(new WebRequestHandler()
                           {
                               CachePolicy = new RequestCachePolicy( RequestCacheLevel.Default)
                           });
// this is a sample. It is not advised to use .Result since can lead to deadlock!
var httpResponseMessage = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;
var httpResponseMessage2 = client.GetAsync("http://carmanager.softxnet.co.uk/api/car/3").Result;

Using this feature, you can enable caching with little coding on the client.

Why I would choose CacheCow.Client rather than WinINET

Because it goes to 11! As we saw, it is very easy to get started with caching in HttpClient. But as we noted, it is very likely that HttpClient could be used in a server context hence having a reliable and scalable solution is very important in production.

Here are a few advantages of CacheCow.Client over WinINET (or rather disadvantages of WinINET):

1. Caching will be shared with Internet Explorer

In a production scenario, you need an implementation which is predictable and reliable. If someone uses Internet Explorer on the machine, storage area for your application's resources will be taken by just simple browsing. This can lead Internet Explorer to flush application resources in order to store resources for the browsing session.

2. You have little control over quota

With CacheCow.Client, you can define a global and a per-site quota for storage of resources while such feature is not accessible (although there could be some registry entries for changing these variables) in WinINET caching. Also these variables could be overwritten by installation of a newer version of Internet Explorer.

3. Cache is local to the machine and cannot be shared across servers

In a production scenario, it is desirable to be able to store caches in a central store so network traffic and requests could be limited while with WinINET caching, each server will use its own local cache store.

4. WinINET is file-based

With WinINET, cache is stored in a file location while for a high-throughput production environment, robust caching using solutions such as Redis is required. CacheCow client by abstracting the storage can use any number of storage mechanisms such as Redis, MongoDB, RavenDB, etc.

5. CachePolicy is global for the HttpClient instance

Sometimes you might need to bypass caching. With WinINET, this has to be done with changing policy at the client level which applies across all requests for that HttpClient while CacheCow.Client respects will not use cached resources if you set CacheControl header of the request to no-cache. This basically recommended implementation based on HTTP specification (RFC2616).

6. With WinINET you do not know if request was retrieved from cache

With WinINET, there is no way to tell if response was retrieved from the cache or origin server. CacheCow.Client provides x-cachecow header which provides various information which can be used for debugging and troubleshooting scenarios.

Introducing CacheCow.Client.FileCacheStore

Last week I finished first version of a persistent cache store which is file based. This is available using NuGet and (package name is CacheCow.Client.FileCacheStore) and the code available at GitHub.

Using this persistent store is very easy. After getting the package from NuGet, create an HttpCient while as a delegating handler, pass CachingHandler (covered before here) while setting the store to a new instance of FileStore. While creating a FileStore, you need to specify a folder for storing the cached resources:

var httpClient = new HttpClient(
 new CachingHandler(
 new FileStore("c:\\Cache"))
{
 InnerHandler = new HttpClientHandler()
});

That is all you have to do! Now all your requests will store cacheable resources in a file-based persistent store.

Currently for quota it uses default values but I am in the process of exposing values so you can configure quota.

CacheCow roadmap

After exposing quota settings, I will be working on CacheCow.Client.RedisCacheStore for a high throughput production level cache storage.

Please keep me posted by your comments, feedback and raising bug/issues on the GitHub page. You are awesome!

Sunday, 22 July 2012

Introducing CacheCow: An HTTP caching framework for server and client

CacheCow

[Level T2] This is a short post to introduce CacheCow, an Open Source framework for HTTP caching on the client and server in ASP.NET Web API.

As some of you probably know, I have been working on caching for a while. If you go back to my post on CachingHandler, you will see that I contributed server-side HTTP caching implementation to the WebApiContrib project and included samples and tests.

However, I realised that even more important part of the caching needs to be implemented on the client. Also implementations of the IEntityTagStore on various databases (in-memory and persisted) and client-side's cache storage all need their own project so this is bigger than just a feature on WebApiContrib. As such, I have decided to start a new project and port the server-side from WebApiContrib. [Please bear in mind, the code in the WebApiContrib will be maintained and supported so if you are using it and have problems or experience bugs, please ping me in twitter or GitHub.]

So CacheCow framework has been born. The name itself is a word game with Cash Cow, meaning it does the heavy lifting for caching with minimal set up hence promises good return on investment :). Project is open source and hosted on GitHub. And yes, I do accept pull requests; Tugberk has done a great job (also with some help from Sayed Ibrahim Hashimi which I am so grateful for) and automated the whole build and NuGet package generation and his PR was merged pretty much immediately. But please contact me before-hand on the work you would like to do - there is ton of interesting work to do.

How to use CacheCow.Server

At the moment, only server-side CacheCow is ready for use. All you need to do is to use NuGet to get the package:

PM> Install-Package CacheCow.Server

This will add the CacheCow.Server and CacheCow.Common DLLs and the rest is all the same as the CachingHandler post and samples. Just add CachingHandler to the config:

GlobalConfiguration.Configuration.MessageHandlers.Add(new CachingHandler());

This will add this handler with all the default settings and will store the cache state in the memory. There are many dials that you can turn and configure the handler according to your resource organisation - just see the sample in WebApiContrib. This sample connects to the CarManager sample and tests various scenarios for a fairly complex resource organisation.

How to use CacheCow.Server.EntityTagStore.SqlServer

As I pointed out above, by default cache state is stored in memory. This is OK for single server or test scenarios but in case of a server farm you would like the cache state to be maintained for whole farm and when a resource cache invalidated, it is done for all servers. In this case you need a central EntityTagStore (cache state store).

Building a cache state store is pretty easy and all you have to do is to implement IEntityTagStore interface which has 5 methods. Since cache might be invalidated not just for a CacheKey (previously called EntityTagKey) but also for a RoutePattern (see CachingHandler post), key-value stores are not rich enough to provide this but conventional databases can be used.

So I have implemented this for SQL Server. In order to get this EntityTagStore, just use NuGet:

PM> Install-Package CacheCow.Server.EntityTagStore.SqlServer

This will download the DLL and also a script file named script.sql located in <project root>\packages\CacheCow.Server.EntityTagStore.SqlServer.0.1.0\scripts

So create a database and run this script against it, and you will get one table and several stored procedures. Then in order to use SqlServerEntityTagStore, create an instance pass it to the CachingHandler constructor. Default constructor relies on a connection string named "EntityTagStore" to be there in your web.config and pointing to your database.

GlobalConfiguration.Configuration.MessageHandlers.Add(
 new CachingHandler(new SqlServerEntityTagStore()));

Alternatively, pass the connection string to the constructor. That is all you have to do use SQL Server EntityTagStore.

Roadmap

I am working on the client CachingHandler to be used with HttpClient. This will initially come with an InMemoryCacheStore but then various persistent cache store implementations can be done (for example file-based, SQL CE, etc)

Also CarManager sample needs to be ported for CacheCow which I am hoping to do very soon - although the old sample does work well.

Any question or comment please ping me on twitter (@aliostad) or GitHub.

Monday, 25 June 2012

Using the CachingHandler in ASP.NET Web API

Introduction

[NOTE: Please also see updated post and the new framework called CacheCow.
This class has been removed from WebApiContrib code and NO LONGER SUPPORTED]

[Level T3] Caching is an important concept in HTTP and comprises a sizeable chunk of the spec. ASP.NET Web API exposes full goodness of the HTTP spec and caching can be implemented as a message handler explained in Part 6 and 7. I have implemented a CachingHandler and contributed the code to the WebApiContrib in GitHub.

This post will have two sections: first a primer on HTTP caching and then how to use the handler. This code uses ASP.NET Web API RC and with .NET 4 (VS 2010 or 2012).

Background

NOTE: This topic is fairly advanced and complex. You do not necessarily need to know all this in order to use CachingHandler and are welcomed to skip it but more in-depth knowledge of HTTP could go a long way.

Caching is a very important feature in the HTTP. RFC 2616 extensively covers this topic in section 13. Review of all the spec is beyond the scope of this post but we will briefly touch on the subject.

First of all let's get this straight: this is not about putting an object in memory as we do with HttpRuntime.Cache. In fact server does not store anything (more on this below), it only tells the client (or mid-stream cache or proxy servers) what can be cached, how long and validates the caching.

Basically, HTTP provides semantics and mechanism for the origin server, client and mid-stream proxy/cache servers to effectively reduce traffic by validating the version of the resource they have against the server and retrieve the resource only if it has changed. This process is usually referred to as cache validation.

In HTTP 1.0, server would return a LastModified header with the resource. A client/user agent could use this value and send it in the If-Modified-Since header in the subsequent GET requests. If the resource was not changed, server would respond with a 304 (Not Modified) otherwise the resource would be sent back with a new LastModified header. This would be also useful in PUT scenarios where a client sends a PUT request to update a resource only if it has not changed: a If-Unmodified-Since header is sent. If the resource has not changed, server fulfils the request and sends a 2xx response (usually 202 Accepted) otherwise a 412 (Precondition Failed) is sent.

For many reasons (including the fact that HTTP dates do not have milliseconds) it was felt that this mechanism was not adequate. In HTTP 1.1, ETag (Entity Tag) was introduced which is an opaque Id in the format of a quoted string that is returned with the resource. ETags can be strong (refer to RFC 2616 for more info) or weak in which case they start with w/ such as w/"12345". ETag works like a version ID for a resource - if two ETags (for the same resource) are equal, it means those two versions of the resource are the same.

ETag for the same resource could be different according to various headers. For example a client can send an Accept-Language header of de-DE or en-GB and server would send different ETags. Server can define this variability for each resource by using Vary header.

Cache validation for GET and PUT requests are similar but instead will use If-Match (for PUT) and If-None-Match (for GET).

Well... complex? Yeah, pretty much and yet I have not covered some other aspects and the edge cases. But don't worry! You will be abstracted from a lot of it if you use the CachingHandler.

Using CachingHandler right out of the box

OK, using the CachingHandler is straightforward especially if you go with the default settings. You can have a look at the CarManager sample in the WebApiContrib project. CarManager.Web sample demonstrate server side setup of the caching and CarManager.CachingClient for making calls to the server and testing various caching scenarios.

All you have to do is to create an ASP.NET Web API project and add the CachingHandler as a delegating handler as we learnt in Part 6:

GlobalConfiguration.Configuration.MessageHandlers.Add(cachingHandler);

And you are done! Now let's try the server with some scenarios. I would suggest that you use the CarManager sample, otherwise create a simple handler and implement GET, PUT and POST on it.

Now let's make some request and look at the response. I would suggest using Fiddler or Chrome's Postman to send requests and view the response.

So if we send a GET request:

GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031

We get back this response (or similar; some headers removed and body truncated for clarity):

HTTP/1.1 200 OK
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Vary: Accept
Cache-Control: no-transform, must-revalidate, max-age=604800, private
Last-Modified: Thu, 21 Jun 2012 23:35:46 GMT
Content-Type: application/json; charset=utf-8

[{"Id":1,"Make":"Vauxhall","Model":"Astra","BuildYear":1997,"Price":175.0....

So we see the ETag header here along with important caching headers. Now if we make another GET call, we get back the same response but ETag and last modified stays the same.

Generating same ETag is fine (showing our CachingHandler is doing something) but we have not yet seen any caching. That is where client has to do some work to do. It has to use If-None-Match with the ETag to conditionally ask for the resource: if it matches server, it will get back 304 but if not, server will return the new resource:

GET http://localhost:8031/api/Cars HTTP/1.1
User-Agent: Fiddler
Host: localhost:8031
If-None-Match: "54e9a75f2dbb4edca672f7a2c4a73dca"

Here we get back 304 (Not modified) as expected:

HTTP/1.1 304 Not Modified
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
ETag: "54e9a75f2dbb4edca672f7a2c4a73dca"
Server: Microsoft-IIS/8.0
Date: Sun, 24 Jun 2012 07:34:29 GMT

A typical server that can use CachingHandler

CachingHandler makes a few RESTful assumptions about the server for effective caching. Some of these assumptions can be overridden but generally not recommended. These assumptions are:

HTTP verbs (POST/GET/PUT/DELETE for CRUD operations) are used to modify resources - and not RPC-style resources (such as POST /api/Cars/Add). This is the most fundamental assumption.
All resources are to be modified through the same HTTP pipeline that implements caching. If a resource modified outside the pipeline, cache state needs to be updated by the same process.
Resource are organised in a natural cache-friendly manner: invalidation of related resources can be done with a minimal setup (more details upcoming).

Defining cache state

As we said, this has nothing to do with HttpRuntime.Cache! Unfortunately ASP.NET implementation makes it really confusing between the HTTP cache (where the resource gets cached on the client or midstream cache servers) and server caching (when the rendered output gets cached on the server).

Cache state is a collection of data that helps keep track of each resource along with its last modified date and ETag. It might initially seem that for a resource, there exists a single of such pieces of information. But as we touched upon above, a resource can have different representations each of which needs to be stored separately on the client while they will most likely invalidated together.

For example, resource /api/disclaimer can exist in multiple languages as such client has to cache each representation separately but when disclaimer changes, all such representations need to be invalidated. That will require a storage of some sort to keep track of all this data. Current implementation comes with an in-memory storage but in a web farm scenario this needs to be a persistent store.

Cache state storage and cache invalidation

So we do not need to store the cache on the server, but we DO need to store ETag and various states on the server. If we only have a single server, this state can be stored in the memory. In a web farm (or even web garden) scenario a persisted store is needed. This store is called Entity Tag Store and is represented by a simple interface:

public interface IEntityTagStore
{
 bool TryGetValue(EntityTagKey key, out TimedEntityTagHeaderValue eTag);
 void AddOrUpdate(EntityTagKey key, TimedEntityTagHeaderValue eTag);
 bool TryRemove(EntityTagKey key);
 int RemoveAllByRoutePattern(string routePattern);
 void Clear();
}

We need to have an implementation of this interface so that the cache management can be abstracted away from controllers and instead done in the DelegatingHandlers.

Introducing some concepts (you may skip and come back to it later)

This store will keep track of the Etag (and related state stored in TimedEntityTagHeaderValue) based on an Entity Tag key which is calculated based on the resource URI and content of the important headers (which their list will be on the Vary header). In-Memory implementation for a single server is provided out of the box with CachingHandler - I would be creating a SQL-Server implementation of it very soon; watch this space.

It is important to note that change in the resource will most likely invalidate all forms of the resource so all permutations of important headers will be invalidated. So invalidation is usually performed at the resource level. Sometimes several related resources can be represented as a RoutePattern. By default, URI of a resource is its RoutePattern.

Also in some cases, change in a resource will invalidate linked resources. For example, a POST to /api/cars to add a car will invalidate /api/cars/fastest and /api/cars/mostExpensive. In this case, "/api/cars/*" can be defined as the linked RoutePattern (since /api/cars does not qualify for /api/cars/*).

Some assumptions in CachingHandler

Re-emphasising that resource can only change through the HTTP API (using PUT, POST and DELETE verbs). If resources are to be changed outside the API, it is responsibility of the application to use IEntityTageStore to invalidate the cache for those resources.
If no Vary header is defined by the application, CachingHandler creates weak ETags.
Change in the resource will invalidates all forms of the resources (all permutations of important header values)

CarManager sample

I have considered a pretty complex and interrelated routing and caching requirement for the sample to display what is possible. Resources available are:

/api/Car/{id}: GET, PUT and DELETE
/api/Cars: GET and POST
/api/Cars/MostExpensive: GET
/api/Cars/Fastest: GET

So creating a car would invalidate cache for 2, 3 and 4. Updating car with id=1 invalidates 2,3 and 4 in addition to the car itself at /api/Car/1. Deleting a car will invalidate 2, 3 and 4.

The client sample (CarManager.CachingClinet) displays how to call the server with headers to validate the cache.

Configuring CachingHandler

Best place to start is the CarManager.Web sample to give an idea on how various setup can be used to configure complex caching requirements. Basically the points below can be used to configure the CachingHandler

Constructor

A list of request header names can be passed that will define the vary headers. If no vary header is passed, system only generates weak ETags. Also optionally an implementation of IEntityTagStore otherwise by default InMemoryEntityTagStore will be used.

EntityTagKeyGenerator

This is an opportunity to provide linked resource for a resource.

Other customisation points

CachingHandler provides properties in the form of functions with default implementation that can be changed to override the default behaviour which we will cover in the upcoming post on "Extending CachingHandler".

Conclusion

CachingHandler is a server-side DelegatingHandler which can be used to abstract away caching from individual ApiControllers so that controller logic can be coded without having to worry about caching.

The code is hosted on GitHub (~~currently sitting at my fork waiting to be merged~~ merged!) and comes with both a client and server sample - CarManager.Web and CarManager.CachingClient.