Monday, 18 March 2013

CacheCow 0.4 released: new features and a breaking change

Version 0.4 is out, and with it come a couple of new features: attribute-based cache control and cache refresh (see below). For the first time I felt I had to write a few notes about a release, not least because of one breaking change, although, as I explain below, it is unlikely to break your code. All the changes are in the server components.

Other contributors and I have been working on CacheCow for the last 8 months. I thought a couple of posts had covered how to use CacheCow, but as the number of concepts grows I now feel I need to start a series on it. Before that, I am going to explain the new concepts and the breaking change.

Breaking change

The breaking change is in the signature of CacheControlHeaderProvider, which changes from Func<HttpRequestMessage, CacheControlHeaderValue> to Func<HttpRequestMessage, HttpConfiguration, CacheControlHeaderValue> so that it also receives the HttpConfiguration.

If you have provided your own CacheControlHeaderProvider, your function now has to accept an HttpConfiguration parameter as well, which should be very easy to fix whether you are web-hosted or self-hosted.
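If you have a provider written against the old one-argument signature, the smallest migration is to wrap it so that it accepts (and ignores) the configuration. The sketch below shows such an adapter. AdaptProvider is a hypothetical helper, not part of CacheCow, and the generic TConfig stands in for HttpConfiguration only so that the example compiles without ASP.NET Web API referenced.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical adapter: turns a pre-0.4 (one-argument) provider into the new
// two-argument shape. TConfig stands in for HttpConfiguration here.
static Func<HttpRequestMessage, TConfig, CacheControlHeaderValue> AdaptProvider<TConfig>(
    Func<HttpRequestMessage, CacheControlHeaderValue> legacyProvider)
{
    // The old logic has no use for the configuration, so it is ignored.
    return (request, config) => legacyProvider(request);
}

// An existing provider written against the old signature.
Func<HttpRequestMessage, CacheControlHeaderValue> oldProvider =
    request => new CacheControlHeaderValue { Public = true, MaxAge = TimeSpan.FromMinutes(1) };

var newProvider = AdaptProvider<object>(oldProvider);
var header = newProvider(new HttpRequestMessage(HttpMethod.Get, "http://localhost/api/values"), new object());
Console.WriteLine(header);
```

In a Web API project the same fix is a one-liner: cachingHandler.CacheControlHeaderProvider = (request, cfg) => myOldProvider(request); where myOldProvider is your existing function.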

Cache Control Policy

Cache policy for resources has so far been defined by setting the CacheControlHeaderProvider property, a function that decides whether a resource is cacheable and, if it is, what its expiry is (and other related settings):

public Func<HttpRequestMessage, HttpConfiguration, CacheControlHeaderValue> CacheControlHeaderProvider { get; set; }

By default, CacheCow sets this Func to return a header value for private caching with immediate expiry for all resources:

CacheControlHeaderProvider = (request, cfg) => new CacheControlHeaderValue()
{
 Private = true,
 MustRevalidate = true,
 NoTransform = true,
 MaxAge = TimeSpan.Zero
};

Immediate expiry actually means that the client can keep using the expired resource as long as it first revalidates it with a conditional GET, as I have explained in a previous post.
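To make this concrete, here is a sketch of the client side of that exchange using only System.Net.Http types: with the default max-age of zero, a stored copy is never fresh, so before reusing it the client sends a conditional GET carrying If-None-Match, and the server can answer 304 Not Modified if the ETag still matches. IsFresh and BuildConditionalGet are hypothetical helpers, and treating a missing max-age as stale is a simplifying assumption.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// CacheCow's default policy as described in the post.
var defaultPolicy = new CacheControlHeaderValue
{
    Private = true,
    MustRevalidate = true,
    NoTransform = true,
    MaxAge = TimeSpan.Zero
};

// Simplified freshness check (assumption: no max-age means stale;
// real caches may apply heuristics instead).
static bool IsFresh(CacheControlHeaderValue cacheControl, TimeSpan age) =>
    cacheControl.MaxAge is TimeSpan maxAge && age < maxAge;

// A stale copy is revalidated with a conditional GET carrying the stored ETag.
static HttpRequestMessage BuildConditionalGet(string url, EntityTagHeaderValue etag)
{
    var conditional = new HttpRequestMessage(HttpMethod.Get, url);
    conditional.Headers.IfNoneMatch.Add(etag);
    return conditional;
}

var storedETag = new EntityTagHeaderValue("\"abc123\"");
Console.WriteLine($"Fresh at age 0? {IsFresh(defaultPolicy, TimeSpan.Zero)}"); // prints "Fresh at age 0? False"
var revalidation = BuildConditionalGet("http://localhost/api/values", storedETag);
Console.WriteLine(revalidation.Headers.IfNoneMatch);
```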

But what if you want to set the cache policy individually for each resource? We could use per-route handlers, but that is not ideal, and in general the answer depends on how resources are organised. As I explained in my previous post, resource organisation is one of the areas that needs attention, but it is outside the scope of CacheCow. We are looking into solving it as part of another project, and the ASP.NET team is also looking into it, so I have decoupled the resource organisation work from CacheCow.

Having said that, in the meantime I want to make setting up a cache policy less painful, so CacheCow will come with a few predefined functions that help you define your cache control policy.
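For comparison, this is roughly what hand-rolling a per-resource policy looks like without such help: one provider function that inspects the request path. This is a hypothetical sketch, not a CacheCow feature. The path prefixes and max-age values are made up, and object stands in for HttpConfiguration so the code compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical per-resource table: path prefix -> max-age.
var maxAgeByPrefix = new List<(string Prefix, TimeSpan MaxAge)>
{
    ("/api/products", TimeSpan.FromMinutes(10)),
    ("/api/orders", TimeSpan.Zero),
};

// Shaped like CacheControlHeaderProvider; 'object' stands in for HttpConfiguration.
Func<HttpRequestMessage, object, CacheControlHeaderValue> provider = (request, cfg) =>
{
    string path = request.RequestUri?.AbsolutePath ?? "/";
    foreach (var (prefix, maxAge) in maxAgeByPrefix)
    {
        if (path.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return new CacheControlHeaderValue { Private = true, MustRevalidate = true, MaxAge = maxAge };
        }
    }
    // Anything not listed is not stored at all.
    return new CacheControlHeaderValue { NoStore = true };
};

var products = provider(new HttpRequestMessage(HttpMethod.Get, "http://localhost/api/products/1"), new object());
Console.WriteLine(products);
```

The weakness of this approach is that the function duplicates knowledge of your routes as strings, which drifts out of sync as the API grows; attaching the policy to the controller or action avoids that.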

Good news! Cache policy definition using attributes

You can now define your cache policy on your actions, your controllers or both; an action attribute always takes precedence over a controller attribute. Using the familiar ASP.NET Web API sample:

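    // Controller-level policy: applies to every action without its own attribute.
    [HttpCacheControlPolicy(true, 100)]
    public class ValuesController : ApiController
    {
        public IEnumerable<string> Get()
        {
            return new[] { "cache", "cow" };
        }

        // Action-level policy: overrides the controller attribute for this action.
        [HttpCacheControlPolicy(true, 120)]
        public string Get(int id)
        {
            return "cache cow... mooowwwww";
        }
    }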

So a GET to the first action (/api/Values) returns a max-age of 100 seconds, while a GET to the second action (e.g. /api/Values/1) returns a max-age of 120 seconds.

To set this up, all you have to do is set the CacheControlHeaderProvider property of your CachingHandler to the GetCacheControl method of an AttributeBasedCacheControlPolicy instance:

cachingHandler.CacheControlHeaderProvider = new AttributeBasedCacheControlPolicy(
 new CacheControlHeaderValue()
  {
   NoStore = true
  }).GetCacheControl;

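Here we have passed a default policy of no-store, i.e. no caching at all. This table defines which attribute value (or the default provided in the constructor) is used:

Controller attribute   Action attribute   Value used
--------------------   ----------------   ---------------------
No                     No                 Default (constructor)
Yes                    No                 Controller attribute
No                     Yes                Action attribute
Yes                    Yes                Action attribute
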
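The precedence rule boils down to a null-coalescing chain. The sketch below models each attribute as an optional CacheControlHeaderValue. Policy and Resolve are hypothetical helpers rather than CacheCow's implementation, and reading the first constructor argument of HttpCacheControlPolicy as a "private" flag is an assumption; the post only shows that the second argument is the max-age.

```csharp
using System;
using System.Net.Http.Headers;

// Hypothetical stand-in for HttpCacheControlPolicy(isPrivate, maxAgeSeconds).
static CacheControlHeaderValue Policy(bool isPrivate, int maxAgeSeconds) =>
    new CacheControlHeaderValue
    {
        Private = isPrivate,
        MaxAge = TimeSpan.FromSeconds(maxAgeSeconds)
    };

// Action attribute wins over controller attribute; the constructor default
// is used only when neither is present.
static CacheControlHeaderValue Resolve(
    CacheControlHeaderValue? actionPolicy,
    CacheControlHeaderValue? controllerPolicy,
    CacheControlHeaderValue fallback) =>
    actionPolicy ?? controllerPolicy ?? fallback;

var defaultPolicy = new CacheControlHeaderValue { NoStore = true };
var controllerPolicy = Policy(true, 100); // on ValuesController
var actionPolicy = Policy(true, 120);     // on Get(int id)

Console.WriteLine(Resolve(null, controllerPolicy, defaultPolicy));         // Get()
Console.WriteLine(Resolve(actionPolicy, controllerPolicy, defaultPolicy)); // Get(int id)
```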
Cache Refresh Policy

CacheCow works best when the HTTP API is actually a REST API, in other words when it uses the uniform interface (i.e. HTTP verbs) to modify resources. That gives the caching handler the opportunity to invalidate and remove cached entries when POST, PUT, DELETE or PATCH is used.
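A minimal sketch of that idea, assuming a plain in-memory dictionary from resource URL to ETag (this is not CacheCow's EntityTagStore API): safe requests leave the stored validator alone, while any of the four modifying verbs removes it so that the next GET gets a new ETag.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical in-memory store: resource URL -> ETag.
var etagStore = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

// Verbs that modify a resource through the uniform interface.
var invalidatingMethods = new HashSet<HttpMethod>
{
    HttpMethod.Post, HttpMethod.Put, HttpMethod.Delete, new HttpMethod("PATCH")
};

void OnRequest(HttpMethod method, string resourceUrl)
{
    if (invalidatingMethods.Contains(method))
    {
        // The resource is changing, so its stored validator is now stale.
        etagStore.Remove(resourceUrl);
    }
}

etagStore["/api/values/1"] = "\"v1\"";
OnRequest(HttpMethod.Get, "/api/values/1");
Console.WriteLine(etagStore.ContainsKey("/api/values/1")); // True
OnRequest(HttpMethod.Put, "/api/values/1");
Console.WriteLine(etagStore.ContainsKey("/api/values/1")); // False
```

This only works if every change goes through the API, which is exactly the assumption that breaks down in the legacy scenario described next.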

The problem is that an HTTP API commonly sits on top of a legacy system, acting as a data provider with no control over how resources are modified. In such a case the API is not notified of resource changes, and the application becomes responsible for removing cache metadata directly from the EntityTagStore in use, which is not always possible.

To address this, I am providing a way to define a time-based cache refresh policy using attributes, in a very similar fashion to the cache control policy; even the table above applies. Items are removed from the server's cache store on the first request after the refresh interval has passed, not immediately when the interval elapses. So we add the refresh policy provider:

cachingHandler.CacheRefreshPolicyProvider = new AttributeBasedCacheRefreshPolicy(TimeSpan.FromSeconds(5 * 60 * 60))
    .GetCacheRefreshPolicy;

Here we have defined a 5-hour refresh policy (5 * 60 * 60 seconds) as the default, which can be overridden with controller or action attributes.
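The lazy removal described above can be sketched as follows. The store is a hypothetical in-memory dictionary rather than CacheCow's EntityTagStore, and new ETags are random GUIDs. Nothing runs when the 5-hour interval elapses; the entry is only replaced by the first request that finds it too old. Times are passed in explicitly so the behaviour is easy to follow.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical store: resource URL -> (ETag, time it was stored).
var store = new Dictionary<string, (string ETag, DateTimeOffset StoredAt)>();
var refreshInterval = TimeSpan.FromSeconds(5 * 60 * 60); // 5 hours, as above

string GetOrCreateETag(string resourceUrl, DateTimeOffset now)
{
    // Still within the refresh interval: reuse the stored ETag.
    if (store.TryGetValue(resourceUrl, out var entry) && now - entry.StoredAt < refreshInterval)
    {
        return entry.ETag;
    }
    // Missing, or the first request after the interval: replace the entry.
    string etag = "\"" + Guid.NewGuid().ToString("N") + "\"";
    store[resourceUrl] = (etag, now);
    return etag;
}

var t0 = new DateTimeOffset(2013, 3, 18, 9, 0, 0, TimeSpan.Zero);
string first = GetOrCreateETag("/api/values", t0);
string sameLater = GetOrCreateETag("/api/values", t0.AddHours(4));
string afterRefresh = GetOrCreateETag("/api/values", t0.AddHours(6));
Console.WriteLine(first == sameLater);    // True
Console.WriteLine(first == afterRefresh); // False
```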

Future posts

As promised, the next few posts will be a CacheCow walk-through.

22 comments:

  1. Hi aliostad. I really appreciated your work. I was working on Asp.NET Web API Service, particularly on the cache part, and I found your implementation really helpful.

    I would be very grateful if you could help me with a question. I saw in the CachingHandler class that the eTag construction is based on a random Guid, but those eTags would be lost if you restart the service (if you don't use a db to store them). I'd like to know whether it would be possible to generate the eTag from the response payload, or whether there is a reason it is generated the way it is now.

    I would be very grateful if you could help me with this doubt.

    Congratulations on your nice work!

    Guilherme

    1. Yes, it really helped a lot! Thank you!

  2. Hi Guilherme,

    The default implementation of the eTag store is an in-memory one, but you could use persistent ones such as Memcached, SQL Server, Redis, etc. So your best bet is to use a persistent one so that the eTags are not lost. It seems that you have already noted that.

    I have been thinking of adding a feature to generate the eTag based on the content. It is not there yet and might take a while.

    Your best option is to do this:

    1- Define a custom header name such as x-cachecow-content-hash
    2- Pass it as one of the varyByHeaders when you initialise the CachingHandler
    3- Generate the content hash and add the header to the REQUEST (not the RESPONSE)
    4- Create a Func that reads the value of that header and set it as the value of ETagValueGenerator.

    Any problems let me know.
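For reference, the content hash from step 3 could be computed like this. ETagFromContent is a hypothetical helper, not a CacheCow API, and SHA-256 is an arbitrary choice of hash. Wiring it through varyByHeaders and ETagValueGenerator is not shown, since their exact signatures are not given here.

```csharp
using System;
using System.Net.Http.Headers;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derive a strong ETag from the payload bytes, so the
// same content yields the same ETag even after a restart.
static EntityTagHeaderValue ETagFromContent(byte[] payload)
{
    using var sha = SHA256.Create();
    byte[] hash = sha.ComputeHash(payload);
    // Hex-encode the hash and wrap it in quotes, as an entity-tag requires.
    string hex = BitConverter.ToString(hash).Replace("-", "");
    return new EntityTagHeaderValue("\"" + hex + "\"");
}

var a = ETagFromContent(Encoding.UTF8.GetBytes("[\"cache\",\"cow\"]"));
var b = ETagFromContent(Encoding.UTF8.GetBytes("[\"cache\",\"cow\"]"));
var c = ETagFromContent(Encoding.UTF8.GetBytes("[\"cache\",\"cow\",\"moo\"]"));
Console.WriteLine(a.Equals(b)); // True
Console.WriteLine(a.Equals(c)); // False
```

Because the ETag depends only on the bytes, it survives a restart; the cost, as discussed further down this thread, is that the server has to fetch the content from the backend to compute it.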



    1. Hi aliostad,

      Thanks for your fast reply. I will use the persistent cache with MongoDB for now.

      I didn't understand very well the part about putting the header on the REQUEST. Just to clarify, when talking about the server, shouldn't the header be put on the RESPONSE?

      Thanks again.

    2. OK, not to worry, I will implement this later. But I would say it is 10 times better to use an in-memory store and accept the drawback on restarts rather than generate the eTag from the content: with the former, the server goes to the backend only once to get the data and then sets the eTag. If you base it on content, you always have to go to the backend to get the data and generate the eTag, so you would actually lose most of the benefits of caching.

    3. As a matter of fact I was thinking of generating the eTags from the content and then storing them in memory.

      With that in mind, the first time the server goes to the backend to get a resource, it would store that eTag, and on subsequent requests to the same resource it would reuse the stored eTag. On PUT, PATCH, DELETE and POST requests for that resource, the server would invalidate the resource's eTag and regenerate it. Wouldn't that be possible?

      Anyway, your current implementation already fully satisfies my requirements. Thanks again!

    4. Sorry for asking, but what benefit does generating the eTag from the content provide over a GUID if we are not checking against the content in subsequent calls?

    5. The only benefit I could think of was that I could restart the server without the eTags being invalidated, so the eTags that clients have stored for the resources would still be valid.

      Anyway, that was just an idea and would have the same results as using the GUID as eTag and persisting it to a database.

    6. I see - you are right. Can you please create an issue on the GitHub? I need to make a new release, I will add this as part of it :)

    7. OK, nice! :)

      I was going to create the issue but I saw that you already created it.

    8. So, how does it work now?
      I'm facing the same problem.
      We have lots of images as static content that get updated via web deploy. So, the application gets restarted.
      Then we don't want the clients to have to re-fetch images if they haven't actually changed.

    9. Why don't you use static file handlers, which have built-in support for output caching?

    10. I can do that?
      Anyway, solved it with a delegating handler on top of the cachecow one:


      using System.Linq;
      using System.Net;
      using System.Net.Http;
      using System.Net.Http.Headers;
      using System.Threading;
      using System.Threading.Tasks;

      // Wraps the CacheCow CachingHandler: when the inner pipeline returns 200 OK
      // with an ETag the client already holds in If-None-Match, turn it into a 304.
      public class HandleFirstTimeEtagResultDelegatingHandler : DelegatingHandler
      {
          protected override async Task<HttpResponseMessage> SendAsync(
              HttpRequestMessage request, CancellationToken cancellationToken)
          {
              HttpResponseMessage response = await base.SendAsync(request, cancellationToken);
              if (response.StatusCode == HttpStatusCode.OK)
              {
                  HttpHeaderValueCollection<EntityTagHeaderValue> etagValuesFromRequest = request.Headers.IfNoneMatch;
                  EntityTagHeaderValue responseEtag = response.Headers.ETag;
                  if (responseEtag != null && etagValuesFromRequest.Any(i => responseEtag.Equals(i)))
                  {
                      response.StatusCode = HttpStatusCode.NotModified;
                      response.ReasonPhrase = "Not Modified";
                      // A 304 response must not carry a body.
                      response.Content?.Dispose();
                      response.Content = null;
                  }
              }
              return response;
          }
      }

  3. I am trying to bypass caching on the server for some routes in my Web API project. When using an attribute-based caching policy, I noticed from the HTTP responses that the policy works, but the SQL Server store is still updated with the cache state info. How can I prevent rows from being added to and updated in the CacheState table for resources I do not want cached? I attempted to use HttpCacheControlPolicy with the cacheControlHeaderValueFactory parameter, but it is not entirely clear to me how this should be implemented. Can you please shed some light on how I can go about addressing this issue? Thanks.

    1. Is this causing a problem? It is a known issue but if it is too much of a problem, raise an issue in Github and I will have a look to fix it.

  4. Hi. Going back to the point you make in the post about legacy systems changing state and this not being picked up by the handler: is it then OK (best practice) to manually remove the data from the store (SQL, Memcache, whatever)?

    1. Yes, that becomes a necessity. However, this becomes brittle since deeper layers now need to understand about more superficial layers (API Layer) and know their resources.

      One of the solutions is to generate the cache key from the resource which requires a call to the backend. I have a plan to support this scenario.

  5. Hi. Based on the information in this article, I added the following to my CachingHandler, created in my Application_Start (Global.asax.cs).

    var cachecow = new CachingHandler(GlobalConfiguration.Configuration);

    cachecow.CacheControlHeaderProvider = new AttributeBasedCacheControlPolicy(
        new CacheControlHeaderValue()
        {
            NoCache = true
        }).GetCacheControl;

    I now get the following error ,"exceptionMessage":"Multiple actions were found that match the request:...

    Any advice in troubleshooting/resolving would be appreciated. Thanks

    1. Can you please send me your routing and controllers?

    2. I had this problem too, because I had a BaseController with a public bool method and a public int? method. Changing them from public to protected fixed it.
