Byte Rot: CacheCow 2.0 is here - now supporting .NET Standard and ASP.NET Core MVC

CacheCow 2.0 Series:

Part 1 - CacheCow 2.0 is here - supporting .NET Standard and ASP.NET Core MVC [This post]
Part 2 - CacheCow.Client 2.0: HTTP Caching for your API calls
Part 3 - CacheCow.Server 2.0: Using it on ASP.NET Core MVC
Part 4 - CacheCow.Server for ASP.NET Web API [Coming Soon]
Epilogue: side-learnings from supporting Core [Coming]

So, no CacheCore in the end!

Yeah. I did announce last year that the new updated CacheCow will live under the name CacheCore. The more I worked on it, the more it became evident that only a tiny amount of CacheCow will ever be Core-related. And frankly trends come and go, while HTTP Caching is pretty much unchanged for the last 20 years.

So the name CacheCow lives on, although in the end what matters for a library is if it can solve any of your problems. I hope it will and carry on doing so. Now you can use CacheCow.Client with .NET 4.52+ and .NET Standard 2.0+. Also CacheCow.Server also supports both Web API and ASP.NET Core MVC - and possibly Nancy soon!

CacheCow 2.0 has lots of documentation and the project now has 3 sample projects covering both client and server sides in the same project.

CacheCow.Server has changed radically

The design for the server-side of the CacheCow 0.x and 1.x was based on the assumption that your API is a pure RESTful API and the data only changes through calling its endpoints so the API layer gets to see all changes to its underlying resources. The more I explored and over the years, this turned out to be a pretty big assumption in the end, and is realistic only in the REST La La Land - a big learning for me. And even if the case is true, the relationship between resources resulted in server-side cache directive management to be a mammoth task. For example in the familiar scenario of customer-product-orders, if an order changes, the cache for the collection of orders is now invalidated - hence the API needs to understand which resource is collection of which. What is more, change in customer could change the order data (depending on implementation of course, but just the take it for the sake of argument). So it meant that the API now has to know a lot more: single vs collection resources, relationship between resources... it was a slippery slope to a very bad place.

With removing that assumption, the responsibility now lies with the back-end stores which provide data for the resources - they will be queried by a couple of constructs added to CacheCow.Server. If you opt to implement that part for your API, then you have a supper-efficient API. If not, there are some defaults there to do the work for you - although super-optimal. All of this will be explained in the CacheCow.Server posts, but the point is CacheCow.Server is now a clean abstraction for HTTP Caching, as clean as I could make it. Judge for yourself.

What is HTTP Caching?

Caching is a very familiar notion in programming and pretty much every developer uses it on a regular basis. This familiarity has a downside to it since HTTP Caching is more complex and in many ways different to the routing caching in code - hence it is very common to see misunderstandings even amongst senior developers. If you ask an average developer this question: "In HTTP Caching, where the cache data gets stored?" it is probably more likely to hear the wrong answer "server" than the correct answer "client". In fact, many developers are looking for to improve their server-side code's performance by turning on the caching on the server, while if the callers ignore the caching directives it will not result in any benefit.

This reminds me of a blog post I wrote 6 years ago where I used HTTP Caching as an example of mixed-concern (as opposed to server-concern or client-concern) where "For HTTP caching to work, client and server need to work in tandem". This a key difference with the usual caching scenarios seen everyday. What makes HTTP Caching even more complex is the concurrency primitives, built-in starting with HTTP 1.1 - we will look into those below.

I know HTTP Caching is hardly new and has been explained many times before. But considering number of times I have seen being completely misunderstood, I think it deserves your 5-10 minutes - even though as refresher.

Resources vs. Representations

REST advocates exposing services through a uniform API (where HTTP is one such implementation) allowing resources to be created, modified and queried using the API. A resource is addressed by its location identifier or URL (e.g. /api/car/123). When a client requests a resource, only a representation of the resource is sent back. This means that the client receives only a representation out of many possible representations. This also would mean that when the client caches the representation, this representation is only valid if the the representation requested matches the one cached. And finally, a client might cache different representations of the same resource. But what does all of this mean?

HTTP GET - The server serving a representation of the resource. Server also send cache directives.

A resource could be represented differently in terms of format, encoding, language and other presentation concerns. HTTP provides semantic for the client to express its preferences in such concerns with headers such as Accept, Accept-Language and Accept-Encoding. There could be other headers that can result in alternative representations. The server is responsible for returning the definitive list of such headers in the Vary header.

Cache Directives

Server is responsible for returning cache directives along with the representation. Cache-Control header is the de-factor cache directive defining whether the representation can be cached, for how long, whether by the end client or also by the HTTP intermediaries/proxies, etc. HTTP 1.0 had the simple Expires header which only defined absolute expire time of the representation.

You could also think of other cache-related headers as cache directives (although purely speaking they are not) such as ETag, Last-Modified and Vary.

Resource Version Identifiers (Validators)

HTTP 1.1 defines ETag as an opaque identifier which defines the version of the resource. ETag (or EntityTag) can be strong or weak. Normally a strong ETag identifies version of the representation while a weak ETag is only at the resource level.

Last-Modified header was the main validator in HTTP 1.0 but since it is based on a date with up-to-a-second precision, it is not suitable for achieving high consistency since a resource could change multiple times in a second.

CacheCow supports both validators (ETag and Last-Modified) and combines these two notions in the construct TimedETag.

Validating (conditional) HTTP Calls

A GET call can request the server for the resource with the condition that the resource has been modified with respect to its validator. In this case, the client sends ETag(s) in the If-None-Match header or Last-Modified date in the If-Modified-Since header. If validation matches and no change was made, the server returns status 304 otherwise the resource is sent back.

For a PUT (and DELETE) call, the client sends validators in If-Match or If-Unmodified-Since. The server performs the action if validation matches otherwise status 412 is sent back.

Consistency

The client normally caches representations longer than the expiry and after the expiry it resorts to validating calls and if they succeed it can carry on using the representations.

In fact the sever can return representations with immediate expiry forcing the client to validate every time before using the cache resource. This scenario can be called High-Consistency caching since it ensures the client always uses the most recent version.

Is HTTP Caching suitable for my scenario?

Consider using HTTP Caching if:

Both your client and server are cache-aware. The client either is a browser which is the ultimate HTTP machine well capable of handling cache directives or a client that understands caching such as HttpClient + CacheCow.Client.
You need a High-Consistency caching and you cannot afford clients to use outdated data
Saving on network bandwidth is important

HTTP Caching is unsuitable for you if:

Your client does not understand/implement HTTP caching
The server is unable to provide cache directives

In the next post, we will look into CacheCow.Client.

Byte Rot

Sunday, 13 May 2018

CacheCow 2.0 is here - now supporting .NET Standard and ASP.NET Core MVC