Friday, 14 April 2017

Future of CacheCow and the birth of CacheCore

CacheCow is my most popular OSS project, which came into being back in 2012. It started its life as part of WebApiContrib but the need for a full-fledged project supporting different storage options soon led me to create CacheCow - which needed both client and server components.

With the availability of .NET Core, which brings a completely new HTTP pipeline, the question has been when and how CacheCow will move to .NET Core. On the client side, HttpClient is still the king of making HTTP requests, meaning CacheCow.Client will work when the long-awaited .NET Standard 2.0 comes along, allowing us to reference older .NET libraries. On the server side, however, it is clear that CacheCow.Server has no place since the pipeline has changed in its entirety. So what should we do? Create a completely new project for both client side and server side, or maintain CacheCow.Client (while migrating it to .NET Standard to support the new .NET) and create a new project for the server side?

I have been thinking hard about this and for the reasons I will explain, I will be creating a completely new project called CacheCore (other contenders were Cache-vNext, CacheDnx and also recently CacheStandard) which will contain both client and server elements.

If you would like to know the details (REST discussions, lessons learned including some gory confessions below) read the rest, otherwise feel free to watch the space as things will start to happen.

I am under no illusion that this will require quite some effort. Apart from the learning curve and niggles with tooling, I find the most frustrating aspect to be trying to google anything Core related: the internet is now full of discarded evolutionary artifacts in the form of blogs, Stack Overflow questions, tutorials and even MSDN documentation - each documenting the journey, not the current state. If you think that is a small issue, ask anyone picking up .NET Core for the first time. I wish we had one giant flush and could send all of that to /dev/null, wiping the history clean - never mind all those many, many hours lost. OK, rant over - promise.

CacheCore.Server

As I said, I have confessions to make and one is just coming. When I designed the server components of CacheCow as an API middleware, the idea was that they would be used for services that are purely RESTful, in the sense that every change to the state of a resource would go through the API. Initially there does not seem to be anything extraordinary about this, but I gradually learnt that cache coherency is a very big responsibility for a mere middleware to take on.

First of all, there are many services out there where the underlying data could change without a request passing through the API. What is worse is that even if all state changes go via API calls, a change to one resource could invalidate other resources. For example, a PUT request to /cars/123 will invalidate /cars/123, which is fine, but what about ‘/cars’? So I started thinking about resources in terms of collection and instance, and CacheCow.Server started to infer collection and instance resources based on a convention - hence I introduced the Route Pattern concept so the application could configure the cache invalidation; here the route pattern would be /cars/*.

But the problem did not stop there. A change to /cars/123/contracts/456 could invalidate all these URLs: /cars/123/contracts, /cars/123 and possibly /cars - hence CacheCow now needs to walk up the tree and invalidate all those resources. And now to the next level of headaches: a POST /orders/1234 could invalidate /customer/987, yet there is no apparent connection between the two unless the application tells us - which made me introduce the concept of Linked Route Patterns so the application could configure these relationships. Configuring them was of course a pain, and frankly I think nobody except me and a handful of other people quite got what I was on about.
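To make the "walk up the tree" idea concrete, here is a minimal, language-neutral sketch in Python. It is not CacheCow's actual implementation: the function names are hypothetical, and the linked patterns are modelled as a plain dictionary from a glob-style pattern to a fixed list of linked paths, whereas a real configuration would have to derive the linked resource (which customer?) from the request.

```python
from fnmatch import fnmatchcase


def ancestor_paths(path):
    """Return every ancestor of a resource path, nearest first."""
    segments = [s for s in path.split("/") if s]
    # Drop one trailing segment at a time; stop before reaching the root.
    return ["/" + "/".join(segments[:i]) for i in range(len(segments) - 1, 0, -1)]


def paths_to_invalidate(path, linked_patterns):
    """Collect the changed path, its ancestors and any configured linked paths.

    linked_patterns is a hypothetical mapping such as
    {"/orders/*": ["/customer/987"]}.
    """
    result = [path] + ancestor_paths(path)
    for pattern, linked in linked_patterns.items():
        if fnmatchcase(path, pattern):
            result.extend(p for p in linked if p not in result)
    return result


if __name__ == "__main__":
    print(ancestor_paths("/cars/123/contracts/456"))
    print(paths_to_invalidate("/orders/1234", {"/orders/*": ["/customer/987"]}))
```

Even in this toy form the burden is visible: the middleware can work out ancestors on its own, but every cross-resource relationship has to be spelled out by the application.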

Now, I believe it is too much of a responsibility for an HTTP middleware to do cache coherency. As such, CacheCore.Server will be a lot simpler: no more entity tag storage; the application will decide whether to use ETag or LastModifiedDate for cache coherency and will be responsible for providing these values - although I will provide some helpers. One key difference in this implementation will be a set of tools fitting different scenarios rather than a single HTTP caching god-class.

To explain this aspect further, HTTP caching is a spectrum of primitives that help you build more scalable (caching) and consistent (concurrency) systems - some of which are basic and used by many, while others have remained obscure and seldom used. Caching and expiry on resources are better known while from my experience, conditional PUT to achieve optimistic concurrency is rarely used - even conditional GET is rarely used by HTTP clients other than browsers. As such, CacheCore will come with three filters starting from the most basic to the most advanced:
  • BasicCacheFilter: This is the simplest filter, which covers returning Cache-Control headers according to the expiry configuration, reading the ETag or LastModified from the returned model (or inferring them by using reflection) and handling conditional GET for you. As long as you have a property called ETag or LastModified (or LastModifiedDate, etc) on the model you return from your API, this will work. With this filter, conditional GETs do not take any pressure off your “database”: API calls will still result in the data being retrieved so that the filter can find the ETag or LastModified and respond to conditional GET requests accordingly.
  • LookupCacheFilter: This filter improves on the BasicCacheFilter by allowing the application to provide a callback for looking up the ETag or LastModified without having to load the full model. Caching almost always gets used on resources where the operation is expensive in either IO or computation, and this approach lets you replace loading the full model with a light-weight lookup call. For example, let’s say the resource is /cars/123, you keep a LastModifiedDate in your cars database and you use a hash of the LastModifiedDate as the ETag (you could use LastModifiedDate itself to do cache validation on the date, but an HTTP date’s precision is sadly only down to a second, which might not be enough for you). In this case, the filter will ask the application for the ETag or LastModified of the resource, and you can call your database and read that value for car:id=123 without loading the whole car - which is going to be a lighter database call. So this filter will do everything BasicCacheFilter does (and in a more efficient way) and will even do conditional PUT for you. What is the problem with this one? Consistency: for conditional PUT, validation is not atomic, e.g. you look up the ETag, find the condition is met and proceed to update, but the data could have changed between the lookup and the update (the same could also apply to conditional GET, but with less serious impact). This is not a problem for everyone, hence I think this filter hits the sweet spot for simplicity and effectiveness.
  • StrongConsistencyCacheFilter: This is basically the same as above but maintains airtight consistency by allowing the application to implement atomic conditional GET and PUT - which means the application has to do more.
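The lookup idea can be sketched as follows, in Python for neutrality. This is not the CacheCore API (which does not exist yet): the function names, the callback shapes and the choice of SHA-256 are all assumptions made for illustration. The sketch only handles strong ETags in If-None-Match and ignores the `*` and weak (`W/`) forms.

```python
import hashlib
from datetime import datetime, timezone


def etag_from_last_modified(last_modified):
    """Derive an ETag by hashing the full-precision timestamp (assumed scheme)."""
    digest = hashlib.sha256(last_modified.isoformat().encode("utf-8")).hexdigest()
    return '"' + digest[:16] + '"'  # ETag values are quoted strings


def conditional_get(if_none_match, lookup_etag, load_model):
    """Return (status, etag, body) for a GET.

    lookup_etag is the cheap callback (e.g. read one column);
    load_model is the expensive call and only runs on a cache miss.
    """
    current = lookup_etag()
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if current in candidates:
            return 304, current, None  # Not Modified: model never loaded
    return 200, current, load_model()


if __name__ == "__main__":
    modified = datetime(2017, 4, 14, 12, 0, 0, 123456, tzinfo=timezone.utc)
    tag = etag_from_last_modified(modified)
    print(conditional_get(tag, lambda: tag, lambda: {"id": 123}))
    print(conditional_get('"stale"', lambda: tag, lambda: {"id": 123}))
```

Because the hash covers microseconds, two updates within the same second still produce different ETags, which is the reason for preferring a hashed ETag over a date-based validator here.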
I have plans for these to be GET or PUT specific since actions are usually designed as such.
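For the PUT side, the difference between a lookup-then-update and an atomic conditional update can be sketched like this. The in-memory store and its method names are hypothetical; in a real application the same compare-and-set would typically be pushed down into the data store, for example as an update that only succeeds when the stored ETag still matches.

```python
import threading


class InMemoryStore:
    """Hypothetical store used only to illustrate atomic compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> (etag, value)

    def put(self, key, etag, value):
        with self._lock:
            self._rows[key] = (etag, value)

    def get_etag(self, key):
        row = self._rows.get(key)
        return row[0] if row else None

    def put_if_match(self, key, expected_etag, new_etag, value):
        """Check the ETag and write under one lock, so nothing can slip in between."""
        with self._lock:
            row = self._rows.get(key)
            if row is None or row[0] != expected_etag:
                return False
            self._rows[key] = (new_etag, value)
            return True


def conditional_put(store, key, if_match, new_etag, value):
    """Return 204 when the precondition held, 412 Precondition Failed otherwise."""
    return 204 if store.put_if_match(key, if_match, new_etag, value) else 412


if __name__ == "__main__":
    store = InMemoryStore()
    store.put("/cars/123", '"v1"', {"colour": "red"})
    print(conditional_put(store, "/cars/123", '"v1"', '"v2"', {"colour": "blue"}))
    print(conditional_put(store, "/cars/123", '"v1"', '"v3"', {"colour": "green"}))
```

A lookup-style filter would instead call something like get_etag, compare, and then write in a separate step, which is exactly the window in which another writer can change the data.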
Now you might ask, why is CacheCore a filter and not a middleware? If you remember, CacheCow.Server was a DelegatingHandler (akin to an ASP.NET Core middleware). Well, here is another lesson learnt: caching is a highly localised concern, and it is a mistake to implement it as a global HTTP intermediary.

CacheCore.Client

Considering the client story in .NET Core for HTTP has not been drastically changed, it is fair to assume CacheCow.Client can still be used.

That is true; however, there are a few reasons I would like to start afresh. First of all, CacheCow was conceived, and the bulk of its codebase written, when .NET did not yet have the await keyword. This resulted in a .ContinueWith() soup which was hard to read and difficult to maintain. On top of that, some interfaces supported async while others did not, which broke the async-all-the-way rule. Also, I had in mind for the storage to be clever about how much space it uses per site and to implement LRU, while many underlying storages did not provide the primitives to do so - and frankly, in these 5 years I have never needed it.

I think it is time to get rid of these shortcomings hence there will be a new client project too.

Future of CacheCow.Server and CacheCow.Client

It would be naive to think everyone will move to .NET Core straightaway. In fact, with .NET Standard 2.0, Microsoft has shown it has realised there needs to be better interoperability between the classic .NET and .NET Core. Apart from interoperability, I think people will carry on using and building classic .NET APIs for another few years.

For these reasons, I will carry on supporting CacheCow and releasing bug fixes, etc. Thanks for helping it improve by using it, reporting issues and sending bug fixes.

Tuesday, 31 January 2017

Announcing Zipkin Collector for Azure EventHub

If you are reading this, you have probably heard of Zipkin. If not, please take my word for it and leave this post to spend 10 minutes reading up on it - a very worthwhile 10 minutes which will introduce you to one of the best, yet simplest, distributed tracing systems. In a word, it tells you where most of the time serving requests has been spent, helping you to optimise your Microservice architecture.

Zipkin, used by the likes of Twitter and Netflix, has already created a complete storm in the Java/JVM ecosystem, but many of us in the .NET community have not heard of it - and that is frankly a real pity. And if you have heard of it and want to use it, yes, of course we could try to port the whole system over to .NET, but that would be a huge amount of work and frankly a waste, since Zipkin is designed to work across different stacks as long as you can somehow get your data over to it. The data is normally pushed to Kafka, and Zipkin consumes messages from Kafka through a component called the Collector. Data then gets stored in a storage (currently available for MySQL, Cassandra or Elasticsearch) and is served by the UI.

Of course nothing stops you from running Kafka in your cloud or on-premise environment, but if you have never done it, ZooKeeper (a consensus service required for running Kafka) is, to say the least, not the easiest service to operate. And frankly, if you are on Azure it makes a lot of sense to use EventHub, an Azure PaaS service with functionality very similar to Kafka. Sadly there was no collector for it.

I have been very keen to bring Zipkin to ASOS, but could not quite justify running ZK and Kafka, even to myself. Hence I felt something had to be done about it. The only problem: I had never done a Java/Maven project before.

*     *     *

I have been doing what I have been doing - being a professional developer - for some time now. And I have had my ups and downs, both moments that I am proud of and moments of embarrassment because I have messed up. But never have I just picked up a completely different stack and built something like what I am going to share within a couple of weeks. [Yeah, I am talking about Zipkin Collector for Azure EventHub]



This really has been a testament to how pluggable and nicely designed Zipkin is, and above all it has a truly amazing community - championed by Adrian Cole. Help was always around the corner, be it on hardcore stuff such as how to modularise the collector or my noob problems with Maven.

Not to forget, too, that the Azure EventHub SDK made it completely trivial to implement a working solution. All the heavy lifting has been done by the EventProcessorHost, so all that is left is a bit of plumbing to get the configuration over to these components.

*     *     *

How to use EventHub Collector

So the idea is that you would run zipkin-server (which hosts the Zipkin UI) and in the same process you run your collector. Zipkin uses Spring Boot's auto-configuration mechanism to load the right collector based on the configuration provided. The project is hosted on GitHub. [UPDATE: Project has moved to OpenZipkin organisation here]

EventHub Collector gets triggered by the existence of the "zipkin.collector.eventhub.eventHubConnectionString" configuration passed via the command line. The rest of the necessary configuration can be passed in an application.properties or application.yaml file.

So to run the EventHub collector you need:

1- zipkin.jar (zipkin-server)
2- application.properties file
3- zipkin-collector-eventhub-autoconfig module jar (which contains transitive dependencies too). This jar is not on maven yet

So in order to run:

1- Clone the source and build

git clone git@github.com:aliostad/zipkin-collector-eventhub.git
cd zipkin-collector-eventhub
mvn package

If you do not have maven, get maven here.

2- Unpack the MODULE jar into an empty folder

Copy zipkin-collector-eventhub-autoconfig-x.x.x-SNAPSHOT-module.jar (which has been packaged into the target folder) into an empty folder and unpack it:

jar xf zipkin-collector-eventhub-autoconfig-0.1.0-SNAPSHOT-module.jar

You may then delete the jar itself.

3- Download zipkin-server jar


Download the latest zipkin-server jar (which is named zipkin.jar) from here. For more information visit zipkin-server homepage.

4- create an application.properties file for configuration next to the zipkin.jar file

Populate the configuration - make sure the resources (Azure Storage, EventHub, etc) exist. Only storageConnectionString is mandatory; the rest are optional and should be used only to override the defaults:

zipkin.collector.eventhub.storageConnectionString=<azure storage connection string>
zipkin.collector.eventhub.eventHubName=<name of the eventhub, default is zipkin>
zipkin.collector.eventhub.consumerGroupName=<name of the consumer group, default is $Default>
zipkin.collector.eventhub.storageContainerName=<name of the storage container, default is zipkin>
zipkin.collector.eventhub.processorHostName=<name of the processor host, default is a randomly generated GUID>
zipkin.collector.eventhub.storageBlobPrefix=<the path within container where blobs are created for partition lease, processorHostName>


5- Run the server along with the collector

Assuming zipkin.jar and application.properties are in the current working directory, run this from the command line (note that the connection string to the eventhub itself is passed in the command line):

java -Dloader.path=/where/jar/was/unpackaged -cp zipkin.jar org.springframework.boot.loader.PropertiesLauncher --spring.config.location=application.properties --zipkin.collector.eventhub.eventHubConnectionString="<eventhub connection string, make sure quoted otherwise won't work>"


After running, Spring Boot and the rest of the stack get loaded, and then you should be able to see some INFO output from the collector listing the configuration you have passed.


You should be up and running and can start pushing spans to your EventHub.

Span serialisation guideline

EventHub Collector expects spans serialised as a JSON array of spans. The payload gets read as a UTF-8 string and gets deserialised by the zipkin-server components.
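As a rough illustration of that payload shape, the following Python sketch builds one span and serialises a list of spans to UTF-8 JSON bytes. The field names (traceId, id, name, timestamp, duration, annotations, binaryAnnotations) follow Zipkin's v1 JSON span model as I understand it and should be checked against the Zipkin API documentation; the helper names and sample values are made up. Sending the bytes to EventHub is deliberately left out.

```python
import json
import time


def make_span(trace_id, span_id, name, service_name, duration_us):
    """Build a server-side span as a plain dict (assumed v1 field names)."""
    start_us = int(time.time() * 1_000_000)  # Zipkin timestamps are in microseconds
    endpoint = {"serviceName": service_name}
    return {
        "traceId": trace_id,  # lower-hex string
        "id": span_id,        # lower-hex string
        "name": name,
        "timestamp": start_us,
        "duration": duration_us,
        "annotations": [
            {"timestamp": start_us, "value": "sr", "endpoint": endpoint},
            {"timestamp": start_us + duration_us, "value": "ss", "endpoint": endpoint},
        ],
        "binaryAnnotations": [],
    }


def serialise_spans(spans):
    """Serialise a list of spans as a JSON array encoded as UTF-8 bytes."""
    return json.dumps(spans).encode("utf-8")


if __name__ == "__main__":
    span = make_span("463ac35c9f6413ad", "463ac35c9f6413ad", "get /cars", "cars-api", 1500)
    print(serialise_spans([span]).decode("utf-8"))
```

Note that even a single span has to be wrapped in an array, since the collector expects a JSON array per message.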

Roadmap

The next step is to get the jar on to Maven Central. I will also start working on a .NET library to make building spans easier.