Wednesday 27 May 2015

PerfIt! decoupled from Web API: measure down to a closure in your .NET application

Level [T2]

Performance monitoring is an essential part of doing any serious-scale software. Unfortunately in .NET ecosystem, historically first looking for direction and tooling from Microsoft, there has been a real lack of good tooling - for some reason or another effective monitoring has not been a priority for Microsoft although this could be changing now. Healthy growth of .NET Open Source community in the last few years brought a few innovations in this space (Glimpse being one) but they focused on solving development problems rather than application telemetry.

2 years ago, while trying to build and deploy large scale APIs, I was unable to find anything suitable to save me having to write a lot of boilerplate code to add performance counters to my applications so I coded a working prototype of performance counters for ASP .NET Web API and open sourced and shared it on Github, calling it PerfIt! for the lack of a better name. Over the last few years PerfIt! has been deployed to production in a good number of companies running .NET. I added the client support too to measure calls made by HttpClient and it was a handy addition.
From Flickr

This is all not bad but in reality, REST API calls do not cover all your outgoing or incoming server communications (which you naturally would like to measure): you need to communicate to databases (relational or NoSQL), caches (e.g. Redis), Blob Storages, and many other. On top of that, there could be some other parts of your code that you would like to measure such as CPU intensive algorithms, reading or writing large local files, running Machine Learning classifiers, etc. Of course, PerfIt! in this current incarnation cannot help with any of those cases.

It turned out with a little change and separating performance monitoring from Web API semantic (which is changing with vNext again) this can be done. Actually, not getting much credit for it, it was mainly ideas from two of my best colleagues which I am grateful for their contribution: Andres Del Rio and JaiGanesh Sundaravel.

New PerfIt! features (and limitations)

So currently at version alpha2, you can get the new PerfIt! by using nuget (when it works):
PM> install-package PerfIt -pre
Here are the extra features that you get from the new PerfIt!.

Measure metrics for a closure

So at the lowest level of an aspect abstraction, you might be interested in measuring metrics for a closure, for example:
Action action = Thread.Sleep(1000);
action(); // measure
Or in case of an async operation:
foo result = null;
Func<Task> asyncCall = async () => result = await _command.ExecuteScalar();

// and then
await asyncCall();
This closure could be wrapped in a method of course, but there again, having a unified closure interface is essential in building a common tool: each method can have different inputs of outputs while all can be presented in a closure having the same interface.

Thames Barriers Closure - Flickr. Sorry couldn't find a more related picture, but enjoy all the same
So in order to measure metrics for the action closure, all we need to do is:
var ins = new SimpleInstrumentor(new InstrumentationInfo() 
   Counters = CounterTypes.StandardCounters, 
   Description = "test", 
   InstanceName = "Test instance" 

ins.Instrument(() => Thread.Sleep(100));

A few things here:
  • SimpleInstrumentor is responsible for providing a hook to instrument your closures. 
  • InstrumentationInfo contains the metadata for publishing the performance counters. You provide the name of the counters to raise to it (provided if they are not standard, you have already defined )
  • You will be more likely to create a single instrumentor instance for each aspect of your code that you would like to instrument.
  • This example assumes the counters and their category are installed. PerfitRuntime class provides mechanism to register your counters on the box - which is covered in previous posts.
  • Instrument method has an option to pass the context as a string parameter. This context can be used to correlate metrics with application context in ETW events (see below).

Doing an async operation is not that different:
ins.InstrumentAsync(async () => await Task.Delay(100));

//or even simpler:
ins.InstrumentAsync(() => Task.Delay(100))

SimpleInstrumentor is the building block for higher level abstractions of instrumentation. For example, PerfitClientDelegatingHandler now uses SimpleInstrumentor behind the scene.

Raise ETW events, effortlessly

Event Tracing for Windows (ETW) is a low overhead framework for logging, instrumentation, tracing and monitoring that has been in Windows since version 2000. Version 4.5 of the .NET Framework exposes this feature in the class EventSource. Probably suffice to say, if you are not using ETW you are doing it wrong.

One problem with Performance Counters is that they use sampling, rather than events. This is all well and good but lacks the resolution you sometimes need to find problems. For example, if 1% of calls take > 2 seconds, you need on average 100 samples and if you are unlucky a lot more to see the spike.

Another problem is lack of context with the measurements. When you see such a high response, there is really no way to find out what was the context (e.g. customerId) for which it took wrong. This makes finding performance bottlenecks more difficult.

So SimpleInstrumentor, in addition to doing counters for you, raises InstrumentationEventSource ETW events. Of course, you can turn it off or just leave it as it has almost no impact. But so much better, is that use a sink (Table Storage, ElasticSearch, etc) and persist these events to a store and then analyse using something like ElasticSearch and Kibana - as we do it in ASOS. Here is a console log sink, subscribed to these events:
var listener = ConsoleLog.CreateListener();
listener.EnableEvents(InstrumentationEventSource.Instance, EventLevel.LogAlways,
And you would see:

Obviously this might not look very impressive but when you take into account that you have the timeTakenMilli (here 102ms) and have the option to pass instrumentationContext string (here "test..."), you could correlate performance with the context of in your application.

PerfIt for Web API is all there just in a different nuget package

If you have been using previous versions of PerfIt, do not panic! We are not going to move the cheese, so the client and server delegating handlers are all there only in a different package, so you just need to install Perfit.WebApi package:
PM> install-package PerfIt.WebApi -pre
The rest is just the same.

Only .NET 4.5 or higher

After spending a lot of time writing async code in CacheCow which was .NET 4.0, I do not think anyone should be subjected to such torture, so my apologies to those using .NET 4.0 but I had to move PerfIt! to .NET 4.5. Sorry .NET 4.0 users.

PerfIt for MVC, Windsor Castle interceptors and more

Yeah, there is more coming. PerfIt for MVC has been long asked by the community and castle interceptors can simply remove all cross cutting concern codes out of your core business code. Stay tuned and please provide feedback before going fully to v1!

Sunday 10 May 2015

Machine Learning and APIs: introducing Mills in REST API Design

Level [C3]

REST (REpresentational State Transfer) was designed with the "state" at its heart, literally, standing for the S in the middle of the acronym.

TL;DR: Mill is a special type of resource where server's authority purely comes from exposing an algorithm, rather than "defining, exposing and maintaining integrity of a state". Unlike an RPC style endpoint, it has to adhere to a set of 5 constraints (see below). 

Historically, when there were only a few thousand servers around, the state was predominantly documents. People were creating, editing and sharing a lot of text documents, and some HTML. With HTTP 1.1, caching and concurrency was built into the protocol and enabled it to represent richer distributed computing concerns and we have been building . With the rising popularity of REST over the last 10 years, much of today's web has been built on top of RESTful thinking, whether what is visible or what is behind the presentation (most external layer) servers. Nowadays when we talk of state, we normally mean data or rather records persisted in a data store (relational or non-relational). A lot of today's data, directly or indirectly, is created, updated and deleted using REST APIs. And this is all cool, of course.

When we design APIs, we map the state into REST Resources. It is very intuitive to think of resources as collection and instance. It is unambiguous and useful to communicate these concepts when for example we refer to /books and /books/123 URLs, as the collection or instance resources, respectively. We interact with these resources using verbs, and although HTTP verbs are not meant to be used just for CRUD operations, interacting with the state that exists on the server is inherent in the design.

But that is not all the story. Mainstream adoption of Machine Learning in the industry means we need to expose Machine Learning applications using APIs. The problem is the resource oriented approach of REST (where the state is at the heart of the design) does not work very well.

By the way, I am NOT 51... is an example of a Machine Learning application where instead of being an application, it could have been an API. For example (just for illustration, you could use other media types too):

POST /age_gender_classifier HTTP/1.1
Content-Type: image/jpeg
And the response:
200 OK
Content-Type: application/json


Server is generating a response to the request by carrying out complex face recognition and running a model, most likely a deep network model. Server is not returning a state stored on the server, in fact this whole process is completely stateless.

And why does this matter? Well I feel if REST is supposed to move forward with our needs and use cases, it should define, clarify, internalise and finally digest edge cases. While such edge cases were pretty uncommon, with the rise and popularity of Machine Learning, such endpoints will be pretty standard.

A few days ago, on the second day of APIdays Mediterranea 2015 conference, I presented a talk on Machine Learning and APIs. And in this talk I presented simple concept of Mills. Mills, where you take your wheat to be ground and you carry back the flour.

Basically, it all goes back to the origin of a server's authority. To bring an example, a "Customer Profile" service, exposed by a REST API, is the authority to go to when another service requires access to a customer's profile. The "Customer Profile" service has defined a state, which is profile of the customers, and is responsible for ensuring integrity of the state (enforcing business rules on the state). For example, marketing email preference can have values of None, WeeklyDigest or All, it should not allow the value to be set to MonthlyDigest. We are quite used to these type of services and building REST APIs on top: CustomerProfile becomes a resource that we can query or interact with.

On the other hand, a server's authority could be exposing an algorithm. For example, tokenisation of text is a non-trivial problem that requires not only breaking the text to its words, but also maintaining muti-words and named entities intact. A REST API that exposes this functionality will be a mill.

5 constraints of a Mill

1) It encapsulates an algorithm not a state

Which was discussed ad nauseum, however, the distinction is very important. For example let's say we have an algorithm that you provide the postcode and it returns to you houses within 1 mile radius of that postcode - this is not an example of a mill.

2) Raw data in, processed data out

For example you send your text and get back the translation.

3) Calls are both safe and idempotent

Calling the endpoint should not directly change any state within the server. For example, the endpoint should not be directly mapped to the ML training engine, e.g. sending a text 1000 times skew the trained model for that text. The training endpoint is usually a normal resource, not a mill - see my slides

4) It has a single specialty

And as such, it accepts a single HTTP verb apart from OPTIONS, normally POST (although a GET with entity payload would be more semantically correct but not my preferred option for practical reasons).

5) It is named not as a verb but as a tool

A mill that exposes tokenisation, is to be called tokeniser. In a similar way, classifier would be the appropriate name for a system that classifies on top of a neural network, for example. Or normalising text, would have a normaliser mill.

No this is not the same as an RPC endpoint. No RPC smell. Honest :) That is why those 5 constraints exists.