Thursday, 9 July 2015

Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus

In my last blog post, I outlined a few interesting results from a word2vec model trained on half a million news documents. This was met with some positive reactions, some of them due not so much to the scientific rigour of the report as to the awareness-raising effect of such a "populist treatment of the subject" on the community. On the other hand, there were more than a few negative reactions. Some believed I was "cherry-picking" and reporting only a handful of interesting results out of an ocean of mediocre performances. Others rejected my claim that training on a small dataset in any language can produce very encouraging results. And yet others literally threatened me so that I would release the code, despite my reiterating that the code is small and not the point.

Am I the only one here thinking word2vec is freaking awesome?!

So I am back. And this time I have trained the model on a very small corpus of Rock artists obtained from Wikipedia, as part of my Rock History project. And I have built an API on top of the model so that you could play with the model and try out different combinations to your heart's content - [but please be easy on the API it is a small instance only] :) strictly no bots. And that's not all: I am releasing the code and the dataset (which is only 36K Wiki entries).

But now, my turn to RANT for a few paragraphs.

First of all, quantification of the performance of an unsupervised learning algo in a highly subjective field is very hard, time-consuming and potentially non-repeatable. Google in their latest paper on seq2seq had to resort to reporting mainly man-machine conversations. I feel in these subjects crowdsourcing the quantification is probably the best approach. Hence you would help by giving a rough accuracy score according to your experience.

On the other hand, sorry, those who were expecting to see a formal paper - perhaps in LaTeX format - you completely missed the point. As others said, there are plenty of hardcore papers out there, feel free to knock yourselves out. My point was to evangelise to a much wider audience. And, if you liked what you saw, go and try it for yourself.

Finally, alluding to "cognition" raised a lot of eyebrows, but as Nando de Freitas puts it when asked about intelligence, whenever we build an intelligent machine, we will look at it as bogus, not containing the "real intelligence", and we will discard it as not AI. So the world of Artificial Intelligence is a world of moving targets, essentially because intelligence has been very difficult to define.

For me, word2vec is a breath of fresh air in a world of arbitrary, highly engineered and complex NLP algorithms: it can bridge the gap by forming meaningful relationships between the tokens of your corpus. And I feel it is more a tool for enhancing other algorithms than the end product. But even on its own, it generates fascinating results. For example, in this tiny corpus, it was not only able to find matches between the names of the artists, but it could also find matches between similar bands - so it can be used as a Recommender system. And then, even adding the vectors of artists generates interesting fusion genres which tend to correspond to real bands influenced by them.


BEWARE: Tokens are case-sensitive. So u2 and U2 are not the same.

The API is basically a simple RESTful flask on top of the model:
where pos and neg are comma-separated lists of zero to many 'phrases' (pos for similar, and neg for opposite) - that are English words, or multi-word tokens including names of bands or phrases that have a Wiki entry (such as albums or songs) - a list of which can be found here.
For example:

You can add vectors of words, for example to mix genres:
or add an artist with an adjective for example a softer Bob Dylan:
Or subtract:
But the tokens do not have to be a band name or artist names:
If you pass a non-existent or misspelt name or word (it is case-sensitive!), you will get an error:

  result: "Not in vocab: radiohead"

You may pass the minimum frequency of the word in the corpus to filter the output and remove the noise:
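For readers who want to see the shape of such a query in code, here is a rough sketch of how the pos and neg parameters could be parsed and checked against the vocabulary before the model is queried. This is not the code behind the API: the function names, the min_freq parameter name and the toy vocabulary are all assumptions made for illustration.

```python
def parse_phrases(raw):
    """Split a comma-separated parameter into phrases, dropping empty items."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def handle_query(vocab_counts, pos="", neg="", min_freq=0):
    """Validate a query against a vocabulary (token -> corpus frequency).

    Returns an error payload for the first unknown token, otherwise the
    parsed query that would be handed to the model. Hypothetical helper.
    """
    positive = parse_phrases(pos)
    negative = parse_phrases(neg)
    for token in positive + negative:
        # Lookup is case-sensitive, so "u2" and "U2" are different tokens.
        if token not in vocab_counts:
            return {"result": "Not in vocab: " + token}
    return {"positive": positive, "negative": negative, "min_freq": int(min_freq)}


# Toy vocabulary with made-up frequencies.
vocab = {"Radiohead": 950, "U2": 1200, "soft": 400, "Bob Dylan": 2100}

print(handle_query(vocab, pos="Bob Dylan,soft"))
print(handle_query(vocab, pos="radiohead"))
```

The second call fails because the lowercase spelling is not in the toy vocabulary, which mirrors the case-sensitivity warning above.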


The code on GitHub, as I said, is tiny. Perhaps the most complex part of the code is the Dictionary Tokenisation, one of the tools I have built to tokenise text without breaking multi-word phrases. I have found it very useful, as it produces much more meaningful results.
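To give a feel for what dictionary-based tokenisation involves, here is a minimal sketch using greedy longest-match lookup. It is not the tokeniser from the repository: the function names, the whitespace splitting and the underscore joiner are assumptions chosen to keep the example short.

```python
def build_phrase_index(phrases):
    """Index multi-word phrases as word tuples; return (index, longest length)."""
    index = {tuple(p.split()) for p in phrases}
    longest = max((len(p) for p in index), default=1)
    return index, longest


def tokenise(text, phrase_index, longest, joiner="_"):
    """Greedy longest-match tokenisation that keeps dictionary phrases whole."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to two words.
        for size in range(min(longest, len(words) - i), 1, -1):
            candidate = tuple(words[i:i + size])
            if candidate in phrase_index:
                tokens.append(joiner.join(candidate))
                i += size
                break
        else:
            # No dictionary phrase starts here, so emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


index, longest = build_phrase_index(
    ["Daft Punk", "Pink Floyd", "The Dark Side of the Moon"]
)
print(tokenise("Pink Floyd released The Dark Side of the Moon in 1973", index, longest))
```

Joining a phrase into a single token means the model learns one vector for "Pink_Floyd" rather than separate vectors for "Pink" and "Floyd".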

The code is shared under MIT license.

To build the model, uncomment the following line, specifying the location of the corpus:

train_and_save('data/wiki_rock_multiword_dic.txt', 'data/stop-words-english1.txt', '<THE_LOCATION>/wiki_rock_corpus/*.txt')


As mentioned earlier, the dataset/corpus is the text from 36K Rock music artist entries on Wikipedia. This list was obtained by scraping the links from the "List of rock genres". The dataset can be downloaded from here. For information on the copyright of the Wikipedia text and its terms of use, please see here.

Sunday, 14 June 2015

Five crazy abstractions my Deep Learning word2vec model just did

Seeing is believing. 

Of course, there is a whole host of Machine Learning techniques available, thanks to the researchers, and to Open Source developers for turning them into libraries. And I am not quite a complete stranger to this field, I have been, on and off, working on Machine Learning over the last 8 years. But, nothing, absolutely nothing for me has ever come close to what blew my mind recently with word2vec: so effortless yet you feel like the model knows so much that it has obtained cognitive coherence of the vocabulary. Until neuroscientists nail cognition, I am happy to foolishly take that as some early form of machine cognition.

Singularity Dance - Wiki

But, no, don't take my word for it! If you have a corpus of hundreds of thousands of documents (or even tens of thousands), feed it in and see for yourselves. What language? Doesn't really matter! My money is on you getting results that equally blow your tops off.

What is word2vec?

word2vec is a Deep Learning technique first described by Tomas Mikolov only 2 years ago, but due to the simplicity of its algorithm and the surprising robustness of its results, it has been widely implemented and adopted. This technique basically trains a model based on a neighborhood window of words in a corpus and then projects the result onto [an arbitrary number of] n dimensions, where each word is a vector in the n-dimensional space. The words can then be compared using the cosine similarity of their vectors. And what is much more interesting is the arithmetic: vectors can be added or subtracted, for example the vector of Queen is almost equal to King + Woman - Man. In other words, if you remove Man from King and add Woman to it, logically you get Queen, and this model is able to represent that mathematically.
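The cosine comparison and the vector arithmetic described above can be sketched in a few lines of plain Python. The three-dimensional vectors below are invented for illustration only (a real model learns around a hundred dimensions from the corpus), and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combine(vectors, positive, negative=()):
    """Add the positive word vectors and subtract the negative ones."""
    dims = len(next(iter(vectors.values())))
    out = [0.0] * dims
    for word in positive:
        out = [o + x for o, x in zip(out, vectors[word])]
    for word in negative:
        out = [o - x for o, x in zip(out, vectors[word])]
    return out


def nearest(vectors, positive, negative=()):
    """Word whose vector is closest to the combined vector, excluding the inputs."""
    target = combine(vectors, positive, negative)
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hand-made toy vectors: (royalty, maleness, femaleness). Not learned values.
toy = {
    "King": [0.9, 0.8, 0.1],
    "Queen": [0.9, 0.1, 0.8],
    "Man": [0.1, 0.9, 0.1],
    "Woman": [0.1, 0.1, 0.9],
    "Apple": [0.0, 0.2, 0.2],
}

print(nearest(toy, positive=["King", "Woman"], negative=["Man"]))
```

With these toy values the closest remaining word to King + Woman - Man is Queen. Libraries such as Gensim normalise the vectors before combining them, which this sketch skips for brevity.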

LeCun recently proposed a variant of this approach in which he uses characters and not words. Altogether this is a fast moving space and likely to bring about significant change in the state of the art in Natural Language Processing.

Enough of this, show us ze resultz!

OK, sure. For those interested, I have brought the methods after the results.

1) Human - Animal = Ethics

Yeah, as if it knows! So if you remove the animal traits from human, what remains is Ethics. And in word2vec terms, subtracting the vector of Animal from the vector of Human results in a vector which is closest to Ethics (0.51). The other words similar to the Human - Animal vector are spirituality, knowledge and piety. Interesting, huh?

2) Stock Market ≈ Thermometer

In my model, the word Thermometer has a similarity of 0.72 to the Stock Market vector and is the 6th most similar word to it - most of the closer words were other names for the stock market. It is not 100% clear to me how it was able to make such an abstraction, but perhaps the proximity of Thermometer to words such as increase/decrease or up/down could have resulted in the similarity. In any case, likening the Stock Market to a Thermometer is a higher-level abstraction.

3) Library - Books = Hall

What remains of a library if you were to remove the books? word2vec to the rescue. The similarity is 0.49 and next words are: Building and Dorm.  Hall's vector is already similar to that of Library (so the subtraction's effect could be incidental) but Building and Dorm are not. Now Library - Book (and not Books) is closest to Dorm with 0.51 similarity.

4) Obama + Russia - USA = Putin

This is a classic case similar to King+Woman-Man but it was interesting to see that it works. In fact finding leaders of most countries was successful using this method. For example, Obama + Britain - USA finds David Cameron (0.71).

5) Iraq - Violence = Jordan

So a country that is most similar to Iraq after taking its violence is Jordan, its neighbour. Iraq's vector itself is most similar to that of Syria - for obvious reasons. After Jordan, next vectors are Lebanon, Oman and Turkey.

Not enough? Hmm there you go with another two...

Bonus) President - Power = Prime Minister

Kinda obvious, isn't it? But of course we know it depends which one is Putin which one is Medvedev :)

Bonus 2) Politics - Lies = Germans??

OK, I admit I don't know what this one really means but according to my model, German politicians do not lie!

Now the boring stuff...


I used a corpus of publicly available online news and articles. The articles were extracted from a number of different Farsi websites and on average contained ~ 8KB of text. The topics ranged from local and global Politics, Sports, Arts and Culture, Science and Technologies, Humanities and Religion, Health, etc.

The processing pipeline is illustrated below:

Figure 1 - Processing Pipeline
For word segmentation, an approach was used to join named entities using a dictionary of ~ 40K multi-part words and named entities.

Gensim's word2vec implementation was used to train the model. The default n=100 and window=5 worked very well but to find the optimum values, another study needs to be conducted.

In order to generate the results presented in this post, the most_similar method was used. No significant difference was found between using most_similar and most_similar_cosmul.
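For context, the two methods rank candidates with different objectives. The formulas below are my reading of the additive and multiplicative schemes as Gensim implements them, written for a set P of positive words and a set N of negative words; the notation is introduced here and is not from the original experiments.

```latex
% Additive scheme (most_similar): cosine against the combined unit vectors
\hat{w}_{\mathrm{add}} = \arg\max_{w \in V}\;
  \cos\!\Big(w,\; \sum_{p \in P} \frac{p}{\lVert p \rVert}
                 - \sum_{n \in N} \frac{n}{\lVert n \rVert}\Big)

% Multiplicative scheme (most_similar_cosmul), cosines shifted into [0, 1]
\hat{w}_{\mathrm{mul}} = \arg\max_{w \in V}\;
  \frac{\prod_{p \in P} s(w, p)}{\prod_{n \in N} s(w, n) + \varepsilon},
  \qquad s(u, v) = \frac{1 + \cos(u, v)}{2}
```

Here V is the vocabulary and the small constant in the denominator only guards against division by zero. The multiplicative form stops one large term from dominating the sum, which is why the two can differ on some analogies even if they did not differ noticeably here.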

A significant problem was discovered: misspelt or infrequent words in the corpus generate sparse vectors, which result in very high similarity scores with some words. I used the frequency of the word in the corpus to filter out such cases.
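The frequency filter can be as simple as the sketch below. The neighbour list, the counts and the threshold are made up for illustration, and the function name is hypothetical; the only idea taken from the text is dropping results whose corpus frequency is too low.

```python
def filter_by_frequency(neighbours, counts, min_count):
    """Keep (word, score) pairs whose corpus frequency is at least min_count."""
    return [(w, s) for w, s in neighbours if counts.get(w, 0) >= min_count]


# Invented data: a misspelt, rare token scores suspiciously high.
neighbours = [("rivr", 0.95), ("stream", 0.70), ("lake", 0.66)]
counts = {"rivr": 2, "stream": 640, "lake": 310}

print(filter_by_frequency(neighbours, counts, min_count=50))
```

The order of the surviving neighbours is preserved, so the ranking by similarity is unchanged apart from the removed entries.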


word2vec is a relatively simple algorithm with surprisingly remarkable performance. Implementations are available in a variety of Open Source libraries, including Python's Gensim. Based on the preliminary results, it appears that word2vec is able to make higher-level abstractions, which nudges towards cognitive abilities.

Despite its remarkable performance, it is not quite clear how this ability can be used in an application, although in its current form it can readily be used for finding antonyms/synonyms, spelling correction and stemming.

Wednesday, 27 May 2015

PerfIt! decoupled from Web API: measure down to a closure in your .NET application

Level [T2]

Performance monitoring is an essential part of building any serious-scale software. Unfortunately, in the .NET ecosystem, which has historically looked to Microsoft first for direction and tooling, there has been a real lack of good tooling: for one reason or another, effective monitoring has not been a priority for Microsoft, although this could be changing now. The healthy growth of the .NET Open Source community in the last few years has brought a few innovations in this space (Glimpse being one), but they focused on solving development problems rather than application telemetry.

Two years ago, while trying to build and deploy large-scale APIs, I was unable to find anything suitable to save me from writing a lot of boilerplate code to add performance counters to my applications. So I coded a working prototype of performance counters for ASP.NET Web API, open sourced it and shared it on GitHub, calling it PerfIt! for lack of a better name. Over the last few years, PerfIt! has been deployed to production in a good number of companies running .NET. I added client support too, to measure calls made by HttpClient, and it was a handy addition.

This is all not bad, but in reality REST API calls do not cover all your outgoing or incoming server communications (which you would naturally like to measure): you need to communicate with databases (relational or NoSQL), caches (e.g. Redis), Blob Storage and many others. On top of that, there could be other parts of your code that you would like to measure, such as CPU-intensive algorithms, reading or writing large local files, running Machine Learning classifiers, etc. Of course, PerfIt! in its current incarnation cannot help with any of those cases.

It turned out that, with a little change and by separating performance monitoring from Web API semantics (which are changing again with vNext), this can be done. Actually, I cannot take much credit for it: it was mainly the ideas of two of my best colleagues, Andres Del Rio and JaiGanesh Sundaravel, to whom I am grateful for their contribution.

New PerfIt! features (and limitations)

So currently at version alpha2, you can get the new PerfIt! by using nuget (when it works):
PM> install-package PerfIt -pre
Here are the extra features that you get from the new PerfIt!.

Measure metrics for a closure

So at the lowest level of an aspect abstraction, you might be interested in measuring metrics for a closure, for example:
Action action = () => Thread.Sleep(1000);
action(); // measure
Or in case of an async operation:
foo result = null;
Func<Task> asyncCall = async () => result = await _command.ExecuteScalar();

// and then
await asyncCall();
This closure could be wrapped in a method of course, but there again, having a unified closure interface is essential in building a common tool: each method can have different inputs or outputs, while all can be presented as a closure having the same interface.

Thames Barriers Closure - Flickr. Sorry couldn't find a more related picture, but enjoy all the same
So in order to measure metrics for the action closure, all we need to do is:
var ins = new SimpleInstrumentor(new InstrumentationInfo()
{
   Counters = CounterTypes.StandardCounters,
   Description = "test",
   InstanceName = "Test instance"
});

ins.Instrument(() => Thread.Sleep(100));

A few things here:
  • SimpleInstrumentor is responsible for providing a hook to instrument your closures. 
  • InstrumentationInfo contains the metadata for publishing the performance counters. You provide it with the names of the counters to raise (provided that, if they are not standard, you have already defined them).
  • You will be more likely to create a single instrumentor instance for each aspect of your code that you would like to instrument.
  • This example assumes the counters and their category are installed. The PerfitRuntime class provides a mechanism to register your counters on the box - which is covered in previous posts.
  • Instrument method has an option to pass the context as a string parameter. This context can be used to correlate metrics with application context in ETW events (see below).

Doing an async operation is not that different:
ins.InstrumentAsync(async () => await Task.Delay(100));

//or even simpler:
ins.InstrumentAsync(() => Task.Delay(100));

SimpleInstrumentor is the building block for higher level abstractions of instrumentation. For example, PerfitClientDelegatingHandler now uses SimpleInstrumentor behind the scenes.

Raise ETW events, effortlessly

Event Tracing for Windows (ETW) is a low-overhead framework for logging, instrumentation, tracing and monitoring that has been in Windows since version 2000. Version 4.5 of the .NET Framework exposes this feature in the class EventSource. Suffice it to say, if you are not using ETW you are doing it wrong.

One problem with Performance Counters is that they use sampling, rather than events. This is all well and good but lacks the resolution you sometimes need to find problems. For example, if 1% of calls take > 2 seconds, you need on average 100 samples and if you are unlucky a lot more to see the spike.
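The "on average 100 samples" figure can be checked with a small calculation. The sketch below assumes a simplified model in which each sample independently catches a slow call with probability p = 0.01; real counter sampling is more involved, so treat the numbers as a rough guide only. The function names are made up for this example.

```python
import math


def miss_probability(p, n):
    """Probability that n independent samples all miss an event of rate p."""
    return (1.0 - p) ** n


def samples_needed(p, confidence):
    """Smallest n for which at least one sample hits with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 0.01  # 1% of calls are slow

# Expected number of samples until the first hit (geometric distribution).
print(1 / p)

# Chance of still having seen nothing after 100 samples.
print(round(miss_probability(p, 100), 3))

# Samples needed to be 95% sure of seeing at least one slow call.
print(samples_needed(p, 0.95))
```

Under this model, 100 samples still leave roughly a one-in-three chance of missing the spike entirely, and it takes about 300 samples to be 95% sure of catching it.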

Another problem is the lack of context with the measurements. When you see such a high response time, there is really no way to find out the context (e.g. customerId) in which it went wrong. This makes finding performance bottlenecks more difficult.

So SimpleInstrumentor, in addition to doing counters for you, raises InstrumentationEventSource ETW events. Of course, you can turn it off, or just leave it on as it has almost no impact. But much better is to use a sink (Table Storage, ElasticSearch, etc) to persist these events to a store and then analyse them using something like ElasticSearch and Kibana - as we do at ASOS. Here is a console log sink, subscribed to these events:
var listener = ConsoleLog.CreateListener();
listener.EnableEvents(InstrumentationEventSource.Instance, EventLevel.LogAlways,
And you would see:

Obviously this might not look very impressive, but when you take into account that you have the timeTakenMilli (here 102ms) and have the option to pass an instrumentationContext string (here "test..."), you can correlate performance with the context in your application.

PerfIt for Web API is all there just in a different nuget package

If you have been using previous versions of PerfIt, do not panic! We are not going to move the cheese, so the client and server delegating handlers are all there only in a different package, so you just need to install Perfit.WebApi package:
PM> install-package PerfIt.WebApi -pre
The rest is just the same.

Only .NET 4.5 or higher

After spending a lot of time writing async code in CacheCow which was .NET 4.0, I do not think anyone should be subjected to such torture, so my apologies to those using .NET 4.0 but I had to move PerfIt! to .NET 4.5. Sorry .NET 4.0 users.

PerfIt for MVC, Windsor Castle interceptors and more

Yeah, there is more coming. PerfIt for MVC has long been asked for by the community, and Castle interceptors can simply remove all cross-cutting concerns from your core business code. Stay tuned and please provide feedback before it goes fully to v1!

Sunday, 10 May 2015

Machine Learning and APIs: introducing Mills in REST API Design

Level [C3]

REST (REpresentational State Transfer) was designed with the "state" at its heart, literally, standing for the S in the middle of the acronym.

TL;DR: Mill is a special type of resource where server's authority purely comes from exposing an algorithm, rather than "defining, exposing and maintaining integrity of a state". Unlike an RPC style endpoint, it has to adhere to a set of 5 constraints (see below). 

Historically, when there were only a few thousand servers around, the state was predominantly documents. People were creating, editing and sharing a lot of text documents, and some HTML. With HTTP 1.1, caching and concurrency were built into the protocol, which enabled it to represent richer distributed computing concerns, and we have been building on it ever since. With the rising popularity of REST over the last 10 years, much of today's web has been built on top of RESTful thinking, whether it is what is visible or what is behind the presentation (outermost layer) servers. Nowadays when we talk of state, we normally mean data, or rather records persisted in a data store (relational or non-relational). A lot of today's data, directly or indirectly, is created, updated and deleted using REST APIs. And this is all cool, of course.

When we design APIs, we map the state into REST Resources. It is very intuitive to think of resources as collection and instance. It is unambiguous and useful to communicate these concepts when for example we refer to /books and /books/123 URLs, as the collection or instance resources, respectively. We interact with these resources using verbs, and although HTTP verbs are not meant to be used just for CRUD operations, interacting with the state that exists on the server is inherent in the design.

But that is not all the story. Mainstream adoption of Machine Learning in the industry means we need to expose Machine Learning applications using APIs. The problem is the resource oriented approach of REST (where the state is at the heart of the design) does not work very well.

By the way, I am NOT 51... is an example of a Machine Learning application where instead of being an application, it could have been an API. For example (just for illustration, you could use other media types too):

POST /age_gender_classifier HTTP/1.1
Content-Type: image/jpeg
And the response:
200 OK
Content-Type: application/json


Server is generating a response to the request by carrying out complex face recognition and running a model, most likely a deep network model. Server is not returning a state stored on the server, in fact this whole process is completely stateless.

And why does this matter? Well I feel if REST is supposed to move forward with our needs and use cases, it should define, clarify, internalise and finally digest edge cases. While such edge cases were pretty uncommon, with the rise and popularity of Machine Learning, such endpoints will be pretty standard.

A few days ago, on the second day of the APIdays Mediterranea 2015 conference, I presented a talk on Machine Learning and APIs. In this talk I presented the simple concept of Mills. Mills, where you take your wheat to be ground and you carry back the flour.

Basically, it all goes back to the origin of a server's authority. To give an example, a "Customer Profile" service, exposed by a REST API, is the authority to go to when another service requires access to a customer's profile. The "Customer Profile" service has defined a state, which is the profile of the customers, and is responsible for ensuring the integrity of the state (enforcing business rules on the state). For example, if the marketing email preference can have values of None, WeeklyDigest or All, it should not allow the value to be set to MonthlyDigest. We are quite used to these types of services and building REST APIs on top: CustomerProfile becomes a resource that we can query or interact with.

On the other hand, a server's authority could come from exposing an algorithm. For example, tokenisation of text is a non-trivial problem that requires not only breaking the text into its words, but also keeping multi-words and named entities intact. A REST API that exposes this functionality will be a mill.

5 constraints of a Mill

1) It encapsulates an algorithm not a state

Which was discussed ad nauseam; however, the distinction is very important. For example, let's say we have an algorithm to which you provide a postcode and it returns the houses within a 1 mile radius of that postcode - this is not an example of a mill.

2) Raw data in, processed data out

For example you send your text and get back the translation.

3) Calls are both safe and idempotent

Calling the endpoint should not directly change any state within the server. For example, the endpoint should not be directly mapped to the ML training engine, otherwise sending a text 1000 times would skew the trained model for that text. The training endpoint is usually a normal resource, not a mill - see my slides.

4) It has a single specialty

And as such, it accepts a single HTTP verb apart from OPTIONS, normally POST (although a GET with entity payload would be more semantically correct but not my preferred option for practical reasons).

5) It is named not as a verb but as a tool

A mill that exposes tokenisation is to be called tokeniser. In a similar way, classifier would be the appropriate name for a system that classifies on top of a neural network, for example. Or normalising text would call for a normaliser mill.
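The five constraints can be illustrated with a tiny, framework-free sketch of a tokeniser mill. Everything here is hypothetical: the function name, the returned tuple shape and the whitespace split (a stand-in for a real tokenisation algorithm) are assumptions for illustration, not a prescribed implementation.

```python
import json

# A mill accepts a single verb (here POST) apart from OPTIONS.
ALLOWED = ("POST", "OPTIONS")


def tokeniser_mill(method, body=""):
    """Return (status, headers, body) for a request to a hypothetical tokeniser mill."""
    if method == "OPTIONS":
        return 204, {"Allow": ", ".join(ALLOWED)}, ""
    if method != "POST":
        return 405, {"Allow": ", ".join(ALLOWED)}, ""
    # Raw data in, processed data out: no server state is read or changed,
    # so repeating the call is safe and gives the same answer.
    tokens = body.split()
    return 200, {"Content-Type": "application/json"}, json.dumps({"tokens": tokens})


print(tokeniser_mill("POST", "take your wheat to the mill"))
print(tokeniser_mill("GET"))
```

Calling it twice with the same payload returns the same response, and any verb other than POST or OPTIONS is rejected with 405 and an Allow header.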

No, this is not the same as an RPC endpoint. No RPC smell. Honest :) That is why those 5 constraints exist.

Wednesday, 22 April 2015

Pilgrimage into the world of Tarkovsky: through the eyes of hope and suffering

[Level N]

The world is not perfect. It has given us scientists, authors, artists and politicians - and I have lived enough to know none of them were really perfect. Among these, we have personal heroes, personalities that have made great discoveries, built wonderful things or have lived extraordinary lives. Whether it is Obama, Einstein or George Orwell, they have their deficiencies.

I am saying this because the word Pilgrimage in the title can put you off. In fact it puts me off. But ... it is there for a reason, and I hope by the time you finish reading - if you hang on long enough - you would see it.

*  *  * 

A stuttering boy who finally mutters a few words with no pause after a session of hypnotherapy, and then leading to a black screen of titles with the music of Bach, is not a typical opening scene. But this for me has been the most memorable opening among all the films I have seen. If you are looking to describe the body of work by the late director Tarkovsky, look no further, it is all there in the opening scene of The Mirror (1975). This scene somehow encapsulates Tarkovsky's view of himself. A timid lad who can barely speak two words in sequence without constantly stuttering but with the help of "supernatural" powers can speak and tell us his stories. And the process is painful for him, it is only achieved with determination and sacrifice.

*  *  * 

Stumbling a few times along the way, I find my way with difficulty through the aisles of the dark cinema. I think I have missed the first few minutes but that should be OK.

I am lucky to be here. After queueing for several hours on a cold sunny day in February 1988, I have managed to buy a ticket to Tarkovsky's Stalker (1979) at the Fajr Film Festival. A special section of the Festival is dedicated to the memory of the late Tarkovsky, who died the previous year, and they are showing all his films - with understandable cuts where they do not meet "the code"; at the end of the day, Iran is run as an Islamic country. These are the films that intellectuals go to - and I should go to, since I am planning to become one!

And I sit there in the dark, watching this 220 minute epic where very few things actually happen. And the film is in fluent Russian with no subtitles!

And through the confusion of barely knowing the storyline, and not getting any of the dialogue, as a young 19-year-old student, I am mesmerised. The film works its way through me and somehow leaves deep marks that are ingrained in me to this day. The film communicates in a strange language which I feel I have known but very remotely, as if in a previous life. It is hazy, sublime and next to impossible to translate into words.

And next thing I know, I am sitting watching Mirror (this time it is the public screening and is translated) and incoherent images and storylets come and go, with apparently no relationship. And yet, by the end, I cannot control myself and my eyes are wet. And again, I have no explanation when accused of pretentious intellectualism or sentimentalism.

My journey (or Pilgrimage) has started. These films, I lived with. They grew with me, and gradually, over a quarter of a century, made sense. And this post is about why and how.

*  *  * 

It was not a coincidence that in the same Fajr Festival of 1988 there was a screening of Parajanov's Colour of Pomegranates. It is generally believed that the films of Tarkovsky and Parajanov are very similar. Tarkovsky was indeed a fan of Parajanov's works, and I later found out they were in fact friends. I did manage to watch it later at the public screening, but it was even more bizarre and I did not like it. Form is the vehicle to deliver meaning and not the meaning itself. Parajanov felt overly concerned with form, and while the narrative and a story of love are there, the meaning is shallow and the bare symbolism hurts the film.

Going back to Tarkovsky, "the meaning" is not easy to grasp. There are commonly different interpretations, and it is even said that his films are meant to take us on a personal journey of understanding, hence all interpretations are correct - so post-modern!

Did Tarkovsky hide specific messages for us to grasp in his often difficult and unusual films? If so, then although personal interpretation is not fully without its merits (it can have a spiritual or inspirational effect), we will be missing the point. Most works of art (and even more so music and modern art) are open to personal interpretation. Abstract paintings famously invite us to find our personal comprehension of the work of art. But how about Tarkovsky?

Only he can answer us. And he did.

*  *  * 

It is very rare for a director to uncover his tricks and spoil the meaning of his films in a book. Well, he did not quite do that in Sculpting in Time, but he did reveal his vision of cinema as an art form. And more importantly, why he made his films. While for many, making films is a means of gaining fame, a career, a vehicle to project one's intellectual viewpoints or (as Tarkovsky refutes) a means of self-expression, for Tarkovsky it was a selfless and painful endeavour to fulfil a responsibility he was trusted with. While for some, making on average 1 film every 7 years means they were striving for perfection, for him it was painstakingly ensuring his duty in this world gets fulfilled.

What do we mean by responsibility? Hard to explain in words but easier to point you to his films. We get to meet Tarkovsky himself in his films. Whom do you think Andrei Rublev was then?! An artist monk, sick of the decadence of the world, taking a vow of silence only to understand at the end that he cannot forfeit his duty as an artist. His work will involve suffering but that is the sacrifice he is meant to make. An artist is not free; despite the theories of modern art, the artist is not solely responsible to himself and his art. Tarkovsky shunned modern art:
"Modern art has taken a wrong turn in abandoning search for the meaning of existence in order to affirm the value of the individual for its own sake."
Tarkovsky sees the process of making art as a consummation of the artist for the cause - he called artists "sufferers". Artist is a martyr and artistic creation a sanctimonial sacrifice:
"Artistic creation demands that he 'perish utterly' in the full tragic sense of those words."
The word self-expression, this inner looking for fulfilment, utterly made him frustrated with the artistic culture of the day. Artist himself is the last person to gain from the artistic creation - very much like the character Stalker that could not benefit himself from "The Room", nor any of the other Stalkers.
"The artist is always a servant, and is perpetually trying to pay for the gift that has been given to him as if by miracle."
Also, the artist is not merely an intellectual concerned with the abstract notions of his art form, but he is an evangelist (in its literal meaning) making his art for everyone:
"Art addresses everybody, in the hope of making an impression, ... of winning people not by incontrovertible rational argument but through the spiritual energy with which the artist has charged the work."
And oh boy, that spiritual energy that sets you on fire, making you look for the answer - in my case for a quarter of a century. Now it probably makes a lot more sense to think of this man as a prophet.

*  *  * 

Tarkovsky films are slow - for some, painfully slow. They contain many long takes, and this by itself does not signify a technique but it is a by-product of his vision and language for the cinema as an art form. This vision was used later by Bela Tarr, a true student of this vision.

On the surface, it could appear that this is a stylistic decision to come up with a unique formalism, a pretentious intellectual gesture. But Tarkovsky himself disdained pure experimentation to come up with a new formalism:
"People talk about experiment and search above all in relation to the avant-garde. But what does it mean? ... For the work of art carries within it an integral aesthetic and philosophical unity; it is an organism ... Can we talk of experiment in relation to the birth of a child? It is senseless and immoral."
And this again reminds us of the burden of responsibility he felt in making his films. On the other hand, he is regarded as one of the proponents of "poetic cinema", a term that Tarkovsky himself found almost offensive:
"I find particularly irritating the pretensions of modern 'poetic cinema', which involves breaking off contact with fact and with time realism."
Tarkovsky talks of the works of art that have inspired him and have shaped his artistic language. These range from late medieval icons and Italian paintings of the Renaissance to the works of literature by Dostoevsky, Tolstoy and Goethe, and finally to the films of Dovzhenko, Bresson and others. In an effort to describe an ideal piece of work, he gives the example of a relatively obscure painter of the Renaissance, whose painting had a deep effect on him. In contrast to Raphael's "Virgin and Child", he was captivated by the inexplicability of the works of Carpaccio.

Preparation of Christ's Tomb (1505) - Vittore Carpaccio

Back to cinema, he believed that the ideal film is countless meters of celluloid capturing the entire life of a person. This probably makes it easier to understand why his films were usually longer than 2 hours and, in the case of Stalker, around 2 hours and 40 minutes! Tarkovsky believed that a work of art needs to be true to life. And when we think of life, there are no fast cuts and edits: it is a very long take.
"I want to make the point yet again that in film, every time, the first essential in any plastic composition ... is whether it is true to life."
Tarkovsky explains some of the techniques he used in order to make his scenes leave a deeper impression on the viewer. These techniques move away from the cinematic languages of the time, from the clichéd symbolism of mainstream cinema. They usually enhance and magnify the image and make it imprint on our psyche. An example is the scene from Mirror where the Doctor meets Mother and at the end of the scene a strong wind blows, making the Doctor look back towards the house.

All in all, for Tarkovsky, a masterpiece is a work of art from which you cannot remove anything without completely destroying the work. And that is exactly what he saw in the works of Carpaccio - a unity that cannot be broken. As such, it is really difficult to pinpoint what makes the masterpiece exceptional as it is.

*  *  * 

So where does Tarkovsky get his inspiration from? Where is his true role model?
It might come as a surprise for some but Tarkovsky was a devout Christian. He knew two of the gospels by heart and would recite them in conversations. His book is full of quotations from the New Testament (1 Corinthians being a favourite of his) and phrases that can only mean he truly believed. He was not after happiness (remember Stalker - the Black Dog of depression):
"Let us imagine for a moment that people have attained happiness ... Man becomes Beelzebub."
He saw a strong similarity between art and religion:
"In art, as in religion, intuition is tantamount to conviction, to faith. It is a state of mind, not a way of thinking."
He felt a deep connection in his role to that of an evangelist:
"Art ... expresses its own postulate of faith."
And his role model: of course, Jesus: Selfless sacrifice, Servant, For everyone, Winning people. All his films and writings point to him. It is not accidental that we hear John's Revelation in Stalker. As apocalyptic as the film is, this could not have been more literal. No accident that we meet God in Solaris (Ocean), or that Stalker is so stricken by the lack of faith of others, or that in Sacrifice, one can save everyone.

People commonly ask why he made his films so difficult, buried under layers of meaning. Why? For exactly the same reason that Jesus, as a teacher, used parables to convey messages rather than stating them plainly.

*  *  *

And my quest is not finished, but surely it has eased off. After believing in Jesus in 2001, I revisited Tarkovsky again lately. Now all the symbols and meanings are crystal clear. I feel very close to what he tried so hard to shape into images. It just makes sense.

Messages and meanings ... and what are those? It will be clear by the process of your personal pilgrimage. And it could begin now...

Friday, 3 April 2015

Utilisation and High Availability analysis: Containers for Microservices

Microservices? Are these not the same SOA principles repackaged and sold under a different label? Not this time, and I will attend to this question in another post. But if you are considering Microservices for your architecture, beware of the cost and availability concerns. In this post we will look at how using containers (such as Docker) can help you improve your cloud utilisation, decrease costs and above all improve availability.

Elephant in the room: most of the cloud resources are under-utilised

We almost universally underestimate how long it takes to build a software feature. I am not sure if it is because our time feels more precious than money, but for hardware the reverse is almost always true: we overestimate the hardware requirements of our systems. Historically this could have been useful, since commissioning hardware in enterprises is usually a long and painful process, and the estimate had to include business growth over the years and planned contingency for spikes.
But in an elastic environment such as the cloud? Well, it seems we still do that. In the UK alone £1bn is wasted on unused or under-utilised cloud resources.

Some of this is avoidable, by using the elasticity of the cloud and scaling up and down as needed. Many cloud vendors provide such functionality out of the box with little or no coding. But many companies already do that, so why is the waste so high?

From personal experience I can give you a few reasons why my systems do that...

Instance Redundancy

Redundancy is one of the biggest killers in computing costs. And things do not change a lot in the cloud: vendors' availability SLAs are usually defined in a context of redundancy and, to be frank, some of it is purely cloud related. For example, on Azure you need to have your VMs in an "availability set" to qualify for VM SLAs. In other words, at least 2 VMs are needed, since your VMs could be taken down for patching at any time, but within an availability set this is guaranteed not to happen to all machines at the same time.

The problem is, unless you are a company with a massive number of customers, even a small instance VM could suffice for your needs - and even for a big company with many internal services, some services might not need a big resource allocation.

Looking from another angle, adopting Microservices will mean you can iterate your services more quickly, releasing more often. The catch is, the clients will not be able to upgrade at the same time and you have to be prepared to run multiple versions of the same service/microservice. Old versions of the API cannot be decommissioned until all clients are weaned off the old one and moved to the newer versions. Translation? Well, some of your versions will have to run on a shoestring budget to justify their existence.

Containerisation helps you to tap into this resource, reducing the cost by running multiple services on the same VM. A system usually requires at least 2 or 3 active instances - allowing for redundancy. Small services loaded into containers can be co-located on the same instances allowing for higher utilisation of the resources and reduction of cost.

Improved utilisation by service co-location

This ain't rocket science...

Resource Redundancy

Most services have different resource requirements. Whether Network, Disk, CPU or memory, some resources are used more heavily than others. A service encapsulating an algorithm will be mainly CPU-heavy while an HTTP API could benefit from local caching of resources. While cloud vendors provide different VM setups that can be geared for memory, Disk IO or CPU, a system still usually leaves a lot of redundant resources.

This is possibly best explained in the picture below. No rocket science here either, but mixing services that have different resource allocation profiles gives us the best utilisation.

Co-location of Microservices having different resource allocation profile
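
To make the idea concrete, here is a minimal sketch of co-locating services with complementary resource profiles. Everything in it is an assumption for illustration: the service names, the CPU/memory fractions, the VM capacity normalised to 1.0 and the simple first-fit placement. Real schedulers are far more sophisticated than this.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    cpu: float  # fraction of one VM's CPU (hypothetical figure)
    mem: float  # fraction of one VM's memory (hypothetical figure)


@dataclass
class VM:
    services: list = field(default_factory=list)

    def free(self, resource: str) -> float:
        # Capacity of each resource is normalised to 1.0 per VM.
        return 1.0 - sum(getattr(s, resource) for s in self.services)

    def fits(self, s: Service) -> bool:
        eps = 1e-9  # tolerance for floating point rounding
        return self.free("cpu") + eps >= s.cpu and self.free("mem") + eps >= s.mem


def co_locate(services):
    """First-fit placement, biggest services first."""
    vms = []
    for s in sorted(services, key=lambda s: max(s.cpu, s.mem), reverse=True):
        target = next((vm for vm in vms if vm.fits(s)), None)
        if target is None:
            target = VM()
            vms.append(target)
        target.services.append(s)
    return vms


# Two CPU-heavy and two memory-heavy services (made-up profiles).
SERVICES = [
    Service("algo", cpu=0.7, mem=0.2),
    Service("api-cache", cpu=0.2, mem=0.7),
    Service("algo-2", cpu=0.6, mem=0.2),
    Service("api-cache-2", cpu=0.3, mem=0.6),
]

for i, vm in enumerate(co_locate(SERVICES), start=1):
    print(f"VM {i}:", [s.name for s in vm.services])
```

With these made-up numbers, each CPU-heavy service ends up sharing a VM with a memory-heavy one, so the four services fit on two VMs instead of four.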

And what's that got to do with Microservices?

Didn't you just see it?! Building smaller services pushes you towards building and deploying more services, many of which need the High Availability provided by redundancy but not the price tag associated with it.

Docker is absolutely a must-have if you are doing Microservices or you are paying through the nose for your cloud costs. At QCon London 2015, John Wilkes from Google explained how they "start over 2 billion containers per week". In fact, to be able to take advantage of the spare resources on the VMs, they tend to mix their Production and Batch processes. One difference here is that the Live processes require locked, allocated resources while the Batch processes take whatever is left. They analysed the optimum percentages, minimising the errors while keeping utilisation high.

Containerisation and availability

As we discussed, optimising utilisation becomes a big problem when you have many many services - and their multiple versions - to run. But what would that mean in terms of Availability? Does containerisation improve or hinder your availability metrics? I have not been able to find much in the literature but as I will explain below, even if you do not have small services requiring VM co-location, you are better off co-locating and spreading the service onto more machines. And it even helps you achieve higher utilisation.

By spreading your architecture across more Microservices, the availability of your overall service (the one the customer sees) becomes the product of the availabilities of the individual Microservices, assuming the customer-facing service depends on all of them. For instance, if you have 10 Microservices, each with an availability of 4 9s (99.99%), the overall availability drops to 3 9s (99.9%). And if you have 100 Microservices, which is not uncommon, this drops to only two 9s (99%). In these terms, you need to strive for a very high Microservice availability.
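
The arithmetic can be checked with a few lines of Python. This is only a sketch under two assumptions that are mine: every customer request needs all the Microservices to be up, and their failures are independent. The function names are made up for illustration.

```python
def overall_availability(per_service: float, n_services: int) -> float:
    """Availability of a service that needs all n Microservices up,
    assuming independent failures (an assumption of this sketch)."""
    return per_service ** n_services


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per (365-day) year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


for n in (1, 10, 100):
    a = overall_availability(0.9999, n)
    print(f"{n:>3} services: {a:.4%} available, "
          f"~{downtime_minutes_per_year(a):.0f} minutes down per year")
```

For 10 services the result comes out at roughly 99.9%, and for 100 services at roughly 99%, which matches the figures above.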

Hardware failure is very common and for many components it goes above 1% (Annualised Failure Rate). Defining hardware and platform availability in respect to system availability is not very easy. But for simplicity and the purpose of this study, let's assume failure risk of 1% - at the end of the day our resultant downtime will scale accordingly.

If service A is deployed onto 3 VMs and one VM goes down (1%), the other two instances will have to bear the extra load until another instance is spawned - which will take some time. Capacity planning can leave enough spare resources to deal with this situation, but if two VMs go down (0.01%), it will most likely bring down the service as it would not be able to cope with the extra load. If the Mean Time to Recovery is 20 minutes, this alone will dent your Microservice availability by around half of 4 9s! If you have worked hard in this field, you know how difficult it is to gain those 9s, and losing them like that is not an option.

So what's the solution? This diagram should speak louder than words:

Service A and B co-located in containers, can tolerate more VM failures

By using containers and co-locating services, we spread instances more thinly and can tolerate more failures. In the example above, our services can tolerate 2 or maybe even 3 VM failures at the same time.
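
As a rough illustration of why spreading thinner helps, the sketch below compares the chance of losing more VMs than a service can tolerate. The model is my own simplification: each VM is independently down with probability 1% at any moment, the dedicated layout has 3 VMs and tolerates 1 failure, and the co-located layout is assumed to have 6 VMs and tolerate 2 failures. None of these numbers come from measurements.

```python
from math import comb


def prob_at_least_down(n_vms: int, k: int, p_down: float = 0.01) -> float:
    """Probability that k or more of n_vms are down at the same time,
    assuming independent failures with probability p_down each."""
    return sum(
        comb(n_vms, i) * p_down**i * (1 - p_down) ** (n_vms - i)
        for i in range(k, n_vms + 1)
    )


# Dedicated: 3 VMs, the service falls over when 2 or more are down.
dedicated = prob_at_least_down(3, 2)
# Co-located (hypothetical layout): 6 VMs, falls over when 3 or more are down.
co_located = prob_at_least_down(6, 3)

print(f"dedicated 3 VMs : {dedicated:.6%}")
print(f"co-located 6 VMs: {co_located:.6%}")
```

Under these assumptions the co-located layout is more than ten times less likely to lose too many VMs at once, even though each service still pays for roughly the same amount of capacity.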


Containerisation (or Docker if you will) is a must if you are considering Microservices. It helps you with increasing utilisation, bringing down cloud costs and above all, improves your availability.

Tuesday, 10 March 2015

QCon London 2015: from hype to trendsetting - Part 1

Level [C3]

This year I could make it to QCon London and I felt it might be useful to write up a summary for those who would have liked to be there but did not make it for any reason. This is also an opportunity to get my head together and summarise a couple of themes, inspired by the conference.

The quality of the talks was varied: initially pretty disappointing on the first day, but rising to a real high on the last day. Not surprisingly, Microservices and Docker were the buzzwords of the conference and many of the talks had one or the other in their title. It was as if the hungry folks were being presented Microservices with ketchup, and next it would be with Mayonnaise, and yet nothing as good as Docker with Salsa. In fact it is very easy to be sceptical and sarcastic about Microservices or Docker and disregard them as pure hype.

After listening to the talks, especially ones on the last day, I was convinced that with or without me, this train is set to take the industry forward. Yes, the granularity of Microservices (MS) has not been crisply defined yet, and there is a stampede to download and install Microservices on old systems and fix the world. Industry will abuse it as it reduced SOA to Web Services by adding just a P to the end. Yes, there are very few people talking about the cost of moving to MS and exploring the cases where you should stay put. But if your Monolith (even though it pays lip service to SOA) has ground the development cycle to a halt and is killing you and your company, there is a thing or two to learn here.

Disclaimer: This post by no means is a comprehensive account of the conference. This is my personal take on QCon London 2015 and topics discussed, peppered with some of my own views, presented as a technical writing.


Yeah I know you are fed up with hearing the word - but bear with me for a few minutes. Microservices reminded me of my past life: it is a syndrome. A medical syndrome when it is first being described, does not have to have the Aetiology and Pathophysiology all clear and explained - it is just a syndrome, a collection of signs and symptoms that occur together. In the medical world, there could be years between describing a syndrome and finding what and why.

And this is what we are dealing with here, in a different discipline: Microservice is an emerging pattern, a solution to a contextual problem that has indeed occurred. It is a phenomenon that we are still trying to figure out - a lot of head scratching is going on. So bear with it and I think we are in for a good ride beyond all the hype.

Its two basic benefits are: smaller deployment granularity, enabling you to iterate faster, and a smaller domain to focus on, understand and improve. For me the first is the key.

So here is a breakdown of a few key aspects of Microservices.

Conway, Conway, Where Art Thou

A recurring theme (at points, ad nauseam) was that MS is the result of reversing cause and effect in Conway's law and using it to your advantage: build smaller teams and your software will be shaped like them. So in essence, you turn Conway's law on its head and use it as a tool that naturally results in a more loosely coupled architecture.

This is by no means new: Amazon has been doing this for a decade. The size of the teams is nicely defined by Jeff Bezos as "Two Pizza Teams". But what is the makeup of these teams and how do they operate? As again described by Amazon, they are made up of the elements of a small company, a start-up, including developers, testers, BA, business representative and, more importantly, operations, aka Devops.

Another point stressed by Yoni Goldberg from Gilt and Randy Shoup was that the teams charge other teams for using their services and need to look after their finances. They found that doing this reduced the costs of the team by 90% - mainly due to optimising cloud and computing costs.

Granularity: "fits in my head" (does it?)

One of the key challenges of Microservices was to define the granularity of a Microservice, differentiating it from traditional SOA. And it seems we now have a definition: "its complexity fits one's head".

What? This to me is a non-definition and on any account, it is a poor definition (sorry Dan). After all, there is nothing more subjective than what fits one's head, is there? And whose head by the way? If it is mine, I cannot keep track of what I ate for breakfast and lunch at the same time (if you know me personally, you must have noticed my small head), and then we get those giants that can master several disciplines or understand the whole of an uber-domain.

One of the key properties of a good definition is that it is tangible, unambiguous and objectively prescriptive. Jeff Bezos did not have to be a Pizza lover to use pizzas to define Amazon team sizes.

In the absence of any tangible definition, I am going to bring my own - why not? This is really how I feel like the granularity of a MS should be, having built one or two, and I am using tangible metrics to define it.

Granularity of Microservices - my definition

As is evident, the cross-cutting concerns of a Microservice are numerous: from security, availability and performance to routing, versioning, discovery, logging and monitoring. For a lot of these concerns, you can rely on the existing platform or common platform-wide guidelines, tools and infrastructure. So the crux of the sizing of the Microservice is its core business functionality; with regard to non-functional requirements, it would share the same concerns as traditional services.

When not to Microservice

Yoni Goldberg from Gilt covered this subject to some level. He basically said: do not start with Microservices, build them when your domain complexity warrants it. He went through his own experience and how they improved upon the ball of mud to nice discrete services and then how they exploded the number of services when their
So the takeaway (with some personal salt and pepper), I would say, is: do NOT consider Microservices if:
  • you do not have the organisation structure (small cross functional teams)
  • you are not practising Devops, automated build and deployment
  • you do not have (or cannot have) an uber monitoring system telling you exactly what is happening
  • you have to carry along a legacy database
  • your domain is not too big

Microservices is an evolutionary process

Randy Shoup explained how the process towards Microservice has been an evolutionary one, usually starting with the Monolith. So he stressed "Evolution, not intelligent design" and how in such an environment, Governance (oh yeah, all ye Enterprise Architects listen up) is not the same as traditional SOA and is decentralised with its adoption purely based on how useful a practice/ is.

Optimised message protocols now a must

Mentioned in more than a couple of talks, moving to ProtoBuf, Avro, Thrift or similar seems to be a must in all but trivial Microservice setups. One of the main performance challenges in MS scenarios is network latency and the cost of serialisation/deserialisation over and over across multiple hops, and JSON simply does not cut it anymore.

Source: Thrift vs Protobuf comparison (
Be ready to move your APIs to these message protocols - yes, you lose some simplicity benefits, but trading them off for performance is a necessary evil. Rest assured, nothing stops you from using JSON while developing and testing, but if your game is serious, start changing your protocols now - and I am too, the item is already added to the technical backlog.
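
To get a feel for the difference, here is a small sketch comparing a JSON payload with a fixed-schema binary encoding of the same message. Note that this is not ProtoBuf, Avro or Thrift: I am using Python's standard `struct` module as a stand-in, and the `order` message and its field layout are made up for illustration. The point is only that when both sides agree on a schema, field names and text formatting do not have to travel over the wire on every hop.

```python
import json
import struct

# Hypothetical message exchanged between two Microservices.
order = {"order_id": 123456, "customer_id": 987654, "quantity": 3, "unit_price": 19.99}

# Agreed schema: two unsigned 32-bit ints, one unsigned 16-bit int and
# one 64-bit float, little-endian with no padding.
SCHEMA = struct.Struct("<IIHd")


def encode_json(msg: dict) -> bytes:
    return json.dumps(msg).encode("utf-8")


def encode_binary(msg: dict) -> bytes:
    return SCHEMA.pack(msg["order_id"], msg["customer_id"], msg["quantity"], msg["unit_price"])


def decode_binary(data: bytes) -> dict:
    order_id, customer_id, quantity, unit_price = SCHEMA.unpack(data)
    return {"order_id": order_id, "customer_id": customer_id,
            "quantity": quantity, "unit_price": unit_price}


print("JSON bytes  :", len(encode_json(order)))
print("binary bytes:", len(encode_binary(order)))
```

The binary form here is 18 bytes and round-trips back to the same dictionary. Real protocols add field tags, optional fields and schema evolution on top, which is exactly what you would be buying by adopting one of them.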

What I was hoping to hear about and did not

Microservice registry and versioning best practices were not mentioned at all. I tried to quiz a few speakers on these but did not quite get a good answer. I suppose the space is up for grabs.

Need for Composition Services/APIs

As experienced personally, in an MS environment you end up with two different types of services: Functional Microservices, which own their data and are the authority in their business domain, and Composition APIs, which do not normally own any data and bring value by composing data from several other services - normally involving some level of business logic affecting the end user. In DDD terms, you could somehow find similarity with Facade services, and Yoni used the term "mid-tier services".

Composition services can bring a lot of value when it comes to caching, pagination of data and enriching the information. They practically scatter the requests and gather the results back and compose the result - Fan-out is another term used here.

By inherently depending on many services, they are notoriously susceptible to performance outliers (will be discussed in the second post) and failure scenarios, which might warrant a layered cache backed by soft storage with a higher expiry, used as a fallback in case a dependent service is down.
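
A composition service along these lines could be sketched as below. This is a minimal illustration, not how Gilt or anyone else does it: the service names, the `FALLBACK_CACHE` dictionary (standing in for the soft-storage cache) and the half-second timeout are all my assumptions.

```python
import asyncio

# Stand-in for the "soft storage" fallback cache: last known good result per service.
FALLBACK_CACHE = {}


async def call_with_fallback(name, fetch, timeout=0.5):
    """Call one downstream service; on timeout or failure, serve the cached value."""
    try:
        result = await asyncio.wait_for(fetch(), timeout)
        FALLBACK_CACHE[name] = result  # refresh the fallback copy
        return result
    except Exception:
        return FALLBACK_CACHE.get(name)  # may be None if never cached


async def compose(fetchers):
    """Scatter the calls to all services, gather the results and compose them."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(call_with_fallback(n, fetchers[n]) for n in names)
    )
    return dict(zip(names, results))


# Hypothetical downstream services: one healthy, one down.
async def product_service():
    return {"title": "Guitar"}


async def price_service():
    raise ConnectionError("pricing service is down")


FALLBACK_CACHE["price"] = {"amount": 100}  # pretend an earlier call succeeded
print(asyncio.run(compose({"product": product_service, "price": price_service})))
```

The composed response still contains a price even though the pricing service is down, at the cost of it being potentially stale. Whether stale data is acceptable is a business decision per field.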

In the next post, we will look into topics below. We will discover why Docker in fact is closely related to the Microservices - and it is not what you think! [Do I qualify now to become a BusinessInsider journalist?]
  • Those pesky performance outliers
  • Containers, containers
  • Don't beat the dead Agile
  • Extra large memory computing is now a thing