tag:blogger.com,1999:blog-28894168252502548812024-03-06T04:17:48.137+00:00Byte RotOverly Ambitious to lead a humble lifealiostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.comBlogger100125tag:blogger.com,1999:blog-2889416825250254881.post-41950298536761699322022-04-11T15:03:00.005+01:002022-04-23T13:58:47.849+01:00Sending UDP packets from docker container to localhost<p>It has been a long while since my last post, and now instead of a proper post, just a trivial note on an issue I had. Yet it might be the one saving hours for you out there - the hours I wasted yesterday trying so many different combinations just to find out that what I was trying to do was <i>impossible</i>.</p><p>So what is impossible?</p><p></p><blockquote>Sending UDP packets (and more generally: just communicating) from a docker container to a listener running on the <b>localhost</b> is impossible.</blockquote><p></p><p>What... surely this is a joke?!</p><p>This is no joke. I was running a UDP listener for a <a href="https://github.com/aliostad/TraceView">tiny project</a> I was working on and it was hosted on port 1969 (arguably the best port number, after the best year in rock music) with "localhost" as the host name (rather, the IP it was listening on). I needed to generate a lot of data to be sent, so I created a little docker container to send lots of UDP packets.</p><p>I changed the IP to <span style="font-family: courier;">0.0.0.0</span> - surely that would help? Nope.</p><p>Then I stumbled on <a href="https://stackoverflow.com/questions/43961530/by-default-can-a-docker-container-call-hosts-localhost-udp">this</a>, which essentially suggests that using the network mode of "host" should do the trick. That too was to no avail.</p><p>It made me doubt my solution, so I resorted to opening a bash shell inside the container and trying with <span style="font-family: courier;">netcat</span>. When the error persisted, I realised I was dealing with an impossible scenario.</p><p><br /></p><h3 style="text-align: left;">The solution</h3><p>Well, the solution in my case ended up being this: just give up, do not send to <i>localhost</i> - instead send to <span style="font-family: courier;">host.docker.internal</span>. </p><p>This might be a particularity of the MacOS implementation of docker, i.e. that you cannot reach the host at all (other posts suggested they could reach localhost), but all in all, in order for the container to reach the host networking, just use that name. It is mapped to 192.168.65.2 on Mac, and the Windows implementation of docker seems to use the same IP.</p><p>That is it.</p><p>I hope someone might find this useful.</p><p><br /></p>
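<p>For illustration, here is a minimal C# sketch of the sending side (my own example, not part of the project mentioned above): it fires a single datagram from inside the container at the listener on the host, using the port from this post and <span style="font-family: courier;">host.docker.internal</span> instead of localhost.</p>
<pre class="brush:csharp">// minimal sketch: send one UDP datagram from the container to the host
using System.Net.Sockets;
using System.Text;

var payload = Encoding.UTF8.GetBytes("hello from the container");
using (var client = new UdpClient())
{
    // "localhost" would never leave the container's own network namespace;
    // host.docker.internal resolves to the host as seen from the container
    client.Send(payload, payload.Length, "host.docker.internal", 1969);
}
</pre>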
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shCore.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shBrushJScript.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shBrushCSharp.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/_ga.js" type="text/javascript">
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-74030944540574739412019-08-13T22:35:00.002+01:002022-01-28T09:49:50.254+00:00What is fluentd and its support in PerfIt<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shCore.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shBrushJScript.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/js/shBrushCSharp.js" type="text/javascript">
</script>
<script src="https://softxnet.s3-eu-west-1.amazonaws.com/sh/_ga.js" type="text/javascript">
</script>
<br />
After a few years of several contending products and tools in the <em>tracing middleware</em> space, it seems the industry has settled on <a href="https://docs.fluentd.org/">fluentd</a>. I believe this is probably because it has maintained technology-neutrality when it comes to the ultimate destination of metrics and logs, i.e. the storage/visualisation/search/alerting aspect.<br />
<br />
The journey started with Etsy’s <a href="https://github.com/statsd/statsd/wiki">statsd</a> back in 2012. It allowed for the collection of metrics, mainly using UDP, which started an important if not revolutionary trend. On the other hand, the <a href="https://zipkin.io/">Zipkin</a> project came out of internal work at Twitter, inspired by Google’s <a href="https://ai.google/research/pubs/pub36356">Dapper paper</a>.<br />
<br />
<div class="has-line-data" data-line-end="8" data-line-start="7">
Zipkin was poised to become the universal Open Source distributed tracing solution. It certainly inspired the OpenTracing initiative, but the more OpenTracing grew, the more it dented the status of Zipkin, as OpenTracing improved on some of the shortcomings that Zipkin had kept in order to maintain backward compatibility. I actually worked with the Zipkin community and its amazing community leader, Adrian Cole, to bring Zipkin support to Azure Event Hubs. I also started adding support for Zipkin to the PerfIt project (I did not get a chance to use it in anger and I doubt anyone else did). Regardless of its merits, drawbacks and its future, it couples middleware with storage and search and is not a pure middleware.</div>
<div class="has-line-data" data-line-end="8" data-line-start="7">
<br /></div>
Another contender was LogStash, which then morphed into <a href="https://www.elastic.co/products/beats">Beats</a> and was the default choice for collecting metrics and logs in Elasticsearch. But then again, these technologies are optimised for use with Elasticsearch and Kibana.<br />
<br />
<div class="has-line-data" data-line-end="8" data-line-start="7"><br /></div><div class="has-line-data" data-line-end="10" data-line-start="9">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu2PWJBqAPBa2t5UBEmvyYmmZh17TRcOmfZifJ05t4trEhHvWoa0WjBXwGGsAeKZ7srDtRMrDssKFHT1xz3YSa0JzbsM2bwV2HrMk6FONtJjLH4SSINM7cdfMi_ZM0z9P0k-cqq0ftQKpp/s1600/Screenshot+2019-08-14+at+22.48.03.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="805" data-original-width="1600" height="322" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu2PWJBqAPBa2t5UBEmvyYmmZh17TRcOmfZifJ05t4trEhHvWoa0WjBXwGGsAeKZ7srDtRMrDssKFHT1xz3YSa0JzbsM2bwV2HrMk6FONtJjLH4SSINM7cdfMi_ZM0z9P0k-cqq0ftQKpp/s640/Screenshot+2019-08-14+at+22.48.03.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<br /></div>
<div class="has-line-data" data-line-end="10" data-line-start="9">
<br /></div>
<h2 class="code-line" data-line-end="12" data-line-start="11">
<a href="https://www.blogger.com/null" id="fluentd_is_ubiquitous_11"></a>fluentd is ubiquitous</h2>
<div class="has-line-data" data-line-end="13" data-line-start="12">
Now, with fluentd being a default choice with Kubernetes, it has understandably gained wide adoption. Having said that, I believe this says more about fluentd than about Kubernetes - there is so much to love about fluentd since it:</div>
<ul>
<li>Is simple and straightforward</li>
<li>Is Open Source</li>
<li class="has-line-data" data-line-end="17" data-line-start="16">Supports many inputs/transports (HTTP, UDP, file)</li>
<li class="has-line-data" data-line-end="18" data-line-start="17">Supports common output types (Elasticsearch, s3, kafka)</li>
<li class="has-line-data" data-line-end="19" data-line-start="18">Supports many formats/parsers (csv, json, nginx)</li>
<li class="has-line-data" data-line-end="20" data-line-start="19">Supports transformations and buffering</li>
<li class="has-line-data" data-line-end="21" data-line-start="20">Does not try to solve high availability and instead provides probes so that infrastructure can monitor</li>
<li class="has-line-data" data-line-end="23" data-line-start="21">Provides extension points using the plugin model</li>
</ul>
<div class="has-line-data" data-line-end="24" data-line-start="23">
So simply, it can accept logs, metrics and traces from your applications, while running as a Kubernetes service, a sidecar to your application or a simple daemon, and then push them to your destination of choice so your application does not have to worry about buffering, retries, destinations, etc. Using UDP, which is essentially fire and forget, you completely decouple trace collection from your code - with practically zero to little overhead (see the sketch after this paragraph).<br />
<br />
While it was initially targeted as Big Data transformation middleware, it is clear now that its strength will be in the observability space. That is why I believe it will be the glue binding producers and consumers of observability data for the coming years. And that is the main reason I felt I had to support it in PerfIt.</div>
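<div>
To make the fire-and-forget point concrete, here is a minimal hand-rolled sketch (my own illustration, not PerfIt code) of pushing a JSON-formatted event to fluentd over UDP. It assumes you have configured a fluentd UDP input with a json parser listening on port 5160; the field names are arbitrary.</div>
<pre class="brush:csharp">using System.Net.Sockets;
using System.Text;

// an arbitrary JSON event - fluentd's json parser turns it into a record
var json = "{\"category\":\"my app\",\"instance\":\"Sleep100\",\"tookMs\":103}";
var bytes = Encoding.UTF8.GetBytes(json);

using (var client = new UdpClient())
{
    // fire and forget: no response is awaited, so the application is neither
    // slowed down by nor coupled to the availability of the collector
    client.Send(bytes, bytes.Length, "localhost", 5160);
}
</pre>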
<div class="has-line-data" data-line-end="24" data-line-start="23">
<br /></div>
<h2 class="code-line" data-line-end="26" data-line-start="25">
<a href="https://www.blogger.com/null" id="PefIt_and_fluentd_25"></a>PerfIt and fluentd</h2>
<div class="has-line-data" data-line-end="27" data-line-start="26">
For those of you who are not familiar with PerfIt, I started the project more than 6 years ago for instrumenting .NET systems - in the absence of a similar tool. Initially it was meant only for custom Windows performance counters but it gradually grew to support ETW and finally, with .NET Core, other forms of transport such as Azure Event Hubs.</div>
<div class="has-line-data" data-line-end="29" data-line-start="28">
Now it also supports fluentd using the UDP transport.</div>
<div class="has-line-data" data-line-end="29" data-line-start="28">
<br /></div>
<h2 class="code-line" data-line-end="31" data-line-start="30">
<a href="https://www.blogger.com/null" id="Getting_started_30"></a>Getting started</h2>
<div class="has-line-data" data-line-end="32" data-line-start="31">
I have blogged in the past on using PerfIt but essentially you create an instrumentor and instrument a piece of code:</div>
<div class="has-line-data" data-line-end="32" data-line-start="31">
<pre class="brush:csharp">var si = new SimpleInstrumentor(new InstrumentationInfo()
{
Description = "Instruments your code!",
Name = "general",
CategoryName = "my app",
InstanceName = "Sleep100"
});
si.Instrument(() =>
{
Thread.Sleep(100);
});</pre>
<div>
You can also achieve this using AOP with attributes in ASP.NET Web Api or Core MVC. </div>
<br />
<div>
You can register tracers, which are essentially outputs/sinks for your captured metrics. Currently Azure EventHubs, Zipkin and fluentd are supported - the latter is the new addition:</div>
<pre class="brush:csharp">// add the tracer after creating instrumentor - sends UDP datagrams to localhost on port 5160
si.Tracers["fluentd"] = new UdpFluentdTracer("localhost", 5160);</pre>
That is really it! This means all your <i>sampled</i> traces will be sent to your fluentd. Easy peasy...
</div>
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-90843062026198046052019-05-18T15:55:00.001+01:002019-05-18T22:45:47.315+01:00Why WebAssembly Matters?<script src="https://s3-eu-west-1.amazonaws.com/softxnet/sh/js/shcore.js" type="text/javascript">
</script>
<script src="https://s3-eu-west-1.amazonaws.com/softxnet/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="https://s3-eu-west-1.amazonaws.com/softxnet/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="https://s3-eu-west-1.amazonaws.com/softxnet/sh/_ga.js" type="text/javascript">
</script>
WebAssembly, as a specification-first technology, is probably one of the few innovations of the web where the spec is not a retrospective documentation of the technology. The WebAssembly community group started in 2015 and, with version 1.0, nowadays it <strong>feels</strong> like it has crossed from an <em>experimental</em> status into a <em>production-ready first-generation</em> technology - not least because the same is claimed on its website.<br />
<br />
In the browser, execution of any application code other than JavaScript is immediately reminiscent of a pile of technologies of that kind which nowadays have little significance other than their historical one: Java applets, Adobe Flash and Microsoft Silverlight. It has been proven over and over that the <strong>Web Always Wins</strong>.<br />
<br />
This has recently sparked conversations and debates in the community on the merits, use cases and typical scenarios where you would use WebAssembly (and especially <a href="https://dotnet.microsoft.com/apps/aspnet/web-apps/client">Blazor</a> in the .NET community, although Blazor adds a <a href="https://docs.microsoft.com/en-gb/aspnet/core/blazor/?view=aspnetcore-3.0#blazor-server-side">server-side</a> scenario which simply is not WebAssembly).<br />
<img alt="WebAssembly" height="416" src="https://2r4s9p1yi1fa2jd7j43zph8r-wpengine.netdna-ssl.com/files/2017/02/04-02-langs08.png" width="640" /><br />
<br />
The classic view is that WebAssembly will allow C, C++ and Rust developers to <a href="https://hacks.mozilla.org/2017/02/creating-and-working-with-webassembly-modules/">join</a> the web’s cool party. For some, it is a game changer: it will revolutionise Web Application development - potentially even <a href="https://www.quora.com/Will-WebAssembly-replace-JavaScript/answer/Aaron-Martin-Colby">sideline Javascript</a>. For others, it has (albeit limited) use-cases for high-computation <a href="https://hacks.mozilla.org/2017/07/webassembly-for-native-games-on-the-web/">gaming</a>. Some propose that <a href="https://arxiv.org/pdf/1901.09388.pdf">AI on the browser</a> could be a growing market and enhance the capabilities of today’s and tomorrow’s web sites. And yet others feel it will <a href="https://news.ycombinator.com/item?id=11977739">die off</a> as it is competing with the web, and the web always wins.<br />
<br />
Of course, it is different from the compiled-code technologies of the past. Unlike them, it is an open standard adopted equally by all major players. It does not need a plugin to run - since, in effect, the plugin is distributed along with the browser.<br />
<br />
There is a level of truth to many of these statements: it will shine in special-case scenarios such as gaming and AI. But that is NOT why I think it matters, nor why I am writing these lines. I believe the main potential has <strong>nothing to do with gaming or AI</strong>: it is an <em>incidental feature</em> of WebAssembly, and in order to explain it I have to bring an example from the web.<br />
<br />
Do you remember Ajax? Ajax was a <a href="https://en.wikipedia.org/wiki/Ajax_(programming)#History">hack</a> by the Microsoft Outlook team to load emails asynchronously as they arrived - instead of reloading the page. Now it is impossible to think of the web without Ajax.<br />
<br />
Hence, <b>I claim that the most important feature of WebAssembly is its Security (an incidental feature): the fact that it runs in the same sandbox as the browser’s Javascript</b>. In fact, the C, C++ and Rust impact is just a minor distraction: companies will build (and already have built) adaptors for high-level languages (such as Java, C#, Python) to interact according to the WebAssembly specification.
<br />
<h2>
WebAssembly will revitalise desktop app development</h2>
When was the last time you installed an unknown app on your desktop (Windows or Mac)? In the early 2000s, viruses almost completely killed the desktop application market for everyone but the major vendors. For many smaller vendors, the only way to reach escape velocity and conquer the mass market is still to provide native mobile apps or a web-based application; only after years of gaining customers’ trust might one install their desktop app. Who would have just tried <a href="https://evernote.com/">Evernote</a> purely as a desktop application?<br />
<br />
Apple’s iOS (and then Android), with its rigorous app governance and opt-in ACL policy, created a sandbox that harnesses apps so that they have to ask for access to the resources they absolutely need. This was not an afterthought, but a design decision from day one. Major desktop operating systems do have the concepts of kernel mode vs user mode and file ACLs, but they are not designed for comprehensive opt-in ACLs. And that is why even the Mac app store feels pretty deserted, with tumbleweed rolling down the street, let alone Windows 8/10 whose app store is more like a joke.<br />
<br />
I strongly believe WebAssembly will become a <strong>vehicle for desktop application delivery</strong>. First, it is web-based, allowing the same levels of discoverability, ease of registration, monetisation/subscription and a seamless update mechanism. Second, it is 100% secure (or at least as secure as the websites we visit every day) and there is no longer a concept of installation. Third, it will make software rental very easy and will allow for the growth of SaaS for compute-intensive applications.<br />
<br />
While <a href="https://electronjs.org/">Electron</a> applications provided ease of developing desktop applications using Web toolkit, WebAssembly will turn this approach on its head and bring native performance to the web.<br />
<br />
The web is ubiquitous, and WebAssembly will conquer desktop applications.
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-61340738931391408832018-06-04T18:12:00.000+01:002018-06-04T20:22:07.153+01:00CacheCow.Server 2.0: Using it on ASP.NET Core MVC <script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
CacheCow 2.0 Series:<br />
<ul>
<li>Part 1 - <a href="http://byterot.blogspot.co.uk/2018/05/cachecow-20-is-here-supporting-netcore-netstandard-aspnetcore-httpclient-aspnetwebapi-etag.html" target="_blank">CacheCow 2.0 is here - supporting .NET Standard and ASP.NET Core MVC</a></li>
<li>Part 2 - <a href="http://byterot.blogspot.com/2018/05/cachecowclient-20-http-caching-for-your-api-http-httpclient-caching-cachecow-etag-dotnetcore-aspnetcore.html" target="_blank">CacheCow.Client 2.0: HTTP Caching for your API Calls</a></li>
<li>Part 3 - CacheCow.Server 2.0: Using it on ASP.NET Core MVC [This post]</li>
<li>Part 4 - CacheCow.Server for ASP.NET Web API [Coming Soon]</li>
<li>Epilogue: side-learnings from supporting Core [Coming]</li>
</ul>
<ol>
</ol>
<h2>
HTTP Caching on the server</h2>
In HTTP Caching, the server has two main responsibilities:<br />
<ul>
<li>Supplying cache directives - including <span style="font-family: "courier new" , "courier" , monospace;">Cache-Control</span> (and <span style="font-family: "courier new" , "courier" , monospace;">ETag</span>, <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span> and <span style="font-family: "courier new" , "courier" , monospace;">Vary</span>) headers</li>
<li>Performing conditional validating requests</li>
</ul>
<br />
These two have been explained in full in <a href="http://byterot.blogspot.com/2018/05/cachecow-20-is-here-supporting-netcore-netstandard-aspnetcore-httpclient-aspnetwebapi-etag.html" target="_blank">Part 1</a> of this series (It might be useful to have a quick review of that post to re-cap and refresh your memories).<br />
<br />
In CacheCow 1.x, it was assumed that resources could be updated only through the API, i.e. all modifications had to go through the API - and if modifications were done outside, the modifying agent was responsible for invalidating the API cache: obviously this is not always possible or architecturally correct. As I explained in Part 1, this was a big assumption and even with such an assumption in place, the interaction between various resources made cache invalidation very difficult for the API layer: a change to a single resource invalidates its collection resource and, in the case of hierarchical resources, a change to a leaf resource invalidates resources higher up.<br />
<br />
CacheCow 2.x solves these problems by making the data and data providers take part in cache validation through a couple of new abstractions. The following two sections are essential to understand the full value you get from CacheCow.Server, apart from setting the <span style="font-family: "courier new" , "courier" , monospace;">Cache-Control</span> header which is frankly not rocket science. But if you get bored and want to see some code, feel free to skip to the Getting Started section and come back to these later on.<br />
<br />
<h2>
ASP.NET Core supports HTTP Caching, why not use that instead?</h2>
ASP.NET Core has provided several primitives for <a href="https://docs.microsoft.com/en-us/aspnet/core/performance/caching/response?view=aspnetcore-2.1#http-based-response-caching" target="_blank">HTTP Caching</a> to:<br />
<br />
<ul>
<li>Generate cache directives (and respond to conditional requests)</li>
<li><a href="https://docs.microsoft.com/en-us/aspnet/core/performance/caching/middleware?view=aspnetcore-2.1" target="_blank">Caching middleware</a> to cache representations on the server (an in-process HTTP proxy)</li>
</ul>
<br />
You are, of course, free to use these additional primitives. At the end of the day, <i>generating</i> cache directives requires a bunch of if/else statements and reading from some config (for the <span style="font-family: "courier new" , "courier" , monospace;">max-age</span>, etc). But bear in mind these points:<br />
<ul>
<li>ASP.NET Core caching has started where CacheCow 0.x was. It essentially suffers from some of the same issues as CacheCow 0.x/1.x, where the API layer is unaware of cache invalidation when it happens outside the API layer or when there is a relationship between different resources. In fact CacheCow 1.x was capable of understanding the relationship between single and collection resources as long as you adhered to a simple naming convention, and it also provided an invalidation mechanism. AFAIK these do NOT exist in ASP.NET Core caching and this could be potentially harmful if you care about <i>consistency</i> of your resources. (Consistency was explained in Part 1)</li>
<li> CacheCow.Server now provides the constructs to make the data source (and not the API) <i>the authority</i> with regard to <span style="font-family: "courier new" , "courier" , monospace;">ETag</span> and conditional validating calls - as is normally the case. This also solves the problem of cache invalidation when data is modified but not through the API - perhaps the arrival of new batch data, or when the underlying data store is exposed via different mechanisms, which is common in the industry.</li>
<li>To my knowledge, caching middleware does not do validation. Again if you care about consistency of your resources then this is probably not for you. </li>
</ul>
<br />
<h2>
The new shiny CacheCow.Server 2.x</h2>
<div>
As explained, CacheCow.Server moves control of cache validation to where it belongs: your data and data providers. In order to make that seamless and not pollute your API, these concerns have been abstracted away into several constructs. These are explained in more detail on <a href="https://github.com/aliostad/CacheCow#concepts-and-definitions" target="_blank">github</a>, so check that out if you need the next level of detail.</div>
<div>
<br /></div>
<h4>
ITimedETagExtractor</h4>
<div>
CacheCow blends the versioning identity of a resource into a TimedETag (an Either of <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span> or <span style="font-family: "courier new" , "courier" , monospace;">ETag</span>). <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor</span> extracts the TimedETag from a resource. Extraction by default uses a <i>SHA1</i> hash of the json-serialised response payload (which has understandable overhead) but it would be very easy for you to provide an alternative mechanism for extracting the TimedETag. Many database tables have a timestamp column which you can use as <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span>. But since HTTP dates do not have sub-second accuracy, which might mean missing updates done within the same second, it is best to turn that into an opaque hash-like value used as an <span style="font-family: "courier new" , "courier" , monospace;">ETag</span> - e.g. <a href="https://github.com/aliostad/CacheCow/blob/master/samples/CacheCow.Samples.Common/Extensions.cs#L10" target="_blank">this</a>:</div>
<pre style="brush: csharp;">public static string TurnDatetimeOffsetToETag(DateTimeOffset dateTimeOffset)
{
var dateBytes = BitConverter.GetBytes(dateTimeOffset.UtcDateTime.Ticks);
var offsetBytes = BitConverter.GetBytes((Int16)dateTimeOffset.Offset.TotalHours);
return Convert.ToBase64String(dateBytes.Concat(offsetBytes).ToArray());
}
</pre>
<div>
For <i>collection</i> resources (e.g. orders vs. order), where there are a number of LastUpdated fields, all you need is <b>Max(LastUpdated)</b> and the <b>total count</b>. You can combine these two values into a byte array buffer and serve the base64 representation directly or, if you want it completely opaque, use its hash (a quick sketch follows below).<br />
<br />
You can implement <span style="font-family: "courier new" , "courier" , monospace;"><a href="https://github.com/aliostad/CacheCow/blob/master/src/CacheCow.Server/ETag/ICacheResource.cs" target="_blank">ICacheResource</a></span> interface on your view models (payloads you return from your actions) to extract the value and return TimedETag - it is a very simple interface with a single method.</div>
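<div>
For illustration, a minimal sketch of that collection case (my own helper, not a CacheCow API) could look like this - it simply packs the count and the latest timestamp into a buffer and base64-encodes it:</div>
<pre class="brush:csharp">// illustrative helper (not part of CacheCow): builds an opaque ETag value
// for a collection from its total count and its most recent LastUpdated
public static string TurnCollectionInfoToETag(int count, DateTimeOffset maxLastUpdated)
{
    var buffer = BitConverter.GetBytes(count)
        .Concat(BitConverter.GetBytes(maxLastUpdated.UtcDateTime.Ticks))
        .ToArray();
    return Convert.ToBase64String(buffer);
}
</pre>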
<br />
<h4>
ITimedETagQueryProvider</h4>
<div>
Generating the TimedETag from the resources is all well and good, but it means that we have to bring the resource all the way to the API layer to generate the TimedETag. This is completely acceptable if we are then serving the resource. But what if we are doing cache validation for a client asking us to respond with the whole resource only if things have changed (and a NotModified 304 status otherwise)? In that case, we might load the whole resource only to find out it has not changed and waste computation effort - and the pressure on the backend systems stays the same regardless of whether the resource was modified or not. Bear in mind, in this case there is still some saving on network and computing, but surely we can do better than this.</div>
<div>
<br /></div>
<div>
The solution is to use <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagQueryProvider</span> to preemptively query your backend for the current status of the resource, i.e. its TimedETag. Usually getting that piece of information about the resource is much cheaper than returning the whole resource. For a <i>single</i> resource, you just need, e.g., the timestamp field and for the <i>collection</i> resource, Max(timestamp) and count - if you are using an RDBMS, all of this can be conveniently achieved in a single query, e.g. "SELECT COUNT(1), MAX(LastUpdated) FROM MyTable WHERE IsActive = 1". </div>
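<div>
As a rough sketch of that "cheap query" (again my own illustration using plain ADO.NET against the hypothetical MyTable above, reusing the collection helper sketched earlier - not a CacheCow construct):</div>
<pre class="brush:csharp">// using System.Data.SqlClient - queries only the count and the latest
// timestamp, never the rows themselves
public static async Task<string> QueryOrdersCollectionETagAsync(SqlConnection connection)
{
    const string sql = "SELECT COUNT(1), MAX(LastUpdated) FROM MyTable WHERE IsActive = 1";
    using (var command = new SqlCommand(sql, connection))
    using (var reader = await command.ExecuteReaderAsync())
    {
        await reader.ReadAsync();
        var count = reader.GetInt32(0);
        var maxLastUpdated = reader.IsDBNull(1)
            ? DateTimeOffset.MinValue
            : new DateTimeOffset(reader.GetDateTime(1), TimeSpan.Zero);
        return TurnCollectionInfoToETag(count, maxLastUpdated);
    }
}
</pre>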
<div>
<br />
<h3>
Understanding trade-offs of various approaches</h3>
</div>
<div>
All of the above said, you do not necessarily have to implement <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor</span> and <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagQueryProvider</span>. In fact you can use CacheCow.Server out of the box and it will fulfil all server-side caching duties. The point is that if you would like optimal performance, you have got to do a bit more work. The table below explains your various options and the benefits you get.</div>
<div>
<br /></div>
<div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0YIOzkaTw2SJnMCHVbgzewaB2rD566RnhX9FDbFfELnd6QR977PFKZni6G7bjwmYc_BV3criNWkrbROaMIFBgAmwNOCarEWUYfQNDkODtUOcsAS1E73LFmRUWTJciB48-pF_5ODB-zK7J/s1600/CacheCow-2-options.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="874" data-original-width="1600" height="349" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0YIOzkaTw2SJnMCHVbgzewaB2rD566RnhX9FDbFfELnd6QR977PFKZni6G7bjwmYc_BV3criNWkrbROaMIFBgAmwNOCarEWUYfQNDkODtUOcsAS1E73LFmRUWTJciB48-pF_5ODB-zK7J/s640/CacheCow-2-options.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Table 1: CacheCow.Server - trade-offs and options</td></tr>
</tbody></table>
<br /></div>
<h3>
No need for storage anymore</h3>
<div>
CacheCow.Server 1.x needed some storage to keep the current TimedETag of each resource. Now that the TimedETag is generated or queried on demand, there is no longer such a need. All solutions to do with EntityTagStore in CacheCow.Server have been removed from the repo.</div>
<div>
<br /></div>
<h2>
Getting started with CacheCow.Server on ASP.NET Core MVC</h2>
<div>
The documentation on <a href="https://github.com/aliostad/CacheCow#getting-started---aspnet-mvc-core" target="_blank">github</a> is pretty clear, I believe, but for the sake of completeness I am bringing some of it here too. This covers the basic case with the default implementations.<br />
<br />
Essentially, all you need is a filter to decorate your actions, specifying the cache expiry duration in seconds. There are a bunch of other knobs but at this point, let's focus on the default scenario.</div>
<div>
<br /></div>
<h4>
1. Add the package from nuget </h4>
<div>
In your package-manager console, type the following:</div>
<pre>PM> install-package CacheCow.Server.Core.Mvc</pre>
<div>
<h4>
2. Add CacheCow's default dependencies</h4>
<pre style="brush: csharp;">public virtual void ConfigureServices(IServiceCollection services)
{
... // usual startup code
services.AddHttpCachingMvc(); // add HTTP Caching for Core MVC
}
</pre>
</div>
<div>
<h4>
3. Decorate the action with the HttpCacheFactory filter</h4>
<pre style="brush: csharp;">public class MyController : Controller
{
[HttpGet]
[HttpCacheFactory(300)]
public IActionResult Get(int id)
{
... // implementation
}
}
</pre>
Here we are defining the expiry to be 300 seconds (= 5 minutes). This means the client will cache the result for 5 minutes and after 5 minutes will keep asking if the resource has changed using conditional GET requests (see Part 2 for more info).<br />
<br />
<h4>
4. Check all is working</h4>
That should be all you need to be up and running. Now make a call to your API and you should see the <span style="font-family: "courier new" , "courier" , monospace;">Cache-Control</span> header. You can use postman, fiddler or any other tool... you will basically see something like this:<br />
<pre>Vary: Accept
ETag: "SPQT7RzH1QgBAAEAAAA="
Cache-Control: must-revalidate, max-age=300, private
x-cachecow-server: validation-applied=True;⏎
validation-matched=False;short-circuited=False;query-made=True
Date: Thu, 31 May 2018 17:35:42 GMT
</pre>
As you can see, CacheCow has added <span style="font-family: "courier new" , "courier" , monospace;">Vary</span>, <span style="font-family: "courier new" , "courier" , monospace;">ETag</span> and <span style="font-family: "courier new" , "courier" , monospace;">Cache-Control</span>. There is also a diagnostic header, <span style="font-family: "courier new" , "courier" , monospace;">x-cachecow-server</span>, that explains what CacheCow has performed to generate the response.<br />
<br />
Now you can test the conditional case by sending a GET request with the header below:<br />
<pre>If-None-Match: "SPQT7RzH1QgBAAEAAAA="
</pre>
And the server will respond with 304 if your resource has not changed.<br />
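If you would rather script this than use postman or fiddler, a quick <span style="font-family: "courier new" , "courier" , monospace;">HttpClient</span> sketch does the same (the URL is a placeholder for your own endpoint, and the ETag value is whatever your first response returned):<br />
<pre class="brush:csharp">var client = new HttpClient();
// placeholder URL - point this at your own action decorated with HttpCacheFactory
var request = new HttpRequestMessage(HttpMethod.Get, "https://localhost:5001/api/my/1");
request.Headers.TryAddWithoutValidation("If-None-Match", "\"SPQT7RzH1QgBAAEAAAA=\"");
var response = await client.SendAsync(request);
Console.WriteLine(response.StatusCode); // NotModified if the resource has not changed
</pre>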
<br />
<h2>
More complex scenarios</h2>
<div>
Before we go into more details, it might be useful to go to CacheCow's github repo and review the ASP.NET <a href="https://github.com/aliostad/CacheCow/tree/master/samples/CacheCow.Samples.MvcCore" target="_blank">Core MVC sample</a>. Build and run it, play around and browse the code. This will make the discussions below closer to home as it details how to cater for various scenarios.<br />
<br />
Table 1 (further above) is your guide in deciding which interface to implement.</div>
<br />
<h4>
Implementing ITimedETagExtractor or ICacheResource</h4>
As mentioned above, serialisation is a heavy-handed approach to generating the TimedETag. While OK for low-to-mid levels of load, for high performance you would be best off either implementing <span style="font-family: "courier new" , "courier" , monospace;">ICacheResource</span> on your view models (what you return back from your action) or, if you do not want a dependency on a caching library in your view models, implementing <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor</span> to extract the TimedETag from your view models.<br />
<br />
If you implement <span style="font-family: "courier new" , "courier" , monospace;">ICacheResource</span>, you do not have to register anything additional, but if you implement <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor</span> for your view models, you have to register them.<br />
<br />
There are examples on the <a href="https://github.com/aliostad/CacheCow/tree/master/samples/CacheCow.Samples.MvcCore" target="_blank">samples</a>.<br />
<br />
<h4>
Implementing ITimedETagQueryProvider</h4>
By implementing <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagQueryProvider</span>, you protect your backend system so that cache validation can be achieved without bringing the view model all the way to the API layer to extract/generate the TimedETag.<br />
<br />
There are examples on the <a href="https://github.com/aliostad/CacheCow/tree/master/samples/CacheCow.Samples.MvcCore" target="_blank">samples</a>.<br />
<br />
<h4>
Dependency Injection and differentiation of ViewModels</h4>
<div>
Implementing <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagQueryProvider</span> or <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor</span> for different view models most likely involves different code. Since normally only a single implementation is registered against an interface, such an implementation would have to check the type and then apply the appropriate code, which breaks several programming principles.</div>
<div>
<br /></div>
<div>
You can instead implement and register the generic interfaces<span style="font-family: "courier new" , "courier" , monospace;"> ITimedETagQueryProvider<TViewModel></span> and <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor<TViewModel></span>. Then, in your filter, annotate the type of the view model. For example:</div>
<pre class="brush:csharp">[HttpGet]
[HttpCacheFactory(0, ViewModelType = typeof(Car))]
public IActionResult Get(int id)
{
var car = _repository.GetCar(id);
return car == null
? (IActionResult)new NotFoundResult()
: new ObjectResult(car);
}
</pre>
<div>
This means that you have implemented <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagExtractor<Car></span> and <span style="font-family: "courier new" , "courier" , monospace;">ITimedETagQueryProvider<Car></span> and registered them in your IoC.</div>
<br />
You would register these in your application using extension methods in CacheCow (depending on which interfaces you have implemented):</div>
<pre class="brush:csharp">public virtual void ConfigureServices(IServiceCollection services)
{
... // register stuff
services.AddQueryProviderForViewModelMvc<TestViewModel, TestViewModelQueryProvider>();
services.AddQueryProviderForViewModelMvc<IEnumerable<TestViewModel>, TestViewModelCollectionQueryProvider>();
}
</pre>
<br />
Other options for registering implementations are: <span style="font-family: "courier new" , "courier" , monospace;">AddExtractorForViewModelMvc</span>, <span style="font-family: "courier new" , "courier" , monospace;">AddSeparateDirectiveAndQueryProviderForViewModelMvc</span> or <span style="font-family: "courier new" , "courier" , monospace;">AddDirectiveProviderForViewModelMvc</span>. Some of these extension methods are essentially helpers that combine registration of multiple types.<br />
<br />
<div>
<h2>
Conclusions</h2>
</div>
<div>
CacheCow.Server now relies on the data and data providers to take part in TimedETag generation and cache validation, instead of storing and maintaining the TimedETag and making guesses about cache validation. This removes the need for storage and makes CacheCow a reliable solution capable of providing caching with air-tight consistency. </div>
<div>
<br /></div>
<div>
ASP.NET Core's HTTP Caching features are a good start but they lack some fundamental capabilities, thus I advise you to use CacheCow.Server instead - although I cannot guarantee that my views, as the creator of CacheCow, are free of bias - just try it, see for yourself and pick what works for you.</div>
<br />
<br />
<br />
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-77967018262547099422018-05-16T18:27:00.000+01:002018-06-04T18:18:28.258+01:00CacheCow.Client 2.0: HTTP Caching for your API Calls<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
CacheCow 2.0 Series:<br />
<ul>
<li>Part 1 - <a href="http://byterot.blogspot.co.uk/2018/05/cachecow-20-is-here-supporting-netcore-netstandard-aspnetcore-httpclient-aspnetwebapi-etag.html" target="_blank">CacheCow 2.0 is here - supporting .NET Standard and ASP.NET Core MVC</a></li>
<li>Part 2 - CacheCow.Client 2.0 [This post]</li>
<li>Part 3 - <a href="http://byterot.blogspot.com/2018/06/cachecowserver-20-using-it-on-aspnetcoremvc-dotnetcore-etag-http-caching-aspnetcore.html" target="_blank">CacheCow.Server 2.0: Using it on ASP.NET Core MVC</a></li>
<li>Part 4 - CacheCow.Server for ASP.NET Web API [Coming Soon]</li>
<li>Epilogue: side-learnings from supporting Core [Coming]</li>
</ul>
<ol>
</ol>
<div>
<br /></div>
<h3>
State of Client HTTP Caching in .NET</h3>
<div>
Before CacheCow, the only way to use HTTP caching was to use Windows/IE caching through <span style="font-family: "courier new" , "courier" , monospace;">WebRequestHandler</span> as Darrel explains <a href="http://www.bizcoder.com/httpclient-it-lives-and-it-is-glorious" target="_blank">here</a>. AFAIK, this class no longer exists in .NET Standard due to its tight coupling with Windows/IE implementations.</div>
<div>
<br /></div>
<div>
I set out to build a store-independent caching story in .NET around 6 years ago and named it CacheCow, and after all these years I am still committed to maintaining that effort.<br />
<br />
Apart from full-blown HTTP Caching, I had other ambitions in the beginning; for example, I had plans to let you limit caching per domain, etc. It became evident that this was neither a critical feature nor possible with all storage technologies: the underlying data structure requirement for cache storage is key-value, while this feature required more complex querying. I did implement it for some storages but it never really got used. That is why I no longer pursue this feature and it has been removed from CacheCow.Client 2.0. It is evident that, unlike browsers, virtually all HttpClient instances communicate with only a handful of APIs, and storage in this day and age is hardly a problem.</div>
<div>
<br /></div>
<h3>
CacheCow.Client Features</h3>
<div>
The features of CacheCow 2.0 are pretty much unchanged since 1.x, other than that it now supports .NET Standard 2.0+, hence you can use it with .NET Core and on platforms other than Windows (Linux/Mac).</div>
<div>
<br /></div>
<div>
In brief:</div>
<div>
<ul>
<li>Supports .NET 4.52+ and .NET Standard 2.0+</li>
<li>Stores cacheable responses</li>
<li>Supports In-Memory and Redis storages - SQL is coming too (and it is easy to build your own)</li>
<li>Manages separate query/storage of representations according to the server's <span style="font-family: "courier new" , "courier" , monospace;">Vary</span> header</li>
<li>Makes validating GET calls to validate the cache after expiry</li>
<li>Makes conditional PUT calls to modify a resource only if it has not changed since (can be turned off)</li>
<li>Exposes a diagnostic<span style="font-family: "courier new" , "courier" , monospace;"> x-cachecow-client</span> header to inform you of the caching result</li>
</ul>
</div>
<div>
Using CacheCow.Client is effortless and there are hardly any knobs to adjust - it hides away all the caching cruft that can get in the way of consuming an API efficiently.<br />
<br />
CacheCow.Client has been created as a <span style="font-family: "courier new" , "courier" , monospace;">DelegatingHandler</span> that needs to be added to the <span style="font-family: "courier new" , "courier" , monospace;">HttpClient</span>'s HTTP pipeline to intercept the calls. We will look at some typical use cases.<br />
<br /></div>
<h3>
Basic Use Case</h3>
<div>
Let's imagine you have a service that needs to consume a cacheable resource and you are using <span style="font-family: "courier new" , "courier" , monospace;">HttpClient</span>. Here are the steps to follow:<br />
<br /></div>
<div>
<h4>
Add a Nuget dependency to CacheCow.Client</h4>
</div>
<div>
Use command-line or UI to add a dependency to CacheCow.Client version 2.x:<br />
<pre class="tr_bq">> install-package CacheCow.Client</pre>
<br />
<h4>
Create an HttpClient </h4>
CacheCow.Client provides a helper method to create an HttpClient with caching enabled:<br />
<pre class="brush:csharp">var client = ClientExtensions.CreateClient();</pre>
All this does is create an HttpClient with CacheCow's <span style="font-family: "courier new" , "courier" , monospace;">CachingHandler</span> added to the pipeline, with an <span style="font-family: "courier new" , "courier" , monospace;">HttpClientHandler</span> as its inner handler.<br />
<br />
You can pass the cache store (an implementation of <span style="font-family: "courier new" , "courier" , monospace;">ICacheStore</span>) in an overload of this method but here we are going to use the default In-Memory store suitable for our use case.<br />
<br />
<h4>
Make two calls to the cacheable resource</h4>
Now we make a GET call to get a cacheable resource and then another call to get it again. From examining the CacheCow header we can ascertain that the second response came directly from the cache and never even hit the network.<br />
<pre class="brush:csharp">const string CacheableResource = "https://code.jquery.com/jquery-3.3.1.slim.min.js";
var response = client.GetAsync(CacheableResource).
ConfigureAwait(false).GetAwaiter().GetResult();
var responseFromCache = client.GetAsync(CacheableResource).
ConfigureAwait(false).GetAwaiter().GetResult();
Console.WriteLine(response.Headers.GetCacheCowHeader().ToString());
// outputs "2.0.0.0;did-not-exist=true"
Console.WriteLine(responseFromCache.Headers.GetCacheCowHeader().ToString());
// outputs "2.0.0.0;did-not-exist=false;retrieved-from-cache=true"
</pre>
<br />
<h3>
Using alternative storages - Redis</h3>
If you have 10 boxes calling an API and they are using an In-Memory store, the response will have to be cached separately on each box and the origin server will potentially be hit 10 times. Also, due to this dispersion, the usefulness of the cache is reduced and you will see a lower cache hit ratio.<br />
<br />
In high-throughput scenarios you would want to use a distributed cache such as Redis. CacheCow.Client 1.x used to support Azure Fabric Cache (discontinued by Microsoft), two versions of Memcached, SQL Server, ElasticSearch, MongoDB and even File. Starting with 2.x, new storages will be added only when they absolutely make sense. There is a plan to migrate the SQL Server storage but as for the others, there are currently no such plans. Having said that, it is very easy to implement your own and we will look into this further down in this post (I have chosen LMDB, a super fast file-based storage by <a href="https://twitter.com/hyc_symas" target="_blank">Howard Chu</a>).<br />
<br />
For this case, we would like to use Redis storage. In case you do not have access to an instance of Redis, you can download (<a href="https://redis.io/download" target="_blank">Mac/Linux</a> or <a href="https://github.com/MicrosoftArchive/redis/releases" target="_blank">Windows</a>) and run Redis locally without installation.<br />
<br />
<h4>
Add a dependency to Redis store package</h4>
After running your Redis (or perhaps creating one in the cloud), add a dependency to CacheCow.Client.RedisCacheStore:<br />
<pre class="tr_bq">> install-package CacheCow.Client.RedisCacheStore</pre>
<div>
<br /></div>
<h4>
Create an HttpClient with a Redis store</h4>
We use the <span style="font-family: "courier new" , "courier" , monospace;">ClientExtensions</span> to create a client with a Redis store - here it connects to a local cache:<br />
<br />
<pre class="brush:csharp">var client = ClientExtensions.CreateClient(new RedisStore("localhost")); </pre>
<br />
The CacheCow.Client.RedisCacheStore library uses <a href="https://github.com/StackExchange/StackExchange.Redis" target="_blank">StackExchange.Redis</a>, the de-facto Redis client library in .NET, hence it can accept a connection string according to StackExchange.Redis conventions, as well as an <span style="font-family: "courier new" , "courier" , monospace;">IDatabase</span>, etc., to initialise the store.<br />
<br />
<h4>
Make two calls to a cacheable resource</h4>
The rest of the code is the same as in the In-Memory scenario: making two HTTP calls to the same cacheable resource and observing the CacheCow headers - see above.<br />
<br />
<h3>
Cache Validation</h3>
Cacheable resources provide a <i>validator</i> so that the client can validate whether the version it has is still current. This was explained in the <a href="http://byterot.blogspot.co.uk/2018/05/cachecow-20-is-here-supporting-netcore-netstandard-aspnetcore-httpclient-aspnetwebapi-etag.html" target="_blank">previous post</a>, but essentially the representation's <span style="font-family: "courier new" , "courier" , monospace;">ETag</span> (or <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span>) header gets used to validate the cached resource with the server. CacheCow.Client already does this for you so you do not have to worry about it.<br />
<br />
Another aspect of cache validation is on PUT calls, so that the resource gets modified only if it has not changed since you received it. This is essentially optimistic concurrency, which is beautifully implemented in HTTP using validators. CacheCow.Client does this by default but there is a property on <span style="font-family: "courier new" , "courier" , monospace;">CachingHandler</span> if you need to turn it off. In case you wish to do so (or to change any other aspect of the <span style="font-family: "courier new" , "courier" , monospace;">CachingHandler</span>), create the client without the ClientExtension:<br />
<pre class="brush:csharp">var handler = new CachingHandler()
{
InnerHandler = new HttpClientHandler(),
UseConditionalPut = false
};
var c = new HttpClient(handler);</pre>
There are a bunch of other knobs provided for some edge cases so that you can modify the default behaviour, but they are pretty self-explanatory and not worth going into in much detail. Just browse the public properties of <span style="font-family: "courier new" , "courier" , monospace;">CachingHandler</span>; GitHub or StackOverflow are the best places to discuss if you have a question.</div>
<div>
<br /></div>
<div>
<h2>
Supporting other storages - implementing ICacheStore for LMDB</h2>
CacheCow separates the storage from the HTTP caching functionality, hence it is possible to plug in your own storage with a few lines of code.<br />
<br />
ICacheStore is a simple interface with 4 async methods:<br />
<pre class="brush:csharp">public interface ICacheStore : IDisposable
{
Task<HttpResponseMessage> GetValueAsync(CacheKey key);
Task AddOrUpdateAsync(CacheKey key, HttpResponseMessage response);
Task<bool> TryRemoveAsync(CacheKey key);
Task ClearAsync();
}</pre>
LMDB is a lightning-fast database (as the name implies) that has support in .NET, thanks to Corey Kaylor and his OSS project <a href="https://github.com/CoreyKaylor/Lightning.NET" target="_blank">Lightning.NET</a>. The project needs some more love and care - fixing some of the build issues and updating to the latest frameworks - but it is great work.<br />
<br />
This scenario is useful especially if you need a local persistent store.<br />
<br />
The implementation is pretty straightforward: we use the <span style="font-family: "courier new" , "courier" , monospace;">Put</span>, <span style="font-family: "courier new" , "courier" , monospace;">Get</span>, <span style="font-family: "courier new" , "courier" , monospace;">Delete</span> and <span style="font-family: "courier new" , "courier" , monospace;">TruncateDatabase</span> methods of <span style="font-family: "courier new" , "courier" , monospace;">LightningTransaction</span> to implement the <span style="font-family: "courier new" , "courier" , monospace;">AddOrUpdateAsync</span>, <span style="font-family: "courier new" , "courier" , monospace;">GetValueAsync</span>, <span style="font-family: "courier new" , "courier" , monospace;">TryRemoveAsync</span> and <span style="font-family: "courier new" , "courier" , monospace;">ClearAsync</span> functionality. For <span style="font-family: "courier new" , "courier" , monospace;">Dispose</span>, we just need to dispose the lightning environment.<br />
<br />
Here is a pretty typical implementation:<br />
<pre class="brush:csharp">using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using CacheCow.Client;
using CacheCow.Client.Headers;
using CacheCow.Common;
using LightningDB;
namespace CacheCow.Client.Lightning
{
public class LightningStore : ICacheStore
{
private readonly LightningEnvironment _environment;
private readonly string _databaseName;
private readonly MessageContentHttpMessageSerializer _serializer = new MessageContentHttpMessageSerializer();
public LightningStore(string path, string databaseName = "CacheCowClient")
{
_environment = new LightningEnvironment(path);
_environment.MaxDatabases = 1;
_environment.Open();
_databaseName = databaseName;
}
public async Task AddOrUpdateAsync(CacheKey key, HttpResponseMessage response)
{
var ms = new MemoryStream();
await _serializer.SerializeAsync(response, ms);
using (var tx = _environment.BeginTransaction())
using (var db = tx.OpenDatabase(_databaseName,
new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create }))
{
tx.Put(db, key.Hash, ms.ToArray());
tx.Commit();
}
}
public Task ClearAsync()
{
using (var tx = _environment.BeginTransaction())
using (var db = tx.OpenDatabase(_databaseName,
new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create }))
{
tx.TruncateDatabase(db);
tx.Commit();
}
return Task.CompletedTask;
}
public void Dispose()
{
_environment.Dispose();
}
public async Task<HttpResponseMessage> GetValueAsync(CacheKey key)
{
using (var tx = _environment.BeginTransaction())
using (var db = tx.OpenDatabase(_databaseName,
new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create }))
{
var data = tx.Get(db, key.Hash);
if (data == null || data.Length == 0)
return null;
var ms = new MemoryStream(data);
return await _serializer.DeserializeToResponseAsync(ms);
}
}
public Task<bool> TryRemoveAsync(CacheKey key)
{
using (var tx = _environment.BeginTransaction())
using (var db = tx.OpenDatabase(_databaseName,
new DatabaseConfiguration { Flags = DatabaseOpenFlags.Create }))
{
tx.Delete(db, key.Hash);
tx.Commit();
}
return Task.FromResult(true);
}
}
}</pre>
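<div>
With that in place, wiring it up is the same as before - a quick sketch (the cache directory path here is arbitrary):</div>
<pre class="brush:csharp">// plug the custom LMDB-backed store into the caching pipeline
var client = ClientExtensions.CreateClient(new LightningStore("cachecow-data"));
var response = client.GetAsync("https://code.jquery.com/jquery-3.3.1.slim.min.js")
    .ConfigureAwait(false).GetAwaiter().GetResult();
</pre>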
<h2>
Conclusion</h2>
CacheCow.Client is simple and straightforward to get started with. It supports In-Memory and Redis storages, and a storage of your choice can be plugged in with a handful of lines of code - here we demonstrated that for LMDB. It is capable of carrying out GET and PUT validation, making your client more efficient and your data more consistent.<br />
<br />
In the next post, we will look into Server scenarios in ASP.NET Core MVC.</div>
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-40917588561554833402018-05-13T22:38:00.001+01:002018-06-04T18:18:32.156+01:00CacheCow 2.0 is here - now supporting .NET Standard and ASP.NET Core MVC<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
<br />
CacheCow 2.0 Series:<br />
<ul>
<li>Part 1 - CacheCow 2.0 is here - supporting .NET Standard and ASP.NET Core MVC [This post]</li>
<li>Part 2 - <a href="http://byterot.blogspot.co.uk/2018/05/cachecowclient-20-http-caching-for-your-api-http-httpclient-caching-cachecow-etag-dotnetcore-aspnetcore.html" target="_blank">CacheCow.Client 2.0: HTTP Caching for your API calls</a></li>
<li>Part 3 - <a href="http://byterot.blogspot.com/2018/06/cachecowserver-20-using-it-on-aspnetcoremvc-dotnetcore-etag-http-caching-aspnetcore.html" target="_blank">CacheCow.Server 2.0: Using it on ASP.NET Core MVC</a></li>
<li>Part 4 - CacheCow.Server for ASP.NET Web API [Coming Soon]</li>
<li>Epilogue: side-learnings from supporting Core [Coming]</li>
</ul>
<ol>
</ol>
<h3>
So, no CacheCore in the end!</h3>
<div>
Yeah. I did <a href="http://byterot.blogspot.co.uk/2017/04/Future-CacheCow-birth-CacheCore-REST-HTTP-dotnetcore-middleware-caching-conditional-put-get.html" target="_blank">announce</a> last year that the new, updated <a href="https://github.com/aliostad/CacheCow" target="_blank">CacheCow</a> would live under the name <a href="https://github.com/aliostad/CacheCore" target="_blank">CacheCore</a>. The more I worked on it, the more it became evident that only a tiny amount of CacheCow would ever be Core-related. And frankly, trends come and go, while HTTP Caching has stayed pretty much unchanged for the last 20 years.</div>
<div>
<br /></div>
<div>
So the name CacheCow lives on, although in the end what matters for a library is whether it can solve any of your problems. I hope it will, and will carry on doing so. Now you can use CacheCow.Client with .NET 4.52+ and .NET Standard 2.0+. CacheCow.Server also supports both Web API and ASP.NET Core MVC - and possibly Nancy soon!</div>
<div>
<br /></div>
<div>
<a href="https://github.com/aliostad/CacheCow" target="_blank">CacheCow 2.0</a> has lots of documentation and the project now has 3 sample projects covering both client and server sides in the same project.</div>
<div>
<br /></div>
<h3>
CacheCow.Server has changed radically</h3>
<div>
The design of the server-side of CacheCow 0.x and 1.x was based on the assumption that your API is a pure RESTful API and the data only changes through calling its endpoints, so the API layer gets to <i>see</i> all changes to its underlying resources. The more I explored over the years, the more this turned out to be a pretty big assumption, realistic only in REST La La Land - a big learning for me. And even where it holds true, the relationships between resources made server-side cache directive management a mammoth task. For example, in the familiar scenario of customer-product-orders, if an order changes, the cache for the collection of orders is now invalidated - hence the API needs to understand which resource is a collection of which. What is more, a change in the customer could change the order data (depending on the implementation of course, but take it for the sake of argument). So it meant that the API now had to know a lot more: single vs collection resources, relationships between resources... it was a slippery slope to a very bad place.</div>
<div>
<br /></div>
<div>
With that assumption removed, the responsibility now lies with the back-end stores which provide data for the resources - they will be queried by a couple of constructs added to CacheCow.Server. If you opt to implement that part for your API, then you have a super-efficient API. If not, there are some defaults there to do the work for you - although sub-optimal. All of this will be explained in the CacheCow.Server posts, but the point is CacheCow.Server is now a clean abstraction for HTTP Caching, as clean as I could make it. Judge for yourself.</div>
<div>
<br /></div>
<h2>
What is HTTP Caching?</h2>
<div>
Caching is a very familiar notion in programming and pretty much every developer uses it on a regular basis. This familiarity has a downside to it since HTTP Caching is more complex and in many ways different from the routine caching we do in code - hence it is very common to see misunderstandings even amongst senior developers. If you ask an average developer this question: "In HTTP Caching, where does the cache data get stored?" you are probably more likely to hear the wrong answer "server" than the correct answer "client". In fact, many developers look to improve their server-side code's performance by turning on caching on the server, while if the callers ignore the caching directives it will not result in any benefit.</div>
<div>
<br /></div>
<div>
This reminds me of a blog post I wrote <a href="http://byterot.blogspot.co.uk/2012/06/what-i-think-coupling-is.html" target="_blank">6 years ago</a> where I used HTTP Caching as an example of <i>mixed-concern</i> (as opposed to <i>server-concern</i> or <i>client-concern</i>) where <i>"For HTTP caching to work, client and server need to work in tandem"</i>. This is a key difference from the usual caching scenarios seen every day. What makes HTTP Caching even more complex is the concurrency primitives, built in starting with HTTP 1.1 - we will look into those below. </div>
<div>
<br /></div>
<div>
I know HTTP Caching is hardly new and has been explained many times before. But <b>considering the number of times I have seen it completely misunderstood, I think it deserves your 5-10 minutes - even if only as a refresher.</b></div>
<h3>
<br />Resources vs. Representations</h3>
<div>
REST advocates exposing services through a uniform API (where HTTP is one such implementation) allowing <i>resources</i> to be created, modified and queried using the API. A <i>resource</i> is addressed by its location identifier or URL (e.g. <span style="font-family: "courier new" , "courier" , monospace;">/api/car/123</span>). When a client requests a resource, only <b><i>a</i></b> <i>representation</i> of the <i>resource</i> is sent back. This means that the client receives only one representation out of many possible representations. It also means that when the client caches a representation, that cache entry is only valid if the representation requested matches the one cached. And finally, a client might cache different representations of the same resource. But what does all of this mean?</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYv8fWIZU38aBYEMjsJe_my00jfxdSZRLAi5rFeWlDHEBdoh1aVYsrFbyfTn6tMM89xX2IgVF_J-XrYHgySAVmDN-uVir3cO-JGP7zYw04047lj2mZAK3wCXCJvp5520-gOMUeuEHy7pW9/s1600/Screen+Shot+2018-05-13+at+19.12.40.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="997" data-original-width="1600" height="398" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYv8fWIZU38aBYEMjsJe_my00jfxdSZRLAi5rFeWlDHEBdoh1aVYsrFbyfTn6tMM89xX2IgVF_J-XrYHgySAVmDN-uVir3cO-JGP7zYw04047lj2mZAK3wCXCJvp5520-gOMUeuEHy7pW9/s640/Screen+Shot+2018-05-13+at+19.12.40.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">HTTP GET - The server serving a <i>representation</i> of the <i>resource</i>. The server also sends cache directives.</td></tr>
</tbody></table>
<div>
A resource could be represented differently in terms of format, encoding, language and other presentation concerns. HTTP provides semantics for the client to express its preferences in such concerns with headers such as <span style="font-family: "courier new" , "courier" , monospace;">Accept</span>, <span style="font-family: "courier new" , "courier" , monospace;">Accept-Language</span> and <span style="font-family: "courier new" , "courier" , monospace;">Accept-Encoding</span>. There could be other headers that result in alternative representations. The <b>server</b> is responsible for returning the definitive list of such headers in the <span style="font-family: "courier new" , "courier" , monospace;">Vary</span> header.</div>
<div>
<br /></div>
<h3>
Cache Directives</h3>
<div>
The server is responsible for returning cache directives along with the representation. The <span style="font-family: "courier new" , "courier" , monospace;">Cache-Control</span> header is the de facto cache directive, defining whether the representation can be cached, for how long, whether by the end client or also by HTTP intermediaries/proxies, etc. HTTP 1.0 had the simple <span style="font-family: "courier new" , "courier" , monospace;">Expires</span> header which only defined an absolute expiry time for the representation.<br />
<br />
You could also think of other cache-related headers as cache directives (although purely speaking they are not) such as <span style="font-family: "courier new" , "courier" , monospace;">ETag</span>, <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span> and <span style="font-family: "courier new" , "courier" , monospace;">Vary</span>.</div>
<div>
<br /></div>
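<div>
To make the directives concrete, here is a minimal sketch (plain System.Net.Http types, not CacheCow-specific, and with arbitrary values) of a server emitting them on a response:</div>
<pre class="brush:csharp">using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

// a response cacheable for 5 minutes by clients and intermediaries,
// carrying a version identifier (ETag) and declaring which request
// headers affect the representation (Vary)
var response = new HttpResponseMessage(HttpStatusCode.OK);
response.Headers.CacheControl = new CacheControlHeaderValue
{
    Public = true,                    // proxies may cache it too
    MaxAge = TimeSpan.FromMinutes(5)  // relative expiry
};
response.Headers.ETag = new EntityTagHeaderValue("\"v42\"");
response.Headers.Vary.Add("Accept");
response.Headers.Vary.Add("Accept-Encoding");
</pre>
<div>
<br /></div>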
<div>
<h3>
Resource Version Identifiers (Validators)</h3>
HTTP 1.1 defines <span style="font-family: "courier new" , "courier" , monospace;">ETag</span> as an opaque identifier representing the version of the resource. ETag (or EntityTag) can be strong or weak. Normally a strong ETag identifies the version of the <i>representation</i> while a weak ETag works only at the <i>resource</i> level.<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span> header was the main validator in HTTP 1.0 but since it is based on a date with up-to-a-second precision, it is not suitable for achieving high consistency since a resource could change multiple times in a second.<br />
<br />
CacheCow supports both validators (<span style="font-family: "courier new" , "courier" , monospace;">ETag</span> and <span style="font-family: "courier new" , "courier" , monospace;">Last-Modified</span>) and combines these two notions in the construct TimedETag.<br />
<br />
<h3>
Validating (conditional) HTTP Calls</h3>
A GET call can request the resource from the server with the <i>condition</i> that it is returned only if it has been modified with respect to its validator. In this case, the client sends ETag(s) in the <span style="font-family: "courier new" , "courier" , monospace;">If-None-Match</span> header or the Last-Modified date in the <span style="font-family: "courier new" , "courier" , monospace;">If-Modified-Since</span> header. If the validators match and no change was made, the server returns status 304, otherwise the resource is sent back.<br />
<br /></div>
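<div>
As an illustration of that exchange (a sketch using plain HttpClient rather than CacheCow, with a made-up URL and ETag value), this is what a client-side conditional GET looks like:</div>
<pre class="brush:csharp">using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

var client = new HttpClient();
var request = new HttpRequestMessage(HttpMethod.Get, "http://example.com/api/car/123");

// send back the validator stored when the representation was first cached
request.Headers.IfNoneMatch.Add(new EntityTagHeaderValue("\"v42\""));

var response = await client.SendAsync(request);
if (response.StatusCode == HttpStatusCode.NotModified)
{
    // 304 - no body was sent, keep using the cached representation
}
else
{
    // 200 - the resource changed, refresh the cache with the new representation
    var newRepresentation = await response.Content.ReadAsStringAsync();
}
</pre>
<div>
<br /></div>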
<div>
For a PUT (or DELETE) call, the client sends validators in <span style="font-family: "courier new" , "courier" , monospace;">If-Match</span> or <span style="font-family: "courier new" , "courier" , monospace;">If-Unmodified-Since</span>. The server performs the action if the validation succeeds, otherwise status 412 is sent back.<br />
<br />
<h3>
Consistency</h3>
</div>
<div>
The client normally keeps representations in its cache beyond their expiry; after expiry it resorts to validating (conditional) calls and, if they succeed, it can carry on using the cached representations.</div>
<div>
<br /></div>
<div>
In fact the server can return representations with immediate expiry, forcing the client to validate every time before using the cached resource. This scenario can be called High-Consistency caching since it ensures the client always uses the most recent version.</div>
<div>
<br /></div>
<div>
<h2>
Is HTTP Caching suitable for my scenario?</h2>
Consider using HTTP Caching if:<br />
<ul>
<li>Both your client and server are cache-aware. The client is either a browser - the ultimate HTTP machine, well capable of handling cache directives - or a client that understands caching, such as HttpClient + CacheCow.Client.</li>
<li>You need High-Consistency caching and cannot afford clients using outdated data</li>
<li>Saving on network bandwidth is important</li>
</ul>
<br />
HTTP Caching is unsuitable for you if:<br />
<ul>
<li>Your client does not understand/implement HTTP caching</li>
<li>The server is unable to provide cache directives</li>
</ul>
<br />
<br />
In the next post, we will look into CacheCow.Client.</div>
<div>
<br /></div>
aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-74435857348084929672018-03-19T17:39:00.000+00:002018-03-19T17:39:13.685+00:00Business and Log Events, Azure EventHub and Psyfon<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
<br />
<i>TLDR; If you need to send a large number of events to Azure EventHub from a .NET process or passthru API, consider using <a href="https://github.com/aliostad/psyfon" target="_blank">psyfon</a>.</i><br />
<br />
Over the last two decades, many businesses have transformed themselves and modeled their processes and operations as software (bespoke or customising off-the-shelf products). These systems would turn business processes and transactions into <b>data</b> that can be stored, queried or exchanged - ROI for such data is very high and the challenges of building/evolving such systems have been widely known. These systems typically generate <i>business events</i>.<br />
<br />
Businesses have been turning their attention to the next goal: capturing (and <b>analysing in near real-time</b>) information that is commonly not considered valuable <b>data</b>, such as minute user interactions with sites/apps down to the level of mouse movements and scrolls, sensor outputs in vehicles or factories, CCTV streams from municipal cameras to predict/forecast traffic, shopper interactions/behaviour in supermarkets to gain insight/provide recommendations, etc. These systems generate what I - for better or worse - call <i>log events</i>, which I have explained in the past <a href="https://www.infoq.com/articles/reactive-cloud-actors" target="_blank">here</a>, but it is useful to recap their differences from business events in the table below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW191VeJqtasMH6_hilbiUeeUEspqLK2VILAH04qITwftBgfeUNriUZXWcjiiOobpoOeyZPLnAnY_eiwq75KeRJdxSNEwNNQNN_AukQ1uqH1SjfGXuOVYDU9eP8Mgt8Y28RZRLGnfXhP5g/s1600/Screen+Shot+2018-03-19+at+17.23.50.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="868" data-original-width="1600" height="346" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjW191VeJqtasMH6_hilbiUeeUEspqLK2VILAH04qITwftBgfeUNriUZXWcjiiOobpoOeyZPLnAnY_eiwq75KeRJdxSNEwNNQNN_AukQ1uqH1SjfGXuOVYDU9eP8Mgt8Y28RZRLGnfXhP5g/s640/Screen+Shot+2018-03-19+at+17.23.50.png" width="640" /></a></div>
<br />
While log events could historically be stored and then analysed in batch mode, there is a growing need to make some sense of the data in real-time, in addition to in-depth analysis in offline mode. That is essentially <b>stream processing</b>.<br />
<br />
Stream processing is hard. Building resilient processes that can reliably process tons of data in parallel while handling back-pressure, point failures and peaks of activity - all with a few seconds or even sub-second latency - is not trivial. There are such systems already available such as Apache <a href="https://www.youtube.com/watch?v=nuu_Zat6yus" target="_blank">Flink</a>, <a href="https://kafka.apache.org/documentation/streams/" target="_blank">Kafka Streams</a> or <a href="https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html" target="_blank">Spark Streaming</a>. These systems typically work on top of an Event Store such as Kafka or Azure EventHubs.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
Azure EventHub has been built for publishing and consuming events at high scale. The design is not dissimilar to that of Kafka: a replicated/Highly-Available log for each of an arbitrary (but constant) number of partitions, where ordering can be guaranteed only at the partition level. You can read from the beginning of the log or from any point in the stream, but remembering where you last read events from (checkpointing) is completely left to the consumers.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgasFRyfZrKg4WnrLzfoKgwhyphenhyphenVU24wanwwPPK3vtQCO6BH-L7xOwYd-c7tDTSaXzw1ADhBSoQRRdCQ4cdtZ_wamyNnrUarNZmPN0ppvNRb9-FJFJ9Vk7deYcj6ru6Id3tuMf1iQ63UlTK6G/s1600/EventHubs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="968" data-original-width="1600" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgasFRyfZrKg4WnrLzfoKgwhyphenhyphenVU24wanwwPPK3vtQCO6BH-L7xOwYd-c7tDTSaXzw1ADhBSoQRRdCQ4cdtZ_wamyNnrUarNZmPN0ppvNRb9-FJFJ9Vk7deYcj6ru6Id3tuMf1iQ63UlTK6G/s640/EventHubs.png" width="640" /></a></div>
<br />
Typically only a single consumer is meant to read from a partition, hence having more partitions is important for improving scalability. Publishing is much more relaxed: a high number of producers can send events to EventHub.<br />
<br />
How does EventHub assign events to partitions? You can optionally send a Partition Key which gets hashed and used for assigning to partitions. To make sure you get the best out of your system, the Partition Key needs to be evenly distributed. If you are sending device events, you would most likely use the Device ID. For customer events, Customer ID is a natural choice. This will ensure all events for a device or customer are ordered according to the time they arrived at the EventHub.<br />
<br />
Usually there are data pipelines that receive and funnel the incoming data (usually through a passthru API) to these stores, but the key point is that these data pipelines exhibit the same challenges as the stream processing itself. While exposing EventHub directly to the outside world was initially advocated by Microsoft, you would most likely want to hide your EventHub behind a passthru API that does authentication and optimises delivery of the events to the EventHub by batching. This layer is also useful to handle back-pressure by buffering events so you can deal with spikes gracefully. One thing you cannot do here at the API is to keep the caller waiting for the event to be successfully committed to the EventHub, for a few reasons. First of all, EventHub can sometimes have latency in the order of 100-150ms. While this is completely acceptable for most purposes (other than High-Frequency Trading!), keeping clients waiting means more power consumption for publishers, many of whom are phones and other low-power devices sending many events per hour. Another reason is that EventHub works best if you send events in batches, hence waiting until your buffer is full and then committing the batch of events.<br />
<br />
Batching is already supported built-in with the EventHub:<br />
<pre class="brush:csharp">var batch = new EventDataBatch("myPartitionKey");
batch.TryAdd(eventData); // keep adding until method returns false
await client.SendAsync(batch);
</pre>
<br />
But did you notice something? All events within a batch must have the same partition key. While it is understandable Microsoft made this decision for performance reasons - since all such events will be sent to the same partition, otherwise the batch has to wait for all partitions involved to respond successfully - it essentially renders batching remarkably less useful. As said earlier, the Partition Key must have widely diverse values such as Customer ID or Device ID. There is no guarantee that an event arriving from a customer at the API is followed by enough events from the same customer in a reasonable amount of time to fill the batch and make batching worthwhile - let alone those events arriving at exactly the same web-head.<br />
<br />
The solution is essentially to send the events directly to partitions. But the hashing takes place at the EventHub, so how could we know which partition a Partition Key gets allocated to? That implementation is opaque and it is not possible to reproduce it outside EventHub. That is why we have to hash the Partition Keys <em>ourselves</em> and send batches directly to the partitions. All we need is a hashing algorithm capable of uniformly hashing Partition Keys across partitions. It turns out that most hashing algorithms, including MD5, can easily achieve this, although some might be cryptographically broken. MD5 is a very quick and efficient algorithm hence is a good fit.<br />
<br />
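To illustrate the idea (a minimal sketch, not necessarily the exact algorithm Psyfon uses internally), a partition can be picked by hashing the Partition Key and taking the result modulo the partition count:<br />
<pre class="brush:csharp">using System;
using System.Security.Cryptography;
using System.Text;

static int PickPartition(string partitionKey, int partitionCount)
{
    using (var md5 = MD5.Create())
    {
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(partitionKey));
        // take the first 4 bytes as an unsigned integer and map it onto the partitions
        uint value = BitConverter.ToUInt32(hash, 0);
        return (int)(value % (uint)partitionCount);
    }
}

// e.g. PickPartition("customer-987", 32) always yields the same partition for that customer
</pre>
<br />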
<h1>
<a href="https://www.blogger.com/null" id="Psyfon_28"></a>Psyfon</h1>
Now, all of what I have said so far - batching, buffering and hashing - have been implemented in an Open Source project called <a href="https://github.com/aliostad/psyfon">Psyfon</a>. Using this library supporting both .NET Standard 2.0 and .NET 4.52, all you have to do is to create a single instance of <code>BufferingEventDispatcher</code> per process, start it and add events to it:<br />
<br />
<pre class="brush:csharp">var singletonDispatcher = BufferingEventDispatcher("<connection string>");
singletonDispatcher.Start();
// and somewhere else in the code where events generated
var ed = new EventData(mySerialisedEventAsByteArray);
singletonDispatcher.Add(ed);
</pre>
<br />
You can set a maximum byte size (according to the size of your events) and a maximum number of seconds before committing the batches - whichever is reached earlier, the batch will be committed to the partition. I have tested it under high scale and a single process had no issue sending 5000 EPS to EventHub. I will be publishing the results of a more extended test soon.<br />
<br />
While my use case was a passthru API, this can equally be used for dispatching monitoring and instrumentation events to the EventHub. <a href="https://github.com/aliostad/psyfon">PerfIt</a>, another Open Source library, will benefit from this very soon - watch this space.<br />
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com0tag:blogger.com,1999:blog-2889416825250254881.post-28394425995442683972017-04-14T22:24:00.001+01:002018-05-14T21:33:10.479+01:00Future of CacheCow and the birth of CacheCore [Update: No CacheCore!]<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
<a href="https://github.com/aliostad/CacheCow" target="_blank">CacheCow</a> is my most popular OSS project which came to being back in 2012. It started its life as part of <a href="https://github.com/WebApiContrib/WebAPIContrib">WebApiContrib</a> but the need for a full-fledged project supporting different storage options soon led me to create CacheCow - which needed both client and server components.<br />
<br />
With the availability of .NET Core, which brings a completely new HTTP pipeline, the question has been when and how CacheCow will move to .NET Core. On the client-side, <code>HttpClient</code> is still the king of making HTTP requests, meaning CacheCow.Client will work when the long awaited .NET Standard 2.0 comes along allowing us to reference older .NET libraries. On the server-side, however, it is clear that CacheCow.Server has no further place since the pipeline has been changed in its entirety. So what should we do? Create a completely new project for both client-side and server-side, or maintain CacheCow.Client (while migrating it to .NET Standard to support the new .NET) and create a new project for the server-side?<br />
<br />
I have been thinking hard about this and for the reasons I will explain, <strong>I will be creating a completely new project called <a href="http://github.com/aliostad/CacheCore">CacheCore</a></strong> (other contenders were <i>Cache-vNext</i>, <i>CacheDnx</i> and also recently <i>CacheStandard</i>) which will contain both client and server elements. [UPDATE: Please view newer announcement <a href="http://byterot.blogspot.co.uk/2018/05/cachecow-20-is-here-supporting-netcore-netstandard-aspnetcore-httpclient-aspnetwebapi-etag.html" target="_blank">here</a>]<br />
<br />
If you would like to know the details (REST discussions, lessons learned including some gory confessions below) read the rest, otherwise feel free to watch the space as things will start to happen.<br />
<br />
I am under no illusion that this will require quite some effort. Apart from the learning curve and niggles with tooling, I find the most frustrating aspect to be trying to google anything Core related: the internet is now full of <strong>discarded evolutionary artifacts</strong> in the form of blogs, stackoverflow questions, tutorials and even MSDN documentation - each documenting the journey, not the current state. If you think that is a small issue, ask anyone picking up .NET Core for the first time. I wish we had a big giant flush and could have flushed all that to /dev/null wiping the history clean - never mind all those many many hours lost. OK, rant over - promise.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="CacheCoreServer_13"></a>CacheCore.Server</h2>
As I said, I have confessions to make and one is just coming. When I designed the server components of CacheCow as an API middleware, the idea was that they would be used for services that are purely RESTful in the sense that all changes to the state of a resource would go through the API. Initially there does not seem to be anything extraordinary about this but I gradually learnt that <b>cache coherency is a very big responsibility for a mere middleware to take on</b>.<br />
<br />
First of all, there are many services out there where the underlying data <em>could</em> change without a request passing through the API. What is worse, even if all state change is via API calls, the change to a resource could invalidate other resources. For example, a <code>PUT</code> request to <code>/cars/123</code> will invalidate <code>/cars/123</code> which is fine, but what about <code>/cars</code>? So I started thinking about resources in terms of collection and instance, and CacheCow.Server started to infer collection and instance resources based on a convention - hence I introduced the <strong>Route Pattern</strong> concept so the application could configure the cache invalidation; here the route pattern would be <code>/cars/*</code>.<br />
<br />
But the problem did not stop there. A change to <code>/cars/123/contracts/456</code> could invalidate all these URLs: <code>/cars/123/contracts</code>, <code>/cars/123</code> and possibly <code>/cars</code> - hence CacheCow now needs to walk up the tree and invalidate all those resources. And now to the next level of headaches: a <code>POST /orders/1234</code> could invalidate <code>customer/987</code> even though there is no apparent connection unless the application tells us - which made me introduce the concept of <strong>Linked Route Patterns</strong> so the application could configure these relationships. Configuring this was of course a pain, and frankly I think except me and a handful of other people nobody really got what I was on about.<br />
<br />
Now, I believe it is too much of a responsibility for an HTTP middleware to do cache coherency. As such <strong>CacheCore.Server will be a lot simpler</strong>: no more entity tag storage; the application will decide whether to use ETag or LastModifiedDate for cache coherency and will be responsible for providing these values - although I will provide some helpers. One key difference in this implementation would be a set of tools fitting different scenarios rather than a single HTTP Caching god-class.<br />
<br />
To explain this aspect further, HTTP caching is a spectrum of primitives that help you build more scalable (caching) and consistent (concurrency) systems - some of which are basic and used by many, while others have remained obscure and seldom used. Caching and expiry on resources are better known while from my experience, conditional PUT to achieve optimistic concurrency is rarely used - even conditional GET is rarely used by HTTP clients other than browsers. As such, CacheCore will come with three filters starting from the most basic to the most advanced:<br />
<ul>
<li>
<strong>BasicCacheFilter</strong>: This is the simplest filter which covers returning Cache-Control headers according to expiry configuration, reading the ETag or LastModified from the returned model (or inferring them by using reflection) and handling conditional GET for you. As long as you have a property called ETag or LastModified (or LastModifiedDate, etc.) on the model you return from your API, this will work. For conditional GETs, this filter will not save any pressure on your “database”: API calls will still result in retrieval of data so the filter can find the ETag or LastModified and respond to conditional GET requests accordingly.</li>
</ul>
<ul>
<li>
<strong>LookupCacheFilter</strong>: This filter improves on the BasicCacheFilter by allowing the application to provide a callback mechanism to look up the ETag or LastModified without having to load the full model. Caching almost always gets used on resources where the operation is expensive either in IO or computation costs, and this approach helps you replace loading the full model with a light-weight lookup call. For example, let’s say the resource is <code>/cars/123</code> and you keep a <code>LastModifiedDate</code> on your <em>cars</em> database and use a hash of the <code>LastModifiedDate</code> as the ETag (you could use LastModifiedDate to do cache validation directly on the date, but HTTP dates’ precision is sadly up to a second which might not be enough for you) - a sketch of this hashing is shown after this list. In this case, the filter will ask the application for the ETag or LastModified of the resource and you can call your database and read that value for car:id=123 without loading the whole car - a much lighter database call. So this filter will do everything BasicCacheFilter does (and more efficiently) and will even do conditional PUT for you. What is the problem with this one? Consistency: for a conditional PUT, validation is not atomic, e.g. you look up the ETag, find the condition is met and proceed to update, meanwhile the data could have changed between the lookup and the update (the same could also apply to conditional GET but with less serious impact). This is not a problem for everyone, hence I think this filter hits the sweet spot between simplicity and effectiveness.</li>
</ul>
<ul>
<li>
<strong>StrongConsistencyCacheFilter</strong>: This is basically the same as above but maintains airtight consistency by allowing the application to implement atomic conditional GET and PUT - which means the application has to do more.<br />
</li>
</ul>
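<div>
As a rough sketch of the LookupCacheFilter idea mentioned above (a hypothetical helper, not part of the CacheCore API), the ETag could be derived from the LastModifiedDate like this:</div>
<pre class="brush:csharp">using System;
using System.Security.Cryptography;

// hypothetical helper: derive an opaque ETag from a record's LastModifiedDate
// without loading the full model
static string ToETag(DateTimeOffset lastModified)
{
    using (var md5 = MD5.Create())
    {
        byte[] hash = md5.ComputeHash(BitConverter.GetBytes(lastModified.UtcTicks));
        return "\"" + Convert.ToBase64String(hash) + "\"";
    }
}

// the lookup callback would then run a cheap query such as
// "SELECT LastModifiedDate FROM Cars WHERE Id = 123" and return ToETag(result),
// instead of loading the whole car
</pre>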
I have plans for these to be GET or PUT specific since actions are usually designed as such.<br />
Now you might ask: why is CacheCore a filter and not a middleware? If you remember, CacheCow.Server was a DelegatingHandler (akin to an <a href="http://asp.net/">ASP.NET</a> Core middleware). Well, here is another lesson learnt: caching is a highly localised concern and it is a mistake to implement it as a global HTTP intermediary.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="CacheCoreClient_35"></a>CacheCore.Client</h2>
Considering the client story in .NET Core for HTTP has not been drastically changed, it is fair to assume CacheCow.Client can still be used.<br />
<br />
That is true; however, there are a few reasons I would like to start afresh. First of all, CacheCow’s inception and the bulk of the codebase were designed when .NET did not yet have an <code>await</code> keyword. This resulted in a <code>.ContinueWith()</code> soup which was hard to read and difficult to maintain. On the other hand, some interfaces supported async while others did not, breaking the <em>async all the way</em> rule. Also I had in mind for the storage to be clever about how much space it uses per site and implement LRU, while many underlying storages did not provide the primitives to do so - and frankly in these 5 years I have never needed it.<br />
<br />
I think it is time to get rid of these shortcomings hence there will be a new client project too.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Future_of_CacheCowServer_and_CacheCowClient_41"></a>Future of CacheCow.Server and CacheCow.Client</h2>
It would be naive to think everyone will move to .NET Core straightaway. In fact, with .NET Standard 2.0, Microsoft has shown it has realised there needs to be better interoperability between the <em>classic</em> .NET and .NET Core. Apart from interoperability, I think people will carry on using and building .NET APIs for another few years.<br />
<br />
For these reasons, I will carry on supporting CacheCow and releasing bug fixes, etc. Thanks for helping it improve by using it, reporting issues and sending bug fixes.aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com13tag:blogger.com,1999:blog-2889416825250254881.post-4307716701925150362017-01-31T18:23:00.000+00:002017-02-02T12:10:45.636+00:00Announcing Zipkin Collector for Azure EventHub<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
If you are reading this, you have probably heard of Zipkin. If not, please take my word and leave this post to spend 10 minutes <a href="http://zipkin.io/" target="_blank">reading up</a> on it - a very worthwhile 10 minutes which will introduce you to one of the best, yet simplest, distributed tracing systems. In one word, it tells you where the time to serve requests has been spent, helping you optimise your Microservice architecture.<br />
<br />
Zipkin, used by the likes of Twitter and Netflix, has already created a complete storm in the Java/JVM ecosystem, but many of us in the .NET community have not heard of it - and that is frankly a real pity. And if you have heard of it and want to use it, yes, of course we could try to port the whole system over to .NET, but that would be a huge amount of work and frankly a waste since Zipkin is designed to work across different stacks as long as you can somehow get your data over to it. The data is normally pushed to Kafka, and Zipkin consumes messages from Kafka via a component called the <i>Collector</i>. Data then gets stored in a storage backend (currently available for MySQL, Cassandra or Elasticsearch) and served by the UI.<br />
<br />
Of course nothing stops you from running Kafka in your cloud or on-premise environment, but if you have never done it, to say the least, ZooKeeper (a consensus system required for running Kafka) is not the easiest service to operate. And frankly, if you are on Azure it makes a lot of sense to use EventHub, an Azure PaaS service with functionality very similar to Kafka. Sadly there was no collector for it.<br />
<br />
I have been very keen to bring Zipkin to ASOS, but could not quite justify running ZK and Kafka, even for myself. Hence I felt something had to be done about it. The only problem: I had never done a Java/Maven project before.<br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
I have been doing what I have been doing - being a professional developer - for some time now. And I have had my ups and downs, both moments that I am proud of and moments of embarrassment because I have messed up. But never have I picked up a completely different stack and built something like what I am going to share, within a couple of weeks. [Yeah, I am talking about the Zipkin Collector for Azure EventHub]<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://i0.kym-cdn.com/photos/images/facebook/000/234/765/b7e.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://i0.kym-cdn.com/photos/images/facebook/000/234/765/b7e.jpg" /></a></div>
<br />
<br />
This really has been a testament to how pluggable and nicely designed Zipkin is, and above all it has a truly amazing community - championed by <a href="https://twitter.com/adrianfcole" target="_blank">Adrian Cole</a>. Help was always around the corner, be it on hardcore stuff such as how to modularise the collector or my noob problems with Maven.<br />
<br />
Not to forget, too, that the Azure EventHub SDK basically made it completely trivial to implement a working solution. All the heavy lifting has been done by the EventProcessorHost, so all that is left is a bit of plumbing to get the configuration over to these components.<br />
<br />
<div style="text-align: center;">
* * *</div>
<h2>
How to use EventHub Collector</h2>
So the idea is that you would run <a href="https://github.com/openzipkin/zipkin/tree/master/zipkin-server" target="_blank">zipkin-server</a> (which hosts the Zipkin UI) and in the same process you run your collector. Zipkin uses Spring Boot's auto configuration mechanism to load the right collector based on the configurations provided. The project is host on <a href="https://github.com/aliostad/zipkin-collector-eventhub" target="_blank">github</a>. [<b>UPDATE: </b><i>Project has moved to OpenZipkin organisation <a href="https://github.com/openzipkin/zipkin-azure" target="_blank">here</a></i>]<br />
<br />
The EventHub Collector gets triggered by the existence of the "zipkin.collector.eventhub.eventHubConnectionString" configuration passed via the command line. The rest of the necessary configuration can be passed in an application.properties or application.yaml file.<br />
<br />
So to run the EventHub collector you need:<br />
<br />
1- zipkin.jar (zipkin-server)<br />
2- application.properties file<br />
3- zipkin-collector-eventhub-autoconfig module jar (which contains transitive dependencies too). This jar is not on maven yet<br />
<br />
So in order to run:<br />
<br />
<h3>
1- Clone the source and build</h3>
<span style="font-family: "courier new" , "courier" , monospace;">mkdir zipkin-collector-eventhub</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">cd zipkin-collector-eventhub</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">git clone git@github.com:aliostad/zipkin-collector-eventhub.git</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">mvn package</span><br />
<br />
If you do not have maven, get maven <a href="http://maven.apache.org/install.html" target="_blank">here</a>.<br />
<br />
<h3>
2- Unpackage MODULE jar into an empty folder</h3>
Copy <span style="font-family: "courier new" , "courier" , monospace;">zipkin-collector-eventhub-autoconfig-x.x.x-SNAPSHOT-module.jar</span> (that has been packaged in the target folder) into an empty folder and unpack it:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">jar xf zipkin-collector-eventhub-autoconfig-0.1.0-SNAPSHOT-module.jar</span><br />
<br />
You may then delete the jar itself.<br />
<br />
<h3>
3- Download zipkin-server jar</h3>
<br />
Download the latest zipkin-server jar (which is named zipkin.jar) from here. For more information visit zipkin-server homepage.<br />
<br />
<h3>
4- create an application.properties file for configuration next to the zipkin.jar file</h3>
Populate the configuration - make sure the resources (Azure Storage, EventHub, etc.) exist. <b>Only storageConnectionString is mandatory</b>; the rest are optional and should be used only to override the defaults:<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: x-small;">zipkin.collector.eventhub.storageConnectionString=<azure storage connection string><br />zipkin.collector.eventhub.eventHubName=<name of the eventhub, default is zipkin><br />zipkin.collector.eventhub.consumerGroupName=<name of the consumer group, default is $Default><br />zipkin.collector.eventhub.storageContainerName=<name of the storage container, default is zipkin><br />zipkin.collector.eventhub.processorHostName=<name of the processor host, default is a randomly generated GUID><br />zipkin.collector.eventhub.storageBlobPrefix=<the path within container where blobs are created for partition lease, processorHostName></span></blockquote>
<h3>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>5- Run the server along with the collector</h3>
Assuming zipkin.jar and application.properties are in the current working directory, run this from the command line (note that the connection string to the eventhub itself is passed in the command line):<br />
<br />
<blockquote class="tr_bq">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">java -Dloader.path=/where/jar/was/unpackaged -cp zipkin.jar org.springframework.boot.loader.PropertiesLauncher --spring.config.location=application.properties --zipkin.collector.eventhub.eventHubConnectionString="<eventhub connection string, make sure quoted otherwise won't work>"</span></blockquote>
<br />
<br />
After running, Spring Boot and the rest of the stack get loaded and you should see some INFO output from the collector echoing the configuration you have passed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDfDG2-Tb8L9o_Xm9C9Z7_qBx4ZHrOM2UlN5RFj7ixW0brqG4ZCBvlwIJw3OqJ0-dZ3QOyLoUkEmZLDl08x2DcM0dwabPiX6DycALymTZq3llHdqzYD60GMF-W-Pz1K4OWCTFAz0zmBKMO/s1600/Screen+Shot+2017-01-31+at+21.00.50.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="210" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDfDG2-Tb8L9o_Xm9C9Z7_qBx4ZHrOM2UlN5RFj7ixW0brqG4ZCBvlwIJw3OqJ0-dZ3QOyLoUkEmZLDl08x2DcM0dwabPiX6DycALymTZq3llHdqzYD60GMF-W-Pz1K4OWCTFAz0zmBKMO/s640/Screen+Shot+2017-01-31+at+21.00.50.png" width="640" /></a></div>
<br />
You should be up and running and can start pushing spans to your EventHub.<br />
<br />
<h2>
Span serialisation guideline</h2>
<div>
The EventHub Collector expects spans serialised as a JSON array. The payload gets read as a UTF-8 string and is deserialised by the zipkin-server components.</div>
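<div>
For example, a .NET producer could publish spans like this - a sketch where the span fields follow the Zipkin v1 JSON model and the EventHub calls use the classic Microsoft.ServiceBus.Messaging SDK; adapt it to whichever client library you use:</div>
<pre class="brush:csharp">using System.Text;
using Microsoft.ServiceBus.Messaging;

// a JSON *array* of spans, serialised as UTF-8 (field names per the Zipkin v1 model)
var spansJson = @"[
  {
    ""traceId"": ""48485a3953bb6124"",
    ""id"": ""48485a3953bb6124"",
    ""name"": ""get /api/car/123"",
    ""timestamp"": 1485951331000000,
    ""duration"": 43000
  }
]";

var client = EventHubClient.CreateFromConnectionString("<eventhub connection string>");
await client.SendAsync(new EventData(Encoding.UTF8.GetBytes(spansJson)));
</pre>
<div>
<br /></div>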
<div>
<br /></div>
<h2>
Roadmap</h2>
<div>
The next step is to get the jar onto Maven Central. Also I will start working on a .NET library to make building spans easier. </div>
<br />
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com11tag:blogger.com,1999:blog-2889416825250254881.post-64522000820333737032016-07-20T21:29:00.000+01:002016-07-21T09:26:59.464+01:00Singleton HttpClient? Beware of this serious behaviour and how to fix it<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
If you are consuming a Web API in your server-side code (or .NET client-side app), you are very likely to be using an HttpClient.<br />
<br />
<a href="https://msdn.microsoft.com/en-us/library/system.net.http.httpclient.aspx">HttpClient</a> is a very nice and clean implementation that came as part of Web API and replaced its clunky predecessor <a href="https://msdn.microsoft.com/en-us/library/system.net.webclient.aspx">WebClient</a> (although only in its HTTP functionality, WebClient can do more than just HTTP).<br />
<br />
HttpClient is usually meant to be used with more than just a single request. It conveniently allows for default headers to be set and applied to all requests. Also you can plug in a CookieContainer to maintain cookies and sessions across requests.<br />
<br />
Now, ironically it also implements <code>IDisposable</code>, suggesting a short-lived lifetime and disposal as soon as you are done with it. This led to several discussions in the community (<a href="https://github.com/mspnp/performance-optimization/blob/master/ImproperInstantiation/docs/ImproperInstantiation.md">here</a> from Microsoft Patterns and Practices, <a href="https://twitter.com/darrel_miller">Darrel Miller</a> <a href="http://www.bizcoder.com/httpclient-it-lives-and-it-is-glorious">here</a> and a few references on StackOverflow <a href="http://codereview.stackexchange.com/questions/69950/single-instance-of-reusable-httpclient">here</a>) about whether it can be used with a longer lifetime and, more importantly, whether it needs disposal.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx3iRF8ppIBWbexn5koSvhTCaT5KmqWwB3YD_B5FuTOjvD1QGOT_azEGv8wjNccIaJg__DfVR3xFhuJzYQAiVhkD6YrJbx0w54nWYukM2Wrp1vBWKYZ1tr4FsqoQQt29E4VG9zkoRmVSnJ/s1600/22594851144_cb9cee0aa4_k.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx3iRF8ppIBWbexn5koSvhTCaT5KmqWwB3YD_B5FuTOjvD1QGOT_azEGv8wjNccIaJg__DfVR3xFhuJzYQAiVhkD6YrJbx0w54nWYukM2Wrp1vBWKYZ1tr4FsqoQQt29E4VG9zkoRmVSnJ/s640/22594851144_cb9cee0aa4_k.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Singleton HttpClient matters, especially when it comes to the <b>performance</b> [Dragan Brankovich - <a href="https://www.flickr.com/photos/draganbrankovic/22594851144/in/photolist-AqCw43-qZ9WQy-pXwMYF-gRSubv-psyv3n-ov2ZBh-p7GDjY-shSx3x-pgE3pg-qjAGRF-qWcQhk-dM89F3-dB7VKm-riqhrn-h2GewF-dBp4Em-n1du36-p7yXiQ-qPyz1S-gDzxw5-dvrcNZ-9Jzqdy-9gS1AV-rbCbGY-eVpgTG-9JTyx3-rNbM1L-9zTeQm-pvZKSo-pPQDNE-gS4vUg-p1WMi8-hFqTxo-nXvZuX-jbXJmo-qr9bys-pxQ3oq-i9Kto2-dFbWwm-rdL5Fs-e3nZww-kuo5jH-enNiS7-aQve9x-jvKigd-hiohR6-qiRYQF-qYHB9c-akabfT-s8gCBQ/" target="_blank">Flickr</a>]</td></tr>
</tbody></table>
<br />
HttpClient implements IDisposable only indirectly through <code>HttpMessageHandler</code> and only as a just-in-case measure rather than an immediate need - I am not aware of an implementation of <code>HttpMessageHandler</code> that holds unmanaged resources (the mere reason for implementing <code>IDisposable</code>).<br />
<br />
In short, the community agreed that it was 100% safe not only not to dispose the HttpClient, but also to use it as a Singleton. The main concern was thread safety when making concurrent HTTP calls - and even official documentation said there is no risk in doing that.<br />
<br />
But it turns out there is a serious issue: <strong>DNS changes are NOT honoured</strong> and HttpClient (through HttpClientHandler) hogs the connections until the socket is closed. Indefinitely. So when does a DNS change occur? Every time you do a blue-green deployment (in Azure cloud services, when you deploy to the staging slot and then swap production/staging slots). Every time you change settings in your Azure Traffic Manager. Failover scenarios. Internally in a myriad of PaaS offerings.<br />
<br />
<strong><i>And this has been going on for more than 2 years without being reported... makes me wonder what kind of applications we build with .NET?</i></strong><br />
<strong><br /></strong>
Now if the reason for the DNS change is failover, your connection would have been faulted anyway so this time the connection would open against the new server. But if this were the blue-green deployment, you swap staging and production and your calls would still go to the staging environment - a behaviour we had seen but had fixed by bouncing the dependent servers, thinking it was possibly an Azure oddity. What a fool was I - it was there in the code! Whose code? Well, debatable...<br />
<br />
<h1>
<a href="https://www.blogger.com/null" id="Analysis_19"></a>Analysis</h1>
All of this goes back to the implementation in <code>HttpClientHandler</code> that uses <code>HttpWebRequest</code> to make connections, none of which is open source. But using JetBrains' <a href="https://www.jetbrains.com/decompiler/">dotPeek</a> we can look into the decompiled code and see that HttpClientHandler creates a connection group (named with its hashcode) and does not close the connections in the group until it gets disposed. This basically means the DNS check never happens as long as a connection is open. This is really terrifying...
<pre class="brush:csharp">protected override void Dispose(bool disposing)
{
if (disposing && !this.disposed)
{
this.disposed = true;
ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
}
base.Dispose(disposing);
}
</pre>
As you can see, the <a href="https://msdn.microsoft.com/en-us/library/system.net.servicepoint.aspx">ServicePoint</a> class plays an important role here: controlling the number of concurrent connections to a ‘service point/endpoint’ as well as keep-alive behaviours.<br />
<br />
<h1>
Solution</h1>
A naive solution would be to dispose the HttpClient (hence the HttpClientHandler) every time you use it. As explained this is not how HttpClient is intended to be used.<br />
<br />
Another solution is to set the <code>ConnectionClose</code> property of <code>DefaultRequestHeaders</code> on your HttpClient:<br />
<pre class="brush:csharp">var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
</pre>
This will effectively turn off HTTP keep-alive so the socket will be closed after a single request. It turns out this can add roughly an <b>extra 35ms</b> (with long tails, i.e. amplifying outliers) to each of your HTTP calls, preventing you from taking advantage of re-using a socket. So what is the solution then?<br />
<br />
Well, courtesy of my good friend <a href="https://www.linkedin.com/in/andrew-jutton-6a619713">Andy Jutton</a> of <a href="https://www.amido.com/">Amido</a>, the solution lies in an obscure feature of the ServicePoint class. Basically, as we said, ServicePoint controls many aspects of TCP connections and one of its properties is <a href="https://msdn.microsoft.com/en-us/library/system.net.servicepoint.connectionleasetimeout.aspx">ConnectionLeaseTimeout</a>, which controls how many milliseconds a TCP socket should be kept open. Its default value is -1, which means connections will stay open indefinitely… well, in real terms, until the server closes the connection, there is a network disruption - or the HttpClientHandler gets disposed as discussed.<br />
<br />
So the root cause is basically the default value of -1 which is, IMHO, a wrong and potentially dangerous setting.<br />
<br />
Now to fix it, all we need to do is to get hold of the ServicePoint object for the endpoint by passing the URL to it and set the ConnectionLeaseTimeout:<br />
<pre class="brush:csharp">var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/baz/123?a=ab"));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute
</pre>
So this is something that you would want to do only at the <i><b>startup</b> of your application</i>, once and for all endpoints your application is going to hit (if endpoints are decided at runtime, you would set this at the time of discovery) - see the sketch below. Bear in mind, path and query strings are ignored and <i>only the host, port and schema</i> are important. Depending on your scenario, values of 1-5 minutes probably make sense.<br />
<br />
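A minimal startup sketch (the endpoint URLs here are made up) applying this to every upstream service your application talks to:<br />
<pre class="brush:csharp">using System;
using System.Net;

// run once at application startup, before any HttpClient call is made
var upstreamEndpoints = new[]
{
    "http://foo.bar/",           // only schema, host and port matter
    "https://api.example.com/"
};

foreach (var endpoint in upstreamEndpoints)
{
    var servicePoint = ServicePointManager.FindServicePoint(new Uri(endpoint));
    servicePoint.ConnectionLeaseTimeout = (int)TimeSpan.FromMinutes(1).TotalMilliseconds;
}
</pre>
<br />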
<h1>
<a href="https://www.blogger.com/null" id="Conclusion_61"></a>Conclusion</h1>
Using a Singleton HttpClient results in your instance not honouring DNS changes, which can have serious implications. The solution is to set the ConnectionLeaseTimeout of the ServicePoint object for the endpoint.
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com34tag:blogger.com,1999:blog-2889416825250254881.post-71408168778407083922016-06-14T12:00:00.000+01:002016-06-15T13:38:28.377+01:00After all, it might not matter - A commentary on the status of .NET<script async="" charset="utf-8" src="//platform.twitter.com/widgets.js"></script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div style="margin-left: 1em; margin-right: 1em;">
</div>
Do you know what the most menacing nightmare for a peasant soldier in Medieval wars was? The approach of a knight.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz1nQTZRpDw0-hwhTxLHjECun2qu2M0lmgdEAZxNFEhjr3loA5tf5w6QxmkwJRUuOaiWrBsaqeQtOITD6umNwR8f8LrxLMzCjP0y0UxxTM-4sgE7Xr7iho4EG7S_sDsMQlLAjny7uCDaOs/s1600/knight+%25281%2529.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhz1nQTZRpDw0-hwhTxLHjECun2qu2M0lmgdEAZxNFEhjr3loA5tf5w6QxmkwJRUuOaiWrBsaqeQtOITD6umNwR8f8LrxLMzCjP0y0UxxTM-4sgE7Xr7iho4EG7S_sDsMQlLAjny7uCDaOs/s640/knight+%25281%2529.jpg" width="426" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td class="tr-caption" style="font-size: 12.8px;">Approaching of a knight - a peasant soldier's nightmare [image <a href="https://eatinglikeahorse.files.wordpress.com/2011/04/knight.jpg" target="_blank">source</a>]<br />
<div>
<br /></div>
</td></tr>
</tbody></table>
</td></tr>
</tbody></table>
Famous for gallantry and bravery, armed to the teeth and having many years of training and battle experience, knights were the ultimate war machine for the better part of Medieval times. The likelihood of survival for a peasant soldier in an encounter with a knight was very small. He had to somehow deflect or evade the attack of the knight's sword or lance while wielding a heavy sword to strike at exactly the right time as the knight passed. Not many peasants had the right training or prowess to do so.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhga5XpLaxQylzRIFDWgQ1OSBcSTxejUpLy4X2k9yY8k6p115nS2jIvjyrqaV5wkGlMm9_fxsfPyHSpG1Xs1CLZ75p3Q2k_uZOIqmbbZ-IsICMsex9Fr250HhQdVGWR21Qds1Iq04z3clAa/s1600/9200397_BibliographicResource_3000126282214+%25281%2529.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="448" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhga5XpLaxQylzRIFDWgQ1OSBcSTxejUpLy4X2k9yY8k6p115nS2jIvjyrqaV5wkGlMm9_fxsfPyHSpG1Xs1CLZ75p3Q2k_uZOIqmbbZ-IsICMsex9Fr250HhQdVGWR21Qds1Iq04z3clAa/s640/9200397_BibliographicResource_3000126282214+%25281%2529.jpg" width="640" /></a></div>
<br />
Appearing around <a href="http://www.carrollquigley.net/pdf/Weapons%20Systems%20and%20Political%20Stability.pdf">1000 AD</a>, the dominance of knights started following the conquest of <a href="https://en.wikipedia.org/wiki/William_the_Conqueror">William of Normandy</a> in the 11th century and reached its heights in the 14th century:<br />
<blockquote>
“When the 14th century began, knights were as convinced as they had always been that they were the topmost warriors in the world, that they were invincible against other soldiers and were
destined to remain so forever… To battle and win renown against other knights was regarded as the supreme knightly occupation” [Knights and the Age of Chivalry,<a href="https://www.amazon.co.uk/Knights-Age-Chivalry-Studio-book/dp/B0006CJM9C" target="_blank">1974</a>]</blockquote>
And then something happened. Something that changed the military combat for the centuries to come: the projectile weapons.<br />
<blockquote>
“During the fifteenth century the knight was more and more often confronted by disciplined and better equipped professional soldiers who were armed with a variety of weapons capable of piercing and crushing the best products of the armourer’s workshop: the Swiss with their halberds, the English with their bills and long-bows, the French with their glaives and the Flemings with their hand guns” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, <a href="https://www.amazon.com/Arms-Armor-Medieval-Knight-Illustrated/dp/0517103192" target="_blank">1988</a>]</blockquote>
The development of longsword had provided more effectiveness for the knight attack but there was no degree of training or improved plate armour could stop the rise of the projectile weapons:<br />
<blockquote>
“Armorers could certainly have made the breastplates thick enough to withstand arrows and bolts from longbows and crossbows, but the knights could not have carried such a weight around all day in the summer time without dying of heat stroke.”</blockquote>
And the final blow was the handguns:<br />
<blockquote>
“The use of hand guns provided the final factor in the inevitable process which would render armor obsolete” [Arms and Armor of the Medieval Knight: An Illustrated History of Weaponry in the Middle Ages, <a href="https://www.amazon.com/Arms-Armor-Medieval-Knight-Illustrated/dp/0517103192" target="_blank">1988</a>]</blockquote>
And with the advent of <a href="https://en.wikipedia.org/wiki/Arbalest">arbalests</a>, the importance of lifelong training disappeared since <a href="http://www.medievalwarfare.info/weapons.htm">“an inexperienced arbalestier could use one to kill a knight who had a lifetime of training”</a>.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhr6nHbc_s_vZ-WGwybGwgk3CU3Ixp1QzffcdPEjUYeHMuEfDqiAYHHeReyMBvmPHjbsZ2eIbcP-nLMZMahmIqpwwOlxhu_fWywuCcrPHaNr2qV8qS571_v4shtlWq-TQLEaSyhqzexwFga/s1600/crecy11.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhr6nHbc_s_vZ-WGwybGwgk3CU3Ixp1QzffcdPEjUYeHMuEfDqiAYHHeReyMBvmPHjbsZ2eIbcP-nLMZMahmIqpwwOlxhu_fWywuCcrPHaNr2qV8qS571_v4shtlWq-TQLEaSyhqzexwFga/s640/crecy11.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: 12.8px;">Projectile weapons [image </span><a href="http://militaryhistorynow.com/wp-content/uploads/2012/05/crecy11.jpg" style="font-size: 12.8px;" target="_blank">source</a><span style="font-size: 12.8px;">]</span></td></tr>
</tbody></table>
<br />
Over the course of the century, knighthood gradually disappeared from the face of the earth.<br />
<br />
A paradigm shift. A disruption.<br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
After the big promise of web 1.0 was not delivered, resulting in the .com crash of 2000-2001, the development of robust RPC technologies combined with better languages and tooling gradually rose to fulfill the same promise in web 2.0. On the enterprise front, the need to reduce cost by automating business processes led to the growth of IT departments in virtually any company that had a chance of surviving the 2000s decade.<br />
<br />
In the small-to-medium enterprises, the solutions almost invariably involved some form of database in the backend, storing CRUD operations performed on data entry forms. The need for reporting on those databases resulted in the creation of Business Intelligence functions employing more and more SQL experts.<br />
<br />
With the rise of e-Commerce, most companies needed an online presence and the ability to offer some form of shopping experience online. On the other hand, to reduce the cost of postage and paper, companies started offering account management online.<br />
<br />
Whether SOA or not, these systems functioned pretty well for the limited functionality they were offering. The important skills the developers of these systems needed were a good command of the language used, object-oriented design principles (e.g. SOLID, etc), TDD and also knowledge of agile principles and processes. In terms of scalability and performance, these systems were rarely, if ever, pressed hard enough to break - even sticky sessions could work as long as you had enough servers (it was often said “we are not Google or Facebook”). Obviously availability suffered, but downtime was something businesses had got used to and it was accepted as the general failure of IT.<br />
<br />
True, some of these systems were actually “lifted and shifted” to the cloud, but in reality not much had changed from the naive solutions of the early 2000s. And I call these systems <strong>The Simpleton Swamps</strong>.<br />
<br />
Did you see what was lacking in all of the above? <strong>Distributed Computing.</strong><br />
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
* * *</div>
<span style="text-align: center;"><br /></span>
It is a fair question that we need to ask ourselves: what was it that we, as the .NET community, were doing during the last 10 years of innovation? The first wave of innovation was the publication of the revolutionary papers on <a href="http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf">BigTable</a> and <a href="http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf">Dynamo</a>, which later resulted in the emergence of the NoSQL movement with Apache Cassandra, Riak and Redis (and later Elasticsearch). [During this time I guess we were busy with WPF and Silverlight. Where are they now?]<br />
<br />
The second wave was the Big Data revolution with Apache Hadoop ecosystem (HDFS, Pig, Hive, Mahout, Flume, HBase). [I guess we were doing Windows Phone development building Metro UI back then. Where are they now?]<br />
<br />
The third wave started with Kafka (and streaming solutions that followed), Grid Computing platforms with YARN and Mesos and also the extended Big Data family such as Spark, Storm, Impala, Drill, too many to name. In the meantime, Machine Learning became mainstream and the success of Deep Learning brought yet another dimension to the industry. [I guess we were rebuilding our web stack with Katana project. Where is it now?]<br />
<br />
And finally we have the Docker family and extended Grid Computing (registry, discovery and orchestration) software such as DCOS, Kubernetes, Marathon, Consul, etcd… Also the logging/monitoring stacks such as Kibana, Grafana, InfluxDB, etc, which started along the way as essential ingredients of any such serious venture. The point is that neither the creators nor the consumers of these frameworks could do any of this without in-depth knowledge of Distributed Computing. These platforms are not built to shield you from it, but merely to empower you to make the right decisions without having to implement a consensus algorithm from scratch or deal with the subtleties of building a gossip protocol.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVSn8cg137uXVxK4Ii8p0irL4pZH7q_0-0NACL6zPPiHqdjlVNucOcTQbDmWbAiA9cuHECGzaRd1Nd9mJ1ME559RJBThyphenhyphenvLamuHsxT3HqU4xDmBSsKZqEq1RaSfvxKQOf76gXk9XoKkV-I/s1600/spm-integrations.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVSn8cg137uXVxK4Ii8p0irL4pZH7q_0-0NACL6zPPiHqdjlVNucOcTQbDmWbAiA9cuHECGzaRd1Nd9mJ1ME559RJBThyphenhyphenvLamuHsxT3HqU4xDmBSsKZqEq1RaSfvxKQOf76gXk9XoKkV-I/s640/spm-integrations.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
And what was it that we have been doing recently? Well I guess we were rebuilding our stacks again with #vNext aka #DNX aka #aspnetcore. Where are they now? Well <b>actually a release is coming soon: <a href="https://twitter.com/msdevUK/status/740209826039537665" target="_blank">27th of June</a> to be exact</b>. But anyone who has been following the events closely knows that, due to recent changes in direction, we are still - give or take - 9 to 18 months away from a stable platform that can be built upon.<br />
<br />
So a big storm of paradigm shifts swept the whole industry while we were still tinkering with our simpleton swamps. Please just have a look at this <a href="https://hadoopecosystemtable.github.io/" target="_blank">big list</a>: only a single one of them is C# - Greg Young’s EventStore. And by looking at the list you see the same pattern, the same shifts in focus.<br />
<br />
The .NET ecosystem is dangerously oblivious to distributed computing. True, we have recent exceptions such as Akka.NET (a JVM port) or Orleans, but they have not really penetrated and infused the ecosystem. If all we want to do is build front-end APIs (akin to nodejs) or cross-platform native apps (using Xamarin Studio), that is not a problem. But if we are not supposed to build the sizeable chunk of backend services, let’s make that clear here.<br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
Actually there is a fair amount of distributed computing happening in .NET. Over the last 7 years Microsoft has built a significant number of services that are out to compete with the big list mentioned above: Azure Table Storage (arguably a BigTable implementation), Azure Blob Storage (Amazon Dynamo?) and EventHub (rubbing shoulders with Kafka). Also a highly-available RDBMS (SQL Azure), a Message Broker (Azure Service Bus) and a consensus implementation (Service Fabric). There is plenty of Machine Learning as well, and, although slowly, Microsoft is picking up on Grid Computing - the alliance with Mesosphere and the DCOS offering on Azure.<br />
<br />
But none of these have been open sourced. True, Amazon does not open source its bread-and-butter cloud either. But AWS has mainly been an IaaS offering, while Azure is banking on its PaaS capabilities, on making Distributed Computing easy for its predominantly .NET consumers. It feels as if Microsoft is saying, <i>you know, let me deal with the really hard stuff, but for sure, I will leave a button in Visual Studio so you could deploy it to Azure.</i><br />
<br />
<blockquote class="twitter-tweet" data-lang="en-gb">
<div dir="ltr" lang="en">
OK, SO I’VE INSTALLED DOCKER<br />
<br />
ARE WE DEVOPS YET OR DO I NEED TO INSTALL CONTAINERS TOO<br />
<br />
ALSO WHERE DO I DOUBLE CLICK TO MAKE IT CLOUD AWARE</div>
— PHP CEO (@PHP_CEO) <a href="https://twitter.com/PHP_CEO/status/671812156560642049">1 December 2015</a></blockquote>
<br />
At points it feels as if <strong>Microsoft, as the Lords of the .NET stack fiefdom, having discovered gunpowder, are charging us knights and peasant soldiers to attack with our lances, axes and swords while keeping the gunpowder weapons and their science safely locked away for the protection of the castle.</strong> The .NET community is to a degree contributing to #dotnetcore while also waiting for the Silver Bullet that #dotnetcore has been promised to be, revolutionising and disrupting the entire stack. But ask yourself, when was the last time that better abstractions and tooling brought about disruption? The knight is dead, gunpowder has changed the horizon, yet there seem to be no ears to hear.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0afvKQI-aZMZ-hggLoWM5cL62y4k1wbODoYlTmdDvvFML5vMr5rhXGL6ge7AsCdblKQWko0PIOB6Ad_JS7NNTLG-LpgfiVql6QCacqhUZE2BvniRYUqo4NnO_JdVTL7MG3JO99DjkisUC/s1600/Upperclassesmidieval.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="476" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0afvKQI-aZMZ-hggLoWM5cL62y4k1wbODoYlTmdDvvFML5vMr5rhXGL6ge7AsCdblKQWko0PIOB6Ad_JS7NNTLG-LpgfiVql6QCacqhUZE2BvniRYUqo4NnO_JdVTL7MG3JO99DjkisUC/s640/Upperclassesmidieval.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Fiefdom of .NET stack</td></tr>
</tbody></table>
We cannot fault any business entity for keeping its trade secrets. <strong>But if the soldiers fall, ultimately the castle will fall too.</strong><br />
<strong><br /></strong>
In fact, no single company is able to pull the weight of re-inventing the emerging innovations. While the quantity of technologies that have emerged from Azure is astounding, the quality has not always followed. After I complained to Microsoft about the performance of Azure Table Storage, it turned out others had found the same, some abandoning the Azure ship completely.<br />
<br />
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en-gb">
<div dir="ltr" lang="en">
<a href="https://twitter.com/aliostad">@aliostad</a> we are switching away as fast as we can</div>
— Rinat Abdullin (@abdullin) <a href="https://twitter.com/abdullin/status/730320319077044224">11 May 2016</a></blockquote>
<br />
No single company is big enough to do it all by itself. Not even Microsoft.<br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
I remember when we used to make fun of Java and Java developers (uninspiring, slow, Eclipse was a nightmare). Yet they actually built most of the innovations of the last decade, from Hadoop to Elasticsearch to Storm to Kafka... In fact, looking at the top 100 Java repositories on github (minus Android Java), you find 24 distributed computing projects, 4 machine learning library repos and 2 languages. In C# you get only 3 with claims to distributed computing: ServiceStack, Orleans and Akka.NET. <br />
<br />
But maybe it is fine, we have our jobs and we focus on solving different kinds of problems? Errrm... let's look at some data.<br />
<br />
The market share of the IIS web server has halved over the last 6 years - according to multiple independent sources [This <a href="http://www.acunetix.com/blog/articles/statistics-from-the-top-1000000-websites/" target="_blank">source</a> confirms the share was >20% in 2010].<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDlMvLrArikZz_eylY4t7nupdZMCX43T4I3nFdiz95eAf_Y12z6lM7IRIVQ88me4pot0m44_CoiqJJmbW1PWKyvAa92miCvfFMWbpJhFBEqe2rrE7e2gw-zI4CQIibtB-b0w83oNHgESAk/s1600/y.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="354" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDlMvLrArikZz_eylY4t7nupdZMCX43T4I3nFdiz95eAf_Y12z6lM7IRIVQ88me4pot0m44_CoiqJJmbW1PWKyvAa92miCvfFMWbpJhFBEqe2rrE7e2gw-zI4CQIibtB-b0w83oNHgESAk/s640/y.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">IIS share of the market has almost halved in the last 6 years [<a href="https://w3techs.com/technologies/history_overview/web_server/ms/y" target="_blank">source</a>]</td></tr>
</tbody></table>
<br />
Now the market share of C# ASP.NET developers is also decreasing to half of its peak of around 4%:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrbG47fYWAocwpNkzmo0GzvQiv8A8qkAo1TaHUcQVe78iVCLoyz7hx01S2U-wgZZw1okx5uLStlGHwtBkFJG6GrEhCik_02uPqI0wze3HpgrzwC_1y5-ywnKhmJh9NM6Hht4P7uDz4dOQN/s1600/Screen+Shot+2016-06-13+at+22.24.04.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="374" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrbG47fYWAocwpNkzmo0GzvQiv8A8qkAo1TaHUcQVe78iVCLoyz7hx01S2U-wgZZw1okx5uLStlGHwtBkFJG6GrEhCik_02uPqI0wze3HpgrzwC_1y5-ywnKhmJh9NM6Hht4P7uDz4dOQN/s640/Screen+Shot+2016-06-13+at+22.24.04.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Job trend for C# ASP.NET developer [<a href="http://www.itjobswatch.co.uk/jobs/uk/csharp/asp.net%20developer.do" target="_blank">source</a>]</td></tr>
</tbody></table>
And if you do not believe that, see another comparison with other stacks from another source:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMB-TmakMGnuohyphenhyphenBpeB8rb29t6wxpS1-MOj5d4F19w_Pnypn8rF-eabgK4oCg6S2UjDe0FRyP9E_I6oM8IqqaJoZtgZfsSl-lOZifBZuFwH-UTieZLrVsryYfR1fmwxGoKeMdB6gcdfvRQ/s1600/Screen+Shot+2016-06-13+at+22.37.00.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="420" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMB-TmakMGnuohyphenhyphenBpeB8rb29t6wxpS1-MOj5d4F19w_Pnypn8rF-eabgK4oCg6S2UjDe0FRyP9E_I6oM8IqqaJoZtgZfsSl-lOZifBZuFwH-UTieZLrVsryYfR1fmwxGoKeMdB6gcdfvRQ/s640/Screen+Shot+2016-06-13+at+22.37.00.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Comparing trend of C# (dark blue) and ASP.NET (red) jobs with that of Python (yellow), Scala (green) and nodejs (blue). C# and ASP.NET dropping while the rest growing [<a href="http://www.indeed.com/jobtrends/q-node.js-q-ASP.NET-q-Scala-q-Python-q-C%23.html" target="_blank">source</a>]</td></tr>
</tbody></table>
<br />
OK, that was actually nothing; what I care about more is OSS. The Open Source revolution in .NET, which had grown at a steady pace since 2008-2009, almost reached a peak in 2012 with the <a href="http://asp.net/">ASP.NET</a> Web API excitement and then grew at a slower pace (almost a plateau, visible on the 4M chart - see appendix). [by the way, I have had my share of these repos. 7 of those are mine]<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihG2LZxEh5-FvoPlO8KjxAq1Dle623-eV0MS7lKxf5IacEpYDjudSFaeU1_BFlxuKNAriwVu07XSgYHwgN0JIdsSmhOJyZef_J8dOPN7NsqqsQP0LUYP6OcjYRzq9qLaM5Qnwsi73WHuBA/s1600/Screen+Shot+2016-06-13+at+21.04.49.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="314" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihG2LZxEh5-FvoPlO8KjxAq1Dle623-eV0MS7lKxf5IacEpYDjudSFaeU1_BFlxuKNAriwVu07XSgYHwgN0JIdsSmhOJyZef_J8dOPN7NsqqsQP0LUYP6OcjYRzq9qLaM5Qnwsi73WHuBA/s640/Screen+Shot+2016-06-13+at+21.04.49.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">OSS C# project creation in Github over the last 6 years (10 stars or more). Growth slowed since 2012 and there is a marked drop after March 2015 probably due to "vNext". [Source of the data: <a href="https://developer.github.com/v3/" target="_blank">Github</a>]</td></tr>
</tbody></table>
<br />
What is worse is that the data shows that, with the announcement of #vNext aka #DNX aka #dotnetcore, there was a <strong>sharp decline in new OSS C# projects</strong> - the community is in limbo waiting for the release: people find it pointless to create OSS projects on the current platform, and the future platform is so much in flux that it is not stable enough for innovation. With the recent changes announced, it will practically take another 12-18 months for it to stabilise (some might argue 6-12 months; fair enough, take what you like). <b><i>For me this is the most alarming of all.</i></b><br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="So_all_is_lost_105"></a>So all is lost?</h2>
All is never lost. You still find good COBOL or FoxPro developers and since it is a niche market, they are usually paid very well. But the danger is losing relevance…<br />
<br />
Can Microsoft practically pull it off? Perhaps. I do not believe it is hopeless: I feel that with a radical change, by taking the steps below, Microsoft could materially reverse the decay:<br />
<ol>
<li>
Your best community brains in Distributed Computing and Machine Learning are in the F# community; they have already built many OSS projects in both areas - sadly remaining obscure and used by only a few. <strong>Support and promote F# not just as a first-class language but as THE preferred language of the .NET stack</strong> (and by the way, wherever I said .NET stack, I meant C# and VB). Ask everyone to gradually move. I don’t know why you have not done it already. I think someone somewhere in Redmond does not like it and he/she is your biggest enemy.</li>
<li>
Open source a good part of Azure’s distributed services. Let the community help you improve them. Believe me, you are behind the state of the art; frankly, no one will look to copy it. Would someone copy from Azure Table Storage and not Cassandra?!<br />
</li>
<li>
Stop promoting deployment to Azure from Visual Studio with a click of a button, making Distributed Computing look trivial. Tell them the truth, tell them it is hard, tell them so few succeed and hence they need to go back and study, and forever forget about the one-button-click stuff. You are doing a favour neither to them nor to yourself. No one should be encouraged to deploy anything in a distributed fashion without sound knowledge of Distributed Computing.</li>
</ol>
<br />
<h2>
Last word</h2>
So when I am asked whether I am optimistic about the future of .NET or about the progress of dotnetcore, I usually keep silent: we seem to be missing the point of where we need to go with .NET - a paradigm shift has been ignored by our ecosystem. True, dotnetcore will be released on the 27th, but after all it might not matter as much as we care. One of the reasons we are losing to other stacks is that we are losing our relevance. We do not have all the time in the world. <i>Time is short...</i><br />
<br />
<h2>
<i>Appendix</i></h2>
<u><b><i>Github Data</i></b></u><br />
<i><br /></i><i>Gathering the data from Github is possible but, due to search results being limited to 1000 and due to rate-limiting, it takes a while to process. The best approach I found was to list repos by update date and keep moving up. I used a python script to gather the data.</i><br />
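<i>As an illustration only (this is not the original script; the query, thresholds and field handling here are my own assumptions), the sliding-window approach over the Github search API could look roughly like this in Python:</i><br />
<pre>
# Sketch: walk the Github search API for C# repos with 10+ stars.
# The search API caps results at 1000 (10 pages of 100), so we slide a
# "pushed" date window forward, listing by update date ascending.
import time
import requests   # assumes the 'requests' package is installed

API = "https://api.github.com/search/repositories"

def fetch_repos(language="csharp", min_stars=10, token=None):
    headers = {"Authorization": "token " + token} if token else {}
    since = "2008-01-01"          # lower bound of the current window
    repos = {}
    while True:
        query = "language:%s stars:>=%d pushed:>%s" % (language, min_stars, since)
        new = 0
        for page in range(1, 11):     # at most 1000 results per query
            r = requests.get(API, headers=headers, params={
                "q": query, "sort": "updated", "order": "asc",
                "per_page": 100, "page": page})
            items = r.json().get("items", [])
            for repo in items:
                if repo["full_name"] not in repos:
                    new += 1
                repos[repo["full_name"]] = repo["created_at"]
                since = repo["pushed_at"][:10]   # move the window up
            time.sleep(2)             # crude guard against rate-limiting
            if len(items) != 100:     # short page: no more results in this window
                break
        if new == 0:                  # the window yielded nothing new: done
            return repos
</pre>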
<i><br /></i>
<i>It is sensible to use the number of stars as the bar for the quality and importance of Github projects. But choosing the threshold is not easy, and there is usually a lag between the creation of a project and its gaining popularity. That is why the threshold has been chosen very low. But if you think the drop in the creation of C# projects in Github was due to this lag, think again. Here is the chart of all C# projects regardless of their stars (0 stars and more):</i><br />
<i><br /></i>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR4Pp3ub1jlY0zhz3zqlecbS0mag8iSuWnMvjCRSN3zMGqHgmWMlOPGP-2H6PKuLcNbOLWv43D75nnEWmOfHQBG1jJoPmYT5katkfFsIZmZyXfcExqlvIQC5JeQ7sDZUCnDf2G-iEC2JyN/s1600/Screen+Shot+2016-06-13+at+21.18.45.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><i><img border="0" height="316" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjR4Pp3ub1jlY0zhz3zqlecbS0mag8iSuWnMvjCRSN3zMGqHgmWMlOPGP-2H6PKuLcNbOLWv43D75nnEWmOfHQBG1jJoPmYT5katkfFsIZmZyXfcExqlvIQC5JeQ7sDZUCnDf2G-iEC2JyN/s640/Screen+Shot+2016-06-13+at+21.18.45.png" width="640" /></i></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><i>All C# projects in github (0 stars and more) - marked drop in early 2015 and beyond</i></td></tr>
</tbody></table>
<i><br /></i>
<i>F# is showing healthy growth, but the number of projects and stars is much lower than that of C#. Hence here we look at the projects with 3 stars or more:</i><br />
<i><br /></i>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidnOjxj4tphXhaUWojum0cAkOgpAS3r9Nm2ttJKU1VYxX3riGcUJ2Feh92YbZT4XxYudMnzAd6cQELfkXti0C3lto5tsqDSgrxZuQ-5foeaNPwtzloyRc8ANC6Df6CpMtKWT00hzIqRu0b/s1600/Screen+Shot+2016-06-13+at+21.34.12.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="312" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidnOjxj4tphXhaUWojum0cAkOgpAS3r9Nm2ttJKU1VYxX3riGcUJ2Feh92YbZT4XxYudMnzAd6cQELfkXti0C3lto5tsqDSgrxZuQ-5foeaNPwtzloyRc8ANC6Df6CpMtKWT00hzIqRu0b/s640/Screen+Shot+2016-06-13+at+21.34.12.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">OSS F# projects in Github - 3 stars or more</td></tr>
</tbody></table>
<i>Projects with 0 stars or more (possibly showing people starting to pick it up and play with it) are looking very healthy:</i><br />
<i><br /></i>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiz89BnfhgXMj4k3bc4wTceYqCmVk-zrJ8nw-At1RovD83AoWywPClczRukBdYyO-SrhNyzMaEfOCyW0ix2CADC2f0kMX5jOReZS8d-7nXQn0Le0EXueQMm7QPjoZUnuGrp7JLJ4Lj-EDAJ/s1600/Screen+Shot+2016-06-13+at+21.57.23.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiz89BnfhgXMj4k3bc4wTceYqCmVk-zrJ8nw-At1RovD83AoWywPClczRukBdYyO-SrhNyzMaEfOCyW0ix2CADC2f0kMX5jOReZS8d-7nXQn0Le0EXueQMm7QPjoZUnuGrp7JLJ4Lj-EDAJ/s640/Screen+Shot+2016-06-13+at+21.57.23.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">All F# projects regardless of stars - steady rise.</td></tr>
</tbody></table>
<i><br /></i>
<i><br /></i>
<i>Data is available for download: C# <a href="https://drive.google.com/file/d/0By4PF7Jis9FzaVR5X21yUlR0ZjA/view?usp=sharing" target="_blank">here</a> and F# <a href="https://drive.google.com/file/d/0By4PF7Jis9Fzazd1MlNGWGJtQ0U/view?usp=sharing" target="_blank">here</a>. </i><br />
<i><br /></i>
<u><b><i>My previous predictions</i></b></u><br />
<i><br /></i>
<i>This is actually my second post of this nature. I <a href="http://byterot.blogspot.co.uk/2013/12/thank-you-microsoft-and-so-long.html" target="_blank">wrote one 2.5 years ago</a>, raising alarm bells for the lack of innovation in .NET and predicting 4 things that would happen in 5 years (2.5 years from now):</i><br />
<ol>
<li><i>All Data problems will be Big Data problems</i></li>
<li><i>Any server-side code that cannot be horizontally scaled is gonna die</i></li>
<li><i>Data locality will still be an issue so technologies closer to data will prevail</i></li>
<li><i>We need 10x or 100x more data scientists and AI specialists</i></li>
</ol>
<i>Judge for yourself...</i><br />
<i><br /></i>
<i><br /></i><u><b><i>Deleted section</i></b></u><br />
<br />
<i>For the sake of brevity, I had to delete this section but this puts in context how we have many more hyperscale companies:</i><br />
<i><br /></i>
<i>"In the 2000s, not many had the problem of scale. We had Google, Yahoo and Amazon, and later Facebook and Twitter. These companies had to solve serious computing problems in terms of scalability and availability that on one hand lead to the <b>Big Data</b> innovations and on the other hand made <b>Grid Computing</b> more accessible.</i><br />
<i><br /></i>
<i>By commoditising the hardware, Cloud computing allowed companies to experiment with problems of scale and to innovate in achieving high availability. The results have been completely re-platformed enterprises (such as Netflix) and the emergence of a new breed of hyperscale startups such as LinkedIn, Spotify, Airbnb, Uber, Gilt and Etsy. The rise of companies building software to solve problems related to these architectures, such as Hashicorp, Docker, Mesosphere, etc, has added another dimension to all this.</i><br />
<i><br /></i>
<i>And last but not least is the importance of a close relationship between academia and industry, which seems to be happening again after a long (and sad) hiatus. This has led to many academic lecturers acting as Chief Scientists, etc, influencing the science underlying the disruptive changes.</i><br />
<i><br /></i>
<i>There was a paradigm shift here. Did you see it?"</i><br />
<i><br /></i>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com36tag:blogger.com,1999:blog-2889416825250254881.post-5474863226752093932016-05-13T19:02:00.000+01:002016-05-14T13:34:04.904+01:00XML or JSON, and that is not the question<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
<script charset="utf-8" src="https://platform.twitter.com/widgets.js"></script>
So in the last couple of days, our .NET community has shown some strong reactions to the announcements in the <a href="http://asp.net/">ASP.NET</a> team stand-up. While the Ruby and more recently node communities are known for endless dramas over arguably petty issues, it felt that the .NET community was also capable of throwing tantrums. For those who are outside the .NET community or have not caught up with the news, the .NET/ASP.NET team has <a href="https://blogs.msdn.microsoft.com/webdev/2016/05/11/notes-from-the-asp-net-community-standup-may-10-2016/" target="_blank">decided</a> to revert <i>project.json</i> (JSON) in #DotNetCore back to <i>*.csproj/*.vbproj</i> (XML) and resurrect msbuild. So was it petty in the end?<br />
<br />
<strong>Some</strong> believed it was: they argued that all that had changed was the format of the <em>project</em> file and that the drama associated with it was excessive. They also pointed out that all the goodness of <code>project.json</code> would be ported to the familiar yet different <code>*.csproj</code>. I call this group the <em>loyalists</em>:<br />
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en">
<div dir="ltr" lang="en">
<a href="https://twitter.com/palantirza">@palantirza</a> <a href="https://twitter.com/terrajobst">@terrajobst</a> <a href="https://twitter.com/tourismgeek">@tourismgeek</a> <a href="https://twitter.com/aliostad">@aliostad</a> I'm amazed that you picked a technology based on the format of the project file.</div>
— Isaac Abraham (@isaac_abraham) <a href="https://twitter.com/isaac_abraham/status/730661242172510208">May 12, 2016</a></blockquote>
<br />
On the other hand, <strong>some</strong> were upset by the return of <code>msbuild</code> to the story of .NET development. This portion of the community argued that the 15+ year old <code>msbuild</code> has no place in modern development. They had been celebrating the death of this technology, not knowing it was never really dead - I call them <em>msbuild-antagonists</em>. The first group (loyalists), on the other hand, were flagging that msbuild would be improved and the experience would be modernised.<br />
<blockquote class="twitter-tweet" data-lang="en">
<div dir="ltr" lang="en">
That PR replacing MSBuild with Make doesn’t seem so foolish now does it?</div>
— Colin Scott (@AbstractCode) <a href="https://twitter.com/AbstractCode/status/730277883193647105">May 11, 2016</a></blockquote>
<br />
Now there was another <strong>group</strong> of people who were frustrated that this decision had been made despite the community feedback and solely based on the feedback of “some customers” behind closed doors. I call them <em>OSS-apologetics</em>, and their main issue was the seeming lack of weight given to community feedback when it comes to the internal decisions that Microsoft takes as a commercial enterprise - especially in light of the fact that <code>project.json</code> was announced almost 2 years ago and it was very late to change it.<br />
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en">
<div dir="ltr" lang="en">
<a href="https://twitter.com/terrajobst">@terrajobst</a> <a href="https://twitter.com/tourismgeek">@tourismgeek</a> <a href="https://twitter.com/isaac_abraham">@isaac_abraham</a> <a href="https://twitter.com/aliostad">@aliostad</a> And there it is: yeah, it's not REALLY OSS, and the community doesn't matter.</div>
— Palantir (@palantirza) <a href="https://twitter.com/palantirza/status/730602463569547264">May 12, 2016</a></blockquote>
<br />
Now there was yet <strong>another</strong> group that had invested time and effort (==money?) in building projects and tooling (some of it commercial), and they felt that the rug had been pulled from underneath them and all those hours had gone to waste - for lack of a better phrase I call them <em>loss-bearers</em>. And they were even more upset to see their loss accounted for as a learning process:<br />
<blockquote class="twitter-tweet" data-lang="en">
<div dir="ltr" lang="en">
I always tell developers not to worry when their code gets thrown away. Learning is the most important thing.</div>
— David Fowler (@davidfowl) <a href="https://twitter.com/davidfowl/status/730239218639724544">May 11, 2016</a></blockquote>
Obviously there is not a great answer for them, but the usual response is that it is only a very minor part of the whole community who have been living on the bleeding edge, and they knew this could be coming any minute, as mentioned on the stand-up:<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/P9HqMZviaMg?t=27m" width="560"></iframe>
<br />
<h2>
</h2>
<h2>
<a href="https://www.blogger.com/null" id="Where_do_I_stand_16"></a>Where do I stand?</h2>
I stand somewhere in between. I cannot quite agree with the <em>loyalists</em>, since it is not just a question of format. On the other hand, I do not bear any losses, since I decided a long time ago that I would skip the betas and pick it up when the train of changes slows down - something not yet in sight.<br />
<br />
But I do not think any of the above captures the essence of what has been happening recently. I am of the belief that this decision, along with the previous disruptive ones, has been an <strong>important and shrewd business decision to save the day and contain losses for Microsoft as a commercial platform - and no one can blame Microsoft for doing that.</strong><br />
<br />
I had warned time and time again that the huge amount of change in the API and tooling, with no easy migration path, would result in dividing the community into a niche progressive #DotNetCore minority and a mainstream commercial majority who would stay on .NET Fx and need years (not months) to move to #DotNetCore - <em>if at all</em>. And this could potentially create a <strong>Python-2-vs-3-like</strong> divide in the community.<br />
<blockquote class="twitter-tweet" data-conversation="none" data-lang="en">
<div dir="ltr" lang="en">
<a href="https://twitter.com/jeremydmiller">@jeremydmiller</a> <a href="https://twitter.com/Cranialstrain">@Cranialstrain</a> <a href="https://twitter.com/demisbellot">@demisbellot</a> fragmentation (some staying with 4.5 for long time) and then demise of .Net as we know it.</div>
— Ali Kheyrollahi (@aliostad) <a href="https://twitter.com/aliostad/status/625791355999141888">July 27, 2015</a></blockquote>
<br />
The crossing from the old .NET to the new #DotNetCore (seemingly similar on the surface yet wildly different at heart) would not be dissimilar to the crossing from VB6 to .NET. And what makes it worse is that, unlike then, there are many viable alternative OSS stacks (back then there were only Java and C/C++). This could mean that the mainstream majority might in fact decide to try an altogether different platform.<br />
<br />
So Microsoft as a business entity had to step in and albeit late, fix the few yet key mistakes made at the start and alongside the project during the last 2 years:<br />
<ul>
<li>The <a href="http://asp.net/">ASP.NET</a> team making platform/language decisions and implementing features with clever tricks, rather than the .NET Fx baking such features into the framework itself. An example was Assembly Neutral Interfaces.</li>
<li>Ignoring the importance of an upgrade path for existing projects and customers</li>
<li>Inconsistent, confusing and ever changing layering of the stack</li>
<li>Poor and conflicting scheduling messages</li>
<li>Using Silverlight’s CoreCLR for <a href="http://asp.net/">ASP.NET</a>, resulting in a dichotomy of the runtime, something that as far as I know has no parallel in any other language/platform. In the most recent slides I do not see CoreCLR being mentioned anymore, yet it might still be there. If it is, it will remain a technical debt to be paid later.</li>
</ul>
All in all it has been a rough ride, both for the drivers and the passengers of this journey, but I feel that clarity and cohesion are back and the long-standing issues have now been addressed.<br />
<h2>
<a href="https://www.blogger.com/null" id="Where_could_I_be_wrong_37"></a>Where could I be wrong?</h2>
My argument naturally brings these counterarguments:<br />
<ul>
<li>Perhaps, had the <a href="http://asp.net/">ASP.NET</a> team not pushed the envelope this far by single-handedly crusading to bring in modern ideas and courageous undertakings such as going cross-platform, we would have .NET 5 now instead of #DotNetCore.</li>
<li>By carrying baggage from the past (msbuild), Microsoft is extending the lifespan of its stacks, which in the short term will be beneficial to the corporation but, since it is not a clean break, in the long term will result in dispersion of the community and the need for another <em>redux</em>.</li>
</ul>
Hard to answer these arguments since one is a hypothetical situation and the other looks well into the uncertainty of the future. I will leave it to the readers to weigh the arguments.<br />
<h2>
<a href="https://www.blogger.com/null" id="Last_word_47"></a>Last word</h2>
It is not possible to hide that none of this has been without casualties. Some confidence has been lost, the community has at times been upset, and overall it has not been all rosy as far as Microsoft’s image in its OSS ventures goes. I did mention the old and new Microsoft coming head-to-head, which might not be correct, but as Satya Nadella said, <em>culture does not change overnight</em>.aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com2tag:blogger.com,1999:blog-2889416825250254881.post-31618807274191723682016-02-08T19:46:00.000+00:002016-03-02T16:38:35.539+00:00Future of Information: the Good, Bad and Ugly of it<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
We are certainly at the cusp of a big revolution in human civilisation - caused by Information Technology and Machine Intelligence. There are golden moments in history that have fundamentally changed us: the late dark ages for Astronomy, the early renaissance for Physics, the 1700-1800s for Chemistry, the late 1800s for Microbiology, the 1950s for transistors… and the periods get more and more compressed. It looks like a labyrinth that gets narrower as you get closer to its centre.<br />
<br />
Without speculating on what the centre could look like, and considering this could still be a flat line of constant progress, we need to start thinking about what the future could look like - not because it is fun, but because action could be warranted now. There is no shortage of speculation or commentary; one man can dream and fathom a far future which might or might not be close to the distant reality. And that is not the point. The point is, as I will outline below, it could be getting late to do what we need to DO. Yes, this is not a sci-fi story…<br />
<br />
On one hand, there is nothing new under the sun, and the cycle of change has been with mankind since the beginning. We have always had the reluctant establishment fighting the wind of change promoted by the new generation.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYG82MRxMNsfVdJGl6bR22ueCmo0uiij5QrRYowYFTYeaRJ5ZlteQrp0LFb-7GpKSdeQMTzYNPRKSU_VcBmFsYGlj9dkOKzhoxqHeoDpcMSvO7t8t0GWzPgTjx-ckuKeRAEKGZXXJ0TvAD/s1600/PPTMASSuseInventionsLogPRINT.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="508" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiYG82MRxMNsfVdJGl6bR22ueCmo0uiij5QrRYowYFTYeaRJ5ZlteQrp0LFb-7GpKSdeQMTzYNPRKSU_VcBmFsYGlj9dkOKzhoxqHeoDpcMSvO7t8t0GWzPgTjx-ckuKeRAEKGZXXJ0TvAD/s640/PPTMASSuseInventionsLogPRINT.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1 - Accelerating change [Source: Wikipedia]</td></tr>
</tbody></table>
<br />
On the other hand, <b>this is the first time in history that the cycle of change has been reduced to less than a generation</b> (a generation is normally considered 20-25 years). You see, the politicians of the past had time to grow up with the changes, feel the needs, brew new ideas and come up with solutions. Likewise, nations have had time to assimilate and react to the changes in terms of aspirations, occupation and direction, as the changes would not fully take effect within one person’s lifetime. What about now? Only a decade ago (pre-iPhone) looks like a century backwards. The <a href="https://en.wikipedia.org/wiki/Accelerating_change">cycle of change</a> already looks to be around 5-10 years [see Figure 1]. And look at our politicians: it is no coincidence that someone like Trump can capture the imagination of a nation in the absence of visionary contenders. Politics as we know it has reached the end of its life - IMHO due to a lack of serious left-wing ideas - but that is not the topic of this post. The point I am trying to make is that politicians are no longer able to propose anything but the most trivial changes, since their view of the world is limited by their lack of understanding of a whole new virtual world being created alongside this physical one, whose rules do not exist in any books.<br />
<br />
And it is not just politics that is dropping far behind. Economics in the face of a fast cycle of change will be different too. First of all, today’s financial malaise experienced in many developed countries might still be around for years to come. In an age of Keynesian economy and central intervention characterised by low inflation, low growth and an abundance of money printed by central banks, it seems <strong>the banks are no longer relevant</strong>. The current economy is sometimes referred to as Japanisation, which was spotted back in 2011 and, 5 years on, feels no different. And it is no coincidence that an <a href="https://www.imf.org/external/pubs/ft/wp/2010/wp10294.pdf">IMF report</a> finds decreasing efficiency of capital in the Japanese economy - that can be applied elsewhere. Looking at the value of bank stocks provides the glaring fact that they are remnants of institutions from the past. True, they are probably still financing your mortgage and mine, but their importance as the cornerstone of development during the previous centuries is gone. Why? <strong>Because the importance of capital in a world where there is so much of it around without finding a suitable investment is overrated</strong>. With <a href="http://www.marketwatch.com/investing/bond/TMUBMUSD10Y">10-yr US Bonds</a> at around 1.8% and the yield on the <a href="http://www.bloomberg.com/quote/GDBR2:IND">2-yr German bund</a> at -0.5% (!), an investment with a 2% annual return is a bargain. In fact, today’s banking is characterised by piling up losses year on year (for example <a href="http://www.cnbc.com/2016/01/28/deutsche-bank-post-loss-in-investment-banking.html">this</a> and <a href="http://www.cnbc.com/2016/02/04/credit-suisse-posts-loss-warns-on-volatility.html">this</a>). Looking at Citigroup’s or Bank of America’s 10-year chart is another witness to the same decline. In an environment where money is cheap (because of ZIRP), it cannot be the main driver of business, as money (and hence banks) is not a scarce commodity anymore. See? We did not even have to mention bitcoin, blockchain or crowdfunding.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRxx-jiXYBys8CEfbkNpj7bqHzy_84YHH9OeH49ghFOQ1L3MZ4Gggewvoi6t_W9wiiXxqiPc_MgT1SMWsKVBLxoxjcnWp_s2aIEP6Ies6T-9tKpAKFbKh9ZiG3hskRoBSJM0idszCGo2yT/s1600/db.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="284" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRxx-jiXYBys8CEfbkNpj7bqHzy_84YHH9OeH49ghFOQ1L3MZ4Gggewvoi6t_W9wiiXxqiPc_MgT1SMWsKVBLxoxjcnWp_s2aIEP6Ies6T-9tKpAKFbKh9ZiG3hskRoBSJM0idszCGo2yT/s640/db.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 2 - Deutsche Bank Stock since year 2000 [Source: Yahoo Finance]</td></tr>
</tbody></table>
But beyond our myopic view of the economy focused on the current climate, there is a rhetoric looking at it from a different angle and far into the future, and seeing the same pattern. In one interesting essay on the <a href="https://www.foreignaffairs.com/articles/united-states/2014-06-04/new-world-order">Economics of the future</a>, the authors find an ever-decreasing role for capital. While mentioning the importance of suitable labour (in terms of the geek labour force, currently the scarcest resource, resulting in companies not growing to their full potential) could have been helpful, it is evident that capital is no longer the issue.<br />
<br />
In essence what all this means is, <strong>if historically the banks, as the institutions controlling capital, had the upper hand, in the days to come it will be those controlling Information.</strong> The future of our civilisation will revolve around the conflicts to control Information, <strong>on one hand by the state, on the other by the institutions and finally by us and NGOs fighting for privacy</strong>.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="The_Good_17"></a>The Good</h2>
The rate of data acquisition has been described as exponential. This has been mainly with regard to the virtual world and our surroundings, but very soon it will include us. From our exact whereabouts to our blood pressure to various hormonal levels and perhaps our emotional status, all of it will be captured very soon. A lot of this is already possible, also known as the <a href="http://hanselminutes.com/399/chris-dancy-the-worlds-most-quantified-man-explains-the-quantified-self">quantified self</a>. But it is only a matter of time before this applies to everyone.<br />
<br />
It is not difficult to imagine what this can do to promote health and disease prevention. Even now, those who suffer from heart arrhythmia carry devices that can defibrillate their heart if a deadly ventricular fibrillation occurs. Blood pressure, sugar level, various hormonal levels and all sorts of measurable elements can be tracked. Cancerous cells can be detected in the blood (and their source identified) well before they could grow and spread. Plaques in the blood vessels will be spotted by circulating micro devices and any serious stenosis identified. Clots in the heart or brain vessels (resulting in stroke) can be detected at the time of formation by a device releasing thrombolytic agents, immediately alleviating the problem. Going for an extra medical diagnosis could be very similar to how our cars are serviced today: a device gets connected to the myriad of micro devices in your body and a full picture of your health status is immediately visible to the medical staff. You could be walking on a road or in a car, witnessing a rare yet horrific accident (would there be accidents?), and the medical team would know whether you would suffer from PTSD and whether you would need certain therapies for it - they would know, from your various measurements, where you were and whether you witnessed the incident.<br />
<br />
And of course, this is only the medical side. The way we work, entertain ourselves and interact with the outside world will be completely different. It is not very hard to imagine what it will be like: one cheesy way is to just take everything that you do at home and think of adding automation/scheduling/verbal commands to it. From making coffee, to checking information, to entertainment. But I will refrain from limiting your view by my imagination. What is clear is that the presence and impact of the virtual world will be much more pronounced.<br />
<br />
At the end of the day, it is all about the extra information coupled with machine intelligence.<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="The_Bad_27"></a>The Bad</h2>
This section is not about speculating on what it could look like. We can all go and read any of the dystopian books; there are many to choose from, and the future could be like any of them or none.<br />
<br />
But instead it is about simple reasoning: taking what we know, projecting the rate of change and looking at what we might get. It is very reasonable to think that machine intelligence will be at a point where it can reason very efficiently with a pretty good rate of success. And on the other hand, it is reasonable to think that there will be many, many data points for every person. If we as humans <em>can</em> be represented as <strong>intelligent machines that turn data plus our characters into decisions</strong>, then with our characters (historical data) known to the machines and the input we perceive already available via the many agents present in and around us, it is not unreasonable to think that the systems can estimate our decisions. So when you think of advertising, this gets really frightening, since with enough information you would know pretty well what the reaction will be. And then it becomes a question of how much - how much money you have to spend on it…<br />
<br />
You see, the fight for your disposable income (that part of your income that you can choose how to spend) could not be more fierce: it can make or break companies in the future. The future of advertising and the fight for this disposable income is what makes Eric Schmidt <a href="http://www.amazon.com/The-New-Digital-Age-Reshaping/dp/0307957136" target="_blank">come out and almost say</a> there won’t be online privacy in the future:<br />
<blockquote class="tr_bq">
"Some governments will consider it too risky to have thousands of anonymous, untraceable and unverified citizens - hidden people - they'll want to know who is associated with each online account... Within search results, information tied to verified online profiles will be ranked higher than content without... even the most fascinating content, if tied to an anonymous profile simply won't be seen because of its excessive low ranking." - The New Digital Age / page 33</blockquote>
And when you see how the top four companies have already moved into the media industry, you get it. Your iPhone selects a handful of news items for you to see, Facebook controls your timeline, Amazon is a full-blown media company and Google controls YouTube, which has <a href="http://www.huffingtonpost.com/entry/youtube-vs-cable_us_55acf44fe4b0d2ded39f5370">overtaken</a> conventional media for the entertainment of the millennials. We must reiterate that none of these companies is by nature evil, but when it comes to choosing between you and their income, it is natural that they will pick the latter. And guess what: they have what the state wants too.<br />
<br />
Let’s revisit banking for a moment to clarify the point. Banks have what politicians need: capital to fund ever more expensive political campaigns. And the state has what the banks need: regulation, or rather de-regulation, which banks thought would help them prosper because they could enter the stock market’s casino with high street bank deposits (which ironically has been the source of their losses). And above all, the state catches the banks if they fall, as it did in 2008. The ECB uses its various funds (EFSF, ESM, etc) to keep the banks in Greece and Italy (and others, soon Germany?) afloat. And it catches the stock market when it falls, as it has constantly done with various QE measures, interest rate cuts, money printing, etc. In such a financial milieu, where there are cushions all along the path, there is no real risk anymore, which leads to irresponsible behaviour by the banks. And the party should never end; no wonder Obama could not move an inch towards bringing back some regulation. The heads of the state’s financial institutions are ex-CEOs of the likes of Goldman Sachs. This alliance of the state and banking has contributed to the growth of inequality (ultimately leading to modern slavery) and no wonder the state is not bothered: the state is made up of politicians in alliance with bankers, and of the bankers themselves.<br />
<br />
And what does this have to do with the future of information? Exactly the same thing can happen in the future, only with the state and the heads of companies owning the information. If capital no longer holds the power and information does, then the alliance of the state and the info bosses will lead to modern slavery. <b>States control legislation and information companies own private data and control the media: each one has what the other needs. </b><br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="The_Ugly_41"></a>The Ugly</h2>
Why ugly? Because we are already there, almost. First of all, the states have started gathering and controlling information. The NSA is just an example. States have started <a href="http://www.theguardian.com/world/2014/feb/03/microsoft-facebook-google-yahoo-fisa-surveillance-requests" target="_blank">requesting that the companies</a> owning the data provide it. <a href="https://en.wikipedia.org/wiki/Encryption_ban_proposal_in_the_United_Kingdom" target="_blank">Legislation</a> is under way to prevent effective encryption. This could all look harmless while we are busy checking our twitter and facebook timelines, but it has already started to freak me out: companies have already started thinking and acting in this area. As we saw, Google's Eric Schmidt is portraying a future where anonymity has little value; either you agree, or otherwise you need to speak out.<br />
<br />
Going back to politics, we do not have politicians or lawyers with a correct understanding of the technology and its implications, and it is not their fault: they were not prepared for it. But soon, very soon, we will have heads of companies turning into politicians. Very much like the CEOs of Goldman Sachs, and I do not necessarily mean it in a bad way. Why? Because the power will be in the hands of the geeks, and by the same token we need strong opposition; we need politicians among us to rise to the occasion and lead us safely into a future where we have meaningful legislation protecting our privacy while allowing safe data sharing. The problem is, we have had 2500 years or so to think about democracy and government in the physical world (from the Greek and Roman philosophers until now), but we are confronted with a virtual world where the ethics and philosophy are not well-defined and do not quite map to the physical world we live in, yet every lawmaker is trying to shoehorn it into the only thing they know about. Enough is enough.<br />
<br />
But where do we start? My point in this post/essay has been to ask the questions; I do not claim to have the answers. We have not yet explored the problems well enough to come up with the right answers... we need the help of think tanks, many of which I see rising amongst us.<br />
<br />
We are surrounded by questions whose answers (like all other aspects of our industry) are tangled with so much of our personal opinion. When it comes to the court of law, what does not matter is your or my opinion. Is Edward Snowden a hero or a traitor? Was Julian Assange a visionary or an opportunist? What is ethical hacking, and how is it different from unethical - in fact, could hacking ever be legitimate? Is Anonymous a bunch of criminals or a collection of selfless vigilantes working for the betterment of the virtual world in the absence of a legal alternative? What is the right to privacy, and is there a right to be anonymous?<br />
<br />
Needless to say, there could be some quick wins. I think defining privacy and data sharing is one of the key elements. One improvement could be turning the small-print legal mumbo jumbo of terms and conditions into bullet-point fact sheets. Similar to the “Key Fact Sheet” for mortgages, where the APR and various fees are clearly defined, we could enforce a privacy fact sheet where the answers to questions such as “My anonymised/non-anonymised data might/might not be sold”, “I can ask for my records to be physically erased”, “My personal information can/cannot be given to third parties”, etc are clearly defined for non-technical consumers, as well as for most of the rest of us who rarely read the terms and conditions.<br />
<br />
Whatever the solutions, we need to start… now! And it could be already late.aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com3tag:blogger.com,1999:blog-2889416825250254881.post-41291849765969210192015-11-24T22:40:00.000+00:002015-11-26T11:00:28.970+00:00Interactive DataViz: Rock albums by the genre since 1960<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCfUrCCKzknizU2cDBsZL8cmkT6gU2B_9-Az7CUVOl7dRlCarkpAt5JbAUX9CoI_d9y9aeta48y46o0BOk5GTsZTVtwSQzGxKxYhLcBcEviqzk-pNC6SYlIfg24A5x2of04fb1Qn2bZKaJ/s1600/Screen+Shot+2015-11-24+at+22.09.36.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="194" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCfUrCCKzknizU2cDBsZL8cmkT6gU2B_9-Az7CUVOl7dRlCarkpAt5JbAUX9CoI_d9y9aeta48y46o0BOk5GTsZTVtwSQzGxKxYhLcBcEviqzk-pNC6SYlIfg24A5x2of04fb1Qn2bZKaJ/s640/Screen+Shot+2015-11-24+at+22.09.36.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Interactive DataViz here: <a href="http://wiki-rock.azurewebsites.net/top10-album-genres.html" target="_blank">http://wiki-rock.azurewebsites.net/top10-album-genres.html</a></td></tr>
</tbody></table>
Last week I presented a <a href="https://buildstuff15lithuania.sched.org/event/4P1q/ali-kheyrollahi-aliostad-from-power-chords-to-power-of-modelhellip" target="_blank">talk</a> in #BuildStuffLT titled <a href="https://buildstuff15lithuania.sched.org/event/4P1q/ali-kheyrollahi-aliostad-from-power-chords-to-power-of-modelhellip" target="_blank">“From Power Chords to the Power of Models”</a>, which was a study of Rock Music by way of Data Mining, Mathematical Modelling and Machine Learning. It is such a fun subject to explore, especially for me, as Rock Music has been one of my passions since I was a kid.<br />
The slides from the talk are <a href="http://www.slideshare.net/AliKheyrollahi/power-chords-and-the-power-of-models" target="_blank">available</a> and the videos will be available soon (although my performance during the talk was suboptimal due to lack of sleep, a problem which seemed to be shared by many at the event). <a href="http://buildstuff.lt/" target="_blank">BuildStuffLT</a> is a great event, highly recommended if you have never been. It is a software conference with known speakers such as <a href="https://twitter.com/mfeathers" target="_blank">Michael Feathers</a>, <a href="https://twitter.com/randyshoup" target="_blank">Randy Shoup</a>, <a href="https://twitter.com/venkat_s" target="_blank">Venkat Subramaniam</a> and <a href="https://twitter.com/hintjens" target="_blank">Pieter Hintjens</a>, and this year it hosted <a href="https://twitter.com/conways_law" target="_blank">Melvin Conway</a> (yeah, the visionary who came up with Conway’s law in 1968) with really <strong>mind stimulating</strong> talks. You also get a variety of other speakers with very interesting talks.<br />
<br />
I will be presenting my talk at <a href="https://www.codemash.org/" target="_blank">CodeMash 2016</a> so I cannot share all of the material yet, but I think this interactive DataViz alone packs many, many slides into a single representation. I can see myself spending hours just looking at the trends, the artist names and their album covers - yeah, this is how much I love Rock Music and its history - but even for others this could be fun and also help you discover some new music to listen to.<br />
<br />
<h1>
<a href="https://www.blogger.com/null" id="DataViz_8"></a>DataViz</h1>
This is an interactive percentage-based stacked area chart of the top 10 genres per year, since 1960, when Rock Music as we know it started to appear. That is a mouthful, but basically for every year the top 10 genres are selected, so the dataset contains only those Rock (or related) genres that at some point were among the top 10. You can access it <a href="http://wiki-rock.azurewebsites.net/top10-album-genres.html" target="_blank">here</a> or simply clone the GitHub repo (see below) and host your own.<br />
<br />
<br />
The data was collected from Wikipedia by capturing Rock albums and then processing their genres, finding the top 10 in every year and presenting them in a chart - I am using Highcharts which is really powerful, simple to use and has a non-commercial license too. I have also shared the data itself so you can run your own DataViz if you want to. The license for the data is of course Wikipedia’s, which covers these purposes.<br />
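If you are curious how such a per-year top-10 dataset could be computed, here is a minimal sketch (in Python, with made-up field names - not the actual code behind the site) that counts every genre of every album per year, keeps the 10 most frequent genres of each year and converts the counts to percentages:<br />
<pre><code># Minimal sketch, not the wiki-rock code: 'albums' is assumed to be a list of dicts
# like {"title": "...", "year": 1973, "genres": ["Progressive rock", "Rock"]}
from collections import Counter, defaultdict

def top10_genre_shares(albums):
    per_year = defaultdict(Counter)
    for album in albums:
        for genre in album["genres"]:   # every album contributes all of its genres
            per_year[album["year"]][genre] += 1

    shares = {}
    for year, counts in per_year.items():
        total = sum(counts.values())
        top10 = counts.most_common(10)  # only the 10 most frequent genres of that year
        shares[year] = {genre: 100.0 * n / total for genre, n in top10}
    return shares
</code></pre>
The resulting per-year percentages are the kind of series a stacked area chart like the one above would be fed with.<br />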
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGp7hUpU4YNW9C_lfy_KL6W3rF_X0ySPYrHOkKFGLHulOjmDQrtRogTLtwFZCoUjR4-UAUSv2yDAchcgt9l9VMr9X09XNF10MjikepSWJnH8VbfyDNlkwA3JJzoyBK7uP3hpUPeDkI7S0B/s1600/Screen+Shot+2015-11-24+at+22.08.03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="446" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGp7hUpU4YNW9C_lfy_KL6W3rF_X0ySPYrHOkKFGLHulOjmDQrtRogTLtwFZCoUjR4-UAUSv2yDAchcgt9l9VMr9X09XNF10MjikepSWJnH8VbfyDNlkwA3JJzoyBK7uP3hpUPeDkI7S0B/s640/Screen+Shot+2015-11-24+at+22.08.03.png" width="640" /></a><br />
<br />
I highly recommend you start with the Visualisation with “All Unselected” (Figure 2), then select a genre and visualise its rise and fall through history.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfIR9ojlBCw1eM4eY_SGL5f5KoQRyRgJQ4akLRrank_yQycPqcP0X244CgPiJ5yxKvxqtZI0DkQf2A4TS6gWQwsBJRKZ8tWdt77xkoZAVosvmhGCALcXjTrNZDGj8xHYKGr2xLpnIwXmRA/s1600/Screen+Shot+2015-11-24+at+22.05.10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="612" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfIR9ojlBCw1eM4eY_SGL5f5KoQRyRgJQ4akLRrank_yQycPqcP0X244CgPiJ5yxKvxqtZI0DkQf2A4TS6gWQwsBJRKZ8tWdt77xkoZAVosvmhGCALcXjTrNZDGj8xHYKGr2xLpnIwXmRA/s640/Screen+Shot+2015-11-24+at+22.05.10.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Then you can click on a point (year/genre) to list all albums of that genre for that year (Figure 3). Please note that even when the chart shows 0%, there could be some albums for that genre - these are from a year in which that genre was not among the top 10.<br />
<br />
<h1>
Looking at the data in a different way</h1>
<div>
Here is the 50 years of Rock (starting from 1965) with the selected albums:</div>
<div>
<br /></div>
<iframe allowfullscreen="" frameborder="0" height="360" src="https://www.youtube.com/embed/DopngUSscAI" width="640"></iframe>
<br />
<div>
<br /></div>
<h2>
<a href="https://www.blogger.com/null" id="Things_to_bear_in_mind_23"></a>Things to bear in mind</h2>
<ul>
<li>The data has been captured by collecting all albums from all artist links reached by traversing from the <a href="https://en.wikipedia.org/wiki/List_of_rock_genres" target="_blank">list of rock genres</a> to the artist pages. As far as I know, the list includes all albums by the major (and minor) rock artists - according to Wikipedia. If you find a missing album (or artist), please let me know.</li>
<li>Every album contributes all its genres to the list. This means that if it has the genres “Blues Rock” and “Rock”, it will be counted once for each of its genres and you can find it by looking at either Rock or Blues Rock.</li>
<li>The data has some oddities: sometimes an album occurs more than once, mainly due to nuances of the data in Wikipedia - there are multiple entries (URLs) for the same document, etc. The data has already been cleansed through several processes and these oddities do not materially change the results. In the future, however, there are things that can be done to remove the remaining oddities.</li>
<li>Again, it is highly recommended that you click the “Unselect All” button, then click on the genres you are interested in one by one and explore the names of the albums.</li>
<li>Clicking “Select All” or “Unselect All” takes a bit too much time. I am sure it has an easy solution (turn rendering off while changing the state) but I have not been able to find it. Expecting your PRs!</li>
<li>There are some genres in the list which are not really Rock genres. These genres were either mentioned alongside a rock genre on the album cover, or belong to a not-so-much-rock album by an otherwise Rock artist.</li>
</ul>
<h1>
<a href="https://www.blogger.com/null" id="Code_and_Data_31"></a>Code and Data</h1>
All code and data published in <a href="https://github.com/aliostad/wiki-rock" target="_blank">GitHub</a>. Code uses Highchartsjs, knockoutjs and foundations UI framework. Have fun!aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com8tag:blogger.com,1999:blog-2889416825250254881.post-63891499423165623812015-09-19T11:45:00.001+01:002015-09-19T13:42:41.003+01:00The Rule of "The Most Accessible" and how it can make you feel better <script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
I remember when I was a kid, I watched a documentary on <a href="https://www.youtube.com/watch?v=UTX7Cxq8aGc" target="_blank">how to catch a monkey</a>. Basically you dig a hole in a tree - big enough for a stretched monkey hand to go in, but not so big that a fist clenched around the bait inside can get out - and then sit and watch.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://www.tarekcoaching.com/tcwp/wp-content/uploads/2009/11/londonlifecoach-monkey.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://www.tarekcoaching.com/tcwp/wp-content/uploads/2009/11/londonlifecoach-monkey.png" height="282" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Source: http://www.tarekcoaching.com/blog/dont-fall-in-the-monkey-trap/</td></tr>
</tbody></table>
<br />
<br />
Apart from holes, buttons and levers (things that can be pushed) are concepts very easy for animals to learn. Without getting too Freudian, furrows and protrusions (holes and buttons) are among the first concepts we learn.<br />
<br />
This is nice when dealing with animals. On the other hand, it can be dangerous - especially for kids. A meat mincer machine has exactly these two: a hole and a button. Without referring to the disturbing images of its victims on the internet, it is imaginable what can happen - many children sadly lose their fingers or hands this way. The safety of these machines is much better now, but I grew up with a kid who was left with pretty much a claw for a right hand after such an accident.<br />
<br />
Now, the point is:<b> in confrontation with entities that we encounter for the first time, or whose complexity we do not sufficiently appreciate, we approach them from the most accessible angle we can understand.</b> If this phenomenon did not have a name (and it is not BikeShedding, that is different), now it does: the <b>Rule of The Most Accessible (TMA)</b>. The problem is, as the examples tried to illustrate, it is <i>dangerous</i>. Or it can be a sign of <i>mediocrity</i>.<br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
Now what does it have to do with our geeky world?<br />
<br />
Have you noticed that in some projects, critical bugs go unnoticed while half a dozen bugs are raised for the layout being one pixel out? Have you written a document and the main feedback you got was about the font being used? Have you attended a technical review meeting in which the only comment you get is on the icons of your diagram? Have you seen a performance test project that focuses only on testing the API because it is the easiest to test? Have you witnessed a code review that results in puny little comments on naming only?<br />
<br />
When I say it can be a sign of mediocrity, I think you know now what I am talking about. I cannot describe my frustration when we replace the most critical with the most accessible. And I bet you feel the same.<br />
<br />
<h3>
Resist TMA in you</h3>
You know how bad it is when someone falls into the TMA trap? Then don't do it yourself. Take your time, and approach from the angle that matters most. If you cannot offer a worthwhile comment, then don't comment at all. Don't be a hypocrite.<br />
<br />
Ask for more time, break down the complexity and get a sense of the critical parts. And then comment.<br />
<br />
<h3>
Fight TMA in others</h3>
Someone does TMA to you? Call it out to their face. Remind them that we need to focus on the critical aspects first. Ask them not to waste time on petty aspects of the problem.<br />
<br />
<h3>
If it cannot be fought, laugh inside</h3>
And I guess we all have cases where the person committing TMA is a manager high up, and fighting TMA could have unpleasant consequences. Then you know what? Just remember the face of the monkey in the cartoon above and laugh inside. <b>It will certainly make you feel better :)</b><br />
<br />
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com5tag:blogger.com,1999:blog-2889416825250254881.post-79681448065279089642015-08-27T22:22:00.000+01:002015-09-12T16:11:25.104+01:00No-nonsense Azure Monitoring in 20 Minutes (maybe 21) using ECK stack<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript"></script>
The Azure platform has been around for 6 years now and is going from strength to strength. With the release of many different services and options (and sometimes too many services), it is now difficult to think of a technology, tool or paradigm which is not “there” - albeit perhaps not exactly in the shape you had wished for. Having said that, monitoring - even by the admission of some of the product teams - has not been among the strongest features of Azure. Sadly, <b>when building cloud systems, </b><strong>monitoring/telemetry is not a feature: it is a must</strong>.<br />
<br />
I do not want to rant for hours about why and how a product mainly built for external customers differs from an internal one that, on the back of its strength and success, gets packaged up and released (as is the case with AWS), but a consistent and working telemetry option in Azure is pretty much <em>missing</em> - there are bits and pieces here and there but not a consolidated story. I am <em>informed</em> that even internal teams within Microsoft had to build their own monitoring solutions (something similar to what I am about to describe further down). And as the last piece of rant, let me tell you: whoever designed this chart with this <i>puny level of </i><em>data resolution</em> must be punished with the most severe <b>penalty</b> ever known to man - actually using it, to investigate a production issue.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiflGHm6Uto6Gu5EsEsQg7ho2h7SVK0nUxjYttxXcyPej98gm1VxeXenZl0zJrkqrB5yYQEWz4y5t22isBfF9dhBpoMlr49MTdrMDvL4p4Ah-lg3QFhHOUpoc-npLW1EKpQL2vMUCazJZFZ/s1600/Screen+Shot+2015-08-26+at+22.51.14.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="185" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiflGHm6Uto6Gu5EsEsQg7ho2h7SVK0nUxjYttxXcyPej98gm1VxeXenZl0zJrkqrB5yYQEWz4y5t22isBfF9dhBpoMlr49MTdrMDvL4p4Ah-lg3QFhHOUpoc-npLW1EKpQL2vMUCazJZFZ/s640/Screen+Shot+2015-08-26+at+22.51.14.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A 7-day chart, with 14 data points. Whoever designed this UI should be punished with the most severe penalty known to man ... actually using it - to investigate a production issue.</td></tr>
</tbody></table>
<br />
<h2>
<a href="https://www.blogger.com/null" id="What_are_you_on_about_7"></a>What are you on about?</h2>
Well, if you have used Azure to deliver any serious solution and then tried to do any sort of support, investigation or root cause analysis without one of the paid telemetry solutions (and even with them), painfully browsing through gigs of data in Table Storage, you would know the pain. Yeah, that's what I am talking about! I know you have been there, me too.<br />
<br />
And here, I am presenting a solution to the telemetry problem that can give you these kinds of sexy charts, very quickly, on top of your existing Azure WAD tables (and other data sources) - tried, tested and working, requiring some setup and very little maintenance.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiHUPF3EwuSvkXGrB8F0ckrO3a8kc7qTVo3mAe9KwAYukXfLUAISPamEJ_KZnUoZ-8nBZxK2rB304tgjBE4ZxuOB0zUInBgXTopBgdMFjmrKiJNDCitL1IgR_yaglXG9d1Npx4hEcyafYO/s1600/Screen+Shot+2015-07-24+at+18.29.58.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiHUPF3EwuSvkXGrB8F0ckrO3a8kc7qTVo3mAe9KwAYukXfLUAISPamEJ_KZnUoZ-8nBZxK2rB304tgjBE4ZxuOB0zUInBgXTopBgdMFjmrKiJNDCitL1IgR_yaglXG9d1Npx4hEcyafYO/s640/Screen+Shot+2015-07-24+at+18.29.58.png" width="640" /></a></div>
<br />
If you are already familiar with the ELK (Elasticsearch, LogStash and Kibana) stack, you might be saying you already have that. True. But while LogStash is great and has many groks, it has very much been designed with the Linux mindset: just a daemon running locally on your box/VM, reading your syslog and delivering it over to Elasticsearch. The way Azure works is totally different: the local monitoring agent running on the VM keeps shovelling your data to durable and highly available storage (Table or Blob) - which I quite like. With VMs being essentially ephemeral, it makes a lot of sense to master your logging outside the boxes and to read the data from those storages. Now, that is all well and good, but when you have many instances of the same role (say you have scaled to 10 nodes) writing to the same storage, the data is usually much bigger than what a single process can handle, and the shovelling needs to be scaled, requiring centralised scheduling.<br />
<br />
<strong>The gist of it: I am offering ECK (Elasticsearch, ConveyorBelt and Kibana), an alternative to LogStash that is Azure friendly (it typically runs in a Worker Role), can tap into your existing WAD logs (as well as custom ones) out of the box, and with the push of a button can be horizontally scaled to N to handle the load for all your projects - and for your enterprise if you work for one. And it is <a href="https://github.com/aliostad/ConveyorBelt" target="_blank">open source</a>, and can be extended to shovel data from any other source.</strong><br />
<br />
At its core, <a href="https://github.com/aliostad/ConveyorBelt" target="_blank">ConveyorBelt</a> employs a clustering mechanism that breaks down the work into chunks (scheduling), keeps a pointer to the last scheduled point, pushes data to Elasticsearch in parallel and in batches, and gracefully retries the work if it fails. It is headless, so any node can fail, be shut down, restarted, added or removed - without affecting the integrity of the cluster. All of this without waking you up at night, and basically, after a few days, making you forget it ever existed. In the enterprise I work for, we use just 3 medium instances to power analytics from 70 different production Storage Tables (and blobs).<br />
<br />
<h2>
<a href="https://www.blogger.com/null" id="Basic_Concepts_15"></a>Basic Concepts</h2>
Before you set up your own ConveyorBelt (CB), it is better to know a few concepts and facts.<br />
<br />
First of all, there is a <b>one-to-one mapping</b> between an Elasticsearch cluster and a ConveyorBelt <em>cluster</em>. ConveyorBelt has a list of <strong>DiagnosticSources</strong>, typically stored in Azure Table Storage, which contain all data (and state) pertaining to a <em>source</em>. A source typically is a Table Storage table, or a blob folder containing diagnostic (or other) data - but CB is extensible to accept other data stores such as SQL, file or even Elasticsearch itself (yes, if you ever wanted to copy data from one ES to another). A DiagnosticSource contains the connection information for CB to connect. CB continuously breaks down the work (schedules) for its DiagnosticSources and keeps updating the <em>LastOffset</em>.<br />
<br />
Once the work is broken down into bite-size chunks, the chunks are picked up by actors (internally it uses <a href="https://www.blogger.com/!!">BeeHive</a>) and the data within each chunk is pushed up to your Elasticsearch cluster. There is usually a delay between data being captured and copied to storage (something that you typically set in Azure configuration: how often to copy data), so you set a <em>Grace Period</em> after which, if the data isn't there, it is assumed it won’t be. Your Elasticsearch data will usually lag behind realtime by the <em>Grace Period</em>. If you left everything as defaults, Azure copies data every minute, in which case a Grace Period of 3-5 minutes is safe. For IIS logs this is usually longer (I use 15-20 minutes).<br />
<br />
The data that is pushed to Elasticsearch requires the following (a small sketch of these defaults follows the list):<br />
<ul>
<li>An index name: by default the date in the <code>yyyyMMdd</code> format is used as the index name (but you can provide your own index)</li>
<li>The type name: default is PartitionKey + _ + RowKey (or the one you provide)</li>
<li>Elasticsearch mapping: Elasticsearch equivalent of a schema which defines how to store and index data for a source. These mappings are stored on a URL (a web folder or a public read-only Azure Blob folder) - schema for typical Azure data (WAD logs, WAD Perf data and IIS Logs) already available by default and you just need to copy them to your site or public Blob folder.</li>
</ul>
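Purely to illustrate these defaults (this is not ConveyorBelt's actual code, which is C#), here is a tiny Python sketch of how the index name, the type name and the scheduling cut-off could be derived:<br />
<pre><code># Illustrative sketch of the defaults described above - not ConveyorBelt's implementation.
from datetime import datetime, timedelta

def default_index_name(event_time):
    return event_time.strftime("%Y%m%d")      # one index per day, in yyyyMMdd format

def default_type_name(partition_key, row_key):
    return partition_key + "_" + row_key      # PartitionKey + "_" + RowKey

def schedule_cutoff(now, grace_period_minutes):
    # data younger than the Grace Period is not scheduled yet, since the Azure
    # agent may not have copied it to the storage at this point
    return now - timedelta(minutes=grace_period_minutes)

print(default_index_name(datetime(2015, 8, 27)))          # 20150827
print(schedule_cutoff(datetime(2015, 8, 27, 12, 0), 5))   # 2015-08-27 11:55:00
</code></pre>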
<h2>
Set up your own monitoring suite</h2>
OK, now time to create our own ConveyorBelt cluster! Basically the CB cluster will shovel the data to an Elasticsearch cluster, and you would need Kibana to visualise your data. Further below I explain how to set up Elasticsearch and Kibana in a <b>Linux VM box</b>. But ...<br />
<br />
<b>if you are just testing the waters and want to try CB, you can create a Windows VM, download Elasticsearch and Kibana and run their batch files and then move to setting up CB. </b>But after you have seen it working, come back to the instructions and set it up in a Linux box, its natural habitat.<br />
<b><br /></b>
So setting this up on Windows is just a matter of downloading the files from the links below, unzipping them and then running the batch files <span style="font-family: Courier New, Courier, monospace;">elasticsearch.bat</span> and <span style="font-family: Courier New, Courier, monospace;">kibana.bat</span>. Make sure you expose ports 5601 and 9200 from your VM by creating endpoints.<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">https://download.elastic.co/kibana/kibana/kibana-4.1.1-windows.zip</span><br />
<span style="font-family: Courier New, Courier, monospace;">https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.zip</span><br />
<h2>
Set up ConveyorBelt</h2>
As discussed above, ConveyorBelt is typically deployed as an Azure Cloud Service. In order to do that, you need to clone the GitHub repo, build it and then deploy it with your own credentials and settings - and all of this should be pretty easy. Once deployed, you need to define various diagnostic sources pointing to your Elasticsearch, and then just relax and let CB do its work. So we will look at the steps now.<br />
<br />
<h3>
Clone and build ConveyorBelt repo</h3>
You can use command line:<br />
<pre><code><span class="hljs-built_in">git clone https://github.com/aliostad/ConveyorBelt.git</span></code></pre>
Or use your tool of choice to clone the repo. Then open an administrative PowerShell window, move to the build folder and execute <span style="font-family: Courier New, Courier, monospace;">.\build.ps1</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.blogger.com/blogger.g?blogID=2889416825250254881" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9uaGXeP7GffaXEjKhxABowiPj3pgVpfWghiSA1iNgWPaxoI4Sn-wObn_JnO-zbitauvGvheBERZCCJ4GW4SyaTzf5TRu1IME8ezAcRFcRxK-Zk7DVXEnpFaLIfrYpPYkFmyQ-BbBM2jww/s1600/Untitled.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="282" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9uaGXeP7GffaXEjKhxABowiPj3pgVpfWghiSA1iNgWPaxoI4Sn-wObn_JnO-zbitauvGvheBERZCCJ4GW4SyaTzf5TRu1IME8ezAcRFcRxK-Zk7DVXEnpFaLIfrYpPYkFmyQ-BbBM2jww/s640/Untitled.png" width="640" /></a></div>
<br />
<h3>
Deploy mappings</h3>
<div>
<div>
Elasticsearch is able to guess the data types of your data and index them in a format that is usually suitable. However, this is not always the case, so we need to tell Elasticsearch how to store each field, and that is why CB needs to know this in advance.</div>
<div>
<br /></div>
<div>
To deploy mappings, create a Blob Storage container with the option "Public Container" - this allows the content to be publicly available in a read-only fashion. </div>
<div>
<br /></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://www.blogger.com/blogger.g?blogID=2889416825250254881" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9FBMQNRD_VhJvM3sv0TWMyYLZPhjvJcHq94UsGJVNdDU6LkPFUW8hJyXs9xuWHdMws37ROSL5_Wui8YXsT8aDBDwLBd66hrJNHJSUctNf8zWLjq4cvBdvGzcEYyDCJoxddmB6P2sDwYjo/s1600/Screen+Shot+2015-08-27+at+08.36.49.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="272" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9FBMQNRD_VhJvM3sv0TWMyYLZPhjvJcHq94UsGJVNdDU6LkPFUW8hJyXs9xuWHdMws37ROSL5_Wui8YXsT8aDBDwLBd66hrJNHJSUctNf8zWLjq4cvBdvGzcEYyDCJoxddmB6P2sDwYjo/s400/Screen+Shot+2015-08-27+at+08.36.49.png" width="400" /></a></div>
You would need the URL for the next step. It is in the format:<br />
<span style="background-color: white; color: #222222; font-family: Menlo, monospace; font-size: 11px; white-space: pre-wrap;">https://<storage account name>.blob.core.windows.net/<container name>/</span><br />
<br />
Also use the tool of your choice to copy the mapping files, found in the mappings folder under the ConveyorBelt directory, to this container.<br />
<br />
<h3>
Configure and deploy</h3>
Once you have built the solution, rename <span style="font-family: Courier New, Courier, monospace;">tokens.json.template</span> file to <span style="font-family: Courier New, Courier, monospace;">tokens.json</span> and edit tokens.json file (if you need some more info, find the instructions <a href="https://github.com/aliostad/ConveyorBelt/blob/master/README.md" target="_blank">here</a>). Then in the same PowerShell window, run the command below, replacing placeholders with your own values:<br />
<pre><code>.\PublishCloudService.ps1 `
    -serviceName &lt;name your ConveyorBelt Azure service&gt; `
    -storageAccountName &lt;name of the storage account needed for the deployment of the service&gt; `
    -subscriptionDataFile &lt;your .publishsettings file&gt; `
    -selectedsubscription &lt;name of subscription to use&gt; `
    -affinityGroupName &lt;affinity group or Azure region to deploy to&gt;</code></pre>
After running the commands, you should see the PowerShell deploying CB to the cloud with a single Medium instance. In the storage account you had defined, you should now find a new table, whose name you defined in the tokens.json file.<br />
<br />
<h3>
Configure your diagnostic sources</h3>
Configuring the diagnostic sources can differ wildly depending on the type of the source. But for standard tables such as WADLogsTable, WADPerformanceCountersTable and WADWindowsEventLogsTable (whose mapping files you just copied) it is straightforward.<br />
<br />
Now choose an Azure diagnostic Storage Account with some data, and in the diagnostic source table create a new row and add the entries below (a hedged example row follows the list):<br />
<br />
<ul style="box-sizing: border-box; color: #333333; font-family: 'Helvetica Neue', Helvetica, 'Segoe UI', Arial, freesans, sans-serif; font-size: 16px; line-height: 25.6000003814697px; margin-bottom: 16px; margin-top: 0px; padding: 0px 0px 0px 2em;">
<li style="box-sizing: border-box;">PartitionKey: whatever you like - commonly <code style="background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;"><top level business domain>_<mid level business domain></code></li>
<li style="box-sizing: border-box;">RowKey: whatever you like - commonly <code style="background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;"><env: live/test/integration>_<service name>_<log type: logs/wlogs/perf/iis/custom></code></li>
<li style="box-sizing: border-box;">ConnectionString (string): connection string to the Storage Account containing <code style="background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;">WADLogsTable (or others)</code></li>
<li style="box-sizing: border-box;">GracePeriodMinutes (int): Depends on how often your logs gets copied to Azure table. If it is 10 minutes then 15 should be ok, if it is 1 minute then 3 is fine.</li>
<li style="box-sizing: border-box;">IsActive (bool): True</li>
<li style="box-sizing: border-box;">MappingName (string): <code style="background-color: rgba(0, 0, 0, 0.0392157); border-radius: 3px; box-sizing: border-box; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 13.6000003814697px; margin: 0px; padding: 0.2em 0px;">WADLogsTable </code>. ConveyorBelt would look for mapping in URL "X/Y.json" where X is the value you defined in your tokens.json for mappings path and Y is the TableName (see below).</li>
<li style="box-sizing: border-box;">LastOffsetPoint (string): set to ISO Date (<i>second and millisecond MUST BE ZERO!!</i>) <b>from which</b> you want the data to be copied e.g. 2015-02-15T19:34:00.0000000+00:00</li>
<li style="box-sizing: border-box;">LastScheduled (datetime): set it to a date in the past, same as the LastOffset point. Why do we have both? Well each does something different so we need both. </li>
<li style="box-sizing: border-box;">MaxItemsInAScheduleRun (int): 100000 is fine</li>
<li style="box-sizing: border-box;">SchedulerType (string): ConveyorBelt.Tooling.Scheduling.MinuteTableShardScheduler</li>
<li style="box-sizing: border-box;">SchedulingFrequencyMinutes (int): 1</li>
<li style="box-sizing: border-box;">TableName (string): WADLogsTable, WADPerformanceCountersTable or WADWindowsEventLogsTable</li>
</ul>
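To make this concrete, here is a hedged example of what such a row could look like - written as a Python dictionary purely for readability (the domain and service names are made up); use your favourite Table Storage tool or SDK to actually insert it:<br />
<pre><code># A hypothetical DiagnosticSource row - all values are examples only.
diagnostic_source = {
    "PartitionKey": "payments_cardprocessing",
    "RowKey": "live_checkoutapi_logs",
    "ConnectionString": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
    "GracePeriodMinutes": 3,          # logs copied every minute in this example
    "IsActive": True,
    "MappingName": "WADLogsTable",    # mapping fetched from the mappings path + "/WADLogsTable.json"
    "LastOffsetPoint": "2015-02-15T19:34:00.0000000+00:00",  # seconds/milliseconds must be zero
    "LastScheduled": "2015-02-15T19:34:00Z",
    "MaxItemsInAScheduleRun": 100000,
    "SchedulerType": "ConveyorBelt.Tooling.Scheduling.MinuteTableShardScheduler",
    "SchedulingFrequencyMinutes": 1,
    "TableName": "WADLogsTable",
}
</code></pre>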
And save. OK, now CB will start shovelling your data to your Elasticsearch and you should start seeing some data. If you do not, look at the entries you have created in the Table Storage and you will find an Error column which tells you what went wrong. To investigate further, just RDP to one of your ConveyorBelt VMs and run <a href="https://technet.microsoft.com/en-us/library/bb896647.aspx" target="_blank">DebugView</a> with "Capture Global Win32" enabled - you should see some activity similar to the picture below. Any exceptions will also show up there.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg27Cw9VCwz1bmX2Al5NG68aN3FtjervXOOKO16B-UL3ehQ8FyvPLyTXUj77ZYwktp-BXbuGk3HuxSO_WG6Y1zH2WB_VVealZd7amOPmdbNtYtLtlMfHwxrBWE5pxJMpeTouyZZL4BPOJP-/s1600/Screen+Shot+2015-08-27+at+21.18.04.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="283" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg27Cw9VCwz1bmX2Al5NG68aN3FtjervXOOKO16B-UL3ehQ8FyvPLyTXUj77ZYwktp-BXbuGk3HuxSO_WG6Y1zH2WB_VVealZd7amOPmdbNtYtLtlMfHwxrBWE5pxJMpeTouyZZL4BPOJP-/s400/Screen+Shot+2015-08-27+at+21.18.04.png" width="400" /></a></div>
<br />
OK, that is it... you are done! ... well barely 20 minutes, wasn't it? :)<br />
<br />
<br />
<span style="color: #741b47; font-size: x-large;">Now in case you are interested in setting up ES+Kibana in Linux, here is your little guide.</span><br />
<h2>
Set up your Elasticsearch in Linux</h2>
You can run Elasticsearch on Windows or Linux - I prefer the latter. To set up an Ubuntu box on Azure, you can follow the instructions <a href="https://www.blogger.com/!!">here</a>. Ideally you need to add a Disk Volume, as the VM disks are ephemeral - all you need to know is outlined <a href="https://www.blogger.com/!!">here</a>. Make sure you follow the instructions to re-mount the drive after reboots. Another alternative, especially for your dev and test environments, is to go with D series machines (SSD disks) and use the ephemeral disks - they are fast, and basically if you lose the data you can always set ConveyorBelt to re-add it, which it does quickly. As I said before, never use Elasticsearch to master your logging data, so that you can recover from losing it.<br />
<br />
Almost all of the commands and settings below need to be run in an SSH session. If you are a geek with a lot of Linux experience, you might find some of the details below obvious and unnecessary - in which case just move on.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL4yaXYd2bmL24sW-qXSIS9s1yFfvXaJSfLhHn_WOeqaAjpQPg__xLGSAPyefQL8Js3AdbDMyIbNpicJLFksMRJm5pYCcwepYADDG0oOjFeBpqn_9JOy-hJQAs7CM_OIWuMg8Vvg8xe6Vi/s1600/Screen+Shot+2015-08-26+at+23.17.17.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL4yaXYd2bmL24sW-qXSIS9s1yFfvXaJSfLhHn_WOeqaAjpQPg__xLGSAPyefQL8Js3AdbDMyIbNpicJLFksMRJm5pYCcwepYADDG0oOjFeBpqn_9JOy-hJQAs7CM_OIWuMg8Vvg8xe6Vi/s640/Screen+Shot+2015-08-26+at+23.17.17.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="font-size: 12.8000001907349px;">SSH is your best friend</td></tr>
</tbody></table>
<br />
Anyway, back to setting up ES - once your VM box is provisioned, SSH to the box and install the Oracle JDK:<br />
<pre><code>sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
</code></pre>
And then install Elasticsearch:<br />
<pre><code>wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.1.deb
sudo dpkg -i elasticsearch-1.7.1.deb
</code></pre>
Now you have installed ES v 1.7.1. To set Elasticsearch to start at reboots (equivalent of Windows services) run these commands in SSH:<br />
<pre><code>sudo update-rc.d elasticsearch defaults 95 10
sudo /etc/init.d/elasticsearch start
</code></pre>
Now ideally you would want to move the data and logs to the durable drive you have mounted, just edit the Elasticsearch config in <em>vim</em> and change:<br />
<pre><code>sudo vim /etc/elasticsearch/elasticsearch.yml
</code></pre>
and then (note uncommented lines):<br />
<pre><code>path.data: /mounted/elasticsearch/data
# Path to temporary files:
#
#path.work: /path/to/work
# Path to log files:
#
path.logs: /mounted/elasticsearch/data
</code></pre>
Now you are ready to restart Elasticsearch:<br />
<pre><code>sudo service elasticsearch restart
</code></pre>
<blockquote>
Note: Elasticsearch is memory, CPU and IO hungry. SSD drives (as on the D-series VMs) really help, but if you do not have them, make sure you provide plenty of RAM and enough CPU. Searches are CPU heavy, so it will depend on the number of concurrent users.</blockquote>
If your machine has a lot of RAM, make sure you set ES memory settings as the default ones will be small. So update the file below and set the memory to 50-60% of the total memory size of the box:<br />
<pre><code>sudo vim /etc/default/elasticsearch
</code></pre>
And uncomment this line and set the memory size to half of your box’s memory (here 14GB, just an example!):<br />
<pre><code>ES_HEAP_SIZE=14g</code></pre>
There are potentially other changes that you might want to make. For example, based on the number of your nodes, you may want to set index.number_of_replicas in your elasticsearch.yml - if you have a single node, set it to 0. Also turn off <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html" target="_blank">multicast/Zen discovery</a> since it will not work in Azure. But these are things you can start learning about once you are completely hooked on the power of information provided by the solution. Believe me, more addictive than narcotics!<br />
<h2>
Set up the Kibana in Linux</h2>
<h2>
<a href="https://www.blogger.com/null" id="Set_up_the_Kibana_86"></a></h2>
Up until version 4, Kibana was simply a set of static HTML+CSS+JS files that would run locally in your browser by just opening the root HTML file. This model could not really be sustained, and with version 4, Kibana runs as a service on a box, most likely different from your ES nodes. But for PoCs and small use cases it is absolutely fine to run it on the same box.<br />
Installing Kibana is straightforward. You just need to download and unpack it:<br />
<pre><code>wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz
tar xvf kibana-4.1.1-linux-x64.tar.gz</code></pre>
So now Kibana will be downloaded to your home directory and be unpacked to kibana-4.1.1-linux-x64 folder. If you want to see where that folder is you can run <code>pwd</code> to get the folder name.<br />
Now, to run it, just execute the commands below to start Kibana:<br />
<pre><code>cd bin
./kibana
</code></pre>
That will do for testing that it works, but you need to configure it to start at boot. We can use upstart for this. Just create a file in the /etc/init folder:<br />
<pre><code>sudo vim /etc/init/kibana.conf
</code></pre>
and copy the below (path could be different) and save:<br />
<pre><code>description "Kibana startup"
author "Ali"
start on runlevel [2345]
stop on runlevel [!2345]
exec /home/azureuser/kibana-4.1.1-linux-x64/bin/kibana
</code></pre>
Now run this command to make sure there is no syntax error:<br />
<pre><code>init-checkconf /etc/init/kibana.conf
</code></pre>
If all is good, start the service:<br />
<pre><code>sudo start kibana
</code></pre>
If you have installed Kibana on the same box as Elasticsearch and left all the ports as they are, you should now be able to browse to the server on port 5601 (make sure you expose this port on your VM by configuring endpoints) and see the Kibana screen (obviously with no data yet).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4qFZt-_FZzgxDA9L2o3iS5DdoHLMCUWRhsxht0KOlQYL0TtOKRdaHASTMWMpGF0xX8En4dFILqM4M8l1uYViVp1W82-GSJj6G3KowHi8Q9aQw-QKWGRVOrVz4OwQ4hLJX6mRdy3zM9Nxw/s1600/Screen+Shot+2015-08-26+at+23.11.11.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="246" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4qFZt-_FZzgxDA9L2o3iS5DdoHLMCUWRhsxht0KOlQYL0TtOKRdaHASTMWMpGF0xX8En4dFILqM4M8l1uYViVp1W82-GSJj6G3KowHi8Q9aQw-QKWGRVOrVz4OwQ4hLJX6mRdy3zM9Nxw/s640/Screen+Shot+2015-08-26+at+23.11.11.png" width="640" /></a></div>
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com10tag:blogger.com,1999:blog-2889416825250254881.post-355528523289900162015-07-09T13:35:00.000+01:002015-07-28T21:42:14.838+01:00Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
In my last <a href="http://byterot.blogspot.co.uk/2015/06/five-crazy-abstractions-my-deep-learning-word2doc-model-just-did-NLP-gensim.html" target="_blank">blog post</a>, I outlined a few interesting results from a word2vec model trained on half a million news documents. This was pleasantly met with some positive reactions, some of which not necessarily due to the scientific rigour of the report but due to the awareness effect of such "populist treatment of the subject" on the community. On the other hand, there were more than a few negative reactions: some believing I was "cherry-picking" and reporting only a handful of interesting results out of an ocean of mediocre performances; others rejecting my claim that training on a small dataset in any language can produce very encouraging results; and yet others literally threatening me to release the code, despite my reiterating that the code is small and not the point.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://img-new.cgtrader.com/items/66020/large_am_i_the_only_one_around_here_lithophane_3d_model_stl_191c19e9-4d83-4326-9fdb-969e8ff3483e.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://img-new.cgtrader.com/items/66020/large_am_i_the_only_one_around_here_lithophane_3d_model_stl_191c19e9-4d83-4326-9fdb-969e8ff3483e.jpg" height="277" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Am I the only one here thinking word2vec is freaking awesome?!</td></tr>
</tbody></table>
<br />
So I am back. And this time I have trained the model on a <em>very small</em> corpus of Rock artists obtained from Wikipedia, as part of my <strong>Rock History project</strong>. And I have built an API on top of the model so that you can play with it and try out different combinations to your heart's content - [but please go easy on the API, it is a small instance only] :) <strong>strictly no bots</strong>. And that's not all: I am releasing the code and the dataset (which is only 36K Wiki entries).<br />
<br />
But now, my turn to RANT for a few paragraphs.<br />
<br />
First of all, quantifying the performance of an unsupervised learning algo in a highly subjective field is very hard, time-consuming and potentially non-repeatable. Google, in their <a href="http://arxiv.org/pdf/1506.05869.pdf" target="_blank">latest paper</a> on seq2seq, had to resort to reporting mainly man-machine conversations. I feel that in these subjects, crowdsourcing the quantification is probably the best approach. Hence you would help by giving a rough accuracy score according to your experience.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzp1en_RO5sAXwsj2FaXkqCqkS9bgK4wu8F63BRcpY08Gy1o1jJm5HWV3peRu3qkezriPR4WPffT-KhplXEJDOv1Msylnf4imIUQ9WGhbVdOwx91WRmiOjV4zUb9lSBmRe2vNI05Cl7Noy/s1600/iycwz.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzp1en_RO5sAXwsj2FaXkqCqkS9bgK4wu8F63BRcpY08Gy1o1jJm5HWV3peRu3qkezriPR4WPffT-KhplXEJDOv1Msylnf4imIUQ9WGhbVdOwx91WRmiOjV4zUb9lSBmRe2vNI05Cl7Noy/s400/iycwz.jpg" width="393" /></a></div>
<br />
On the other hand, sorry, those who were expecting to see a formal paper - perhaps in LaTeX format - you completely missed the point. As others said, there are plenty of hardcore papers out there, feel free to knock yourselves out. My point was to evangelise to a much wider audience. And, if you liked what you saw, go and try it for yourself.<br />
<br />
Finally, alluding to "cognition" raised a lot of eyebrows, but as Nando de Freitas puts it when asked about intelligence, whenever we build an intelligent machine, we will look at it as bogus, not containing the "real intelligence", and we will discard it as not AI. So the world of Artificial Intelligence is a world of moving targets, essentially because intelligence has been very difficult to define.<br />
<br />
For me, word2vec is a breath of fresh air in a world of arbitrary, highly engineered and complex NLP algorithms, one which can bridge the gap by forming a meaningful relationship between the tokens of your corpus. And I feel it is more a tool enhancing other algorithms than an end product in itself. But even on its own, it generates fascinating results. For example, on this tiny corpus it was not only able to find the match between the names of the artists, but it can successfully find matches between similar bands - so it could be used as a <strong>Recommender system</strong>. And then, even adding the vectors of artists generates interesting fusion genres which tend to correspond to real bands influenced by them.<br />
<h2 id="api">
API</h2>
<i>BEWARE: Tokens are <b>case-sensitive</b>, so u2 and U2 are not the same.</i><br />
<br />
The API is basically a simple RESTful Flask app on top of the model:<br />
<pre><code>http://localhost:5000/api/v1/rock/similar?pos=<pos>&neg=<neg>
</code></pre>
where <code>pos</code> and <code>neg</code> are comma-separated lists of zero to many 'phrases' (<code>pos</code> for similar, and <code>neg</code> for opposite) - these are English words, or multi-word tokens including names of bands or phrases that have a Wiki entry (such as albums or songs) - the list of which can be found here <a href="https://www.blogger.com/!!"></a>.<br />
For example:<br />
<pre><code>http://localhost:5000</code>/api/v1/rock/similar?pos=Captain%20Beefheart</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSvKpeY0P0QXAC0zesTHOP27Xn9bRnR4uxifWh6Souz6u7afRGnwKpKuY85KpycNMJ4AOC68VjsxeFec25ct9izF2844hsumrN8GEc3J5cRn9RlzVhY1OUoQCEKcLGi9efPyFTrExwLUC0/s1600/Screen+Shot+2015-07-09+at+13.38.06.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSvKpeY0P0QXAC0zesTHOP27Xn9bRnR4uxifWh6Souz6u7afRGnwKpKuY85KpycNMJ4AOC68VjsxeFec25ct9izF2844hsumrN8GEc3J5cRn9RlzVhY1OUoQCEKcLGi9efPyFTrExwLUC0/s640/Screen+Shot+2015-07-09+at+13.38.06.png" width="331" /></a></div>
<br />
<br />
You can add vectors of words, for example to mix genres:<br />
<pre><code>http://localhost:5000</code>/api/v1/rock/similar?pos=Daft%20Punk,Tool&min_freq=50</pre>
or add an artist with an adjective for example a softer Bob Dylan:<br />
<pre><code>http://localhost:5000/api/v1/rock/similar?pos=Bob%20Dylan,soft&min_freq=50
</code></pre>
Or subtract:<br />
<pre><code>http://localhost:5000</code>/api/v1/rock/similar?pos=Bob%20Dylan&neg=U2</pre>
But the tokens do not have to be a band name or artist names:<br />
<pre><code>http://localhost:5000</code>/api/v1/rock/similar?pos=drug</pre>
If you pass a non-existent token or a misspelling (it is case-sensitive!) of a name or word, you will get an error:<br />
<pre><code>http://</code>localhost:5000/api/v1/rock/similar?pos=radiohead</pre>
<pre><code>
{
result: "Not in vocab: radiohead"
}
</code></pre>
You may pass the minimum frequency of the word in the corpus to filter the output and remove the noise:<br />
<pre><code>http://localhost:5000/api/v1/rock/similar?pos=Daft%20Punk,Tool&min_freq=50
</code></pre>
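If you would rather call the API from code than from the browser, a minimal Python sketch (assuming the <code>requests</code> library and that the service is reachable on localhost:5000, as in the examples above) could look like this:<br />
<pre><code># Minimal client sketch for the API described above; endpoint and parameters as in the examples.
import requests

def rock_similar(pos=None, neg=None, min_freq=None, base="http://localhost:5000"):
    params = {}
    if pos:
        params["pos"] = ",".join(pos)   # comma-separated, case-sensitive tokens
    if neg:
        params["neg"] = ",".join(neg)
    if min_freq:
        params["min_freq"] = min_freq
    return requests.get(base + "/api/v1/rock/similar", params=params).json()

print(rock_similar(pos=["Daft Punk", "Tool"], min_freq=50))
print(rock_similar(pos=["Bob Dylan"], neg=["U2"]))
</code></pre>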
<h2 id="code">
Code</h2>
The code on <a href="https://github.com/aliostad/WikiRockWord2Vec" target="_blank">GitHub</a>, as I said, is tiny. Perhaps the most complex part is the Dictionary Tokenisation, one of the tools I have built to tokenise the text without breaking multi-word phrases - I have found it very useful, allowing the model to produce much more meaningful results.<br />
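The actual tokeniser is in the repo, but the idea is simple enough to sketch: greedily prefer the longest phrase found in a dictionary of multi-word names over single words. A rough, hedged illustration in Python (not the code from the repo):<br />
<pre><code># Rough illustration of dictionary-based tokenisation - not the WikiRockWord2Vec implementation.
def dictionary_tokenise(text, phrases, max_len=4):
    words = text.split()
    tokens, i = [], 0
    while i < len(words):
        # try the longest candidate phrase first, so "Daft Punk" stays a single token
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in phrases:
                tokens.append(candidate)
                i += n
                break
    return tokens

print(dictionary_tokenise("I listened to Daft Punk yesterday", {"Daft Punk"}))
# ['I', 'listened', 'to', 'Daft Punk', 'yesterday']
</code></pre>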
<br />
The code is shared under <a href="http://opensource.org/licenses/MIT" target="_blank">MIT license</a>.<br />
<br />
To build the model, uncomment the line in wiki_rock_train.py, specifying the location of corpus:<br />
<br />
<pre><code>train_and_save('data/wiki_rock_multiword_dic.txt', 'data/stop-words-english1.txt', '<THE_LOCATION>/wiki_rock_corpus/*.txt')
</code></pre>
<h2 id="dataset">
Dataset</h2>
As mentioned earlier, dataset/corpus is the text from 36K Rock music artist entries on the Wikipedia. This list was obtained by scraping the links from the "<a href="https://en.wikipedia.org/wiki/List_of_rock_genres" target="_blank">List of rock genres</a>". Dataset can be downloaded from <a href="https://drive.google.com/file/d/0By4PF7Jis9FzTTFpS1VVVzB4NFk/view?usp=sharing" target="_blank">here</a>. For information on the Copyright of the Wikipedia text and its terms of use please see <a href="https://en.wikipedia.org/wiki/Wikipedia:Copyrights" target="_blank">here</a>.aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com2tag:blogger.com,1999:blog-2889416825250254881.post-20727815165929079922015-06-14T13:46:00.000+01:002015-06-14T13:56:44.037+01:00Five crazy abstractions my Deep Learning word2vec model just did<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script><i>Seeing is believing. </i><br />
<br />
Of course, there is a whole host of Machine Learning techniques available, thanks to the researchers, and to Open Source developers for turning them into libraries. And I am not quite a complete stranger to this field; I have been working on Machine Learning, on and off, over the last 8 years. But nothing, absolutely nothing, has ever come close to what recently blew my mind with word2vec: so effortless, yet you feel <b>like</b> the model knows so much that it has obtained <i>cognitive coherence of the vocabulary</i>. Until neuroscientists nail cognition, I am happy to <i>foolishly</i> take that as some early form of machine cognition.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO-a0xFUUo1pl70ddeg44m7hn_JwzXnRz9hSkaeFTsjTXCwnPaHWyG9SLNHfiU3UYQyQHX9LvRSxvq2_ukJ_cKTM3yFqDXgdM8gFerDdYjSjYymFyAb0FrhfP8_LjP9T3ZEzHXW0QHF77Y/s1600/Singularity_Dance+%25281%2529.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="593" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO-a0xFUUo1pl70ddeg44m7hn_JwzXnRz9hSkaeFTsjTXCwnPaHWyG9SLNHfiU3UYQyQHX9LvRSxvq2_ukJ_cKTM3yFqDXgdM8gFerDdYjSjYymFyAb0FrhfP8_LjP9T3ZEzHXW0QHF77Y/s640/Singularity_Dance+%25281%2529.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">Singularity Dance - Wiki</span></td></tr>
</tbody></table>
<br />
But, no, don't take my word for it! If you have a corpus of 100s of thousands of documents (or even 10s of thousands), feed it in and see for yourselves. What language? Doesn't really matter! My money is on you getting results that equally blow your tops off.<br />
<br />
<h2>
What is word2vec?</h2>
word2vec is a Deep Learning technique first <a href="http://arxiv.org/pdf/1301.3781v3.pdf" target="_blank">described</a> by Tomas Mikolov only 2 years ago but, due to the simplicity of the algorithm and yet the surprising robustness of the results, it has been widely <a href="https://radimrehurek.com/gensim/models/word2vec.html" target="_blank">implemented</a> and adopted. This technique basically trains a model based on a neighbourhood window of words in a corpus and then projects the result onto [an arbitrary number of] <i>n</i> dimensions, where each word is a vector in the <i>n</i>-dimensional space. The words can then be compared using the <a href="https://en.wikipedia.org/wiki/Cosine_similarity" target="_blank">cosine similarity</a> of their vectors. And what is much more interesting is the <b>arithmetic</b>: vectors can be added or subtracted, for example the vector of <span style="font-family: Courier New, Courier, monospace;">Queen</span> is almost equal to <span style="font-family: Courier New, Courier, monospace;">King</span> + <span style="font-family: Courier New, Courier, monospace;">Woman</span> - <span style="font-family: Courier New, Courier, monospace;">Man</span>. In other words, if you remove Man from King and add Woman to it, <i>logically</i> you get Queen, and this model is able to represent it <i>mathematically</i>.<br />
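For the curious, this is roughly what training and querying looks like with gensim (pre-4.0 parameter names; the corpus file here is obviously a placeholder, not the corpus used for this post):<br />
<pre><code># Toy sketch using gensim's word2vec (old, pre-4.0 API) - not the model trained for this post.
from gensim.models import Word2Vec

# one tokenised document per line in a plain text file
sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]
model = Word2Vec(sentences, size=100, window=5, min_count=5, workers=4)

# cosine similarity between two words
print(model.similarity("king", "queen"))

# vector arithmetic: King + Woman - Man ~= Queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
</code></pre>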
<br />
LeCun recently <a href="http://arxiv.org/abs/1502.01710" target="_blank">proposed</a> a variant of this approach in which he uses characters and not words. Altogether this is a fast moving space and likely to bring about significant change in the state of the art in Natural Language Processing.<br />
<br />
<h2>
Enough of this, show us ze resultz!</h2>
OK, sure. For those interested, the methods are described after the results.
<br />
<br />
<h3>
1) Human - Animal = Ethics</h3>
Yeah, as if it knows! So if you remove the animal traits from a human, what remains is Ethics. And in word2vec terms, subtracting the vector of <span style="font-family: Courier New, Courier, monospace;">Animal</span> from the vector of <span style="font-family: Courier New, Courier, monospace;">Human</span> results in a vector which is closest to <span style="font-family: Courier New, Courier, monospace;">Ethics</span> (0.51). The other words similar to the <span style="font-family: Courier New, Courier, monospace;">Human - Animal</span> vector are: <span style="font-family: Courier New, Courier, monospace;">spirituality</span>, <span style="font-family: Courier New, Courier, monospace;">knowledge</span> and <span style="font-family: Courier New, Courier, monospace;">piety</span>. Interesting, huh?<br />
<br />
<h3>
2) Stock Market ≈ Thermometer</h3>
In my model the word Thermometer has a similarity of 0.72 to the Stock Market vector and is the 6th most similar word to it - most of the closer words were other names for the stock market. It is not 100% clear to me how it was able to make such an abstraction, but perhaps the proximity of Thermometer to words such as increase/decrease or up/down could have resulted in the similarity. In any case, likening the Stock Market to a Thermometer is a higher level of abstraction.<br />
<br />
<h3>
3) Library - Books = Hall</h3>
What remains of a library if you were to remove the books? word2vec to the rescue. The similarity is 0.49 and the next words are: <span style="font-family: Courier New, Courier, monospace;">Building</span> and <span style="font-family: Courier New, Courier, monospace;">Dorm</span>. <span style="font-family: Courier New, Courier, monospace;">Hall</span><span style="font-family: Times, Times New Roman, serif;">'s</span> vector is already similar to that of <span style="font-family: Courier New, Courier, monospace;">Library</span> (so the subtraction's effect could be incidental) but <span style="font-family: Courier New, Courier, monospace;">Building</span> and <span style="font-family: Courier New, Courier, monospace;">Dorm</span> are not. And <span style="font-family: Courier New, Courier, monospace;">Library - Book</span> (singular, not <span style="font-family: Courier New, Courier, monospace;">Books</span>) is closest to <span style="font-family: Courier New, Courier, monospace;">Dorm</span> with 0.51 similarity.<br />
<br />
<h3>
4) Obama + Russia - USA = Putin</h3>
This is a classic case similar to <span style="font-family: Courier New, Courier, monospace;">King+Woman-Man</span> but it was interesting to see that it works. In fact, finding the leaders of most countries was successful using this method. For example, <span style="font-family: Courier New, Courier, monospace;">Obama + Britain - USA</span> finds <span style="font-family: Courier New, Courier, monospace;">David Cameron</span> (0.71).<br />
<br />
<h3>
5) Iraq - Violence = Jordan</h3>
So the country most similar to Iraq after taking away its violence is Jordan, its neighbour. Iraq's vector itself is most similar to that of Syria - for obvious reasons. After Jordan, the next vectors are <span style="font-family: Courier New, Courier, monospace;">Lebanon</span>, <span style="font-family: Courier New, Courier, monospace;">Oman</span> and <span style="font-family: Courier New, Courier, monospace;">Turkey</span>.<br />
<br />
Not enough? Hmm there you go with another two...<br />
<br />
<h3>
Bonus) President - Power = Prime Minister</h3>
Kinda obvious, isn't it? But of course we know it depends which one is Putin and which one is Medvedev :)<br />
<br />
<h3>
Bonus 2) Politics - Lies = Germans??</h3>
OK, I admit I don't know what this one really means but according to my model, German politicians do not lie!<br />
<div>
<br /></div>
Now the <i>boring stuff</i>...<br />
<br />
<h2>
Methods</h2>
I used a corpus of publicly available online news and articles. The articles were extracted from a number of different Farsi online websites and on average contained ~8KB of text. The topics ranged from local and global Politics, Sports, Arts and Culture, Science and Technology, Humanities and Religion, to Health, etc.<br />
<br />
The processing pipeline is illustrated below:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8qNpRvxQF7bic-tjrbEYnqPH9eHVHhPd7bpiYIyZ7AxZeaX_oSuYzTCYxxJBJGFRrv6Kok6GAz_SBCs90mab-1lhuSSEdr4RjJkxhzVNPSRk-n_gre3sKqbh1_QFE0tEDaE9Blp12riEq/s1600/word2vec+pipeline+-+crop.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8qNpRvxQF7bic-tjrbEYnqPH9eHVHhPd7bpiYIyZ7AxZeaX_oSuYzTCYxxJBJGFRrv6Kok6GAz_SBCs90mab-1lhuSSEdr4RjJkxhzVNPSRk-n_gre3sKqbh1_QFE0tEDaE9Blp12riEq/s640/word2vec+pipeline+-+crop.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1 - Processing Pipeline</td></tr>
</tbody></table>
For word segmentation, multi-part words and named entities were kept intact by joining them using a dictionary of ~40K entries.<br />
<br />
Gensim's <a href="https://radimrehurek.com/gensim/models/word2vec.html" target="_blank">word2vec</a> implementation was used to train the model. The default <i><span style="font-family: Courier New, Courier, monospace;">n=100</span></i> and <i><span style="font-family: Courier New, Courier, monospace;">window=5</span></i> worked very well but to find the optimum values, another study needs to be conducted.<br />
<br />
In order to generate the results presented in this post, the <span style="font-family: Courier New, Courier, monospace;">most_similar</span> method was used. No significant difference between using <span style="font-family: Courier New, Courier, monospace;">most_similar</span> and <span style="font-family: Courier New, Courier, monospace;">most_similar_cosmul</span> was found.<br />
<br />
A significant problem was that misspelled or infrequent words in the corpus end up with poorly trained vectors which can show very high similarity to some words. I used the frequency of each word in the corpus to filter out such cases.<br />
<br />
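For the curious, a minimal Gensim training sketch along these lines (illustrative only: <span style="font-family: Courier New, Courier, monospace;">sentences</span>, <span style="font-family: Courier New, Courier, monospace;">min_count</span> and <span style="font-family: Courier New, Courier, monospace;">workers</span> are assumptions rather than the exact values used here, and <span style="font-family: Courier New, Courier, monospace;">min_count</span> is Gensim's built-in way of dropping infrequent words, whereas in this study the filtering was done afterwards by word frequency):<br />
<pre class="code">from gensim.models import Word2Vec

# sentences: an iterable of token lists produced by the segmentation step above
model = Word2Vec(sentences,
                 size=100,      # n = 100 dimensions ("vector_size" in newer Gensim)
                 window=5,      # neighbourhood window of 5 words
                 min_count=10,  # drop very infrequent (often misspelled) words
                 workers=4)
model.save("news_corpus.w2v")

# queries in this post used most_similar; most_similar_cosmul gave similar results
print(model.most_similar(positive=["library"], negative=["books"], topn=3))
</pre>
<br />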
<h2>
Conclusion</h2>
<div>
word2vec is a relatively simple algorithm with surprisingly good performance. Implementations are available in a variety of Open Source libraries, including Python's Gensim. Based on these preliminary results, it appears that word2vec is able to make higher-level abstractions, which nudges towards cognitive abilities.</div>
<div>
<br /></div>
Despite these remarkable results, it is not quite clear how this ability can be used in an application, although in its current form it can readily be used for finding antonyms/synonyms, spelling correction and stemming.<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com16tag:blogger.com,1999:blog-2889416825250254881.post-2148047132526292262015-05-27T23:39:00.000+01:002015-05-28T12:16:18.451+01:00PerfIt! decoupled from Web API: measure down to a closure in your .NET applicationLevel [<a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">T2</a>]<br />
<br />
<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
Performance monitoring is an essential part of doing any serious-scale software. Unfortunately, in the .NET ecosystem, which historically has looked to Microsoft for direction and tooling, there has been a real lack of good tooling: for some reason or another, effective monitoring has not been a priority for Microsoft, although this could be changing now. The healthy growth of the .NET Open Source community in the last few years brought a few innovations in this space (<a href="http://getglimpse.com/">Glimpse</a> being one) but they focused on solving development problems rather than application telemetry.<br />
<br />
2 years ago, while trying to build and deploy large scale APIs, I was unable to find anything suitable to save me from writing a lot of boilerplate code to add performance counters to my applications, so I coded a <em>working prototype</em> of performance counters for ASP.NET Web API and open sourced it on GitHub, calling it <a href="https://github.com/aliostad/PerfIt">PerfIt!</a> for lack of a better name. Over the last few years PerfIt! has been deployed to production in a good number of companies running .NET. I later added client support too, to measure calls made by <code>HttpClient</code>, and it was a handy addition.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg16UFdQQ_QnmjoUAs5TdcUeP8sdsOZCk0KOMwXAIpW-kjn68Yh0ZiWNqBA1Fk1LC7KSojcNfgMru5j-SX8AkCCXQoV2wu7VnIlgY1pzj61BRw0ShrYKAHCwXhNUQhlRxaUdfNW11dRi5vl/s1600/5354737459_fd82ec1056_b.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg16UFdQQ_QnmjoUAs5TdcUeP8sdsOZCk0KOMwXAIpW-kjn68Yh0ZiWNqBA1Fk1LC7KSojcNfgMru5j-SX8AkCCXQoV2wu7VnIlgY1pzj61BRw0ShrYKAHCwXhNUQhlRxaUdfNW11dRi5vl/s640/5354737459_fd82ec1056_b.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">From <a href="https://www.flickr.com/photos/flowizm/5354737459/in/photolist-jGnGgS-9kn5hN-kCQWda-9Qjkb9-cT9zYm-aePa18-jaVUXA-dZvugW-cT9A5J-sgA71v-7rNLdi-ccnkoS-bP9aqP-e5TXsM-6bACnr-6SnuiB-kDK8MZ-kn1jad-dFSQNG-kCRgvZ-9QgtUP-368FTi-e3vi4z-p2Bv5t-exYBaS-9abqTD-4QNuQh-dvb3Qc-bFyL6x-9uXLtj-58HNLK-6pT7Ta-81Mfvj-acYGbo-bFyL5e-9eRke5-fhYKk8-nRJMr9-bEGkd9-ebxM89-6tJnL5-9ttTqk-dSsLGa-cEsND7-dSysvS-bFyL8r-dSymGL-krq4sB-gw4ncr-eWnh9R" target="_blank">Flickr</a></td></tr>
</tbody></table>
<br />
This is all not bad but in reality, REST API calls do not cover all your outgoing or incoming server communications (which you naturally would like to measure): you need to communicate with databases (relational or NoSQL), caches (e.g. Redis), Blob Storage, and many others. On top of that, there could be other parts of your code that you would like to measure, such as CPU intensive algorithms, reading or writing large local files, running Machine Learning classifiers, etc. Of course, PerfIt! in its current incarnation cannot help with any of those cases.<br />
<br />
It turned out that with a little change, separating performance monitoring from the Web API semantics (which are changing again with vNext), this can be done. Actually, I do not deserve much credit for it: it was mainly the ideas of two of my best colleagues, whose contribution I am grateful for: <a href="https://uk.linkedin.com/in/andresdrb">Andres Del Rio</a> and <a href="https://www.linkedin.com/pub/jaiganesh-sundaravel/88/216/375">JaiGanesh Sundaravel</a>.<br />
<br />
<h2 id="new-perfit-features-and-limitations-">
New PerfIt! features (and limitations)</h2>
So currently at version alpha2, you can get the new PerfIt! by using nuget (when it works):<br />
<pre class="brush:csharp">PM> install-package PerfIt -pre</pre>
Here are the extra features that you get from the new PerfIt!.<br />
<br />
<h3 id="measure-metrics-for-a-closure">
Measure metrics for a closure</h3>
<div>
<br /></div>
So at the lowest level of an <em>aspect</em> abstraction, you might be interested in measuring metrics for a closure, for example:<br />
<pre class="brush:csharp">Action action = () => Thread.Sleep(1000); // a closure we want to measure
action(); // measure
</pre>
Or in case of an async operation:<br />
<pre class="brush:csharp">object result = null;
Func<Task> asyncCall = async () => result = await _command.ExecuteScalarAsync();
// and then
await asyncCall();
</pre>
This closure could be wrapped in a method of course, but there again, having a unified closure interface is essential in building a common tool: each method can have different inputs or outputs while all can be presented as a closure having the same interface.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9nrbD3M2wOt87KtRWTy5_NwSQAXerj3v4-otGmwaBGdGxgyRlAyhx_I0jPeHe8ZW5LozeeMJDUHnWi2js-RaDAp21JYF3eC_QQlzdAD397hUx53gx0DFtCA8Dn-ZRYgjbStVBWeS3O-8X/s1600/10120004174_1bb3ceba2f_k.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="426" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9nrbD3M2wOt87KtRWTy5_NwSQAXerj3v4-otGmwaBGdGxgyRlAyhx_I0jPeHe8ZW5LozeeMJDUHnWi2js-RaDAp21JYF3eC_QQlzdAD397hUx53gx0DFtCA8Dn-ZRYgjbStVBWeS3O-8X/s640/10120004174_1bb3ceba2f_k.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Thames Barriers <b>Closure</b> - <a href="https://www.flickr.com/photos/whealie/10120004174/in/photolist-gqgEEA-gqgPq2-gqgYqY-fnbFSB-fnrfJS-fxCcCz-fnbMhn-5Y76e6-fwdT6F-nDUzBf-nYaWxt-nDV37X-iufi6A-nDVhtn-nWgTv5-mJt7Dk-nWoXxn-nUmoyo-aYjKUv-gRjfR8-ftrxck-bEXKpL-fndXDK-gqheUd-gqhzND-gqgEiU-gqhcRf-gqgCTu-gqgCdb-gqhwag-gqgAg5-gqgzHw-gqgyYL-gqgygd-gqgNKV-gqhrtx-gqgw4N-gqhpXX-gqguXu-gqgumQ-gqhnx6-gqhmMt-gqgGpP-gqgXUs-gqhiQ6-gqgoVj-9qkL5V-55ShJo-qU2oiV-aRv9fK" target="_blank">Flickr</a>. Sorry couldn't find a more related picture, but enjoy all the same</td></tr>
</tbody></table>
So in order to measure metrics for the <span style="font-family: Courier New, Courier, monospace;">action</span> closure, all we need to do is:<br />
<pre class="brush:csharp">var ins = new SimpleInstrumentor(new InstrumentationInfo()
    {
        Counters = CounterTypes.StandardCounters,
        Description = "test",
        InstanceName = "Test instance"
    },
    TestCategory);

ins.Instrument(() => Thread.Sleep(100));
</pre>
<br />
A few things here:<br />
<ul>
<li><span style="font-family: Courier New, Courier, monospace;">SimpleInstrumentor</span> is responsible for providing a hook to instrument your closures. </li>
<li><span style="font-family: Courier New, Courier, monospace;">InstrumentationInfo</span> contains the metadata for publishing the performance counters. You provide it with the names of the counters to raise (and if they are not standard counters, you must have already defined them).</li>
<li>You will most likely create a single instrumentor instance for each <i>aspect</i> of your code that you would like to instrument.</li>
<li>This example assumes the counters and their category are installed. The <span style="font-family: Courier New, Courier, monospace;">PerfitRuntime</span> class provides a mechanism to register your counters on the box - which is covered in <a href="http://byterot.blogspot.co.uk/2014/09/Performance-Counters-for-your-HttpClient-aspnet-webapi-monitoring-api-rest.html" target="_blank">previous posts.</a></li>
<li><span style="font-family: Courier New, Courier, monospace;">Instrument</span> method has an option to pass the context as a string parameter. This context can be used to correlate metrics with application context in ETW events (see below).</li>
</ul>
<br />
Doing an async operation is not that different:<br />
<pre class="brush:csharp">ins.InstrumentAsync(async () => await Task.Delay(100));
//or even simpler:
ins.InstrumentAsync(() => Task.Delay(100));
</pre>
<span style="font-family: Courier New, Courier, monospace;">SimpleInstrumentor</span> is the building block for higher level abstractions of instrumentation. For example, <span style="font-family: Courier New, Courier, monospace;">PerfitClientDelegatingHandler</span> now uses <span style="font-family: Courier New, Courier, monospace;">SimpleInstrumentor</span> behind the scenes.<br />
<br />
<h3 id="measure-metrics-for-a-closure">
Raise ETW events, effortlessly</h3>
<div>
<br /></div>
<a href="https://msdn.microsoft.com/en-us/library/ee517330%28v=vs.110%29.aspx" target="_blank">Event Tracing for Windows</a> (ETW) is a low overhead framework for logging, instrumentation, tracing and monitoring that has been in Windows since Windows 2000. Version 4.5 of the .NET Framework exposes this feature through the <a href="https://msdn.microsoft.com/en-us/library/system.diagnostics.tracing.eventsource%28v=vs.110%29.aspx" target="_blank">EventSource</a> class. It probably suffices to say that if you are not using ETW, you are doing it wrong.<br />
<br />
One problem with Performance Counters is that they use sampling rather than events. This is all well and good but it lacks the resolution you sometimes need to find problems. For example, if 1% of calls take > 2 seconds, you need on average 100 samples (and if you are unlucky, a lot more) to see the spike.<br />
<br />
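To put a number on that (a small aside, assuming independent samples): even with 100 samples, the chance of catching at least one of those slow calls is only about 63%.<br />
<pre class="code"># chance of seeing at least one > 2s call (1% of traffic) in n samples
def chance_of_spotting(n, p=0.01):
    return 1 - (1 - p) ** n

print(chance_of_spotting(100))  # ~0.63
print(chance_of_spotting(300))  # ~0.95</pre>
<br />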
Another problem is the lack of context with the measurements. When you see such a high response time, there is really no way to find out the context (e.g. customerId) for which it took so long. This makes finding performance bottlenecks more difficult.<br />
<br />
So <span style="font-family: Courier New, Courier, monospace;">SimpleInstrumentor</span>, in addition to publishing counters for you, raises <span style="font-family: Courier New, Courier, monospace;">InstrumentationEventSource</span> ETW events. Of course, you can turn it off or just leave it on as it has almost no impact. But even better, you can <a href="https://msdn.microsoft.com/en-us/library/dn440729%28v=pandp.60%29.aspx#sec21" target="_blank">use a sink</a> (Table Storage, ElasticSearch, etc) to persist these events to a store and then analyse them using something like ElasticSearch and Kibana - as we do at ASOS. Here is a console log sink, subscribed to these events:<br />
<pre class="brush:csharp">var listener = ConsoleLog.CreateListener();
listener.EnableEvents(InstrumentationEventSource.Instance, EventLevel.LogAlways,
Keywords.All);
</pre>
And you would see:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7zWtl7wRz8ctcRxnJHizObGOfY0iGb8bX-gGWMSk8wjrBVs6IdJhIsqo08QhaMrcHWlpSY-plkXbpVbs-5-_Omdjd9US4Y7jHQ3FKixU9V4N0pAxm72FklMwtX1RxeHFIVujr_POaxnUQ/s1600/unnamed.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="68" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7zWtl7wRz8ctcRxnJHizObGOfY0iGb8bX-gGWMSk8wjrBVs6IdJhIsqo08QhaMrcHWlpSY-plkXbpVbs-5-_Omdjd9US4Y7jHQ3FKixU9V4N0pAxm72FklMwtX1RxeHFIVujr_POaxnUQ/s640/unnamed.png" width="640" /></a></div>
<br />
Obviously this might not look very impressive but when you take into account that you have the <span style="font-family: Courier New, Courier, monospace;">timeTakenMilli</span> (here 102ms) and have the option to pass an <span style="font-family: Courier New, Courier, monospace;">instrumentationContext</span> string (here "test..."), you can correlate performance with the context in your application.<br />
<br />
<h3 id="only-net-4-5-or-higher">
PerfIt for Web API is all there just in a different nuget package</h3>
<div>
<br />
If you have been using previous versions of PerfIt, do not panic! We are not going to move the cheese: the client and server delegating handlers are all there, only in a different package, so you just need to install the <span style="font-family: Courier New, Courier, monospace;">PerfIt.WebApi</span> package:</div>
<pre class="brush:csharp">PM> install-package PerfIt.WebApi -pre</pre>
The rest is just the same.<br />
<br />
<h3 id="only-net-4-5-or-higher">
Only .NET 4.5 or higher</h3>
<br />
After spending a lot of time writing async code in <a href="https://github.com/aliostad/CacheCow">CacheCow</a> which was .NET 4.0, I do not think anyone should be subjected to such torture, so my apologies to those using .NET 4.0 but I had to move PerfIt! to .NET 4.5. Sorry .NET 4.0 users.<br />
<br />
<h3 id="only-net-4-5-or-higher">
PerfIt for MVC, Windsor Castle interceptors and more</h3>
Yeah, there is more coming. PerfIt for MVC has long been asked for by the community, and Castle interceptors can simply move all cross-cutting concern code out of your core business code. Stay tuned and please provide feedback before we go fully to v1!
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com1tag:blogger.com,1999:blog-2889416825250254881.post-8884640435882492332015-05-10T16:25:00.000+01:002015-05-10T20:47:24.953+01:00Machine Learning and APIs: introducing Mills in REST API DesignLevel [<a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">C3</a>]<br />
<br />
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
REST (REpresentational State Transfer) was designed with the "state" at its heart, literally, standing for the S in the middle of the acronym.<br />
<br />
<blockquote class="tr_bq">
<i><b>TL;DR:</b> A Mill is a special type of resource where the server's authority comes purely from exposing an algorithm, rather than <a href="http://byterot.blogspot.co.uk/2012/11/client-server-domain-separation-csds-rest.html" target="_blank">"defining, exposing and maintaining integrity of a state"</a>. Unlike an RPC style endpoint, it has to adhere to a set of 5 constraints (see below). </i></blockquote>
<br />
Historically, when there were only a few thousand servers around, the <i>state</i> was predominantly documents. People were creating, editing and sharing a lot of text documents, and some HTML. With HTTP 1.1, caching and concurrency were built into the protocol, enabling it to represent richer distributed computing concerns, and we have been building on top of it ever since. With the rising popularity of REST over the last 10 years, much of today's web has been built on RESTful thinking, whether in what is visible or in what sits behind the presentation (outermost) layer of servers. Nowadays when we talk of <i>state</i>, we normally mean data, or rather records persisted in a data store (relational or non-relational). A lot of today's data, directly or indirectly, is created, updated and deleted using REST APIs. And this is all cool, of course.<br />
<br />
When we design APIs, we map the state onto REST Resources. It is very intuitive to think of resources as collections and instances. It is unambiguous and useful to communicate these concepts when, for example, we refer to the <span style="font-family: Courier New, Courier, monospace;">/books</span> and <span style="font-family: Courier New, Courier, monospace;">/books/123</span> URLs as the collection or instance resource, respectively. We interact with these resources using verbs, and although HTTP verbs are not meant to be used just for CRUD operations, interacting with <u>the state that exists on the server</u> is inherent in the design.<br />
<br />
But that is not the whole story. Mainstream adoption of Machine Learning in the industry means we need to expose Machine Learning applications using APIs. The problem is that the resource-oriented approach of REST (where the state is at the heart of the design) does not work very well here.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://pbs.twimg.com/media/CD3K-2LWAAAPRPZ.png:large" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="400" src="https://pbs.twimg.com/media/CD3K-2LWAAAPRPZ.png:large" width="358" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">By the way, I am NOT 51...</td></tr>
</tbody></table>
How-old.net is an example of a Machine Learning application that, instead of being an application, could have been an API. For example (just for illustration, you could use other media types too):<br />
<br />
<pre class="code">POST /age_gender_classifier HTTP/1.1
Content-Type: image/jpeg</pre>
And the response:<br />
<pre class="code">200 OK
Content-Type: application/json
{
  "gender": "M",
  "age": 37
}</pre>
<br />
The server generates a response to the request by carrying out complex face recognition and running a model, most likely a deep network model. The server is not returning a <i>state</i> stored on the server; in fact this whole process is completely stateless.<br />
<br />
And why does this matter? Well, I feel that if REST is supposed to move forward with our needs and use cases, it should define, clarify, internalise and finally digest such edge cases. While these edge cases used to be pretty uncommon, with the rise and popularity of Machine Learning such endpoints will be pretty standard.<br />
<br />
A few days ago, on the second day of the <a href="http://mediterranea.apidays.io/" target="_blank">APIdays Mediterranea 2015</a> conference, I presented a talk on Machine Learning and APIs. In this <a href="http://www.slideshare.net/AliKheyrollahi/topic-modelling-farsi-and-ap-is" target="_blank">talk</a> I presented the simple concept of <b>Mills</b>: a mill is where you take your wheat to be ground and you carry back the flour.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfQ2eXuWNf3ugGSmpjBrGE_g8CtU3LdPrQRODWQQpAnxPH5sGrhS8QE60qXhvLKAwrfIn78mzvxE54BuhBfQikOBL2CYNi9-Ot9eDAvNbcHwSnOLkcYWMbzIm_Ea9h4i7VgENizGq4LZ8Y/s1600/2473951927_a9060e5f16_b.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="428" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfQ2eXuWNf3ugGSmpjBrGE_g8CtU3LdPrQRODWQQpAnxPH5sGrhS8QE60qXhvLKAwrfIn78mzvxE54BuhBfQikOBL2CYNi9-Ot9eDAvNbcHwSnOLkcYWMbzIm_Ea9h4i7VgENizGq4LZ8Y/s640/2473951927_a9060e5f16_b.jpg" width="640" /></a></div>
<br />
<br />
Basically, it all goes back to the <b>origin of a server's authority</b>. To give an example, a "Customer Profile" service, exposed by a REST API, is the authority to go to when another service requires access to a customer's profile. The "Customer Profile" service has defined a <i>state</i>, which is the profile of the customers, and is responsible for ensuring the integrity of that state (enforcing business rules on the state). For example, if the marketing email preference can have values of <span style="font-family: Courier New, Courier, monospace;">None</span>, <span style="font-family: Courier New, Courier, monospace;">WeeklyDigest</span> or <span style="font-family: Courier New, Courier, monospace;">All</span>, it should not allow the value to be set to <span style="font-family: Courier New, Courier, monospace;">MonthlyDigest</span>. We are quite used to these types of services and building REST APIs on top: <span style="font-family: Courier New, Courier, monospace;">CustomerProfile</span> becomes a resource that we can query or interact with.<br />
<br />
On the other hand, a server's authority could be exposing an algorithm. For example, tokenisation of text is a non-trivial problem that requires not only breaking the text into its words, but also keeping multi-part words and named entities intact. A REST API that exposes this functionality would be a <b>mill</b>.<br />
<br />
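Just for illustration (the URL and payloads below are made up, in the same spirit as the example above), such a tokeniser mill could look like this:<br />
<pre class="code">POST /tokeniser HTTP/1.1
Content-Type: text/plain

New York is a big city</pre>
And the response:<br />
<pre class="code">200 OK
Content-Type: application/json

["New York", "is", "a", "big", "city"]</pre>
<br />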
<h2>
5 constraints of a Mill</h2>
<h3>
1) It encapsulates an algorithm not a state</h3>
Which was discussed ad nauseam; however, the distinction is very important. For example, let's say we have an algorithm where you provide a postcode and it returns the houses within a 1 mile radius of that postcode - this is <i>not</i> an example of a mill, since its authority comes from the state it holds about houses, not just from the algorithm.<br />
<br />
<h3>
2) Raw data in, processed data out</h3>
For example you send your text and get back the translation.<br />
<br />
<h3>
3) Calls are both safe and idempotent</h3>
Calling the endpoint should not directly change any state within the server. For example, the endpoint should not be directly mapped to the ML training engine, e.g. sending a text 1000 times should not skew the trained model towards that text. The training endpoint is usually a normal resource, not a mill - see my <a href="http://www.slideshare.net/AliKheyrollahi/topic-modelling-farsi-and-ap-is" target="_blank">slides</a>.<br />
<br />
<h3>
4) It has a single specialty</h3>
And as such, it accepts a single HTTP verb apart from OPTIONS, normally POST (although a GET with entity payload would be more semantically correct but not my preferred option for practical reasons).<br />
<br />
<h3>
5) It is named not as a verb but as a tool</h3>
A mill that exposes tokenisation is to be called a <i>tokeniser</i>. In a similar way, <i>classifier</i> would be the appropriate name for a system that classifies on top of a neural network, for example. Or text normalisation would be exposed by a <i>normaliser</i> mill.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsvCWY1HUXfmloiRPf7XV_T4wUBVEKcBaehFZuWh1zmZMQtxj3ilWVEIy-euZnf696SvaMVHjKc0iUmnY5P3ApyFKwQyZ7a1AlSd-eezdldy10NyIG9K0QDGynEDHSHUR0UnEwiaygCTUi/s1600/lagfp.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="428" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsvCWY1HUXfmloiRPf7XV_T4wUBVEKcBaehFZuWh1zmZMQtxj3ilWVEIy-euZnf696SvaMVHjKc0iUmnY5P3ApyFKwQyZ7a1AlSd-eezdldy10NyIG9K0QDGynEDHSHUR0UnEwiaygCTUi/s640/lagfp.jpg" width="640" /></a></div>
<br />
No, this is not the same as an RPC endpoint. No RPC smell. Honest :) That is why those 5 constraints exist.<br />
<br />
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com1tag:blogger.com,1999:blog-2889416825250254881.post-79471262325485191042015-04-22T22:11:00.000+01:002020-04-03T11:37:50.890+01:00Pilgrimage into the world of Tarkovsky: through the eyes of hope and suffering<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
[Level <a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">N</a>]<br />
<br />
The world is not perfect. It has given us scientists, authors, artists and politicians - and I have lived enough to know none of them were really perfect. Among these, we have personal heroes, personalities that have made great discoveries, built wonderful things or have lived extraordinary lives. Whether it is Obama, Einstein or George Orwell, they have their deficiencies.<br />
<br />
I am saying this because the word <strong>Pilgrimage</strong> in the title can put you off. In fact it puts <em>me</em> off. But ... it is there for a reason, and I hope by the time you finish reading - if you hang on long enough - you would see it.<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
A stuttering boy who finally mutters a few words without a pause after a session of hypnotherapy, followed by a black screen of titles with the music of Bach, is not a typical opening scene. But this, for me, has been the most memorable opening among all the films I have seen. If you are looking to describe the body of work by the late director Tarkovsky, look no further, it is all there in the opening scene of <em><a href="http://en.wikipedia.org/wiki/The_Mirror_%281975_film%29" target="_blank">The Mirror (1975)</a></em>. This scene somehow encapsulates Tarkovsky's view of himself: a timid lad who can barely speak two words in sequence without constantly stuttering but who, with the help of "supernatural" powers, can speak and tell us his stories. And the process is painful for him; it is only achieved with determination and sacrifice.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg_AVsoOUvdLtOpLJQf6p-mt_8QsCDnw8RAI49d3ngbDSYjUzKOR-bu3ZEY_Z1sel2pOxK7c-pNmny4FRrDERPFC0mscVhXzUUHRg9xdreyggd13RSBbv-CW3a-GEiYKbe39q9jetcLpKZ/s1600/zerkalo2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjg_AVsoOUvdLtOpLJQf6p-mt_8QsCDnw8RAI49d3ngbDSYjUzKOR-bu3ZEY_Z1sel2pOxK7c-pNmny4FRrDERPFC0mscVhXzUUHRg9xdreyggd13RSBbv-CW3a-GEiYKbe39q9jetcLpKZ/s1600/zerkalo2.png" width="400" /></a></div>
<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
Stumbling a few times along the way, I find my way with difficulty through the aisles of the dark cinema. I think I have missed the first few minutes but that should be OK.<br />
<br />
I am lucky to be here. After queueing several hours on a cold sunny day in February 1988, I have managed to buy a ticket to Tarkovsky's <em><a href="http://en.wikipedia.org/wiki/Stalker_(1979_film)" target="_blank">Stalker (1979)</a></em> at the <a href="http://en.wikipedia.org/wiki/Fajr_International_Film_Festival" target="_blank">Fajr Film Festival</a>. A special section of the Festival is dedicated to the memory of the late Tarkovsky, who died the previous year, and they are showing all his films - with understandable cuts when a film does not meet "the code"; at the end of the day Iran is run as an Islamic country. These are the films that intellectuals go to - and I should go to since I am planning to become one!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivxrp0v2A-p9UWQKZxp0l8p6m_2Y0RW72Ap8PZNMdLD3TIrQuPS9oUjcWNmC0qYvDn8LWFaQU17aR3m2GC60EG069sU0Yqyt1519OC9DnaQy66G7ayNl4X0HKn_NEqQk5In7SHAr1kkhdN/s1600/Still-from-Andrei-Tarkovs-001.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEivxrp0v2A-p9UWQKZxp0l8p6m_2Y0RW72Ap8PZNMdLD3TIrQuPS9oUjcWNmC0qYvDn8LWFaQU17aR3m2GC60EG069sU0Yqyt1519OC9DnaQy66G7ayNl4X0HKn_NEqQk5In7SHAr1kkhdN/s1600/Still-from-Andrei-Tarkovs-001.jpg" width="640" /></a></div>
<br />
<br />
And I sit there in the dark, watching this 220 minute epic where very few things actually happen. And the film is in fluent Russian with no subtitles!<br />
<br />
And through the confusion of barely knowing the storyline, and not getting any of the dialogues, as a young 19-year-old student, I am mesmerised. The film works its way through me, somehow, precipitates deep marks that are ingrained with me until this day. The film communicates with a strange language which I feel I have known but very remotely, as if in a previous life. It is hazy, sublime and next to impossible to translate to words.<br />
<br />
And the next thing I know, I am sitting watching <em>Mirror</em> (this time it is a public screening and it is translated) and incoherent images and storylets come and go, with apparently no relationship. And yet, by the end, I cannot control myself and my eyes are wet. And again, I have no explanation when accused of pretentious intellectualism or sentimentalism.<br />
<br />
My journey (or Pilgrimage) has started. These films, I have lived with. They grew with me, and gradually, over quarter of a century, made sense. And this post is about why and how.<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
It was not a coincidence that in the same Fajr Festival of 1988 there was a screening of Parajanov's <em>Colour of Pomegranates</em>. It is generally believed that the films of Tarkovsky and Parajanov are very similar. Tarkovsky was indeed a fan of Parajanov's work and I later found out they were in fact friends. I did manage to watch it later at the public screening but, as it was even more bizarre, I did not quite like it.
Form is the vehicle to deliver the meaning and not the meaning itself. Parajanov felt overly concerned with form and, while a narrative and a story of love are there, the meaning did not quite live up to the novelty of the form - I don't know, maybe one day I will think differently.<br />
<br />
Going back to Tarkovsky, "the meaning/message" is not easy to grasp. Commonly there are different interpretations, and it is even said that his films are meant to take us on a personal journey of understanding, hence all interpretations are correct because they are true to ourselves - so post-modern!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOECjl52YLv8fjy5-7KeKsZvCyeBFm1T08gWwyqzg3Ln5cRicuQqGnzD-a4ggqbu26J-bMBuUdmJeXP2d2omgrlrsKQBbLXoVJEZFdtfmdxkMPNMJJPgGqztj9qlakHp083a5f9HJr8kHd/s1600/mirror-1.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="622" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhOECjl52YLv8fjy5-7KeKsZvCyeBFm1T08gWwyqzg3Ln5cRicuQqGnzD-a4ggqbu26J-bMBuUdmJeXP2d2omgrlrsKQBbLXoVJEZFdtfmdxkMPNMJJPgGqztj9qlakHp083a5f9HJr8kHd/s1600/mirror-1.jpg" width="640" /></a></div>
<br />
<br />
Did Tarkovsky hide specific messages for us to grasp in his often difficult and unusual films? Most works of art (and even more so music and modern art) are open to personal interpretation. Abstract paintings famously invite us to find our personal comprehension of the work of art. But how about Tarkovsky?<br />
<br />
Only he can answer us. And he did.<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
It is very rare for a director to uncover his tricks and spoil the meaning of his films in a book. Well, he did not quite do that in <a href="http://www.amazon.co.uk/Sculpting-Time-Reflections-Andrei-Tarkovsky/dp/0292776241" target="_blank">Sculpting in Time</a> but he did reveal his vision of cinema as an art form. And more importantly, why he made his films. While for many, making films is a means of gaining fame, a career, a vehicle to project one's intellectual viewpoints, or (as Tarkovsky refutes) a means of self expression, <b>for Tarkovsky it was a selfless and painful endeavour to fulfil a responsibility he was trusted with.</b> While for some, making on average one film every 7 years means striving for perfection, for him it was about painstakingly ensuring that his duty in this world was fulfilled.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclG0uZRxaDDZD9mUJ4mviSXUPeC-ikGoiU0SlfIy_pHEFMqP2fj_kNdBNRmOmpUnRz3ffdPOgQal3D-Y4jcxNv_qENY0904QBNtQgxexlQnTEiMDqwnGMfU9m4pTfzXDKbI5XwCvY1_P3/s1600/andrei-rublev.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhclG0uZRxaDDZD9mUJ4mviSXUPeC-ikGoiU0SlfIy_pHEFMqP2fj_kNdBNRmOmpUnRz3ffdPOgQal3D-Y4jcxNv_qENY0904QBNtQgxexlQnTEiMDqwnGMfU9m4pTfzXDKbI5XwCvY1_P3/s1600/andrei-rublev.jpg" width="640" /></a></div>
<br />
<br />
What do we mean by responsibility? It is hard to explain in words but easier to point you to his films. We get to meet Tarkovsky himself in his films - whom do you think <a href="http://en.wikipedia.org/wiki/Andrei_Rublev_(film)" target="_blank">Andrei Rublev</a> was then?! An artist monk, sick of the decadence of the world, taking a vow of silence only to understand at the end that he cannot forfeit his duty as an artist in order to stay pure. His work will involve suffering but that is the sacrifice he is meant to make. An artist is not free: despite the theories of modern art, the artist is not solely responsible to himself and his art. Tarkovsky shunned modern art:<br />
<blockquote>
"Modern art has taken a wrong turn in abandoning search for the meaning of existence in order to affirm the value of the individual for its own sake."</blockquote>
Tarkovsky sees the process of making art as a consummation of the artist for the cause - he called artists "sufferers". The artist is a martyr and artistic creation a sacred sacrifice:<br />
<blockquote>
"Artistic creation demands that he 'perish utterly' in the full tragic sense of those words."</blockquote>
The notion of self-expression, this inward looking for fulfilment, utterly frustrated him with the artistic culture of his day. The artist himself is the last person to gain from the artistic creation - very much like the character Stalker, who could not benefit himself from "The Room", nor could any of the other Stalkers.<br />
<blockquote>
"The artist is always a servant, and is perpetually trying to pay for the gift that has been given to him as if by miracle."</blockquote>
Also, the artist is not merely an intellectual concerned with the abstract notions of his art form, but he is an evangelist (in its literal meaning) making his art for everyone:<br />
<blockquote>
"Art addresses everybody, in the hope of making an impression, ... of winning people not by incontrovertible rational argument but through the spiritual energy with which the artist has charged the work."</blockquote>
And oh boy, that spiritual energy that sets you on fire, making you look for the answer - in my case for a quarter of a century. Now it probably makes a lot more sense to think of this man as a prophet.<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
Tarkovsky's films are slow - for some, painfully slow. They contain many long takes, and this by itself does not signify a technique; it is a by-product of his vision of and language for cinema as an art form. This vision was later taken up by Bela Tarr, a true student of it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpT5qbdpeK0myCYZQKUn2YHulB0my_IFCbqxAn5FbSsPilI_ZuHIB4nImpCAWX4b5NBclUXuPM73M56ixwlImrm1acmeQdOCUInTpH6lsbHzZqC4GRRSXOYhOBnoTEU4EsRqLFuXa6fWVE/s1600/stalk13.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpT5qbdpeK0myCYZQKUn2YHulB0my_IFCbqxAn5FbSsPilI_ZuHIB4nImpCAWX4b5NBclUXuPM73M56ixwlImrm1acmeQdOCUInTpH6lsbHzZqC4GRRSXOYhOBnoTEU4EsRqLFuXa6fWVE/s1600/stalk13.jpg" width="640" /></a></div>
<br />
<br />
On the surface, it could appear that this is a stylistic decision to come up with a unique formalism, a pretentious intellectual gesture. But Tarkovsky himself disdained pure experimentation to come up with a new formalism:<br />
<blockquote>
"People talk about experiment and search above all in relation to the avant-garde. But what does it mean? ... For the work of art carries within it an integral aesthetic and philosophical unity; it is an organism ... Can we talk of experiment in relation to the birth of a child? It is senseless and immoral."</blockquote>
And this again reminds us of the burden of responsibility he felt in making his films. On the other hand, he is regarded as one of the proponents of "poetic cinema", a term that Tarkovsky himself found almost offensive:<br />
<blockquote>
"I find particularly irritating the pretensions of modern 'poetic cinema', which involves breaking off contact with fact and with time realism."</blockquote>
Tarkovsky talks of the works of art that have inspired him and have shaped his artistic language. These range from late middle ages icons and Italian paintings of the renaissance period to works of literature by Dostoevsky, Tolstoy, Goethe and finally to the films of Dovzhenko, Bresson and others. In an effort to describe an ideal piece of work, he brings an example from a relatively obscure painter of the renaissance period, whose painting had a deep effect on him. In contrast to Raphael's <a href="https://www.nationalgalleries.org/collection/artists-a-z/r/artist/raphael-raffaello-santi/object/the-virgin-and-child-the-bridgewater-madonna-ngl-065-46" target="_blank">"Virgin and Child"</a>, he was captivated by the inexplicability of the works of Carpaccio.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkwQ2o8IBycD4vQHkLWW0-ZlHXQyJ6di2jEC-ywX0ZIWAfmoCVlN4HnjQEA0gN9b00BV6veIh1_M4Eqle0CWF0TcbsGQVFpgZnM-ZJ_IeMIDkjTGi36wQ2XCUVrBG3IBrBHFgsRvb3Skj5/s1600/Vittore_Carpaccio_-_Preparation_of_Christ's_Tomb_-_Google_Art_Project.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="506" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkwQ2o8IBycD4vQHkLWW0-ZlHXQyJ6di2jEC-ywX0ZIWAfmoCVlN4HnjQEA0gN9b00BV6veIh1_M4Eqle0CWF0TcbsGQVFpgZnM-ZJ_IeMIDkjTGi36wQ2XCUVrBG3IBrBHFgsRvb3Skj5/s1600/Vittore_Carpaccio_-_Preparation_of_Christ's_Tomb_-_Google_Art_Project.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><i style="background-color: white; color: #333333; font-family: sans-serif; font-size: 20px; line-height: 36px; text-align: start;">Preparation of Christ's Tomb</i><span style="background-color: white; color: #333333; font-family: sans-serif; font-size: 20px; line-height: 36px;"> (1505) - Vittorio Carpaccio</span></td></tr>
</tbody></table>
<br />
<br />
Back to cinema, he believed that the ideal film is countless metres of celluloid capturing the entire life of a person. This probably makes it easier to understand why his films were usually longer than 2 hours and, in the case of <em>Stalker</em>, 3 hours and 40 minutes! Tarkovsky believed that a work of art needs to be true to life. And when we think of life, there are no fast cuts and edits: it is one very long take.<br />
<blockquote>
"I want to make the point yet again that in film, every time, the first essential in any plastic composition ... is whether it is true to life."</blockquote>
Tarkovsky explains some of the techniques he used in order to make his scenes leave a deeper impression on the viewer. These techniques move away from the cinematic language of the time, from the clichéd symbolism of mainstream cinema. They usually enhance and magnify the <em>image</em> and make it imprint on our psyche. An example is the scene from Mirror where the Doctor meets the Mother and, at the end of the scene, a strong wind blows, making the Doctor look back towards the house.<br />
<br />
All in all, for Tarkovsky, a masterpiece is a work of art from which you cannot remove anything without completely destroying the work. And that is exactly what he saw in the works of Carpaccio - a unity that cannot be broken. As such, it is really difficult to pinpoint what makes the masterpiece as exceptional as it is.<br />
<br />
<div style="text-align: center;">
* * * </div>
<br />
So where does Tarkovsky get his inspiration from? Who is his true role model?
It might come as a surprise to some, but Tarkovsky was a devout Christian. <a href="http://www.bfi.org.uk/features/tarkovsky/" target="_blank">He knew two of the gospels by heart</a> and would recite them in conversations. His book is full of quotations from the New Testament (<em>1 Corinthians</em> a favourite of his) and phrases that can only mean he truly believed. He was not after happiness (remember Stalker - the Black Dog of depression):<br />
<blockquote>
"Let us imagine for a moment that people have attained happiness ... Man becomes Beelzebub."</blockquote>
He saw a strong similarity between art and religion:<br />
<blockquote>
"In art, as in religion, intuition is tantamount to conviction, to faith. It is a state of mind, not a way of thinking."</blockquote>
He felt a deep connection in his role to that of an evangelist:<br />
<blockquote>
"Art ... expresses its own postulate of faith."</blockquote>
And his role model: of course, Jesus. Selfless sacrifice, servant, for everyone, winning people. All his films and writings point to him. It is not accidental that we hear John's <em>Revelation</em> in Stalker; as apocalyptic as the film is, this could not have been more literal. No accident that we meet God in Solaris (the Ocean), or that Stalker is so stricken by the lack of faith of others, or that in <i>Sacrifice</i>, one can save everyone.<br />
<br />
Commonly people ask why he made his films so difficult, with the meaning buried under layers. Why? For exactly the same reason that Jesus, as a teacher, used parables to convey his message rather than stating it plainly.<br />
<br />
<div style="text-align: center;">
* * *<br />
<div>
<br /></div>
</div>
And my quest is not finished, but surely it has eased off. After coming to believe in Jesus in 2001, I have revisited Tarkovsky again lately. Now all the symbols and meanings are crystal clear. I feel very close to what he tried so hard to shape into images. It just makes sense.<br />
<br />
Messages and meanings ... and what are those? It will be clear by the process of your personal pilgrimage. And it could begin now...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU0_pxyEf_8KhlVf9INM14abbB5e3i4SPXvxhmu0n-sI4DVmHXiG9EJPbKrlQnXeHzRS1rOTtuPvnbpD27XVqpiy4rkmtqVqMsFku9pqTU-V83wR5j7u_z6COCwzqnle8EQtQqGqWka-tP/s1600/tumblr_m1xjsxMDnC1qf7r5lo1_500.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU0_pxyEf_8KhlVf9INM14abbB5e3i4SPXvxhmu0n-sI4DVmHXiG9EJPbKrlQnXeHzRS1rOTtuPvnbpD27XVqpiy4rkmtqVqMsFku9pqTU-V83wR5j7u_z6COCwzqnle8EQtQqGqWka-tP/s1600/tumblr_m1xjsxMDnC1qf7r5lo1_500.jpg" width="456" /></a></div>
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com3tag:blogger.com,1999:blog-2889416825250254881.post-5473121936743362142015-04-03T17:00:00.000+01:002015-04-13T15:19:21.134+01:00Utilisation and High Availability analysis: Containers for Microservices<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
Microservices? Are these not the same SOA principles repackaged and sold under a different label? Not this time; I will address that question in another post. But if you are considering Microservices for your architecture, beware of the cost and availability concerns. In this post we will look at how using containers (such as Docker) can help you improve your cloud utilisation, decrease costs and, above all, improve availability.<br />
<br />
<h2>
Elephant in the room: most of the cloud resources are under-utilised</h2>
We almost universally underestimate how long it takes to build a software feature. I am not sure if it is because our time feels more precious than money, but for hardware the reverse is almost always true: we always <b>over</b>estimate the hardware requirements of our systems. Historically this could have been useful, since commissioning hardware in enterprises is usually a long and painful process, and the estimate also included business growth over the years and planned contingency for spikes.<br />
But in an elastic environment such as the cloud? Well, it seems we still do that. In the UK alone, <a href="http://blog.flux7.com/unused-cloud-infrastructure-could-be-draining-business-resources" target="_blank">£1bn</a> is wasted on unused or under-utilised cloud resources.<br />
<br />
Some of this is avoidable by using the elasticity of the cloud and scaling up and down as needed. Many cloud vendors provide such functionality out of the box with little or no coding<b>. But many companies already do that,</b> so why is the waste so high?<br />
<br />
From personal experience I can give you a few reasons why my systems do that...<br />
<br />
<h3>
Instance Redundancy</h3>
Redundancy is one of the biggest killers in computing costs. And things do not change a lot in the cloud: vendors' availability SLAs are usually defined in the context of redundancy and, to be frank, some of it is purely cloud related. For example, on Azure you need to have your VMs in an "availability set" to qualify for VM SLAs. In other words, at least 2 VMs are needed, since your VMs could be taken down for patching at any time, but within an availability set this is guaranteed not to happen to all machines at the same time.<br />
<br />
The problem is, unless you are a company with a massive number of customers, even a small VM instance could suffice for your needs - and even in a big company with many internal services, some services might not need a big resource allocation.<br />
<br />
Looking at it from another angle, adopting Microservices means you can iterate your services more quickly, releasing more often. The catch is that the clients will not be able to upgrade at the same time and you have to be prepared to run <strong>multiple versions</strong> of the same service/microservice. Old versions of the API cannot be decommissioned until all clients are weaned off them and moved to the newer versions. Translation? Well, some of your versions will have to run on a shoestring budget to justify their existence.<br />
<br />
Containerisation helps you to tap into this resource, reducing the cost by running multiple services on the same VM. A system usually requires at least 2 or 3 active instances - allowing for redundancy. Small services loaded into containers can be co-located on the same instances allowing for higher utilisation of the resources and reduction of cost.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM9jWvEsbzp23VK68Xdh59qamR1or9HZwQEmZ3BIirT3UssogOYOXIM5xnAXSH143ahcu0jP78mTWGKEVEFZCFsunmybFvb_jU_V5JscbuEglW1egKjPWnqptj7rr_GZNjypB_dHL5tUFZ/s1600/serviceA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM9jWvEsbzp23VK68Xdh59qamR1or9HZwQEmZ3BIirT3UssogOYOXIM5xnAXSH143ahcu0jP78mTWGKEVEFZCFsunmybFvb_jU_V5JscbuEglW1egKjPWnqptj7rr_GZNjypB_dHL5tUFZ/s1600/serviceA.png" height="248" width="640" /></a></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjR4OM4ewvOv8Hn-K_TzW-XfiPpZo5AhW2yftTFm22HSxdArkosRL_r3JVC4veTDEDBgI53ZXxIYN-QBanjlqSs0ZSy4AtuAGz97aepQjlS7pOw67KxYsV1-YZ9NnYDHPaLDkXUHv1hkkl/s1600/serviceA+serviceB_mixed.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjR4OM4ewvOv8Hn-K_TzW-XfiPpZo5AhW2yftTFm22HSxdArkosRL_r3JVC4veTDEDBgI53ZXxIYN-QBanjlqSs0ZSy4AtuAGz97aepQjlS7pOw67KxYsV1-YZ9NnYDHPaLDkXUHv1hkkl/s1600/serviceA+serviceB_mixed.png" height="264" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Improved utilisation by service co-location</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
<br />
This ain't rocket science...<br />
<br />
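As a concrete sketch of what co-location looks like day to day (the service names, images and limits below are purely illustrative), Docker lets you cap each container so that several small services - or several versions of the same service - can share one VM without starving each other:<br />
<pre class="code">docker run -d --name orders-api-v2 -m 256m --cpu-shares 512  myregistry/orders-api:2.1
docker run -d --name orders-api-v1 -m 128m --cpu-shares 256  myregistry/orders-api:1.4
docker run -d --name pricing-calc  -m 512m --cpu-shares 1024 myregistry/pricing-calc:3.0</pre>
<br />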
<h3>
Resource Redundancy</h3>
Most services have different resource requirements. Whether Network, Disk, CPU or memory, some resources are used more heavily than others. A service encapsulating an algorithm will be mainly CPU-heavy while an HTTP API could benefit from local caching of resources. While cloud vendors provide different VM setups that can be geared towards memory, Disk IO or CPU, a system still usually leaves a lot of redundant resources.<br />
<br />
This is possibly best explained by the pictures below. No rocket science here either, but mixing services that have different resource allocation profiles gives us the best utilisation.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbtVBDfs8kjfLYaRxCgKuX0qGjA9qoPN2cX2bBr69pFyta0YCC6bDYY7FRTAQAMzvGDU4S7QerBX3GyniJW7mKPBMV7Tu6GU_Tw6h_3MAu_WMtd7RH_lnLpltp4AcURpXq5za51eAEOXqw/s1600/CPU+memory_separate.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbtVBDfs8kjfLYaRxCgKuX0qGjA9qoPN2cX2bBr69pFyta0YCC6bDYY7FRTAQAMzvGDU4S7QerBX3GyniJW7mKPBMV7Tu6GU_Tw6h_3MAu_WMtd7RH_lnLpltp4AcURpXq5za51eAEOXqw/s1600/CPU+memory_separate.png" height="289" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUXvGZzftpsfdiuABKXNT69pOsso4cX7fnSX-1SST1dPb4VAU77msqNaapbu7AI8osckrKg3kvpCIF38gbT0vARVKu4NfR2I8m4BZMAmDWjCpyg6m5gJFyPno8Hpq2PTZ_WXd4tGZ7RAuf/s1600/CPU+memory.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUXvGZzftpsfdiuABKXNT69pOsso4cX7fnSX-1SST1dPb4VAU77msqNaapbu7AI8osckrKg3kvpCIF38gbT0vARVKu4NfR2I8m4BZMAmDWjCpyg6m5gJFyPno8Hpq2PTZ_WXd4tGZ7RAuf/s1600/CPU+memory.png" height="318" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Co-location of Microservices having different resource allocation profile</td></tr>
</tbody></table>
<br />
<br />
<h3>
And what's that got to do with Microservices?</h3>
Didn't you just see it?! Building smaller services pushes you towards building and deploying <strong>more</strong> services, many of which need the High Availability provided by the redundancy but not the price tag associated with it.<br />
<br />
Docker is absolutely a must-have if you are doing Microservices - otherwise you will be paying through the nose for your cloud costs. At QCon London 2015, <a href="http://qconlondon.com/keynote/cluster-management-google">John Wilkes</a> from Google <a href="http://qconlondon.com/system/files/keynotes-slides/2015-03%20QCon%20%28john%20wilkes%29.pdf">explained</a> how they "start over 2 billion containers per week". In fact, to be able to take advantage of the spare resources on the VMs, they tend to mix their <em>Production</em> and <em>Batch</em> processes. One difference here is that the Production processes require <em>locked allocated resources</em> while the Batch processes take whatever is left. They analysed the optimum percentages, minimising the errors while keeping utilisation high.<br />
<br />
<h2>
Containerisation and availability</h2>
As we discussed, optimising utilisation becomes a big problem when you have many, many services - and their multiple versions - to run. But what would that mean in terms of Availability? Does containerisation improve or hinder your availability metrics? I have not been able to find much in the literature but, as I will explain below, even if you do not have small services requiring VM co-location, you are better off co-locating services and spreading them onto more machines. And it even helps you achieve <em>higher utilisation</em>.<br />
<br />
By spreading your architecture across more Microservices, the availability of your overall service (the one the customer sees) becomes the product of the availability of each Microservice. For instance, if you have 10 Microservices each with an availability of 4 9s (99.99%), the overall availability drops to 3 9s (99.9%). And if you have 100 Microservices, which is not uncommon, this drops to only two 9s (99%). In these terms, you need to strive for very high Microservice availability.<br />
<br />
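To make the arithmetic concrete, here is a back-of-the-envelope sketch (assuming the Microservices fail independently):<br />
<pre><code>// Composite availability of n services in a call chain, each with availability a,
// assuming independent failures: overall = a^n
double tenServices     = Math.Pow(0.9999, 10);   // ≈ 0.9990 -> three 9s
double hundredServices = Math.Pow(0.9999, 100);  // ≈ 0.9900 -> two 9s
</code></pre>
<br />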
Hardware failure is very common and for many components it goes above 1% (Annualised Failure Rate). Defining hardware and platform availability with respect to system availability is not very easy. But for simplicity and for the purpose of this study, let's assume a failure risk of 1% - at the end of the day our resultant downtime will scale accordingly. <br />
<br />
If service A is deployed onto 3 VMs and one VM goes down (1%), the other two instances will have to bear the extra load until another instance is spawned - which will take some time. Capacity planning can leave enough spare resources to deal with this situation, but if two VMs go down (0.01%), it will most likely bring down the service as it would not be able to cope with the extra load. If the <a href="http://en.wikipedia.org/wiki/Mean_time_to_recovery" target="_blank">Mean Time to Recovery</a> is 20 minutes, this alone will dent your Microservice availability by around <b>half of 4 9s!</b> (4 9s allows only about 52 minutes of downtime per year, so a single 20-minute outage eats a big chunk of that budget.) If you have worked hard in this field, you would know how difficult it is to gain those 9s, and losing them like that is not an option.<br />
<br />
So what's the solution? The diagram below should say it better than words:<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3Bb8k4Xu_aGTeCcQRBzG4vc0F4J5dcDf9CSwwAEn7Zb_YWcbu9KB3OO-3nXTN-KZPSpGaGMBK7eqidKoNTD6gGzshNq4fgH4x7E1DFKaTXjkZ4X6rXcorwbP16ssrdD5aOTBJzSaJzwOC/s1600/co-located-avail.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3Bb8k4Xu_aGTeCcQRBzG4vc0F4J5dcDf9CSwwAEn7Zb_YWcbu9KB3OO-3nXTN-KZPSpGaGMBK7eqidKoNTD6gGzshNq4fgH4x7E1DFKaTXjkZ4X6rXcorwbP16ssrdD5aOTBJzSaJzwOC/s1600/co-located-avail.png" height="300" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Service A and B co-located in containers, can tolerate more VM failures</td></tr>
</tbody></table>
<br />
By using containers and co-locating services, we spread instances more thinly and can tolerate more failures. In the example above, our services can tolerate 2 or maybe even 3 VM failures at the same time.<br />
<br />
<h2>
Conclusion</h2>
Containerisation (or Docker if you will) is a must if you are considering Microservices. It helps you increase utilisation, bring down cloud costs and, above all, improve your availability.<br />
<br />
<br />aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com2tag:blogger.com,1999:blog-2889416825250254881.post-74578778235338533072015-03-10T22:59:00.001+00:002015-03-11T11:52:28.226+00:00QCon London 2015: from hype to trendsetting - Part 1<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
Level [<a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">C3</a>]<br />
<br />
This year I could make it to <strong>QCon London</strong> and I felt it might be useful to write up a summary for those who would have liked to be there but did not make it for any reason. This will also be an opportunity to get my head together and summarise a couple of themes inspired by the conference.<br />
<br />
The quality of the talks varied: pretty disappointing on the first day but rising to a real high on the last day. Not surprisingly, <strong>Microservices</strong> and <strong>Docker</strong> were the buzzwords of the conference and many of the talks had one or the other in their title. It was as if the hungry folks were being presented Microservices with ketchup, next it would be with Mayonnaise, and yet nothing was as good as Docker with Salsa. In fact it is very easy to be sceptical and sarcastic about Microservices or Docker and disregard them as pure hype.<br />
<br />
After listening to the talks, especially the ones on the last day, I was convinced that <em>with or without me, this train is set to take the industry forward.</em> Yes, the granularity of Microservices (MS) has not been crisply defined yet, and there is a stampede to download and install Microservices on old systems and fix the world. Industry will abuse it, just as it reduced SOA to Web Services by adding just a P to the end. Yes, there are very few people talking about the cost of moving to MS and exploring the cases where you should stay put. But if your Monolith (even though it pays lip service to SOA) has ground the development cycle to a halt and is killing you and your company, there is a thing or two to learn here.<br />
<br />
<em>Disclaimer: This post by no means is a comprehensive account of the conference. This is my personal take on QCon London 2015 and topics discussed, peppered with some of my own views, presented as a technical writing.</em><br />
<h2 id="microservices">
Microservices</h2>
Yeah, I know you are fed up with hearing the word - but bear with me for a few minutes. Microservices reminded me of my past life: it is a <i>syndrome</i>. A medical syndrome, when it is first described, does not have to have its aetiology and pathophysiology all clear and explained - it is just a syndrome, a collection of signs and symptoms that occur together. In the medical world, there can be years between describing a syndrome and finding the what and the why.<br />
<br />
And this is what we are dealing with here, in a different discipline: the Microservice is an emerging pattern, a solution to a contextual problem that has indeed occurred. It is a phenomenon that we are still trying to figure out - a lot of head scratching is going on. So bear with it; I think we are in for a good ride beyond all the hype.<br />
<br />
Its two basic benefits are: smaller deployment granularity, enabling you to iterate faster; and a smaller domain to focus on, understand and improve. For me the first is the key.<br />
<br />
So here is a breakdown of a few key aspects of Microservices.<br />
<br />
<h3 id="conway-conway-where-art-thou">
Conway, Conway, Where Art Thou</h3>
A recurring theme (at points, ad nauseam) was that MS is the result of reversing cause and effect in <a href="http://en.wikipedia.org/wiki/Conway%27s_law">Conway's law</a> and using it to your advantage: build smaller teams and your software will take the same shape. So in essence, turn Conway's law on its head and use it as a tool that naturally results in a more loosely coupled architecture.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://s.quickmeme.com/img/78/78753be30bb93042f2b121bbd95da110e20d982e56ed3e74ff018558ce9e15e8.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://s.quickmeme.com/img/78/78753be30bb93042f2b121bbd95da110e20d982e56ed3e74ff018558ce9e15e8.jpg" height="361" width="400" /></a></div>
<br />
<br />
This is by no means new; Amazon has been doing it for a decade. The size of the teams is nicely defined by Jeff Bezos as "Two Pizza Teams". But what is the makeup of these teams and how do they operate? As again described by Amazon, they are made up of the elements of a small company, a start-up, including developers, testers, a BA, a business representative and, more importantly, operations, aka DevOps.<br />
<br />
Another point stressed by <a href="http://qconlondon.com/speakers/yoni-goldberg">Yoni Goldberg</a> from Gilt and Randy Shoup was that the teams charge other teams for using their services and need to look after their finances. They found that doing this reduced a team's costs by 90% - mainly due to optimising cloud and computing costs.<br />
<br />
<h3 id="granularity-fits-in-my-head-does-it-">
Granularity: "fits in my head" (does it?)</h3>
One of the key challenges of Microservices has been to define the granularity of a Microservice, differentiating it from traditional SOA. And it seems we have now come up with a definition: <a href="http://qconlondon.com/system/files/presentation-slides/Microservices.pdf" target="_blank">"its complexity fits one's head"</a>.<br />
<br />
What? This to me is a non-definition and, in any case, a poor definition (sorry Dan). After all, there is nothing more subjective than what fits one's head, is there? And whose head, by the way? If it is mine, I cannot keep track of what I ate for breakfast and lunch at the same time (if you know me personally, you must have noticed my small head), and then we get those giants who can master several disciplines or understand the whole of an uber-domain.<br />
<br />
One of the key properties of a good definition is that it is tangible, unambiguous and objectively prescriptive. Jeff Bezos did not pick pizzas because he was necessarily a pizza lover; he picked them because they make the definition of Amazon team sizes tangible.<br />
<br />
<b>In the absence of any tangible definition, I am going to bring my own</b> - why not? This is really how I feel the granularity of an MS should be defined, having built one or two, and I am using tangible metrics to define it.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheYjCoZxvl6EYF0D94XqD1XQZOf95Sr-2cxVsQIGxdbr_CdDkldtt0pKnV_My5IwQIwOjUPQvki4dFHoW46WpEGee8KSNhGi7Bm8RyJ8f7wHqsIf3FSjiCW2oZqzVP1OPatV4DLoutKvLg/s1600/ms.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheYjCoZxvl6EYF0D94XqD1XQZOf95Sr-2cxVsQIGxdbr_CdDkldtt0pKnV_My5IwQIwOjUPQvki4dFHoW46WpEGee8KSNhGi7Bm8RyJ8f7wHqsIf3FSjiCW2oZqzVP1OPatV4DLoutKvLg/s1600/ms.png" height="120" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Granularity of Microservices - my definition</td></tr>
</tbody></table>
<br />
As is evident, the cross-cutting concerns of a Microservice are numerous: from security, availability and performance to routing, versioning, discovery, logging and monitoring. For a lot of these concerns, you can rely on the existing platform or on common platform-wide guidelines, tools and infrastructure. So the <b>crux of the sizing of a Microservice is its core business functionality</b>; with regard to non-functional requirements, it shares the same concerns as traditional services.<br />
<br />
<h3 id="when-not-to-microservice">
When not to Microservice</h3>
<a href="http://qconlondon.com/speakers/yoni-goldberg">Yoni Goldberg</a> from Gilt covered this subject to some level. He basically said do not start with Microservice, build them when your domain complexity warrants it. He went through his own experience and how they improved upon the ball of mud to nice discreet service and then how they exploded the number of services when their<br />
So takeaways (with some personal salt and pepper) I would say is do NOT consider Microservice if:<br />
<ul>
<li>you do not have the organisation structure (small cross functional teams)</li>
<li>you are not practising Devops, automated build and deployment</li>
<li>you do not have (or cannot have) an uber monitoring system telling you exactly what is happening</li>
<li>you have to carry along a legacy database</li>
<li>your domain is not too big</li>
</ul>
<h3 id="microservices-is-an-evolutionary-process">
Microservices is an evolutionary process</h3>
<a href="https://www.linkedin.com/in/randyshoup">Randy Shoup</a> explained how the process towards Microservice has been an evolutionary one, usually starting with the Monolith. So he stressed <a href="http://qconlondon.com/system/files/presentation-slides/QConLondon2015-ServiceArchitecturesAtScale.pdf" target="_blank">"Evolution, not intelligent design"</a> and how in such an environment, Governance (oh yeah, all ye Enterprise Architects listen up) is not the same as traditional SOA and is decentralised with its adoption purely based on how useful a practice/ is.<br />
<br />
<h3 id="microservices-is-an-evolutionary-process">
Optimised message protocols now a must</h3>
<div>
Touched on in more than a couple of talks, moving to ProtoBuf, Avro, Thrift or similar seems to be a must in all but trivial Microservice setups. One of the main performance challenges in MS scenarios is network latency and the cost of serialisation/deserialisation over and over across multiple hops, and <b>JSON simply does not cut it anymore</b>. </div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://devres.zoomquiet.io/data/20091111011019/chart" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://devres.zoomquiet.io/data/20091111011019/chart" height="362" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Source: Thrift vs Protobuf comparison (http://devres.zoomquiet.io/data/20091111011019/index.html)</td></tr>
</tbody></table>
<div>
Be ready to move your APIs to these message protocols - yes, you lose some simplicity benefits, but trading them off for performance is a necessary evil here. Rest assured nothing stops you from using JSON while developing and testing, but if your game is serious, start changing your protocols now - and I am too; the item is already added to the technical backlog.</div>
<div>
<br /></div>
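<div>
As a taster, here is a minimal sketch of what such a move could look like in C# with the protobuf-net library - the <code>Order</code> type and its field numbers are made up for illustration:</div>
<pre><code>using System.IO;
using ProtoBuf;   // protobuf-net NuGet package

[ProtoContract]
public class Order
{
    [ProtoMember(1)] public long Id { get; set; }
    [ProtoMember(2)] public string Customer { get; set; }
}

public static class Wire
{
    // Serialise to a compact binary payload instead of JSON text
    public static byte[] ToProtoBuf(Order order)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, order);
            return stream.ToArray();   // typically a fraction of the size of the JSON equivalent
        }
    }
}
</code></pre>
<div>
<br /></div>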
<div>
<h3 id="microservices-is-an-evolutionary-process">
What I was hoping to hear about and did not</h3>
</div>
Microservice registry and versioning best practices were not mentioned at all. I tried to quiz a few speakers on these but did not quite get a good answer. I suppose the space is up for grabs.<br />
<br />
<h2 id="need-for-composition-services-apis">
Need for Composition Services/APIs</h2>
From personal experience, in an MS environment you end up with two different types of services: <strong>Functional Microservices</strong>, which own their data and are the authority in their business domain, and <strong>Composition APIs</strong>, which do not normally own any data and bring value by composing data from several other services - normally involving some level of business logic affecting the end user. In DDD terms, you could find some similarity with Facade services, and Yoni used the term "mid-tier services".<br />
<br />
Composition services can bring a lot of value when it comes to caching, pagination of data and enriching the information. They practically scatter the requests, gather the results back and compose the response - Fan-out is another term used here, as the sketch below illustrates.<br />
<br />
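A minimal sketch of that scatter/gather shape in C# - the downstream URLs and the <code>ProductPage</code> type are hypothetical:<br />
<pre><code>using System.Net.Http;
using System.Threading.Tasks;

public class ProductPageComposer
{
    private static readonly HttpClient Http = new HttpClient();

    // Scatter calls to the functional Microservices, then gather and compose the response
    public async Task&lt;ProductPage> ComposeAsync(string productId)
    {
        var details = Http.GetStringAsync("http://product-api/products/" + productId);
        var price   = Http.GetStringAsync("http://pricing-api/prices/" + productId);
        var stock   = Http.GetStringAsync("http://stock-api/stock/" + productId);

        await Task.WhenAll(details, price, stock);   // fan-out completes here

        return new ProductPage { Details = details.Result, Price = price.Result, Stock = stock.Result };
    }
}

public class ProductPage
{
    public string Details { get; set; }
    public string Price { get; set; }
    public string Stock { get; set; }
}
</code></pre>
<br />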
By inherently depending on many services, they are notoriously susceptible to <strong>performance outliers</strong> (to be discussed in the second post) and failure scenarios, which might warrant a layered cache backed by soft storage with a higher expiry, as a fallback in case a dependent service is down.<br />
<br />
In the next post, we will look into the topics below. We will discover why Docker in fact is closely related to Microservices - and it is not what you think! [Do I qualify now to become a BusinessInsider journalist?]<br />
<ul>
<li>Those pesky performance outliers</li>
<li>Containers, containers</li>
<li>Don't beat the dead Agile</li>
<li>Extra large memory computing is now a thing</li>
</ul>
aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com3tag:blogger.com,1999:blog-2889416825250254881.post-82707114750190730002015-01-01T16:54:00.002+00:002017-06-28T10:12:16.053+01:00Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
<script type="text/javascript">
SyntaxHighlighter.all();
</script>
Level [<a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">C3</a>]<br />
<br />
[<i>Disclaimer: I am by no means a Scientific Programmer but I am striving to become one</i>] It is the turn of yet another year and the time is ripe for last year's reviews, predictions for the new year and its resolutions. Last year I <a href="http://byterot.blogspot.co.uk/2013/12/thank-you-microsoft-and-so-long.html" target="_blank">made</a> some bold statements and made some radical decisions to start transitioning. I picked up a Mac, learnt some Python and Bash and, a year on, I think it was good and I really enjoyed it. Still (as I predicted), I spent most of my time writing C#. [working on a <a href="http://www.infoq.com/articles/reactive-cloud-actors" target="_blank">Reactive Cloud Actor</a> micro-<a href="https://github.com/aliostad/BeeHive" target="_blank">Framework</a>, in case for any reason it interests you]. Now a year on, Microsoft is a different company: new CEO, moving towards Open Source and embracing non-Windows operating systems. So how this is going to shift the innovation imbalance is a wait-and-see. But anyway, that was last year and is behind us.<br />
<br />
Now let's talk about 2015. And perhaps programming in general. Are you sick of hearing Big Data buzzwords? Do you believe Data Science is a pile of mumbo jumbo to bamboozle us, actually used by a teeny tiny number of companies and producing value in even fewer? That IoT is just another hype? I hope that by the end of this post I will have answered you. <i>Sorry, no TL;DR</i><br />
<br />
<div style="text-align: center;">
* * *</div>
<br />
It was a warm, sunny and all around really nice day in June. The year is 2007 and I am on a University day trip (and punting) to Cambridge along with my classmates, many of whom are at least 15 years younger than me. Punting is fun but, as a part-time student, this is one of the few times I have leisurely access to our Image Processing lecturer - a bright and young guy - again younger than me. And I open the discussion with <i>how we have not moved much since the 80s in the field of Artificial Intelligence</i>. We improve and optimise algorithms but there is no game-changing giant leap. And he argues the state of the art usually improves little by little.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGRcdRs1fcckyOlAz6au5TFOmxA_TRyKz2xbs0aIC6nvkmMCSCcY8U7zJ8ilHJGYwG28GogTb6mPCVmDLD_z7QokVPupUcwFZO1MqwKOX2Kk43YDQhPIUcorN0aFN_QmLPGJez9BupSjZN/s1600/229193_4455157222_8368_n.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGRcdRs1fcckyOlAz6au5TFOmxA_TRyKz2xbs0aIC6nvkmMCSCcY8U7zJ8ilHJGYwG28GogTb6mPCVmDLD_z7QokVPupUcwFZO1MqwKOX2Kk43YDQhPIUcorN0aFN_QmLPGJez9BupSjZN/s1600/229193_4455157222_8368_n.jpg" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
"Day out punting in cambridge"</div>
<br />
Next year, <a href="http://link.springer.com/article/10.1007/s00138-010-0289-5" target="_blank">we work on a project</a> involving some machine learning to recognise road markings. I spend a lot of time on feature extraction and use a 2 layer Neural Network since I get the best result out of it compared to 3. I am told not to use many layers of neurons as it usually gets stuck on a local minima during training - I actually tried and saw it. Overall the result was <i>OK</i> but it involved many pre- and post- processing techniques to achieve acceptable recognition.<br />
<br />
<div style="text-align: center;">
* * *</div>
<div>
<br /></div>
I wake up and it is 2014. Many Universities, research organisations (and companies) across the world have successfully implemented Deep Learning using Deep Neural Networks - which have many layers of neurons. <a href="http://www.newscientist.com/article/dn26691-optical-illusions-fool-computers-into-seeing-things.html#.VKS0EYqsXxi" target="_blank">Watson</a> answers all the questions in Double Jeopardy. Object Recognition from images is almost a solved problem - with essentially no manual feature extraction.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://www.ais.uni-bonn.de/deep_learning/images/Convolutional_NN.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://www.ais.uni-bonn.de/deep_learning/images/Convolutional_NN.jpg" height="324" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A Deep Neural Network</td></tr>
</tbody></table>
Perhaps my lecturer was right: with improved training algorithms and a lot of labelled data, we suddenly have a big leap in science (or was I right?!). It seems that for the first time implementation has got ahead of the mathematics: we do not fully understand why Deep Learning works - but it works. And <a href="http://www.newscientist.com/article/dn26691-optical-illusions-fool-computers-into-seeing-things.html#.VKS0EYqsXxi" target="_blank">when these networks fail</a>, we still don't know why they fail.<br />
<br />
And guess what, industry and the academia have not been this close for a long time.<br />
<br />
And what has all this got to do with us? Rise of the machine intelligence is going to change programming. Forever.<br />
<br />
<div style="text-align: center;">
* * *</div>
<div>
<br /></div>
Honestly, I am sick of the amount of bickering and fanboyism that goes on today in the programming world. The culture of "nah... I don't like this" or "ahhh... that is s..t" or "ah that is a killer" is what has plagued our community. One day Angular is super hot; next week it is the worst thing. Be it zsh vs. Bash. Be it vim vs. Emacs vs. Sublime Text vs. Visual Studio. Be it Ruby, Node.js, Scala, Java, C#, you name it. And the same goes for technologies such as MongoDB, Redis... subjectivism instead of facts. As if we forgot we came from a line of scientists.<br />
<br />
Like children we get attached to new toys and, with the attention span of a goldfish, instead of solving real-world problems we ruminate on how we can improve our coding experience. We are ninjas and what we do no one can do. And we can do whatever we want to do.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i2.cdnds.net/12/39/618x419/screen-shot-2012-09-26-at-150053.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://i2.cdnds.net/12/39/618x419/screen-shot-2012-09-26-at-150053.jpg" height="270" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">"I have got power"</td></tr>
</tbody></table>
<br />
Yes, we are lucky. A 23-year-old kid with a couple of years of programming experience can earn double what a 45-year-old retail manager with 20 years of experience earns annually. And what do we do with that money? Spend all of it on booze, speciality burgers, travelling and conferences, gadgets - basically whatever we want to.<br />
<br />
But those who remember the first .com crash can tell you it has not always been like this. In fact, back in 2001-2002 it was really hard to get a job. And the problem was, there were many really good candidates. The IT industry became almost impenetrable since there was this catch-22 of requiring job experience to get job experience. But anyway, the good ones, the stubborn ones and those with little talent but a lot of passion (that includes me) stayed on for the good days that we have now. The reality was that many programmers of the time had read "Access in 24 hours" and landed a fat salary in a big company. And on the other hand, projects were failing since we spent most of our time writing documentation. The industry had to weed out bad coders and inefficient practices.<br />
<br />
And so we got the software craftsmanship movement and agile practices.<br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: Times; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
</div>
<br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: Times; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<div style="margin: 0px;">
<div style="text-align: center;">
* * *</div>
</div>
</div>
<div style="text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
The opposition has already started. You might have seen the <a href="https://www.youtube.com/watch?v=z9quxZsLcfo" target="_blank">discussions</a> <a href="https://twitter.com/dhh" target="_blank">DHH</a> has had with Kent Beck and Martin Fowler on TDD. I do not agree 100% with what Erik Meijer says <a href="https://vimeo.com/110554082" target="_blank">here</a> (only 90%) but there is a lot of truth in it. We have replaced a <i>fact-based</i>, <i>data-backed</i> attitude with a <i>faith-based</i>, wishy-washy, peace-hug-freedom hippie agile way, forcing us mechanically to follow some steps and believe that it will be good for us. Agile has taken us a long way from where we started at the turn of the century, but there are problems. From personal experience, I see no difference in the quality of developers who do TDD and those who do not. And to be frank, I actually see a negative effect: people who do TDD do not fully think hard about the consequences of the code they write - I know this could be inflammatory but, hand on heart, that is my experience. I think TDD and agile have given us a <b>safety net</b> such that, like a tightrope walker, instead of focusing on our walking technique we improve the safety net. As long as we <b>go through the motions</b>, we are safe. Unit tests, coverage, planning poker, retrospectives, definition of done, stories, tasks, creating tickets, moving tickets. How many bad programmers have you seen who are masters of agile?<br />
<br />
You know what? It is the mediocrity we have been against all the time. Mediocre developers who in the first .com boom got into the market by taking a class or reading a book are back in a different shape: those who know how to be opinionated, look cool, play the game and take the paycheck. We are in another .com boom now, and if there is a crash, sadly they are out - <i>even if it includes me</i>.<br />
<br />
<a href="https://www.blogger.com/"></a><span id="goog_1921488442"></span><span id="goog_1921488443"></span><br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: Times; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px;">
<div style="margin: 0px;">
<div style="text-align: center;">
* * *</div>
<div>
<br /></div>
</div>
</div>
I think we have neglected the scientific side of our jobs. Our maths is rusty and those who did study CompSci do not remember a lot of what they read. We cannot calculate the complexity of our code and fall into the trap of thinking that machines are fast now - yes, it didn't matter for a time, but what about when you are dealing with petabytes of data and paying by the processing hour? When our team first started working on recommendations, the naive implementation took 1000 nodes for 2 days; now the implementation uses 24 nodes for a few hours, and perhaps this is still way, way too much.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://andersdrachen.files.wordpress.com/2011/02/world-of-warcraft-brushes-13.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="400" data-original-width="350" height="400" src="https://andersdrachen.files.wordpress.com/2011/02/world-of-warcraft-brushes-13.jpg" width="350" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">"we are craftsmen and craftswomen" (from <a href="https://andersdrachen.com/2013/09/11/fun-facts-about-world-of-warcraft-character-names/" target="_blank">Anders Drachen</a>)</td></tr>
</tbody></table>
<br />
<br />
But really, since when did our job look like that of a craftsman (a carpenter)? We are Ninjas? And we do code Kata to keep our skills/swords sharp? This has all gone too far into the world of fantasy. The world of warcraft. This is now a full-blown New Age religion.<br />
<br />
What an utter rubbish.<br />
<br />
<div style="text-align: center;">
* * *</div>
<div>
<br /></div>
<div>
Now back on earth, languages of the 90s and early 2000s are on the decline. Java, C#, C++ are all on the decline. But they are being replaced by other languages such as Scala, right? I leave that to you to decide based on the diagram below. </div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCW1oiNf9IL_9OEd6T6oACJynOATqvu0ecykX2jwrOnod5eAszBkGm2hMcocYlkVD6QIhMbxFzCMVtyOZ0rruFyC-1XhIMeZfvtX9k4KEXVX9hjBUPGl9dCa1MWL7Cgps_RpcJEY81fIuP/s1600/Languages.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="204" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCW1oiNf9IL_9OEd6T6oACJynOATqvu0ecykX2jwrOnod5eAszBkGm2hMcocYlkVD6QIhMbxFzCMVtyOZ0rruFyC-1XhIMeZfvtX9k4KEXVX9hjBUPGl9dCa1MWL7Cgps_RpcJEY81fIuP/s1600/Languages.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Google trends of "Java", "Scala", "C#" and "Python Programming" (so that it does not get mixed up with Python the <i>snake</i>) - source: google</td></tr>
</tbody></table>
<div>
The only counter-trend is Python. The recent rise in Python popularity is what I call the "rise of the scientific programmer" - and that is just one of the signs. Python is a very popular language in the academic space. It is easy to pick up, works everywhere and has some functional aspects that make it terse. But that is not all: it sits on top of a huge wealth of scientific libraries and it can talk to Java and C as well. Industry innovations have started to come straight from the Universities. From the early 2000s, when academia seemed completely irrelevant, to now, when it leads the innovation. <a href="https://spark.apache.org/docs/0.9.0/python-programming-guide.html" target="_blank">PySpark</a> has come straight from the heart of Berkeley. Many of the contributors to Hadoop code and its wide ecosystem are in academia.</div>
<div>
<br /></div>
<div>
We are now in need of people who can argue scientifically about algorithms and data (is coding anything but code+data?) and who can implement an algorithm given the paper or its mathematical notation. And guess what, this is the trend for jobs mentioning "Machine Learning":</div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3R2noeMbMb-_oBbq7JKoCU6qGR6wvDNkv614KNn_ShsvxS300hrl3Kp6jOSipp0uOMD_AUdlKQs-f4VYURjSeOCGlu-qGGZ2bXoXSg2yU3-Opx0xAY1SoU043tD-7vL4ExQwXZd9z7pvn/s1600/ml.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="268" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3R2noeMbMb-_oBbq7JKoCU6qGR6wvDNkv614KNn_ShsvxS300hrl3Kp6jOSipp0uOMD_AUdlKQs-f4VYURjSeOCGlu-qGGZ2bXoXSg2yU3-Opx0xAY1SoU043tD-7vL4ExQwXZd9z7pvn/s1600/ml.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Trend of jobs containing "<b>Machine Learning</b>" - Source: <a href="http://www.itjobswatch.co.uk/jobs/uk/machine%20learning.do" target="_blank">ITJobsWatch</a></td></tr>
</tbody></table>
<div>
<br /></div>
<div>
And this is really not just Hadoop. According to the source <a href="http://www.itjobswatch.co.uk/jobs/uk/machine%20learning.do" target="_blank">above</a>, Machine Learning jobs had a 41% rise from 2013 to 2014 while <a href="http://www.itjobswatch.co.uk/jobs/london/hadoop.do" target="_blank">Hadoop</a> jobs had only 16%.</div>
<br />
This Deep Learning thing is real. It is already here. All those existing algorithms need to be polished and integrated with the new concepts, and some will simply be replaced. If you can feed a person's interactions with a site to a deep network, it can predict with high confidence whether they are going to buy, leave or remain undecided. It can find patterns in diseases that we as humans cannot. This is what we were waiting for (and were afraid of?). <i>Machine intelligence is here</i>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdERBESF9G6DnOBReQ_lfZZNf8zEr8eHqXNIfCaZigi3oseEXhLSC8Z65gWxR-EO24o7Q0rttWF_-s4IsiQyz_dvqS4ZSypR2lAZNTU5G-HLwd-INC9imYS-nxjCUJqJSDL9xWI3S4tQ-L/s1600/sci-prg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="364" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdERBESF9G6DnOBReQ_lfZZNf8zEr8eHqXNIfCaZigi3oseEXhLSC8Z65gWxR-EO24o7Q0rttWF_-s4IsiQyz_dvqS4ZSypR2lAZNTU5G-HLwd-INC9imYS-nxjCUJqJSDL9xWI3S4tQ-L/s1600/sci-prg.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
The Scientific Programmer [And yes, they have to know more]</div>
<br />
<br />
Now one might say that the answer is the Data Scientists. True. But first, we don't have enough of them and second, based on first-hand experience, we need people with engineering rigour to produce production-ready software - something that certainly some Data Scientists have, but not all. So I feel that a programmer turned statistician can build more robust software than the other way around. We need people who understand what it takes to build software that you can put in front of millions of customers. People who understand linear scalability, SLAs, monitoring and architectural constraints.<br />
<br />
<div style="text-align: center;">
* * *</div>
<div>
<br /></div>
Horizon is shifting.<br />
<br />
We can pick a new language (be it Go, Haskell, Julia, Rust, Elixir or Erlang) and start re-inventing the wheel, starting from pretty much the same scratch again because hey, this is easy now, we have done it before and don't have to think. We can pick a <a href="http://www.asp.net/vnext" target="_blank">new, albeit cleaner, abstraction</a> and re-implement thousands of hours of hard work and sweat we and the community have put in - since hey, we can. We can rewrite the same HTTP pipeline in 1000s of different ways and never be happy with what we have achieved, be it Ruby on Rails, Sinatra, Nancy, ASP.NET Web API, Flask, etc. And stay happy that we are striving for that perfection, that unicorn. We can argue about how to version APIs and how one service is so RESTful and another is not. We can mull over the pettiest of things such as a <a href="https://github.com/twbs/bootstrap/issues/3057" target="_blank">semicolon</a> or the gender of a <a href="https://github.com/joyent/libuv/pull/1015" target="_blank">pronoun</a> and let insanely clever people leave our community. We can exchange the worst of words over "females in the industry" while we are more or less saying the same thing. Too much drama.<br />
<br />
But soon this will be no good. Not good enough. We have got to grow up and go back to school, relearn all about maths, statistics and scientific reasoning in general. We need to man up and re-learn that being a good coder has nothing to do with the number of stickers you have on the back of your Mac. It is all scientific - <b>we come from a long line of scientists, we have got to live up to our heritage.</b><br />
<br />
We need to go and build novelties for the second half of the decade. This is what I hope to be able to do.aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com45tag:blogger.com,1999:blog-2889416825250254881.post-48007767316018442122014-11-29T12:06:00.001+00:002014-11-30T09:00:25.951+00:00Health Endpoint in API Design: slippery slope that it isLevel [<a href="http://byterot.blogspot.co.uk/2012/03/post-level-description.html" target="_blank">C3</a>]<br />
<br />
<script src="http://softxnet.co.uk/sh/js/shcore.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushjscript.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/js/shbrushcsharp.js" type="text/javascript">
</script>
<script src="http://softxnet.co.uk/sh/_ga.js" type="text/javascript">
</script>
A Health Endpoint is a common <em>practice</em> in building APIs. Such an endpoint, unlike other resources of a REST API, instead of achieving a business activity, returns the status of the service and, while it can gather and return some data, it is the HTTP status that defines whether the service is "Up or Down". These endpoints commonly go and check a bunch of configurations and connectivity with the dependent services, and even make a few calls for a "Test Customer" to make sure business activity can be achieved.<br />
<br />
There is something above that just doesn't feel right to me - and this post is an exercise in defining what I mean by that. I will explain what the problems with the <em>Health API</em> are and I am going to suggest how to "fix" it.<br />
<br />
What is the health of an API anyway? The server up and running and capable of returning the status 200? Server and all its dependencies running and returning 200? Server and all its dependencies running, capable of returning 200 in a reasonable amount of time? API able to accomplish some business activity? Or API able to accomplish a <em>certain</em> activity for a <em>test user</em>? API able to accomplish all activities within <em>reasonable time</em>? API able to accomplish all activities with its 95th percentile falling within an agreed SLA?<br />
<br />
A <a href="http://byterot.blogspot.co.uk/2012/11/client-server-domain-separation-csds-rest.html">Service</a> is a complex beast. While its complexity would be nowhere near a living organism, it is useful to draw a parallel with a living organism. I remember from my previous medical life that the definition of health - provided by none other than <a href="http://who.int/about/definition/en/print.html">WHO</a> - would go like this:<br />
<blockquote>
"Health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity."</blockquote>
In other words, defining the health of an organism is a complex and involved process requiring a deep understanding of the organism and how it functions. [Well, we are lucky that we are only dealing with distributed systems and their services (or MicroServices if you like) and not living organisms.] For services, instead of health, we define the <strong>Quality of Service as a quantitative measure of a service's health</strong>.<br />
<br />
Quality of Service is normally a bunch of orthogonal SLAs, each defining a measurement for one aspect of the service. In terms of monitoring, <em>Availability</em> of a service is the most important aspect to gauge and closely follow. Availability of the service cannot simply be measured by the amount of time the servers dedicated to a service have been up. Apart from being <strong>reachable</strong>, the service needs to respond within an acceptable time (<strong>Low Latency</strong>) and has to be able to achieve its business activity (<strong>Functional</strong>) - there is no point in a server being reachable if it returns a 503 error within milliseconds. So the number of error responses (as a deviation from the baseline, which can include normal validation and business rule errors) also comes into play.<br />
<br />
So the question is: how can we expose an endpoint <em>inside</em> a service that aggregates all the above facets and reports the health of the service? The simple answer is we <strong>cannot</strong> - and should not commit ourselves to doing it. Why? Let's take some simple help from algebra.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRe1fUuFFMWatTvHik4DookdUuIlhjrhdR2YBI9HeNJb3ZobJPrD9qEXPVb6lBOWe-PMThyKd49WSTS5CnnaBMstuNZ5iqilTZKp52jXpCboOUTe_iUkCNyePaYQlWJpXehxdwQtG-FDIv/s1600/canary+function.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRe1fUuFFMWatTvHik4DookdUuIlhjrhdR2YBI9HeNJb3ZobJPrD9qEXPVb6lBOWe-PMThyKd49WSTS5CnnaBMstuNZ5iqilTZKp52jXpCboOUTe_iUkCNyePaYQlWJpXehxdwQtG-FDIv/s1600/canary+function.png" height="416" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">API/Service maps an input domain to an output domain (codomain). Also availability is a function of the output domain.</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
A service (<code>f</code>) is basically <strong>a function that maps the input domain (<code>I</code>) to an output domain (<code>O</code>).</strong> So:<br />
<pre><code>O = f(I)
</code></pre>
The output domain is a set of all possible responses with their status codes and latencies. Availability (<code>A</code>) is a function (<code>a</code>) of the output domain since it has to aggregate errors, latencies, etc:<br />
<pre><code>A = a(O)
</code></pre>
So in other words:<br />
<pre><code>A = a(f(I))
</code></pre>
So <code>A</code> cannot be measured without <code>I</code> - which for a real service is a very large set. And it also needs all of <code>f</code> - not your subset bypass-authentication-use-test-customer method.<br />
<br />
So one approach is to sit outside the service and only deal with the output domain, in a sort of proxy or by monitoring server logs. Netflix have done a ton of work on this and have open sourced it as <a href="https://github.com/Netflix/Hystrix" target="_blank">Hystrix</a>, and no wonder I have <i>not</i> heard anything about the magical Health Endpoint in there (there is an alternative endpoint, which I will explain later). But if you want to do it within the service you need the whole input domain, and not just your "Test Customer", to make assertions about the health of your service. And this kind of assertion is <strong>not just wrong, it is dangerous</strong>, as I am going to explain.<br />
<br />
First of all, gradually - especially as far as the ops are concerned - that green line on the dashboard that checks your endpoint <em>becomes your availability</em>. People get used to trusting it, and when things go wrong out there and customers jump and shout, you will not believe it for quite a while because your eye sees that green line and trusts it.<br />
<br />
And guess what happens when you have such an incident? There will be a post-mortem meeting, all the suits-and-ties will be there, they will identify the root cause as the faulty health check and you will be asked to go back and fix your Health Check endpoint. And then you start building more and more complexity into your endpoint. Your endpoint gets to know about each and every dependency, all their intricacies. And before you know it, you could end up building a complete application beside your main service. And you know what, you have to do it for each and every service, as they are all different.<br />
<br />
So don't do it. Don't commit yourself to what you cannot achieve.<br />
<br />
So is there no point in having a simplistic endpoint which tells us basic information about the status of the service? Of course there is. Such information is useful and many load balancers or web proxies require such an endpoint.<br />
<br />
But first we need to make absolutely clear what the responsibility of such an endpoint is.<br />
<br />
<h3 id="canary-endpoint">
Canary Endpoint</h3>
A canary endpoint (the name is courtesy of <a href="https://www.blogger.com/!!" target="_blank">Jamie Beaumont</a>) is a simplistic endpoint which gathers <strong>connectivity status and latency</strong> of all dependencies of a service. It absolutely does not trigger any business activity, there is no "Test Customer" of any kind and is not a "Health Endpoint". If it is green, it does not mean your service is <em>available</em>. But if it is red (<em>your canary is <a href="https://www.youtube.com/watch?v=4vuW6tQ0218" target="_blank">dead</a></em>) then you definitely have a problem.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbsoRvXFW4cP61fzxPR6Pks8OsCX06bbvyayU7GrXi8g7Z-I3tDZ34y6l4nbwTeuI3OQiDAzZDSZMg6FtEsk2roagtT13kLklf_upTItDBPDOZrGoeLNKak0T8k8NDorPhQ_m8F_LHgiQB/s1600/canarydead.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbsoRvXFW4cP61fzxPR6Pks8OsCX06bbvyayU7GrXi8g7Z-I3tDZ34y6l4nbwTeuI3OQiDAzZDSZMg6FtEsk2roagtT13kLklf_upTItDBPDOZrGoeLNKak0T8k8NDorPhQ_m8F_LHgiQB/s1600/canarydead.png" height="266" width="400" /></a></div>
<br />
<br />
So how does a canary endpoint work? It basically checks connectivity with its immediate dependencies - including but not limited to:<br />
<ul>
<li><em>External services</em></li>
<li>SQL Databases</li>
<li>NoSQL Stores</li>
<li>External distributed caches</li>
<li>Service brokers (RabbitMQ, Azure Service Bus)</li>
</ul>
A <em>canary result</em> contains the name of the dependency, the latency and the status code. If any of the results has a non-success code, the endpoint returns a non-success code. The status code returned is used by simple callers such as load balancers. Also, in all cases, we return a payload which is the aggregated canary result. Such results can be used to feed various charts and draw heuristics about how significant the variability of the latencies is.<br />
<br />
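A minimal sketch of the shape of such a result and its aggregation - the type and property names here are mine, not a prescribed contract:<br />
<pre><code>using System.Collections.Generic;
using System.Linq;

public class CanaryResult
{
    public string DependencyName { get; set; }      // e.g. "OrderDb" or "DistributedCache"
    public int StatusCode { get; set; }             // 200 when the check succeeds, 503 when it fails
    public double LatencyMilliseconds { get; set; }
}

public class AggregatedCanaryResult
{
    public List&lt;CanaryResult> Results { get; set; }

    // The endpoint returns a non-success status if any single dependency check failed
    public int OverallStatusCode
    {
        get { return Results.All(r => r.StatusCode &lt; 400) ? 200 : 503; }
    }
}
</code></pre>
<br />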
You probably noticed that External Services appear in <i>italics</i>, i.e. they are a bit different. The reason is that if an external service has a canary endpoint itself, instead of just a connectivity check we call its canary endpoint and add its aggregated result to the result we are returning. So usually the entry-point API will generate a cascade of canary chirps that will tell us how things are.<br />
<br />
Implementation of the connectivity check generally depends on the underlying technology. For a Cache service, it suffices to <span style="font-family: Courier New, Courier, monospace;">Set</span> a constant value and see it succeed. For a SQL Database, a <span style="font-family: Courier New, Courier, monospace;">SELECT 1;</span> query is all that is needed. For an Azure Storage account, it would be enough to connect and get the list of tables. The point here is that none of these is anywhere near a business activity, so you could not - in the remotest sense - think that their success means your business is up and running.<br />
<br />
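For instance, a minimal sketch of a SQL connectivity check along these lines, reusing the <code>CanaryResult</code> type sketched above (connection string handling is left out):<br />
<pre><code>using System;
using System.Data.SqlClient;
using System.Diagnostics;

public static class SqlCanaryCheck
{
    // Connectivity-only check: SELECT 1; - deliberately nowhere near a business activity
    public static CanaryResult Check(string connectionString)
    {
        var watch = Stopwatch.StartNew();
        var result = new CanaryResult { DependencyName = "SQL Database", StatusCode = 200 };
        try
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("SELECT 1;", connection))
            {
                connection.Open();
                command.ExecuteScalar();
            }
        }
        catch (Exception)
        {
            result.StatusCode = 503;   // the canary is dead
        }
        result.LatencyMilliseconds = watch.Elapsed.TotalMilliseconds;
        return result;
    }
}
</code></pre>
<br />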
So there you have it. Don't do health endpoints, <b>do canary</b> instead.<br />
<br />
<h3>
Canary Endpoint implementation</h3>
A canary endpoint normally gets implemented as an HTTP GET call which returns a collection of connectivity check metrics. You can abstract the logic of checking various dependencies into a library and allow API developers to implement the endpoint by just declaring the dependencies.<br />
<br />
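A minimal sketch of what declaring the dependencies could look like in an ASP.NET Web API controller - this is just the shape such a library could take, and the check helpers and names are hypothetical:<br />
<pre><code>using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Web.Http;

public class CanaryController : ApiController
{
    // API developers just declare the dependency checks; the shared library does the rest
    private static readonly List&lt;Func&lt;CanaryResult>> Checks = new List&lt;Func&lt;CanaryResult>>
    {
        () => SqlCanaryCheck.Check("..."),          // connection string elided
        // () => CacheCanaryCheck.Check("..."),     // other declared dependencies go here
    };

    [HttpGet]
    public HttpResponseMessage Get()
    {
        var results = Checks.Select(check => check()).ToList();
        var status = results.All(r => r.StatusCode &lt; 400)
            ? HttpStatusCode.OK
            : HttpStatusCode.ServiceUnavailable;

        return Request.CreateResponse(status, results);   // payload is the aggregated canary result
    }
}
</code></pre>
<br />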
We are currently working on an implementation in ASOS (C# and ASP.NET Web API) and there is a possibility of open sourcing it.<br />
<br />
<h3>
Security of the Canary Endpoint</h3>
<div>
I am in favour of securing the Canary Endpoint with a constant API key - normally under SSL. This does not provide the highest level of security, but it is enough to make it much more difficult to break into. At the end of the day, a canary endpoint lists all the internal dependencies, components and potentially technologies of a system, which can be used by hackers to target those components.</div>
<div>
<br /></div>
<h3>
Performance impact of Canary Endpoint</h3>
<div>
Since a canary endpoint does not trigger any business activity, its performance footprint should be minimal. However, since calling a canary endpoint generates a cascade of calls, it might not be wise to iterate through all canary endpoints and call them every few seconds: deeper canary endpoints in a highly layered architecture get called multiple times in each round. </div>
<div>
<br /></div>
<div>
<br /></div>
<script type="text/javascript">
SyntaxHighlighter.all();
</script>aliostadhttp://www.blogger.com/profile/05695786967974402749noreply@blogger.com9