Thursday, 22 May 2014

BeeHive Series - Part 1 - Getting started with BeeHive

[Level T1]

I feel that BeeHive is the most important Open Source project I have been involved in so far. Not because it is my latest project, but because I feel the potential is huge.

The InfoQ article that came out on Monday is pretty much a theoretical braindump on the Reactive Actor Model. I had expected it to stir up a lot of controversy, as the claims I am making are pretty big - this has not happened yet. Maybe the text does not flow well, or it is just early. In any case, the idea is clear: stick to Processor Actors and build a web of loosely connected events to fulfil a system's business requirements while maintaining fluid evolvability. However, I fear the article was perhaps too dry and long and did not fully demonstrate the potential.

Hence I have set out to start a series on BeeHive and show the idea in some tangible scenarios - "Show me the codz" - with the code easily accessible on GitHub. So now let's roll on.

BeeHive

BeeHive makes it ridiculously simple (and not necessarily easy) to build decoupled actors that together achieve a business goal. By using topic-based subscription, you can easily add actors that feed on an existing event and do something extra.
Here we will build such a system with a few lines of code. You can find the full solution in the samples folder in the source. The code below is mainly snippets (e.g. it does not include the Dispose() methods, to remove clutter).

Scenario 1: News Feed Keyword Notification

Let's imagine you are interested in all breaking news from one or several news feeds and would like to be notified when a certain keyword occurs in the news. In this case we design a series of reactive actors that achieve this while allowing other functionality to be built on top of the existing topic-based queues. We use Windows Azure for this example. In order to achieve this we need to regularly (e.g. every 5 minutes) check the news feed (RSS, ATOM, etc.), examine newly arrived items, look for the keyword(s) of interest and then perhaps tweet, send an email or an SMS text.

Pulsers: Activities on regular intervals

BeeHive works on a reactive event-based model. Instead of building components that regularly do some work, we can have generic components that regularly fire an event. These events then can be subscribed to by one or more actors to fire off the processing by a chain of decoupled actors.
BeeHive Pulsers do exactly that. At regular intervals, they are woken up to fire off their events. The simplest pulsers are the assembly attribute ones:
[assembly: SimpleAutoPulserDescription("NewsPulse", 5 * 60)]
The code above sets up a pulser that every 5 minutes sends an event of type NewsPulse with an empty body.

Dissemination from a list: Fan-out pattern

Next we set up an actor to look up a list of feeds (one per line) and send an event for each feed. We have stored this list in blob storage, which is abstracted as IKeyValueStore in BeeHive.
[ActorDescription("NewsPulse-Capture")]
public class NewsPulseActor : IProcessorActor
{
  private IKeyValueStore _keyValueStore;
  public NewsPulseActor(IKeyValueStore keyValueStore)
  {
    _keyValueStore = keyValueStore;
  }
  public async Task<IEnumerable<Event>> ProcessAsync(Event evnt)
  {
    var events = new List<Event>();
    var blob = await _keyValueStore.GetAsync("newsFeeds.txt");
    var reader = new StreamReader(blob.Body);
    string line = string.Empty;
    while ((line = reader.ReadLine())!=null)
    {
      if (!string.IsNullOrEmpty(line))
      {
        events.Add(new Event(new NewsFeedPulsed(){Url = line}));
      }
    }
    return events;
  }
}
The ActorDescription attribute used above signifies that the actor will receive its events from the Capture subscription of the NewsPulse topic. We will be setting up all topics and subscriptions later using a single line of code.
So we are publishing a NewsFeedPulsed event for each news feed. When we construct an Event object using a typed event instance, EventType and QueueName will be set to the type of the object passed - which is the preferred approach for consistency.
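The event payload classes used throughout this scenario are simple POCOs. They are not shown in the snippets, so here is a minimal sketch with the property names inferred from the actor code in this post - the classes in the actual sample may differ slightly (SyndicationItem comes from System.ServiceModel.Syndication):

public class NewsPulse { } // empty body, raised by the pulser every 5 minutes

public class NewsFeedPulsed
{
    public string Url { get; set; } // a feed URL read from newsFeeds.txt
}

public class NewsItemCaptured
{
    public SyndicationItem Item { get; set; } // an item captured from the feed
}

public class NewsItemContainingKeywordIentified // name as used in the sample
{
    public SyndicationItem Item { get; set; }
    public string Keyword { get; set; }
}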

Feed Capture: another Fan-out

Now we will be consuming these events in the next actor in the chain. NewsFeedPulseActor will subscribe to the NewsFeedPulsed event and will use the URL to get the latest RSS feed and look for the latest news. To prevent duplicate notifications, we need to know the most recent item we checked last time. We will store this offset in storage. For this use case, we choose ICollectionStore<T>, whose Azure implementation uses Azure Table Storage.
[ActorDescription("NewsFeedPulsed-Capture")]
public class NewsFeedPulseActor : IProcessorActor
{
    private ICollectionStore<FeedChannel> _channelStore;
    public NewsFeedPulseActor(ICollectionStore<FeedChannel> channelStore)
    {
        _channelStore = channelStore;
    }
    public async Task<IEnumerable<Event>> ProcessAsync(Event evnt)
    {
      var newsFeedPulsed = evnt.GetBody<NewsFeedPulsed>();
      var client = new HttpClient();
      var stream = await client.GetStreamAsync(newsFeedPulsed.Url);
      var feedChannel = new FeedChannel(newsFeedPulsed.Url);
      var feed = SyndicationFeed.Load(XmlReader.Create(stream));
      var offset = DateTimeOffset.MinValue;
      if (await _channelStore.ExistsAsync(feedChannel.Id))
      {
        feedChannel = await _channelStore.GetAsync(feedChannel.Id);
        offset = feedChannel.LastOffset;
      }
      feedChannel.LastOffset = feed.Items.Max(x => x.PublishDate);
      await _channelStore.UpsertAsync(feedChannel);
      return feed.Items.OrderByDescending(x => x.PublishDate)
        .TakeWhile(y => offset < y.PublishDate)
        .Select(z => new Event(new NewsItemCaptured(){Item = z}));
    }
}
Here we read the URL from the event, capture the RSS and get the last offset from the storage. We then update the stored offset and send the feed items newer than the old offset back as events, for whoever is interested.
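The FeedChannel entity is also not shown above. A minimal sketch could look like the following, assuming the Id is derived deterministically from the feed URL so that the same feed always maps to the same stored record - the actual sample may derive it differently (MD5 and Encoding live in System.Security.Cryptography and System.Text):

public class FeedChannel
{
    public FeedChannel()
    {
    }

    public FeedChannel(string url)
    {
        Url = url;
        // assumption: a stable Guid derived from the URL (MD5 produces the 16 bytes a Guid needs)
        using (var md5 = MD5.Create())
        {
            Id = new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(url)));
        }
    }

    public Guid Id { get; set; }
    public string Url { get; set; }
    public DateTimeOffset LastOffset { get; set; }
}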

Keyword filtering and Notification

At this stage we need to subscribe to NewsItemCaptured and check the content for specific keywords. This is only one potential subscription out of many. For example, one actor could subscribe to the event to store the items for later retrieval, another to process them for trend analysis, etc.
For the sake of simplicity, let's hardcode the keyword (in this case "Ukraine"), but it could equally have been loaded from storage or a file - as we did with the list of feeds.
[ActorDescription("NewsItemCaptured-KeywordCheck")]
public class NewsItemKeywordActor : IProcessorActor
{
    private const string Keyword = "Ukraine";
    public async Task<IEnumerable<Event>> ProcessAsync(Event evnt)
    {
        var newsItemCaptured = evnt.GetBody<NewsItemCaptured>();
        if (newsItemCaptured.Item.Title.Text.ToLower()
            .IndexOf(Keyword.ToLower()) >= 0)
        {
            return new Event[]
            {
                new Event(new NewsItemContainingKeywordIentified()
                {
                    Item = newsItemCaptured.Item,
                    Keyword = Keyword
                }),
            };
        }
        return new Event[0];
    }
}
Now we can have several actors listening for NewsItemContainingKeywordIentified and sending different notifications; here we implement a simple Trace-based one:

[ActorDescription("NewsItemContainingKeywordIentified-TraceNotification")]
public class TraceNotificationActor : IProcessorActor
{
  public async Task<IEnumerable<Event>> ProcessAsync(Event evnt)
  {
    var keywordIentified = evnt.GetBody<NewsItemContainingKeywordIentified>();
    Trace.TraceInformation("Found {0} in {1}",
        keywordIentified.Keyword, keywordIentified.Item.Links[0].Uri);
    return new Event[0];
  }
}

Setting up the worker role

If you have an Azure account, you need a storage account, Azure Service Bus and a worker role (even an Extra Small instance would suffice). If not, you can use the development emulators, although for the Service Bus you need to use Service Bus for Windows Server. Just bear in mind that with the local emulators and Service Bus for Windows Server you have to use specific versions of the Azure SDK - the latest versions usually do not work.
We can get a list of the assembly pulsers with the code below:

_pulsers = Pulsers.FromAssembly(Assembly.GetExecutingAssembly())
    .ToList();

We also need to create an Orchestrator to set up the factory actors, and call SetupAsync() to set up all the topics and subscriptions.
Finally, we need to register our classes with the Dependency Injection framework.
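Putting it together, the worker role wiring looks roughly like the sketch below. Pulsers.FromAssembly and SetupAsync come from the description above; the orchestrator construction, the start-up calls and the DI registration are assumptions and will differ depending on your container and the BeeHive version (using directives for the BeeHive namespaces are omitted):

public class WorkerRole : RoleEntryPoint
{
    private Orchestrator _orchestrator;   // assumed type name, per the text above

    public override bool OnStart()
    {
        // 1. Register the actors (NewsPulseActor, NewsFeedPulseActor, ...), the stores
        //    (IKeyValueStore, ICollectionStore<T>) and the queue operator with your DI
        //    container so the factory actors can resolve them - container-specific, omitted.

        // 2. Discover the assembly-attribute pulsers, e.g. the NewsPulse one declared earlier.
        var pulsers = Pulsers.FromAssembly(Assembly.GetExecutingAssembly())
            .ToList();

        // 3. Create the orchestrator and let it create all topics and subscriptions derived
        //    from the ActorDescription attributes. CreateOrchestrator() is a hypothetical
        //    helper wiring the orchestrator to the queue operator and the DI-backed factory.
        _orchestrator = CreateOrchestrator();
        _orchestrator.SetupAsync().Wait();

        // 4. Start the orchestrator and the pulsers (start/stop calls not shown here).
        return base.OnStart();
    }
}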

Now we are ready!

After running the application in debug mode, here is what you see in the output window:
Found Ukraine
Obviously, we can add another actor to subscribe to this event to send emails, SMS messages, you name it. The point being, it is a piece of cake. Checking the links, they are indeed about Ukraine.

In the next post, we will discuss another scenario.

Saturday, 19 April 2014

CacheCow 0.5 Released

[Level T1]

OK, finally CacheCow 0.5 (actually 0.5.1, as 0.5 was released by mistake and pulled quickly) is out. So here is the list of changes, some of which are breaking. Breaking changes, however, are very easy to fix, and if you do not need the new features you can happily stay on the stable 0.4 versions. If you have moved to 0.5 and you see issues, please let me know ASAP. For more information, see my other post on the alpha release.

So here are the features:

Azure Caching for CacheCow Server and Client

Thanks to Ugo Lattanzi, we now have CacheCow storage in Azure Caching, for both client and server. Considering that calls to Azure Cache take around 1ms (based on some ad hoc tests I have done - do not quote this as a proper benchmark), this makes it a really good option if you have deployed your application in Azure. You just need to specify the cacheName, otherwise the "default" cache will be used.

Complete re-design of cache invalidation

I have now put some sense into cache invalidation. The point is that a strong ETag is generated for a particular representation of the resource, while cache invalidation happens on the resource, including all its representations. For example, if you send the application/xml representation of the resource, an ETag is generated for this particular representation. As such, the application/json representation will get its own ETag. However, on PUT both need to be invalidated. On the other hand, in case of a PUT or DELETE on a resource (let's say /api/customer/123) the collection resource (/api/customer) needs to be invalidated too, since the collection will be different.

But how would we find out whether a resource is a collection or a single item? I have implemented some logic that infers whether the item is a collection or not - this is based on common and conventional route design in ASP.NET Web API. If your routing is very custom, this will not work.

Another aspect is hierarchical routes. When a resource is invalidated, its parent resource will be invalidated as well. For example, in case of a PUT to /api/customer/123/order/456, the customer resource will change too - if orders are returned as part of the customer. So in this case, not only the order collection resource but also the customer needs to be invalidated. This is currently done in CacheCow.Server and will work for conventional routes.

Using MemoryCache for InMemory stores (both server and client)

Previously I had been using dictionaries for the InMemory stores. The problem with Dictionary is that it just grows until the system runs out of memory, while MemoryCache limits its use of memory when the system is under memory pressure.

Dependency of CachingHandler on HttpConfiguration

As I announced before, you need to pass the configuration to the CachingHandler on the server. This should be an easy change, but a breaking one.
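In code, the change is simply passing the HttpConfiguration instance into the constructor - the same two lines shown in the alpha announcement below:

var handler = new CachingHandler(config, entityTagStore);
config.MessageHandlers.Add(handler);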


Future roadmap and 0.6

I think we are now at a point where CacheCow requires a re-write for the features I want to introduce. One such feature is content-based ETag generation and cache invalidation. Content-based ETag generation is already there and can be used now, but without content-based invalidation it is not much use, especially since it in fact generates a weak ETag. Considering the changes required to achieve this, almost a re-write is needed.

Please keep the feedback coming. Thanks for using and supporting CacheCow!


Sunday, 13 April 2014

Reactive Cloud Actors: no-nonsense MicroServices

[Level C3]

This post is not directly about MicroServices. If that is why you are reading it, you might as well stop now. Apparently, we are still waiting for the definition to be finally ratified. The definition, as it stands now, is blurry - as Martin Fowler admits. This post is about Actors - the cloud ones, you know. By the time you finish reading it, I hope I have made it effortlessly clear how Reactive Cloud Actors are the real MicroServices, rather than the albeit lightweight RESTful Imperative MicroServices.

Watching Fred George deliver an excellent talk on a High-Performance Bus inspired me to start working on the actors. I am still working on a final article on the subject, but this post is basically a primer on that - as well as an announcement of the BeeHive mini-framework. The next section on actors is taken from that article, which covers the essential theoretical background. Before we start, let's make it clear that the term Reactive is not used in the context of Reactive Extensions (Rx) or similar frameworks, only in contrast to imperative (RPC-based) actors. Also, RPC-based is not used in contrast to RESTful; it simply means a system which relies on command and query messages rather than events.

UPDATE: The article is now published on InfoQ here.

Actors

Carl Hewitt, along with Peter Bishop and Richard Steiger, published an article back in 1973 that proposed a formalism that identified a single class of objects, i.e. Actors, as the building blocks of systems designed to implement Artificial Intelligence algorithms.

According to Hewitt an actor, in response to a message, can:
  1. Send a finite number of messages to other actors
  2. Create a finite number of other actors
  3. Decide on the behaviour to be used for the next message it receives
Any combination of these actions can occur concurrently and in response to messages arriving in any order - as such, there is no constraint with regard to ordering and an actor implementation must be able to handle messages arriving out of band. However, I believe it is best to separate these responsibilities as below.

Processor Actor

In a later description of the Actor Model, the first constraint is re-defined as "send a finite number of messages to the address of other actors". Addressing is an integral part of the model that decouples actors and limits the knowledge actors have of each other to a mere token (i.e. an address). Familiar implementations of addressing include Web Service endpoints, Publish/Subscribe queue endpoints and email addresses. Actors that respond to a message using only the first constraint can be called Processor Actors.

Factory Actor

The second constraint makes actors capable of creating other actors, which we conveniently call Factory Actors. Factory actors are important elements of a message-driven system where an actor consumes from a message queue and creates handlers based on the message type. Factory actors control the lifetime of the actors they create and have a deeper knowledge of them - compared to processor actors knowing a mere address. It is useful to separate factory actors from processing ones - in line with the single responsibility principle.

Stateful Actor

The third constraint defines the Stateful Actor. Actors capable of the third constraint have a memory that allows them to react differently to subsequent messages. Such actors can be subject to a myriad of side-effects. Firstly, when we talk about "subsequent messages" we inherently assume an ordering, while as we said there is no constraint with regard to ordering: an out-of-band message arrival can lead to complications. Secondly, all aspects of CAP apply to this memory, making a store that is consistent yet highly available and partition tolerant impossible to achieve. In short, it is best to avoid stateful actors.

Modelling a Processor Actor

"Please open your eyes, Try to realise, I found out today we're going wrong, We're going wrong" - Cream
[Mind you there is only a glimpse of Ginger Baker visible while the song is heavily reliant on Ginger's drumming. And yeah, this goes back to a time when Eric played Gibson and not his signature Strat]


This is where most of us can go wrong. We do that, sometimes for 4 years - without realising it. This is by no means a reference to a certain project [... cough ... Orleans ... cough] that has been brewing (Strange Brew pun intended) for 4 years and has come up with imperative, RPC-based, RESTfully coupled Micro-APIs. We know it, doing simple is hard - and we go wrong, i.e. we do the opposite: we build really complex frameworks.

I was chatting away on Twitter with a few friends and I was saying "if you need a full-blown and complex framework to do actors, you are probably doing it wrong". All you need is a few interfaces, and some helpers doing the boilerplate stuff. This stuff ain't rocket science, let's not turn it into one.

The essence of the Reactive Cloud Actor is the interface below (part of BeeHive mini-framework introduced below):
    /// <summary>
    /// Processes an event.
    /// 
    /// The queue name identifies the messages that the actor can process.
    /// It can be in the format of [queueName] or [topicName]-[subscriptionName]
    /// </summary>
    public interface IProcessorActor : IDisposable
    {
        /// <summary>
        /// Asynchronous processing of the message
        /// </summary>
        /// <param name="evnt">Event to process</param>
        /// <returns>Typically contains 0-1 messages. Exceptionally more than 1</returns>
        Task<IEnumerable<Event>> ProcessAsync(Event evnt);

    }

Yes, that is all. All of your business logic can be captured by the universal method above. Don't you believe that? Just have a look at a non-trivial eCommerce example implemented using this single method.

So why Reactive (event-based) and not Imperative (RPC-based)? Because in a reactive actor system, each actor only knows about its own step and what it does itself, and has no clue about the next steps or the rest of the system - i.e. actors are decoupled, leading to an independence which facilitates actor Application Lifecycle Management and DevOps deployment.
Imperative actors know about their actor dependencies, while Reactive actors have no dependency other than the queues, the basic data structure stores and external systems. Imperative actors communicate with other actors via a message store/bus and invoke method calls. We have been doing this for years in different Enterprise Service Bus integrations; this one only brings it to a micro level, which makes the pains even worse.

So let's bring an example: fraud check of an order.

Imperative
PaymentActor, after a successful payment for an order, calls the FraudCheckActor. FraudCheckActor calls external fraud check systems. After identifying a fraud, it calls CancelOrderActor to cancel the order. So as you can see, PaymentActor knows about and depends on FraudCheckActor. In the same way, FraudCheckActor depends on CancelOrderActor. They are coupled.

Reactive
PaymentActor, upon successful payment, raises a PaymentAuthorised event. FraudCheckActor is one of its subscribers; after receiving this event it checks for fraud and, if one is detected, raises a FraudDetected event. CancelOrderActor subscribes to some events, including FraudDetected, upon receiving which it cancels the order. None of these actors know about the others. They are decoupled.
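To make the reactive version concrete, a fraud check actor in BeeHive terms could look roughly like this - the event payload shapes (an OrderId property) and the IFraudCheck external client are assumptions for illustration only:

// assumed payload shapes and external dependency, for illustration only
public class PaymentAuthorised { public Guid OrderId { get; set; } }
public class FraudDetected { public Guid OrderId { get; set; } }

public interface IFraudCheck // hypothetical abstraction over the external fraud check systems
{
    Task<bool> CheckAsync(Guid orderId);
}

[ActorDescription("PaymentAuthorised-FraudCheck")]
public class FraudCheckActor : IProcessorActor
{
    private readonly IFraudCheck _fraudCheck;

    public FraudCheckActor(IFraudCheck fraudCheck)
    {
        _fraudCheck = fraudCheck;
    }

    public async Task<IEnumerable<Event>> ProcessAsync(Event evnt)
    {
        var paymentAuthorised = evnt.GetBody<PaymentAuthorised>();

        // check with the external systems and raise FraudDetected only when needed
        var isFraud = await _fraudCheck.CheckAsync(paymentAuthorised.OrderId);

        return isFraud
            ? new[] { new Event(new FraudDetected { OrderId = paymentAuthorised.OrderId }) }
            : new Event[0];
    }

    public void Dispose()
    {
    }
}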

So which one is better? By the way, none of this is new - we have been doing it for years. But it is important to identify why we should avoid the first and not "go wrong".

Reactive Cloud Actors proposal


After categorising the actors, here I propose the following constraints for Reactive Cloud Actors:
  • A reactive system that communicates by sending events
  • Events are defined as time-stamped, immutable, unique and eternally-true pieces of information
  • Events have types
  • Events are stored in Highly-Available cloud storage queues allowing topics
  • Queues must support delaying
  • Processor Actors react to receiving a single event, do some processing and then send back usually one (sometimes zero and rarely more than one) event
  • Processor Actors have a type - implemented as a class
  • Processing should involve a minimal number of steps, almost always a single step
  • Processing of the events is designed to be idempotent
  • Each Processor Actor can receive one or more event types - all of which are defined by its Actor Description
  • Factory Actors are responsible for managing the lifetime of processor actors
  • Actors are deployed to cloud nodes. Each node contains one Factory Actor and can create one or more Processor Actors depending on its configuration. Grouping of actors depends on cost vs. ease of deployment.
  • In addition to events, there are other Basic Data Structures that contain state and are stored in Highly-Available cloud storage (See below on Basic Data Structures)
  • There are no Stateful Actors. All state is managed by the Basic Data Structures and events.
  • This forms an evolvable web of events which can define flexible workflows
Breaking down all the processes into single steps is very important. A Highly-Available yet Eventually-Consistent system can handle delays but cannot easily bind multiple steps into a single transaction.

So how can we implement this? Is this gonna work?

Introducing BeeHive


BeeHive is a vendor-agnostic Reactive Actor mini-framework I have been working on over the last three months. It is implemented in C# but frankly could be done in any language supporting asynchronous programming (promises), such as Java or node.

The cloud implementation currently exists only for Azure, but supporting another cloud vendor is basically a matter of implementing 4-5 interfaces. It also comes with an In-Memory implementation, which is only targeted at demos. This framework is not meant to be used as an in-process actor framework.

It includes the Prismo eCommerce example, which is an imaginary eCommerce system and has been implemented for both In-Memory and Azure. This example is non-trivial and has some tricky scenarios that have to implement Scatter-Gather sagas. There is also a Boomerang pattern event that turns a multi-step process into regurgitating an event a few times until all steps are done (this requires another post).

An event is modelled as:

[Serializable]
public sealed class Event : ICloneable
{

    public static readonly Event Empty = new Event();

    public Event(object body)
        : this()
    {        
       ...
    }

    public Event()
    {
        ...
    }

    /// <summary>
    /// Marks when the event happened. Normally a UTC datetime.
    /// </summary>
    public DateTimeOffset Timestamp { get; set; }

    /// <summary>
    /// Normally a GUID
    /// </summary>
    public string Id { get; private set; }

    /// <summary>
    /// Optional URL to the body of the message if Body can be retrieved 
    /// from this URL
    /// </summary>
    public string Url { get; set; }

    /// <summary>
    /// Content-Type of the Body. Usually a Mime-Type
    /// Typically body is a serialised JSON and content type is application/[.NET Type]+json
    /// </summary>
    public string ContentType { get; set; }
        
    /// <summary>
    /// String content of the body.
    /// Typically a serialised JSON
    /// </summary>
    public string Body { get; set; }

    /// <summary>
    /// Type of the event. This must be set at the time of creation of event before PushAsync
    /// </summary>
    public string EventType { get; set; }

    /// <summary>
    /// Underlying queue message (e.g. BrokeredMessage in case of Azure)
    /// </summary>
    public object UnderlyingMessage { get; set; }

    /// <summary>
    /// This MUST be set by the Queue Operator upon Creation of message usually in NextAsync!!
    /// </summary>
    public string QueueName { get; set; }


    public T GetBody<T>()
    {
       ...
    }

    public object Clone()
    {
        ...
    }
}

As can be seen, Body has been defined as a string since BeeHive uses JSON serialisation. This can be made flexible, but in reality events should normally contain a small amount of data, mainly basic data types such as GUID, integer, string, boolean and DateTime. Any binary data should be stored in Azure Blob Storage or S3 and the path referenced here.
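As a usage sketch, creating and consuming a typed event looks like this (OrderAccepted is a hypothetical payload type; as noted in the first post, EventType and QueueName default to the type of the payload passed in):

public class OrderAccepted { public Guid OrderId { get; set; } }   // hypothetical payload

var evnt = new Event(new OrderAccepted { OrderId = Guid.NewGuid() });
// EventType and QueueName are set based on the payload type, and Body holds the serialised JSON

var orderAccepted = evnt.GetBody<OrderAccepted>();   // deserialises Body back to the typed payload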

BeeHive Basic Data Structures

This is a work-in-progress (but nearly done) part of BeeHive that defines a minimal set of Basic Data Structures (and their stores) to cover all data needs of Reactive Actors. These structures are defined as interfaces that can be implemented for different cloud platforms. The list as it stands now:
  • Topic-based and simple Queues
  • Key-Values
  • Collections
  • Keyed Lists
  • Counters

Some of these data structures hold entities within the system which should implement a simple interface:

public interface IHaveIdentity
{
    Guid Id { get; }
}

Optionally, entities could be Concurrency-Aware for updates and deletes in which case they will implement an additional interface:

public interface IConcurrencyAware
{
    DateTimeOffset? LastModofied { get; }

    string ETag { get; }
}
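For example, an entity stored in one of the collection stores might look like this (CustomerEntity is a made-up example; the property names follow the interfaces exactly as declared above):

public class CustomerEntity : IHaveIdentity, IConcurrencyAware
{
    public Guid Id { get; set; }                      // IHaveIdentity
    public string Name { get; set; }

    public DateTimeOffset? LastModofied { get; set; } // IConcurrencyAware, as declared above
    public string ETag { get; set; }
}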


Prismo eCommerce sample

Prismo eCommerce is an imaginary eCommerce company that receives orders and processes them. The order taking relies on an eventually consistent (and delayed) read model of the inventory, hence orders can be accepted for which items are out of stock. The process waits until all items are in stock (or, if out of stock, until they arrive back in stock) before it sends the order to fulfilment.

Prismo eCommerce states (solid), transitions (arrows) and processes (dashed)

This sample has been implemented both In-Memory and for Windows Azure. In both cases, tracing can be used to see from the outside what is happening. In this sample all actors are configured to run in a single worker role, although they could each run in their own role. I might provide a UI to show the status of each order as it goes through the statuses.

Conclusion

Reactive Cloud Actors are the way to implement decoupled MicroServices. By individualising actor definitions (Processor Actor and Factory Actor) and avoiding Stateful Actors, we can build resilient and Highly Available cloud-based systems. Such systems will comprise evolvable webs of events, where each web defines a business capability. I don't know about you, but this is how I am going to build my cloud systems.

Watch this space, the article is on its way.

Saturday, 5 April 2014

Web APIs and n+1 problem

[Level C3]
You are probably familiar with the n+1 problem: a request to load 1 item turns into n+1 requests since the item has n other associated items. The term has mainly been described in the context of Object Relational Mappers (ORMs), where lazy loading of associated records causes additional calls to the RDBMS, resulting in poor performance, locks, deadlocks and timeouts.
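The same shape shows up at the HTTP level. As an illustrative sketch (the URLs and types here are made up, and Json.NET is used for deserialisation), loading one customer whose orders are exposed as separate resources turns into 1 + n round-trips:

public class Customer { public string Name { get; set; } public List<string> OrderLinks { get; set; } }
public class Order { public decimal Total { get; set; } }

public static async Task<List<Order>> LoadCustomerOrdersAsync(HttpClient client)
{
    // 1 call for the customer resource
    var customer = JsonConvert.DeserializeObject<Customer>(
        await client.GetStringAsync("http://example.com/api/customer/123"));

    // n further calls, one per associated order resource
    var orders = new List<Order>();
    foreach (var orderLink in customer.OrderLinks)
    {
        orders.Add(JsonConvert.DeserializeObject<Order>(
            await client.GetStringAsync(orderLink)));
    }
    return orders;
}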

This article has now been published on @infoq. Please view the article here.


This article is humbly dedicated to Adrian Cockcroft and Daniel Jacobson for their inspiring and ground-breaking work.

Saturday, 18 January 2014

Am I going to keep supporting my open source projects?

I have recently been asked by a few people whether I will be maintaining my open source projects, mainly CacheCow and PerfIt.

I think since my previous blog post, some have been thinking that I am going to completely drop C#, Microsoft technologies and whatever I have been building so far. There are also some concerns over the future of the Open Source projects and whether they will be pro-actively maintained.

If you read my post again, you will find out that I have twice stressed that I will keep doing what I am doing. In fact I have lately been doing a lot of C# and Azure - like never before. I am really liking Windows Azure and I believe it is a great platform for building Cloud Applications.

So in short, the answer is a BIG YES. I just pushed some code into CacheCow and released CacheCow.Client 0.5.0-alpha, which has a few improvements. I have plans for finally releasing 0.5 but it has been delayed due to my engagement with another project I am working on - as I said, C# and Azure.

So please keep PRs, Issues, Bugs and wish lists coming in.

With regard to my decision and recent blog post, this is a strategy change, hoping to position myself within 2 years to be able to do non-Microsoft work equally well if I have to. I hope it is clear now!


Saturday, 21 December 2013

Thank you Microsoft, and so long ...

[Level C1]

Ideas and views expressed here are purely my own and not those of my employer or any related affiliation.

Yeah, I know. It has been more than 13 years. I still remember how it started. It was New Year's Eve of the new millennium (yeah, year 2000) and I was standing on the shore of the Persian Gulf in Dubai, with my stomach full of delicious Lebanese grilled food and a local newspaper in hand, looking for jobs. I was a frustrated Doctor, hurt by the patient abuse and discouraged by the lack of appreciation and creativity in my job. And there was the very small section of medical jobs compared to pages, after pages, after pages of computer jobs. Well, those were the dot.com boom days, you know, it was the bubble - as we know now. And that was it: a life-changing decision to teach myself network administration and get the MCSE certificate.

The funny thing is I never did. I got distracted, side-tracked by something much more interesting. Yeah, programming - you know, when it gets its hooks in you, it doesn't leave you, ever. So it started with Visual Basic, and then HTML, and obviously databases, which started with Microsoft Access. Then you had to learn SQL Server, and not to forget web programming: ASP 3.0. I remember I printed some docs and it was 14 pages of A4. It had everything I needed to write ASP pages, connect to a database, store, search - everything. With those 14 pages I built a website for some foreign journalists working in Iran: IranReporter - it was up until a few years ago. I know it looks so out-dated and primitive, but for 14 pages of documentation, not so bad.

Then it was .NET, then the move to C# and the rest of the story - which I won't bore you with. But what I knew was that the days of 14-page documentation were over. You had to work really hard to keep up. ASP.NET WebForms, which I never liked and never learnt. With .NET 3.0 came LINQ, WCF, WPF and there was Silverlight. And there were ORMs, DI frameworks, unit testing and mocking frameworks. And then there was the JavaScript explosion.

And for me, ASP.NET Web API was the big thing. I have spent the best part of the last 2 years learning, blogging, helping out, writing OSS libraries for it and then co-authoring a book on this beautiful technology. And I love it. True, I have not been recognised for it and am not an MVP - and honestly I am not sure what else I could do to deserve to become one - but that is OK, I did what I did because I loved and enjoyed it.

So why Goodbye?

Well, I am not going anywhere! I love C# - I don't like Java. I love Visual Studio - and Eclipse is a pain. I adore ASP.NET Web API. I love NuGet. I actually like Windows and dislike Mac. So I will carry on writing C# and using ASP.NET Web API and Visual Studio. But I have to shift focus. I have no other option, and in the next few lines I will try to explain what I mean.

The reality is our industry is on the cusp of a very big change. Yes, a whirlwind of change has already happened over the last few years but, well, to put it bluntly, I was blind.

So based on what I have seen so far I believe IN THE NEXT 5 YEARS:

1. All data problems become Big Data problems

If you don't like generalisations you won't like this one either - when I say all I mean most. But the point is industries and companies will either shrink or grow based on their IT agility. The ones shrinking are not worth talking about. The ones growing will have big data problems if they do not have them yet. This is related to the growth of internet users as well as the realisation of the Internet of Things. The other reason is the adoption of Event-Based Architecture, which will massively increase the amount of data movement and data crunching.

Data will be made of what currently is not data. Your mouse moves and clicks are the kind of data you will have to deal with. Or user eye movements. Or input from 1000s of sensors inside your house. And this stuff is what you will be working on day in, day out.

Dear Microsoft, Big Data solutions did not come from you. And despite my wishes, you did not provide a solution of your own. I would love to see an alternative attempt and a rival to Hadoop to choose from. But I am stuck with it, so I had better embrace it. There are some alternatives emerging (real-time, in-memory) but again they come from the Java communities.

2. Any server-side code that cannot be horizontally scaled is gonna die

Is that even a prediction? We have had web farms for years and had to think of robust session management all these years. But something has changed. First of all, it is not just server-stickiness. You have to recover from site-stickiness and site failures too. On the other hand, this is again more about data.

So the closer you are to the web and the further from the data, the more likely your technology will survive. For example, API-layer technologies are here to stay. As such, ASP.NET Web API has a future. I cannot say the same for middleware technologies such as BizTalk or NServiceBus. Databases? Out of the question.

3. Data locality will still be an issue so technologies closer to data will prevail

Data will still be difficult to move in the years to come. While our bandwidth is growing fast, our data is growing faster, so this will probably become an even bigger problem. So where your data is will be key in defining your technologies. If you use Hadoop, using Java technologies will make more and more sense. If you are on Amazon S3, you are more likely to use Kafka than Azure Service Bus.

4. We need 10x or 100x more data scientists and AI specialists

Again not a prediction. We are already seeing this. So perhaps I can sharpen some of my old skills in computer vision.

So what does it mean for me?

As I said, I will carry on writing C# and ASP.NET Web API, reading from and writing to SQL Server, and doing my best to write the best code I can. But I will start working on my Unix skills (by using it at home), pick up one or two JVM languages (Clojure, Scala) and work on Hadoop. I will take Big Data more seriously. I mean really seriously... I need to stay close to where innovation is happening, which sadly is not Microsoft now. From Big Data to Google Glass and cars, it is all happening in the Java communities - and when I say Java communities I do not mean the language but the ecosystem around it. And I will be ready to jump ship if I have to.

And I still wish Microsoft would wake up to the sound of its shrinking, not only in the PC market but also in its bread and butter, the enterprise.

PS: This is not just Microsoft's problem. It is also a problem with our .NET community. The fight between Silverlight/XAML and JavaScript took so many years. In the community that I am in, we waste so much time fighting over OWIN. I have had enough; I know how best to use my time.

Sunday, 1 December 2013

CacheCow 0.5.0-alpha released for community review: new features and breaking changes

[Level T3]

CacheCow 0.5 was a long time coming, but with everything that was happening I am already 1 month behind. So I have now released it as an alpha so I can get some feedback from the community.

As some of you know, a few API improvements had been postponed due to breaking changes. On the other hand, there were a few areas I really needed to nail down - the most important one being resource organisation. While resource organisation can be anything, the most common approach is to stick with the standard Web API routing, i.e. "/api/{controller}/{id}", where api/{controller} is a collection resource and api/{controller}/{id} is an instance resource.
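For reference, this is the standard route registration that ships with the ASP.NET Web API project template:

config.Routes.MapHttpRoute(
    name: "DefaultApi",
    routeTemplate: "api/{controller}/{id}",
    defaults: new { id = RouteParameter.Optional }
);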

I will go through some of the new features and changes, and would really appreciate it if you spent a bit of time reviewing them and sending me feedback.

No more dependency on WebHost package

OK, this had been asked for by Darrel and Tugberk and it only made sense. In version 0.4 I added a dependency on HttpConfiguration, but in order not to break compatibility it would use GlobalConfiguration.Configuration from WebHost by default. This has now been removed, so an instance of HttpConfiguration needs to be passed in. This is a breaking change, but fixing it should be a minimal change in your code - from:
var handler = new CachingHandler(entityTagStore);
config.MessageHandlers.Add(handler);
You would write:
var handler = new CachingHandler(config, entityTagStore);
config.MessageHandlers.Add(handler);

CacheKeyGenerator is gone

CacheKeyGenerator provided flexibility for generating a CacheKey from the resource and its Vary headers. The reality is, there is no need to provide extensibility points here. This is a part of the framework which needs to be internal. At the end of the day, the Vary headers and the URL of the resource are what matter for generating the CacheKey, and that is implemented inside CacheKey. Exposing CacheKeyGenerator had led to some misuses and confusion surrounding its use cases, so it had to go.

RoutePatternProvider is reborn

A lot of work has been focused on optimising RoutePatternProvider and its possible use cases. First of all, the signature has been changed to receive the whole request rather than bits and pieces (the URL and the values of the Vary headers as a SelectMany result).

So the work basically looked at the meaning of the Cache Key. We have two connected concepts: representation and resource. A resource can have many representations (XML, image, JSON, different encodings or languages). In terms of caching, representations are identified by a strong ETag while a resource will have a weak ETag, since it cannot be guaranteed that all its representations are byte-by-byte equal - and in fact they are not.

A resource and its representations

It is important to note that each representation needs to be cached separately; however, once the resource is changed, the cache for all representations is invalidated. This was the fundamental area missing in the CacheCow implementation and has now been added. Basically, IEntityTagStore has a new method: RemoveResource. This method is pivotal and gets called when a resource is changed.

RoutePatternProvider is no longer just a delegate, it is now an interface: IRoutePatternProvider. The reason for the change from a delegate is that we have a new method in there too: GetLinkedRoutePatterns. So what is a RoutePattern?

A RoutePattern is basically similar to a route and its definition conventions in ASP.NET; however, it is meant for resources and their relationships. I will write a separate post on this, but basically RoutePattern relates collection resources and instance resources. Let's imagine a car API hosted at http://server/api. In this case, http://server/api/car refers to all cars while http://server/api/car/123 refers to the car with Id 123. In order to represent this, we use the * sign for collection resources and the + sign for instances:

http://server/api/car         collection         http://server/api/car/*
http://server/api/car/123     instance           http://server/api/car/+  

Now normally:

  • POST to collection invalidates collection
  • POST to instance invalidates both instance and collection
  • PUT or PATCH to instance invalidates both instance and collection
  • DELETE to instance invalidates both instance and collection

So in brief, change to collection invalidates collection and change to instance invalidates both.

In a hierarchical resource structure this can get even more complex. If /api/parent/123 gets deleted, /api/parent/123/* (collection) and /api/parent/123/+ will most likely get invalidated, since they do not exist any more. However, implementing this without making some major assumptions is not possible, hence its implementation is left to CacheCow users to tailor for their own APIs.

However, since a flat resource organisation (defining all resources using "/api/{controller}/{id}") is quite common, I have implemented IRoutePatternProvider for flat structures in ConventionalRoutePatternProvider, which is the default.

Prefixing SQL Server Script names with Server_ and Client_

The fact was that the SQL Server EntityTagStore (in the server package) and CacheStore (in the client package) had some stored procedure name clashes, preventing the use of the same database for both client and server components. This was requested by Ben Foster and has now been done.

Please provide feedback

I hope this makes sense; I am waiting for your feedback. Please ping me on Twitter or ask your questions on GitHub by raising an issue.

Happy coding ...