
Friday, 3 April 2015

Utilisation and High Availability analysis: Containers for Microservices

Microservices? Are these not the same SOA principles repackaged and sold under a different label? Not this time - I will address that question in another post. But if you are considering Microservices for your architecture, beware of the cost and availability concerns. In this post we will look at how using containers (such as Docker) can help you improve your cloud utilisation, decrease costs and, above all, improve availability.

Elephant in the room: most of the cloud resources are under-utilised

We almost universally underestimate how long it takes to build a software feature. I am not sure whether that is because our time feels more precious than money, but for hardware the reverse is almost always true: we overestimate the hardware requirements of our systems. Historically this made some sense, since commissioning hardware in an enterprise is usually a long and painful process, and the estimate had to cover years of business growth plus planned contingency for spikes.
But in an elastic environment such as the cloud? Well, it seems we still do it. In the UK alone, £1bn is wasted on unused or under-utilised cloud resources.

Some of this is avoidable by using the elasticity of the cloud and scaling up and down as needed. Many cloud vendors provide such functionality out of the box with little or no coding. And many companies already do that - so why is the waste still so high?

From personal experience, I can give you a few reasons why systems end up like this...

Instance Redundancy

Redundancy is one of the biggest killers in computing costs, and moving to the cloud does not change that much: vendors' availability SLAs are usually defined in the context of redundancy, and to be frank, some of it is purely cloud-related. For example, on Azure you need to place your VMs in an "availability set" to qualify for the VM SLA. In other words, you need at least two VMs, since any VM can be taken down for patching at any time, but within an availability set this is guaranteed not to happen to all machines at the same time.

The problem is, unless you are a company with a massive number of customers, even a small VM instance could suffice for your needs - and even in a big company with many internal services, some services might not need a big resource allocation.

Looking from another angle, adopting Microservices means you can iterate your services more quickly, releasing more often. The catch is that clients will not be able to upgrade at the same time, so you have to be prepared to run multiple versions of the same service. Old versions of an API cannot be decommissioned until all clients are weaned off them and moved to the newer versions. Translation? Some of your versions will have to run on a shoestring budget to justify their existence.

Containerisation helps you tap into these idle resources, reducing cost by running multiple services on the same VM. A system usually requires at least two or three active instances to allow for redundancy. Small services loaded into containers can be co-located on the same instances, allowing higher utilisation of the resources and a reduction in cost.

Improved utilisation by service co-location



This ain't rocket science...

Resource Redundancy

Most services have different resource requirements. Whether network, disk, CPU or memory, some resources are used more heavily than others. A service encapsulating an algorithm will be mainly CPU-heavy, while an HTTP API could benefit from local caching of resources. And while cloud vendors provide different VM setups geared for memory, disk IO or CPU, a system usually still leaves a lot of resources idle.

This is probably best explained by the pictures below. No rocket science here either, but mixing services that have different resource allocation profiles gives us the best utilisation.


Co-location of Microservices having different resource allocation profile
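To make the idea concrete, here is a toy sketch of that packing. All the service names and resource numbers are made up for illustration; the point is only that pairing a CPU-heavy service with a memory-heavy one wastes less of each VM:

```python
# Toy illustration: each VM offers 8 CPU units and 8 GB of memory.
# The service profiles below are hypothetical.
VM_CPU, VM_MEM = 8, 8

services = {
    "pricing-algo":   {"cpu": 6, "mem": 2},  # CPU-heavy service
    "http-cache-api": {"cpu": 2, "mem": 6},  # memory-heavy service
}

def utilisation(colocated):
    """Fraction of the VM's CPU and memory used by the given services."""
    cpu = sum(services[s]["cpu"] for s in colocated)
    mem = sum(services[s]["mem"] for s in colocated)
    return cpu / VM_CPU, mem / VM_MEM

# Each service on its own VM: most of one resource sits idle.
print(utilisation(["pricing-algo"]))    # (0.75, 0.25)
print(utilisation(["http-cache-api"]))  # (0.25, 0.75)

# Co-located on one VM: both dimensions are fully used.
print(utilisation(["pricing-algo", "http-cache-api"]))  # (1.0, 1.0)
```

In practice a container orchestrator makes this placement decision for you, but the arithmetic it is optimising is essentially this.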


And what's that got to do with Microservices?

Didn't you just see it?! Building smaller services pushes you towards building and deploying more services, many of which need the High Availability provided by redundancy but not the price tag associated with it.

Docker is absolutely a must-have if you are doing Microservices - otherwise you are paying through the nose for your cloud costs. At QCon London 2015, John Wilkes from Google explained how they "start over 2 billion containers per week". In fact, to take advantage of the spare resources on the VMs, they tend to mix their production and batch processes. One difference here is that the live processes require locked, allocated resources, while the batch processes take whatever is left. They analysed the optimum percentages, minimising errors while keeping utilisation high.

Containerisation and availability

As we discussed, optimising utilisation becomes a big problem when you have many services - and their multiple versions - to run. But what does that mean in terms of availability? Does containerisation improve or hinder your availability metrics? I have not been able to find much in the literature, but as I will explain below, even if you do not have small services requiring VM co-location, you are better off co-locating and spreading the services onto more machines. And it even helps you achieve higher utilisation.

By spreading your architecture across more Microservices, the availability of your overall service (the one the customer sees) becomes a product of the availability of each Microservice. For instance, if you have 10 Microservices each with an availability of four 9s (99.99%), the overall availability drops to three 9s (99.9%). And if you have 100 Microservices, which is not uncommon, this drops to only two 9s (99%). In these terms, you need to strive for very high Microservice availability.
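The arithmetic behind that compounding is worth a quick sketch. Assuming failures are independent and every Microservice sits on the critical request path, overall availability is simply the product of the individual figures:

```python
# Overall availability of a chain of services, assuming independent
# failures and every service on the critical request path.
def overall_availability(per_service, count):
    return per_service ** count

for n in (10, 100):
    a = overall_availability(0.9999, n)
    print(f"{n} services at 99.99% each -> {a:.2%} overall")
# 10 services at 99.99% each -> 99.90% overall
# 100 services at 99.99% each -> 99.00% overall
```

The independence assumption is optimistic (correlated failures make it worse), but it is enough to show why each individual Microservice needs more 9s than the overall target.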

Hardware failure is very common, and for many components the Annualised Failure Rate goes above 1%. Defining hardware and platform availability with respect to system availability is not easy, but for simplicity and the purposes of this study, let's assume a failure risk of 1% - at the end of the day, our resulting downtime will scale accordingly.

If service A is deployed onto 3 VMs and one VM goes down (1%), the other two instances will have to bear the extra load until another instance is spawned - which takes some time. Capacity planning can leave enough spare resources to deal with this situation, but if two VMs go down at the same time (0.01%), it will most likely bring down the service, as it would not be able to cope with the extra load. If the Mean Time to Recovery is 20 minutes, this alone can dent your Microservice availability by around half of your four 9s! If you have worked hard in this field, you know how difficult it is to gain those 9s, and losing them like that is not an option.
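As a rough back-of-envelope (the exact dent depends on your MTTR and on how failures overlap), the downtime budget for a given number of 9s, and the share consumed by a 0.01% coincident-failure risk, work out as:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget(availability):
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)

# Four 9s permit roughly 52.6 minutes of downtime a year...
print(downtime_budget(0.9999))  # ≈ 52.56

# ...so if the whole service is down whenever two of its three VMs fail
# together (taken here as 1% x 1% = 0.01% of the time), that coincident
# failure alone eats seriously into the budget:
print(MINUTES_PER_YEAR * 0.01 * 0.01)  # ≈ 52.56
```

The numbers are illustrative, not a model; the takeaway is simply that a few correlated VM failures a year are enough to wipe out a four-9s budget.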

So what's the solution? This diagram speaks better than words:

Service A and B co-located in containers, can tolerate more VM failures

By using containers and co-locating services, we spread instances more thinly and can tolerate more failures. In the example above, our services can tolerate two or maybe even three simultaneous VM failures.

Conclusion

Containerisation (or Docker, if you will) is a must if you are considering Microservices. It helps you increase utilisation, bring down cloud costs and, above all, improve your availability.


Tuesday, 10 March 2015

QCon London 2015: from hype to trendsetting - Part 1

Level [C3]

This year I made it to QCon London, and I felt it might be useful to write up a summary for those who would have liked to be there but did not make it for whatever reason. This is also an opportunity to get my head together and summarise a couple of themes inspired by the conference.

The quality of the talks varied: pretty disappointing on the first day, but rising to a real high on the last. Not surprisingly, Microservices and Docker were the buzzwords of the conference, and many of the talks had one or the other in their title. It was as if the hungry folks were being served Microservices with ketchup, then with mayonnaise, and yet nothing was as good as Docker with salsa. In fact, it is very easy to be sceptical and sarcastic about Microservices or Docker and disregard them as pure hype.

After listening to the talks, especially the ones on the last day, I was convinced that with or without me, this train is set to take the industry forward. Yes, the granularity of Microservices (MS) has not been crisply defined yet, and there is a stampede to download and install Microservices on old systems and fix the world. The industry will abuse it, just as it reduced SOA to Web Services by adding a P to the end. Yes, very few people are talking about the cost of moving to MS or exploring the cases where you should stay put. But if your Monolith (even though it pays lip service to SOA) has ground the development cycle to a halt and is killing you and your company, there is a thing or two to learn here.

Disclaimer: this post is by no means a comprehensive account of the conference. It is my personal take on QCon London 2015 and the topics discussed, peppered with some of my own views and presented as technical writing.

Microservices

Yeah, I know you are fed up with hearing the word - but bear with me for a few minutes. Microservices reminded me of my past life: it is a syndrome. When a medical syndrome is first described, its aetiology and pathophysiology do not have to be clear and explained - it is just a syndrome, a collection of signs and symptoms that occur together. In the medical world, there can be years between describing a syndrome and finding the what and the why.

And this is what we are dealing with here, in a different discipline: Microservices are an emerging pattern, a solution to a contextual problem that has indeed occurred. It is a phenomenon we are still trying to figure out - a lot of head scratching is going on. So bear with it; I think we are in for a good ride beyond all the hype.

Its two basic benefits are: smaller deployment granularity, enabling you to iterate faster, and a smaller domain to focus on, understand and improve. For me, the first is the key.

So here is a breakdown of a few key aspects of Microservices.

Conway, Conway, Where Art Thou

A recurring theme (at points, ad nauseam) was that MS is the result of reversing cause and effect in Conway's law and using it to your advantage: build smaller teams and your software will take their shape. In essence, you turn Conway's law on its head and use it as a tool that naturally results in a more loosely coupled architecture.



This is by no means new; Amazon has been doing it for a decade. The size of the teams was nicely defined by Jeff Bezos as "Two Pizza Teams". But what is the makeup of these teams and how do they operate? As again described by Amazon, they are made up of the elements of a small company, a start-up: developers, testers, a BA, a business representative and, more importantly, operations - aka DevOps.

Another point, stressed by Yoni Goldberg from Gilt and Randy Shoup, was that the teams charge other teams for using their services and need to look after their own finances. They found that doing this reduced a team's costs by 90% - mainly by optimising cloud and computing costs.

Granularity: "fits in my head" (does it?)

One of the key challenges of Microservices has been to define the granularity of a Microservice, differentiating it from traditional SOA. And it seems we have now come up with a definition: "its complexity fits in one's head".

What? To me this is a non-definition, and on any account a poor one (sorry Dan). After all, there is nothing more subjective than what fits in one's head, is there? And whose head, by the way? If it is mine, I cannot keep track of what I ate for breakfast and lunch at the same time (if you know me personally, you must have noticed my small head), and then there are those giants who can master several disciplines or understand the whole of an uber-domain.

One of the key properties of a good definition is that it is tangible, unambiguous and objectively prescriptive. Jeff Bezos did not use pizzas to define Amazon team sizes because he necessarily loves pizza - he used them because they are tangible.

In the absence of any tangible definition, I am going to bring my own - why not? Having built one or two, this is really how I feel the granularity of an MS should be, and I am using tangible metrics to define it.

Granularity of Microservices - my definition

As is evident, the cross-cutting concerns of a Microservice are numerous: from security, availability and performance to routing, versioning, discovery, logging and monitoring. For a lot of these concerns you can rely on the existing platform or on common platform-wide guidelines, tools and infrastructure. So the crux of sizing a Microservice is its core business functionality; with regard to non-functional requirements, it shares the same concerns as traditional services.

When not to Microservice

Yoni Goldberg from Gilt covered this subject to some extent. He basically said do not start with Microservices; build them when your domain complexity warrants it. He went through his own experience of how they improved upon the ball of mud to nice discrete services, and then how the number of services exploded as their domain grew.
So the takeaways (with some personal salt and pepper) I would say are: do NOT consider Microservices if:
  • you do not have the organisation structure (small cross functional teams)
  • you are not practising Devops, automated build and deployment
  • you do not have (or cannot have) an uber monitoring system telling you exactly what is happening
  • you have to carry along a legacy database
  • your domain is not too big

Microservices is an evolutionary process

Randy Shoup explained how the process towards Microservices has been an evolutionary one, usually starting with a Monolith. He stressed "evolution, not intelligent design", and how in such an environment Governance (oh yeah, all ye Enterprise Architects, listen up) is not the same as in traditional SOA: it is decentralised, and adoption is purely based on how useful a practice is.

Optimised message protocols now a must

Mentioned in more than a couple of talks, moving to ProtoBuf, Avro, Thrift or similar seems to be a must in all but trivial Microservice setups. One of the main performance challenges in MS scenarios is network latency and the cost of serialisation/deserialisation over and over across multiple hops - and JSON simply does not cut it anymore.

Source: Thrift vs Protobuf comparison (http://devres.zoomquiet.io/data/20091111011019/index.html)
Be ready to move your APIs to these message protocols - yes, you lose some of the simplicity benefits, but trading them for performance is a necessary evil. Rest assured, nothing stops you from using JSON while developing and testing; but if your game is serious, start changing your protocols now - I am too, with the item already added to my technical backlog.
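ProtoBuf, Avro and Thrift all rely on a pre-compiled schema, so a faithful example needs generated code; as a stand-in, this sketch uses only the Python standard library and contrasts JSON with a fixed binary layout to show where the bytes go. The record and its field names are invented:

```python
import json
import struct

# A hypothetical order record passed between two services.
record = {"order_id": 123456, "user_id": 789, "amount_pence": 2599}

as_json = json.dumps(record).encode("utf-8")

# A fixed binary layout (three little-endian unsigned 32-bit ints),
# standing in for what a schema-based format like ProtoBuf produces:
# the field names live in the schema, not in every message.
as_binary = struct.pack("<III", record["order_id"], record["user_id"],
                        record["amount_pence"])

print(len(as_json))    # 58 bytes - field names repeated in every message
print(len(as_binary))  # 12 bytes
```

The real formats add tags and varint encoding, but the ratio is representative, and the same saving is paid again at every hop in CPU spent parsing text.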

What I was hoping to hear about and did not

Microservice registry and versioning best practices were not mentioned at all. I tried to quiz a few speakers on these but did not quite get a good answer. I suppose the space is up for grabs.

Need for Composition Services/APIs

As I have experienced personally, in an MS environment you end up with two different types of services: Functional Microservices, which own their data and are the authority in their business domain, and Composition APIs, which do not normally own any data and bring value by composing data from several other services - normally involving some level of business logic affecting the end user. In DDD terms you could find some similarity with Facade services; Yoni used the term "mid-tier services".

Composition services can bring a lot of value when it comes to caching, paginating and enriching data. They practically scatter requests, gather the results back and compose the response - fan-out is another term used here.

By inherently depending on many services, they are notoriously susceptible to performance outliers (to be discussed in the second post) and failure scenarios, which might warrant a layered cache backed by soft storage with a longer expiry, as a fallback in case a dependent service is down.
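Such a composition service can be sketched roughly as below. The service names, stubbed coroutines and cache contents are all invented for illustration; a real implementation would call downstream HTTP APIs, but the scatter/gather shape and the stale-cache fallback are the point:

```python
import asyncio

# Soft cache: entries are refreshed on every successful call and kept
# around past their natural life as a stale fallback.
stale_cache = {}

async def fetch_user(uid):
    # Stub standing in for a healthy downstream Microservice.
    return {"id": uid, "name": "Ada"}

async def fetch_orders(uid):
    # Stub standing in for a dependency that is currently down.
    raise ConnectionError("orders service unavailable")

async def call_with_fallback(key, coro):
    try:
        result = await asyncio.wait_for(coro, timeout=1.0)
        stale_cache[key] = result        # refresh the soft copy
        return result
    except Exception:
        return stale_cache.get(key)      # serve stale data instead of failing

async def compose_profile(uid):
    # Fan out to both dependencies concurrently, then compose the result.
    user, orders = await asyncio.gather(
        call_with_fallback(f"user:{uid}", fetch_user(uid)),
        call_with_fallback(f"orders:{uid}", fetch_orders(uid)),
    )
    return {"user": user, "orders": orders}

stale_cache["orders:42"] = ["old-order-1"]   # left over from an earlier success
print(asyncio.run(compose_profile(42)))
# {'user': {'id': 42, 'name': 'Ada'}, 'orders': ['old-order-1']}
```

Here the orders dependency fails, and the composed response falls back to the stale copy rather than failing the whole request - degraded data beats no data for most end-user views.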

In the next post, we will look into the topics below. We will discover why Docker is in fact closely related to Microservices - and it is not what you think! [Do I qualify now to become a BusinessInsider journalist?]
  • Those pesky performance outliers
  • Containers, containers
  • Don't beat the dead Agile
  • Extra large memory computing is now a thing