Saturday 24 November 2012

Introducing Client-Server Domain Separation


[Level C4]

If you have followed my posts on REST and its client-server implications, you already know I have a thing for the client-server relationship.

I have been thinking about Client-Server Domain Separation (CSDS) for a while, and I think it is time to do a brain dump.

TL;DR

So here is the definition of CSDS - if you don't want to read the whole post. CSDS defines a single constraint, which is just an expansion of REST's client-server constraint:
Client and server must define and live in their own bounded context
This will lead to 1) cleaner separation of concerns between clients and servers and 2) adoption of APIs as the building blocks of complex systems in an SOA world. CSDS is also not compatible with HATEOAS - as we will see. If you want to find out how such a seemingly trivial constraint can have such an impact, read the rest.

Background

REST defines a set of constraints that will lead to better architecture and design - well, that is the claim, but I personally do believe in it. One of those constraints is client-server. As far as REST is concerned, client and server are decoupled entities - these ideas were successfully used in the design of HTTP. Yet considering the limitations of clients back when the REST dissertation was being written, I think we need to revisit this constraint.

I suppose it all started with smartphones. We now have more processing power in our pockets than the Apollo spacecraft that first landed on the moon. Native apps allow for developing pretty complex applications, while the HTML/JS app has become a reality (better browsers, adoption of HTML5, better JavaScript runtimes and development tools).

The dilemma we now face is deciding where to implement a piece of functionality (in other words, where to put the business logic): client or server. Back in the late 90s or early 2000s we did not have a choice - we had to implement most of the functionality on the servers and use client-side code mainly for limited validation. We lived in a time of server dominance. Now we have the liberty to implement a sizeable chunk of functionality in either place, but getting the balance right is difficult and has led to two main opposing camps: Single Page Application (SPA) followers and server domination supporters. I am more inclined towards the first, and I shall explain below why CSDS leans more towards the SPA side than server domination - not as a matter of taste but as a matter of principle: the question is whose concern a piece of functionality is.

Other changes in the industry have contributed to the need to define client and server. Nowadays, it is only incidental that the client code in an HTML/JS application is served from the server - we can package up JavaScript files with the application, as in PhoneGap or Windows 8 Metro applications. In mash-up applications, there is no single server defining the flow; as such, HATEOAS is meaningless.

Introduction

Here I briefly reiterate what I explained in two related posts.

First of all, in order to decide where a piece of functionality belongs - server or client - we need to understand whose concern it is. I talked about server-concern, client-concern and mixed-concern, and I explained their anti-patterns, each with an example.

In the second post I tried to define client and server - as they stand now. So I am going to go back to the same definitions - with the client definition slightly changed.

Server is responsible for defining a domain (server domain) and maintaining its state and consistency/integrity. Server is usually very complex but a good server hides its complexity behind its API. Server should not expose its internals to the outside world.

Server in CSDS


Client is responsible for using the services of one or more servers to provide value to the end user - either directly if it is a client device, or indirectly if the client is itself a server. Client can also maintain state, but that is not its primary function.

We also touched on Application. For me application => value => user => client. Application, use and usability are mainly client concerns. Having said that, the server will define a secondary level of API which can use the underlying basic APIs and present a more useful representation of its state. As such, the server is not completely oblivious to the user/value. For example, a high-street bank could have an API returning the 10 most recent transactions on your account, defined as a resource at /account/{id}/transaction/mostrecent, since this is a very common query. This is instead of, or in addition to, providing an API which allows the client to define date ranges and the number of transactions returned. The application sitting on a client device does have the liberty to show only 6 of them if its usability mandates such a restriction.
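To make the bank example concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the function names, the in-memory transaction list and the default of 6 rows are made up for this post and do not describe a real banking API.

```python
from datetime import date, timedelta

# Hypothetical in-memory stand-in for the server's transaction store.
_TRANSACTIONS = [
    {"id": i, "date": date(2012, 11, 1) + timedelta(days=i), "amount": 10.0 * i}
    for i in range(1, 21)
]


def get_transactions(start, end, limit):
    """Basic server API: the client chooses the date range and the count."""
    matching = [t for t in _TRANSACTIONS if start <= t["date"] <= end]
    matching.sort(key=lambda t: t["date"], reverse=True)
    return matching[:limit]


def get_most_recent():
    """Secondary server API (think /account/{id}/transaction/mostrecent).

    Built on top of the basic API and fixed at the 10 most recent items.
    """
    return get_transactions(date.min, date.max, 10)


def client_recent_view(max_rows=6):
    """Client concern: usability decides how many rows are actually shown."""
    return get_most_recent()[:max_rows]
```

The server offers the convenient resource because the query is common, but it has no say in how many rows the client ends up displaying.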

CSDS definition

So CSDS can be seen as a superset style on top of REST, with a single constraint added to REST's client-server constraint. This is similar to, for example, HATEOAS, which builds upon the hypermedia constraint. So the constraint is:
Client and server must define and live within their own bounded context.
In other words, deciding where to put a piece of functionality means ascertaining whose concern it is. Is that it? Yes, that is it. Yet this is going to have quite a big impact as well as important repercussions.

First of all, since each defines its own boundary, the client's domain is separate from the server's domain. Their interaction is only through the API. "Domain objects" in the diagram above are usually regarded as view-models, which are a translated version of the corresponding models in these two different domains - a translation which in DDD terms is called a context map.
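As a rough illustration of that translation, the Python sketch below maps a server representation onto a client-side view-model. The field names (amountMinorUnits, bookingDate, counterparty, currency) and the TransactionViewModel class are hypothetical, chosen only to show the two domains using different shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionViewModel:
    """Client-side model: shaped for display, not for the server's storage."""
    label: str
    amount_text: str
    is_debit: bool


def to_view_model(resource: dict) -> TransactionViewModel:
    """Translate the server's public representation into the client's domain.

    The keys read from `resource` are assumptions standing in for whatever
    the server actually publishes through its API.
    """
    amount = resource["amountMinorUnits"] / 100
    return TransactionViewModel(
        label=f'{resource["bookingDate"]} {resource["counterparty"]}',
        amount_text=f"{abs(amount):.2f} {resource['currency']}",
        is_debit=amount < 0,
    )
```

The client depends only on the published representation; how the server stores a transaction internally never leaks into the view-model.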

By keeping client and server in their own bounded context, internals of each can be changed independently. By separating the domains, we achieve the client-server decoupling which is the goal of the client-server constraint - as Fielding puts it.

So here are some of the aspects and implications of CSDS:

Client has full coherence of server's public domain

This means that client is free to have full coherence of server's public domain including all its public API, domain objects and schemata. It is able to call, discover and make full use of the public API in any order or fashion it needs.

Server is responsible for versioning its public domain

In CSDS, a client building a dependency on top of the public domain is not regarded as harmful and is actually seen as essential. The server already knows that changing the public domain will break clients; as such, the server is responsible for versioning its public domain.
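One way to picture this is the sketch below, where two versions of a public representation are served side by side from the same internal model. The data, the function names and the idea of selecting the version by a "v1"/"v2" key (for example from the URL path) are all assumptions made for illustration; CSDS itself does not prescribe a versioning mechanism.

```python
# Hypothetical internal model: free to change as long as every published
# version of the public domain can still be produced from it.
_ACCOUNT = {"owner_first": "Ada", "owner_last": "Lovelace", "balance_pence": 12050}


def account_v1() -> dict:
    """Version 1 of the public representation: existing clients depend on it."""
    return {
        "owner": f'{_ACCOUNT["owner_first"]} {_ACCOUNT["owner_last"]}',
        "balance": _ACCOUNT["balance_pence"] / 100,
    }


def account_v2() -> dict:
    """Version 2 changes the shape, so it is published alongside version 1."""
    return {
        "owner": {"first": _ACCOUNT["owner_first"], "last": _ACCOUNT["owner_last"]},
        "balance": {"amount": _ACCOUNT["balance_pence"], "unit": "pence"},
    }


HANDLERS = {"v1": account_v1, "v2": account_v2}


def get_account(version: str) -> dict:
    """Dispatch on the version the client asked for."""
    return HANDLERS[version]()
```

Clients written against version 1 keep working while new clients move to version 2, which is the whole point of making versioning a server responsibility.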

Server has got no clue about the client

In CSDS, server has no reliance on its knowledge about the client calling it. Of course, in HTTP, it can use user-agent header for statistical purposes. Or in the case of OAuth, it can know the name of the application and perhaps even limit the scope of the public API according to that but this is an authorization concern - authentication and authorization of the calls are server concerns. In other words, it should not make any assumptions about the client, client device or its capabilities.

Server got no clue about the client - one of the clients could be a server itself (not shown, could not find the original visio to add the server :( )

CSDS, HATEOAS and hypermedia

CSDS is not compatible with HATEOAS. Why? Well, HATEOAS talks about hypermedia (a server concern but part of public domain) as the engine of the application. What application? Server got no clue about it. When I am listening to Spotify, I can tweet the song I am listening to. Publishing this tweet is no different to doing this from a twitter client, TuneIn radio client, etc. Server does not know what application is using it (although as we said it could know the name of the application in OAuth as a string) or where in the application this tweet happens. As such it cannot be the engine of the application. Also in a mash-up application, no single server could be the engine - there are multiple servers. 

CSDS regards hypermedia as an important aspect of REST; it is a semantic web of interconnected resources. Client will have full coherence of the axes of such relationships and can effectively use them to navigate the semantic web - since it is part of the public domain. But for it to become the engine of the application is server dominance.

Server has a lot to worry about

CSDS acknowledges the utmost complexity of the server. Reliable storage, big data, high availability (HA), sharding, resilience, redundancy, etc. are all server concerns. Implementing the right server-side architecture is not easy; as such, the server is best off focusing on its own concerns rather than dominating the client by implementing the client's concerns too.

CSDS leads to a cleaner SOA, especially when client itself is a server

Recent server-side challenges and trends in achieving a scalable and highly available architecture have sharpened the focus on getting the balance right in client-server separation.

Having listened to Adrian Cockcroft's talk in Cambridge on Netflix's architecture and having read Daniel Jacobson's book, I have a lot of appreciation for what these guys are doing, and I think this will become a roadmap for a cleaner and more decoupled SOA. Adrian explained how, at Netflix, they have used a web of micro-SOA services through REST APIs to create a resilient architecture, whereby they even send chaos monkeys and gorillas to bring down servers or even server zones. I believe this is only possible by separating the domains of each micro-SOA service. So a lot of kudos to them, and it is a place to watch.

Sunday 11 November 2012

NoSQL Benchmarking - Redis, MongoDB, Cassandra, RavenDB and SQL Server

Introduction

[Level C2] In the last post, I explained how limitation can lead to a better solution. This is an integral part of the NoSQL offering for me: the fact that we cannot abuse it by storing logic as well as data.

In this post I am going to report my NoSQL benchmark results. There are quite a few benchmarks already reported and available out there, but this one focuses on NoSQL offerings available on Windows. If you are not a Windows developer, you might still find the results useful. My benchmark treats all these technologies as key/value stores - although most of them have many other features.

The code used for benchmarking is available in GitHub.

Disclaimer

In a distributed system, performance is not as important as scalability - which is not compared here. Take this for whatever it is worth. I have used a method (key/value storage/retrieval, described below) which might or might not match the way you intend to use these technologies.

Use a storage system that suits you best. This report does not necessarily recommend or disapprove of a particular technology. Performance of the NoSQL stores is also affected by the client technology used. However, this is a price we normally pay, so I think it is relevant to include it in the measurement. The variety in usage of these technologies means some results might have been skewed by the serialisation techniques.

Each of these technologies has different degrees of availability, consistency and partition tolerance. They also present different settings that can affect these variables. As such, the results of this benchmark must be interpreted in the light of them.

Contenders

Here I briefly explain the technologies compared.

Redis

Redis is a high-throughput caching/NoSQL technology written in C. It is mainly available on Linux, but Windows ports can be used, although replication is currently not supported on Windows. My client library of choice is BookSleeve, a fully async library by uber-geek Marc Gravell. There is an alternative library available which is part of ServiceStack.

I used the Redis port by MSOpenTech which can be downloaded from here. Full instructions for installing and running it are provided there. I have used all the settings out of the box. Redis provides two different persistence mechanisms: RDB and AOF. RDB is faster and is the default, but it trades away some reliability, which may or may not matter depending on what you need from it.

To clear the data, just delete inst1 folder. Version used was 2.4.11.

MongoDB

MongoDB is written in C++ and has a stable port for Windows. It is a classic document database and provides its own query language, which has been abstracted away nicely by the NoRM library. It provides many nice querying features which we do not use here.

Downloading and installation are easy - just unzip it. You only need to create the folder C:\data\db, which is the default storage area, and then run mongod.exe.

To wipe out the data, just delete the contents of the C:\data\db folder. Version used was 2.0.2.

RavenDB

RavenDB is an emerging document database fully written in C#. It comes with its own client library which uses HTTP for communication. It is a transactional database which is an important feature. It also has features such as map-reduce.

Downloading is easy and no installation is required. Just unzip the package and run Raven.Server.exe. To wipe out the data you just need to delete the data folder. Version used was 1.0.960.

UPDATE
RavenDB's recommended approach is to open and close the session every time which I also used for tests (there is a default cap of 30 operations per session). I also tried a single session for the whole lot but performance was actually worse.

Cassandra

Out of all the NoSQL stores I know, this one looks the most like an RDBMS - and I know it the least. It is written in Java and is schema-full. Its power is high throughput and the ability to scale out without limit, unlike a conventional RDBMS.

There are currently two client libraries available for accessing Cassandra, of which I used Fluent Cassandra by Nick Berardi. Version of Cassandra used was 1.1.6.

SQL Server

OK, this one is a conventional RDBMS! But nothing stops you from using it as a key/value store and as we will see it competes very well with NoSql stores - while being transactional.

There are tens, if not hundreds, of libraries for accessing SQL Server and I used none of them. My approach was raw ADO.NET over stored procedures. I kept the connection open - normally we would not do that, but with a single connection we would not be using connection pooling, so I think my approach is more realistic.

I have a script that generates the table and stored procedures for database "benchmark". Please find/replace if your database is called something else. To empty the table I used truncate. Version of the SQL Server was SQL Server Express 2008.

Methods

I used my personal laptop for the test, which had 6GB of RAM and a 256 GB Samsung SSD. CPU would never reach 100% as the test was single-threaded. I am planning to run another set of tests in a multi-threaded fashion.

SQL Server Express and Cassandra were running as services while the others were running as normal exes (similar to daemons). All of the servers were used with out-of-the-box settings. They all ran on their standard ports. All servers were running on localhost, so no network latency was incurred.

I used a GUID string as the key (with no hyphens) and a randomised byte array of 4-20KB (random size) as the value. The process without storage had negligible effect - taking 0.2 millisecond per operation. Each operation consists of inserting the value against the key, retrieving value using the key and then asking for a non-existent key.
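For readers who want a feel for the shape of a single operation, here is a minimal Python sketch. It is not the benchmark code: a plain dict stands in for the store, and the function names are made up for illustration.

```python
import os
import random
import uuid


def make_key() -> str:
    """GUID string with no hyphens (32 hex characters)."""
    return uuid.uuid4().hex


def make_value(min_kb: int = 4, max_kb: int = 20) -> bytes:
    """Randomised byte array with a random size between 4 and 20 KB."""
    size = random.randint(min_kb * 1024, max_kb * 1024)
    return os.urandom(size)


def run_operation(store: dict) -> None:
    """One operation: insert, read back, then ask for a non-existent key."""
    key, value = make_key(), make_value()
    store[key] = value                       # insert the value against the key
    assert store.get(key) == value           # retrieve the value using the key
    assert store.get(make_key()) is None     # look up a key that was never stored
```

Swapping the dict for a real client (Redis, MongoDB and so on) would change only the three lines inside run_operation.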

I ran 10,000 operations and took the average for each operation, measured in milliseconds (see results).
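The averaging itself is simple arithmetic: total elapsed time divided by the number of iterations. A minimal Python sketch follows, assuming a hypothetical helper called average_ms and using time.perf_counter for wall-clock timing; the actual benchmark code may measure things differently.

```python
import time


def average_ms(operation, iterations: int = 10_000) -> float:
    """Run `operation` repeatedly and return the mean wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1000 / iterations
```

For example, average_ms(lambda: store.get("some-key")) would report the mean cost of a lookup against whatever store is in scope.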

Serialisation or conversion to base64 would happen for all but SQL Server and Redis.
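To see why that conversion is not free, consider what base64 does to a binary payload. The Python sketch below is only an illustration of the encoding overhead, not what each client library does internally.

```python
import base64
import os

payload = os.urandom(4 * 1024)           # 4 KB of random bytes
encoded = base64.b64encode(payload)      # ASCII-safe representation
decoded = base64.b64decode(encoded)      # round trip back to the raw bytes

# Base64 emits 4 output bytes for every 3 input bytes (with padding),
# so the encoded form is roughly a third larger than the original.
expected_len = 4 * ((len(payload) + 2) // 3)
```

So every value sent as base64 costs both the CPU time to encode and decode and about 33% more bytes to move, which the stores receiving raw bytes do not pay.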

If you are running the test yourselves, make sure you run the exe outside the IDE (Visual Studio), since otherwise RavenDB will appear to perform very poorly because searches for non-existent keys throw exceptions.

Results

Since performance degrades when there are more items in the database, I generated results in two scenarios: an empty database, and a database with 200,000 items in the same collection/table/column family, etc.

This is the result for the empty database:



As can be seen, Redis is ultra fast (0.7ms) while RavenDB is slowest with 11.3ms:



Performance degrades in some stores when we have more items in the store (this is especially marked in SQL Server), but the ordering does not change. Redis still shines with the best performance and RavenDB is slow compared to the rest. So this is the result when each store already contains 200,000 items:



This is the breakdown of the results:




Conclusion

First of all, it is not about who is the fastest; it is about making an informed decision considering all parameters, including speed. When choosing a NoSQL store you would consider other factors which do not come into this benchmark, and in fact some cannot be benchmarked.

In terms of results, Redis in all scenarios provides the best performance. I think this also in part is due to its excellent totally async client library. Marc Gravell has done a ton of work to make the client efficient.

The performance of MongoDB, Cassandra and SQL Server is close. SQL Server proves to be a valuable tool in a simple key/value scenario, if you already pay for its license. RavenDB is the slowest of all - although, to be fair, it is still under development.

Feel free to run the test yourselves. Find the code on GitHub here.

In my next series of tests, I will run the tests in a multi-threaded fashion.