Monday, 7 January 2013

Performance series: Is thread contention bad?

[Level T2] For better or worse, I have recently been working a lot in the area of performance optimisation and bottleneck hunting.

Performance monitoring is an art as much as it is a science. On one hand you have all the various metrics (simple, complex or derived) that a system can produce (at each level, from the client to the presentation layer to the middleware to the database) and on the other hand you need the art of comparison and the ability to decide whether to ignore or act upon a particular change. The work can be exciting at times and slow and frustrating at others - performance bottlenecks can be difficult to find, but when found they are extremely gratifying!

I am going to write a separate post on monitoring the performance of an ASP.NET web application, but I will use this post to look at a particular scenario: thread contention. This is especially important given the increasing popularity of asynchronous programming and the use of .NET 4.0's TPL and .NET 4.5's async/await.

So let's imagine this is the problem statement:
From version 1 of the software to version 2 of the software, thread contention has increased substantially. 
And imagine you are responsible for your application's performance. What would you do?

What is Thread Contention?

In .NET, contention rate/sec is the performance counter that measures the level of contention as far as managed threads are concerned: the number of unsuccessful managed lock attempts per second. Where would you take a lock? At the point of synchronisation. So contention rate is a measure of the rate of synchronisation rather than of multi-threading.

To confirm the above statement, let's use a console application and monitor contention rate/sec for its process in the performance monitor. Initially, let's try this code:

using System;
using System.Threading.Tasks;

static void Main(string[] args)
{
    // CPU-bound work across many threads, but no shared resource and no locks
    Parallel.For(1, 1000000, (i) => Math.Asin(Math.Log(i)));
    Console.Read();
}

In this case the contention rate stays at 0: we have used multi-threading, but there is no shared resource requiring locking, hence no contention. The second scenario is to write the iteration counter to the console:

using System;
using System.Threading.Tasks;

static void Main(string[] args)
{
    // The console is a shared resource, synchronised internally by the Console class
    Parallel.For(1, 1000000, (i) => Console.WriteLine(i));
    Console.Read();
}

Now the performance monitor will show a very high number for the contention rate (on my machine it averages around 200). But wait, we did not use any locking here! Well, we did not, but Console did - all calls to the Console class are synchronised, so it internally uses locking.
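
We can reproduce the same effect with an explicit lock. In this minimal sketch every iteration competes for a single shared lock object, so the unsuccessful attempts show up in the contention counter (the exact figure will vary by machine):

using System;
using System.Threading.Tasks;

static readonly object _sync = new object();

static void Main(string[] args)
{
    // Every iteration competes for the same lock, so unsuccessful
    // lock attempts show up in the "Contention Rate / sec" counter
    Parallel.For(1, 1000000, (i) =>
    {
        lock (_sync)
        {
            Math.Asin(Math.Log(i));
        }
    });
    Console.Read();
}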

Can Thread Contention be healthy?

Absolutely! Contention is just a measurement and you should not base your judgement on guide metrics alone. Guide metrics such as "% Processor Time", "Contention Rate/sec" and "Number of Threads" are to be looked at only in the presence of deteriorated goal metrics such as throughput (related to scalability) and performance. For example, an increase in response time or a drop in the maximum number of concurrent requests/users is something to look at, but an increase in % Processor Time can be a sign of higher/better utilisation of the system after removing a logical bottleneck (for example, coarse locking).

So always look at guide metrics in the light of goal metrics.

An important example of this, in the case of contention, is the use of IO completion ports. Windows defines two types of threads: worker threads and IO completion port threads. Worker threads are known to most developers and can be used for processing a piece of work in the background; the .NET runtime's ThreadPool contains worker threads. IO completion port threads are Windows's low-overhead mechanism for carrying out an IO-bound operation and notifying the commissioning thread when the job is done. These threads will be used by the .NET runtime as soon as you use BeginXXX and EndXXX on an IO-related class such as FileStream.
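
For example, here is a minimal sketch of the Begin/End pattern on FileStream (the file path is illustrative). Opening the stream with FileOptions.Asynchronous requests overlapped IO, so the read completes on an IO completion port thread instead of blocking a worker thread:

using System;
using System.IO;

static void Main(string[] args)
{
    // FileOptions.Asynchronous requests overlapped IO so the read is
    // completed on an IO completion port thread
    var stream = new FileStream(@"c:\temp\test.dat", FileMode.Open,
        FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous);
    var buffer = new byte[4096];
    stream.BeginRead(buffer, 0, buffer.Length, (asyncResult) =>
    {
        // This callback runs on an IO completion port thread
        int bytesRead = stream.EndRead(asyncResult);
        Console.WriteLine("Read {0} bytes", bytesRead);
        stream.Dispose();
    }, null);
    Console.Read();
}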

ContentionTest is a simple project that illustrates the use of IO completion ports. Just try it for yourself and see that setting the async variable to true will improve the performance but will make the contention rate skyrocket.

Conclusion

Always look at guide metrics objectively. Do not try to improve guide metrics; focus only on goal metrics. If a drop in goal metrics coincides with a change in a particular guide metric, review the relationship.

Thread contention rate is a measurement of thread synchronisation. This rate can go up in various conditions, such as increased throughput, use of IO completion ports or removal of a bottleneck, in which case there is no cause for concern. If high contention coincides with low throughput, consider refactoring the synchronisation.

Saturday, 8 December 2012

5 levels of media type

[Level C4] The HTTP spec defines the use of media type in the value of several headers including Content-Type and Accept. Server and client can engage in the process of content negotiation to decide on the best suitable media type.

With the rising popularity of REST systems and the adoption of pure HTTP APIs, we are using media type not only for delivering content formatting information but also as metadata for domain-level application data. Advanced use of the media type and its controversies have been discussed before.

IANA is responsible for registering internet media types. RFC 4288 defines the process of media type registration through IANA.

As can be seen below, registration of media types increased around the 2000 dot-com boom and then declined. This trend was reversed by REST awareness and the resurgence of web 2.0 around 2006. Recently, we see a slight decline in registrations - partly explained by the use of private media types.

Source: IANA - http://www.iana.org/assignments/contact-people.html

While there will always be a need for registering new formats, media type has been used to describe not only the format but also the schema and domain-level description of the application data. Use of media type for versioning resources is a controversial yet fairly popular trend.

The problem with creating new media types for describing anything other than format is that you will be requiring the clients to understand the new media type - hence the client burden of the media type. Such clients may well be capable of handling the format (and all they might need is to understand the format) but unable to comprehend the media type. For example, an XmlSerializer is capable of handling the XML format and that is all it cares about.

One attempt to preserve the formatting information of the media type yet provide higher-level constructs is the use of + in the second part of the media type, as in application/rss+xml, which combines formatting with schema (see below). But as I explained before, many systems use dictionary-based media type processing and cannot separate format and schema information; also, this is a convention rather than a canonical implementation.

I will review the logical levels of media type and then propose a solution for the current issues we are experiencing in the industry.

5 levels of media type

Media type deals with different levels of information, and any solution needs to take into account backward-compatibility, interoperability and extensibility. As the level goes up, the number of clients able to comprehend and interact with that level diminishes.

The lowest level of information is whether the content is human-readable. This was initially envisaged in the text/* media types, but there again, human-readability was mixed with formatting. As we know, text/xml and text/javascript were later converted to application/xml and application/javascript.

The next level is formatting, i.e. how a parser/processor can read and understand the media type. This is the most important aspect of a media type. Examples are application/xml, image/* and video/*.

Schema is a common superset of the formatting. Here we define different schemata, commonly within the same format. Examples of this are application/rss+xml, application/atom+xml and application/vnd.collection+json.

The domain level is where we see a lot of new interest. Many companies are using private and public APIs for exposing their data and services. As such, a recent trend is to take the schema to the next level, where it defines a domain object model. As we have described before, the domain model is part of the server's public domain and could be in the form of command or query messages.

Including version information for a domain object model is the highest level of a media type. This is usable by only the subset of clients capable of version coherence and version content negotiation.

In fact, each client can use the media type at a particular level. So forcing the higher levels of media type upon clients will reduce interoperability.

Solution

I propose using additional parameters for preserving interoperability and backward-compatibility while allowing rich higher-level information. HTTP 1.1 already allows custom additional parameters to be defined.

For example, if I am using application/atom+xml for passing a customer domain object model in a CRM API, I can keep the format as the main value of the content-type header and include the rest of the information as parameters (note: we have to encode the / in parameter values since it is not allowed there according to the HTTP spec):
content-type: application/xml;schema=application%2fatom+xml;is-text=true;domain-model=MyDomain.Customer;version=1.0.2.0
Or alternatively use the schema level information as the main value:
content-type: application/atom+xml;format=application%2fxml;is-text=true;domain-model=MyDomain.Customer;version=1.0.2.0
Please note that the values of the parameters need to be URL-encoded.
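
As a sketch of how a client could consume such a header without a dictionary of compound media types, .NET's MediaTypeHeaderValue (in System.Net.Http.Headers) already exposes the main value and the parameters separately; the header value below is the illustrative one from above:

using System;
using System.Net.Http.Headers;

static void Main(string[] args)
{
    // Illustrative header value following the second alternative above
    var contentType = MediaTypeHeaderValue.Parse(
        "application/atom+xml;format=application%2fxml;is-text=true;domain-model=MyDomain.Customer;version=1.0.2.0");

    Console.WriteLine(contentType.MediaType); // application/atom+xml

    foreach (var parameter in contentType.Parameters)
    {
        // URL-decode each value, since / is encoded as %2f
        Console.WriteLine("{0} = {1}", parameter.Name,
            Uri.UnescapeDataString(parameter.Value));
    }
}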

Compared to the single-token approach, this preserves interoperability and backward-compatibility while allowing for extensibility.
Difference of single-token vs. 5LMT. Please note the values need to be URL-encoded, so application/xml will appear as application%2fxml

Conclusion

Registering media types should be done mainly for new formats. The problem with using a single token is that by setting the token at any level, the lower levels need to be inferred - as such the client needs to understand the exact media type.

The 5-level media type, compared to the single-token approach, provides a more robust solution for extensibility and interoperability of clients and servers in private and public APIs.

Saturday, 24 November 2012

Introducing Client-Server Domain Separation

[Level C4]

If you have followed my posts on REST and its client-server implications, you already know I have a thing for the client-server relationship.

I have been thinking about Client-Server Domain Separation (CSDS) for a while, and I think it is time to do a brain dump.

TLDR;

So here is the definition of CSDS - if you don't want to read the whole post. CSDS defines a single constraint which is just an expansion of REST's client-server constraint:
Client and server must define and live in their own bounded context
This will lead to 1) cleaner separation of concerns between clients and servers and 2) adoption of the API as the building block of complex systems in an SOA world. CSDS is also not compatible with HATEOAS - as we will see. If you want to find out how such a seemingly trivial constraint can have such an impact, read the rest.

Background

REST defines a set of constraints that will lead to better architecture and design - well, that is the claim, but I personally do believe in it. One of those constraints is client-server. As far as REST is concerned, client and server are decoupled entities - these ideas were successfully used in the design of HTTP. Yet considering the limitations of the clients back in the days when the REST dissertation was being written, I think we need to revisit this constraint.

I suppose it all started with smartphones. We now have more processing power in our pockets than the Apollo mission that first landed on the moon. Native apps allow for developing pretty complex applications, while the HTML/JS app has become a reality (better browsers, adoption of HTML5, better javascript runtimes and development tools).

The dilemma we face now is deciding where to implement a functionality (in other words, where to put the business logic): client or server. Back in the late 90s or early 2000s we did not have a choice - we had to implement most of the functionality on the server and use client-side code mainly for limited validation. We lived in a time of server dominance. Now we have the liberty to implement a sizeable chunk of functionality in both places, but getting the balance right is difficult and has led to two main opposing camps: Single Page Application followers and server domination supporters. I am more inclined towards the first, and shall explain below why CSDS leans more towards the SPA side than server domination - not as a matter of taste but as a matter of principle: the question is whose concern a functionality is.

Other changes in the industry have contributed to the need to define client and server. Nowadays, it is only incidental that the client code in an HTML/JS application is served from the server - we can package javascript files with the application, as in PhoneGap or Windows 8 metro applications. In mash-up applications, there is no single server defining the flow, as such HATEOAS is meaningless.

Introduction

Here I briefly re-iterate what I explained in two related posts.

First of all, in order to decide where a functionality belongs - server or client - we need to understand whose concern it is. I talked about server-concern, client-concern and mixed-concern and explained their anti-patterns, each with an example.

In this post I tried to define client and server - as they stand now. So I am going to go back to the same definitions - with the client definition slightly changed.

Server is responsible for defining a domain (server domain) and maintaining its state and consistency/integrity. Server is usually very complex but a good server hides its complexity behind its API. Server should not expose its internals to the outside world.

Server in CSDS


Client is responsible for using the services of server(s) to provide value to the end user - either directly if it is a client device, or indirectly if the client itself is a server. Client can also maintain state, but that is not its primary function.

We also touched on Application. For me: application => value => user => client. Application, use and usability are mainly client concerns. Having said that, the server can define a secondary level of API which uses the underlying basic APIs and presents a more useful representation of its state. As such, the server is not completely oblivious to the user/value. For example, a high-street bank could have an API returning the 10 most recent transactions on your account, defined as a resource at /account/{id}/transaction/mostrecent, since this is a very common query. This is instead of, or in addition to, providing an API which allows the client to define date ranges and the number of transactions returned. The application sitting on a client device has the liberty to show only 6 of them if its usability mandates such a restriction.
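
As a sketch of such a secondary-level resource, using ASP.NET Web API conventions (the controller, repository and route here are hypothetical, purely for illustration):

using System.Collections.Generic;
using System.Web.Http;

public class Transaction
{
    public string Id { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical repository abstraction - not part of any real API
public interface ITransactionRepository
{
    IEnumerable<Transaction> GetMostRecent(string accountId, int count);
}

// Hypothetical controller; the route account/{id}/transaction/mostrecent
// is assumed to be configured in the route table
public class TransactionController : ApiController
{
    private readonly ITransactionRepository _repository;

    public TransactionController(ITransactionRepository repository) // injected
    {
        _repository = repository;
    }

    // GET /account/{id}/transaction/mostrecent
    public IEnumerable<Transaction> GetMostRecent(string id)
    {
        // The server exposes the common query as its own resource; the
        // application on the client device may choose to show fewer
        return _repository.GetMostRecent(id, 10);
    }
}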

CSDS definition

So CSDS can be seen as a superset style on top of REST, with a single constraint on REST's client-server constraint. This is similar to, for example, HATEOAS which builds upon the hypermedia constraint. So the constraint is:
Client and server must define and live within their own bounded context.
In other words, the decision of where to put a functionality comes down to ascertaining whose concern it is. Is that it? Yes, that is it. Yet this is going to have quite a big impact as well as important repercussions.

First of all, since each defines its own boundary, the client's domain is separate from the server's domain. Their interaction is only through the API. The "domain objects" in the diagram above are usually regarded as view-models, which are translated versions of the corresponding models in these two different domains - a translation which in DDD terms is called a context map.

By keeping client and server in their own bounded contexts, the internals of each can be changed independently. By separating the domains, we achieve the client-server decoupling which is the goal of the client-server constraint - as Fielding puts it.

So here are some of the aspects and implications of CSDS:

Client has full coherence of server's public domain

This means that the client is free to have full coherence of the server's public domain, including all its public API, domain objects and schemata. It is able to call, discover and make full use of the public API in any order or fashion it needs.

Server is responsible for versioning its public domain

In CSDS, the client building a dependency on top of the public domain is not regarded as harmful and is actually seen as essential. The server already knows that by changing the public domain it will be breaking clients; as such, the server is responsible for versioning its public domain.

Server has got no clue about the client

In CSDS, the server has no reliance on any knowledge about the client calling it. Of course, in HTTP it can use the user-agent header for statistical purposes. Or, in the case of OAuth, it can know the name of the application and perhaps even limit the scope of the public API accordingly, but this is an authorization concern - authentication and authorization of the calls are server concerns. In other words, the server should not make any assumptions about the client, the client device or its capabilities.

Server got no clue about the client - one of the clients could be a server itself (not shown, could not find the original visio to add the server :( )

CSDS, HATEOAS and hypermedia

CSDS is not compatible with HATEOAS. Why? Well, HATEOAS talks about hypermedia (a server concern but part of the public domain) as the engine of the application. What application? The server has got no clue about it. When I am listening to Spotify, I can tweet the song I am listening to. Publishing this tweet is no different from doing it from a twitter client, a TuneIn radio client, etc. The server does not know what application is using it (although, as we said, it could know the name of the application in OAuth as a string) or where in the application this tweet happens. As such it cannot be the engine of the application. Also, in a mash-up application no single server could be the engine - there are multiple servers.

CSDS regards hypermedia as an important aspect of REST; it is a semantic web of interconnected resources. The client will have full coherence of the axes of such relationships and can effectively use them to navigate the semantic web - since they are part of the public domain. But for hypermedia to become the engine of the application is server dominance.

Server has a lot to worry about

CSDS acknowledges the utmost complexity of the server. Reliable storage, big data, high availability (HA), sharding, resilience, redundancy, etc. are all server concerns. Implementing the right server-side architecture is not easy; as such, the server is best to focus on its own concerns rather than dominating the client by implementing the client's concerns too.

CSDS leads to a cleaner SOA, especially when client itself is a server

Recent server-side challenges, and trends towards achieving a scalable and highly available architecture, have sharpened the focus on achieving the right balance in the client-server separation.

Listening to Adrian Cockcroft's talk in Cambridge on Netflix's architecture, and having read Daniel Jacobson's book, I have a lot of appreciation for what these guys are doing and I think this will become a roadmap for a cleaner and more decoupled SOA. Adrian explained how Netflix have used a web of micro-SOA services exposed through REST APIs to create a resilient architecture, whereby they even send chaos monkeys and gorillas to bring down servers or even whole server zones. I believe this is only possible by separating the domains of each micro-SOA service. So a lot of kudos to them; it is a place to watch.

Sunday, 11 November 2012

NoSQL Benchmarking - Redis, MongoDB, Cassandra, RavenDB and SQL Server

Introduction

[Level C2] In the last post, I explained how limitation can lead to a better solution. This is an integral part of the NoSQL offering for me: the fact that we cannot abuse it by storing logic as well as data.

In this post I am going to report my NoSQL benchmark results. There are quite a few benchmarks already reported and available out there, but this one focuses on NoSQL offerings available on Windows. If you are not a Windows developer, you might still find the results useful. My benchmark treats all these technologies as key/value stores - although most of them have many other features.

The code used for benchmarking is available on GitHub.

Disclaimer

In a distributed system, performance is not as important as scalability - which is not compared here. Take this for whatever it is worth. I have used a method (key/value storage/retrieval, described below) which might or might not match the way you intend to use these technologies.

Use the storage system that suits you best. This report does not necessarily recommend or disapprove of a particular technology. Performance of a NoSQL store is also affected by the client technology used. However, this is a price we normally pay, so I think it is relevant to include it in the measurement. The variety in the usage of these technologies means some results might have been skewed by the serialisation techniques.

Each of these technologies offers different degrees of availability, consistency and partition tolerance. They also provide different settings that can affect these variables. As such, the results of this benchmark must be interpreted in the light of them.

Contenders

Here I briefly explain the technologies compared.

Redis

Redis is a high-throughput caching/NoSQL technology written in C. It is mainly available on Linux, but Windows ports can be used, although its replication is currently not supported on Windows. The client library of choice is the fully-async library BookSleeve by uber-geek Marc Gravell. There is an alternative library available as part of ServiceStack.

I used the Redis port by MSOpenTech, which can be downloaded from here. Full instructions for installing and running it are provided there. I have used all the settings out of the box. Redis provides two different persistence mechanisms: RDB and AOF. RDB is faster and the default, but it trades off reliability depending on what you need from it.

To clear the data, just delete the inst1 folder. The version used was 2.4.11.

MongoDB

MongoDB is written in C++ and has a stable port for Windows. It is a classic document database and provides its own query language, which has been abstracted away nicely by the NoRM library. It provides many nice querying features which we do not use here.

Downloading and installation are easy - just unzip it. You need to create the folder C:\data\db, which is the default storage area, and then run mongod.exe.

To wipe out the data, just delete the contents of the C:\data\db folder. The version used was 2.0.2.

RavenDB

RavenDB is an emerging document database fully written in C#. It comes with its own client library, which uses HTTP for communication. It is a transactional database, which is an important feature. It also has features such as map-reduce.

Downloading is easy and no installation is required. Just unzip the package and run Raven.Server.exe. To wipe out the data, you just need to delete the data folder. The version used was 1.0.960.

UPDATE
RavenDB's recommended approach is to open and close the session every time, which is what I used for the tests (there is a default cap of 30 operations per session). I also tried a single session for the whole lot, but performance was actually worse.

Cassandra

Out of all the NoSQL stores I know, this one looks most like an RDBMS - and it is the one I know least. It is written in Java and is schema-full. Its power is high throughput and the ability to scale out without limit, unlike a conventional RDBMS.

There are currently two client libraries available for accessing Cassandra, of which I used Fluent Cassandra by Nick Berardi. The version of Cassandra used was 1.1.6.

SQL Server

OK, this one is a conventional RDBMS! But nothing stops you from using it as a key/value store and, as we will see, it competes very well with the NoSQL stores - while being transactional.

There are tens, if not hundreds, of libraries for accessing SQL Server, and I used none of them. My approach was raw ADO.NET over stored procedures. I kept the connection open - normally we would not do that, but with a single connection we would not be benefitting from connection pooling anyway, so I think my approach is more realistic.

I have a script that generates the table and stored procedures for a database called "benchmark". Please find/replace if your database is called something else. To empty the table I used truncate. The version of SQL Server used was SQL Server Express 2008.
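
Here is a sketch of the approach; the stored procedure and parameter names are illustrative rather than the ones the script actually generates:

using System.Data;
using System.Data.SqlClient;

public class SqlServerKeyValueStore
{
    // Connection opened once and kept open for the duration of the test
    private readonly SqlConnection _connection;

    public SqlServerKeyValueStore(string connectionString)
    {
        _connection = new SqlConnection(connectionString);
        _connection.Open();
    }

    public void Put(string key, byte[] value)
    {
        // Illustrative stored procedure name; the real names come from the script
        using (var command = new SqlCommand("InsertKeyValue", _connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@Key", key);
            command.Parameters.AddWithValue("@Value", value);
            command.ExecuteNonQuery();
        }
    }
}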

Methods

I used my personal laptop for the tests; it has 6GB of RAM and a 256GB Samsung SSD. The CPU would never reach 100% as the test was single-threaded. I am planning to run another set of tests in a multi-threaded fashion.

SQL Server Express and Cassandra were running as services while the others were running as normal executables (similar to a daemon). All servers were used with out-of-the-box settings and ran on their standard ports. All servers ran on localhost, so no network latency was incurred.

I used a GUID string as the key (with no hyphens) and a randomised byte array of 4-20KB (random size) as the value. The process without storage had a negligible overhead - taking 0.2 milliseconds per operation. Each operation consists of inserting the value against the key, retrieving the value using the key and then asking for a non-existent key.

I ran 10,000 operations and took the average per operation, measured in milliseconds (see results).
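
For illustration, here is a minimal sketch of the shape of each measured run. The hypothetical IKeyValueStore interface stands in for each store's client library; the real code is on GitHub:

using System;
using System.Diagnostics;

// Hypothetical abstraction over each store's client library
public interface IKeyValueStore
{
    void Put(string key, byte[] value);
    byte[] Get(string key); // returns null for a non-existent key
}

public static class Benchmark
{
    public static double AverageMillisecondsPerOperation(IKeyValueStore store, int operations = 10000)
    {
        var random = new Random();
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < operations; i++)
        {
            // GUID string without hyphens as the key
            string key = Guid.NewGuid().ToString("N");

            // randomised payload of 4-20KB
            var value = new byte[random.Next(4 * 1024, 20 * 1024)];
            random.NextBytes(value);

            // each operation: insert, retrieve, then ask for a non-existent key
            store.Put(key, value);
            store.Get(key);
            store.Get(Guid.NewGuid().ToString("N"));
        }
        watch.Stop();
        return (double)watch.ElapsedMilliseconds / operations;
    }
}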

Serialisation or conversion to base64 would happen for all but SQL Server and Redis.

If you are running the test yourself, make sure you run the exe outside the IDE (Visual Studio), since RavenDB will appear to perform very poorly in that case, as searches for non-existent keys throw exceptions.

Results

Since performance degrades when there are more items in the database, I generated results for two scenarios: an empty database, and a database with 200,000 items already in the same collection/table/column family, etc.

This is the result for the empty database:



As can be seen, Redis is ultra fast (0.7ms) while RavenDB is the slowest with 11.3ms:



Performance degrades in some stores when there are more items in the store, but the ordering does not change. Redis still shines with the best performance and RavenDB is slow compared to the rest. So this is the result when each store already contains 200,000 items (the degradation is especially marked in SQL Server):



This is the breakdown of the results:




Conclusion

First of all, it is not about who is the fastest; it is about making an informed decision considering all parameters, including speed. When choosing a NoSQL store you would consider other factors which do not come into this benchmark - and in fact some cannot be benchmarked.

In terms of results, Redis provides the best performance in all scenarios. I think this is partly due to its excellent, totally async client library - Marc Gravell has done a ton of work to make the client efficient.

The performance of MongoDB, Cassandra and SQL Server is close. SQL Server proves to be a valuable tool for a simple key/value scenario, if you already pay for its licence. RavenDB is the slowest of all - though it is still under development.

Feel free to run the test yourself. Find the code on GitHub here.

In my next series of tests, I will run the tests in a multi-threaded fashion.

Sunday, 28 October 2012

How limitation can be source of goodness - NoSQL, REST and more

[Level C3]

It just dawned on me. I came to a startling realisation that limitation/restriction/constraint - words with negative connotations - can generate creativity and lead to goodness. This is more or less saying "less is more", but looking at it from another angle.

Story of twitter

I do not know about you, but I think Twitter is one of the biggest inventions of recent decades, somewhere along the lines of Gutenberg's printing press or Priestley's soda water. Regardless of what most think about the revolution of social networking, I believe Twitter is not of the same breed - it is not a compressed facebook.

Twitter is centred around a stupidly simple idea: you have 140 characters to express yourself. No more. Yes, it also has re-tweet, follow, favourite, etc. but these are features; if you remove them, twitter will still be more or less twitter, although its usefulness will be limited. But if you remove the 140-character limitation, suddenly it is not twitter anymore. Twitlonger is not twitter. With all due respect to @daltonc, app.net with a limitation much different from 140 characters will not make a new twitter.

But why? Because by limiting you to only 140 characters, you are forced to express yourself more succinctly. It makes you think really hard about what you want to say and remove all that matters less. The limitation leads you towards the ethos of twitter. You cannot write a line in the terms and conditions of twitter asking users to write only intelligent tweets, but twitter has achieved this by enforcing its 140-character limitation.

Why NoSQL matters

They say behind every successful man there is a powerful woman. And I would say behind every inflexible and crippling architecture in the enterprise there is a big legacy database. A database becomes legacy soon after the first release, since the other layers change but the database cannot keep up. What I mean by database is an RDBMS (SQL Server, Oracle, you name it).

We have cut down our processes; we do agile, we do lean, we have continuous integration, we do unit testing and TDD, we do BDD and continuous deployment... we have built a process to minimise the risk and impact of change. Yet it is still so difficult to change, and the hardest thing to change is the database.

The crux of this issue is that business logic creeps into the database. I recently witnessed how a complex calculation had to be done with inaccurate rounding since it had to match the database's calculation logic - yes, there was calculation in the database.

We all believe that we should not put business logic in the database. But why do we keep doing it? Because we can. The problem is that best practices and disciplines are difficult to enforce when the project is late, we have a critical bug to fix or we need to make a quick change to the system, and the easiest solution is to change the stored procedure.

As far as SQL Server is concerned, we can have intra-model logic (calculated fields), domain-wide logic (stored procedures and user-defined functions) and cross-boundary logic (Service Broker) - and you can even deploy compiled code (SQL CLR). And since we can, we will.

And here is why I think NoSQL is useful - apart from all the hype around it. You simply cannot put business logic in it, so you don't. And this leads to better design. So I like NoSQL not because of what I can do but because of what I cannot do. I would agree with Stonebraker's article and accept that NoSQL technologies could be low-tech compared to hi-tech RDBMSs, but I cannot abuse them in the same way. NoSQL focuses fully on storage and retrieval, not on replicating all those features that should not be implemented in a database (XML manipulation, message bus, logic, etc.).

A table in SQL might translate to 5-6 Redis data structures so that you can effectively query and access the data. So there is more work to be done, but I like that, since it makes me really think about what I need to store and what I need to query back (remember twitter?).

Lessons from REST

Regardless of all the bloated hype and endless controversies in interpreting REST, it works. It just simply works. So we have a set of constraints, and if you follow them they will lead to goodness. Example? HTTP.

REST is an absolute example of how constraints can lead to a better design.

Limitation in creative arts

I am a big fan of minimalism, and Philip Glass is one of my favourite contemporary composers. In minimalism, a compact musical idea is repeated to create rhythm, melody and ultimately harmony - usually through layers of repetitive chords. For someone like me who loves minimal music, the Satyagraha opera is the pinnacle of minimal musical expression through a limited set of musical material.

Apart from musical material, limitation in the number of instruments is also an important aspect. The string quartet is one of the most expressive forms of music - and I just love it, be it Beethoven or Shostakovich. The rock counterpart of the string quartet is probably the rock trio (singer as an instrument?), in which some of the best music ever was produced (from Jimi Hendrix and Cream to Rage Against The Machine and Nirvana).

In social and political terms, we experienced an explosion of modern and beautiful art in the eastern bloc during the oppression of the communist governments. Composers and film directors had to find their own language to express their art. Since they could no longer look outside for inspiration, they turned inside, and a new era of creativity and great art flourished: Andrei Tarkovsky, Istvan Szabo, Andrzej Wajda, Shepitko and many more. Shostakovich arguably produced his best works during the fierce Stalinist oppression. It is interesting that when the restrictions were lifted, the artists were no longer able to produce works of the same quality. Wajda's Man of Iron seemed like just a shabby copy of Man of Marble. And Tarkovsky's last two films, made outside Russia, did not feel like the previous ones.

The same oppression created the new wave of Iranian cinema, with the likes of Makhmalbaf, Mehrjui, Kiarostami and others.

Now, should we create an oppressive government so that we get great artistic output?! No, but perhaps we can have a 60's-style drug revolution which does the same :)


Monday, 22 October 2012

Media type: how much can you cram into a single token?

[Level C4]

Introduction

This post discusses the problems associated with the use of a single token as the media type (usually as the main value of the Content-Type header in an HTTP response, or the Accept header in a request) to describe all attributes of the content.

Motivation and background

This has been bugging me for a while. Recently I engaged in a discussion on twitter with Glenn Block @gblock and the rest of the REST enthusiast community on the options for versioning RESTful services. There are generally two camps: those advocating using content negotiation for versioning (putting the version number in the Content-Type header) and those preferring to stick to classic resource-based versioning (including the version number in the URL). Regardless of which one is better, the media type lacks the richness required to express all attributes of the content, and adding version information to a media type is not possible considering its current status.

One of the main problems associated with the use of media type is that its current implementation in various systems is key-based, i.e. it involves matching all or none of the media type. As we will see, this causes considerable problems in the effective consumption of media types.

Media Type

Media type has been described in various RFCs (the main one being RFC 2046), while historically these have been limited to what is known as MIME types. RFC 4288 defines the procedure for registering media types, describing a formal process which needs to be followed to register publicly.

Registering a media type for a public API is all well and good, but as described by this book, the use of private APIs far exceeds the use of public ones, and registering all media types exposed within private APIs is impractical and unwarranted.

Also, with the popularity of REST-based APIs, there are going to be more and more service endpoints exposed. If all such services were to define new media types, we would have an explosion of media types, rendering the current implementation of content negotiation impractical.

Media type is a case of extreme semantic mix-up. A single token has been used to express many different facets of a media type. In fact, the semantic space with all its axes contains many useful points, yet the industry currently uses a very sparse set of points defined as media type values. The rest of this space is unusable - as such, a very inefficient solution.

We will now have a look at these facets/axes.

1- Human-legibility

This is the lowest and least specific level of semantic definition of a media type. It is very simple: either the content of a media type can be read by a human (for example text/plain, application/xml or application/json) or the data is meant for machine comprehension or rendering (for example image/png or video/mpeg).

Having this information separate from the actual media type can help tools such as Fiddler decide whether they can display the text of content whose media type is unknown to the tool. Media types initially used "text" to denote such information (e.g. text/xml or text/javascript) but these have been replaced with their application/* counterparts (application/xml and application/javascript).

2- Formatting

This is the most common and important axis of media type information; it informs the tools/clients which parser/interpreter/renderer to use for consuming the content. text/plain, application/xml, application/json, image/png and video/mpeg are all examples of such use of the media type.

There are several well-known vendor-specific media types in this space, such as application/vnd.ms-excel.

3- Schema

This is a further specialisation of the formatting. Common examples include application/rss+xml or application/hal+json. Basically these mean that in terms of formatting they are the same as their parent (application/xml or application/json), yet they follow a superset schema. The use of the + sign is - as far as I know - not canonical and is merely a convention followed by the industry to add schema to the established formats. Comprehension of this convention would be crucial to the correct interpretation of a media type without the need for a dictionary of all possible values; however, I believe most tools we have at the moment lack such features.

4- Domain/Vendor specific

This is where we see most of the expansion in the media type space. Basically you can output your own media type via your private API. Since you will be the main consumer of the API, integration could be easy, but it is very common for private APIs to go public - especially if they are successful. An example of such media types can be found here.

5- Versioning

Versioning is the highest aspect of a media type and is normally added to domain-specific media types. This is a popular solution to the Web API versioning problem.

For example, you could have application/mydomain.customer.1.1 as opposed to application/mydomain.customer or application/mydomain.customer.1.0

So where is the problem?

Basically information gets lost.

The first problem is that clients might be interested in a lower order of these aspects of the media type, while in order to consume the resource they are forced to comprehend the higher order and extract the axes they are interested in. For example, a tool such as Fiddler could be interested only in whether it can display the information for the end user as plain text. A client capable of consuming XML and deserialising it to objects is only interested in knowing whether the content is XML, while it might be presented with a media type which is essentially XML but has a different value. On the other hand, if a server uses HAL to send domain objects/view models to the client, it either has to use the standard application/hal+json or use the domain-level name of the media type (with or without a version).

Another problem is that the content negotiation process becomes more complex. In the absence of a standard for defining multi-axial media types, most systems implement dictionary-based rules for content negotiation; as such, maintaining the list of possible content types becomes a burdensome task.

A solution

Basically I believe we can solve this by keeping the common media types but using media type extensions in the Content-Type header (or in the Accept header). For example:
Content-Type: application/xml; human-legible=true; domain-name=customer; domain-version=1.1
This will ensure that existing clients and servers do not break, while new clients and servers can use the new extensions for content negotiation and more loosely coupled resource consumption. I will try to expand upon this idea in another post.
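
As a sketch of the server side, the standard .NET header classes can already emit such extension parameters (the parameter names below are the illustrative ones from the example above):

using System.Net.Http;
using System.Net.Http.Headers;

static HttpResponseMessage CreateResponse()
{
    var response = new HttpResponseMessage
    {
        Content = new StringContent("<customer>...</customer>")
    };

    // Keep the common media type as the main value...
    var contentType = new MediaTypeHeaderValue("application/xml");

    // ...and carry the higher-level aspects as extension parameters
    contentType.Parameters.Add(new NameValueHeaderValue("human-legible", "true"));
    contentType.Parameters.Add(new NameValueHeaderValue("domain-name", "customer"));
    contentType.Parameters.Add(new NameValueHeaderValue("domain-version", "1.1"));

    response.Content.Headers.ContentType = contentType;
    return response;
}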

Conclusion

Cramming as much information as possible into a single token and then trying to parse that one token is not a good idea, especially when it comes to the media type, which is the communication bridge between the loosely coupled worlds of HTTP clients and servers.

The media type token value covers 5 different aspects of the resource, and separating these aspects into their own tokens can result in more robust and decoupled systems.

Saturday, 20 October 2012

How PayPal is helping Iranian government's internet censorship

[Level N]

I have difficulty believing what I read on Hacker News, around the same time my account was blocked and closed by PayPal as I tried to pay $3 for an internet proxy to combat the filtering which has rendered the internet pretty much useless in Iran. Did he read my email? I do not know, but it left me frustrated and hopeless at one of the most difficult times of my life.

As some of you might know, I have been going through rough times for the last 18 months. My mother-in-law passed away 3 weeks ago after a long battle, most of which my wife spent with her - which is a consolation.

So when I visited her and the family in Iran (that's where I am from) around 2 months ago, I realised that internet censorship is so bad that the internet has become really unusable. Sites such as twitter, which I am addicted to, are obviously blocked as they played an important role in Iran's suppressed Green Revolution, arguably the first Twitter Revolution in the world. In fact even this blog that you are reading, and anything hosted on Google's blogspot, is filtered - Iran had the highest number of bloggers in the Middle East and a number of them are in prison. Even Gmail gets its share and is blocked from time to time. According to some reports, 40% of Iranians (30 million) use the internet, which is second in the Middle East after Israel, and as far as I know most of them use internet proxies, anti-filters, anonymisers or VPNs to bypass the censorship. So even if we say only half use anti-censorship tools, we are talking about 15 million people. I had actually set up a dedicated PC in the UK for my wife to use over a remote desktop connection (RDP), since sometimes these proxies are found by the government and blocked. But the slow speed of the internet (intentionally kept low) makes it almost useless, since each refresh of the screen takes a few seconds.

So how does PayPal come into this? Most of these companies only accept PayPal. And PayPal blocks any account if it realises the IP is from Iran - regardless of the amount, who the receiver is, how long the account has been in use or the behaviour of the account. Why? That is a very good question, but maybe because someone is trying to purchase nuclear equipment or funding terrorism or ...! Honestly, is that not silly? Paying $3 for an anti-censorship filter is illegal because you are connected from Iran? If anyone wanted to use their PayPal account for illegal activities, they would surely use a proxy first to mask their IP. This only affects ordinary people like me who are trying to pay for proxies, as with the sanctions you surely cannot buy anything that ships to Iran anyway.

Now, out of everyone, I am among the people who would least wish to help Iran's government. My family was struck by this very government when my uncle, who worked as a political analyst for the British Embassy, was arrested by the authorities and charged with spying back in 2009. He was released after months, but due to constant pressure and persecution from the authorities he had to flee Iran and has now resumed his work at the Foreign Office in the UK.

So where does this leave me?

Well, it leaves my account blocked and closed. Having come back, I cannot use my account anymore. The emails I have sent have been met with utter disinterest and a "We don't care" attitude. And I have a really hard time believing that the PayPal CEO does read complaint emails. I think I might have been rash in my tone in some emails, but my frustration was extreme because of the injustice.

But I am only one among many. The lives of many millions of Iranians have been affected by sanctions. Ordinary people suffer at the hands of the brutal government, yet they find no consolation in the way they are treated outside Iran, especially by PayPal. For them, the internet is the only way out of the oppression, and blocking the purchase of anti-censorship accounts is standing side-by-side with the Iranian regime. Does this make you happier, Mr. David Marcus?