Sunday, 11 November 2012

NoSQL Benchmarking - Redis, MongoDB, Cassandra, RavenDB and SQL Server

Introduction

[Level C2] In the last post, I explained how limitations can lead to a better solution. For me this is an integral part of the NoSQL offering: the fact that we cannot abuse it by storing logic as well as data.

In this post I am going to report my NoSQL benchmark results. There are quite a few benchmarks already reported and available out there, but this one focuses on NoSQL offerings available on Windows. If you are not a Windows developer, you might still find the results useful. My benchmark treats all these technologies as key/value stores, although most of them have many other features.

The code used for benchmarking is available on GitHub.

Disclaimer

In a distributed system, performance is not as important as scalability, which is not compared here. Take this for whatever it is worth. I have used a method (key/value storage/retrieval, described below) which might or might not match the way you intend to use these technologies.

Use the storage system that suits you best. This report does not necessarily recommend or disapprove of a particular technology. The performance of a NoSQL store is also affected by the client technology used. However, this is a price we normally pay, so I think it is relevant to include it in the measurement. The variety in how these technologies are used means some results might have been skewed by the serialisation techniques.

Each of these technologies offers a different degree of availability, consistency and partition tolerance. They also expose different settings that can affect these variables. As such, the results of this benchmark must be interpreted in the light of these differences.

Contenders

Here I briefly explain the technologies compared.

Redis

Redis is a high-throughput caching/NoSQL technology written in C. It is mainly available on Linux, but Windows ports can be used, although replication is currently not supported on Windows. My client library of choice is BookSleeve, a fully async library by uber-geek Marc Gravell. There is an alternative library available which is part of ServiceStack.

I used the Redis port by MSOpenTech, which can be downloaded from here. Full instructions for installing and running it are provided there. I have used all the settings out of the box. Redis provides two different persistence mechanisms: RDB and AOF. RDB is faster and is the default, but it trades away some reliability, which may or may not matter depending on what you need from it.

To clear the data, just delete the inst1 folder. The version used was 2.4.11.

MongoDB

MongoDB is written in C++ and has a stable port for Windows. It is a classic document database and provides its own query language, which has been abstracted away nicely by the NoRM library. It provides many nice querying features which we do not use here.

Downloading and installation are easy: just unzip the package. You only need to create the folder C:\data\db, which is the default storage area, and then run mongod.exe.

To wipe out the data, just delete the contents of the C:\data\db folder. The version used was 2.0.2.

RavenDB

RavenDB is an emerging document database written entirely in C#. It comes with its own client library, which uses HTTP for communication. It is a transactional database, which is an important feature. It also has features such as map-reduce.

Downloading is easy and no installation is required. Just unzip the package and run Raven.Server.exe. To wipe out the data you just need to delete the data folder. The version used was 1.0.960.

UPDATE
RavenDB's recommended approach is to open and close the session every time, which is what I did in the tests (there is a default cap of 30 operations per session). I also tried a single session for the whole lot, but performance was actually worse.

Cassandra

Of all the NoSQL stores I know, this one looks the most like an RDBMS, and it is the one I know the least. It is written in Java and is schema-full. Its strengths are high throughput and, unlike a conventional RDBMS, the ability to scale out without limit.

There are currently two client libraries available for accessing Cassandra, of which I used Fluent Cassandra by Nick Berardi. The version of Cassandra used was 1.1.6.

SQL Server

OK, this one is a conventional RDBMS! But nothing stops you from using it as a key/value store and, as we will see, it competes very well with the NoSQL stores while being transactional.

There are tens, if not hundreds, of libraries for accessing SQL Server and I used none of them. My approach was raw ADO.NET over stored procedures. I kept the connection open. Normally we would not do that, but since a single connection would not be making use of connection pooling, I think my approach is more realistic.

I have a script that generates the table and stored procedures for a database called "benchmark". Please find/replace the name if your database is called something else. To empty the table I used truncate. The version of SQL Server was SQL Server Express 2008.
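
To make the idea of a relational table used as a key/value store concrete, here is a minimal sketch in Python. It is not the script or the ADO.NET code from the benchmark: it uses SQLite from the Python standard library as a stand-in for SQL Server, and the table name `kv`, the column names and the helper functions are all assumptions made for illustration. SQLite has neither stored procedures nor TRUNCATE, so plain parameterised statements and DELETE take their place. As in the test, one connection is opened once and reused.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open one long-lived connection and create the key/value table."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: a string key and a binary value.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Store the value against the key, overwriting any previous value."""
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
    conn.commit()


def get(conn, key):
    """Return the stored bytes, or None when the key does not exist."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return None if row is None else row[0]


def clear(conn):
    """Empty the table (SQLite has no TRUNCATE, so DELETE is used)."""
    conn.execute("DELETE FROM kv")
    conn.commit()
```

A call such as `put(conn, key, value)` followed by `get(conn, key)` mirrors the insert and retrieve steps of the benchmark, and `get` with an unknown key returns `None` rather than raising.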

Methods

I used my personal laptop for the tests, which had 6GB of RAM and a 256 GB Samsung SSD. The CPU would never reach 100% as the test was single-threaded. I am planning to run another set of tests in a multi-threaded fashion.

SQL Server Express and Cassandra were running as services, while the others were running as normal executables (similar to daemons). All servers were used with out-of-the-box settings and ran on their standard ports. All servers were running on localhost, so no network latency was incurred.

I used a GUID string as the key (with no hyphens) and a randomised byte array of 4-20KB (random size) as the value. Running the process without any storage had a negligible effect, taking 0.2 milliseconds per operation. Each operation consists of inserting the value against the key, retrieving the value using the key and then asking for a non-existent key.

I ran 10,000 operations and took the average time per operation, measured in milliseconds (see results).
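
The procedure above can be sketched roughly as follows. This is a Python illustration, not the C# code from the GitHub repository. `InMemoryStore` is a hypothetical stand-in for a real client, the function names are mine, and timing only the three store calls (rather than key and value generation as well) is an assumption.

```python
import os
import random
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a real client (Redis, MongoDB, ...)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None for a missing key instead of raising.
        return self._data.get(key)


def make_item(rng):
    """Build one pair: a GUID key without hyphens and a 4-20 KB random value."""
    key = uuid.uuid4().hex  # 32 hex characters, no hyphens
    size = rng.randint(4 * 1024, 20 * 1024)
    return key, os.urandom(size)


def run_benchmark(store, iterations=10_000, seed=0):
    """Return the average time per operation in milliseconds.

    One operation = insert, retrieve, then look up a non-existent key.
    """
    rng = random.Random(seed)
    total_seconds = 0.0
    for _ in range(iterations):
        key, value = make_item(rng)
        missing_key = uuid.uuid4().hex  # never inserted
        start = time.perf_counter()
        store.put(key, value)
        fetched = store.get(key)
        absent = store.get(missing_key)
        total_seconds += time.perf_counter() - start
        # Sanity checks, kept outside the timed region.
        assert fetched == value and absent is None
    return total_seconds * 1000.0 / iterations
```

Calling `run_benchmark(InMemoryStore(), iterations=1000)` gives the baseline cost of the loop itself. Swapping in an adapter with the same `put`/`get` methods around a real client would measure that store instead.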

Serialisation or conversion to base64 happened for all stores except SQL Server and Redis.
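
To give a feel for what the base64 step costs, here is a small Python sketch. It does not come from the benchmark code, and the function names are made up for illustration. It shows the round trip a binary value goes through when a client can only store text, and the size of the encoded form.

```python
import base64
import os


def encode_value(value: bytes) -> str:
    """Turn raw bytes into base64 text, as a text-only client would need."""
    return base64.b64encode(value).decode("ascii")


def decode_value(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text)


def encoded_length(raw_length: int) -> int:
    """Base64 emits 4 characters for every 3 input bytes, rounded up."""
    return 4 * ((raw_length + 2) // 3)


if __name__ == "__main__":
    value = os.urandom(4 * 1024)
    text = encode_value(value)
    assert decode_value(text) == value
    print(len(value), "bytes ->", len(text), "characters")
```

The encoded value is about a third larger than the raw one, and each write and read pays for one extra pass over the data, which is one way the serialisation technique can skew the results.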

If you are running the test yourselves, make sure you run the exe outside the IDE (Visual Studio). Inside the IDE, RavenDB will appear to perform very poorly because searches for non-existent keys throw an exception.

Results

Since performance degrades when there are more items in the database, I generated results in two scenarios: an empty database, and a database with 200,000 items in the same collection/table/column family, etc.

This is the result for the empty database:



As can be seen, Redis is ultra fast (0.7ms) while RavenDB is slowest with 11.3ms:



Performance degrades in some stores when there are more items in the store (this is especially marked in SQL Server), but the ordering does not change. Redis still shines with the best performance and RavenDB is slow compared to the rest. This is the result when each store already contains 200,000 items:



This is the breakdown of the results:




Conclusion

First of all, it is not about who is the fastest; it is about making an informed decision considering all parameters, including speed. When choosing a NoSQL store you would consider other factors which do not come into this benchmark, and in fact some of them cannot be benchmarked.

In terms of results, Redis provides the best performance in all scenarios. I think this is also due in part to its excellent, totally async client library. Marc Gravell has done a ton of work to make the client efficient.

The performance of MongoDB, Cassandra and SQL Server is close. SQL Server proves to be a valuable tool in a simple key/value scenario, if you already pay for its license. RavenDB is the slowest of all, although it is still under development.

Feel free to run the test yourselves. Find the code on GitHub here.

In my next series of tests, I will run the tests in a multi-threaded fashion.

10 comments:

  1. Great post and hey from Toronto.

    This would be immensely helpful if you posted versions of redis, mongo and others please :)
    I am a big supporter of MongoDB; 2.2 has a lot of improvements. I love Redis for what it is, but it is a very basic key/value store (yes, with lists and hashes), and its basic configuration is set to "eventual persistence": by default it starts to persist only after 10,000 records or so. Also, Redis and the others should really be tested on Linux; ported versions DO NOT perform the same way.

    cheers

    Replies
    1. Thanks! Yes, I intended to but forgot. I will update the post now.

    2. The most *significant* difference between redis on linux vs windows is the persistence and sync performance - basically the missing "fork". I don't think the test here is looking at persistence performance. While I agree that there may be slight differences in terms of standard running performance (network stack, memory allocation, etc), I wouldn't be *too* concerned by those factors.

  2. This comment has been removed by the author.

    Replies
    1. If anyone is interested in further tweaks and my private conclusions on the RavenDB tests, go to the Raven Google group: http://bit.ly/ZaLXHn

  3. For the redis performance when measuring the "get", it isn't actually the client making it fast - I say this because you are running things sequentially and waiting for the result (via .Result) - which means it isn't even *starting* to show off the pipeline / multiplexer performance. The time you are seeing is simply: redis is really, really fast.

    The time that BookSleeve's async design really shines is when you are either:

    a: multiplexing - for example, issuing concurrent commands on a single connection from multiple threads; for reference, at StackOverflow / StackExchange we have a single BookSleeve connection per AppDomain - and it keeps up just fine
    b: pipelining - i.e. issuing lots of commands *without* checking for the result of each before issuing the next - and then gathering the results afterwards (i.e. issue 40 "get" before accessing the .Result of the first one)

  4. I guess what I'm saying (above comment) is that I think your last-but-one paragraph needs a tweak: I think the credit there goes to Salvatore (et al), not the specific client.

  5. Thanks for this post! I'd love to see an update of this post with the new RavenDB 2.0 and their BulkInsert feature.

  6. Interesting article, but I feel that you probably hampered SQL Server insert speed because of your choice of clustered index. Non-sequential keys such as a GUID will introduce page splits, slowing each insert.
    This will in turn slow the reads, as you have more data pages to traverse. A combination of a sequential PK and a sensible fill factor should improve the situation for SQL Server.

  7. What about the scripts that generate the collections in MongoDB?
