Sunday, 20 January 2013

Review of .NET Framework cryptography and symmetric algorithms benchmark

[Level T2] The .NET Framework provides a set of cryptography services in the System.Security.Cryptography namespace. This namespace has been there for years and I have been using some of its classes on and off. Yet I felt I needed to do some research into all the services it provides, and this post is the result.

Symmetric and asymmetric algorithms

Symmetric algorithms use a secret key/password (along with an Initialisation Vector, or IV) to encrypt and decrypt data. A number of these algorithms have been described and documented. The secret key needs to be stored safely, as whoever gets hold of it can both encrypt and decrypt the data. These algorithms are efficient and fast and can encrypt/decrypt large amounts of data - they can even encrypt streams of data.

The Initialisation Vector is used for encrypting the first block of data. It is generally not considered to be a secret and can be passed as plain text. However, the same IV needs to be used for both encryption and decryption. More information can be found here.
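As a minimal sketch of how the key and the IV fit together, the following round trip encrypts and decrypts a string with AES. The class and method names (AesRoundTrip, Encrypt, Decrypt) are made up for illustration, and the algorithm's default mode and padding are assumed to be acceptable.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class AesRoundTrip
{
    // Encrypts plainText with the supplied secret key and IV.
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv))
        {
            var data = Encoding.UTF8.GetBytes(plainText);
            return encryptor.TransformFinalBlock(data, 0, data.Length);
        }
    }

    // Decrypts cipherText; the key and IV must match the ones used to encrypt.
    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var decryptor = aes.CreateDecryptor(key, iv))
        {
            var data = decryptor.TransformFinalBlock(cipherText, 0, cipherText.Length);
            return Encoding.UTF8.GetString(data);
        }
    }

    public static void Main()
    {
        using (var aes = Aes.Create())
        {
            // Fresh random key and IV; the IV may travel with the cipher text,
            // the key must stay secret.
            aes.GenerateKey();
            aes.GenerateIV();

            var cipherText = Encrypt("some sensitive data", aes.Key, aes.IV);
            Console.WriteLine(Decrypt(cipherText, aes.Key, aes.IV));
        }
    }
}
```

Decrypting with a different IV would corrupt the first block of the output, which is why the same IV has to be kept alongside the cipher text.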

Algorithms available in .NET Framework:

  • DES: One of the oldest. Uses 8-byte keys.
  • RC2: Can use keys of 5-16 bytes.
  • TripleDES or 3DES: An advanced and more secure DES. Uses keys of 16 or 24 bytes.
  • Rijndael: A purely managed implementation. Can use keys of 16, 24 or 32 bytes.
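  • AES: The standardised form of Rijndael with a fixed 16-byte block size, available as a managed implementation (AesManaged). Uses keys of 16, 24 or 32 bytes. [Please see @leastprivilege comment at the end as AES is an alternative Rijndael implementation]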

The performance of these algorithms is compared here (code to reproduce this can be found on GitHub):


As can be seen, AES and Rijndael provide the best performance, partly due to the fact that they are purely managed. RC2, DES and 3DES are older algorithms and, in terms of security, not the most reliable, so they are best avoided. So when it comes to choosing an algorithm, use AES or Rijndael.

HMACSHA1 is a symmetric algorithm but is not used for encryption. It is mainly used for creating a hash-based message authentication code (HMAC), which is a kind of signing.

Asymmetric algorithms, on the other hand, use a public/private key pair. These algorithms can be used for encryption as well as signing. The public key is used for encryption and verifying signatures while the private key is used for decryption and generating signatures. So the idea is that the public key is shared publicly while the private key has to be stored securely. X509 certificates are a common way of packaging public/private key pairs and are usually installed in the machine's keystore, so the operating system (and resource ACL) is responsible for their security.

Asymmetric algorithms are used for encrypting small pieces of data. Unlike symmetric algos, they cannot be used for encrypting large data or streams, so they are usually used to encrypt the secret keys of symmetric algos. Signing is another important use of asymmetric algorithms. The .NET Framework itself uses it for signing assemblies (strong naming assemblies).

The most widely used algorithm is RSA, which is implemented in the .NET Framework.

An example of using symmetric and asymmetric algorithms

SSL/TLS is an example of using both symmetric and asymmetric algorithms in the same security context session. In this protocol, the client and server negotiate a size for the encryption key (as well as other aspects including the algorithm to use) and then the client generates a random secret key of the agreed size for symmetrically encrypting the communication. It then uses the public key of the server to encrypt the secret key and sends it to the server. The server, being the only entity in possession of the private key, decrypts the secret key. This secret key is used for the lifetime of the secure transmission to symmetrically encrypt (and decrypt) the data.

Other cryptography services

Random generation

Most likely you have used the class Random in the .NET Framework. This class is capable of generating random bytes that can be used as secret keys:
var random = new Random();
var buffer = new byte[1024];
random.NextBytes(buffer);
While this code works, the Random class is incapable of generating truly random values: the values it produces are pseudo-random, derived deterministically from a seed, so they can be predicted and should not be used as secret keys.

For producing numbers with cryptography-level randomness use RNGCryptoServiceProvider:
var buffer = new byte[1024];
using (var rng = new RNGCryptoServiceProvider())
{
    rng.GetBytes(buffer);
}

Hashing

Hashing is the process by which a signature of small fixed size is created for a much larger piece of data. Hashing algorithms such as MD5 or SHA can be used to generate such hashes. Hashing algorithms can also take a secret key which is used while generating the hash. The ability to produce the hash with a given secret key can be used as proof of ownership of the secret without having to send the secret. This technique is used in OAuth using the HMACSHA1 algorithm. [OAuth2 is different as per @leastprivilege comment. Please see comments]

There are a number of hashing algorithms implemented in the .NET Framework. MD5 is best avoided; use one of the SHA variants instead. If you need key-based hashing, use HMACSHA1.
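A minimal sketch of both flavours, plain hashing with SHA256 and key-based hashing with HMACSHA1, might look like this. The class and method names are hypothetical; only the System.Security.Cryptography types are real.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashingSketch
{
    // Plain hash: anyone can recompute it from the message alone.
    public static byte[] Sha256(string message)
    {
        using (var sha = SHA256.Create())
        {
            return sha.ComputeHash(Encoding.UTF8.GetBytes(message));
        }
    }

    // Key-based hash: only holders of secretKey can produce or verify it.
    public static byte[] HmacSha1(string message, byte[] secretKey)
    {
        using (var hmac = new HMACSHA1(secretKey))
        {
            return hmac.ComputeHash(Encoding.UTF8.GetBytes(message));
        }
    }

    public static void Main()
    {
        var key = Encoding.UTF8.GetBytes("demo key, not for production");
        Console.WriteLine(Convert.ToBase64String(Sha256("hello")));
        Console.WriteLine(Convert.ToBase64String(HmacSha1("hello", key)));
    }
}
```

The SHA256 hash is always 32 bytes and the HMACSHA1 result 20 bytes, regardless of the size of the input.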

How to store the secret key of the encryption

We commonly need to encrypt pieces of sensitive data and store them. These could include the content of cookies, sensitive user information, etc. To use symmetric encryption, we need a secret key. So where should we store the secret key?

I have seen many cases where the secret key is hard-coded within the code, usually as a string. First of all, such secret keys can be easily retrieved using reverse engineering tools such as Reflector. Also, these are commonly meaningful strings that can usually be guessed easily.

One way to have a secure secret key is to:
  1. Create a random secret key (of required size) using algorithms such as RNG
  2. Install an X509 certificate on the servers (private and public keys)
  3. Use asymmetric encryption to encrypt the key and store it for example as a file
  4. At the time of using the key, use the private key to decrypt the secret key and then use the key for symmetric encryption/decryption


Conclusion

Avoid older encryption or hashing algorithms such as MD5, DES, RC2 or 3DES. Use newer and managed-only implementations such as AES and Rijndael (and SHA for hashing). AES provides the best performance out of all symmetric encryption algorithms. Use a combination of symmetric and asymmetric algorithms for secure encryption. Use RNG if you need crypto-level randomness.

Special thanks to Dominick Baier @leastprivilege for reviewing this article and posting comments which you can read in the comments section.

Monday, 7 January 2013

Performance series: Is thread contention bad?

[Level T2] For better or worse, I have recently been working a lot in the area of performance optimisation and bottleneck hunting.

Performance monitoring is an art as much as it is a science. On one hand you have all the various metrics (simple, complex or derived) that a system can produce (at each level, from the client to presentation layer to middleware to database) and on the other hand you require the art of comparison and the ability to decide whether to ignore or act upon a particular change. The work can be exciting at times and slow and frustrating at others - performance bottlenecks can be difficult to find, but finding them is extremely gratifying!

I am going to have a separate post on performance monitoring of an ASP.NET web application, but I will use this post to look at one particular scenario: thread contention. This is especially important in the face of the increasing popularity of asynchronous programming and the use of .NET 4.0's TPL and .NET 4.5's async/await.

So let's imagine this is the problem statement:
From version 1 of the software to version 2 of the software, thread contention has increased substantially. 
And imagine you are responsible for your application's performance. What would you do?

What is Thread Contention?

In .NET, contention rate/second is the performance counter that measures the level of contention as far as managed threads are concerned. It measures the number of unsuccessful managed lock attempts per second. Where would you take a lock? At the time of synchronisation. So contention rate is a measure of the synchronisation rate rather than of multi-threading. 

To confirm the above statement, let's use a console application and monitor the contention rate / sec in the performance monitor for the console application process. Initially, let's try this code:

static void Main(string[] args)
{
    Parallel.For(1, 1000000, (i) => Math.Asin(Math.Log(i)));
    Console.Read();
}

In this case contention will stay at 0: although we have used multi-threading, there is no shared resource requiring locking. The second scenario is when we write the iterator/counter to the console:

static void Main(string[] args)
{
    Parallel.For(1, 1000000, (i) => Console.WriteLine(i));
    Console.Read();
}

Now the performance monitor will show a very high number for contention rate (on my machine it averages 200). But wait, we did not use any locking here?! Well, we did not but Console did - all calls to the Console class are synchronised, so it internally uses locking.
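To make the locking explicit rather than hidden inside Console, a third variation could take a lock of its own around a shared counter. This is a sketch only; the names LockContentionSketch, Sync and SumWithLock are hypothetical, and the contention rate you observe will depend on your machine.

```csharp
using System;
using System.Threading.Tasks;

public static class LockContentionSketch
{
    private static readonly object Sync = new object();
    private static long _total;

    // Every iteration competes for the same lock, so threads that fail to
    // acquire it straight away show up in the contention rate counter.
    public static long SumWithLock(int count)
    {
        _total = 0;
        Parallel.For(0, count, i =>
        {
            lock (Sync)
            {
                _total += i;
            }
        });
        return _total;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithLock(1000000));
        Console.Read(); // keep the process alive so the counter can be inspected
    }
}
```

The work inside the lock is tiny, so almost all the time is spent on synchronisation, which is the situation the counter is designed to expose.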

Can Thread Contention be healthy?

Absolutely! Contention is just a measurement and you should not base your judgement on Guide Metrics. Guide metrics such as "% Processor Time", "Contention Rate/sec" and "Number of Threads" are to be looked at only in the presence of deteriorated Goal Metrics such as throughput (related to scalability) and performance. For example, an increase in response time or a drop in the maximum number of concurrent requests/users is something to look at, but an increase in % Processor Time can be a sign of higher/better utilisation of the system after removing a logical bottleneck (for example coarse locking).

So always look at guide metrics in the light of goal metrics.

An important example of this in the case of contention is the use of IO completion ports. The .NET ThreadPool has two types of threads: worker threads and IO completion port threads. Worker threads are known to most developers and can be used for processing a piece of work in the background. IO completion ports are Windows's low-overhead mechanism for carrying out an IO-bound operation and notifying the commissioning code when the job is done. They will be used by the .NET runtime as soon as you use .BeginXXX and .EndXXX on an IO related class such as FileStream.

ContentionTest is a simple project that illustrates the use of IO completion ports. Just try it for yourself and see that setting the async variable to true will improve the performance but will make the contention rate skyrocket.
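For readers who have not used the .BeginXXX/.EndXXX pattern, here is a rough sketch of an asynchronous file read. It is not the ContentionTest code; the class and method names are made up, and it assumes the file is opened with useAsync set to true so that the read can complete without tying up a worker thread.

```csharp
using System;
using System.IO;
using System.Threading;

public static class AsyncReadSketch
{
    // Reads up to 4096 bytes from the start of the file asynchronously
    // and returns the number of bytes read.
    public static int ReadFirstBlock(string path)
    {
        var buffer = new byte[4096];
        var bytesRead = 0;

        using (var done = new ManualResetEvent(false))
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Read, buffer.Length, useAsync: true))
        {
            // The callback runs when the IO has completed, not on the calling thread.
            stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
            {
                bytesRead = stream.EndRead(asyncResult);
                done.Set();
            }, null);

            // Blocking here is only to keep the sketch short; real code would
            // carry on with other work instead of waiting.
            done.WaitOne();
        }

        return bytesRead;
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(ReadFirstBlock(args[0]));
    }
}
```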

Conclusion

Always look at guide metrics objectively. Do not try to improve guide metrics; focus only on goal metrics. If a drop in goal metrics coincides with a change in a particular guide metric, review the relationship.

Thread contention rate is a measurement of thread synchronisation. This rate can go up in various conditions such as increased throughput, use of IO completion ports or removal of a bottleneck, in which case there is no cause for concern. If high contention coincides with low throughput, consider refactoring the synchronisation.