Byte Rot: Benchmarking

Showing posts with label Benchmarking. Show all posts

Sunday, 19 October 2014

Performance Series - How poor performance of HttpContent.ReadAsAsync can affect your API/site

Level [T2]

This has been a revelation - what I am about to reveal here, deeply surprised me - it might surprise you too. This post is mainly about consuming restful APIs using HttpClient and when the payload is JSON.

UPDATE: I got in touch with the ASP.NET team and they confirmed this as a performance bug which has now been fixed but the fix yet not available.

As you probably know performance and benchmarking is very close to my heart and I have been recently focusing on benchmarking a few APIs at work. One of my observations was that the Web APIs/Web Sites which have historically been IO-bound, they show sign of CPU strain and have become CPU-bound.

When you think logically about it, there is no magic here: by using async/await, you end up putting your CPU into some use unlike the old times when the threads are blocked waiting for the IO to return and CPU would be twiddling its thumb. However, I found the CPU overhead of the operations excessive so I set out to benchmark a few different scenarios.

Test Setup

Two APIs were created where one was using the other. These two APIs were part of the same cloud service which was deployed to two separate Medium (A2) web roles. I used 2 different deployments of the same code, one dependent upon version 4.0.30506.0 of the API and the ther one with the latest version which was 5.2.2. Difference between two versions of the Web API is the topic of another post, but the differences were not huge although newer versions showed improved performance.

API being called returns a customer with its orders. Every customer has between 1 to 3 orders and each order between 1-3 items. On the long run, these randomisation gets evened out. Each document returned is between 1-2 KB. So the more superficial API, for every customer, makes one call to get the customer and for each customer will separately call the deeper API once for each order. Then it combines the result and sends back the response. Both APIs are deployed in the same Azure Data Centre.

You can find the whole code at GitHub. The code takes 4 different approaches as below:

public class CustomerController : ApiController
{
    public FullCustomer GetSync(int id)
    {
        var webClient = new WebClient();
        var customerString = webClient.DownloadString(BuildUrl(id));
        var customer = JsonConvert.DeserializeObject<Customer>(customerString);
        var fullCustomer = new FullCustomer(customer);
        var orders = new List<Order>();
        foreach (var orderId in customer.OrderIds)
        {
            var orderString = webClient.DownloadString(BuildUrl(id, orderId));
            var order = JsonConvert.DeserializeObject<Order>(orderString);
            orders.Add(order);
        }
        fullCustomer.Orders = orders;
        return fullCustomer;
    }

    public async Task<FullCustomer> GetASync(int id)
    {
        var webClient = new WebClient();
        var customerString = await webClient.DownloadStringTaskAsync(BuildUrl(id));
        var customer = JsonConvert.DeserializeObject<Customer>(customerString);
        var fullCustomer = new FullCustomer(customer);
        var orders = new List<Order>();
        foreach (var orderId in customer.OrderIds)
        {
            var orderString = await webClient.DownloadStringTaskAsync(BuildUrl(id, orderId));
            var order = JsonConvert.DeserializeObject<Order>(orderString);
            orders.Add(order);
        }
        fullCustomer.Orders = orders;
        return fullCustomer;
    }

    public async Task<FullCustomer> GetASyncWebApi(int id)
    {
        var httpClient = new HttpClient();
        httpClient.DefaultRequestHeaders.Add("Accept", "application/json"); 
        var responseMessage = await httpClient.GetAsync(BuildUrl(id));
        var customer = await responseMessage.Content.ReadAsAsync<Customer>();
        var fullCustomer = new FullCustomer(customer);
        var orders = new List<Order>();
        foreach (var orderId in customer.OrderIds)
        {
            responseMessage = await httpClient.GetAsync(BuildUrl(id, orderId));
            var order = await responseMessage.Content.ReadAsAsync<Order>();
            orders.Add(order);
        }
        fullCustomer.Orders = orders;
        return fullCustomer;
    }

    public async Task<FullCustomer> GetASyncWebApiString(int id)
    {
        var httpClient = new HttpClient();
        httpClient.DefaultRequestHeaders.Add("Accept", "application/json"); 
        var responseMessage = await httpClient.GetAsync(BuildUrl(id));
        var customerString = await responseMessage.Content.ReadAsStringAsync();
        var customer = JsonConvert.DeserializeObject<Customer>(customerString);
        var fullCustomer = new FullCustomer(customer);
        var orders = new List<Order>();
        foreach (var orderId in customer.OrderIds)
        {
            responseMessage = await httpClient.GetAsync(BuildUrl(id, orderId));
            var orderString = await responseMessage.Content.ReadAsStringAsync();
            var order = JsonConvert.DeserializeObject<Order>(orderString);
            orders.Add(order);
        }
        fullCustomer.Orders = orders;
        return fullCustomer;
    }

    private string BuildUrl(int customerId, int? orderId = null)
    {
        string baseUrl = string.Format("http://{0}:8080/api/customer/{1}", Request.RequestUri.Host, customerId);
        return orderId.HasValue
            ? string.Format("{0}/order/{1}", baseUrl, orderId.Value)
            : baseUrl;
    }

}

So as you can see, we use 4 different methods:

1) Using WebClient in the sync fashion
2) Using WebClient in the async fashion

3) Using HttpClient in the async fashion with ReadAsAsync on HttpContent

4) Using HttpClient in the async fashion with reading content as string and then using JsonConvert to deserialise

I used SuperBenchmarker to invoke the main API which gathers the data from the other API. I used the tool within the same Azure Data Centre from another machine (none of the APIs) to make the tests more realistic yet eliminate network idiosyncrasies.

I used 5000 requests with concurrency of 10 - although I tried other number as well which did not make any material difference in the results.

Results

Here is the result for scenario 1 (sync using WebClient):

TPS:    394 (requests/second)
Max:    199ms
Min:    8ms
Avg:    25ms

50%     below 24ms
60%     below 25ms
70%     below 27ms
80%     below 28ms
90%     below 30ms
95%     below 32ms
98%     below 36ms
99%     below 55ms
99.9%   below 185ms

The result for scenario 2 (Async using WebClient) usually shows better throughput but higher CPU

TPS:    485 (requests/second)
Max:    291ms
Min:    5ms
Avg:    20ms

50%     below 19ms
60%     below 21ms
70%     below 23ms
80%     below 25ms
90%     below 27ms
95%     below 29ms
98%     below 32ms
99%     below 36ms
99.9%   below 284ms

The CPU difference is not huge and can be explained by the increase throughput:

CPU usage during Scenario 1 and 2

Now what surprised me greatly was the result of the third scenario (using HttpContent.ReadAsAsync<T>). Apart from CPU of 100% and signs of queueing, here is the result:

TPS:    41 (requests/second)
Max:    12656ms
Min:    26ms
Avg:    240ms

50%     below 170ms
60%     below 178ms
70%     below 187ms
80%     below 205ms
90%     below 256ms
95%     below 296ms
98%     below 370ms
99%     below 3181ms
99.9%   below 12573ms

Yeah, shocking. The diagram below compares CPU usage between scenario 1 and 3:

CPU usage in scenario 1 (arrow) and 3 (box)

Scenario 4 is definitely better and is not too far from scenario 1 and 2:

TPS:    230 (requests/second)
Max:    7068ms
Min:    7ms
Avg:    43ms

50%     below 20ms
60%     below 22ms
70%     below 24ms
80%     below 26ms
90%     below 29ms
95%     below 34ms
98%     below 110ms
99%     below 144ms
99.9%   below 7036ms

The CPU usage is around 80% and definitely worse that scenario 1 and 2 (which requires further analysis).

Analysis

Where is the problem? It appears that JSON Deserialization when reading from a stream is not efficient. It is possible that the JSON Deserialization has to optimise for memory efficiency rather than CPU efficiency since when the whole string is passed, it is surely much faster.

Profiling proves that the problem is indeed JSON Deserialization:

Profiling scenario 3 is showing that the most of the CPU time is spent in JSON Deserialisation

So in order to prove that, we do not have to invoke an API. The whole operation can be done inside a Console application. So I used the same code that was generating customers and orders. Here I am comparing

private static void Main(string[] args)
{
    const int TotalRun = 10*1000;

    var customerController = new CustomerController();
    var orderController = new OrderController();
    var customer = customerController.Get(1);

    var orders = new List<Order>();
    foreach (var orderId in customer.OrderIds)
    {
        orders.Add(orderController.Get(1, orderId));
    }

    var fullCustomer = new FullCustomer(customer)
    {
        Orders = orders
    };

    var s = JsonConvert.SerializeObject(fullCustomer);
    var bytes = Encoding.UTF8.GetBytes(s);
    var stream = new MemoryStream(bytes);
    var content = new StreamContent(stream);

    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

            
    var stopwatch = Stopwatch.StartNew();
    for (int i = 1; i < TotalRun+1; i++)
    {
        var a = content.ReadAsAsync<FullCustomer>().Result;
        if(i % 100 == 0)
            Console.Write("\r" + i);
    }
    Console.WriteLine();
    Console.WriteLine(stopwatch.Elapsed);

    stopwatch.Restart();
    for (int i = 1; i < TotalRun+1; i++)
    {
        var sa = content.ReadAsStringAsync().Result;
        var a = JsonConvert.DeserializeObject<FullCustomer>(sa);
        if (i % 100 == 0)
            Console.Write("\r" + i);
    }
    Console.WriteLine();
    Console.WriteLine(stopwatch.Elapsed);

    Console.Read();

}

As expected, the result shows uncomparable difference, in the order of ~120:

10000
00:00:06.2345493
10000
00:00:00.0509763

So this result basically confirms what we have seen. I will get in touch with James Newton King and try to shed more light on the subject.

Conclusion

HttpContent.ReadAsAsync on JSON payloads is really slow - in the order of 120x compared to JsonConvert. I guess it might to do with the memory efficiency of reading from streams (keeping memory footprint at zero) but that is a guess and I have been in touch with James Newton King (creator of Json.Net) to get to the bottom of it.

For the meantime, if you know your content is not going to be huge and always in JSON, you might as well forget about content negotiation and read it as a string and then use JsonConvert to deserialize.

Thursday, 16 October 2014

SuperBenchmarker v0.4 released

Level [T2]

This is a quick shoutout on the release of version 0.4 of SuperBenchmarker, a Web and/or Web API performance benchmarking command line tool for Windows.

You might have heard about and used Apache Benchmark (ab.exe) in the past which is a very useful tool but on Windows it is very limited (e.g cannot make POST, PUT, etc requests and only supports GET). SuperBenchmarker (sb.exe) supports PUT, DELETE, POST or any arbitrary method and allows you to parameterise the URL and headers using a data file, a .NET DLL plugin and the new feature is the randomisation feature which removes the need for any setup when all needed is random data.

Getting started

The best way to get SuperBenchmarker is to use awesome Chocolatey which is Windows' equivalent of apt-get tool on Linux.

To get Chocolatey, just run this command in your Powershell console (in Administrative mode):

iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1'))

And then install SuperBenchmarker in the command line shell:

c:\> cinst SuperBenchmarker

And now you are ready to load test:

c:\> sb -u http://google.com

Note: if you are using Visual Studio's command line shell, you cannot use ampersand character (&) and you have to escape it using hat (^).

Using SuperBenchmarker

Normally you would define total number of requests and concurrency:

c:\> sb -u http://google.com -c 10 -n 2000

Statement above runs 2000 requests with concurrency of 10. At the end, you are shown important metrics of the test:

Status 503:    1768
Status 200:    232

TPS: 98 (requests/second)
Max: 11271.1890515802ms
Min: 3.15724613377097ms
Avg: 497.181240820346ms

50% below 34.0499543844287ms
60% below 41.8178295863705ms
70% below 48.7612961478952ms
80% below 87.4385213898198ms
90% below 490.947293319644ms

So the breakdown of the statuses returned, TPS (transaction per second), minimum, maximum and average of the time taken. But more importantly, your percentiles that really should be driving your performance SLAs (90% or 99%). [Never use the average for anything].

In case you need to dig deeper, a log file gets created in the current directory with the name run.log which you can change using -l parameter:

c:\> sb -u http://google.com -c 10 -n 2000 -l c:\temp\mylog.txt

log file is a tab separated file which contains these columns: order number (based on the time started not the time ended), status code, time taken in ms and then any custom parameters that you might have had - see below.

Sometimes when running a test for the first time, something might not have been quite right in which case you can make a dry run/debug using -d parameter that makes a single request and the body of the response will be shown at the end. If you want to see the headers as well, use -h parameter.

c:\> sb -u http://google.com -c 10 -n 2000 -d -h

Supplying request headers or a payload for POST, PUT and DELETE

In order to pass your tailored request headers, a template file needs to be defined which is basically the HTTP request template (minus the first line defining verb and URL and version):

c:\> sb -u http://google.com -t template.txt

And the template.txt contains our custom headers (from the second line of the HTTP request):

User-Agent: SuperBenchmarker
MyCustomHeader: foo-bar;baz=biz

Please note that you don't have to provide headers such as Host and Content-Length - in fact it will raise errors. These headers will be automatically added by the underlying framework.

For using POST, PUT and DELETE we need to supply the verb parameter:

c:\> sb -u http://google.com -v POST

But this request would require a payload as well which we need to supply. We use the template file to supply HTTP payload as well as any headers. Similar to an HTTP request, there must be an empty line between headers and body:

User-Agent: WhateverValueIWant
Content-Type: x-www-formurlencoded

name=value&age=25

Parameterising your requests

Basically you can parameterise your requests using a CSV file containing values, your plugin DLL or by specifying randomisation.

You would define parameters in URL and headers (payload not yet supported but coming soon in 0.5) using SuperBenchmarker's syntax:

{{{MyParameter}}}

As you can see, we use three curly brackets to denote a parameter. For example the statement below defines a customerId parameter:

c:\> sb -u "http://myserver.com/api/customer?customerid={{{customerId}}}^&ignore=false"

Please note quoting the URL and use of ^ to escape & character - if you are using Visual Studio command prompt. To run the test successfully, you need to provide a CSV file containing customerId:

customerId
123,
245,

and use -f option to run the test:

c:\> sb -u "http://myserver.com/api/customer?customerid={{{customerId}}}&ignore=false" -f c:\mypath\values.csv

Alternatively, you can use a plugin DLL to provide values:

c:\> sb -u "http://myserver.com/api/customer?customerid={{{customerId}}}&ignore=false" -p c:\mypath\myplugin.dll

This DLL must have a single public class implementing IValueProvider interface which has a single method:

public interface IValueProvider
{
    IDictionary<string, object> GetValues(int index);
}

For every request implementation of the interface is called and the index of the request is passed to and in return a dictionary of field names with their respective values is passed back.

Now we have a new feature that in most cases alleviates the need for CSV file or plugin and that is the ability to setup random value provider in the definition of the parameter itself:

c:\> sb -u "http://myserver.com/api/customer?customerid={{{customerId:RAND_INTEGER:[1000:2000]}}}&ignore=false"

The parameter above is set up to be filled by a random integer between 1000 and 2000.
Possible value types are:

String: using RAND_STRING. Will output random words
Date: using RAND_DATE (accepts range)
DateTime: using RAND_DATETIME (accepts range)
DateTimeOffset: using RAND_DATETIMEOFFSET which outputs ISO dates (accepts range)
Double: using RAND_DOUBLE (accepts range)
Name: using RAND_NAME. Will output random names

Feedback

Don't forget to feedback with issues and feature requests in the GitHub page. Happy load testing!