Byte Rot: HttpResponseMessage


Monday, 7 October 2013

Beware of undisposed or unconsumed HttpResponseMessage

[Level T2] This is just a short post to bring something important to the reader's attention. I guess if you are well into ASP.NET Web API, you are bound to see it - especially if you are consuming APIs using HttpClient.

Let's look at this innocent-looking delegating handler (useful on the client side):

using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class InnocentLookingDelegatingHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request,
        CancellationToken cancellationToken)
    {
        return base.SendAsync(request, cancellationToken)
            .ContinueWith(t =>
                {
                    var response = t.Result;
                    if (response.StatusCode == HttpStatusCode.NotModified)
                    {
                        // Replace the 304 with a (fake) cached response
                        var cachedResponse = request.CreateResponse(HttpStatusCode.OK);
                        cachedResponse.Content = new StringContent("Just for display purposes");
                        return cachedResponse;
                    }

                    return response;
                });
    }
}

There isn't really a lot happening. This delegating handler sniffs the incoming responses and if it intercepts a 304 response, it replaces this with a cached response - of course the implementation is a minimal one just for display purposes.

Now let's use it with a resource which returns 304:

class Program
{
    static void Main(string[] args)
    {
        var client = new HttpClient(new InnocentLookingDelegatingHandler()
                        {
                            InnerHandler = new HttpClientHandler()
                        });
        for (int i = 0; i < 1000 * 1000; i++)
        {
            var request = new HttpRequestMessage(HttpMethod.Get, 
                "http://ajax.googleapis.com/ajax/libs/angularjs/1.0.7/angular.min.js");
            request.Headers.IfModifiedSince = DateTimeOffset.Now;
            var response = client.SendAsync(request).Result;
            response.Dispose();
            Console.Write("\r" + i);
        }
            
    }
}

You can find a ready project to test this on GitHub. Please note that System.Net's maxConnection needs to be set up in the app.config.

So what do we see?

The application starts to leak memory.
After a while you get an error telling you that you have run out of sockets.

We can see the effect using SciTech tool:

Well, the reason is that the server response in the Task continuation is never consumed. What does that mean? Adding a single line that consumes the content is enough:

var cachedResponse = request.CreateResponse(HttpStatusCode.OK);
cachedResponse.Content = new StringContent("Just for display purposes");
response.Content.ReadAsByteArrayAsync().Wait(); // just read the content, which is empty!!
return cachedResponse;

Or simply dispose the response:

var cachedResponse = request.CreateResponse(HttpStatusCode.OK);
cachedResponse.Content = new StringContent("Just for display purposes");
response.Dispose(); // Dispose the response
return cachedResponse;

This should be enough to solve the problem. One would think that since we already have the response, and a 304 response never has any content, it is silly to read the content - until you hit this bug.

Thanks to Carl Duguay, who reported the issue on CacheCow, the resultant memory leak on 304 responses is now fixed. I think it is very likely that you will run into a similar problem, so beware when getting a response - always consume or dispose it.
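As a recap, here is a minimal sketch of the handler with the fix in place, assuming .NET 4.5 so that async/await is available. The names DisposingDelegatingHandler and SwapNotModified are made up for illustration, and new HttpResponseMessage with RequestMessage set stands in for request.CreateResponse only to keep the sketch free of extra references.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical name: a variation of the handler above with the fix applied.
public class DisposingDelegatingHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken)
            .ConfigureAwait(false);
        return SwapNotModified(request, response);
    }

    // Hypothetical helper: returns the response untouched unless it is a 304.
    public static HttpResponseMessage SwapNotModified(
        HttpRequestMessage request, HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.NotModified)
        {
            return response; // the caller owns this one and must dispose it
        }

        // The 304 never reaches the caller, so it has to be released here.
        response.Dispose();

        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            RequestMessage = request,
            Content = new StringContent("Just for display purposes")
        };
    }
}
```

The point of the split is only that the swap logic can be exercised without a network call; the behaviour is the same as the two-line fix shown earlier.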

This raises the question: should we always dispose HttpRequestMessage and HttpResponseMessage? Recently there have been many classes that implement IDisposable but do not require use of the dispose pattern. Examples include HttpClient, Task or MemoryStream. On the other hand, you cannot find a single sample in which either the request or the response is used in a dispose pattern - and the fact that the API is fully async makes use of the dispose pattern very difficult.
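For what it is worth, with async/await in .NET 4.5 a using block can span an await, which removes some of that difficulty. The following is only a sketch under that assumption; ResponseReader and ReadAndDisposeAsync are hypothetical names, not part of any library.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class ResponseReader
{
    // Hypothetical helper: awaits a pending response, reads its body as a
    // string and disposes the response whether or not the read succeeds.
    public static async Task<string> ReadAndDisposeAsync(
        Task<HttpResponseMessage> pending)
    {
        using (var response = await pending.ConfigureAwait(false))
        {
            if (response.Content == null)
            {
                return string.Empty;
            }

            return await response.Content.ReadAsStringAsync()
                .ConfigureAwait(false);
        }
    }
}
```

A call site would look something like ResponseReader.ReadAndDisposeAsync(client.SendAsync(request)).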

In any case, ideas are welcome. But as for the bug we had in CacheCow, it is fixed now.

Monday, 17 September 2012

Server-side Async: Careful with that Axe, Eugene

[Level T3]

In a previous post, I talked about the dangers lurking in doing server-side async operations in .NET 4.0. As you know, .NET 4.5 provides a much better syntax, with the async/await keywords turning your TPL Task-Soups into much more readable and organised code. But even so, async will make debugging your application more difficult, and bugs could take much longer to be reproduced, isolated and fixed.

Task-Soup

In .NET 4.0, when we chain continuations to create a composite task, we could end up with a few problems:

We could end up with an unobserved exception problem. This is nicely described by Ayende here
Nested lambda expressions could create unexpected problems with closure of variables
The code becomes hard to read.
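The closure problem from the second point can be illustrated with a small sketch. It uses plain Func delegates rather than tasks so that the outcome is deterministic, but the same capture rules apply to lambdas passed to ContinueWith. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureCapture
{
    // Every lambda captures the same loop variable, so by the time the
    // lambdas run they all see its final value: 3, 3, 3.
    public static int[] SharedLoopVariable()
    {
        var lambdas = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            lambdas.Add(() => i);
        }

        return lambdas.Select(f => f()).ToArray();
    }

    // Copying the value into a variable declared inside the loop body gives
    // each lambda its own copy: 0, 1, 2.
    public static int[] CopiedPerIteration()
    {
        var lambdas = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            lambdas.Add(() => copy);
        }

        return lambdas.Select(f => f()).ToArray();
    }
}
```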

On the third point, here is an example from my own code in CacheCow. What is it that we are actually returning here?

return response.Then(r =>
{
 if (r.Content != null)
 {
  TraceWriter.WriteLine("SerializeAsync - before load",
   TraceLevel.Verbose);

  return r.Content.LoadIntoBufferAsync()
   .Then(() =>
   {
    TraceWriter.WriteLine("SerializeAsync - after load", TraceLevel.Verbose);
    var httpMessageContent = new HttpMessageContent(r);
    // All in-memory and CPU-bound so no need to async
    return httpMessageContent.ReadAsByteArrayAsync();
   })
   .Then( buffer =>
      {
       TraceWriter.WriteLine("SerializeAsync - after ReadAsByteArrayAsync", TraceLevel.Verbose);
       return Task.Factory.FromAsync(stream.BeginWrite, stream.EndWrite,
        buffer, 0, buffer.Length, null, TaskCreationOptions.AttachedToParent);                                                        
      }
     );

   ;
 }

Even looking at the brackets gives me a headache.

Is Async worth it at all?

Now we talk a lot about Async operations and their role in improving scalability. But really, is it worth it? How much scalability would it bring? Would it help or hinder?

The answer to these questions is yes, it does help. The more IO you do in your server-side actions, the more you benefit from the improved scalability. So it is highly advisable to implement your ApiController actions as Async by returning Task or Task<T>.

The truth is, it will help even with your non-IO-bound operations, although it is not advisable to use Async in such scenarios. You can test it for yourself: create a sync and an async controller that do exactly the same operation and use a benchmarking tool to compare the performance.

I have a CarManager sample on GitHub which I use for testing CacheCow.Server, and it contains two simple controllers: CarController and CarAsyncController. All these do is use an in-memory repository, and their GET only looks up the dictionary by its key:

// sync version
public Car Get(int id)
{
 return _carRepository.Get(id);
}


// async version (on another controller)
public Task<Car> GetAsync(int id)
{
 return Task.Factory.StartNew(() => _carRepository.Get(id));
}

So if you use a benchmarking tool such as Apache Benchmark (ab.exe), you could see a slight increase in throughput using the async controller. In my case, there was a 10% increase in throughput using async.

My ordeal with a bug

Development of CacheCow has been marred by the existence of a problem which, as we will see, turned out not to be in my code. I have been battling with this for a few weeks (on and off) and could not make progress on CacheCow development because of it.

OK, here is how my story begins; I think the Sherlock Holmes nature of this troubleshooting could be amusing for others too. After realising that using simple ContinueWith will not flow the context (see previous post), I was tasked with replacing all such cases with Then from TaskHelpers, which checks for the existence of a SynchronizationContext and flows the context if it exists.
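To make the idea concrete, here is a rough sketch of what such a context-aware continuation could look like. This is not the actual TaskHelpers.Then implementation; ThenOnContext and its class are hypothetical names, and scheduling through TaskScheduler.FromCurrentSynchronizationContext is just one assumed way of flowing the context.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContextFlowingContinuations
{
    // Hypothetical helper, not the real Then from TaskHelpers.
    public static Task<TOut> ThenOnContext<TIn, TOut>(
        this Task<TIn> task, Func<TIn, TOut> continuation)
    {
        // If a SynchronizationContext is present (as in ASP.NET), schedule
        // the continuation through it; otherwise use the default scheduler.
        var scheduler = SynchronizationContext.Current != null
            ? TaskScheduler.FromCurrentSynchronizationContext()
            : TaskScheduler.Default;

        // Reading t.Result rethrows a failure of the antecedent, so the
        // returned task faults instead of hiding the exception.
        return task.ContinueWith(
            t => continuation(t.Result),
            CancellationToken.None,
            TaskContinuationOptions.None,
            scheduler);
    }
}
```

A chain then reads as someTask.ThenOnContext(x => Transform(x)), where Transform is whatever the next step happens to be.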

On the other hand, lostdev, one of CacheCow's most loyal users, informed me of an occasional null reference exception in CacheCow.Server. Now, I had already fixed a bug related to a null reference exception when a resource was being retrieved for the first time. I attributed the problem to that bug and reported that it was fixed in the current version.

So I started developing file-based cache storage for CacheCow.Client (which will have its own post very soon) and replaced all ContinueWith cases with Then.

And then I started to experience deadlocks in CacheCow.Client when I was using file-based caching and sending concurrent GET requests to the server. As soon as I would remove FileStore, and replace with InMemoryCacheStore, it would work. So I started searching through the client code, debug, look at the threads, debug again, change code, debug... to no avail. As soon as I was using file-based caching it would start to appear so it had to be on the client.

Then I noticed a strange thing: I could only run 4 concurrent calls and the rest would be blocked. Why? Then I started playing with the maxconnection property of the system.net configuration:

<system.net>
  <connectionManagement>
    <add address="*" maxconnection="N" />
  </connectionManagement>
</system.net>

and interestingly, by setting the N to a high number, I would get more concurrent connections - but only up to the number defined. Hmmm... so the requests do not quite finish. OK, I fired up Sysinternals' TcpView but unfortunately these connections did not show up (and I do not know why).
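On the .NET Framework the same default can, as far as I know, also be set from code through ServicePointManager.DefaultConnectionLimit, provided it is done before the first request goes out. A minimal sketch, where the class name and the value passed in are placeholders:

```csharp
using System.Net;

public static class ConnectionLimits
{
    // Hypothetical helper: sets the default number of concurrent connections
    // per endpoint and returns the value that is now in effect.
    public static int Apply(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
        return ServicePointManager.DefaultConnectionLimit;
    }
}
```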

I was getting nowhere until I accidentally loaded an earlier version of the server code. To my surprise, I did not get the deadlock but this error, which @Tugberk had separately reported earlier but attributed to the order of handlers:

[NullReferenceException: Object reference not set to an instance of an object.]
System.Web.Http.WebHost.HttpControllerHandler.EndProcessRequest(IAsyncResult result) +112
System.Web.Http.WebHost.HttpControllerHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +10
System.Web.CallHandlerExecutionStep.OnAsyncHandlerCompletion(IAsyncResult ar) +129

OK, so it is probably happening on the server but the continuation code gets deadlocked on unhandled exception. I am close! So it was time to go to bed and I was positive that I would nail it the day after.

It was funny that I woke up the day after and, while reading tweets in bed, stumbled on @Tugberk's tweet about an issue he had just created. That sounded exceedingly similar, so we double-checked our scenarios, and it turned out that an HttpResponseMessage with an empty RequestMessage property is not handled in Web API and a null reference exception is thrown at the end of the response clean-up code. The reason I was seeing it only with the file-based cache store was that the server-side code that returns such responses was only being triggered when using the file-based store (since it was capable of persisting caches and was trying to validate the cache).

So as you can see, a seemingly unrelated problem can really obscure the nature of bugs in async scenarios.

Conclusion

First of all, always use request.CreateResponse() instead of using new HttpResponseMessage. I googled for cases of new HttpResponseMessage and found more than 3,000 entries. This is really dangerous, and I think this is a bug in Web API that needs to be fixed. If you are using new, make sure you set the RequestMessage property.
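Where new HttpResponseMessage cannot be avoided, the second suggestion can be sketched as a small helper that never lets a response leave without its request. ResponseFactory and CreateLinkedResponse are hypothetical names used only for this illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ResponseFactory
{
    // Hypothetical helper: builds a response by hand but always links it
    // back to the request, so RequestMessage is never left null.
    public static HttpResponseMessage CreateLinkedResponse(
        HttpRequestMessage request, HttpStatusCode statusCode)
    {
        if (request == null)
        {
            throw new ArgumentNullException("request");
        }

        return new HttpResponseMessage(statusCode)
        {
            RequestMessage = request
        };
    }
}
```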

And in general, be careful with doing server-side async operations. It is really a powerful axe, but with it you are not quite sure what a slightly off swing could bring. Careful with that axe, Eugene.

Monday, 30 July 2012

Serialising request and response in ASP.NET Web API

Introduction

[Level T3] This is a short post on serialising/deserialising HTTP request and response messages in ASP.NET Web API. Serialising messages manually can be achieved but is hard work, and you can run into various problems. ASP.NET Web API provides a means of achieving this through HttpMessageContent. This post is a follow-up to this discussion.

Background

There are many cases where you could be interested in serialising HttpRequestMessage or HttpResponseMessage. For me, I needed this to implement caching features on the HttpClient in the CacheCow framework.

Technically speaking, HTTP messages arrive in serialised format and all we need is access to the raw stream coming from the server - as such no processing would be required. Unfortunately this is not possible, since Web API does not read the message as a raw stream and then process it. Instead it starts by reading various chunks, parsing them as it goes.

However, the ASP.NET team implemented a feature that can be used for serialisation/deserialisation of request and response messages. If you have read Brad Wilson's batching post, you have probably noticed that HttpMessageContent can be used for implementing client-server batching. Now we will use this for serialisation.

HttpMessageContent

RFC 2616 in its appendices defines content types "message/http" and "application/http". application/http is a content-type that can contain more than one request or response.

Did we not have this in multi-part content-type? As we know, we can include different request or response parts in the same message and each part gets its own share of the headers, so what is the difference?

Well, the difference is that with multi-part, each part can only have headers related to content. And above all, they share the same status code. In application/http, each "part", as it were, is a complete request or response. For example, the requests will each have their own URI and the responses their own status code.

An HttpMessageContent instance encapsulates a single HttpRequestMessage or HttpResponseMessage, which is all we need in our case.

Serialiser interface

Let's define an interface for our serialiser:

public interface IHttpMessageSerializer
{
 void Serialize(HttpResponseMessage response, Stream stream);
 void Serialize(HttpRequestMessage request, Stream stream);
 HttpResponseMessage DeserializeToResponse(Stream stream);
 HttpRequestMessage DeserializeToRequest(Stream stream);
}

UPDATE: Latest implementation is fully async and can be found as part of CacheCow library here.

Serialisation

In order to serialise, we need to create a new HttpMessageContent passing request or response and then use ReadAsByteArrayAsync to read the whole message as a byte array:

var httpMessageContent = new HttpMessageContent(request);
var buffer = httpMessageContent.ReadAsByteArrayAsync().Result;

As you can see it is very easy to serialise. Now the only caveat is that if you are serialising in a delegating handler, this will consume the message content stream so that it cannot be read further down the stream. If you do, you will see this error message:

The stream was already consumed. It cannot be read again.

The trick (for now) is to call the method ReadAsByteArrayAsync to force the content to be loaded into the buffer. Although we would not need the buffer we read (since the actual reading will happen inside HttpMessageContent), next time the content will be read from the buffer and not from the network. In my implementation I have made it optional whether to pre-read the content into the buffer.

Deserialisation

The trick with deserialisation is to create a normal HttpRequestMessage or HttpResponseMessage and set the content-type header to "application/http;msgtype=request" or "application/http;msgtype=response" accordingly. Then we use the special extension method to read the message back out of the content:

var request = new HttpRequestMessage();
request.Content = new ByteArrayContent(memoryStream.ToArray());
request.Content.Headers.Add("Content-Type", 
    "application/http;msgtype=request");
return request.Content.ReadAsHttpRequestMessageAsync().Result;

As you can see, all the heavy lifting happens inside the HttpMessageContent and there is really little code that we need to write.
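The same steps apply to responses. The round trip below is a minimal sketch, not the CacheCow implementation: it assumes the Web API client library (System.Net.Http.Formatting) is referenced for HttpMessageContent and ReadAsHttpResponseMessageAsync, it assumes .NET 4.5 for async/await, and ResponseRoundTrip with its two methods are made-up names.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class ResponseRoundTrip
{
    // Serialises a whole response (status line, headers and body) to bytes.
    public static async Task<byte[]> SerializeAsync(HttpResponseMessage response)
    {
        if (response.Content != null)
        {
            // Buffer the body first so it can still be read after serialisation.
            await response.Content.LoadIntoBufferAsync().ConfigureAwait(false);
        }

        var httpMessageContent = new HttpMessageContent(response);
        return await httpMessageContent.ReadAsByteArrayAsync()
            .ConfigureAwait(false);
    }

    // Rebuilds a response from bytes produced by SerializeAsync.
    public static Task<HttpResponseMessage> DeserializeAsync(byte[] buffer)
    {
        var content = new ByteArrayContent(buffer);
        content.Headers.Add("Content-Type",
            "application/http;msgtype=response");
        return content.ReadAsHttpResponseMessageAsync();
    }
}
```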

Conclusion

We can use HttpMessageContent to serialise/deserialise request/response in ASP.NET Web API. Full implementation can be found as part of the CacheCow library here. This implementation is fully Async and takes advantage of IO completion ports exposed in Begin/End methods.

One word of caution concerns cases where the message needs to be used after serialisation - which covers many cases, including serialisation in a DelegatingHandler. In these cases we need to invoke ReadAsByteArrayAsync (or similar) to ensure the content is read into the buffer.