Tuesday, 3 June 2014

Cancelling an async HTTP request Task sends TCP RESET packet

Level [T4]

This blog post did not just happen. In fact, never, if ever, something just happens. There is a story behind everything and this one is no different. Looking back, it feels like a nice find but as the story was unfolding, I was running around like a headless chicken. Here we have the luxury of the hindsight so let's take advantage of it.

TLDR; If you are a sensible HTTP client and make your HTTP requests using cancellable async Tasks by passing a CancellationToken, you could find your IP blocked by legacy bridge devices blacklisting clients sending TCP RESET packets.

So here is how it started ...

So we were supposed to go live on Monday - some Monday. Talking of live, it was not really live - it was only to internal users but considering the high profile of the project, it felt like the D-Day. All VPs knew of the release and were waiting to see a glimpse of the project. Despite the high profile, it was not properly resourced, I despite being so called architect , pretty much singled handedly did all the API and the middleware connecting the Big Data outputs with the Single Page Application.

We could not finish going live on Monday so it moved to Tuesday. Now on Tuesday morning we were all ready and I set up my machine's screen like traders with all performance monitors up on the screen looking at users. With using the cloud Azure, elasticity was the option although the number of internal users could hardly make a dent on the 3 worker roles. So we did go live, and, I could see traffic building up and all looked fine. Until ... it did not.

I saw requests queuing up and loading the page taking longer and longer. Until it was completely frozen. And we had to take the site down. And that was not good.

Server Analysis

I brought up DebugView and was lucky to see this (actual IP and site names anonymised):

[1240] w3wp.exe Error: 0 :
[1240] <html>
[1240] <h1>Access Administratively Blocked</h1>
[1240] <br>URL : 'http://www.xyz.com/services/blahblah'
[1240] <br>Client IP address : 'xyz.xx.yy.zzz'
[1240] </html>

So we are being blocked! Something is blocking us and this could be because we used an UI data endpoint as a Data API. Well I knew it is not good but as I said we had a limited time and in reality that data endpoint was meant to support our live traffic.

So after a lot of to and fro with our service delivery and some third party support, we were told that our software was recognised as malicious since it was sending way too many TCP RESET packets. Whaa?? No one ain't sending no TCP whatever packets, we are using a high level language (C#) and it is the latest HttpClient implementation. We are actually using many optimising techniques such as async calls, parallelisation, etc to make the code as efficient as possible. We also used short timeout+ retry which is Netflix's approach to improve performance.

But what is TCP RESET packets? Basically a RESET packet is one that has the RESET flag set (which is otherwise unset) and tells the server to drop the TCP connection immediately and reclaim all the resources associated with it. There is an RFC from back in 2002 that considers RESET harmful. Wikipedia's article argues that when used as designed, it is useful but forged RESET can disrupt the communication between the client and server. And Microsoft's technet blog on the topic says "RESET is actually a good thing".

And in essence, I would agree with the Microsoft (and Wikipedia's) account that sending RESET packet is what a responsible client would do. Let's imagine you are browsing a site using a really bad wifi connection. The loading of the page takes too long and you frustrated by the slow connection, cancel browsing by pressing the X button. At this point, a responsible browser should tell the server it has changed its mind and is not interested in the response. This will let the server use its resources for a client that is actually waiting for the server's response.

Now going back to the problem at hand, I am not a TCP expert by any stretch - I have always used higher level constructs and never had to go down so deep in the OSI model. But my surprise was, what is different now with my code that with a handful calls I was getting blocked while the live clients work well with no problem with significantly larger number of calls?

I had a hunch that it probably has to do with the some of the patterns I have been using on the server. And to shorten the suspense, the answer came from the analysis of TCP packets when cancelling an async HTTP Task. The live code uses the traditional synchronous calls - none of the fancy patterns I used. So let's look at some sample code that cancels the task if it takes too long:

var client = new HttpClient();
var buffer = new byte[5 * 1000 * 1000];
// you might have to use different timeout or address
var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(300)); /
try
{
    var result = client.GetAsync("http://www.google.com",
    cts.Token).Result;
    var s = result.Content.ReadAsStreamAsync().Result;

    var result1 = s.ReadAsync(buffer, 0, buffer.Length, cts.Token).Result;
    ConsoleWriteLine(ConsoleColor.Green, "Got it");
}
catch (Exception e)
{
    ConsoleWriteLine(ConsoleColor.Red, "error! " + e);
}

In this snippet, we are calling the google server and set a 300ms timeout (which you might have to modify the timeout or the address based on your connection speed, in order to see the cancellation). Here is a WireShark proof:



As you can see above a TCP RESET packet has been sent - if you have set the parameters in a way that the request does not complete before its timeout and gets cancelled. You can try this with a longer timeout or use a WebClient which is synchronous and make sure you will never ever see this RST packet.

Now the question is, should a network appliance pick on this responsible cancellation and treat it as an attack? By no means. But in my case, it did and it is very likely that it could do that with yours.

My solution came by whitelisting my IP against "TCP RESET attacks". After all, I was only trying to help the server.

Conclusion

Cancelling an HTTP async Task in the HttpClient results in sending TCP RESET which is considered malicious by some network appliances resulting in blacklisting your IP.

PS. The network appliance belonged to our infrastructure 3rd party provider whose security managed by another third party - it was not in Azure. The real solution would have been to remove such crazy rule, but anyhow, we developers don't always get what we want.


2 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete