Wednesday, 20 July 2016

Singleton HttpClient? Beware of this serious behaviour and how to fix it

If you are consuming a Web API in your server-side code (or .NET client-side app), you are very likely to be using an HttpClient.

HttpClient is a very nice and clean implementation that came as part of Web API and replaced its clunky predecessor WebClient (although only in its HTTP functionality, WebClient can do more than just HTTP).

HttpClient is usually meant to be used with more than just a single request. It conveniently allows for default headers to be set and applied to all requests. Also you can plug in a CookieContainer to allow for all sessions.

Now, ironically it also implements IDisposable suggesting a short-lived lifetime and disposing it as soon as you are done with. This lead to several discussions in the community (here from Microsoft Patterns and Practices, Darrel Miller in here and a few references in StackOverflow here) to discuss whether it can be used with longer lifetime and more importantly whether it needs disposal.

Singleton HttpClient matters, especially when it comes to the performance [Dragan Brankovich - Flickr]

HttpClient implements IDisposable only indirectly through HttpMessageHandler and only as a result of in-case not an immediate need - I am not aware of an implementation of HttpMessageHandler that holds unmanaged resources (the mere reason for implementing IDisposable).

In short, the community agreed that it was 100% safe, not only not disposing the HttpClient, but also to use it as Singleton. The main concern was thread safety when making concurrent HTTP calls - and even official documentations said there is no risk doing that.

But it turns out there is a serious issue: DNS changes are NOT honoured and HttpClient (through HttpClientHandler) hogs the connections until socket is closed. Indefinitely. So when does DNS change occur? Everytime you do blue-green deployment (in Azure cloud services when you deploy to staging slot and then swap production/staging slots). Everytime you change settings in your Azure Traffic Manager. Failover scenarios. Internally in a myriad of PaaS offerings.

And this has been going on for more than 2 years without being reported... makes me wonder what kind of applications we build with .NET?

Now if the reason for DNS change is failover, your connection would have been faulted anyway so this time connection would open against the new server. But if this were the blue-black deployment, you swap the staging and production and your calls would still go to the staging environment - a behaviour we had seen but had fixed it by bouncing the dependent servers thinking possibly this was an Azure oddity. What a fool was I - it was there in the code! Whose code? Well debateable...

Analysis

All of this goes back to the implementation in HttpClientHandler that uses HttpWebRequest to make connections none of which code is open sourced. But obviously using Jetbrain’s dotPeek we can look into the decompiled code and see that HttpClientHandler creates a connection group (named with its hashcode) and does not close the connections in the group until getting disposed. This basically means the DNS check never happens as long as a connection is open. This is really terrifying...
protected override void Dispose(bool disposing)
{
    if (disposing && !this.disposed)
    {
        this.disposed = true;
        ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
    }
    base.Dispose(disposing);
}
As you can see, ServicePoint class plays an important role here: controlling number of concurrent connects to a ‘service point/endpoint’ as well as keep-alive behaviours.

Solution

A naive solution would be to dispose the HttpClient (hence the HttpClientHandler) every time you use it. As explained this is not how HttpClient is intended to be used.

Another solution is to set ConnectionClose property of DefaultRequestHeaders on your HttpClient:
var client = new HttpClient();
client.DefaultRequestHeaders.ConnectionClose = true;
This will set the HTTP’s keep-alive header to false so the socket will be closed after a single request. It turns out this can add roughly extra 35ms (with long tails, i.e amplifying outliers) to each of your HTTP calls preventing you to take advantage of benefits of re-using a socket. So what is the solution then?

Well, courtesy of my good friend Andy Jutton of Amido, the solution lies in an obscure feature of the ServicePoint class. Basically, as we said, ServicePoint controls many aspects of TCP connections and one of the properties is ConnectionLeaseTimeout which controls how many milliseconds a TCP socket should be kept open. Its default value is -1 which means connections will be stay open indefinitely… well in real terms, until the server closes the connection or there is a network disruption - or the HttpClientHandler gets disposed as discussed.

So the root cause is basically that with the default value of -1, which is IMHO, wrong and potentially dangerous setting.

Now to fix it, all we need to do is to get hold of the ServicePoint object for the endpoint by passing the URL to it and set the ConnectionLeaseTimeout:
var sp = ServicePointManager.FindServicePoint(new Uri("http://foo.bar/baz/123?a=ab"));
sp.ConnectionLeaseTimeout = 60*1000; // 1 minute
So this is something that you would want to do only at the startup of your application, once and for all endpoints your application is going to hit (if endpoints decided at runtime, you would be setting that at the time of discovery). Bear in mind, path and query strings are ignored and only the host, port and schema are important. Depending on your scenario, values of 1-5 minutes probably make sense.

Conclusion

Using Singleton HttpClient results in your instance not to honour DNS changes which can have serious implications. The solution is to set the ConnectionLeaseTimeout of the ServicePoint object for the endpoint.

12 comments:

  1. Disturbing, especially since ServicePointManager isn't available on all the platforms HttpClient is (UWP/WPSL etc).

    ReplyDelete
  2. If the ServicePointManager is not available then it cannot be used. I downloaded the Microsoft.Net.Http package from NuGet and decompiled both, the net40 and portable versions and the implementations there, although similar, are not the same. The portable version's HttpClientHandler.Dispose() method is essentially empty and does not use the ServicePointManager.

    I don't know what it means though. Their behavior might be different perhaps.

    ReplyDelete
  3. Great post Ali. I was halfway reading this and thought "ooh I better show AJ this" and then the man himself had already given you the solution!

    ReplyDelete
    Replies
    1. Oh yeah, the man fixed it and no one else had found it, I did discuss this with some key figures in the community.

      Delete
  4. Hey Ali,
    Just wanted to add a few more comments to this:
    a) I don't believe that HttpClient should not be disposed. It does allocate Cancellation Tokens and request and response bodies can be unmanaged streams. I just believe that it should only be disposed when your application is finished making HTTP requests.
    b) ServicePointManager doesn't hold connections open indefinitely, despite the default value of ConnectionLeaseTimeout. Inactive connections will be closed after the MaxIdleTime (100 secs default).
    c) ConnectionLeaseTimeout is a strange beast, that I wasn't aware of until Oren mentioned it today. It will cause outgoing requests to include Connection:Close headers after the lease expires, causing the connection to be terminated after the server sends its response. It makes sense why this is set to infinite to default because it is a strange way to force connections to be closed when a client is actively making requests.
    d)If your server side has done a production/staging switch then the ideal way stop any more requests going to the outdated app server is to tell the app server to return connection:close headers in all future requests it receives. That means no-one has to wait for any lease timeout. At most one request will hit the outdated server and that can only occur within the MaxIdleTime. Assuming DNS caching doesn't delay the switch any longer.
    e) I guess this could be implemented in cloud infrastructure. As soon as something become switched to staging, then all returned requests get a connection:close header added. I'll see if I can find someone in Azure AppService who knows this stuff.

    ReplyDelete
    Replies
    1. Hi Darrel, thanks for your points. First of all, an application does not just stop sending HTTP (can you even think of an example of this exceptional case?) - an active application will keep sending HttpClient when it has an active user hence the point that HttpClient lifetime is the same as the application. In a server scenario (our case), our APIs are almost never idle so the client connection would be open for hours and hours if not days and weeks. Production/Staging is only one scenario, there are traffic manager scenarios and many many more outside Microsoft world. Also your suggestion about load balancer sending connection-close will only cover cloud server scenarios while there are many other server scenarios where the server, rightly trusts that the DNS TTL is going to be observed.

      I will look at the Java implementation to try to understand how they have handled it but for now, the only reliable solution is the one recommended in the post.

      Delete
  5. Could you use ServicePointManager.SetTcpKeepAlive(...) to achieve the same goal?

    ReplyDelete
  6. I think there's some confusion here... connections are by default set to last forever. Open HTTP (and underlying TCP connections) will not do a DNS lookup again once opened. You can send multiple HTTP requests over the same connection, but will need to send them over a new connection to get a new DNS lookup. This is standard HTTP behavior.

    If the server becomes unavailable, then the connection will close automatically, resulting in a new connection and DNS lookup. If you just want to cause a DNS lookup yourself then you'll have to force a new connection, and your fix of setting a max-lifetime for the underlying TCP connection is just one way to do that. You can also dispose the entire HttpClient, or find the ServicePoint and dispose/recreate that as well.

    ReplyDelete
    Replies
    1. Good for you if you have known it.

      Yes, it makes sense once you know it. But have you been protecting yourself from this?

      All in all, my view is that this should have been communicated clearly in the docs with ways around it.

      Delete
  7. I just see the post i am so happy to the communication science post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be replay for your great thinks and I hope you post again soon....

    SEO Company in Chennai

    ReplyDelete
  8. How would one resolve this issue in .NET Core? Do we have to wait for .NET Standard 2.0? Thanks.

    ReplyDelete
    Replies
    1. Frankly do not know about .NET core. It is still in flux and have not had the time to look at.

      Delete