Sunday, 15 April 2012

ASP.NET Web API Series - Part 3: Async Deep Dive

[Level T4]

OK, in this post we will have a look at the internals of ASP.NET Web API and its async implementation. [For a newer related post see here]

Having covered some ground in our previous post, we continue our journey; this time we will look at the async code in the Web API source code that was released recently. For more information about building and using the source code, please see my earlier post.

First some myth-busting and de-mystifying.

What exactly is Task<T>

If you start browsing through the source code of the ASP.NET Web API, you will find very few classes that do not have an xxxAsync() method or do not contain methods returning some variety of Task<T> (which from here on I am gonna use to refer generically to all tasks, including both Task and Task<T>).

Task allows operations to be scheduled asynchronously. However, now that most methods return Task<T>, the question is: would the context switching of all these asynchronous operations not have an adverse effect on the performance of the Web API? At the end of the day, async was added to improve scalability, but could it damage performance? As we will see below, ASP.NET Web API runs actions asynchronously only if you tell it to - hence no context switching happens within the framework for non-Task actions. And for Task actions, since we are dealing with long-running processes, asynchronous processing provides scalability.

And here is the myth-busting:
TASK ≠ ASYNC
While Task<T> was designed and built to serve as an easy, approachable and consistent asynchronous programming model, it does not necessarily mean a Task<T> will run on a background thread or in the ThreadPool. In fact, this is very much the case for xxxAsync methods in the framework, as we will see shortly.

Have a look at the code below:

Thread.CurrentThread.Name = "Main Thread";
Func<string> f = () =>
 {
  Console.WriteLine(Thread.CurrentThread.Name);
  return "Ali";
 };
Task<string> t = new Task<string>(f);
t.RunSynchronously();
Console.WriteLine(t.Result);
// OUTPUT:
// Main Thread
// Ali


As can be seen above, RunSynchronously runs the task on the current thread - hence the thread names are the same.

So what is Task then? I would like to describe Task as below:
Task is nothing but a wrapper class that encapsulates a delegate and its state (result, status, exception, etc)
In fact, it is the scheduler (which can be passed as an argument to Start) that defines how the task should run. The default scheduler (used when the task is started without passing a scheduler, or via Task.Factory.StartNew()) runs the task in the ThreadPool. This is different from the scheduler as originally designed, which had a dedicated pool with as many threads as CPUs on the box (see here for discussion on the subject).
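
To make the point about the scheduler concrete, here is a minimal sketch (not taken from the Web API source). The class and method names are hypothetical; it builds the same kind of Task twice and only changes how it is run, then compares the thread that executed the delegate with the calling thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, for illustration only.
public static class SchedulerDemo
{
    // Returns true if the task's delegate executed on the calling thread.
    public static bool RunsOnCallerThread(Action<Task> run)
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        int workerId = -1;
        var started = new ManualResetEventSlim();

        var task = new Task(() =>
        {
            workerId = Thread.CurrentThread.ManagedThreadId;
            started.Set();
        });

        run(task);

        // Wait on an event first: calling task.Wait() on a task that has not
        // started executing yet may run it inline on the waiting thread.
        started.Wait();
        task.Wait();
        return workerId == callerId;
    }

    public static void Main()
    {
        // Same Task type, same delegate; only the way it is run differs.
        Console.WriteLine(RunsOnCallerThread(t => t.RunSynchronously()));          // True
        Console.WriteLine(RunsOnCallerThread(t => t.Start(TaskScheduler.Default))); // False
    }
}
```

The event-based wait is deliberate: it keeps the caller from accidentally executing the task itself, which would hide the effect of the scheduler.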


Web API Pipeline

This subject deserves its own post, which I will get to, but in the meantime let's look at the processing pipeline from the moment a request is received by the server until the action is called on the controller.

This is a simplified view of the pipeline (which for now serves our purpose):


In terms of servers, currently there are two out-of-the box server scenarios in Web API:
  • ASP.NET Pipeline: this involves using IIS, IIS Express or Visual Studio Development Web Server (Cassini)
  • Self hosting
In terms of self hosting, HttpSelfHostServer receives the request and processes it using ProcessRequestContext, which passes a ChannelContext and RequestContext. This is reminiscent of the good old ISAPI entry point. HttpSelfHostServer will call its base class's (HttpServer) SendAsync, which is the first place we see the Task<T>:
protected override Task<HttpResponseMessage> 
    SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)

This method is responsible for copying some context to the Thread Storage Area and HTTP request properties. But the returned task is not created in this method; it is returned from its base class, DelegatingHandler. This class does nothing but call SendAsync on its inner handler (i.e. it delegates the call to another handler). The inner handler is one of the classes inherited from HttpMessageHandler (in fact DelegatingHandler itself is such a derived class). In a typical Web API scenario, the inner handler is HttpControllerDispatcher.

Dispatchers are responsible for finding the actual handler (in our case the controller), preparing the state for its execution and finally releasing the handler. In this case, HttpControllerDispatcher does exactly that. It is important to note that the dispatcher only returns the Task<HttpResponseMessage> returned to it by the controller, so again it does not create the task.
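
The delegation described above can be sketched with the public System.Net.Http types. This is only an illustration: StubDispatcher and PassThroughHandler are hypothetical classes (the stub stands in for HttpControllerDispatcher), and the URL is a placeholder.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical innermost handler: stands in for HttpControllerDispatcher.
public class StubDispatcher : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // The only place in this sketch where a task is actually produced.
        var tcs = new TaskCompletionSource<HttpResponseMessage>();
        tcs.SetResult(new HttpResponseMessage(HttpStatusCode.OK));
        return tcs.Task;
    }
}

// Hypothetical outer handler: creates no task of its own.
public class PassThroughHandler : DelegatingHandler
{
    public PassThroughHandler(HttpMessageHandler inner) : base(inner) { }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // DelegatingHandler.SendAsync simply forwards to the inner handler.
        return base.SendAsync(request, cancellationToken);
    }
}

public static class HandlerDemo
{
    public static HttpStatusCode Run()
    {
        var handler = new PassThroughHandler(new StubDispatcher());
        using (var invoker = new HttpMessageInvoker(handler))
        using (var request = new HttpRequestMessage(HttpMethod.Get, "http://localhost/api/test"))
        {
            Task<HttpResponseMessage> task = invoker.SendAsync(request, CancellationToken.None);
            return task.Result.StatusCode;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // OK
    }
}
```

Nothing in this chain starts a thread; the task handed back to the caller is the one the innermost handler produced.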

Controllers are the ones doing the actual work hence the method name changes from SendAsync to ExecuteAsync. The bulk of the work is done in the ApiController. This class gathers controllerDescriptor, actionDescriptor and filters and in essence, returns HttpActionInvoker's InvokeActionAsync. Note that SendAsync changed to ExecuteAsync and now it is InvokeActionAsync.

We have not yet seen the creation of the task! But we are not too far off. Most of the heavy lifting is done in the private class ActionExecuter of ReflectedHttpActionDescriptor. This class compiles a lambda expression based on the return type of the action:
if (methodCall.Type == typeof(Task))
{
 // for: public Task Action()
 return ...
}
else if (typeof(Task).IsAssignableFrom(methodCall.Type))
{
 // for: public Task<T> Action()
 return ...
}
else
{
 // for: public T Action()
 return ...
}
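
The order of the checks above matters because Task<T> derives from Task, so the exact-type test has to come first. A small stand-alone sketch of the same branching (Classify is a hypothetical helper, not part of the Web API source):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class, for illustration only.
public static class ReturnTypeDemo
{
    public static string Classify(Type returnType)
    {
        // Exact match first: a plain Task with no result.
        if (returnType == typeof(Task)) return "Task";

        // Task<T> derives from Task, so it is assignable to Task.
        if (typeof(Task).IsAssignableFrom(returnType)) return "Task<T>";

        // Anything else is a plain synchronous return value.
        return "T";
    }

    public static void Main()
    {
        Console.WriteLine(Classify(typeof(Task)));         // Task
        Console.WriteLine(Classify(typeof(Task<string>))); // Task<T>
        Console.WriteLine(Classify(typeof(string)));       // T
    }
}
```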

If the return type is Task, or Task<T>, it will return the task. But what if it is a non-Task return type? Well, it executes the action to get the result and wraps it in a task:
var result = compiled(instance, methodParameters);
...
return TaskHelpers.FromResult(result);
TaskHelpers.FromResult simply creates a dummy task and sets the result.
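
A helper of this kind can be sketched with TaskCompletionSource<T>. This is my assumption of roughly how such a helper works, not a copy of the internal TaskHelpers class; CompletedTask.From is a hypothetical name. (On .NET 4.5 and later, Task.FromResult does the same job.)

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for an internal helper such as TaskHelpers.FromResult.
public static class CompletedTask
{
    public static Task<T> From<T>(T result)
    {
        // No delegate and no scheduler involved: the task is born completed.
        var tcs = new TaskCompletionSource<T>();
        tcs.SetResult(result);
        return tcs.Task;
    }

    public static void Main()
    {
        Task<string> t = From("Test");
        Console.WriteLine(t.IsCompleted); // True
        Console.WriteLine(t.Result);      // Test
    }
}
```

Since the task is already completed when it is returned, reading Result never blocks and no thread switch is involved.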

How much of the Web API runs asynchronously?

Almost zero if you do not return Task or Task<T>! In order to investigate this, we will use a simple technique to trace the thread switching from the time the request is picked up by the server (in this case Self-Host) all the way to the controller and then back again.

I use this simple code to set the current thread's name to the "class.method()" name if it does not already have a name - I also use a prefix so that these debug outputs can be filtered from the rest. Remember that assigning a name to a thread that already has a name throws an exception. We also output the name using Trace.WriteLine():
using System.Diagnostics;

namespace System.Threading
{
 public static class ThreadNameTracingExtensions
 {
  /// <summary>
  /// If the thread has a name, it leaves the name as it is. If it does not have a name,
  /// it sets the thread's name to the class.method() of the caller.
  /// It outputs a trace line as: __ThreadName__ => Class.Method(): 'thread name'
  /// </summary>
  /// <param name="thread"></param>
  public static void TraceName(this Thread thread)
  {
   var st = new StackTrace(new StackFrame(1));
   var methodBase = st.GetFrame(0).GetMethod();
   string name = string.Format("{0}.{1}()", methodBase.ReflectedType.Name, methodBase.Name);
   Trace.WriteLine(string.Format("__ThreadName__ => {0}: '{1}'", name, thread.Name));
   if (string.IsNullOrEmpty(thread.Name))
    thread.Name = name;
  }
 }
}

And now we just sprinkle this line of code in various points in the pipeline:
System.Threading.Thread.CurrentThread.TraceName();

Let's look at how this will behave for an action returning a string:
[HttpGet]
public string Test(int id)
{
 System.Threading.Thread.CurrentThread.TraceName();
 return "Test";
}

Here is the trace output (I have removed "__ThreadName__ => " prefix):


HttpServer.Initialize(): ''
HttpControllerDispatcher.SendAsync(): 'HttpServer.Initialize()'
ApiController.ExecuteAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass3.<ExecuteAsync>b__0(): 'HttpServer.Initialize()'
ApiControllerActionInvoker.InvokeActionAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass3.<InvokeActionAsync>b__0(): 'HttpServer.Initialize()'
ReflectedHttpActionDescriptor.ExecuteAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass5.<ExecuteAsync>b__4(): 'HttpServer.Initialize()'
HelloWorldController.Test(): 'HttpServer.Initialize()'
So as you can see above, the thread that initialised HttpServer calls all the methods in the pipeline, including all those xxxAsync methods. Our action is also called synchronously by the same thread. Note the anonymous methods created by lambda-style delegate definitions all the way through the pipeline.

It seems to me that ASP.NET team have decided to add Async suffix to all methods that could be called asynchronously rather than would be called asynchronously. I would like to get a confirmation if it is the case or I am missing something since I am for now using Self-Hosting. In any case, I think this naming could be misleading and perhaps is a misnomer (although alternative ExecuteCanBeAsync is ugly!) but I trust the team had a good reason to use this name.


Now let's look at an action that returns a task: 

[HttpGet]
public Task<string> AsyncCall()
{
 System.Threading.Thread.CurrentThread.TraceName();
 var task =
  new Task<string>(
   () =>
   {
     
     System.Threading.Thread.CurrentThread.TraceName();
     Thread.Sleep(20 * 1000); // 20 seconds
     System.Threading.Thread.CurrentThread.TraceName();
     return "AsyncCall";
   }
  );
 task.Start();
 return task;
}

And here is the output:

ApiController.ExecuteAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass3.<ExecuteAsync>b__0(): 'HttpServer.Initialize()'
ApiControllerActionInvoker.InvokeActionAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass3.<InvokeActionAsync>b__0(): 'HttpServer.Initialize()'
ReflectedHttpActionDescriptor.ExecuteAsync(): 'HttpServer.Initialize()'
<>c__DisplayClass5.<ExecuteAsync>b__4(): 'HttpServer.Initialize()'
HelloWorldController.AsyncCall(): 'HttpServer.Initialize()'
HelloWorldController.<AsyncCall>b__0(): ''
HelloWorldController.<AsyncCall>b__0(): 'HelloWorldController.<AsyncCall>b__0()'
So as we can see, the initial thread runs the pipeline code, including the action itself. However, since the actual work is done inside a task that the action starts, the task's delegate runs on a background (ThreadPool) thread - note the last two lines.


So as we have seen, unless you return a Task or Task<T>, all the pipeline runs synchronously.

Synchronisation

OK, with task-based operations all through the pipeline, one would expect to see some synchronisation code. But where is it? I had to look for a while to find it.

ASP.NET Web API - from what I have learnt from the source code - uses two modern synchronisation mechanisms (rather than classic WaitOne, WaitAll):

  • Task.ContinueWith<T>: this schedules a continuation to run when the task completes (by default whether it succeeded, faulted or was cancelled, unless TaskContinuationOptions restrict it). So basically the thread that created the task does not have to wait; instead it can define a piece of continuation code to run at the end of the task - so the synchronisation in fact happens on the background thread. There are numerous places where this technique is used.
  • SynchronizationContext.Post: This is a low level synchronisation technique that is useful only if SynchronizationContext.Current is set.
SynchronizationContext.Current is very important and it synchronises tasks (when they are asynchronous) with the ASP.NET Thread.
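
As a rough illustration of the continuation technique (a sketch only, not code from the Web API source; ContinuationDemo is a hypothetical name), the creating thread attaches a continuation and carries on without waiting for the work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, for illustration only.
public static class ContinuationDemo
{
    public static string Run()
    {
        // The "long-running" work, started on a ThreadPool thread.
        Task<int> work = Task.Factory.StartNew(() =>
        {
            Thread.Sleep(100);
            return 21;
        });

        // Runs once 'work' has completed; the creating thread does not
        // wait for 'work' in order to schedule it.
        Task<string> continuation = work.ContinueWith(t => (t.Result * 2).ToString());

        // The creating thread would normally return 'continuation' to its
        // caller here. We block only so the demo can print the outcome.
        return continuation.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 42
    }
}
```

In a pipeline, the method would hand the continuation task back up the chain instead of reading Result, which is what lets the original thread go free.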


Task continuation is a clever way of synchronising. Surely the continuation has to wait for the task to finish, but the creator of the task has to wait for neither the task nor its continuation. An analogy is a car in the repair garage. If I leave the car in the garage for repair, they will call me to pick up the car - which can be anytime, and perhaps I might be too busy to pick it up. But if I ask them to call my wife to pick up the car, my work will not be interrupted - not that my wife will accept to do this favour! It is just an example.

Conclusion

ASP.NET Web API actions (and all the pipeline methods) will be called asynchronously only if you return a Task or Task<T>. This might sound obvious, but none of the pipeline methods with the Async suffix will run on their own threads. Using a blanket Async suffix could be a misnomer. [UPDATE: the ASP.NET team have indeed confirmed that the Async suffix is used to denote methods that return Task and can run asynchronously but do not have to]

ASP.NET Web API uses Task continuation for synchronisation. In essence, the continuation is synchronised with the task's background thread on a background thread, rather than on the thread that created the task.

Disclaimer

This article is the result of exploratory debugging of the Web API source code. Some of the information provided might not be accurate but I will make sure the post is updated and maintained if corrections are fed back to me.


5 comments:

  1. My understanding of async was that it is possible to do async without switching threads, in fact, this is the whole point of it, so you don't spend so many server resources?
    So, if you do an async I/O call, other things could be done on the same thread, and then when the call finishes, it would be resumed.

    Replies
    1. I am not sure if I can follow you. Asynchronous can only happen if there is a thread switch. And that is why on server code it is to be avoided unless there is I/O work involved.

      The point with asynchronous on server is to release IIS threads allowing them to serve more requests. This sacrifices some performance for a higher scalability.

      If the same IIS thread is to do all the work, then it is not released back to the pool. The scenario you are talking about is the plain synchronous.

  2. Great Post,

    One small question, Should the sentence "So as we have seen, unless you do not return a Task or Task, all the pipeline runs synchronously." omit the "not"?

    Replies
    1. Thanks! I was locked up in a triple negative and my mind could not handle it :) Corrected now.

  3. When I step into the threading code (using .net reflector), I see the ExecuteAsync getting called regardless of whether my action returned Task or not. In other words, it looks like the default behavior in MVC 4 (webapi) is to use ExecuteAsync . Am I missing something?
