I recently created a simple application to compare the HTTP call throughput achievable with an asynchronous approach against that of a classical multithreaded approach.
The application performs a predefined number of HTTP calls and then displays the total time they took. During my tests, all HTTP calls were made to my local IIS server and retrieved a small text file (12 bytes in size).
The most important part of the code for the asynchronous implementation is listed below:
    public async void TestAsync()
    {
        this.TestInit();
        HttpClient httpClient = new HttpClient();
        for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
        {
            ProcessUrlAsync(httpClient);
        }
    }

    private async void ProcessUrlAsync(HttpClient httpClient)
    {
        HttpResponseMessage httpResponse = null;
        try
        {
            Task<HttpResponseMessage> getTask = httpClient.GetAsync(URL);
            httpResponse = await getTask;
            Interlocked.Increment(ref _successfulCalls);
        }
        catch (Exception)
        {
            Interlocked.Increment(ref _failedCalls);
        }
        finally
        {
            if (httpResponse != null) httpResponse.Dispose();
        }

        lock (_syncLock)
        {
            _itemsLeft--;
            if (_itemsLeft == 0)
            {
                _utcEndTime = DateTime.UtcNow;
                this.DisplayTestResults();
            }
        }
    }
The most important part of the multithreading implementation is listed below:
    public void TestParallel2()
    {
        this.TestInit();
        ServicePointManager.DefaultConnectionLimit = 100;
        for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
        {
            Task.Run(() =>
            {
                try
                {
                    this.PerformWebRequestGet();
                    Interlocked.Increment(ref _successfulCalls);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref _failedCalls);
                }

                lock (_syncLock)
                {
                    _itemsLeft--;
                    if (_itemsLeft == 0)
                    {
                        _utcEndTime = DateTime.UtcNow;
                        this.DisplayTestResults();
                    }
                }
            });
        }
    }

    private void PerformWebRequestGet()
    {
        HttpWebRequest request = null;
        HttpWebResponse response = null;
        try
        {
            request = (HttpWebRequest)WebRequest.Create(URL);
            request.Method = "GET";
            request.KeepAlive = true;
            response = (HttpWebResponse)request.GetResponse();
        }
        finally
        {
            if (response != null) response.Close();
        }
    }
Running the tests revealed that the multithreaded version was faster: it completed 10k requests in around 0.6 seconds, while the async version took around 2 seconds for the same load. This was a bit of a surprise, because I expected the async version to be faster. Perhaps the reason is that my HTTP calls were very fast. In a real-world scenario, where the server performs a more meaningful operation and there is some network latency, the results might be reversed.
However, what really concerns me is how HttpClient behaves as the load increases. Since it takes around 2 seconds to deliver 10k messages, I expected it to take around 20 seconds to deliver 10 times as many, but the test showed that it needs around 50 seconds to deliver 100k messages. Furthermore, it usually takes more than 2 minutes to deliver 200k messages, and often a few thousand of them (3-4k) fail with the following exception:
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
I checked the IIS logs, and the operations that failed never reached the server; they failed within the client. I ran the tests on a Windows 7 machine with the default ephemeral port range of 49152 to 65535. Running netstat showed that around 5-6k ports were in use during the tests, so in theory many more should have been available. If a lack of ports was indeed the cause of the exceptions, it means that either netstat didn't report the situation properly or HttpClient only uses up to a maximum number of ports, after which it starts throwing exceptions.
By contrast, the multithreaded approach to generating HTTP calls behaved very predictably. It took around 0.6 seconds for 10k messages, around 5.5 seconds for 100k messages and, as expected, around 55 seconds for 1 million messages. None of the messages failed. Furthermore, while it ran, it never used more than 55 MB of RAM (according to Windows Task Manager). The memory used when sending messages asynchronously grew proportionally with the load; it used around 500 MB of RAM during the 200k-message tests.
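To make the scaling difference easier to see, the timings quoted above can be turned into requests per second. The snippet below is only a small arithmetic sketch: the class and method names are made up for illustration, and the 200k async figure uses 120 seconds, so it is an upper bound on throughput (that run took "more than 2 minutes").

```csharp
using System;

static class ThroughputSketch
{
    // Requests per second for a run of `requests` calls that took `seconds`.
    public static double RequestsPerSecond(int requests, double seconds)
    {
        return requests / seconds;
    }

    // Prints the throughput implied by the timings quoted in the post.
    public static void PrintSummary()
    {
        Console.WriteLine("async HttpClient");
        Console.WriteLine("  10k:  {0:F0} req/s", RequestsPerSecond(10000, 2.0));
        Console.WriteLine("  100k: {0:F0} req/s", RequestsPerSecond(100000, 50.0));
        // Upper bound only: the 200k run took more than 120 seconds.
        Console.WriteLine("  200k: at most {0:F0} req/s", RequestsPerSecond(200000, 120.0));

        Console.WriteLine("multithreaded HttpWebRequest");
        Console.WriteLine("  10k:  {0:F0} req/s", RequestsPerSecond(10000, 0.6));
        Console.WriteLine("  100k: {0:F0} req/s", RequestsPerSecond(100000, 5.5));
        Console.WriteLine("  1M:   {0:F0} req/s", RequestsPerSecond(1000000, 55.0));
    }
}
```

In other words, the async HttpClient throughput drops from roughly 5,000 to 2,000 requests per second (and lower still at 200k) as the load grows, while the multithreaded version stays in the range of roughly 16,700 to 18,200 requests per second.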
I think there are two main reasons for the above results. The first one is that HttpClient seems to be very greedy in creating new connections with the server. The high number of used ports reported by netstat means that it probably doesn't benefit much from HTTP keep-alive.
The second is that HttpClient doesn't seem to have a throttling mechanism. In fact, this seems to be a general problem with async operations. If you need to perform a very large number of operations, they will all be started at once and their continuations will be executed as they become available. In theory this should be OK, because in async operations the load is on external systems, but as shown above this is not entirely the case. Having a large number of requests started at once increases memory usage and slows down the entire execution.
I managed to obtain better results, in both memory usage and execution time, by limiting the maximum number of asynchronous requests with a simple but primitive delay mechanism:
    public async void TestAsyncWithDelay()
    {
        this.TestInit();
        HttpClient httpClient = new HttpClient();
        for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
        {
            if (_activeRequestsCount >= MAX_CONCURENT_REQUESTS)
                await Task.Delay(DELAY_TIME);
            ProcessUrlAsyncWithReqCount(httpClient);
        }
    }
It would be really useful if HttpClient included a mechanism for limiting the number of concurrent requests. When using the Task class (which is based on the .NET thread pool), throttling is achieved automatically because the number of concurrent threads is limited.
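For illustration, a limit on concurrent requests could also be sketched with SemaphoreSlim instead of a polling delay. This is a minimal sketch under stated assumptions, not something that was part of the measured tests: ThrottleSketch, RunThrottledAsync and RunOneAsync are hypothetical names, and the operation is passed in as a delegate so the helper does not depend on HttpClient itself.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottleSketch
{
    // Runs `count` asynchronous operations while keeping at most
    // `maxConcurrent` of them in flight at any moment.
    public static async Task RunThrottledAsync(
        int count, int maxConcurrent, Func<int, Task> operation)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var pending = new List<Task>(count);
            for (int i = 0; i < count; i++)
            {
                // Asynchronous wait: no thread is blocked while all slots are taken.
                await gate.WaitAsync();
                pending.Add(RunOneAsync(gate, operation, i));
            }

            // Completes once every operation has finished; rethrows a failure if any.
            await Task.WhenAll(pending);
        }
    }

    private static async Task RunOneAsync(
        SemaphoreSlim gate, Func<int, Task> operation, int index)
    {
        try
        {
            await operation(index);
        }
        finally
        {
            // Free the slot whether the operation succeeded or failed.
            gate.Release();
        }
    }

    // Possible usage with the names from the post (assumed, not tested here):
    //   await ThrottleSketch.RunThrottledAsync(NUMBER_OF_REQUESTS, 100, async i =>
    //   {
    //       using (HttpResponseMessage r = await httpClient.GetAsync(URL)) { }
    //   });
}
```

Unlike the delay loop, the loop here resumes as soon as one in-flight request finishes, so there is no DELAY_TIME to tune. The sketch does keep one Task per request in a list until the end, which costs some memory at very high request counts.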
For a complete overview, I have also created a version of the async test based on HttpWebRequest rather than HttpClient and managed to obtain much better results. For a start, it allows setting a limit on the number of concurrent connections (with ServicePointManager.DefaultConnectionLimit or via config), which means that it never ran out of ports and never failed on any request (HttpClient, by default, is based on HttpWebRequest, but it seems to ignore the connection limit setting).
The async HttpWebRequest approach was still about 50-60% slower than the multithreaded one, but it was predictable and reliable. Its only downside was that it used a huge amount of memory under heavy load; for example, it needed around 1.6 GB to send 1 million requests. By limiting the number of concurrent requests (as I did above for HttpClient), I managed to reduce the memory used to just 20 MB and obtain an execution time only 10% slower than the multithreaded approach.
After this lengthy presentation, my questions are: Is the HttpClient class from .NET 4.5 a bad choice for intensive-load applications? Is there any way to throttle it, which should fix the problems I mentioned above? How about the async flavor of HttpWebRequest?
Update (thanks @Stephen Cleary)
As it turns out, HttpClient, just like HttpWebRequest (on which it is based by default), can have the number of concurrent connections to the same host limited with ServicePointManager.DefaultConnectionLimit. The strange thing is that, according to MSDN, the default value for the connection limit is 2. I also checked this on my side using the debugger, which showed that 2 is indeed the default value. However, it seems that unless a value is explicitly assigned to ServicePointManager.DefaultConnectionLimit, the default value is ignored. Since I didn't explicitly set a value for it during my HttpClient tests, I thought the setting itself was being ignored.
After setting ServicePointManager.DefaultConnectionLimit to 100, HttpClient became reliable and predictable (netstat confirms that only 100 ports are used). It is still slower than async HttpWebRequest (by about 40%), but strangely, it uses less memory. For the test involving 1 million requests, it used a maximum of 550 MB, compared to 1.6 GB for the async HttpWebRequest.
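As a concrete illustration of the change described in this update, the limit can be assigned once, before the client issues its first request. This is a minimal sketch that assumes the .NET Framework stack discussed here, where HttpClient is built on HttpWebRequest; ConnectionLimitSketch and CreateClient are hypothetical names.

```csharp
using System;
using System.Net;
using System.Net.Http;

static class ConnectionLimitSketch
{
    // Hypothetical helper: applies the per-host connection limit,
    // then creates the HttpClient that will be shared by all requests.
    public static HttpClient CreateClient(int maxConnectionsPerHost)
    {
        // Assign the limit explicitly and before any request is sent,
        // so that connections opened afterwards are subject to it.
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerHost;
        return new HttpClient();
    }
}
```

With the value used in the tests, the call would be `ConnectionLimitSketch.CreateClient(100)`, and the returned client would be reused for every request rather than created per call.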
So, while HttpClient in combination with ServicePointManager.DefaultConnectionLimit seems to ensure reliability (at least for the scenario where all the calls are made to the same host), it still looks like its performance is negatively impacted by the lack of a proper throttling mechanism. Something that would limit the number of concurrent requests to a configurable value and put the rest in a queue would make it much more suitable for high-scalability scenarios.