Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - Parallel execution for I/O-bound operations

I have read the TPL and Task library documentation cover to cover, but I still can't fully work out the following case, and I need to implement it right now.

I will simplify my situation. I have an IEnumerable<Uri> with 1000 items, and I have to make a request to each of them using HttpClient.

I have two questions.

  1. There is not much computation, just waiting for HTTP requests. In this case, can I still use Parallel.ForEach()?
  2. If I use Task instead, what is the best practice for creating a huge number of them? Let's say I use Task.Factory.StartNew(), add those tasks to a list, and wait for all of them. Is there a feature (such as a TPL partitioner) that controls the maximum number of tasks and the maximum number of HttpClient instances I can create?

There are a couple of similar questions on SO, but none of them mention the maximums. The requirement is just to use the maximum number of tasks with the maximum number of HttpClient instances.

Thank you in advance.


1 Reply


i3arnon's answer with TPL Dataflow is good; Dataflow is useful especially if you have a mix of CPU and I/O bound code. I'll echo his sentiment that Parallel is designed for CPU-bound code; it's not the best solution for I/O-based code, and especially not appropriate for asynchronous code.
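For readers who don't have that answer at hand, a Dataflow-based approach could look roughly like the sketch below. This is not i3arnon's code: the class DataflowSketch, the method ProcessAllAsync, and its parameters are names made up for illustration, and the operation delegate stands in for whatever actually sends the request. ActionBlock and ExecutionDataflowBlockOptions come from System.Threading.Tasks.Dataflow, which on some targets has to be added as a NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowSketch
{
    // Hypothetical helper: runs 'operation' once per item, with at most
    // 'maxDegreeOfParallelism' operations in progress at any moment.
    public static async Task ProcessAllAsync<T>(
        IEnumerable<T> items,
        int maxDegreeOfParallelism,
        Func<T, Task> operation)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        // The block queues posted items and starts them as slots free up.
        var block = new ActionBlock<T>(operation, options);

        foreach (T item in items)
        {
            block.Post(item);
        }

        // Signal that no more items will arrive, then wait for the queue to drain.
        block.Complete();
        await block.Completion;
    }
}
```

If the operation throws, the block faults and the exception surfaces when block.Completion is awaited, so a real implementation would probably want per-item error handling inside the delegate.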

If you want an alternative solution that works well with mostly-I/O code - and doesn't require an external library - the method you're looking for is Task.WhenAll:

var tasks = uris.Select(uri => SendRequestAsync(uri)).ToArray();
await Task.WhenAll(tasks);
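To make the two lines above concrete, here is a minimal self-contained sketch. The original never shows the body of SendRequestAsync, so the version below (a GET that returns the status code) is purely an assumption, as are the class name WhenAllSketch and the helper RunAllAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // One HttpClient shared by all requests; a client per request is not needed.
    private static readonly HttpClient Client = new HttpClient();

    // Assumed request logic (not from the original answer): GET the URI
    // and report the numeric HTTP status code.
    public static async Task<int> SendRequestAsync(Uri uri)
    {
        using (HttpResponseMessage response = await Client.GetAsync(uri))
        {
            return (int)response.StatusCode;
        }
    }

    // Starts one operation per URI right away (ToArray forces the Select to run),
    // then asynchronously waits for all of them. Results keep the input order.
    public static async Task<TResult[]> RunAllAsync<TResult>(
        IEnumerable<Uri> uris,
        Func<Uri, Task<TResult>> send)
    {
        Task<TResult>[] tasks = uris.Select(uri => send(uri)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

A call such as WhenAllSketch.RunAllAsync(uris, WhenAllSketch.SendRequestAsync) would then issue every request at once, with no thread blocked while the responses are pending.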

This is the easiest solution, but it does have the drawback of starting all requests simultaneously. Particularly if all requests are going to the same service (or a small set of services), this can cause timeouts. To solve this, you need to use some kind of throttling...

"Is there a feature (such as a TPL partitioner) that controls the maximum number of tasks and the maximum number of HttpClient instances I can create?"

TPL Dataflow has that nice MaxDegreeOfParallelism option, which only starts so many at a time. You can also throttle regular asynchronous code by using another built-in type, SemaphoreSlim:

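// Shared throttle: at most 50 requests may be in flight at the same time.
private readonly SemaphoreSlim _sem = new SemaphoreSlim(50);
private async Task SendRequestAsync(Uri uri)
{
  // Asynchronously wait for a free slot; no thread is blocked while waiting.
  await _sem.WaitAsync();
  try
  {
    ...
  }
  finally
  {
    // Always give the slot back, even if the request throws.
    _sem.Release();
  }
}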
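Combining the semaphore with Task.WhenAll, a complete version might look like the following sketch. The names Throttle and RunThrottledAsync are hypothetical, and the operation delegate is a placeholder for the real request logic, which the snippet above leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs 'operation' for every item, but lets at most
    // 'maxConcurrency' operations run at the same time.
    public static async Task<TResult[]> RunThrottledAsync<TSource, TResult>(
        IEnumerable<TSource> items,
        int maxConcurrency,
        Func<TSource, Task<TResult>> operation)
    {
        using (var sem = new SemaphoreSlim(maxConcurrency))
        {
            async Task<TResult> RunOneAsync(TSource item)
            {
                // Wait (without blocking a thread) until a slot is free.
                await sem.WaitAsync();
                try
                {
                    return await operation(item);
                }
                finally
                {
                    // Free the slot even when the operation fails.
                    sem.Release();
                }
            }

            // All tasks are created up front; most of them are simply
            // parked on the semaphore until earlier ones finish.
            Task<TResult>[] tasks = items.Select(item => RunOneAsync(item)).ToArray();
            return await Task.WhenAll(tasks);
        }
    }
}
```

With a shared HttpClient named client (an assumption), usage could be as simple as Throttle.RunThrottledAsync(uris, 50, uri => client.GetStringAsync(uri)).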

"If I use Task instead, what is the best practice for creating a huge number of them? Let's say I use Task.Factory.StartNew(), add those tasks to a list, and wait for all of them."

You actually don't want to use StartNew. It only has one appropriate use case (dynamic task-based parallelism), which is extremely rare. Modern code should use Task.Run if you need to push work onto a background thread. But you don't even need that to begin with, so neither StartNew nor Task.Run is appropriate here.
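One further, well-known pitfall (not spelled out in the answer itself) is that StartNew does not understand async delegates: given an async lambda it returns a Task<Task>, and the outer task completes as soon as the lambda reaches its first incomplete await. The sketch below demonstrates this; the class and method names are made up, and a TaskCompletionSource stands in for a pending request.

```csharp
using System;
using System.Threading.Tasks;

public static class StartNewSketch
{
    // Returns whether the real (inner) work had finished at the moment
    // the task returned by StartNew reported completion.
    public static async Task<bool> InnerFinishedWhenOuterCompletedAsync()
    {
        // Stand-in for an I/O operation that has not completed yet.
        var pendingRequest = new TaskCompletionSource<bool>();

        // The outer task only means "the lambda ran up to its first await
        // and handed back a Task"; it says nothing about the work itself.
        Task<Task> outer = Task.Factory.StartNew(async () =>
        {
            await pendingRequest.Task;
        });

        Task inner = await outer;
        bool innerFinished = inner.IsCompleted; // the "request" is still pending here

        // Let the simulated request finish and wait for the real work.
        pendingRequest.SetResult(true);
        await inner;

        return innerFinished;
    }
}
```

Adding the StartNew tasks to a list and waiting for all of them would therefore wait only for the outer tasks unless each one is unwrapped, which is one more reason to call the async method directly.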

"There are a couple of similar questions on SO, but none of them mention the maximums. The requirement is just to use the maximum number of tasks with the maximum number of HttpClient instances."

Maximums are where asynchronous code really gets tricky. With CPU-bound (parallel) code, the solution is obvious: you use as many threads as you have cores. (Well, at least you can start there and adjust as necessary.) With asynchronous code, the solution isn't as obvious. It depends on a lot of factors: how much memory you have, how the remote server responds (rate limiting, timeouts, etc.), and so on.

There are no easy solutions here. You just have to test out how your specific application deals with high levels of concurrency, and then throttle to some lower number.
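One way to run that kind of experiment is to time the same workload under several candidate limits, as in the rough sketch below. ConcurrencyProbe, MeasureAsync, and the operation delegate are all hypothetical names; in a real test the delegate would send an actual request, and timeouts and error rates would matter at least as much as elapsed time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyProbe
{
    // Runs 'requestCount' operations once per candidate limit and records
    // how long each complete run took.
    public static async Task<Dictionary<int, TimeSpan>> MeasureAsync(
        int requestCount,
        IEnumerable<int> limits,
        Func<int, Task> operation)
    {
        var timings = new Dictionary<int, TimeSpan>();

        foreach (int limit in limits)
        {
            using (var sem = new SemaphoreSlim(limit))
            {
                var stopwatch = Stopwatch.StartNew();

                // Same throttling pattern as before: wait for a slot,
                // run the operation, release the slot.
                await Task.WhenAll(Enumerable.Range(0, requestCount).Select(async i =>
                {
                    await sem.WaitAsync();
                    try
                    {
                        await operation(i);
                    }
                    finally
                    {
                        sem.Release();
                    }
                }));

                stopwatch.Stop();
                timings[limit] = stopwatch.Elapsed;
            }
        }

        return timings;
    }
}
```

Comparing, say, limits of 10, 50, and 200 against the real service should show where throughput stops improving or errors start to appear, and the production throttle can then be set somewhat below that point.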


I have some slides for a talk that attempts to explain when different technologies are appropriate (parallelism, asynchrony, TPL Dataflow, and Rx). If you prefer more of a written description with recipes, I think you may benefit from my book on concurrency.



...