Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Downloading 1,000+ files fast?

I currently have a list of URLs that link to either images or MP4 files. What is the best way to recode this so it runs much faster? I know I could run multiple threads or even parallel but what's the best way?

I'm not too worried about speed as long as it isn't as slow as right now, but I don't want to overpower the device's resources such as CPU trying to speed it up.

public static void Download(string saveLocation, List<string> urls)
{
    using (var client = new WebClient())
    {
        foreach (var url in urls)
        {
            Console.WriteLine("Downloading: " + url);
            client.DownloadFile(url, saveLocation + "/" + url.Split('/').Last());
        }
    }
}


1 Reply


Update

It was pointed out to me in a comment by Jimi that DownloadFileAsync is an event-driven call and is not awaitable. There is, however, a WebClient.DownloadFileTaskAsync version, which is the appropriate one to use in this example: it is awaitable and returns a Task.

Downloads the specified resource to a local file as an asynchronous operation using a task object.

Original answer

I know I could run multiple threads or even parallel but what's the best way

Yes, you can make it parallel and stay in control of the resources you use.

I'm not too worried about speed as long as it isn't as slow as right now, but I don't want to overpower the device's resources such as CPU trying to speed it up

You should be able to achieve this and configure this fairly well.


OK, so there are many ways to do this. Here are some things to think about:

  • You have 1000s of IO bound tasks (as opposed to CPU bound tasks)
  • With this many files, you want some sort of parallelism and the ability to configure the number of concurrent tasks.
  • You will want to do this in an async / await pattern so you're not tying up threads that just wait for IO to complete, or smashing your CPU

Some immediate solutions:

  • Tasks and Task.WhenAll in an async / await pattern. This is a great approach; however, it's a little trickier to limit concurrent tasks.
  • You have Parallel.ForEach and Parallel.For. These offer a nice way to limit concurrent workloads, but they are just not suited to IO bound tasks.
  • Or another option you might consider is TPL Dataflow (part of Microsoft's Task Parallel Library). I have come to like these libraries a lot lately as they can give you the best of both worlds.

Please note: there are many other approaches.
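
As a rough illustration of the first option, here is a minimal sketch of one common way to cap concurrency with plain tasks: a SemaphoreSlim acts as a gate and Task.WhenAll waits for everything to finish. The names Throttled and ForEachAsync are made up for this example, and the code has not been tuned for any particular workload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: runs 'work' for every item, with at most
    // 'maxConcurrency' calls in flight at any one time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();   // wait for a free slot without blocking a thread
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release();       // free the slot even if 'work' throws
                }
            }).ToList();                  // ToList starts each task exactly once

            // Completes only after every task has finished, so the gate is safe to dispose
            await Task.WhenAll(tasks);
        }
    }
}
```

With a download method shaped like the one further down, a call could look like `await Throttled.ForEachAsync(workloads, 50, w => MyMethodAsync(w));`. The extra plumbing is what makes this approach "a little trickier" than a single MaxDegreeOfParallelism setting.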

Parallel.ForEach uses the thread pool, and IO bound operations block those threads while they wait for a device to respond, tying up resources. A general rule of thumb here is:

  • If you have CPU bound code, Parallel.ForEach is appropriate;
  • If you have IO bound code, asynchrony is appropriate.

In this case, downloading a file is clearly IO bound, there is an awaitable DownloadFileTaskAsync version, and there are 1000 files to download, so you are best off using the async / await pattern with some type of limit on concurrent tasks.


Here is a very basic example of how you might achieve this:

Given

public class WorkLoad
{
    public string Url {get;set;}
    public string FileName {get;set;}

}

Dataflow example

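using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package System.Threading.Tasks.Dataflow

public async Task DoWorkLoads(List<WorkLoad> workloads)
{
   var options = new ExecutionDataflowBlockOptions
   {
      // add pepper and salt to taste
      MaxDegreeOfParallelism = 50
   };

   // create an action block that runs MyMethodAsync for each posted item
   var block = new ActionBlock<WorkLoad>(MyMethodAsync, options);

   // Queue them up
   foreach (var workLoad in workloads)
      block.Post(workLoad);

   // signal that no more items will be posted, then wait for them to finish
   block.Complete();
   await block.Completion;
}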

...

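// Notice we are using the async / await pattern
public async Task MyMethodAsync(WorkLoad workLoad)
{
    try
    {
        Console.WriteLine("Downloading: " + workLoad.Url);

        // A WebClient instance cannot run concurrent operations, so use one per download
        using (var client = new WebClient())
        {
            // DownloadFileTaskAsync returns a Task; DownloadFileAsync is event driven and cannot be awaited
            await client.DownloadFileTaskAsync(workLoad.Url, workLoad.FileName);
        }
    }
    catch (Exception ex)
    {
        // probably best to add some proper error handling here
        Console.WriteLine("Failed: " + workLoad.Url + " (" + ex.Message + ")");
    }
}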
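
To connect this back to the question's Download(saveLocation, urls) signature, here is a small sketch of how the URL list might be turned into WorkLoad items. WorkLoadBuilder and FromUrls are hypothetical names, and the WorkLoad class is repeated only so the snippet compiles on its own. It assumes every URL is absolute and that file names do not collide; two URLs ending in the same file name would overwrite each other.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class WorkLoad
{
    public string Url { get; set; }
    public string FileName { get; set; }
}

public static class WorkLoadBuilder
{
    // Hypothetical helper: maps each URL to a WorkLoad whose FileName is
    // saveLocation combined with the last segment of the URL path.
    public static List<WorkLoad> FromUrls(string saveLocation, IEnumerable<string> urls)
    {
        return urls.Select(url => new WorkLoad
        {
            Url = url,
            // Uri.LocalPath leaves out any query string, unlike url.Split('/').Last()
            FileName = Path.Combine(saveLocation, Path.GetFileName(new Uri(url).LocalPath))
        }).ToList();
    }
}
```

The result could then be passed straight in, for example `await DoWorkLoads(WorkLoadBuilder.FromUrls(saveLocation, urls));`.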

Summary

This approach gives you asynchrony and a configurable MaxDegreeOfParallelism. It doesn't waste resources, and it lets IO be IO.

Disclaimer: Dataflow may not be where you want to be; I just thought I'd give you some more information.

Disclaimer 2: the above code has not been tested. I would seriously consider researching this technology first and doing your own due diligence thoroughly.


Loosely related demo here

