I had to run a batch job on about 600k blobs and found 2 things that really helped:
- Running the operation from a worker role in the same data center. The speed between Azure services is great as long as they are in the same affinity group. Plus there are no data transfer costs.
Running the operation in parallel. The Task Parallel Library (TPL) in .net v4 makes this really easy. Here is the code to set the cache-control header for every blob in a container in parallel:
// get the info for every blob in the container
var blobInfos = cloudBlobContainer.ListBlobs(
new BlobRequestOptions() { UseFlatBlobListing = true });
Parallel.ForEach(blobInfos, (blobInfo) =>
{
// get the blob properties
CloudBlob blob = container.GetBlobReference(blobInfo.Uri.ToString());
blob.FetchAttributes();
// set cache-control header if necessary
if (blob.Properties.CacheControl != YOUR_CACHE_CONTROL_HEADER)
{
blob.Properties.CacheControl = YOUR_CACHE_CONTROL_HEADER;
blob.SetProperties();
}
});
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…