Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
701 views
in Technique[技术] by (71.8m points)

optimization - Pipelining vs Batching in Stackexchange.Redis

I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives:

1) Pipelining:

List<Task> addTasks = new List<Task>();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

2) Batching:

List<Task> addTasks = new List<Task>();
IBatch batch = redisDB.CreateBatch();
for (int i = 0; i < table.Rows.Count; i++)
{
    DataRow row = table.Rows[i];
    Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
    addTasks.Add(addAsync);
}
batch.Execute();
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);

I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching.

Reading from the documentation on pipelining,

"Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)."

To me, this sounds a lot like the a batching behaviour. I wonder if behind the scenes there's any big difference between the two because at a simple check with procmon I see almost the same number of TCP Sends on both versions.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Behind the scenes, SE.Redis does quite a bit of work to try to avoid packet fragmentation, so it isn't surprising that it is quite similar in your case. The main difference between batching and flat pipelining are:

  • a batch will never be interleaved with competing operations on the same multiplexer (although it may be interleaved at the server; to avoid that you need to use a multi/exec transaction or a Lua script)
  • a batch will be always avoid the chance of undersized packets, because it knows about all the data ahead of time
  • but at the same time, the entire batch must be completed before anything can be sent, so this requires more in-memory buffering and may artificially introduce latency

In most cases, you will do better by avoiding batching, since SE.Redis achieves most of what it does automatically when simply adding work.

As a final note; if you want to avoid local overhead, one final approach might be:

redisDB.SetAdd(string.Format(keyFormat, row.Field<int>("Id")),
    row.Field<int>("Value"), flags: CommandFlags.FireAndForget);

This sends everything down the wire, neither waiting for responses nor allocating incomplete Tasks to represent future values. You might want to do something like a Ping at the end without fire-and-forget, to check the server is still talking to you. Note that using fire-and-forget does mean that you won't notice any server errors that get reported.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...