Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Using AsObservable to observe TPL Dataflow blocks without consuming messages

I have a chain of TPL Dataflow blocks and would like to observe progress somewhere inside the system.

I am aware that I could just jam a TransformBlock into the mesh where I want to observe, get it to post to a progress updater of some variety and then return the message unchanged to the next block. I don't love this solution as the block would be purely there for its side-effect and I would also have to change the block linking logic wherever I want to observe.

So I wondered if I could use ISourceBlock<T>.AsObservable to observe the passing of messages within the mesh without altering it and without consuming the messages. This seems both a purer and more practical solution, if it worked.

From my (limited) understanding of Rx that means that I need the observable to be hot rather than cold, so that my progress updater sees the message but doesn't consume it. And .Publish().RefCount() seems to be the way to make an observable hot. However, it simply does not work as intended - instead either block2 or progress receives and consumes each message.

// Set up mesh
var block1 = new TransformBlock<int, int>(i => i + 20, new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 });
var block2 = new ActionBlock<int>(i => Debug.Print("block2:" + i.ToString()), new ExecutionDataflowBlockOptions() { BoundedCapacity = 1 }); 
var obs = block1.AsObservable().Publish().RefCount(); // Declare this here just in case it makes a difference to do it before the LinkTo call.
var l1 = block1.LinkTo(block2, new DataflowLinkOptions() { PropagateCompletion = true});

// Progress
obs.ForEachAsync(i => Debug.Print("progress:" + i.ToString()));

// Start
var vals = Enumerable.Range(1, 5);
foreach (var v in vals)
{
    block1.Post(v);
}
block1.Complete();

Result is non-deterministic but I get something mixed like this:

block2:21
progress:22
progress:24
block2:23
progress:25

So, am I doing something wrong, or is this impossible due to the way TPL Dataflow's AsObservable is implemented?

I realise I could also replace the LinkTo between block1 and block2 with an Observable/Observer pair and that might work, but LinkTo with downstream BoundedCapacity = 1 is the whole reason I'm using TPL Dataflow in the first place.

edit: A few clarifications:

  • I did intend to set BoundedCapacity=1 in block2. While it's unnecessary in this trivial example, the downstream-constrained case is where I find TPL Dataflow really useful.
  • To clarify the solution I rejected in my second paragraph, it would be to add the following block linked in between block1 and block2:

    var progressBlock = new TransformBlock<int, int>( i => {SomeUpdateProgressMethod(i); return i;});

  • I would also like to maintain back-pressure so that if a further-upstream block was distributing work to block1 and also other equivalent workers, it wouldn't send work to block1 if that chain was already busy.
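For reference, the pass-through approach described in the bullets above could be sketched roughly as follows. This is only an illustration under some assumptions: `CreateTap` is a hypothetical helper name (it is not part of TPL Dataflow), every block keeps `BoundedCapacity = 1` so that back-pressure is preserved along the chain, and `SendAsync` is used instead of `Post` because `Post` returns false when a bounded block cannot accept the message immediately.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TapDemo
{
    // Hypothetical helper: a pass-through block that reports each message
    // and then forwards it unchanged. Bounded so it does not break back-pressure.
    public static TransformBlock<T, T> CreateTap<T>(Action<T> report) =>
        new TransformBlock<T, T>(
            item => { report(item); return item; },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

    public static async Task<(int[] Progress, int[] Results)> RunAsync()
    {
        var progress = new ConcurrentQueue<int>();
        var results = new ConcurrentQueue<int>();
        var bounded = new ExecutionDataflowBlockOptions { BoundedCapacity = 1 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        var block1 = new TransformBlock<int, int>(i => i + 20, bounded);
        var tap = CreateTap<int>(i => progress.Enqueue(i));
        var block2 = new ActionBlock<int>(i => results.Enqueue(i), bounded);

        // The tap sits between block1 and block2; this is the extra linking
        // the question would prefer to avoid.
        block1.LinkTo(tap, link);
        tap.LinkTo(block2, link);

        foreach (var v in Enumerable.Range(1, 5))
        {
            // SendAsync waits until block1 has capacity instead of dropping the item.
            await block1.SendAsync(v);
        }
        block1.Complete();
        await block2.Completion;

        return (progress.ToArray(), results.ToArray());
    }

    public static async Task Main()
    {
        var (progress, results) = await RunAsync();
        Console.WriteLine("progress: " + string.Join(",", progress));
        Console.WriteLine("block2:   " + string.Join(",", results));
    }
}
```

Because each block processes one message at a time, the tap and block2 both see every value in the order it was sent.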


1 Reply


The issue with your code is that you're wiring up two consumers of block1: the link to block2 and the subscription created via AsObservable. Dataflow then gives each value to whichever consumer accepts it first.

So you need to broadcast the values from block1 into two other blocks to then be able to consume those independently.
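The competing-consumer behaviour can be reproduced without Rx at all. The following is a minimal sketch, assuming two bounded `ActionBlock` targets linked to the same `BufferBlock`; the class and variable names are made up for illustration. Each message ends up in exactly one of the two targets, so the two counts always add up to the number of messages, but how they split is not deterministic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CompetingConsumersDemo
{
    public static async Task<(int First, int Second)> RunAsync(int messageCount)
    {
        var bounded = new ExecutionDataflowBlockOptions { BoundedCapacity = 1 };
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        var counts = new int[2];

        var source = new BufferBlock<int>();
        // Thread.Sleep simulates work so that a busy target declines new
        // messages and the source offers them to the other target instead.
        var consumerA = new ActionBlock<int>(
            _ => { Interlocked.Increment(ref counts[0]); Thread.Sleep(5); }, bounded);
        var consumerB = new ActionBlock<int>(
            _ => { Interlocked.Increment(ref counts[1]); Thread.Sleep(5); }, bounded);

        // Two links from one source: the targets compete for each message.
        source.LinkTo(consumerA, link);
        source.LinkTo(consumerB, link);

        for (var i = 0; i < messageCount; i++)
        {
            source.Post(i); // the unbounded BufferBlock always accepts
        }
        source.Complete();
        await Task.WhenAll(consumerA.Completion, consumerB.Completion);

        return (counts[0], counts[1]);
    }

    public static async Task Main()
    {
        var (first, second) = await RunAsync(20);
        Console.WriteLine($"consumerA: {first}, consumerB: {second}, total: {first + second}");
    }
}
```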

Just a side note: don't use .Publish().RefCount() here, as it doesn't do what you think. It effectively makes a single-run observable that, during that one run, lets multiple observers connect and see the same values. It has nothing to do with the source of the data or how the Dataflow blocks interact.
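To illustrate what .Publish().RefCount() does on the Rx side, here is a small sketch that involves no Dataflow blocks. It assumes the System.Reactive package is referenced; the `Subject<int>` and the subscription counter are stand-ins invented for this example. Both observers share one underlying subscription and see the same values, which is all the operator pair provides.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class PublishRefCountDemo
{
    public static (int Subscriptions, List<int> SeenByA, List<int> SeenByB) Run()
    {
        var source = new Subject<int>();
        var subscriptions = 0;

        // Defer lets us count how often the underlying source is subscribed to.
        var shared = Observable
            .Defer(() => { subscriptions++; return source.AsObservable(); })
            .Publish()
            .RefCount();

        var seenByA = new List<int>();
        var seenByB = new List<int>();

        // The first Subscribe connects to the source; the second one joins
        // the existing connection instead of creating a new one.
        using (shared.Subscribe(x => seenByA.Add(x)))
        using (shared.Subscribe(x => seenByB.Add(x)))
        {
            source.OnNext(1);
            source.OnNext(2);
        }

        return (subscriptions, seenByA, seenByB);
    }

    public static void Main()
    {
        var (subscriptions, a, b) = Run();
        Console.WriteLine($"underlying subscriptions: {subscriptions}");
        Console.WriteLine("A saw: " + string.Join(",", a));
        Console.WriteLine("B saw: " + string.Join(",", b));
    }
}
```

The sharing happens entirely between Rx observers. The single underlying subscription is still one consumer as far as the Dataflow block is concerned.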

Try this code:

// Set up mesh
var block1 = new TransformBlock<int, int>(i => i + 20);
var block_broadcast = new BroadcastBlock<int>(i => i, new DataflowBlockOptions());
var block_buffer = new System.Threading.Tasks.Dataflow.BufferBlock<int>();
var block2 = new ActionBlock<int>(i => Debug.Print("block2:" + i.ToString()));
var obs = block_buffer.AsObservable();
var l1 = block1.LinkTo(block_broadcast);
var l2 = block_broadcast.LinkTo(block2);
var l3 = block_broadcast.LinkTo(block_buffer);

// Progress
obs.Subscribe(i => Debug.Print("progress:" + i.ToString()));

// Start
var vals = Enumerable.Range(1, 5);
foreach (var v in vals)
{
    block1.Post(v);
}
block1.Complete();

That gives me:

block2:21
block2:22
block2:23
block2:24
block2:25
progress:21
progress:22
progress:23
progress:24
progress:25

Which is what I think you wanted.
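If you also want to know when everything has finished, a variation of the same mesh can propagate completion and await it. The sketch below is a rough illustration, assuming System.Reactive is referenced for the lambda-based `Subscribe` overload; the queues and the `TaskCompletionSource` exist only to collect the output. Note that it keeps the targets unbounded: a BroadcastBlock does not guarantee delivery to a target that is not ready to accept, so giving block2 a BoundedCapacity of 1 here could cause it to miss values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BroadcastMeshDemo
{
    public static async Task<(int[] Progress, int[] Results)> RunAsync()
    {
        var progress = new ConcurrentQueue<int>();
        var results = new ConcurrentQueue<int>();
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        var block1 = new TransformBlock<int, int>(i => i + 20);
        var broadcast = new BroadcastBlock<int>(i => i);
        var buffer = new BufferBlock<int>();
        var block2 = new ActionBlock<int>(i => results.Enqueue(i));

        // Completion flows from block1 through the broadcast block to both branches.
        block1.LinkTo(broadcast, link);
        broadcast.LinkTo(block2, link);
        broadcast.LinkTo(buffer, link);

        // Subscribe before posting so the observer is attached from the start.
        var observed = new TaskCompletionSource<bool>();
        buffer.AsObservable().Subscribe(
            i => progress.Enqueue(i),
            ex => observed.TrySetException(ex),
            () => observed.TrySetResult(true));

        foreach (var v in Enumerable.Range(1, 5))
        {
            block1.Post(v);
        }
        block1.Complete();

        // Wait for both the real consumer and the progress observer.
        await Task.WhenAll(block2.Completion, observed.Task);
        return (progress.ToArray(), results.ToArray());
    }

    public static async Task Main()
    {
        var (progress, results) = await RunAsync();
        Console.WriteLine("progress: " + string.Join(",", progress));
        Console.WriteLine("block2:   " + string.Join(",", results));
    }
}
```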

Now, just as a further aside, using Rx for this might be a better option all around. It's much more powerful and declarative than any TPL or Dataflow option.

Your code boils down to this:

Observable
    .Range(1, 5)
    .Select(i => i + 20)
    .Do(i => Debug.Print("progress:" + i.ToString()))
    .Subscribe(i => Debug.Print("block2:" + i.ToString()));

That pretty much gives you the same result.

