I am trying to create a dataflow using TPL Dataflow with the following form:
                      -> LoadDataBlock1 -> ProcessDataBlock1 ->
GetInputPathsBlock    -> LoadDataBlock2 -> ProcessDataBlock2 ->    MergeDataBlock -> SaveDataBlock
                      -> LoadDataBlock3 -> ProcessDataBlock3 ->
                      ...
                      -> LoadDataBlockN -> ProcessDataBlockN ->
The idea is that GetInputPathsBlock finds the paths to the input data that is to be loaded, and then sends one path to each LoadDataBlock. The LoadDataBlocks are all identical (except that each has received a unique inputPath string from GetInputPathsBlock). The loaded data is then sent to the ProcessDataBlock, which does some simple processing. Then the data from each ProcessDataBlock is sent to MergeDataBlock, which merges it and sends it to SaveDataBlock, which then saves it to a file.
Think of it as a dataflow that needs to run for each month. First, the path to each day's data is found. Each day's data is loaded and processed, and the results are then merged for the entire month and saved. The months can run in parallel. Within a month, the days can be loaded in parallel and processed in parallel (each day once its own data has been loaded). Once everything for the month has been loaded and processed, it can be merged and saved.
What I tried
As far as I can tell, TransformManyBlock<TInput,string> can be used to do the splitting (GetInputPathsBlock), and can be linked to a normal TransformBlock<string,InputData> (LoadDataBlock), and from there to another TransformBlock<InputData,ProcessedData> (ProcessDataBlock), but I don't know how to then merge it back to a single block.
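For reference, here is a minimal sketch of the split half described above. It is an illustration under assumptions, not my real code: the shapes of InputData and ProcessedData, the fake day paths, the stand-in load and process lambdas, and the SplitSketch class name are all hypothetical. The manual receive loop at the end only marks the spot where I am missing a proper merge step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical placeholder types; the real ones are not shown here.
public record InputData(string Path, int[] Values);
public record ProcessedData(string Path, int Sum);

public static class SplitSketch
{
    // Builds the split half of the pipeline for one month and
    // returns every processed item once the blocks have completed.
    public static async Task<List<ProcessedData>> RunAsync(string month)
    {
        var parallel = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };

        // GetInputPathsBlock: one month in, many paths out (three fake days here).
        var getInputPaths = new TransformManyBlock<string, string>(
            m => Enumerable.Range(1, 3).Select(day => $"{m}/day{day}.dat"));

        // LoadDataBlock: stand-in for reading the file at the given path.
        var loadData = new TransformBlock<string, InputData>(
            path => new InputData(path, new[] { 1, 2, 3 }), parallel);

        // ProcessDataBlock: stand-in for the simple processing step.
        var processData = new TransformBlock<InputData, ProcessedData>(
            input => new ProcessedData(input.Path, input.Values.Sum()), parallel);

        getInputPaths.LinkTo(loadData, propagate);
        loadData.LinkTo(processData, propagate);

        getInputPaths.Post(month);
        getInputPaths.Complete();

        // Placeholder for the missing merge: drain the output by hand.
        var results = new List<ProcessedData>();
        while (await processData.OutputAvailableAsync())
        {
            while (processData.TryReceive(out var item))
            {
                results.Add(item);
            }
        }

        await processData.Completion;
        return results;
    }
}
```

Calling `await SplitSketch.RunAsync("2021-01")` yields one ProcessedData per fake day. With several months posted to the same blocks, though, this loop could not tell which results belong to which month, which is exactly the merge problem I am asking about.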
What I looked at
I found this answer, which uses TransformManyBlock to go from an IEnumerable<item> to item, but I don't fully understand it, and I can't link a TransformBlock<InputData,ProcessedData> (ProcessDataBlock) to a TransformBlock<IEnumerable<ProcessedData>,ProcessedData>, so I don't know how to use it.
I have also seen answers like this, which suggest using JoinBlock, but the number of input files N varies, and the files are all loaded in the same way anyway.
There is also this answer, which seems to do what I want, but I don't fully understand it, and I don't know how the setup with the dictionary would be transferred to my case.
How do I split and merge my dataflow?
- Is there a block type I am missing?
- Can I somehow use TransformManyBlock twice?
- Does TPL Dataflow make sense for the split/merge, or is there a simpler async/await way?