I have been looking through the source and documentation for google dataflow and I didn't see any mention of the message delivery semantics around PubSubIO.Read
.
The problem I am trying to understand is: What kind of message delivery semantics does the PubSubIO and Google Dataflow provide? Based on my reading of the source, the messages get acked before they are emitted using ProcessingContext#output
method. This implies that the Dataflow streaming job will loose messages that have been acked and not passed on.
So, how does Dataflow guarantee (if at all) correctness around windows (especially session), etc in case of failure and redeploy of jobs.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…