In such a case, Flink will maintain an Integer value as managed, keyed state. So once one or more orders has been seen for each merchant, Flink's state backend will have data for each of the 200,000+ merchants.
Flink is highly scalable, so it's not a problem to have a lot of keys. keyBy
partitions the stream so that each task manager (worker) will only handle events for a subset of the keys. (This is a sharded key/value store.) Moreover, you can choose between a heap-based state backend that keeps this state in memory, or one that uses an embedded RocksDB instance on each task manager, which keeps the state on the local disk of each task manager.
Bottom line: 200000 integers isn't very much state. Nothing to worry about, even with a single task manager.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…