Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
136 views
in Technique[技术] by (71.8m points)

algorithm - unordered_map with "max and min key" tracking?

My question is really simple - having unordered_map<int, int> I need to track maximum key with minimum overhead. Not at any moment, but at certain "checkpoints". More details below.

In my trading application I would like to use unordered_map to store OrderBook items. It should map int to int where key is price and value is volume. (actually price is decimal but i'm using int64 internally)

I'm going to use two unordered_map - one for bid orders and one for offer orders.

I need to know Bid - maximum key of one map and Offer minimum key of another map. Normally I have "several" updates in one pack - and I need to recalculate Bid and Offer only when all updates are processed. For example assume I have such orderbook

100 1
102 1
104 1

Bid (max of key) is 104.

Now I have two updates:

remove key 104
add 103 1
end of transaction

Executing should be as follows:

1:

100 1
102 1

Bid undefined (transaction in progress)

2:

100 1
102 1
103 1

Bid = 103

How can I do that with minimum latency ? Latency is primary requirement, if ignore latency I can do a lot of things:

  • trivial: just scan all keys and find best bid/ask if previous one was deleted
  • i even can use sorted map

So it seems i need some simple algorithm to track max / min of some set without recalculating it every time?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could take an implementation of an AVL-Tree in your language of choice, and extend the nodes by the information what the minimum key in the left tree and the maximum key in the right tree is (i.e. by introducing minkey and maxkey fields).

The maintenance of this should be fairly easy. For a new node, simply set minkey = maxkey = key. On insert, whenever you make the decision to follow the left or the right path, update the minkey or maxkey field of the current node accordingly. Similarly on delete and rotations. This way, you'll always have the minimum and maximum available in the top node.

The added beneift would be that lookups could be slightly faster, lookup can be terminated as soon as the searched key is lower than minkey or greater than maxkey.

But to me it is not apparent that you in fact would need that, as to find the minimum/maximum key in a well balanced search tree (such as an AVL Tree) should be O(log n) averaged, hence maybe the whole thing is not even worth the effort.

If you need the fast insert/delete of an unordered_map (i.e. a hash map), you can at least reach amortized constant tiome for min and max, by maintaining the following data structure:

class umapminmax {
   map<int, int> themap;
   int min, max;
}

You must insert and delete only through this class and record the minima/maxima. Whenever the minimum or the maximum key gets deleted, you need to scan all your keys to get the new minimum/maximum. This scanning is O(n). But if it doesn't happen too often, the amortized time of accessing the minimum/maximum will still be constant.

If you know, OTOH, that it is likely that the minimum/maximum value is removed often, then the O(log n) solution with a search tree will probably be the fastest you can get.

Still, from how you describe your application, even the following is imaginable:

  • It happens often, that the maximum is removed. But, frequently a new maximum will get inserted before you ask for the current maximum the next time.

If this is the case, you can apply a bit more logic: Say that the maximum key gets deleted. Now, instead of scanning the map for the new maximum, set a flag (another new field) that indicates that the maximum key is not valid. Then, when by chance another value with a key that is greater or equal than the invalid maximum is inserted, you can just record this key as maximum and reset the flag so that in indicates the maximum is valid. When retrieving the maximum, however, you must consult the flag and perform a scan if it says it is invalid.

You see, you really need a model implementation and do some statistics with real world data.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...