Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
252 views
in Technique[技术] by (71.8m points)

dht - How to represent a kademlia routing table as data structure

The kademlia paper talks about the the organization of buckets, splitting, merging and finding the correct bucket to insert in abstract, concise and confusing terms.

§2.2 talks about a fixed set of 160 buckets with each bucket covering a fixed subset of the keyspace. But later chapters involve additional splitting and buckets covering different parts of the keyspace. That don't fit well into a fixed list

What is the correct way to organize buckets?

Meta: Since the confusion is reflected in many questions and partial information has been scattered over many answers this Q&A are intended to provide an easily linked clarification

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The confusion stems from different versions of the paper.

Flat layout

This is from the pre-print version and mostly used to outline basic properties of kademlia in a theoretical manner and still reflected in §2.2 and §3 of the full version.

Many real-world implementations implement this approach but they don't implement bucket splitting, merging or node multihoming.

It involves putting contacts into the ith bucket that shares i prefix bits with the node. Which means the layout uses distances relative to the node's own ID.

Tree-based layout

This is described in section §2.4.

To implement refinements such as handling highly unbalanced trees described towards the end of §2.4 or deeper non-local splitting described in §4.2 one needs to associate each bucket with the keyspace range it covers, this can be expressed similar to CIDR ranges, i.e. a start ID and the number of prefix bits shared to mask off the tail of the ID.

Splitting a bucket is performed by increasing the number of prefix bits by one and setting the added bit to 0 and 1 respectively for two new buckets.

Unlike the flat layout this structure does not involve distances relative to the node's own ID, although some decisions are based on whether the node's own ID would fall into a bucket.

Since the number of buckets in such a routing table varies over time it has to represented in a resizable data structure, this is mentioned in §2.4. Since access can't be done by a fixed index anymore since the exact bucket that will cover any specific node ID is not known until the prefix-ranges have been examined some kind of O(log n) search is needed if one wants to avoid scanning the whole bucket list each time.
Sorting the buckets by the lowest ID that the bucket would cover is a natural approach to achieve this. BTrees or sorted arrays combined with binary search can be used to achieve this.

Regardless which approach you take, populating a response to find_node requests with the correct set of contacts that match the request's target is not trivial since any single bucket may be insufficient to fill it and thus multiple buckets need to be traversed. It may be simpler to scan the whole routing table for the best available candidates for the reply.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...