Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.2k views
in Technique[技术] by (71.8m points)

string - How to generate a GUID with a custom alphabet, that behaves similar to an MD5 hash (in JavaScript)?

I am wondering how to generate a GUID given an input string, such that the same input string results in the same GUID (sort of like an MD5 hash). The problem with MD5 hashes is they just guarantee low collision rate, rather than uniqueness. Instead I would like something like this:

guid('v1.0.0') == 1231231231123123123112312312311231231231
guid('v1.0.1') == 6154716581615471658161547165816154716581
guid('v1.0.2') == 1883939319188393931918839393191883939319

How would you go about implementing this sort of thing (ideally in JavaScript)? Is it even possible to do? I am not sure where to start. Things like the uuid module don't take a seed string, and they don't let you use a custom format/alphabet.

I am not looking for the canonical UUID format, but rather a GUID, ideally one made up of just integers.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

What you would need is define a one-to-one mapping of text strings (such as "v1.0.0") onto 40 digit long strings (such as "123123..."). This is also known as a bijection, although in your case an injection (a simple one-to-one mapping from inputs to outputs, not necessarily onto) may be enough. As you note, hash functions don't necessarily ensure this mapping, but there are other possibilities, such as full-period linear congruential generators (if they take a seed that you can map one-to-one onto input string values), or other reversible functions.

However, if the set of possible input strings is larger than the set of possible output strings, then you can't map all input strings one-to-one with all output strings (without creating duplicates), due to the pigeonhole principle.

For example, you can't generally map all 120-character strings one-to-one with all 40-digit strings unless you restrict the format of the 120-character strings in some way. However, your problem of creating 40-digit output strings can be solved if you can accept limiting input strings to no more than 1040 values (about 132 bits), or if you can otherwise exploit redundancy in the input strings so that they are guaranteed to compress losslessly to 40 decimal digits (about 132 bits) or less, which may or may not be possible. See also this question.

The algorithm involves two steps:

  • First, transform the string to a BigInt by building up the string's charCodeAt() values similarly to the stringToInt method given in another answer. Throw an error if any charCodeAt() is 0x80 or greater, or if the resulting BigInt is equal to or greater than BigInt(alphabet_length)**BigInt(output_length).
  • Then, transform the integer to another string by taking the mod of the BigInt and the output alphabet's size and replacing each remainder with the corresponding character in the output alphabet, until the BigInt reaches 0.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...