Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
558 views
in Technique[技术] by (71.8m points)

encoding - Is a base64 encoded string unique?

I can't find an answer to this. If I encode a string with Base64 will the encoded output be unique based on the string? I ask because I want to create a token which will contain user information so I need make sure the output will be unique depending on the information.

For example if I encode "UnqUserId:987654321 Timestamp:01/02/03" will this be unique so no matter what other userid I put it in there will never be a collision?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Two years late, but here we go:

The short answer is yes, unique binary/hex values will always encode to a unique base64 encoded string.

BUT, multiple base64 encoded strings may represent a single binary/hex value.

This is because hex bytes are not aligned with base64 'digits'. A single hex byte is represented by 8 bits while a single base64 digit is represented by 6 bits. Therefore, any hex value that is not 6-bit aligned can have multiple base64 representations (though correctly implemented base64 encoders should encode to the same base64 representation).

An example of this misalignment is the hex value '0x433356c1'. This value is represented by 32-bits and base64 encodes into 'QzNWwQ=='. This 32-bit value, however, is not 6-bit aligned. So what happens? The base64 encoder pads four zero bits onto the end of the binary representation in this case to make the sequence 36-bits and consequently 6-bit aligned.

When decoding, the base64 decoder now has to decode into an 8-bit aligned value. It truncates the padded bits and decodes the first 32 bits into a hex value. For example, 'QzNWwc==' and 'QzNWwQ==' are different base64 encoded strings, but decode to the same hex value, 0x433356c1. If we look carefully, we notice that the first 32 bits are the same for both of these encoded strings:

'QzNWwc==':
010000 110011 001101 010110 110000 011100

'QzNWwQ==':
010000 110011 001101 010110 110000 010000

The only difference is the last four bits, which are ignored. Keep in mind that no base64 encoder should ever generate 'QzNWwc==' or any other base64 value for 0x433356c1 other than 'QzNWwQ==' since added padding bytes should always be zeros.

In conclusion, it is safe to assume that a unique binary/hex value will always encode to a unique base64 representation using correctly implemented base64 encoders. A 'collision' will only occur during decoding if base64 strings are generated without zeroing padding/alignment bytes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...