python - Convert UUID 32-character hex string into a "YouTube-style" short id and back

Question


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I'm assigning all my MongoDB documents a GUID using uuid.uuid1(). I want a way to derive an 11-character, unique, case-sensitive, YouTube-like ID, such as

1_XmY09uRJ4

from the UUID's hex string, which looks like

ae0a0c98-f1e5-11e1-9t2b-1231381dac60

I want to be able to map the short ID to the hex string and vice versa dynamically, without having to store another string in the database. Does anyone have sample code, or can point me to a module or formula that can do this?


1 Reply

深蓝 · Answer 1 · 2021-10-23T17:43:09+0000

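Convert the UUID's 16 underlying bytes to a base64 value and strip the trailing = padding. (base64.urlsafe_b64encode() does not append a newline, so the padding is the only thing to strip.)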

You probably want to use the base64.urlsafe_b64encode() function to avoid using / and + (_ and - are used instead), so the resulting string can be used as a URL path element:
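The two alphabets differ only in their last two symbols. A minimal comparison, using the UUID from the reverse example in this answer, shows that its bytes produce a '/' in standard base64 but a '_' in the URL-safe variant, and that 16 input bytes always leave exactly two '=' padding characters:

```python
import base64
import uuid

# UUID taken from the reverse example; its bytes happen to hit the '/' symbol
u = uuid.UUID('f3508c0f-f6ce-11e1-9b3f-002332d62786')

standard = base64.b64encode(u.bytes)         # alphabet ends in '+' and '/'
urlsafe = base64.urlsafe_b64encode(u.bytes)  # alphabet ends in '-' and '_'

print(standard)  # b'81CMD/bOEeGbPwAjMtYnhg=='
print(urlsafe)   # b'81CMD_bOEeGbPwAjMtYnhg=='

# 16 bytes is one byte past a multiple of 3, so there are always exactly two '='
slug = urlsafe.rstrip(b'=').decode('ascii')
print(slug, len(slug))  # 81CMD_bOEeGbPwAjMtYnhg 22
```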

>>> import uuid, base64
>>> base64.urlsafe_b64encode(uuid.uuid1().bytes).rstrip(b'=').decode('ascii')
'81CMD_bOEeGbPwAjMtYnhg'

The reverse:

>>> uuid.UUID(bytes=base64.urlsafe_b64decode('81CMD_bOEeGbPwAjMtYnhg' + '=='))
UUID('f3508c0f-f6ce-11e1-9b3f-002332d62786')

To turn that into generic functions:

from base64 import urlsafe_b64decode, urlsafe_b64encode
from uuid import UUID

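def uuid2slug(uuidstring):
    # Encode the 16 raw UUID bytes as URL-safe base64 and drop the trailing '=='
    return urlsafe_b64encode(UUID(uuidstring).bytes).rstrip(b'=').decode('ascii')

def slug2uuid(slug):
    # A 22-character slug always needs exactly '==' appended to decode again
    return str(UUID(bytes=urlsafe_b64decode(slug + '==')))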
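A quick way to check that the two functions are inverses is a round trip on a freshly generated uuid1() value, as in the question. The length check in slug2uuid below is an illustrative addition, not part of the answer: it turns a wrong-length slug into an explicit ValueError instead of a lower-level decoding error.

```python
from base64 import urlsafe_b64decode, urlsafe_b64encode
from uuid import UUID, uuid1

def uuid2slug(uuidstring):
    # 16 UUID bytes -> 24 URL-safe base64 chars -> 22 once '==' is stripped
    return urlsafe_b64encode(UUID(uuidstring).bytes).rstrip(b'=').decode('ascii')

def slug2uuid(slug):
    # Illustrative extra validation: only 22-character slugs can be valid here
    if len(slug) != 22:
        raise ValueError('expected a 22-character slug, got %d' % len(slug))
    return str(UUID(bytes=urlsafe_b64decode(slug + '==')))

# Round-trip a time-based UUID; str(UUID) is lowercase and hyphenated
original = str(uuid1())
slug = uuid2slug(original)
assert len(slug) == 22
assert slug2uuid(slug) == original
print(original, '->', slug)
```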

This gives you a way to represent the 16-byte UUID in a more compact form. Compress it any further and you lose information, which means you cannot decompress it back to the full UUID. The full range of values that 16 bytes (128 bits) can represent will never fit in fewer than 22 base64 characters: each character encodes 6 bits, and 128 / 6 ≈ 21.3 rounds up to 22. (Base64 emits 4 characters for every 3 bytes of input, so the padded output is 24 characters, the last two being =.)
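The character counts above can be checked with a few lines of arithmetic:

```python
import math

UUID_BITS = 16 * 8   # a UUID is 16 bytes = 128 bits
BITS_PER_CHAR = 6    # base64 has 64 = 2**6 symbols, so 6 bits per character

# Smallest number of base64 characters that can hold every 128-bit value
min_chars = math.ceil(UUID_BITS / BITS_PER_CHAR)
print(min_chars)     # 22

# base64 emits 4 characters per (started) group of 3 input bytes
padded_len = 4 * math.ceil(16 / 3)
print(padded_len)    # 24, i.e. 22 data characters plus '=='

# Most bits an 11-character ID could ever carry
max_bits_11 = 11 * BITS_PER_CHAR
print(max_bits_11)   # 66, well short of 128
```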

YouTube's IDs are thus not based on a full 16-byte UUID: 11 base64 characters can carry at most 11 × 6 = 66 bits. Their 11-character IDs are probably based on a smaller value and stored in the database for easy lookup.
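YouTube's actual scheme is not described here. As a hedged sketch, assuming you accept storing a separate short key (which the question hoped to avoid), a random 64-bit value encodes to exactly 11 URL-safe base64 characters. The function names new_short_id, short_id_to_int and int_to_short_id are hypothetical:

```python
import base64
import secrets

def new_short_id():
    # 8 random bytes = 64 bits; URL-safe base64 of 8 bytes is 12 chars ending
    # in a single '=', so stripping it leaves 11 characters.
    # (secrets.token_urlsafe(8) produces the same kind of string.)
    return base64.urlsafe_b64encode(secrets.token_bytes(8)).rstrip(b'=').decode('ascii')

def short_id_to_int(short_id):
    # Put back the one '=' of padding and read the 8 bytes as an unsigned integer
    return int.from_bytes(base64.urlsafe_b64decode(short_id + '='), 'big')

def int_to_short_id(n):
    # Inverse of short_id_to_int for any 0 <= n < 2**64
    return base64.urlsafe_b64encode(n.to_bytes(8, 'big')).rstrip(b'=').decode('ascii')

sid = new_short_id()
n = short_id_to_int(sid)
assert int_to_short_id(n) == sid
print(sid, n)
```

Because the values are random, two documents can in principle draw the same ID, so in practice you would still want a unique index on the stored key and a retry on collision.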
