Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
155 views
in Technique[技术] by (71.8m points)

How to convert an integer to the shortest url-safe string in Python?

I want the shortest possible way of representing an integer in a URL. For example, 11234 can be shortened to '2be2' using hexadecimal. Since base64 uses is a 64 character encoding, it should be possible to represent an integer in base64 using even less characters than hexadecimal. The problem is I can't figure out the cleanest way to convert an integer to base64 (and back again) using Python.

The base64 module has methods for dealing with bytestrings - so maybe one solution would be to convert an integer to its binary representation as a Python string... but I'm not sure how to do that either.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This answer is similar in spirit to Douglas Leeder's, with the following changes:

  • It doesn't use actual Base64, so there's no padding characters
  • Instead of converting the number first to a byte-string (base 256), it converts it directly to base 64, which has the advantage of letting you represent negative numbers using a sign character.

    import string
    ALPHABET = string.ascii_uppercase + string.ascii_lowercase + 
               string.digits + '-_'
    ALPHABET_REVERSE = dict((c, i) for (i, c) in enumerate(ALPHABET))
    BASE = len(ALPHABET)
    SIGN_CHARACTER = '$'
    
    def num_encode(n):
        if n < 0:
            return SIGN_CHARACTER + num_encode(-n)
        s = []
        while True:
            n, r = divmod(n, BASE)
            s.append(ALPHABET[r])
            if n == 0: break
        return ''.join(reversed(s))
    
    def num_decode(s):
        if s[0] == SIGN_CHARACTER:
            return -num_decode(s[1:])
        n = 0
        for c in s:
            n = n * BASE + ALPHABET_REVERSE[c]
        return n
    

    >>> num_encode(0)
    'A'
    >>> num_encode(64)
    'BA'
    >>> num_encode(-(64**5-1))
    '$_____'

A few side notes:

  • You could (marginally) increase the human-readibility of the base-64 numbers by putting string.digits first in the alphabet (and making the sign character '-'); I chose the order that I did based on Python's urlsafe_b64encode.
  • If you're encoding a lot of negative numbers, you could increase the efficiency by using a sign bit or one's/two's complement instead of a sign character.
  • You should be able to easily adapt this code to different bases by changing the alphabet, either to restrict it to only alphanumeric characters or to add additional "URL-safe" characters.
  • I would recommend against using a representation other than base 10 in URIs in most cases—it adds complexity and makes debugging harder without significant savings compared to the overhead of HTTP—unless you're going for something TinyURL-esque.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...