Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
487 views
in Technique[技术] by (71.8m points)

algorithm - Map incrementing integer range to six-digit base 26 max, but unpredictably

I want to design a URL shortener for a particular use case and type of end-user that I have targetted. I have decided that I want the URLs to be stored internally according to an auto-incrementing integer key. However, it is also required that a key be represented to users in the url as six-digit base 26 (a-z * 6) AND it's not possible to predict what the base 26 url key is based on the incrementing integer key. In other words, the first url key should not be aaaaaa then the next time someone creates a url it shouldn't be aaaaab, etc, and no loop generating a random one and fishing into the database to see if it already exists repeatedly.

The second part of the requirements (urls in base 26 difficult for an outsider to predict) is the more interesting part. Ideally I would like some kind of algorithmic 1-1 mapping of all of the numbers in the 26^6 range to another number in the same range that I can just then print in base 26, and that I can undo algorithmically and don't need to store in a separate table when I want to look up the url. How can I accomplish this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Why not just shuffle the bits around in a specific order before converting to the base 26 value? For example, bit 0 becomes bit 5, bit 1 becomes bit 2, etc. To decode, just do the reverse.

Here's an example in Python. (Edited now to include converting the base too.)

    import random
    
    # generate a random bit order
    # you'll need to save this mapping permanently, perhaps just hardcode it
    # map how ever many bits you need to represent your integer space
    mapping = range(28)
    mapping.reverse()
    #random.shuffle(mapping)
    
    # alphabet for changing from base 10
    chars = 'abcdefghijklmnopqrstuvwxyz'
    
    # shuffle the bits
    def encode(n):
        result = 0
        for i, b in enumerate(mapping):
            b1 = 1 << i
            b2 = 1 << mapping[i]
            if n & b1:
                result |= b2
        return result
    
    # unshuffle the bits
    def decode(n):
        result = 0
        for i, b in enumerate(mapping):
            b1 = 1 << i
            b2 = 1 << mapping[i]
            if n & b2:
                result |= b1
        return result
    
    # change the base
    def enbase(x):
        n = len(chars)
        if x < n:
            return chars[x]
        return enbase(x/n) + chars[x%n]
    
    # go back to base 10
    def debase(x):
        n = len(chars)
        result = 0
        for i, c in enumerate(reversed(x)):
            result += chars.index(c) * (n**i)
        return result
    
    # test it out
    for a in range(200):
        b = encode(a)
        c = enbase(b)
        d = debase(c)
        e = decode(d)
        while len(c) < 7:
            c = ' ' + c
        print '%6d %6d %s %6d %6d' % (a, b, c, d, e)

The output of this script, showing the encoding and decoding process:

   0            0       a            0    0
   1    134217728  lhskyi    134217728    1
   2     67108864  fqwfme     67108864    2
   3    201326592  qyoqkm    201326592    3
   4     33554432  cvlctc     33554432    4
   5    167772160  oddnrk    167772160    5
   6    100663296  imhifg    100663296    6
   7    234881024  ttztdo    234881024    7
   8     16777216  bksojo     16777216    8
   9    150994944  mskzhw    150994944    9
  10     83886080  hbotvs     83886080   10
  11    218103808  sjheua    218103808   11
  12     50331648  egdrcq     50331648   12
  13    184549376  pnwcay    184549376   13
  14    117440512  jwzwou    117440512   14
  15    251658240  veshnc    251658240   15
  16      8388608   sjheu      8388608   16
  17    142606336  mabsdc    142606336   17
  18     75497472  gjfmqy     75497472   18
  19    209715200  rqxxpg    209715200   19

Note that zero maps to zero, but you can just skip that number.

This is simple, efficient and should be good enough for your purposes. If you really needed something secure I obviously would not recommend this. It's basically a naive block cipher. There won't be any collisions.

Probably best to make sure that bit N doesn't ever map to bit N (no change) and probably best if some low bits in the input get mapped to higher bits in the output, in general. In other words, you may want to generate the mapping by hand. In fact, a decent mapping would be simply reversing the bit order. (That's what I did for the sample output above.)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...