Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
335 views
in Technique[技术] by (71.8m points)

c++ - Compress 21 Alphanumeric Characters in to 16 Bytes

I'm trying to take 21 bytes of data which uniquely identifies a trade and store it in a 16 byte char array. I'm having trouble coming up with the right algorithm for this.

The trade ID which I'm trying to compress consists of 2 fields:

  1. 18 alphanumeric characters consisting of the ASCII characters 0x20 to 0x7E, Inclusive. (32-126)
  2. A 3-character numeric string "000" to "999"

So a C++ class that would encompass this data looks like this:

class ID
{
public:
    char trade_num_[18];
    char broker_[3];
};

This data needs to be stored in a 16-char data structure, which looks like this:

class Compressed
{
public:
    char sku_[16];    
};

I tried to take advantage of the fact that since the characters in trade_num_ are only 0-127 there was 1 unused bit in each character. Similarly, 999 in binary is 1111100111, which is only 10 bits -- 6 bits short of a 2-byte word. But when I work out how much I can squeeze this down, the smallest I can make it is 17 bytes; one byte too big.

Any ideas?

By the way, trade_num_ is a misnomer. It can contain letters and other characters. That's what the spec says.

EDIT: Sorry for the confusion. The trade_num_ field is indeed 18 bytes and not 16. After I posted this thread my internet connection died and I could not get back to this thread until just now.

EDIT2: I think it is safe to make an assumption about the dataset. For the trade_num_ field, we can assume that the non-printable ASCII characters 0-31 will not be present. Nor will ASCII codes 127 or 126 (~). All the others might be present, including upper and lower case letters, numbers and punctuations. This leaves a total of 94 characters in the set that trade_num_ will be comprised of, ASCII codes 32 through 125, inclusive.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you have 18 characters in the range 0 - 127 and a number in the range 0 - 999 and compact this as much as possible then it will require 17 bytes.

>>> math.log(128**18 * 1000, 256)
16.995723035582763

You may be able to take advantage of the fact that some characters are most likely not used. In particular it is unlikely that there are any characters below value 32, and 127 is also probably not used. If you can find one more unused character so you can first convert the characters into base 94 and then pack them into the bytes as closely as possible.

>>> math.log(94**18 * 1000, 256)
15.993547951857446

This just fits into 16 bytes!


Example code

Here is some example code written in Python (but written in a very imperative style so that it can easily be understood by non-Python programmers). I'm assuming that there are no tildes (~) in the input. If there are you should substitute them with another character before encoding the string.

def encodeChar(c):
    return ord(c) - 32

def encode(s, n):
    t = 0
    for c in s:
        t = t * 94 + encodeChar(c)
    t = t * 1000 + n

    r = []
    for i in range(16):
        r.append(int(t % 256))
        t /= 256

    return r

print encode('                  ', 0)    # smallest possible value
print encode('abcdefghijklmnopqr', 123)
print encode('}}}}}}}}}}}}}}}}}}', 999)  # largest possible value

Output:

[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0]
[ 59, 118, 192, 166, 108,  50, 131, 135, 174,  93,  87, 215, 177,  56, 170, 172]
[255, 255, 159, 243, 182, 100,  36, 102, 214, 109, 171,  77, 211, 183,   0, 247]

This algorithm uses Python's ability to handle very large numbers. To convert this code to C++ you could use a big integer library.

You will of course need an equivalent decoding function, the principle is the same - the operations are performed in reverse order.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...