Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
273 views
in Technique[技术] by (71.8m points)

c# - Compressing big number (or string) to small value

My ASP.NET page has following query string parameter:

…?IDs=1000000012,1000000021,1000000013,1000000022&...

Here IDs parameter will always have numbers separated by something, in this case ,. Currently there are 4 numbers but normally they would be in between 3 and 7.

Now, I am looking for method to convert each big number from above into smallest possible value; specifically compressing value of IDs query string parameter. Both, compressing each number algorithm or compressing whole value of IDs query string parameter are welcome.

  1. Encode or decode is not an issue; just compressing the value IDs query string parameter.
  2. Creating some unique small value for IDs and then retrieving its value from some data source is out of scope.

Is there an algorithm to compress such big numbers to small values or to compress value of the IDs query string parameter all together?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You basically need so much room for your numbers because you are using base 10 to represent them. An improvement would be to use base 16 (hex). So for example, you could represent 255 (3 digits) as ff (2 digits).

You can take that concept further by using a much larger number base... the set of all characters that are valid query string parameters:

A-Z, a-z, 0-9, '.', '-', '~', '_', '+'

That gives you a base of 67 characters to work with (see Wikipedia on QueryString).

Have a look at this SO post for approaches to converting base 10 to arbitrary number bases.

EDIT:

In the linked SO post, look at this part:

string xx = IntToString(42, 
            new char[] { '0','1','2','3','4','5','6','7','8','9',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x'});

That's almost what you need. Just expand it by adding the few characters it is missing:

yz.-~_+

That post is missing a method to go back to base 10. I'm not going to write it :-) but the procedure is like this:

Define a counter I'll call TOTAL.

Look at the right most character and find it's position in the array.
TOTAL = (the position of the character in the array) Example: Input is BA1. TOTAL is now 1 (since "1" is in position 1 in the array)

Now look at the next character left of the first one and find it's position in the array. TOTAL += 47 * (the position of the character in the array) Example: Input is BA1. TOTAL is now (47 * 11) + 1 = 518

Now look at the next character left of the previous one and find it's position in the array. TOTAL += 47 * 47 * (the position of the character in the array) Example: Input is BA1. Total is now (47 * 47 * 10) + (47 * 11) + 1 = 243508

And so on.

I suggest you write a unit test that converts a bunch of base 10 numbers into base 47 and then back again to make sure your conversion code works properly.

Note how you represented a 6 digit base 10 number in just 3 digits of base 47 :-)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...