python - Reconstructing two (string concatenated) numbers that were originally floats

Question



posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)


Unfortunately, the print instruction in some code was written without an end-of-line character, so one in every 26 numbers consists of two numbers joined together. The following code reproduces this behaviour; at the end there is a fragment of the original data.

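import numpy as np

for _ in range(2):
    # Four random floats in [0, 100): an integer part plus a fraction in [0, 1)
    A = np.random.rand() + np.random.randint(0, 100)
    B = np.random.rand() + np.random.randint(0, 100)
    C = np.random.rand() + np.random.randint(0, 100)
    D = np.random.rand() + np.random.randint(0, 100)
    with open('file.txt', 'a') as f:
        # Bug: nothing is written after D, so D runs straight into the next A
        f.write(f'{A},{B},{C},{D}')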
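For future runs, the concatenation can be avoided by terminating every record with a newline. This is a minimal sketch, not the asker's code: it uses the standard-library `random` module instead of NumPy to stay dependency-free, and the helper `format_record` and the file name `file_fixed.txt` are hypothetical.

```python
import random
from typing import Iterable


def format_record(values: Iterable[float]) -> str:
    """Join one record's values with commas and terminate it with a newline."""
    # str(float) gives Python's shortest round-trip representation
    return ','.join(str(v) for v in values) + '\n'


# Open the file once and write one complete line per record
with open('file_fixed.txt', 'w') as f:  # hypothetical output file name
    for _ in range(2):
        # Same range as the original: integer part 0..99 plus a fraction in [0, 1)
        record = [random.random() + random.randint(0, 99) for _ in range(4)]
        f.write(format_record(record))
```

Because each call to `format_record` ends with `'\n'`, the last value of one record can never touch the first value of the next.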

The output file therefore looks very similar to this:

40.63358599010553,53.86722741700399,21.800795158561158,13.95828176311762557.217562728494684,2.626308403991772,4.840593988487278,32.401778122213486

The issue is that two numbers are 'printed together'; in this example they are:

13.95828176311762557.217562728494684

So you cannot tell whether they should be

13.958281763117625, 57.217562728494684

or

13.9582817631176255, 7.217562728494684

Please note that in this case there are only two options, but the problem I want to address involves 'unbounded' numbers of Python's float type (where 'unbounded' means in a range we don't know in advance, e.g. ±1E4).
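To make the ambiguity concrete, here is a small sketch (the helper `candidate_splits` is hypothetical) that lists every split point at which both halves are plain decimals, i.e. at least one digit on each side of the point, assuming no signs or exponent notation. Syntax alone leaves 16 candidates for the token above, the two listed among them, so some extra information is needed to pick one.

```python
import re
from typing import List, Tuple

# A plain decimal: at least one digit on each side of the point (no sign, no exponent)
PLAIN_DECIMAL = re.compile(r'\d+\.\d+')


def candidate_splits(token: str) -> List[Tuple[str, str]]:
    """Return every (left, right) split where both halves are plain decimals."""
    return [
        (token[:i], token[i:])
        for i in range(1, len(token))
        if PLAIN_DECIMAL.fullmatch(token[:i]) and PLAIN_DECIMAL.fullmatch(token[i:])
    ]


cands = candidate_splits('13.95828176311762557.217562728494684')
print(len(cands))  # 16
for left, right in cands:
    print(left, right)
```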

Can the original numbers be reconstructed based on "some" Python internal behaviour I'm missing?

Actual data, with a periodicity of 27 (i.e. every 26th entry consists of two numbers joined together):

0.9221878978925224, 0.9331311610066017,0.8600582424784715,0.8754578588852764,0.8738648974725404, 0.8897837559800233,0.6773502027673041,0.736325377603136,0.7956454122424133, 0.8083168444596229,0.7089031184165164, 0.7475306242508357,0.9702361286847581, 0.9900689384633811,0.7453878225174624, 0.7749000030576826,0.7743879170108678, 0.8032590543649807,0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,31.828905,33.9260060.9215201173775437, 0.9349343132442707,0.8605282244327555,0.8741626682026793,0.8742163597524663, 0.8874673376386358,0.7109322043854609,0.7376362393985332,0.796158275345
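A first practical step is to locate the joined tokens. The sketch below (the helper `find_joined` is hypothetical) splits on commas, strips the inconsistent spaces, and flags tokens containing two decimal points. It assumes no value is written in exponent notation (Python prints e.g. `1e-05` without a point) and runs on an excerpt of the fragment above, copied verbatim.

```python
from typing import List, Tuple

# Excerpt of the data fragment from the question
DATA = ("0.002434,0.003673,0.004194,0.327903,11.357262,13.782266,20.14374,"
        "31.828905,33.9260060.9215201173775437, 0.9349343132442707")


def find_joined(line: str) -> List[Tuple[int, str]]:
    """Return (position, token) for every comma-separated token with two decimal points."""
    tokens = [t.strip() for t in line.split(',')]
    return [(i, t) for i, t in enumerate(tokens) if t.count('.') == 2]


print(find_joined(DATA))
# [(8, '33.9260060.9215201173775437')]
```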

Question from: https://stackoverflow.com/questions/65928610/reconstructing-two-string-concatenated-numbers-that-were-originally-floats



深蓝 · Answer 1 · 2021-10-06T19:01:44+0000

To expand my comment into an actual answer:

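We do have some information. A Python float is an IEEE-754 double: 64 bits, of which 52 store the significand (mantissa) explicitly, giving 53 bits of precision, or roughly 15 to 17 significant decimal digits; the remaining bits hold the sign and exponent. Not every decimal number can be represented, and datasets like yours, printed with 16 to 17 significant digits, are brushing up against the edge of that precision.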
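Those limits can be checked on your own interpreter through `sys.float_info`. This small sketch assumes a platform where `float` is an IEEE-754 double, which is the case on virtually every system CPython runs on.

```python
import sys

print(sys.float_info.mant_dig)  # 53: bits of significand precision
print(sys.float_info.dig)       # 15: decimal digits that always survive a round trip

# repr()/str() print the shortest string that maps back to the same double;
# for a double this never needs more than 17 significant digits
print(repr(0.1 + 0.2))  # 0.30000000000000004 (17 significant digits)
```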

We can make that work for us: at each possible split point, we just need to test whether both pieces are strings that Python could actually have printed for a float. We can abuse strings for this by testing num_str == str(float(num_str)), i.e. whether a string remains the same after being converted to a float and back to a string.
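That check can be wrapped in a small predicate. This is a minimal sketch; the name `round_trips` is hypothetical, and the try/except is an addition so that pieces which are not valid float literals simply fail instead of raising.

```python
def round_trips(piece: str) -> bool:
    """True if `piece` is exactly the string Python would print for float(piece)."""
    try:
        return str(float(piece)) == piece
    except ValueError:  # not a valid float literal at all, e.g. '1.5-'
        return False


print(round_trips('0.1'))                  # True
print(round_trips('0.10'))                 # False: Python prints '0.1'
print(round_trips('0.30000000000000004'))  # True: this is repr(0.1 + 0.2)
print(round_trips('0.30000000000000001'))  # False: parses to 0.3, printed as '0.3'
print(round_trips('13.9582817631176255'))  # False: 18 significant digits is never a repr
```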

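If the string is exactly what Python prints for the float it parses to (str() and repr() produce the shortest decimal string that converts back to the same float), the before and after will be equal. This holds even for values such as 0.1 that are not exactly representable in binary.
If the string carries more than that shortest form (for example, more than 17 significant digits, a trailing zero, or a leading zero), float() maps it to the nearest representable double, and converting that back to a string gives something different from the original.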
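Whether a string survives the round trip is a different question from whether its value is exactly representable in binary. Constructing a `decimal.Decimal` from a float shows the exact value that is stored, which makes the difference visible:

```python
from decimal import Decimal

# The exact binary value stored for 0.1 is not 0.1 ...
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# ... but Python prints the shortest string that maps back to the same double,
# so the string '0.1' still survives the round trip
print(str(float('0.1')) == '0.1')  # True

# Conversely, an exactly representable value can fail if the string has extra
# characters, e.g. a trailing zero
print(Decimal(1.5) == Decimal('1.5'))  # True: 1.5 is exact in binary
print(str(float('1.50')) == '1.50')    # False: Python prints '1.5'
```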

Here's a snippet you can play around with:

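from typing import List


def round_trips(piece: str) -> bool:
    """True if `piece` is exactly the string Python prints for float(piece)."""
    try:
        return str(float(piece)) == piece
    except ValueError:  # e.g. a piece such as '1.5-' that is not a valid float
        return False


def parse_number(s: str) -> List[float]:
    if s.count('.') == 2:
        first_decimal = s.index('.')
        second_decimal = s.index('.', first_decimal + 1)
        # Try split points from right to left, i.e. longest first number first
        for split_idx in range(second_decimal - 1, first_decimal + 1, -1):
            a, b = s[:split_idx], s[split_idx:]
            if round_trips(a) and round_trips(b):
                return [float(a), float(b)]
        # default to returning as large an a as possible
        return [float(s[:second_decimal - 1]), float(s[second_decimal - 1:])]
    else:
        return [float(s)]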

parse_number('33.9260060.9215201173775437')
# [33.926006, 0.9215201173775437]
# this is the only possible combination that actually works for this particular input
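To apply the same idea to a whole line of the file rather than a single token, one possible sketch follows; the names `split_joined` and `parse_line` are hypothetical. Unlike the snippet above, it raises an error when no split round-trips instead of silently falling back, and it assumes no value is written in exponent notation. It is exercised on a synthetic line produced by reproducing the missing-newline bug with known floats.

```python
from typing import List


def round_trips(piece: str) -> bool:
    """True if `piece` is exactly the string Python would print for float(piece)."""
    try:
        return str(float(piece)) == piece
    except ValueError:
        return False


def split_joined(token: str) -> List[float]:
    """Split a token holding two concatenated floats, preferring the longest first number."""
    first = token.index('.')
    second = token.index('.', first + 1)
    for i in range(second - 1, first + 1, -1):
        a, b = token[:i], token[i:]
        if round_trips(a) and round_trips(b):
            return [float(a), float(b)]
    raise ValueError(f'no round-tripping split for {token!r}')


def parse_line(line: str) -> List[float]:
    """Parse a comma-separated line in which some tokens are two joined floats."""
    values: List[float] = []
    for token in (t.strip() for t in line.split(',')):
        if token.count('.') == 2:
            values.extend(split_joined(token))
        else:
            values.append(float(token))
    return values


# Reproduce the bug with known values: two records written without a newline
records = [[20.14374, 31.828905, 33.926006], [0.1 + 0.2, 57.25, 1.5]]
text = ''.join(','.join(str(v) for v in r) for r in records)
print(text)              # 20.14374,31.828905,33.9260060.30000000000000004,57.25,1.5
print(parse_line(text))  # [20.14374, 31.828905, 33.926006, 0.30000000000000004, 57.25, 1.5]
```

Raising on failure makes tokens that cannot be split cleanly visible, rather than hiding them behind a default guess.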

Obviously this isn't foolproof: for some numbers there may not be enough information to tell where the first number ends and the second begins. Additionally, for this to work, the tool that generated your data needs to have printed IEEE-754 doubles using shortest round-trip formatting, as Python's str() and repr() do (which does appear to be the case in this example, but may not be if the results were formatted to a fixed number of decimals or generated using a class like Decimal (Python) or BigDecimal (Java) or something else).
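To see why the output format matters, compare the same value printed in shortest form, with a fixed number of decimals, and as a `Decimal` that keeps its trailing zero. Only the shortest form passes the check. The fragment in the question contains `20.14374` rather than `20.143740`, which is consistent with shortest formatting; `round_trips` is the same hypothetical helper as before.

```python
from decimal import Decimal


def round_trips(piece: str) -> bool:
    """True if `piece` is exactly the string Python would print for float(piece)."""
    try:
        return str(float(piece)) == piece
    except ValueError:
        return False


value = 20.14374

shortest = str(value)                   # '20.14374': shortest round-trip form
fixed = f'{value:.6f}'                  # '20.143740': fixed six decimals
as_decimal = str(Decimal('20.143740'))  # '20.143740': Decimal keeps the trailing zero

print(round_trips(shortest))    # True
print(round_trips(fixed))       # False
print(round_trips(as_decimal))  # False
```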

Some inputs may also have several valid splits. In the snippet above I've biased it towards the longest possible first number, but you could reverse the loop to take the shortest possible first number instead.
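Rather than hard-coding one bias, a variant can collect every valid split and let the caller choose. In this sketch the names `valid_splits` and `split_joined` and the `prefer` parameter are hypothetical. The token `'1.512.5'` is deliberately ambiguous: both `1.51 | 2.5` and `1.5 | 12.5` round-trip.

```python
from typing import List, Tuple


def round_trips(piece: str) -> bool:
    """True if `piece` is exactly the string Python would print for float(piece)."""
    try:
        return str(float(piece)) == piece
    except ValueError:
        return False


def valid_splits(token: str) -> List[Tuple[float, float]]:
    """All round-tripping splits, ordered from longest to shortest first number."""
    first = token.index('.')
    second = token.index('.', first + 1)
    splits = []
    for i in range(second - 1, first + 1, -1):
        a, b = token[:i], token[i:]
        if round_trips(a) and round_trips(b):
            splits.append((float(a), float(b)))
    return splits


def split_joined(token: str, prefer: str = 'longest') -> Tuple[float, float]:
    """Pick one split; 'longest' or 'shortest' refers to the first number."""
    if prefer not in ('longest', 'shortest'):
        raise ValueError(f'unknown preference: {prefer!r}')
    splits = valid_splits(token)
    if not splits:
        raise ValueError(f'no round-tripping split for {token!r}')
    return splits[0] if prefer == 'longest' else splits[-1]


print(valid_splits('1.512.5'))                     # [(1.51, 2.5), (1.5, 12.5)]
print(split_joined('1.512.5'))                     # (1.51, 2.5)
print(split_joined('1.512.5', prefer='shortest'))  # (1.5, 12.5)
```

Keeping the full list also makes it easy to flag tokens with more than one valid split for manual review.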
