I'm trying to read a very big Fortran unformatted binary file with Python. This file contains 2^30 integers.
I find the record markers confusing (the first one is -2147483639). Anyway, I have managed to recover the data structure (the wanted integers are all similar in value, so they are easy to tell apart from the record markers) and wrote the code below (with help from here).
However, the markers at the beginning and the end of each record are not the same. Why is that?
Is it because the data is too long (536870910 = (2^30 - 4) / 2 integers per record)?
But (2^31 - 1) / 4 = 536870911 (rounded down) > 536870910.
Or did the author of the data file simply make a mistake?
Another question: what is the type of the marker at the beginning of a record, int or unsigned int?
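To make the signedness question concrete, here is a small check of the first marker. It only re-interprets the four bytes that `'>i'` produced and compares them with the byte count I assumed for the first record; the variable names are mine and nothing here depends on the actual file.

```python
import struct

marker = -2147483639                        # first value read with '>i'
raw = struct.pack(">i", marker)             # the same four bytes, big-endian
(as_unsigned,) = struct.unpack(">I", raw)   # those bytes read as unsigned int

print(raw.hex())                  # 80000009
print(as_unsigned)                # 2147483657
print(abs(marker) == 2**31 - 9)   # True
print(536870910 * 4)              # 2147483640, one more than abs(marker)
```

So read as a signed int the magnitude is 2^31 - 9, which is one byte less than the 536870910 * 4 bytes my code reads for the first record.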
import struct
import numpy as np

# file_path: path to the big-endian Fortran unformatted file
fp = open(file_path, "rb")
# Record 1: leading marker, 536870910 integers, trailing marker
rec_len1, = struct.unpack('>i', fp.read(4))
data1 = np.fromfile(fp, '>i4', 536870910)
rec_end1, = struct.unpack('>i', fp.read(4))
# Record 2: same layout
rec_len2, = struct.unpack('>i', fp.read(4))
data2 = np.fromfile(fp, '>i4', 536870910)
rec_end2, = struct.unpack('>i', fp.read(4))
# Record 3: the remaining 4 integers
rec_len3, = struct.unpack('>i', fp.read(4))
data3 = np.fromfile(fp, '>i4', 4)
rec_end3, = struct.unpack('>i', fp.read(4))
fp.close()
data = np.concatenate([data1, data2, data3])
(rec_len1, rec_end1, rec_len2, rec_end2, rec_len3, rec_end3)
Here are the record marker values read by the code above:
(-2147483639, -2176, 2406, 589824, 1227787, -18)
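For comparison, here is a sketch of a reader that trusts the markers instead of hard-coded counts. It assumes the file follows the subrecord convention documented for gfortran: 4-byte signed markers holding a byte count, a negative leading marker when another subrecord follows, and a negative trailing marker when a subrecord precedes. I do not know which compiler wrote my file, so that is an assumption. `read_record` and `write_record` are names I made up, and `write_record` exists only to build a tiny test file.

```python
import io
import struct

import numpy as np


def read_record(fp, marker_fmt=">i"):
    """Read one logical record, joining its subrecords; returns raw bytes."""
    chunks = []
    while True:
        head = fp.read(4)
        if len(head) < 4:
            raise EOFError("no complete leading marker")
        (lead,) = struct.unpack(marker_fmt, head)
        nbytes = abs(lead)                 # byte count, sign is only a flag
        chunk = fp.read(nbytes)
        if len(chunk) != nbytes:
            raise EOFError("truncated subrecord")
        chunks.append(chunk)
        (trail,) = struct.unpack(marker_fmt, fp.read(4))
        if abs(trail) != nbytes:
            raise ValueError(f"marker mismatch: {lead} vs {trail}")
        if lead >= 0:                      # non-negative: last subrecord
            return b"".join(chunks)


def write_record(fp, payload, max_len, marker_fmt=">i"):
    """Hypothetical test helper: write payload split into subrecords."""
    pieces = [payload[i:i + max_len]
              for i in range(0, len(payload), max_len)] or [b""]
    for k, piece in enumerate(pieces):
        n = len(piece)
        more_follow = k < len(pieces) - 1
        has_prev = k > 0
        fp.write(struct.pack(marker_fmt, -n if more_follow else n))
        fp.write(piece)
        fp.write(struct.pack(marker_fmt, -n if has_prev else n))


# Small in-memory round trip: 40 bytes split into subrecords of 18, 18, 4.
payload = np.arange(10, dtype=">i4").tobytes()
buf = io.BytesIO()
write_record(buf, payload, max_len=18)
buf.seek(0)
values = np.frombuffer(read_record(buf), dtype=">i4")
```

On the real file this would be called as `read_record(fp)` on the opened file. Joining the chunks copies the data once, so for 2^30 integers it needs roughly twice the memory of the final array.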