Is it possible to initialise a numpy recarray that will hold strings, without knowing the length of the strings beforehand?
As a (contrived) example:
mydf = np.empty((numrows,), dtype=[('file_name', 'STRING'), ('file_size_MB', float)])  # 'STRING' = "a string of any length"
The problem is that I construct my recarray before populating it with information, so I don't necessarily know the maximum length of file_name in advance.
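One common workaround, sketched here under the assumption that compact fixed-width storage is not required, is to declare file_name as a Python object field (dtype=object). Each entry then holds an ordinary str of any length, so nothing is truncated. The trade-off is that the strings live outside the array buffer as Python objects, so the field is less memory-efficient and cannot use NumPy's fixed-width string machinery.

```python
import numpy as np

numrows = 2

# Assumption: file_name is stored as a Python object (dtype=object) rather than
# a fixed-width string, so each entry can be a str of any length.
mydf = np.empty((numrows,), dtype=[('file_name', object), ('file_size_mb', float)])
mydf = mydf.view(np.recarray)  # enables attribute access such as mydf.file_name

mydf.file_name[0] = 'foobarasdf.tif'
mydf.file_name[1] = 'arghtidlsarbda.jpg'
mydf.file_size_mb[:] = 0.0  # np.empty leaves the float field uninitialised

print(mydf.file_name)
```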
All my attempts result in the string field being truncated:
>>> mydf = np.empty( (2,), dtype=[('file_name',str),('file_size_mb',float)] )
>>> mydf['file_name'][0]='foobarasdf.tif'
>>> mydf['file_name'][1]='arghtidlsarbda.jpg'
>>> mydf
array([('', 6.9164002347457e-310), ('', 9.9413127e-317)],
      dtype=[('file_name', 'S'), ('file_size_mb', '<f8')])
>>> mydf['file_name']
array(['f', 'a'],
      dtype='|S1')
(As an aside, why does mydf['file_name'] show 'f' and 'a', whilst mydf shows '' and ''?)
Similarly, if I initialise file_name with type (say) |S10, then values get truncated at length 10.
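A minimal reproduction of the fixed-width truncation, assuming Python 3, where 'U10' (a 10-character Unicode field) is the str counterpart of the bytes type |S10. NumPy silently cuts off anything beyond the declared width on assignment:

```python
import numpy as np

# Fixed-width field of 10 characters; longer values are truncated without warning.
mydf = np.zeros((2,), dtype=[('file_name', 'U10'), ('file_size_mb', float)])
mydf['file_name'][0] = 'foobarasdf.tif'
mydf['file_name'][1] = 'arghtidlsarbda.jpg'
print(mydf['file_name'])  # ['foobarasdf' 'arghtidlsa']
```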
The only similar question I could find is this one, but it calculates the appropriate string length a priori, so it is not quite the same as mine (I know nothing in advance).
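For contrast, the a priori approach mentioned above looks roughly like this. It is a sketch that assumes every record (the hypothetical records list) is available before the array is allocated, which is exactly the condition that does not hold here:

```python
import numpy as np

# Hypothetical input: all file names and sizes are known before allocation.
records = [('foobarasdf.tif', 1.5), ('arghtidlsarbda.jpg', 2.25)]

# Size the string field to the longest name, then build the array in one go.
maxlen = max(len(name) for name, _ in records)
dtype = [('file_name', f'U{maxlen}'), ('file_size_mb', float)]
mydf = np.array(records, dtype=dtype).view(np.recarray)
```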
Is there any alternative to initialising file_name with (e.g.) |S9999999999999 (i.e. some ridiculous upper limit)?
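Another option, sketched here with a hypothetical helper set_file_name that is specific to this two-field layout, is to keep a fixed-width field but widen it on demand with astype whenever a longer name arrives. Widening copies the whole array, so the helper grows the field to at least double its current width to keep the number of copies small:

```python
import numpy as np

def set_file_name(arr, i, name):
    """Hypothetical helper: store name at row i, widening file_name if needed.

    Returns the (possibly new) array, because widening requires a copy.
    """
    width = arr.dtype['file_name'].itemsize // 4  # 'U' fields use 4 bytes per char
    if len(name) > width:
        new_width = max(len(name), 2 * width)  # grow geometrically to limit copies
        new_dtype = [('file_name', f'U{new_width}'), ('file_size_mb', float)]
        arr = arr.astype(new_dtype)  # copies existing rows into the wider field
    arr['file_name'][i] = name
    return arr

mydf = np.zeros((2,), dtype=[('file_name', 'U1'), ('file_size_mb', float)])
mydf = set_file_name(mydf, 0, 'foobarasdf.tif')
mydf = set_file_name(mydf, 1, 'arghtidlsarbda.jpg')
print(mydf['file_name'])  # ['foobarasdf.tif' 'arghtidlsarbda.jpg']
```

Because the function may return a new array, callers must always rebind the result, as in the usage lines above.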