python - How to preserve column names starting with a minus when using numpy.genfromtxt?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

As in a related question, numpy.genfromtxt modifies my column names:

import numpy as np
from io import BytesIO  # https://stackoverflow.com/a/11970414/321973

str = 'x,-1,1\n0,1,1\n1,2,3'
data = np.genfromtxt(BytesIO(str.encode()), delimiter=',', names=True)
print(data.dtype.names)

yields ('x', '1', '1_1') instead of the desired ('x', '-1', '1') (or even better, ('x', -1, 1)). I tried deletechars="""~!@#$%^&*()=+~|]}[{';: /?>,<""" as suggested there to no avail.

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:25:45+0000

This happens because np.genfromtxt passes the header through its NameValidator class (defined in numpy.lib._iotools), which automatically strips certain non-alphanumeric characters from the field names.
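
To see the renaming in isolation, the validator can be called directly on the header. This is only a sketch: it assumes NameValidator is importable from the private module numpy.lib._iotools, which is an implementation detail and may move between NumPy versions.

```python
# Assumption: NameValidator lives in the private module numpy.lib._iotools.
# Private modules are not a stable API and may change between releases.
from numpy.lib._iotools import NameValidator

header = ['x', '-1', '1']

# With default settings the validator deletes '-' (among other characters),
# so '-1' collapses to '1' and the resulting duplicate gets a numeric suffix.
validated = tuple(NameValidator().validate(header))
print(validated)  # ('x', '1', '1_1')
```

The result matches the field names reported in the question, which suggests the validator, not the parsing of the data rows, is what changes them.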

It's perfectly legal for a field name to contain a '-' character, e.g.:

x = np.array((1,), dtype=[('-1', 'i')])
print(x['-1'])
# 1

In fact, two out of three of the modified field names you get back from np.genfromtxt are also not "valid Python identifiers" ('1' and '1_1', since they start with digits).
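
The identifier claim is easy to check with the built-in str.isidentifier() method. The snippet below is a small illustration using the names from the question; the variable names are made up for the example.

```python
# Field names as np.genfromtxt returned them in the question, plus the desired '-1'.
candidates = ['x', '1', '1_1', '-1']

# str.isidentifier() reports whether a string is a syntactically valid
# Python identifier (it does not check for reserved keywords).
is_valid = {name: name.isidentifier() for name in candidates}
print(is_valid)
# {'x': True, '1': False, '1_1': False, '-1': False}
```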

It's therefore possible to construct the array you describe as long as you bypass using np.genfromtxt to set the field names. One way to do it would be to initialize an empty array, specify the field names and dtypes explicitly, then fill it with the rest of the string contents:

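names = str.splitlines()[0].split(',')
types = ('i',) * 3
dtype = list(zip(names, types))  # zip() is lazy in Python 3, so build a list

data = np.empty(2, dtype=dtype)
# Read only the data rows as a plain 2-D integer array (skip_header=1 skips
# the header line), then fill each field by column position so that
# genfromtxt never gets a chance to rewrite the names.
raw = np.genfromtxt(BytesIO(str.encode()), delimiter=',', dtype='i',
                    skip_header=1)
for i, name in enumerate(names):
    data[name] = raw[:, i]
print(repr(data))
# array([(0, 1, 1), (1, 2, 3)],
#       dtype=[('x', '<i4'), ('-1', '<i4'), ('1', '<i4')])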
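
A possible shorter variant is to let np.genfromtxt pick its own names and rename the fields afterwards. The sketch below uses rename_fields from numpy.lib.recfunctions; the variable names (text, wanted, renamed) are made up for the example, and the columns come back as floats because no dtype is given.

```python
from io import BytesIO

import numpy as np
from numpy.lib import recfunctions as rfn

text = 'x,-1,1\n0,1,1\n1,2,3'
wanted = text.splitlines()[0].split(',')  # ['x', '-1', '1']

# genfromtxt sanitizes the header, producing ('x', '1', '1_1').
data = np.genfromtxt(BytesIO(text.encode()), delimiter=',', names=True)

# Map each sanitized name back to the original header cell, by position.
renamed = rfn.rename_fields(data, dict(zip(data.dtype.names, wanted)))
print(renamed.dtype.names)  # ('x', '-1', '1')
print(renamed['-1'])
```

This relies on the header and the sanitized names being in the same order, which holds here because both come from the same first line.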

However, just because you can doesn't mean you should - there may well be other unpredictable consequences to having a '-' in one of your field names. The safest option is to stick with using only valid Python identifiers as field names.
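
If you do go with valid identifiers, one way is to sanitize the header yourself and pass the result through the names argument. The helper to_identifier below is hypothetical, and the 'neg_' and 'col_' prefixes are arbitrary choices made for this sketch, not a NumPy convention.

```python
import re
from io import BytesIO

import numpy as np


def to_identifier(name):
    """Hypothetical helper: turn a header cell into a valid Python identifier."""
    name = name.strip()
    if name.startswith('-'):
        name = 'neg_' + name[1:]      # keep the sign information readable
    name = re.sub(r'\W', '_', name)   # replace any remaining non-word characters
    if not name.isidentifier():
        name = 'col_' + name          # e.g. names that start with a digit
    return name


text = 'x,-1,1\n0,1,1\n1,2,3'
names = [to_identifier(cell) for cell in text.splitlines()[0].split(',')]

# Skip the original header line and supply the sanitized names instead.
data = np.genfromtxt(BytesIO(text.encode()), delimiter=',',
                     skip_header=1, names=names)
print(data.dtype.names)  # ('x', 'neg_1', 'col_1')
```

Unlike the default renaming, this keeps the two numeric columns distinguishable by name, at the cost of no longer matching the file's header exactly.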
