python - How to preserve column names while importing data using numpy?

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am using the NumPy library in Python to import CSV file data into an ndarray as follows:

import numpy as np

data = np.genfromtxt('mydata.csv',
                     delimiter=',', dtype=None, names=True)

The result provides the following column names:

print(data.dtype.names)

('row_label',
 'MyDataColumn1_0',
 'MyDataColumn1_1')

The original column names are:

row_label, My-Data-Column-1.0, My-Data-Column-1.1

It appears that NumPy is forcing my column names into C-style variable name formatting. Many of my Python scripts access columns by name, so I need the column names to stay consistent. To accomplish this, either NumPy needs to preserve the original column names, or I need to convert my column names to the format NumPy uses.

Is there a way to preserve the original column names during import?
If not, is there an easy way to convert column labels to use the format NumPy is using, preferably using some NumPy function?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:24:38+0000

If you set names=True, the first line of your data file is passed through this validator:

validate_names = NameValidator(excludelist=excludelist,
                               deletechars=deletechars,
                               case_sensitive=case_sensitive,
                               replace_space=replace_space)

These are the options you can supply:

excludelist : sequence, optional
    A list of names to exclude. This list is appended to the default list
    ['return','file','print']. Excluded names are appended an underscore:
    for example, `file` would become `file_`.
deletechars : str, optional
    A string combining invalid characters that must be deleted from the
    names.
defaultfmt : str, optional
    A format used to define default field names, such as "f%i" or "f_%02i".
autostrip : bool, optional
    Whether to automatically strip white spaces from the variables.
replace_space : char, optional
    Character(s) used in replacement of white spaces in the variables
    names. By default, use a '_'.
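
As a rough sketch of how these options behave, the validator can also be called on its own, which is one way to check what names NumPy will produce for a given header. This assumes NameValidator is importable from the private module numpy.lib._iotools (the file linked as the source in this answer); private modules are not a stable API and may move between NumPy versions. The header list is made up for illustration.

```python
# Assumption: NameValidator lives in the private module numpy.lib._iotools.
from numpy.lib._iotools import NameValidator

header = ['row_label', 'My-Data-Column-1.0', 'file']  # hypothetical header

# Default settings: characters in the default deletechars set are removed,
# and excluded names such as 'file' get a trailing underscore.
default_names = NameValidator().validate(header)

# Custom deletechars without '-' and '.': those two characters are kept.
keep_chars = NameValidator(deletechars="""~!@#$%^&*()=+~|]}[{';: /?>,<""")
kept_names = keep_chars.validate(header)

print(default_names)
print(kept_names)
```

Running the default validator over your own header is also a way to convert labels to the format NumPy uses, if you would rather adapt your scripts than change the import.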

Perhaps you could try supplying an empty string as deletechars. But you'd be better off modifying the default set and passing the result:

defaultdeletechars = set("""~!@#$%^&*()-=+~|]}[{';: /?.>,<""")

Just take out the period and minus sign from that set, and pass it as:

np.genfromtxt(..., names=True, deletechars="""~!@#$%^&*()=+~|]}[{';: /?>,<""")

Here's the source: https://github.com/numpy/numpy/blob/master/numpy/lib/_iotools.py#l245
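
A minimal end-to-end sketch of the suggestion above, using an in-memory file so it runs without mydata.csv. The CSV content and variable names are assumptions for illustration, and encoding='utf-8' is passed only so that string columns are read as text on older NumPy versions.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for mydata.csv.
csv_text = (
    "row_label,My-Data-Column-1.0,My-Data-Column-1.1\n"
    "a,1.5,2.5\n"
    "b,3.0,4.0\n"
)

# The default deletechars with '-' and '.' taken out, as suggested above.
keep_dash_and_dot = """~!@#$%^&*()=+~|]}[{';: /?>,<"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=None,
                     names=True, deletechars=keep_dash_and_dot,
                     encoding='utf-8')

print(data.dtype.names)

# Field names containing '-' or '.' can only be reached by indexing,
# not by attribute access.
col = data['My-Data-Column-1.0']
print(col)
```

Note that names kept this way are no longer valid Python identifiers, so code that relies on attribute-style access to record fields would need to use indexing instead.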
