Say df is a pandas DataFrame.
df.loc[] only accepts names (labels).
df.iloc[] only accepts integers (actual positions).
df.ix[] accepts both names and integers:
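As a small illustration of the label/position split, here is a sketch using a hypothetical frame called demo whose row labels start at 2 rather than 0. The column order is set explicitly so that '1' comes first and 'a' second; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: rows are labelled 2..5, columns are '1' and 'a'.
demo = pd.DataFrame(
    {"1": np.arange(6), "a": ["one", "two", "three", "four", "five", "six"]},
    columns=["1", "a"],
).iloc[2:6]

# .loc looks things up by name: row label 2, column label 'a'.
by_label = demo.loc[2, "a"]

# .iloc looks things up by position: first row, second column.
by_position = demo.iloc[0, 1]

print(by_label, by_position)
```

Both lookups reach the same cell here, because row label 2 happens to sit at position 0.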
When referencing rows, df.ix[row_idx, ] only wants to be given names, e.g.:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)})
df = df.ix[2:6]
print(df)
   1      a
2  2  three
3  3   four
4  4   five
5  5    six
df.ix[0, 'a'] throws an error; it doesn't return 'three'.
When referencing columns, df.ix[] prefers integers, not names, e.g. df.ix[2, 1] returns 'three', not 2. (Although df.ix[2, '1'] does return 2.)
Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).
I realize I can use:
df.iloc[0].loc['a'] # returns three
But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?
df.foo[0, 'a'] # returns three
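One possible way to get this in a single indexing call is to convert one of the two coordinates first. The sketch below rebuilds the example frame with .iloc[2:6] instead of .ix; df.index[...] and df.columns.get_loc(...) are standard pandas calls, while the variable names are only illustrative. It assumes row labels and column names are unique.

```python
import numpy as np
import pandas as pd

# Same data as the example above, sliced by position so the row labels are 2..5.
df = pd.DataFrame(
    {"1": np.arange(6), "a": ["one", "two", "three", "four", "five", "six"]},
    columns=["1", "a"],
).iloc[2:6]

# Option 1: turn the row position into a row label, then use .loc.
value_via_loc = df.loc[df.index[0], "a"]

# Option 2: turn the column name into a column position, then use .iloc.
value_via_iloc = df.iloc[0, df.columns.get_loc("a")]

print(value_via_loc, value_via_iloc)
```

Option 2 does not depend on the row labels at all, so it may be the safer choice when the index contains duplicates.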
In fact, is it possible to add my own new method to pandas.core.frame.DataFrame objects, so that e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?
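A minimal sketch of one way this might be done, assuming a pandas version that provides pd.api.extensions.register_dataframe_accessor. The accessor name idx follows the question; the class name PositionLabelAccessor is hypothetical. It selects columns by label with .loc and then rows by position with .iloc, rather than chaining .iloc[rows].loc[cols], so that it also works when rows is a slice or list. It is meant for reading values, not for assignment.

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("idx")
class PositionLabelAccessor:
    """Hypothetical accessor: rows by position, columns by label."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __call__(self, rows, cols):
        # Pick the columns by label first, then the rows by position.
        return self._obj.loc[:, cols].iloc[rows]

    def __getitem__(self, key):
        # Allows the bracket form df.idx[rows, cols] as well.
        rows, cols = key
        return self(rows, cols)


df = pd.DataFrame(
    {"1": np.arange(6), "a": ["one", "two", "three", "four", "five", "six"]},
    columns=["1", "a"],
).iloc[2:6]

first_a = df.idx(0, "a")                 # call form
first_a_brackets = df.idx[0, "a"]        # bracket form
first_two_a = df.idx(slice(0, 2), "a")   # several rows at once
print(first_a, first_a_brackets, list(first_two_a))
```

An accessor keeps the extension separate from pandas' own attributes; assigning a function directly onto pd.DataFrame would also work but is easier to clash with existing names.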