Numpy functions, eg np.mean(), np.var(), etc, accept an array-like argument, like np.array, or list, etc.
But passing in a pandas dataframe also works. This means that a pandas dataframe can indeed disguise itself as a numpy array, which I find a little strange (despite knowing the fact that the underlying values of a df are indeed numpy arrays).
For an object to be an array-like, I thought that it should be slicable using integer indexing in the way a numpy array is sliced. So for instance df[1:3, 2:3] should work, but it would lead to an error.
So, possibly a dataframe gets converted into a numpy array when it goes inside the function. But if that is the case then why does np.mean(numpy_array) lead to a different result than that of np.mean(df)?
a = np.random.rand(4,2)
a
Out[13]:
array([[ 0.86688862, 0.09682919],
[ 0.49629578, 0.78263523],
[ 0.83552411, 0.71907931],
[ 0.95039642, 0.71795655]])
np.mean(a)
Out[14]: 0.68320065182041034
gives a different result than what the below gives...
df = pd.DataFrame(data=a, index=range(np.shape(a)[0]),
columns=range(np.shape(a)[1]))
df
Out[18]:
0 1
0 0.866889 0.096829
1 0.496296 0.782635
2 0.835524 0.719079
3 0.950396 0.717957
np.mean(df)
Out[21]:
0 0.787276
1 0.579125
dtype: float64
The former output is a single number, whereas the latter is a column-wise mean. How does a numpy function know about the make of a dataframe?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…