Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Diff on pandas dataframe with more than one column

I have a pandas dataframe with two columns:

ddf.head()

    a    b
0   3136 13280
1   3072 13312
2   3152 13296
3   3120 13248
4   3120 13200

I would like to calculate the difference between consecutive elements within each column. Doing it one column at a time (ddf['a'].diff()) works as I expect, but calling ddf.diff() on the whole dataframe raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-6ff864856571> in <module>()
----> 1 ddf.diff()

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in diff(self, periods)
   4285         diffed : DataFrame
   4286         """
-> 4287         new_data = self._data.diff(periods)
   4288         return self._constructor(new_data)
   4289 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, *args, **kwargs)
   1287 
   1288     def diff(self, *args, **kwargs):
-> 1289         return self.apply('diff', *args, **kwargs)
   1290 
   1291     def interpolate(self, *args, **kwargs):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in apply(self, f, *args, **kwargs)
   1267                 applied = f(blk, *args, **kwargs)
   1268             else:
-> 1269                 applied = getattr(blk,f)(*args, **kwargs)
   1270 
   1271             if isinstance(applied,list):

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in diff(self, n)
    423     def diff(self, n):
    424         """ return block for the diff of the values """
--> 425         new_values = com.diff(self.values, n, axis=1)
    426         return make_block(new_values, self.items, self.ref_items, fastpath=True)
    427 

/home/app/anaconda/lib/python2.7/site-packages/pandas/core/common.pyc in diff(arr, n, axis)
    643     if arr.ndim == 2 and arr.dtype.name in _diff_special:
    644         f = _diff_special[arr.dtype.name]
--> 645         f(arr, out_arr, n, axis)
    646     else:
    647         res_indexer = [slice(None)] * arr.ndim

/home/app/anaconda/lib/python2.7/site-packages/pandas/algos.so in pandas.algos.diff_2d_int16 (pandas/algos.c:91446)()

ValueError: Buffer dtype mismatch, expected 'float32_t' but got 'double'

1 Reply

You can use this:

>>> df - df.shift(1)
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48

On my machine, though, df.diff() works fine:

>>> df.diff()
    a   b
0 NaN NaN
1 -64  32
2  80 -16
3 -32 -48
4   0 -48
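
For anyone who wants to try both routes on the data from the question, here is a minimal sketch. It rests on two assumptions: that the columns are int16 (inferred only from the `diff_2d_int16` frame in the traceback, not stated in the question), and that casting to float64 before calling diff() sidesteps the int16 code path on the affected pandas version, which I have not verified there. The names `by_shift` and `by_cast` are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
# int16 is an assumption taken from "diff_2d_int16" in the traceback.
ddf = pd.DataFrame(
    {
        "a": [3136, 3072, 3152, 3120, 3120],
        "b": [13280, 13312, 13296, 13248, 13200],
    },
    dtype=np.int16,
)

# Option 1: subtract the frame shifted down by one row.
# shift(1) introduces NaN in the first row, so the result is floating point.
by_shift = ddf - ddf.shift(1)

# Option 2: cast to float64 first, then call diff() on the whole frame.
by_cast = ddf.astype("float64").diff()

print(ddf.dtypes)
print(by_shift)
print(by_cast)
```

Both variants leave the first row as NaN, since there is no previous row to subtract. Checking `ddf.dtypes` on your own frame is a quick way to see whether it really holds int16 columns.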
