Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Subclassing a Pandas DataFrame, updates?

To inherit, or not to inherit?

What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old).

I am hoping to do something like ...

import pandas as pd

class SomeData(pd.DataFrame):
    # Methods
    pass

ClsInstance = SomeData()

# Create a new column on ClsInstance?

1 Reply

This is how I've done it. I've followed advice found:

The example below only shows how to construct new subclasses of pandas.DataFrame. If you follow the advice in my first link, you may want to subclass pandas.Series as well, so that one-dimensional slices of your pandas.DataFrame subclass also come back as your own type.
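
For the Series side, here is a minimal sketch of how the two subclasses can point at each other. It relies on the `_constructor_sliced` and `_constructor_expanddim` properties that pandas documents for subclassing. The class names `TypedSeries` and `TypedFrame` are made up for illustration and are not part of the answer's own code.

```python
import pandas as pd


class TypedSeries(pd.Series):
    """Hypothetical one-dimensional companion to TypedFrame."""

    @property
    def _constructor(self):
        # Operations on a TypedSeries should give back a TypedSeries.
        return TypedSeries

    @property
    def _constructor_expanddim(self):
        # Going from one dimension to two should give a TypedFrame.
        return TypedFrame


class TypedFrame(pd.DataFrame):
    """Hypothetical two-dimensional subclass."""

    @property
    def _constructor(self):
        return TypedFrame

    @property
    def _constructor_sliced(self):
        # Taking a single column should give a TypedSeries.
        return TypedSeries


frame = TypedFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
column = frame["A"]            # one-dimensional slice
back = column.to_frame()       # expanded back to two dimensions
```

With this in place, `column` should be a `TypedSeries` and `back` a `TypedFrame`, rather than plain pandas objects.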

Defining SomeData

import pandas as pd
import numpy as np

class SomeData(pd.DataFrame):
    # This class variable tells Pandas the name of the attributes
    # that are to be ported over to derivative DataFrames.  There
    # is a method named `__finalize__` that grabs these attributes
    # and assigns them to newly created `SomeData`
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        """This is the key to letting Pandas know how to keep
        derivative `SomeData` the same type as yours.  It should
        be enough to return the name of the Class.  However, in
        some cases, `__finalize__` is not called and `my_attr` is
        not carried over.  We can fix that by constructing a callable
        that makes sure to call `__finalize__` every time."""
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        # grab the keyword argument that is supposed to be my_attr
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def my_method(self, other):
        return self * np.sign(self - other)

Demonstration

mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)

   A  B
0  1  4
1  2  5
2  3  6

<class '__main__.SomeData'>

an attr

newdata = mydata.mul(2)

print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)

   A   B
0  2   8
1  4  10
2  6  12

<class '__main__.SomeData'>

an attr

newerdata = mydata.my_method(newdata)

print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)

   A  B
0 -1 -4
1 -2 -5
2 -3 -6

<class '__main__.SomeData'>

an attr

Gotchas

This breaks the method pd.DataFrame.equals:

newerdata.equals(newdata)  # Should be `False`
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-304-866170ab179e> in <module>()
----> 1 newerdata.equals(newdata)

~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
   1034         the same location are considered equal.
   1035         """
-> 1036         if not isinstance(other, self._constructor):
   1037             return False
   1038         return self._data.equals(other._data)

TypeError: isinstance() arg 2 must be a type or tuple of types

What happens is that this method expects the _constructor attribute to hold a type (a class), because it passes it as the second argument to isinstance(). Instead, it finds the callable that I placed there to fix the __finalize__ issue I came across, and isinstance() rejects it.
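
The same failure can be reproduced without pandas. The sketch below uses a made-up function named `constructor` as a stand-in for the `_c` closure. It only illustrates the `isinstance()` rule and is not pandas code.

```python
def constructor(*args, **kwargs):
    # Hypothetical stand-in for the `_c` closure returned by `_constructor`.
    return dict(*args, **kwargs)


obj = constructor(a=1)

try:
    # The second argument is a function, not a class, so this raises.
    isinstance(obj, constructor)
    raised = False
except TypeError:
    raised = True

# A real class in the same position is accepted.
is_dict = isinstance(obj, dict)
```

Here `raised` ends up `True`, while the check against the actual class `dict` succeeds.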

Workaround

Override the equals method with the following in your class definition.

    def equals(self, other):
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False

newerdata.equals(newdata)  # Should be `False`

False
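
To see what this kind of override counts as "equal", here is a small standalone sketch of the same try/except pattern on plain DataFrames. The helper name `frames_equal` and the sample frames are made up for illustration. With its default settings, `pd.testing.assert_frame_equal` also compares dtypes, so frames holding the same numbers with different dtypes are reported as unequal.

```python
import pandas as pd


def frames_equal(left, right):
    """Return True if assert_frame_equal passes, False if it raises."""
    try:
        pd.testing.assert_frame_equal(left, right)
        return True
    except AssertionError:
        return False


a = pd.DataFrame({"A": [1, 2, 3]})
b = pd.DataFrame({"A": [1, 2, 3]})
c = pd.DataFrame({"A": [1.0, 2.0, 3.0]})   # same numbers, float dtype
d = pd.DataFrame({"A": [3, 2, 1]})         # different values

same = frames_equal(a, b)
dtype_differs = frames_equal(a, c)
values_differ = frames_equal(a, d)
```

Only `same` is `True` here. The other two comparisons fail on dtype and on values respectively.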
