python - Subclassing a Pandas DataFrame, updates?

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

To inherit, or not to inherit?

What is the latest on the subclassing issue for Pandas? (Most of the other threads are 3-4 years old).

I am hoping to do something like ...

import pandas as pd

class SomeData(pd.DataFrame):
    # Methods
    pass

ClsInstance = SomeData()

# Create a new column on ClsInstance?
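A minimal sketch of what the snippet is after, assuming a recent pandas. `PlainSub` and `KeepsType` are hypothetical names. Creating a column with `[]` modifies the object in place, so it stays a subclass instance either way. Methods that return a *new* frame (such as `assign`) only keep the subclass type if the class overrides `_constructor`.

```python
import pandas as pd


class PlainSub(pd.DataFrame):
    """Hypothetical subclass that does NOT override `_constructor`."""


class KeepsType(pd.DataFrame):
    """Hypothetical subclass that tells pandas to build new objects as KeepsType."""

    @property
    def _constructor(self):
        return KeepsType


plain = PlainSub({"A": [1, 2, 3]})
kept = KeepsType({"A": [1, 2, 3]})

# In-place column creation: the object itself is modified, so its type never changes.
plain["B"] = plain["A"] * 2
kept["B"] = kept["A"] * 2

# `assign` returns a new object, which pandas builds through `_constructor`.
plain_new = plain.assign(C=0)
kept_new = kept.assign(C=0)

print(type(plain_new).__name__)  # DataFrame
print(type(kept_new).__name__)   # KeepsType
```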

深蓝 · Answer 1 · 2021-10-17T01:13:38+0000

This is how I've done it. I've followed advice found:

The example below only shows how to construct a subclass of pandas.DataFrame. If you follow the advice in my first link, you may want to subclass pandas.Series as well, so that one-dimensional slices of your pandas.DataFrame subclass (such as a single column) also keep a custom type.
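A minimal sketch of that DataFrame/Series pairing, following the `_constructor_sliced` / `_constructor_expanddim` pattern from the pandas "Extending pandas" guide. `SomeSeries` and `SomeFrame` are hypothetical names, and `_metadata` handling is left out to keep the focus on the type links.

```python
import pandas as pd


class SomeSeries(pd.Series):
    @property
    def _constructor(self):
        # New 1-D results (e.g. arithmetic on a column) stay SomeSeries.
        return SomeSeries

    @property
    def _constructor_expanddim(self):
        # Going from 1-D to 2-D (e.g. `to_frame`) produces SomeFrame.
        return SomeFrame


class SomeFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # New 2-D results stay SomeFrame.
        return SomeFrame

    @property
    def _constructor_sliced(self):
        # Taking a single column or row produces SomeSeries instead of pd.Series.
        return SomeSeries


df = SomeFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
col = df["A"]
print(type(col).__name__)             # SomeSeries
print(type(col * 2).__name__)         # SomeSeries
print(type(col.to_frame()).__name__)  # SomeFrame
```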

Defining `SomeData`

import pandas as pd
import numpy as np

class SomeData(pd.DataFrame):
    # This class variable tells pandas the names of the attributes
    # that should be carried over to derived DataFrames.  The
    # `__finalize__` method reads these attributes from the source
    # object and assigns them to the newly created `SomeData`.
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        """This is the key to letting pandas know how to keep
        derived objects the same type as yours.  It should usually
        be enough to return the class itself.  However, in some
        cases `__finalize__` is not called and `my_attr` is not
        carried over.  We can fix that by returning a callable
        that makes sure `__finalize__` is called every time."""
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        # Pop `my_attr` out of kwargs so it is not passed on to
        # pd.DataFrame.__init__, which does not accept it.
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def my_method(self, other):
        # Multiply each element by the sign (-1, 0 or 1) of `self - other`.
        return self * np.sign(self - other)

Demonstration

mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

print(mydata, type(mydata), mydata.my_attr, sep='\n' * 2)

   A  B
0  1  4
1  2  5
2  3  6

<class '__main__.SomeData'>

an attr

newdata = mydata.mul(2)

print(newdata, type(newdata), newdata.my_attr, sep='\n' * 2)

   A   B
0  2   8
1  4  10
2  6  12

<class '__main__.SomeData'>

an attr

newerdata = mydata.my_method(newdata)

print(newerdata, type(newerdata), newerdata.my_attr, sep='\n' * 2)

   A  B
0 -1 -4
1 -2 -5
2 -3 -6

<class '__main__.SomeData'>

an attr
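The same check can be extended to a few more operations. This sketch repeats the `SomeData` class from above (minus `my_method`) so it runs on its own, assuming a recent pandas. `copy`, `head` and `assign` all build their result through `_constructor`, so the type and `my_attr` should survive. Selecting a single column goes through `_constructor_sliced` instead, which this class leaves at its default, so the result is a plain `pd.Series` without `my_attr`; that is the gap the advice about also subclassing pandas.Series addresses.

```python
import pandas as pd


class SomeData(pd.DataFrame):
    # Same definition as above, minus `my_method`.
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)


mydata = SomeData(dict(A=[1, 2, 3], B=[4, 5, 6]), my_attr='an attr')

# Each of these builds a new object through `_constructor`,
# so both the type and `my_attr` carry over.
for result in (mydata.copy(), mydata.head(2), mydata.assign(C=0)):
    print(type(result).__name__, result.my_attr)  # SomeData an attr

# A single column is built through `_constructor_sliced` (pd.Series by default).
column = mydata['A']
print(type(column).__name__)  # Series
```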

Gotchas

This borks on the method pd.DataFrame.equals

newerdata.equals(newdata)  # Should be `False`

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-304-866170ab179e> in <module>()
----> 1 newerdata.equals(newdata)

~/anaconda3/envs/3.6.ml/lib/python3.6/site-packages/pandas/core/generic.py in equals(self, other)
   1034         the same location are considered equal.
   1035         """
-> 1036         if not isinstance(other, self._constructor):
   1037             return False
   1038         return self._data.equals(other._data)

TypeError: isinstance() arg 2 must be a type or tuple of types

What happens is that `equals` expects the `_constructor` attribute to hold a class (an object of type `type`), because it passes it as the second argument to `isinstance`. Instead, it finds the callable I put there to fix the `__finalize__` issue I came across.
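The failure can be reproduced without the rest of the class. In this sketch, `Frame` and `make_frame` are hypothetical stand-ins for `SomeData` and the `_c` closure: a class is accepted as the second argument of `isinstance`, while a plain function raises `TypeError` even though it is callable. The exact wording of the error message differs between Python versions.

```python
import pandas as pd


class Frame(pd.DataFrame):
    """Hypothetical stand-in for SomeData, used only for the isinstance checks."""


def make_frame(*args, **kwargs):
    # Hypothetical stand-in for the `_c` closure: callable, but not a class.
    return Frame(*args, **kwargs)


obj = Frame({"A": [1]})

# A class is a valid second argument to isinstance.
print(isinstance(obj, Frame))  # True

# A plain function is rejected, which is what breaks the old `equals`.
try:
    isinstance(obj, make_frame)
except TypeError as exc:
    print(exc)
```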

Work around

Override the equals method with the following in your class definition.

    def equals(self, other):
        try:
            pd.testing.assert_frame_equal(self, other)
            return True
        except AssertionError:
            return False

newerdata.equals(newdata)  # Should be `False`

False
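One caveat with the `assert_frame_equal` workaround: by default it compares floating-point columns with a small tolerance (the `rtol`/`atol` parameters in recent pandas), so it can return `True` for frames that the built-in `equals` treats as different. A variant that keeps the exact semantics is to compare plain `pd.DataFrame` wrappers of both objects. This is a sketch of my own, not part of the answer above; it repeats a trimmed `SomeData` so it runs on its own.

```python
import pandas as pd


class SomeData(pd.DataFrame):
    # Trimmed copy of the class above: `_metadata`, the callable `_constructor`
    # and `__init__` are unchanged; only `equals` differs from the workaround.
    _metadata = ['my_attr']

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            return SomeData(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.my_attr = kwargs.pop('my_attr', None)
        super().__init__(*args, **kwargs)

    def equals(self, other):
        # Anything that is not a DataFrame (or subclass) can never be equal.
        if not isinstance(other, pd.DataFrame):
            return False
        # Wrapping both sides in pd.DataFrame gives plain frames whose
        # `_constructor` is the DataFrame class itself, so the built-in
        # exact comparison runs without the isinstance() error.
        return pd.DataFrame(self).equals(pd.DataFrame(other))


a = SomeData({'A': [1.0, 2.0]})
b = SomeData({'A': [1.0, 2.0]})
c = SomeData({'A': [1.0, 2.0000001]})

print(a.equals(b))           # True
print(a.equals(c))           # False: exact comparison, no tolerance
print(a.equals([1.0, 2.0]))  # False: not a DataFrame
```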
