python - Pandas DataFrame Object Inheritance or Object Use?

Question

Welcome To Ask or Share your Answers For Others

python - Pandas DataFrame Object Inheritance or Object Use?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Pandas DataFrame Object Inheritance or Object Use?

I am building a library for working with very specific structured data and I am building my infrastructure on top of Pandas. Currently I am writing a bunch of different data containers for different use cases, such as CTMatrix for Country x Time Data etc. to house methods appropriate for all CountryxTime structured data.

I am currently debating between

Option 1: Object Inheritance

class CTMatrix(pd.DataFrame):
    methods etc. here

or Option 2: Object Use

class CTMatrix(object):
    _data = pd.DataFrame

    then use getter, setter methods to control access to _data etc.

From a software engineering perspective is there an obvious choice here?

My thoughts so far are:

Option 1:

Can use DataFrame methods directly on the CTMatrix Class (like CTmatrix.sort()) without having to support them via methods on the encapsulated _data object in Option #2
Updates and New methods in Pandas are inherited, except for methods that may be overwritten with local class methods

BUT

Complications with some methods such as __init__() and having to pass the attributes up to the superclass super(MyDF, self).__init__(*args, **kw)

Option 2:

More control over the Class and it's behavior
Possibly more resilient to updates in Pandas?

But

Having to use a getter() or non-hidden attribute to use the object like a dataframe such as (CTMatrix.data.sort())

Are there any additional downsides for taking the approach in Option #1?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:05:46+0000

I would avoid subclassing DataFrame, because many of the DataFrame methods will return a new DataFrame and not another instance of your CTMatrix object.

There are a few of open issues on GitHub around this e.g.:

More generally, this is a question of composition vs inheritance. I would be especially wary of benefit #2. It might seem great now, but unless you are keeping a close eye on updates to Pandas (and it is a fast moving target), you can easily end up with unexpected consequences and your code will end up intertwined with Pandas.

Categories

python - Pandas DataFrame Object Inheritance or Object Use?

python - Pandas DataFrame Object Inheritance or Object Use?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags