I am building a library for working with very specific structured data, and I am building my infrastructure on top of pandas. I am currently writing several data containers for different use cases, such as CTMatrix for Country x Time data, to house the methods appropriate for all Country x Time structured data.
I am currently debating between
Option 1: Object Inheritance
    import pandas as pd
    class CTMatrix(pd.DataFrame):
        # methods etc. here
        pass
or Option 2: Object Use
    class CTMatrix(object):
        def __init__(self, data):
            self._data = pd.DataFrame(data)
        # then use getter and setter methods to control access to _data etc.
From a software engineering perspective is there an obvious choice here?
My thoughts so far are:
Option 1:
- Can use DataFrame methods directly on a CTMatrix instance (like CTMatrix.sort()) without having to support them via methods on the encapsulated _data object as in Option #2
- Updates and new methods in pandas are inherited automatically, except for methods that are overridden by local class methods
BUT
- Complications with some methods such as __init__(), and having to pass the arguments up to the superclass: super(CTMatrix, self).__init__(*args, **kw)
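To make the Option 1 trade-off concrete, here is a minimal sketch of how I understand the subclass would look. It relies on the `_constructor` property and the `_metadata` list, which pandas documents as its subclassing hooks. The `units` attribute and the `countries()` method are hypothetical names I made up for illustration, not part of any existing API.

```python
import pandas as pd


class CTMatrix(pd.DataFrame):
    """Country x Time container built by subclassing DataFrame (Option 1)."""

    # Custom attributes listed here are the ones pandas tries to carry over
    # to new objects derived from this one. "units" is a hypothetical example.
    _metadata = ["units"]

    def __init__(self, *args, units=None, **kwargs):
        # Pass everything else up to DataFrame unchanged.
        super().__init__(*args, **kwargs)
        self.units = units

    @property
    def _constructor(self):
        # Tells pandas which class to build when an operation returns a new frame.
        return CTMatrix

    def countries(self):
        # Hypothetical domain method: assumes countries are stored in the index.
        return list(self.index)


ctm = CTMatrix(
    {2000: [1.0, 2.0], 2001: [1.5, 2.5]},
    index=["AUS", "USA"],
    units="USD",
)
reversed_ctm = ctm.sort_index(ascending=False)  # inherited DataFrame method
```

With `_constructor` defined, `reversed_ctm` should still be a CTMatrix, so `reversed_ctm.countries()` keeps working. Without it, inherited methods would hand back a plain DataFrame and the custom methods would be lost after the first operation.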
Option 2:
- More control over the class and its behavior
- Possibly more resilient to updates in Pandas?
BUT
- Having to use a getter() or a non-hidden attribute to use the object like a DataFrame, such as CTMatrix.data.sort()
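For comparison, this is a rough sketch of Option 2 as I picture it. It assumes a read-only `data` property plus `__getattr__` delegation, so that unknown attributes fall through to the wrapped frame. The delegation is my own addition to soften the `CTMatrix.data.sort()` problem, and `countries()` is again a hypothetical method name.

```python
import pandas as pd


class CTMatrix:
    """Country x Time container that wraps a DataFrame (Option 2)."""

    def __init__(self, data):
        # Keep a private copy so outside code cannot mutate it by accident.
        self._data = pd.DataFrame(data).copy()

    @property
    def data(self):
        # Controlled, read-only access to the underlying frame.
        return self._data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        # Refuse "_data" itself to avoid infinite recursion if it is not set yet.
        if name == "_data":
            raise AttributeError(name)
        return getattr(self._data, name)

    def countries(self):
        # Hypothetical domain method: assumes countries are stored in the index.
        return list(self._data.index)


frame = pd.DataFrame({2000: [1.0, 2.0], 2001: [1.5, 2.5]}, index=["AUS", "USA"])
ctm = CTMatrix(frame)
shape = ctm.shape            # delegated to the wrapped DataFrame
result = ctm.sort_index()    # delegated, but returns a plain DataFrame
```

The catch is visible in the last line: delegated methods return a plain DataFrame rather than a CTMatrix, so any result I want to keep as a CTMatrix has to be wrapped again explicitly.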
Are there any additional downsides to taking the approach in Option #1?