python - Why are attributes lost after copying a Pandas DataFrame

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Why is it not possible to pass attributes of an instance through a copy? I want to pass the name attribute to another dataframe.

import copy

import pandas as pd

df = pd.DataFrame([1,2,3])
df.name = 'sheet1'
df2 = copy.deepcopy(df)

print(f'df.name: {df.name}')
>> df.name: sheet1

print(f'df2.name: {df2.name}')
>>    AttributeError    
        ...      
      'DataFrame' object has no attribute 'name'

Similarly, why does this also not work when I create a class that inherits from DataFrame?

class DfWithName(pd.DataFrame):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        print('lol')

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value

and using the same code:

import copy
df = DfWithName([1,2,3])
df.name = 'sheet1'
df2 = copy.deepcopy(df) 
print(f'df2.name: {df2.name}')
>>    AttributeError    
        ...      
      'DataFrame' object has no attribute 'name'

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:39:35+0000

As noted elsewhere, the DataFrame class has a custom __deepcopy__ method, so arbitrary attributes assigned to an instance are not necessarily copied the way they would be for a normal object.

Interestingly, there is an internal _metadata attribute that seems intended to list additional attributes of an NDFrame that should be kept when copying or serializing it. There is some discussion of this here: https://github.com/pandas-dev/pandas/issues/9317

Unfortunately this is still considered an undocumented internal detail, so it probably shouldn't be used. From looking at the code you can in principle do:

mydf = pd.DataFrame(...)
mydf.name = 'foo'
mydf._metadata += ['name']

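and when you copy it, the copy should take the name with it. Note that _metadata is a list defined on the class, and += extends that list in place, so this changes it for every DataFrame rather than only for mydf.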

You could subclass DataFrame to make this the default:

import functools

import pandas as pd

class NamedDataFrame(pd.DataFrame):
    _metadata = pd.DataFrame._metadata + ['name']

    def __init__(self, name, *args, **kwargs):
        self.name = name
        super().__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return functools.partial(self.__class__, self.name)

You could also do this without relying on the internal _metadata attribute by providing your own wrapper around the existing copy method, and possibly also around __getstate__ and __setstate__.
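A minimal sketch of that wrapper idea, assuming it is enough to override copy because DataFrame's own __deepcopy__ (and __copy__) delegate to it in the pandas versions I have looked at. The class name CopyAwareFrame and the _extra_attrs tuple are hypothetical names introduced for illustration, not part of pandas. Pickling is not covered here and would still need the __getstate__/__setstate__ handling mentioned above.

```python
import copy

import pandas as pd


class CopyAwareFrame(pd.DataFrame):
    """Hypothetical subclass that re-attaches chosen attributes after copying."""

    # Hypothetical helper: names of plain instance attributes to carry over.
    _extra_attrs = ("name",)

    def copy(self, deep=True):
        # Let pandas copy the data, then wrap the result in this class again.
        result = type(self)(super().copy(deep=deep))
        for attr in self._extra_attrs:
            try:
                value = object.__getattribute__(self, attr)
            except AttributeError:
                continue  # attribute was never set on the source frame
            # Bypass DataFrame.__setattr__ so no column lookup is involved.
            object.__setattr__(result, attr, value)
        return result


df = CopyAwareFrame([1, 2, 3])
df.name = "sheet1"

df2 = copy.deepcopy(df)  # assumed to route through copy()
df3 = df.copy()
print(df2.name, df3.name)
```

The attribute value itself is passed along by reference, which is fine for a string but may not be what you want for a mutable object. Results of other pandas operations (slicing, arithmetic, and so on) are not covered by this sketch and would come back as plain DataFrames.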

Update: It seems that using the _metadata attribute to extend Pandas classes is now actually documented, so the above example should more or less work. Those docs are aimed more at development of Pandas itself, so the mechanism might still be a bit volatile, but this is how Pandas itself extends subclasses of NDFrame.
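As a rough sketch of that documented pattern (assuming the approach of listing extra attributes in _metadata and returning the subclass from _constructor), something like the following should carry name across copies. The class name MetaNamedFrame is hypothetical, and the behaviour may differ between pandas versions.

```python
import copy

import pandas as pd


class MetaNamedFrame(pd.DataFrame):
    """Hypothetical subclass; `name` is declared as metadata to be propagated."""

    _metadata = ["name"]

    @property
    def _constructor(self):
        # Results of pandas operations are built with this class, not DataFrame.
        return MetaNamedFrame


df = MetaNamedFrame([1, 2, 3])
df.name = "sheet1"

df2 = copy.deepcopy(df)
print(type(df2).__name__, df2.name)
```

Unlike the functools.partial version above, name is not a constructor argument here, so a freshly built frame has no name until one is assigned.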
