Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How can I recover a corrupted, partially pickled file?

My program was killed while serializing data (a dict) to disk with dill. I cannot open the partially-written file now.

Is it possible to partially or fully recover the data? If so, how?

Here's what I've tried:

>>> dill.load(open(filename, 'rb'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lib/python3.4/site-packages/dill/dill.py", line 288, in load
    obj = pik.load()
EOFError: Ran out of input
>>> 

The file is not empty:

>>> os.stat(filename).st_size
31110059

Note: all data in the dictionary consists of Python built-in types.


1 Reply


The pure-Python version of pickle.Unpickler keeps a stack around even if it encounters an error, so you can probably get at least something out of it:

import io
import pickle

# Use the pure-Python unpickler; the C version does not expose its internal state
pickle.Unpickler = pickle._Unpickler

import dill

if __name__ == '__main__':
    obj = [1, 2, {3: 4, "5": ('6',)}]
    data = dill.dumps(obj)

    handle = io.BytesIO(data[:-5])  # cut it off

    unpickler = dill.Unpickler(handle)

    try:
        unpickler.load()
    except EOFError:
        pass

    print(unpickler.stack)

I get the following output:

[3, 4, '5', ('6',)]
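Note that this list only covers what was pushed since the innermost container was opened. Below is a minimal sketch of how the enclosing items might be reached as well. It assumes a CPython version whose pure-Python unpickler parks the outer stacks on a `metastack` attribute whenever it reads a MARK opcode; that is an undocumented implementation detail and may differ or be absent. It uses plain `pickle` with protocol 3 instead of dill so that it is self-contained, and `salvage` is a hypothetical helper name, not part of either library.

```python
import io
import pickle


def salvage(data):
    """Return (stack, metastack) left behind after unpickling truncated data."""
    # pickle._Unpickler is the pure-Python implementation (a private name).
    unpickler = pickle._Unpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except (EOFError, pickle.UnpicklingError):
        # Truncation usually shows up as one of these two. A cut in the
        # middle of an opcode argument may raise other errors, which this
        # sketch does not handle.
        pass
    # `stack` holds the items pushed since the innermost open MARK.
    # `metastack` (assumed attribute) holds the stacks of enclosing containers.
    return unpickler.stack, getattr(unpickler, "metastack", [])


if __name__ == "__main__":
    obj = [1, 2, {3: 4, "5": ("6",)}]
    data = pickle.dumps(obj, protocol=3)

    stack, metastack = salvage(data[:-5])  # cut it off
    print("innermost items:", stack)
    print("enclosing stacks:", metastack)
```

Reading `metastack` from the outside in gives the partially built containers in nesting order, which is usually enough to rebuild the outer dict or list by hand.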

The pickle data format isn't that complicated. Read through the source code of Python's pickle module and you can probably find a way to hook all of the load_ methods so that they give you more information about where the stream broke off.
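One possible way to do that hooking is sketched below. It assumes that the pure-Python unpickler looks up its opcode handlers in a `dispatch` table read from `self` at the start of `load()`, which is how the CPython source is laid out but is not a documented interface. `TracingUnpickler` and `trace_truncated` are hypothetical names made up for this illustration, and plain `pickle` with protocol 3 stands in for dill to keep the example self-contained.

```python
import io
import pickle


class TracingUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that records the name of every handler it runs."""

    def __init__(self, file, **kwargs):
        super().__init__(file, **kwargs)
        self.trace = []
        # Shadow the class-level opcode table with wrapped handlers.
        # `dispatch` is an implementation detail of the pickle module.
        self.dispatch = {
            opcode: self._traced(handler)
            for opcode, handler in pickle._Unpickler.dispatch.items()
        }

    def _traced(self, handler):
        def wrapper(unpickler):
            # Record the handler name before running it, so a handler that
            # fails halfway through still appears in the trace.
            self.trace.append(handler.__name__)
            return handler(unpickler)
        return wrapper


def trace_truncated(data):
    """Unpickle as far as possible; return (handler names, leftover stack)."""
    unpickler = TracingUnpickler(io.BytesIO(data))
    try:
        unpickler.load()
    except (EOFError, pickle.UnpicklingError):
        # Other exception types are possible if the cut falls inside an
        # opcode argument; they are not handled in this sketch.
        pass
    return unpickler.trace, unpickler.stack


if __name__ == "__main__":
    data = pickle.dumps([1, 2, {3: 4, "5": ("6",)}], protocol=3)
    trace, stack = trace_truncated(data[:-5])
    print(trace)
    print(stack)
```

The last entry in the trace names the final opcode that was handled before the data ran out, which helps when deciding how much of the leftover stack can be trusted.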

