Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
195 views
in Technique[技术] by (71.8m points)

Unpickling a python 2 object with python 3

I'm wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I've been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:fixers - 3.4addressfixer - 3.4runklibaddressaddress_generic.py"
, line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal
not in range(128)

Looking at the pickled object in contention, it's a dict in a dict, containing keys and values of type str.

So my question is: Is there a way to load an object, originally pickled in python 2.4, with python 3.4?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You'll have to tell pickle.load() how to convert Python bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to import the data directly:

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1') 

but you'll need to verify that none of your strings are decoded using the wrong codec; Latin-1 works for any input as it maps the byte values 0-255 to the first 256 Unicode codepoints directly.

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.

Note that up to Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...