Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python pickle: fix \r (carriage return) characters before loading

I got a pickled object (a list with a few numpy arrays in it) that was created on Windows and apparently saved to a file opened in text mode, not in binary mode (i.e. with open(filename, 'w') instead of open(filename, 'wb')). The result is that now I can't unpickle it (not even on Windows), because it's infected with \r characters (and possibly more?). The main complaint is

ImportError: No module named multiarray

supposedly because it's looking for numpy.core.multiarray\r, which of course doesn't exist. Simply removing the \r characters didn't do the trick (I tried both sed -e 's/\r//g' and, in Python, s = file.read().replace('\r', '')); both break the file and yield a cPickle.UnpicklingError later on.
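
Why the blanket removal fails can be sketched in Python 3, assuming the only damage is the text-mode \n to \r\n translation (simulated here with bytes.replace; simulate_text_mode_write is a hypothetical helper, not the real C runtime). A binary pickle can contain genuine 0x0d bytes inside strings, floats, or length fields, so deleting every \r also deletes real data, while removing only the \r in front of each \n undoes the translation exactly.

```python
import pickle


def simulate_text_mode_write(data: bytes) -> bytes:
    """Hypothetical stand-in for a Windows text-mode write: LF -> CRLF."""
    return data.replace(b'\n', b'\r\n')


# A payload that legitimately contains CR and LF bytes.
original = [b'\r', b'\n', 1.5, 'text\r\nline']
good = pickle.dumps(original, protocol=2)
damaged = simulate_text_mode_write(good)

# Naive fix from the question: strip every CR, including genuine ones.
naive = damaged.replace(b'\r', b'')
# Targeted fix: drop only the CR that was inserted in front of each LF.
targeted = damaged.replace(b'\r\n', b'\n')

print(naive == good)                       # False
print(targeted == good)                    # True
print(pickle.loads(targeted) == original)  # True
```

The targeted fix is exact because every LF in the damaged data is preceded by an inserted CR; a CR that was originally in front of an LF ends up as CR CR LF and is restored to CR LF.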

The problem is that I really need to get the data out of the objects. Any ideas how to fix the files?

Edit: On request, the first few hundred bytes of my file, as escaped bytes:

x80x02]qx01(}qx02(U
total_timeqx03G?x90x15rxc9(sx00U
reaction_timeqx04NUx0ejump_directionqx05cnumpy.core.multiarray
scalar
qx06cnumpy
dtype
qx07Ux02f8Kx00Kx01x87Rqx08(Kx03Ux01<NNNJxffxffxffxffJxffxffxffxffKx00tbUx08x025x9dx13xfc#xc8?x86RqUx14normalised_directionq
hx06hx08Ux08xf0xf9,x0eAx18xf8?x86Rqx0bU
jump_distanceqx0chx06hx08Ux08x13x14xea&xb0x9bx1a@x86Rq
Ux04jumpqx0ecnumpy.core.multiarray
_reconstruct
qx0fcnumpy
ndarray
qx10Kx00x85Ux01bx87Rqx11(Kx01Kx02x85hx08x89Ux10x87x16xdaEGxf4xf3?x06`OCxe7"x1a@tbUx0emovement_speedqx12hx06hx08Ux08\pxf5[2xc2xef?x86Rqx13Ux0ctrial_lengthqx14G@x98x87xf8x1axb4xbaUconditionqx15Ux0bhigh_mentalqx16Ux07subjectqx17Kx02Ux12movement_directionqx18hx06hx08Ux08xdex06xcfx1c50xfd?x86Rqx19Ux08positionqx1ahx0fhx10Kx00x85Ux01bx87Rqx1b(Kx01Kx02x85hx08x89Ux10Kxb7xb4x07q=x1exc0xf2xc2YIxb7U&xc0tbUx04typeqx1chx0eUx08movementqx1dhx0fhx10Kx00x85Ux01bx87Rqx1e(Kx01Kx02x85hx08x89Ux10xad8x9c9x10xb5xeexbfxffaxa2hWRxcf?tbu}qx1f(hx03G@xbaxbcxb8xadxc8x14hx04G?xd9x99%]xadVx00hx05hx06hx08Ux08xe3Xxa9=xc1xb1xeb?x86Rq h
hx06hx08Ux08x88xf7xb9xc1xd6xff?x86Rq!hx0chx06hx08Ux08vx7fxebx11xea5
@x86Rq"hx0ehx0fhx10Kx00x85Ux01bx87Rq#(Kx01Kx02x85hx08x89Ux10xcdxd9x92x9ax94=x06@]Cxafxefxebxefx02@tbhx12hx06hx08Ux08-x9c&x185xfdxef?x86Rq$hx14G@
xb8Wxb2`Vxachx15hx16hx17Kx02hx18hx06hx08Ux08x8ex87xd1xc2

You may also download the whole file (22k).

1 Reply

Presuming that the file was created with the default protocol=0 ASCII-compatible method, you should be able to load it anywhere by opening it with open('pickled_file', 'rU'), i.e. in universal-newlines mode.

If this doesn't work, show us the first few hundred bytes: print repr(open('pickled_file', 'rb').read(200)) and paste the results into an edit of your question.
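
In Python 3 the same check can be wrapped in a small helper. This is a sketch; show_head and sniff_protocol are hypothetical names. It relies on the fact that protocol 2 and later pickles begin with the PROTO opcode b'\x80' followed by one byte holding the protocol number, while protocols 0 and 1 have no such header.

```python
import os
import pickle
import tempfile
from typing import Optional


def show_head(path: str, n: int = 200) -> None:
    """Print the first n raw bytes, like the Python 2 one-liner above."""
    with open(path, 'rb') as f:  # binary mode: bytes are shown unaltered
        print(repr(f.read(n)))


def sniff_protocol(path: str) -> Optional[int]:
    """Return the protocol from a PROTO header, or None for protocols 0/1."""
    with open(path, 'rb') as f:
        head = f.read(2)
    if len(head) == 2 and head[0] == 0x80:  # PROTO opcode
        return head[1]
    return None


# Usage: write small pickles with different protocols and sniff them.
demo_dir = tempfile.mkdtemp()
results = {}
for proto in (0, 1, 2):
    demo_path = os.path.join(demo_dir, f'demo_p{proto}.pickle')
    with open(demo_path, 'wb') as f:
        pickle.dump([1, 2, 3], f, protocol=proto)
    results[proto] = sniff_protocol(demo_path)

print(results)  # {0: None, 1: None, 2: 2}
```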

Update after file contents were published:

Your file starts with '\x80\x02', so it was dumped with protocol 2, the highest protocol available in Python 2. Protocols 1 and 2 are binary protocols. Your file was written in text mode on Windows, so the C runtime converted each '\n' byte into '\r\n'. Files should be opened in binary mode like this:

import pickle

obj = {'example': [1, 2, 3]}  # placeholder: any picklable object

with open('result.pickle', 'wb') as f:  # b for binary
    pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

with open('result.pickle', 'rb') as f:  # b for binary
    obj = pickle.load(f)

Docs are here. This code will work portably on both Windows and non-Windows systems.
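
The damage can be reproduced on any platform with a Python 3 sketch, assuming that a text-mode file opened with newline='\r\n' behaves like the Windows C runtime here (it translates every '\n' written into '\r\n'). Latin-1 is used only because it maps each byte to one character and back unchanged. collections.OrderedDict is an arbitrary stand-in for the numpy classes in the question: protocol 2 stores a class reference as a c<module>\n<name>\n line, so after the damage the module name ends in '\r', the same kind of ImportError the question reports.

```python
import collections
import os
import pickle
import tempfile

obj = {'cls': collections.OrderedDict, 'values': [1.0, 2.5]}
good = pickle.dumps(obj, protocol=2)

path = os.path.join(tempfile.mkdtemp(), 'damaged.pickle')
# The mistake being simulated: a text-mode write with CRLF translation.
with open(path, 'w', encoding='latin-1', newline='\r\n') as f:
    f.write(good.decode('latin-1'))

# Reading in binary mode shows the raw bytes that actually hit the disk.
with open(path, 'rb') as f:
    damaged = f.read()

print(damaged.count(b'\r\n') == good.count(b'\n'))  # True

try:
    pickle.loads(damaged)
    load_failed = False
except ImportError as exc:  # the module name is now 'collections\r'
    load_failed = True
    print(type(exc).__name__, repr(str(exc)))
```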

You can recover the original pickle image by reading the file in binary mode and then reversing the damage by replacing every occurrence of '\r\n' with '\n'. Unlike deleting every '\r', this keeps any genuine 0x0d bytes in the binary data intact. Note: this recovery procedure is necessary whether or not you are trying to read the file on Windows.
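
A minimal Python 3 sketch of that recovery, assuming the only damage was the '\n' to '\r\n' translation; the function names are hypothetical. If the file came from Python 2 and contains numpy arrays, loading it under Python 3 usually also needs encoding='latin1', which is why the loader passes an encoding argument through to pickle.loads (its default here is pickle's own default, 'ASCII').

```python
import os
import pickle
import tempfile


def repair_text_mode_pickle(damaged: bytes) -> bytes:
    r"""Undo a text-mode write that turned every b'\n' into b'\r\n'.

    Only a CR immediately followed by LF is removed; lone CR bytes,
    which may be genuine binary data, are left untouched.
    """
    return damaged.replace(b'\r\n', b'\n')


def load_damaged_pickle(path: str, encoding: str = 'ASCII'):
    """Read the file in binary mode, repair it, then unpickle it."""
    with open(path, 'rb') as f:  # b for binary: no newline translation
        data = f.read()
    return pickle.loads(repair_text_mode_pickle(data), encoding=encoding)


def save_repaired_copy(src: str, dst: str) -> None:
    """Write a repaired copy so the fix only has to be done once."""
    with open(src, 'rb') as f:
        data = f.read()
    with open(dst, 'wb') as f:
        f.write(repair_text_mode_pickle(data))


# Usage with a simulated damaged file (LF -> CRLF applied by hand).
sample = {'jump': [1.0, 2.5], 'raw': b'\r\x00\n', 'note': 'a\r\nb'}
damaged_path = os.path.join(tempfile.mkdtemp(), 'damaged.pickle')
with open(damaged_path, 'wb') as f:
    f.write(pickle.dumps(sample, protocol=2).replace(b'\n', b'\r\n'))

restored = load_damaged_pickle(damaged_path)
print(restored == sample)  # True
```

For the file in the question, something like load_damaged_pickle('result.pickle', encoding='latin1') would be the call to try first, with numpy installed.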


...