Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share


python - Hash algorithm for dynamically growing/streaming data?

Are there any algorithms that let you continue hashing from a known hash digest? For example, the client uploads a chunk of a file to ServerA, where I can get an MD5 sum of the uploaded content; then the client uploads the rest of the file to ServerB. Can I transfer the state of the MD5 internals to ServerB and finish the hashing there?

There was a cool black-magic hack based on md5 that I found years ago on comp.lang.python, but it uses ctypes against a specific version of md5.so or _md5.dll, so it isn't portable across Python interpreter versions or to other programming languages. Besides, the md5 module has been deprecated in the Python standard library since 2.5, so I need a more general solution.
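For reference, hashlib (the replacement for the deprecated md5 module) already handles streaming data inside a single process. The minimal sketch below feeds chunks through update() and uses copy() to snapshot the running state; the chunk contents are made up for illustration. The limitation the question runs into is that this state lives only in process memory: hashlib hash objects offer update(), digest(), hexdigest() and copy(), but no documented way to export the state to another server.

```python
import hashlib

# Illustrative chunks standing in for pieces of an uploaded file.
chunks = [b"first chunk of the file, ", b"and the rest of it."]

running = hashlib.md5()
for chunk in chunks:
    # update() keeps absorbing data into the same in-memory state.
    running.update(chunk)

# copy() clones the running state inside this process; both copies can continue.
snapshot = running.copy()
running.update(b" Trailer.")

print(snapshot.hexdigest())
print(running.hexdigest())
```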

What's more, can the hashing state be stored in the hex digest itself? (Then I could continue hashing a stream of data from an existing digest instead of relying on a dirty internal hack.)


1 Reply


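This is possible in principle, but not from the digest alone. MD5's running state consists of four 32-bit chaining words, the number of bytes processed so far, and any leftover bytes that don't yet fill a 64-byte block. The final digest is produced only after padding and the message length have been appended, so it cannot be resumed into the hash of the complete file, and the normal APIs don't expose the internal state either. If a CRC is good enough for you, this will probably be a lot easier, since CRCs are more commonly used for "streaming" cases like yours. See here: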
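If you control the hashing code on both servers, you can carry the real MD5 state yourself. The sketch below is a hypothetical pure-Python MD5 following RFC 1321, not part of hashlib; ResumableMD5 and export_state() are made-up names for illustration. What gets shipped between servers is the four chaining words, the byte count, and the unprocessed tail of the last partial 64-byte block, while hexdigest() pads only a copy of that state. It is slow and meant to illustrate the idea.

```python
import hashlib
import json
import math
import struct

# Per-step left-rotation amounts from RFC 1321: four rounds of 16 steps.
_S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Additive constants: K[i] = floor(abs(sin(i + 1)) * 2**32).
_K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
# Initial chaining values A, B, C, D.
_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
_MASK = 0xFFFFFFFF


def _compress(h, block):
    """Run the MD5 compression function on one 64-byte block."""
    m = struct.unpack("<16I", block)
    a, b, c, d = h
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        x = (a + f + _K[i] + m[g]) & _MASK
        rotated = ((x << _S[i]) | (x >> (32 - _S[i]))) & _MASK
        a, b, c, d = d, (b + rotated) & _MASK, b, c
    return [(v + w) & _MASK for v, w in zip(h, (a, b, c, d))]


class ResumableMD5:
    """Hypothetical MD5 whose running state can be exported and resumed."""

    def __init__(self, state=None):
        if state is None:
            self.h = list(_INIT)  # four 32-bit chaining words
            self.length = 0       # total bytes absorbed so far
            self.buffer = b""     # tail that does not fill a 64-byte block yet
        else:
            self.h = list(state["h"])
            self.length = state["length"]
            self.buffer = bytes.fromhex(state["buffer"])

    def update(self, data):
        self.length += len(data)
        data = self.buffer + bytes(data)
        full = len(data) - len(data) % 64
        for off in range(0, full, 64):
            self.h = _compress(self.h, data[off:off + 64])
        self.buffer = data[full:]

    def export_state(self):
        # JSON-friendly snapshot: this, not the digest, is what must be shipped.
        return {"h": list(self.h), "length": self.length, "buffer": self.buffer.hex()}

    def hexdigest(self):
        # Padding is applied to a copy, so the object can keep absorbing data.
        tail = self.buffer + b"\x80"
        tail += b"\x00" * ((56 - len(tail)) % 64)
        tail += struct.pack("<Q", (8 * self.length) & 0xFFFFFFFFFFFFFFFF)
        h = list(self.h)
        for off in range(0, len(tail), 64):
            h = _compress(h, tail[off:off + 64])
        return struct.pack("<4I", *h).hex()


# ServerA hashes the chunk it received and ships the exported state as JSON.
server_a = ResumableMD5()
server_a.update(b"first chunk of the file, ")
wire = json.dumps(server_a.export_state())

# ServerB rebuilds the state from JSON and finishes the hash.
server_b = ResumableMD5(json.loads(wire))
server_b.update(b"and the rest of it.")
assert server_b.hexdigest() == hashlib.md5(b"first chunk of the file, and the rest of it.").hexdigest()
```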

binascii.crc32(data[, crc])

crc32() accepts an optional crc input which is the checksum to continue from.
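A minimal sketch of that handoff, assuming Python 3, where crc32() always returns an unsigned 32-bit int. Unlike an MD5 digest, the CRC value is the complete running state, so passing that one integer along is enough. The variable names are illustrative. Keep in mind that CRC-32 detects accidental corruption but is not a cryptographic hash.

```python
import binascii

# Illustrative pieces of one file, received by two different servers.
first = b"first chunk of the file, "
rest = b"and the rest of it."

# ServerA computes the CRC of the chunk it received; the result is a plain int.
crc_a = binascii.crc32(first)

# Only that 32-bit integer has to travel to ServerB, which continues from it.
crc_b = binascii.crc32(rest, crc_a)

# Resuming gives the same value as checksumming the whole file at once.
assert crc_b == binascii.crc32(first + rest)
print(f"{crc_b:08x}")
```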

Hope that helps.



...