Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


algorithm - How to generate a checksum & convert it to 64-bit in JavaScript for very large files without overflowing RAM?

Question:

  1. How do I correctly generate a checksum that is unique and consistent across browsers? I would also like to convert a SHA256/MD5 checksum string to a 64-bit value.

  2. How do I read a file to generate the checksum without needing a huge amount of RAM? i.e. how do we handle a 1 GB file without exhausting memory?

For example, see the answer to "Is it possible to read a file without loading it into memory?"

This project seems promising, but I couldn't get it to work either.


My intention is to generate the checksum progressively/incrementally in chunks of X MB, to avoid holding too much of the file in RAM at a time.
The following code does not work as expected:

let SIZE_CHECKSUM = 10 * Math.pow(1024, 2); // 10 MB; But can be 1 MB too
async function GetChecksum (file: File):
Promise<string>
{
  let hashAlgorithm: CryptoJS.lib.IHasher<Object> = CryptoJS.algo.SHA256.create();
  let totalChunks: number = Math.ceil(file.size / SIZE_CHECKSUM);
  for (let chunkCount = 0, start = 0, end = 0; chunkCount < totalChunks; ++chunkCount)
  {
    end = Math.min(start + SIZE_CHECKSUM, file.size);
    let resultChunk: string = await (new Response(file.slice(start, end)).text());
    hashAlgorithm.update(resultChunk);
    start = chunkCount * SIZE_CHECKSUM;
  }
  let long: bigInt.BigInteger = bigInt.fromArray(hashAlgorithm.finalize().words, 16, false);
  if(long.compareTo(bigInt.zero) < 0)
    long = long.add(bigInt.one.shiftLeft(64));
  return long.toString();
}

It shows different results in different browsers.


Answer:

There is a logic error in the code on the following line:

start = chunkCount * SIZE_CHECKSUM;  // <--- bug

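The variable start is initialized to 0 and then set to 0 again at the end of the 1st iteration (because chunkCount is still 0 at that point). As a result, the 2nd iteration re-reads the first chunk and the last chunk is never hashed. In addition, reading each chunk with text() decodes the bytes as UTF-8, which is lossy for binary data; the version further below reads the raw bytes with arrayBuffer() instead.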
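
To make the effect of that line visible, here is a small sketch that only lists the byte ranges each loop variant would read, without touching any file. The helper name chunkRanges and its buggy flag are hypothetical and exist only for this illustration.

```typescript
// Hypothetical helper: returns the [start, end) byte ranges a chunk loop would read.
// buggy = true mimics the question's ordering (start is updated after the read).
function chunkRanges (fileSize: number, chunkSize: number, buggy: boolean): Array<[number, number]>
{
  const ranges: Array<[number, number]> = [];
  const totalChunks = Math.ceil(fileSize / chunkSize);
  for (let chunkCount = 0, start = 0; chunkCount < totalChunks; ++chunkCount)
  {
    if (!buggy)
      start = chunkCount * chunkSize;   // fixed: compute start before slicing
    const end = Math.min(start + chunkSize, fileSize);
    ranges.push([start, end]);
    if (buggy)
      start = chunkCount * chunkSize;   // question's order: start lags one chunk behind
  }
  return ranges;
}

// A 25-byte "file" read in 10-byte chunks:
console.log(JSON.stringify(chunkRanges(25, 10, true)));   // [[0,10],[0,10],[10,20]]
console.log(JSON.stringify(chunkRanges(25, 10, false)));  // [[0,10],[10,20],[20,25]]
```

With the question's ordering, the first range appears twice and the final 5 bytes are never read, so the digest does not correspond to the file contents.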
The following shows how to get a 32-byte SHA-256 checksum with the library mentioned in the question: "emn178/js-sha256".

That library doesn't provide a TypeScript interface, but we can trivially declare one as follows:

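// Sha256.d.ts  (also name the corresponding JS file "Sha256.js")
declare interface Sha256 {
  update (data: ArrayBuffer): Sha256;  // feed the next chunk; returns the hasher for chaining
  hex (): string;                      // final digest as a hex string
}

// Global object exposed by the library script.
declare var sha256: {
  create (): Sha256;                   // new incremental hasher
};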

Then use it as follows:

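import "./external/Sha256"

// SIZE_CHECKSUM is the chunk size in bytes defined in the question (e.g. 10 MB).
async function GetChecksum (file: File):
Promise<string>
{
  const algorithm = sha256.create();
  for(let chunkCount = 0, totalChunks = Math.ceil(file.size / SIZE_CHECKSUM);
      chunkCount < totalChunks;
      ++chunkCount)
  {
    // Compute the byte range first, so that every chunk is read exactly once.
    const start = chunkCount * SIZE_CHECKSUM, end = Math.min(start + SIZE_CHECKSUM, file.size);
    // Read the raw bytes (not text) and feed them to the incremental hash.
    algorithm.update(await (new Response(file.slice(start, end)).arrayBuffer()));
  }
  return algorithm.hex();
}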

The above code generates the same checksum in all my browsers for any chunk size.
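
The question also asks how to reduce the hex digest to a 64-bit value, which the code above does not cover. One possible approach, offered only as a sketch: keep the first 8 bytes (16 hex digits) of the digest and print them as an unsigned decimal string. The helper name hexToUint64Decimal is hypothetical and not part of js-sha256, and the choice of the first 8 bytes is an assumption. Keep in mind that a 64-bit value collides far more easily than the full 256-bit digest.

```typescript
// Hypothetical helper: interprets the first 16 hex digits of a digest as an
// unsigned 64-bit integer and returns it as a decimal string.
// Plain digit arithmetic is used because 64-bit values exceed Number's safe range.
function hexToUint64Decimal (hexDigest: string): string
{
  const head = hexDigest.slice(0, 16);
  if (!/^[0-9a-fA-F]{16}$/.test(head))
    throw new Error("expected at least 16 hexadecimal digits");

  const digits: number[] = [0];  // decimal digits, least significant first
  for (let i = 0; i < head.length; ++i)
  {
    // Multiply the running value by 16 and add the next hex digit.
    let carry = parseInt(head[i], 16);
    for (let j = 0; j < digits.length; ++j)
    {
      const value = digits[j] * 16 + carry;
      digits[j] = value % 10;
      carry = Math.floor(value / 10);
    }
    while (carry > 0)
    {
      digits.push(carry % 10);
      carry = Math.floor(carry / 10);
    }
  }
  return digits.reverse().join("");
}
```

It could be applied to the result of GetChecksum, e.g. hexToUint64Decimal(await GetChecksum(file)).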

