I'm handling arrays with an extremely large dimensionality (over 80,000). I want to convert the whole matrix into a dask.array, call .nonzero(), and then collect the result. The code looks something like this:
import dask.array as da

# z is my zarr array on disk, shape roughly (8 million, 80k)
layer_arr = da.from_zarr(z, chunks="auto")

# collect the nonzero indices (lazily at this point)
rows, cols = layer_arr.nonzero()

# computing the chunk sizes first also blows up my memory
# cols.compute_chunk_sizes()

# save the column indices
da.to_npy_stack("col", cols)
The problem is that I can't save the result because my memory blows up. I tried some re-chunking to avoid this, but even .compute_chunk_sizes() exhausts my memory. What can I do to avoid this?
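
In case it helps to reproduce, here is a minimal self-contained version of what I'm running, using a small synthetic zarr array (the shape, chunking, and values are placeholders I made up; the real array is roughly 8 million x 80k and is read from a zarr store on disk):

import numpy as np
import zarr
import dask.array as da

# small in-memory stand-in for the real on-disk zarr array
data = np.zeros((100_000, 100), dtype="f4")
data[::50, ::10] = 1.0                      # sprinkle some nonzero entries
z = zarr.array(data, chunks=(10_000, 100))

layer_arr = da.from_zarr(z, chunks="auto")
rows, cols = layer_arr.nonzero()   # chunk sizes are unknown (NaN) at this point

# works fine on this toy array; on the real data this step (or the save below)
# is where memory gets exhausted
cols.compute_chunk_sizes()
da.to_npy_stack("col", cols)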
question from:
https://stackoverflow.com/questions/65932899/unable-to-compute-nonzero-because-the-array-is-too-huge