You can build a sparse matrix in memory row by row pretty easily:
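import numpy as np
import scipy.sparse as sps

input_file_name = "something.csv"
sep = ","  # field delimiter; an empty sep makes np.fromstring treat the input as binary data

def _process_data(row_array):
    # Placeholder for any per-row processing
    return row_array

sp_data = []
with open(input_file_name) as csv_file:
    for row in csv_file:
        data = np.fromstring(row, sep=sep)  # parse one text row into a 1-D array
        data = _process_data(data)
        data = sps.coo_matrix(data)  # 1 x n sparse row
        sp_data.append(data)

# Stack the sparse rows into a single sparse matrix
sp_data = sps.vstack(sp_data)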
The resulting sparse matrix is easier to write to HDF5, which is a much better format than a text file for storing numbers at this scale.
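As a rough sketch of that last step, here is one way the stacked matrix could be written to HDF5 and read back. It assumes the h5py package is installed, which the answer above does not mention. The function names, the group name "matrix", and the dataset names are hypothetical choices for illustration. The idea is to store the three arrays of the CSR representation plus the shape, rather than a dense array:

```python
import h5py
import numpy as np
import scipy.sparse as sps


def save_sparse_hdf5(path, matrix, group_name="matrix"):
    """Store a sparse matrix as its CSR arrays plus its shape (hypothetical layout)."""
    csr = sps.csr_matrix(matrix)  # convert from COO (or anything else) to CSR
    with h5py.File(path, "w") as h5_file:
        group = h5_file.create_group(group_name)
        group.create_dataset("data", data=csr.data)        # non-zero values
        group.create_dataset("indices", data=csr.indices)  # column index of each value
        group.create_dataset("indptr", data=csr.indptr)    # row boundaries into data/indices
        group.attrs["shape"] = csr.shape


def load_sparse_hdf5(path, group_name="matrix"):
    """Rebuild the CSR matrix written by save_sparse_hdf5."""
    with h5py.File(path, "r") as h5_file:
        group = h5_file[group_name]
        shape = tuple(int(n) for n in group.attrs["shape"])
        return sps.csr_matrix(
            (group["data"][:], group["indices"][:], group["indptr"][:]),
            shape=shape,
        )
```

With the code above, something like save_sparse_hdf5("something.h5", sp_data) would write the result, where "something.h5" is just a placeholder file name. Only the non-zero entries are stored, so the file stays small when the data is mostly zeros.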