I have CSV files with 28 columns and >20,000 rows.
I have 240 such files, which amount to ~3 GB in total.
I need to read all these CSV files and append them into a single DataFrame.
When I was working with fewer files, the code ran successfully. Now that I am working with all the files, I get this error -
time: 130.8604452610016 sec
MemoryError: Unable to allocate 4.03 GiB for an array with shape (19321328, 28) and data type object
Also, the system hangs and I have to restart my computer :((
This is what I have done till now -
import glob
import os
import time

import numpy as np
import pandas as pd

path = r'C:\Users\Sakshi Sharma\.spyder-py3\filter'
#path = r'H:\T & F Safe Data\filter Data\utm log'  # this is when I am trying to read data from an external hard drive
allFiles = glob.glob(os.path.join(path, "*.csv"))
np_array_list = []

start = time.time()
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=None, low_memory=False)
    np_array_list.append(df.to_numpy())   # each file becomes an object-dtype array
end = time.time()
print("time: ", (end - start), "sec")

comb_np_array = np.vstack(np_array_list)  # stacks everything into one more full copy
big_frame = pd.DataFrame(comb_np_array)
big_frame.to_csv(r'C:\Users\SAKSHI SHARMA\.spyder-py3\Test.csv', index=False, header=False)
print(big_frame)
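
A lower-memory variant I am considering (only a sketch, not yet verified on all 240 files): keep each file as a DataFrame and let pd.concat do the stacking, so the data never takes a detour through a single object-dtype NumPy array -

import glob
import os
import time

import pandas as pd

path = r'C:\Users\Sakshi Sharma\.spyder-py3\filter'
allFiles = glob.glob(os.path.join(path, "*.csv"))

start = time.time()
# Concatenate the DataFrames directly; ignore_index renumbers the rows,
# and the per-column dtypes inferred by read_csv are preserved.
big_frame = pd.concat((pd.read_csv(f, header=None) for f in allFiles),
                      ignore_index=True)
end = time.time()
print("time: ", (end - start), "sec")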
The output I get when running a smaller number of CSV files is as follows -
time: 1.5234270095825195 sec
0 1 2 ... 25 26 27
0 NTP 1593577703 accept ... 123 192.168.251.1 784
1 NTP 1593577704 accept ... 123 192.168.251.1 56370
2 NTP 1593577704 accept ... 123 192.168.251.1 7081
3 NTP 1593577704 accept ... 123 192.168.251.1 46782
4 NTP 1593577704 accept ... 123 192.168.251.1 38699
... ... ... ... ... ... ...
251154 NTP 1593602413 accept ... 123 192.168.251.1 64161
251155 NTP 1593602413 accept ... 123 192.168.251.1 30659
251156 NTP 1593602413 accept ... 123 192.168.251.1 49763
251157 NTP 1593602413 accept ... 123 192.168.251.1 56146
251158 NTP 1593602414 accept ... 123 192.168.251.1 796
[251159 rows x 28 columns]
Can someone please tell me what I should do to read such a large set of CSVs? Also, does giving the path to an external hard drive where I have stored the CSV files create any problem?
path = r'C:\Users\Sakshi Sharma\.spyder-py3\utm log'
or
path = r'H:\SAKSHI SHARMA\utm log'
I tried this with fewer files and the time taken was almost the same in both cases. But does it matter when the file size is huge?
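
In case it matters for an answer: if the combined frame does not actually have to sit in memory, would appending each file straight to the output CSV as it is read work? (Again only a sketch; I am assuming the mode='a' append behaviour of to_csv from the pandas docs.) Peak memory would then stay at roughly one file -

import glob
import os

import pandas as pd

src = r'C:\Users\Sakshi Sharma\.spyder-py3\filter'    # or the H:\ path above
out = r'C:\Users\SAKSHI SHARMA\.spyder-py3\Test.csv'

first = True
for file_ in glob.glob(os.path.join(src, "*.csv")):
    df = pd.read_csv(file_, index_col=None, header=None)
    # Overwrite on the first file, then append the remaining ones.
    df.to_csv(out, mode='w' if first else 'a', index=False, header=False)
    first = False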