Every time you call np.vstack, NumPy has to allocate space for a brand new array.
So if we say 1 row requires 1 unit of memory, np.vstack([container1, container2])
requires an additional 900+5000 = 5900 units of memory. Moreover, before the
assignment occurs, Python needs to hold space for the old mergedContainer
(if it exists) as well as space for the new mergedContainer. So building
mergedContainer iteratively with slices actually requires more peak memory than
building it with a single call to np.vstack.
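A quick way to see that np.vstack really does allocate a fresh array is to check whether the result shares memory with its inputs. This is a minimal sketch; the arrays a and b are stand-ins with 4 columns instead of 4000 so that it runs instantly.

```python
import numpy as np

# Small stand-ins for container1 and container2 (4 columns, not 4000).
a = np.zeros((900, 4))
b = np.ones((5000, 4))

merged = np.vstack((a, b))

# The result has its own buffer: it overlaps neither input.
print(np.shares_memory(merged, a))            # False
print(np.shares_memory(merged, b))            # False

# Its size is the sum of the inputs, all of it newly allocated.
print(merged.shape)                           # (5900, 4)
print(merged.nbytes == a.nbytes + b.nbytes)   # True
```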
Building it iteratively:
| total | mergedContainer | container1 | container2 | temp | statement                                                             |
|-------+-----------------+------------+------------+------+-----------------------------------------------------------------------|
|  7800 |            1900 |        900 |       5000 |    0 | mergedContainer = np.vstack((container1, container2[:1000]))          |
| 11200 |            3400 |        900 |       5000 | 1900 | mergedContainer = np.vstack((mergedContainer, container2[1000:2500])) |
| 13200 |            3900 |        900 |       5000 | 3400 | mergedContainer = np.vstack((mergedContainer, container2[2500:3000])) |
Building it from a single call to np.vstack:
| total | mergedContainer | container1 | container2 | temp | statement                                             |
|-------+-----------------+------------+------------+------+-------------------------------------------------------|
| 11800 |            5900 |        900 |       5000 |    0 | mergedContainer = np.vstack((container1, container2)) |
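The totals in the two tables can be reproduced with a little bookkeeping. This is only an accounting sketch in row units (it allocates no arrays), and the helper names iterative_totals and single_call_total are made up for illustration.

```python
# Row counts from the text: 1 row = 1 unit of memory.
CONTAINER1 = 900
CONTAINER2 = 5000


def iterative_totals(chunk_sizes):
    """Units held during each np.vstack step when container2 is merged in chunks."""
    totals = []
    old_merged = 0      # "temp": the previous mergedContainer, none before step 1
    rows = CONTAINER1   # rows accumulated so far (starts as container1 itself)
    for size in chunk_sizes:
        new_merged = rows + size
        # Old and new merged arrays coexist until the assignment completes.
        totals.append(new_merged + CONTAINER1 + CONTAINER2 + old_merged)
        old_merged = rows = new_merged
    return totals


def single_call_total():
    """Units held during one np.vstack((container1, container2)) call."""
    return (CONTAINER1 + CONTAINER2) + CONTAINER1 + CONTAINER2


print(iterative_totals([1000, 1500, 500]))   # [7800, 11200, 13200]
print(single_call_total())                   # 11800
```

Even after merging only 3000 of the 5000 rows, the iterative approach already peaks above the single call.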
We can do even better, however. Instead of calling np.vstack repeatedly,
allocate all the space that is needed once from the very beginning and write
the contents of both container1 and container2 into it. In other words, avoid
allocating two separate arrays container1 and container2 if you know you will
eventually want to merge them.
container = np.empty((5900, 4000))
Note that basic slices such as container[:900] always return views, and views
require essentially no additional memory. So you could define container1 and
container2 like this:
container1 = container[:900]
container2 = container[900:]
and then assign values in place. This modifies container:
container1[:] = ...
container2[:] = ...
Thus your memory requirement would stay around 5900 units.
For example,
import numpy as np
np.random.seed(2015)
container = np.empty((5, 4), dtype='int')
container1 = container[:2]
container2 = container[2:]
container1[:] = np.random.randint(10, size=(2,4))
container2[:] = np.random.randint(1000, size=(3,4))
print(container)
yields
[[  2   2   9   6]
 [  8   5   7   8]
 [112  70 487 124]
 [859   8 275 936]
 [317 134 393 909]]
while only requiring space for one array of shape (5, 4), plus temporarily used space for the random arrays.
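If you want to confirm that container1 and container2 really are views onto container rather than copies, the check below is one way to do it. It is a small sketch that uses 4 columns instead of 4000 to keep it light.

```python
import numpy as np

container = np.empty((5900, 4))   # 4 columns instead of 4000 for the demo
container1 = container[:900]
container2 = container[900:]

# A basic slice records the array it was taken from as its base.
print(container1.base is container)               # True
print(container2.base is container)               # True

# Both slices overlap the parent buffer, so no new data was allocated.
print(np.shares_memory(container1, container))    # True
print(np.shares_memory(container2, container))    # True

# Together they cover exactly the rows of the parent.
print(container1.shape[0] + container2.shape[0])  # 5900
```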
Thus, you wouldn't have to change very much in your code to save memory. Just set it up with
container = np.empty((5900, 4000))
container1 = container[:900]
container2 = container[900:]
and then use
container1[:] = ...
instead of
container1 = ...
to assign values in-place. (Or, of course, you could just write directly into container.)
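The difference between the two assignment forms is easy to miss, so here is a small sketch of it. The values 7 and 9 are arbitrary placeholders.

```python
import numpy as np

container = np.zeros((5, 4), dtype=int)
container1 = container[:2]

# Slice assignment writes through the view into container.
container1[:] = 7
print(container[:2].tolist())    # [[7, 7, 7, 7], [7, 7, 7, 7]]

# Plain assignment rebinds the name to a brand new array;
# container is left untouched and extra memory is allocated.
container1 = np.full((2, 4), 9)
print(np.shares_memory(container1, container))   # False
print(container[:2].tolist())    # [[7, 7, 7, 7], [7, 7, 7, 7]]
```

After the plain assignment, container1 no longer has any connection to container, which is exactly the situation the preallocated layout is meant to avoid.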