What you are doing is halftoning your image.
The methods proposed by others work great, but they repeat a lot of expensive computations over and over again. Since a uint16 can take at most 65,536 different values, using a look-up table (LUT) can streamline things a lot. And since the LUT is small, you don't have to worry that much about doing things in place or about avoiding boolean arrays. The following code reuses Bi Rico's function to create the LUT:
import numpy as np
import timeit
rows, cols = 768, 1024
image = np.random.randint(100, 14000,
                          size=(1, rows, cols)).astype(np.uint16)
display_min = 1000
display_max = 10000
def display(image, display_min, display_max):  # copied from Bi Rico
    # Here I set copy=True in order to ensure the original image is not
    # modified. If you don't mind modifying the original image, you can
    # set copy=False or skip this step.
    image = np.array(image, copy=True)
    image.clip(display_min, display_max, out=image)
    image -= display_min
    np.floor_divide(image, (display_max - display_min + 1) / 256,
                    out=image, casting='unsafe')
    return image.astype(np.uint8)
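def lut_display(image, display_min, display_max):
    # Run the expensive conversion once for every possible uint16 value...
    lut = np.arange(2**16, dtype='uint16')
    lut = display(lut, display_min, display_max)
    # ...then convert the image with a single table look-up per pixel.
    return np.take(lut, image)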
>>> np.all(display(image, display_min, display_max) ==
...        lut_display(image, display_min, display_max))
True
>>> timeit.timeit('display(image, display_min, display_max)',
...               'from __main__ import display, image, display_min, display_max',
...               number=10)
0.304813282062
>>> timeit.timeit('lut_display(image, display_min, display_max)',
...               'from __main__ import lut_display, image, display_min, display_max',
...               number=10)
0.0591987428298
So there is roughly a 5x speed-up (about 0.30 s versus 0.06 s for 10 runs), which is not a bad thing, I guess...
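If you convert many images with the same display range, the table itself can probably be built once and reused. The following is only a minimal sketch of that idea, not part of the timings above: `make_lut` and `apply_lut` are hypothetical helper names, the LUT is computed directly in floating point instead of calling `display`, and the three random frames stand in for real data.

```python
import numpy as np

def make_lut(display_min, display_max):
    # Hypothetical helper: one uint8 output for each possible uint16 input.
    values = np.arange(2**16, dtype=np.float64)
    # Clamp to the display window and shift it so that it starts at 0.
    values = np.clip(values, display_min, display_max) - display_min
    # Same scale factor as in display(): the window is split into 256 bins.
    scale = (display_max - display_min + 1) / 256.0
    return np.floor(values / scale).astype(np.uint8)

def apply_lut(lut, image):
    # Integer-array indexing performs one table look-up per pixel.
    return lut[image]

# Build the table once...
lut = make_lut(1000, 10000)

# ...and reuse it for every frame (random stand-in data, assumed shape).
frames = [np.random.randint(100, 14000, size=(768, 1024)).astype(np.uint16)
          for _ in range(3)]
converted = [apply_lut(lut, frame) for frame in frames]
```

With this split, the per-image cost is just the look-up; the table only needs rebuilding when `display_min` or `display_max` changes.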