I have an autoencoder that takes an image as an input and produces a new image as an output.
The input image (1x1024x1024x3) is split into a batch of 1024 patches (1024x32x32x3), i.e. a 32x32 grid of non-overlapping 32x32 patches, before being fed to the network.
Once I have the output, which is also a batch of patches of size 1024x32x32x3, I want to reconstruct a 1024x1024x3 image from it. I thought I had this sussed by simply reshaping, but here's what happened.
First, the image as read by TensorFlow:
I patched the image with the following code:
# ksizes and strides are both [1, 32, 32, 1], so the 32x32 patches don't overlap
patch_size = [1, 32, 32, 1]
patches = tf.extract_image_patches([image],
                                   patch_size, patch_size, [1, 1, 1, 1], 'VALID')
# 32 * 32 = 1024 patches, each 32x32x3
patches = tf.reshape(patches, [1024, 32, 32, 3])
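If it helps, my understanding of the intermediate shape is as follows (raw is my own name, not something from image_test.py):

raw = tf.extract_image_patches([image],
                               patch_size, patch_size, [1, 1, 1, 1], 'VALID')
# raw has shape [1, 32, 32, 32*32*3] = [1, 32, 32, 3072]:
# a 32x32 grid of patch positions, with each 32x32x3 patch flattened into the last axis,
# which is why the reshape to [1024, 32, 32, 3] recovers the individual patches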
Here are a couple of patches from this image:
But it's when I reshape this patch data back into an image that things go pear-shaped.
reconstructed = tf.reshape(patches, [1, 1024, 1024, 3])
converted = tf.image.convert_image_dtype(reconstructed, tf.uint8)
encoded = tf.image.encode_png(converted[0])  # encode_png wants a single [height, width, channels] image
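For anyone who wants a self-contained repro without the script, here is a minimal sketch of the same round trip on a synthetic image (all names here are mine, assuming TF 1.x):

import numpy as np
import tensorflow as tf

# synthetic stand-in for the real input image
image = tf.constant(np.random.rand(1024, 1024, 3), dtype=tf.float32)

patch_size = [1, 32, 32, 1]
patches = tf.extract_image_patches([image],
                                   patch_size, patch_size, [1, 1, 1, 1], 'VALID')
patches = tf.reshape(patches, [1024, 32, 32, 3])
reconstructed = tf.reshape(patches, [1, 1024, 1024, 3])

with tf.Session() as sess:
    original, result = sess.run([image, reconstructed])

# prints False: the plain reshape does not put the pixels back where they came from
print(np.allclose(original, result[0]))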
In this example, no processing has been done between patching and reconstructing. I have made a version of the code you can use to test this behaviour. To use it, run the following:
echo "/path/to/test-image.png" > inputs.txt
mkdir images
python3 image_test.py inputs.txt images
The code will save one input image and one output image per input image, plus one patch image for each of the 1024 patches, so comment out the lines that create the input and output images if you're only interested in saving the patches.
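In case it is useful, the per-patch saving the script does is roughly equivalent to something like the following sketch of my own (not the actual image_test.py), where patches is the [1024, 32, 32, 3] tensor from above:

# encode every patch as its own PNG (hypothetical approximation of image_test.py)
patch_uint8 = tf.image.convert_image_dtype(patches, tf.uint8)        # [1024, 32, 32, 3]
encode_ops = [tf.image.encode_png(patch_uint8[i]) for i in range(1024)]

with tf.Session() as sess:
    pngs = sess.run(encode_ops)

for i, png in enumerate(pngs):
    with open('images/patch_%04d.png' % i, 'wb') as f:
        f.write(png)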
Somebody please explain what happened :(