Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
4.4k views
in Technique[技术] by (71.8m points)

pytorch - Problem with adding smiles on photos with convolutional autoencoder

I have a dataset with images and another dataset as it's description:

attrs

There are a lot of pictures: people with and without sunglasses, smiles and other attributes. What I want to do is be able to add smiles to photos where people are not smiling. I've started like this:

smile_ids = attrs['Smiling'].sort_values(ascending=False).iloc[100:125].index.values
smile_data = data[smile_ids]

no_smile_ids = attrs['Smiling'].sort_values(ascending=True).head(5).index.values
no_smile_data = data[no_smile_ids]

eyeglasses_ids = attrs['Eyeglasses'].sort_values(ascending=False).head(25).index.values
eyeglasses_data = data[eyeglasses_ids]

sunglasses_ids = attrs['Sunglasses'].sort_values(ascending=False).head(5).index.values
sunglasses_data = data[sunglasses_ids]

When I print them their are fine:

plot_gallery(smile_data, IMAGE_H, IMAGE_W, n_row=5, n_col=5, with_title=True, titles=smile_ids)

faces output

Plot gallery looks like this:

def plot_gallery(images, h, w, n_row=3, n_col=6, with_title=False, titles=[]):
plt.figure(figsize=(1.5 * n_col, 1.7 * n_row))
plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
for i in range(n_row * n_col):
    plt.subplot(n_row, n_col, i + 1)
    try:
        plt.imshow(images[i].reshape((h, w, 3)), cmap=plt.cm.gray, vmin=-1, vmax=1, interpolation='nearest')
        if with_title:
            plt.title(titles[i])
        plt.xticks(())
        plt.yticks(())
    except:
        pass

Then I do:

def to_latent(pic):
with torch.no_grad():
    inputs = torch.FloatTensor(pic.reshape(-1, 45*45*3))
    inputs = inputs.to('cpu')
    autoencoder.eval()
    output = autoencoder.encode(inputs)        
    return output

def from_latent(vec):
with torch.no_grad():
    inputs = vec.to('cpu')
    autoencoder.eval()
    output = autoencoder.decode(inputs)        
    return output

After that:

smile_latent = to_latent(smile_data).mean(axis=0)
no_smile_latent = to_latent(no_smile_data).mean(axis=0)
sunglasses_latent = to_latent(sunglasses_data).mean(axis=0)

smile_vec = smile_latent-no_smile_latent
sunglasses_vec = sunglasses_latent - smile_latent

And finally:

def add_smile(ids):
for id in ids:
    pic = data[id:id+1]
    latent_vec = to_latent(pic)
    latent_vec[0] += smile_vec
    pic_output = from_latent(latent_vec)
    pic_output = pic_output.view(-1,45,45,3).cpu()
    plot_gallery([pic,pic_output], IMAGE_H, IMAGE_W, n_row=1, n_col=2)
    
def add_sunglasses(ids):
for id in ids:
    pic = data[id:id+1]
    latent_vec = to_latent(pic)
    latent_vec[0] += sunglasses_vec
    pic_output = from_latent(latent_vec)
    pic_output = pic_output.view(-1,45,45,3).cpu()
    plot_gallery([pic,pic_output], IMAGE_H, IMAGE_W, n_row=1, n_col=2)

But when I execute this line I don't get any faces:

add_smile(no_smile_ids)

The output:

enter image description here

Could someone please explain where is my mistake or why it can happen? Thanks for any help.

Added: checking the shape of pic_output:

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Wild guess, but it seems you are broadcasting your images instead of permuting the axes. The former will have the undesired effect of mixing information across the batches/channels.

pic_output = pic_output.view(-1, 45, 45, 3).cpu()

should be replaced with

pic_output = pic_output.permute(0, 2, 3, 1).cpu()

Assuming tensor pic_output is already shaped like (-1, 3, 45, 45).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...