I have a dataset containing a huge number of pictures (2 million). A lot of pre-processing has been done, and each picture is identified by an id. Some ids are generated even though no matching picture exists (FYI, this was easier to code), so whenever I open an image I wrap the call in a try/except block. If the picture does not exist, I write to a log file and add that image's identifier to a list. The same file may be requested more than once (this is actually needed for files that do exist). My reasoning was that once a missing picture's identifier is in the list, I no longer need to catch the exception: I can check whether the file name is in the list and, if it is, return None right away, which should make the code run faster.
Here is some of the code:
    def __init__(self, real_frames_dataframe, fake_frames_dataframe,
                 augmentations, image_size=224):
        # Should increase training speed as on second epoch will not need to catch exceptions
        self.non_existent_files = []

    def __getitem__(self, index):
        row_real = self.real_df.iloc[index]
        row_fake = self.fake_df.iloc[index]
        real_image_name = row_real["image_path"]
        fake_image_name = row_fake["image_path"]
        # Will go here from second epoch
        if real_image_name in self.non_existent_files or fake_image_name in self.non_existent_files:
            return None
        try:
            img_real = Image.open(real_image_name).convert("RGB")
        except FileNotFoundError:
            log.info("Real Image not found: {}".format(real_image_name))
            self.non_existent_files.append(real_image_name)
            return None
        try:
            img_fake = Image.open(fake_image_name).convert("RGB")
        except FileNotFoundError:
            log.info("Fake Image not found: {}".format(fake_image_name))
            self.non_existent_files.append(fake_image_name)
            return None
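To make the idea concrete, here is a stripped-down, self-contained sketch of the caching logic I have in mind. It is not my real dataset class: the name `MissingFileCache`, the `opener` parameter, and the `open_attempts` counter are made up for illustration, and it uses a set instead of a list because membership tests on a set are faster.

```python
class MissingFileCache:
    """Remembers paths that failed to open so they are not retried.

    Hypothetical helper for illustration only; not part of the real dataset class.
    """

    def __init__(self, opener=open):
        self.opener = opener      # function used to open a file (assumed: built-in open)
        self.missing = set()      # paths already known not to exist
        self.open_attempts = 0    # how many times we actually touched the filesystem

    def load(self, path):
        # Fast path: the file is already known to be missing, so skip the open call.
        if path in self.missing:
            return None
        self.open_attempts += 1
        try:
            with self.opener(path, "rb") as fh:
                return fh.read()
        except FileNotFoundError:
            # Remember the failure so the next request returns immediately.
            self.missing.add(path)
            return None
```

Within a single process and a single instance, the second `load` call for the same missing path returns `None` without trying to open the file, which is the behaviour I expected from my dataset.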
The problem is that the same identifier appears in the log file multiple times. For example:
Line 3201: 20:56:27, training.DeepfakeDataset, INFO Real Image not found: nvptcoxzah\nvptcoxzah_260.png
Line 3322: 21:23:13, training.DeepfakeDataset, INFO Real Image not found: nvptcoxzah\nvptcoxzah_260.png
I thought the identifier would be appended to non_existent_files, so that the next time I would not even try to open this file. However, that is not what happens. Can anyone explain why?