Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - PyTorch Dataset: after an exception is caught, items do not seem to be added to a list

I have a dataset containing a very large number of pictures (2 million). A lot of pre-processing has already been done, and each picture is identified by an id. Some ids were generated even though the corresponding picture does not exist (FYI, this was easier to code). Because of this, I wrap every attempt to open an image in a try/except block. If the picture does not exist, I write to a log file and add that image's identifier to a list. The same file may be requested more than once (this is actually needed for files which exist). My reasoning was that once a missing picture's identifier is in the list, I no longer need to catch the exception and the code will run faster: I can just check whether the file name is in the list and, if it is, return None.

Here is some of the code:

    import logging

    from PIL import Image
    from torch.utils.data import Dataset

    log = logging.getLogger(__name__)


    # Class name inferred from the logger name in the log output below.
    class DeepfakeDataset(Dataset):
        def __init__(self, real_frames_dataframe, fake_frames_dataframe,
                     augmentations, image_size=224):
            # Assumed assignments; the original post omits them.
            self.real_df = real_frames_dataframe
            self.fake_df = fake_frames_dataframe

            # Should speed up training: from the second epoch on,
            # no exceptions need to be caught for known missing files.
            self.non_existent_files = []

        def __getitem__(self, index):
            row_real = self.real_df.iloc[index]
            row_fake = self.fake_df.iloc[index]

            real_image_name = row_real["image_path"]
            fake_image_name = row_fake["image_path"]

            # Expected to be reached from the second epoch on
            if real_image_name in self.non_existent_files or fake_image_name in self.non_existent_files:
                return None

            try:
                img_real = Image.open(real_image_name).convert("RGB")
            except FileNotFoundError:
                log.info("Real Image not found: {}".format(real_image_name))
                self.non_existent_files.append(real_image_name)
                return None
            try:
                img_fake = Image.open(fake_image_name).convert("RGB")
            except FileNotFoundError:
                log.info("Fake Image not found: {}".format(fake_image_name))
                self.non_existent_files.append(fake_image_name)
                return None

The problem is that the same identifier appears in the log file multiple times. For example:

Line 3201: 20:56:27, training.DeepfakeDataset, INFO Real Image not found: nvptcoxzah\nvptcoxzah_260.png
Line 3322: 21:23:13, training.DeepfakeDataset, INFO Real Image not found: nvptcoxzah\nvptcoxzah_260.png

I expected the identifier to be appended to non_existent_files, so that the next time I would not even try to open this file. However, this does not happen. Can anyone explain why?
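
A possible explanation, assuming the dataset is wrapped in a torch.utils.data.DataLoader with num_workers > 0 (the post does not show this): each worker process operates on its own copy of the dataset object, and by default the workers are created anew every time an iterator over the loader is created, i.e. once per epoch. An append to non_existent_files inside a worker would then change only that worker's copy, and the copy is discarded when the worker exits. The sketch below only simulates this with copy.deepcopy instead of real worker processes; MissingFileCache and the file name are hypothetical stand-ins.

```python
import copy


class MissingFileCache:
    """Hypothetical stand-in for the dataset: only the cache list matters here."""

    def __init__(self):
        self.non_existent_files = []

    def mark_missing(self, name):
        self.non_existent_files.append(name)


# The object as it exists in the main process.
main_copy = MissingFileCache()

# Simulate two workers: each one receives its own independent copy.
worker_0 = copy.deepcopy(main_copy)
worker_1 = copy.deepcopy(main_copy)

# Worker 0 hits a missing file and records it in its own copy only.
worker_0.mark_missing("missing_260.png")

print(worker_0.non_existent_files)   # ['missing_260.png']
print(worker_1.non_existent_files)   # []
print(main_copy.non_existent_files)  # []
```

If this is what is happening, running with num_workers=0 should make the cache behave as expected, since the dataset is then used directly in the main process.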



1 Reply

Waiting for an expert to reply.
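
One way to avoid depending on a cache filled during training, offered only as a sketch: since all paths are known up front, the missing files can be filtered out once before the dataset is built. The helper name keep_existing_pairs is hypothetical, and pairing real and fake paths by position is an assumption based on how __getitem__ in the question indexes both dataframes with the same index.

```python
import os
import tempfile


def keep_existing_pairs(real_paths, fake_paths):
    """Return only the (real, fake) pairs for which both files exist on disk."""
    return [
        (real, fake)
        for real, fake in zip(real_paths, fake_paths)
        if os.path.exists(real) and os.path.exists(fake)
    ]


# Small demonstration with temporary files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    real_ok = os.path.join(tmp, "real_0.png")
    fake_ok = os.path.join(tmp, "fake_0.png")
    fake_other = os.path.join(tmp, "fake_1.png")
    for path in (real_ok, fake_ok, fake_other):
        open(path, "wb").close()

    # This real image is never created, so its pair should be dropped.
    real_missing = os.path.join(tmp, "real_1.png")

    pairs = keep_existing_pairs([real_ok, real_missing], [fake_ok, fake_other])

print(len(pairs))  # 1
```

With the paths filtered once like this, __getitem__ would no longer need the membership check or the try/except for missing files, and the result does not depend on how many worker processes are used.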
