python - Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras

Question

Welcome To Ask or Share your Answers For Others

python - Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:03:31+0000

Well, one solution is to modify the ImageDataGenerator code and put error handling mechanism (i.e. try/except) in it.

However, one alternative is to wrap your generator inside another generator and use try/except there. The disadvantage of this solution is that it throws away the whole generated batch even if one single image is corrupted in that batch (this may mean that it is possible that some of the samples may not be used for training at all):

data_gen = ImageDataGenerator(...)

train_gen = data_gen.flow_from_directory(...)

def my_gen(gen):
    while True:
        try:
            data, labels = next(gen)
            yield data, labels
        except:
            pass

# ... define your model and compile it

# fit the model
model.fit_generator(my_gen(train_gen), ...)

Another disadvantage of this solution is that since you need to specify the number of steps of generator (i.e. steps_per_epoch) and considering that a batch may be thrown away in a step and a new batch is fetched instead in the same step, you may end up training on some of the samples more than once in an epoch. This may or may not have significant effects depending on how many batches include corrupted images (i.e. if there are a few, then there is nothing to be worried about that much).

Finally, note that you may want to use the newer Keras data-generator i.e. Sequence class to read images one by one in the __getitem__ method in each batch and discard corrupted ones. However, the problem of the previous approach, i.e. training on some of the images more than once, is still present in this approach as well since you also need to implement the __len__ method and it is essentially equivalent to steps_per_epoch argument. Although, in my opinion, this approach (i.e. subclassing Sequence class) is superior to the above approach (of course, if you put aside the fact that you may need to write more code) and have fewer side effects (since you can discard a single image and not the whole batch).

Categories

python - Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras

python - Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags