I am trying to use queues for loading data from files in TensorFlow.
I would like to run the graph on validation data at the end of each epoch to get a better feel for how training is going.
That is where I am running into problems. I can't figure out how to
switch between training data and validation data when using queues.
I have stripped my code down to a bare-minimum toy example to make it easier to
get help. Instead of including all the code that loads the image files, performs inference, and trains, I have cut it off at the
point where the filenames are loaded into the queue.

import tensorflow as tf

# DATA
train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]

# SETTINGS
batch_size = 3
batches_per_epoch = 2
epochs = 2

# CREATE GRAPH
graph = tf.Graph()
with graph.as_default():
    file_list = tf.placeholder(dtype=tf.string, shape=None)

    # Create a queue consisting of the strings in `file_list`
    q = tf.train.string_input_producer(train_items, shuffle=False, num_epochs=None)

    # Create batch of items.
    x = q.dequeue_many(batch_size)

    # Inference, train op, and accuracy calculation after this point
    # ...

# RUN SESSION
with tf.Session(graph=graph) as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Start populating the queue.
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        for epoch in range(epochs):
            print("-"*60)
            for step in range(batches_per_epoch):
                if coord.should_stop():
                    break
                train_batch = sess.run(x, feed_dict={file_list: train_items})
                print("TRAIN_BATCH: {}".format(train_batch))

            valid_batch = sess.run(x, feed_dict={file_list: valid_items})
            print("\nVALID_BATCH : {}\n".format(valid_batch))

    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

Variations and experiments

Trying different values for num_epochs

num_epochs=None

If I set the num_epochs argument in tf.train.string_input_producer() to None, it gives me the following output,
which shows that it is running two epochs as intended, but it is using data
from the training set when running evaluation.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
VALID_BATCH : ['train_file_3' 'train_file_4' 'train_file_5']
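
My reading of this output, sketched in plain Python rather than TensorFlow: the producer was built from train_items, and nothing in the graph reads file_list, so the value passed through feed_dict never reaches the queue. The simulate_cycling_queue helper below is a hypothetical stand-in for the queue that I wrote for illustration; it is not how TensorFlow implements it.

```python
from itertools import cycle, islice

train_items = ["train_file_{}".format(i) for i in range(6)]
valid_items = ["valid_file_{}".format(i) for i in range(3)]
batch_size = 3
batches_per_epoch = 2
epochs = 2


def simulate_cycling_queue(queue_items, fed_items_per_call):
    """Model a producer with num_epochs=None: it loops over queue_items forever.

    fed_items_per_call stands in for the feed_dict values. Only its length is
    used; its contents are never read, mirroring the unused `file_list`
    placeholder.
    """
    source = cycle(queue_items)
    return [list(islice(source, batch_size)) for _ in fed_items_per_call]


# Same call pattern as the session loop: two training runs, then one validation run.
calls = ([train_items] * batches_per_epoch + [valid_items]) * epochs
for fed, batch in zip(calls, simulate_cycling_queue(train_items, calls)):
    label = "VALID_BATCH" if fed is valid_items else "TRAIN_BATCH"
    print("{}: {}".format(label, batch))
```

This model produces the same sequence of batches as the output above, which suggests the validation filenames are never entering the queue at all.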

num_epochs=2

If I set the num_epochs argument in tf.train.string_input_producer() to 2, it gives me the following output,
which shows that it does not even get through the full two epochs
(and evaluation is still using training data).
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
VALID_BATCH : ['train_file_0' 'train_file_1' 'train_file_2']
------------------------------------------------------------
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']

num_epochs=1

If I set the num_epochs argument in tf.train.string_input_producer() to 1, in the hope that it will flush out
any additional training data from the queue so it can make use of the validation
data, I get the following output, which shows that it terminates as soon as
it gets through one epoch of training data, and never gets to
load the evaluation data.
------------------------------------------------------------
TRAIN_BATCH: ['train_file_0' 'train_file_1' 'train_file_2']
TRAIN_BATCH: ['train_file_3' 'train_file_4' 'train_file_5']
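
The early stops with num_epochs=2 and num_epochs=1 look consistent with simple bookkeeping: the producer emits each filename num_epochs times in total, and the validation runs draw from the same supply as the training runs. Here is a minimal sketch of that count, assuming the queue simply runs dry once the supply is used up (the batches_before_exhaustion name is mine, for illustration only):

```python
def batches_before_exhaustion(num_items, num_epochs, batch_size, requested_batches):
    """Return how many full batches can be dequeued before the supply runs out."""
    available = (num_items * num_epochs) // batch_size
    return min(available, requested_batches)


# The session loop asks for (2 training + 1 validation) batches per epoch, for 2 epochs.
requested = (2 + 1) * 2

for num_epochs in (2, 1):
    served = batches_before_exhaustion(6, num_epochs, 3, requested)
    print("num_epochs={}: {} of {} requested batches served".format(
        num_epochs, served, requested))
```

Four served batches matches the TRAIN, TRAIN, VALID, TRAIN output for num_epochs=2, and two served batches matches the two TRAIN lines for num_epochs=1.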

Setting capacity argument to various values

I have also tried setting the capacity argument in tf.train.string_input_producer()
to small values, such as 3 and 1, but these
had no effect on the results.
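
That result would make sense if capacity only bounds how many filenames are buffered at once, without changing the order they come out in. Below is a plain-Python FIFO model of that assumption; it is not TensorFlow code, and dequeue_batches is a hypothetical helper.

```python
from collections import deque
from itertools import cycle


def dequeue_batches(items, capacity, batch_size, num_batches):
    """Dequeue batches from a bounded FIFO buffer that is refilled in order."""
    source = cycle(items)   # producer loops over the items (num_epochs=None)
    buffer = deque()        # holds at most `capacity` items at any time
    batches = []
    for _ in range(num_batches):
        batch = []
        while len(batch) < batch_size:
            # Producer tops the buffer up to `capacity` before each dequeue.
            while len(buffer) < capacity:
                buffer.append(next(source))
            batch.append(buffer.popleft())
        batches.append(batch)
    return batches


items = ["train_file_{}".format(i) for i in range(6)]
results = [dequeue_batches(items, capacity, 3, 4) for capacity in (1, 3, 32)]

# Compare the batch sequences obtained with the three capacities.
print(results[0] == results[1] == results[2])
```

In this model the batch sequence is identical for every capacity, so changing capacity alone would not be expected to bring validation filenames into the queue.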

What other approach should I take?

What other approach could I take to switch between training and validation data?
Would I have to create separate queues? I am at a loss as to how to get that to
work. Would I have to create additional coordinators and queue runners as well?