The function _float_feature described in the TensorFlow guide expects a scalar (either float32 or float64) as input.
def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
As you can see, the input scalar is wrapped in a list (value=[value]), which is then passed to tf.train.FloatList. tf.train.FloatList expects an iterable that yields one float per element, which a list does.
If your feature is not a scalar but a vector, _float_feature can be rewritten to pass the iterable directly to tf.train.FloatList instead of wrapping it in a list first.
def _float_array_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))
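As a minimal sketch of how such a vector feature could be written and read back: the feature name "f_feature" and the fixed length of 3 are assumptions made for illustration only, and the example skips the TFRecord file to stay short.

```python
import numpy as np
import tensorflow as tf

def _float_array_feature(value):
    """Returns a float_list from a 1-D iterable of floats."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

# Example vector; the name "f_feature" below is hypothetical.
vector = np.array([0.5, 1.5, 2.5], dtype=np.float32)
example = tf.train.Example(
    features=tf.train.Features(feature={"f_feature": _float_array_feature(vector)})
)

# Parse the serialized message back. FixedLenFeature needs the
# vector length (assumed to be 3 here) to be known in advance.
parsed = tf.io.parse_single_example(
    example.SerializeToString(),
    {"f_feature": tf.io.FixedLenFeature([3], tf.float32)},
)
restored = parsed["f_feature"].numpy()
print(restored)
```

Note that FloatList stores 32-bit floats, so float64 input is stored with float32 precision.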
However, if your feature has two or more dimensions, this solution no longer works. As @mmry described in his answer, flattening the feature or splitting it into several one-dimensional features would be a solution in this case. The disadvantage of both approaches is that the information about the actual shape of the feature is lost unless you take extra steps to preserve it.
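One way to keep the shape when flattening is to store it as an additional int64 feature and reshape after parsing. The following is only a sketch of that idea: the feature names "values" and "shape" are hypothetical, and the parser assumes the array is always two-dimensional.

```python
import numpy as np
import tensorflow as tf

array = np.arange(6, dtype=np.float32).reshape(2, 3)

# Store the flattened values together with the original shape.
feature = {
    "values": tf.train.Feature(float_list=tf.train.FloatList(value=array.ravel())),
    "shape": tf.train.Feature(int64_list=tf.train.Int64List(value=array.shape)),
}
example = tf.train.Example(features=tf.train.Features(feature=feature))

# Parse: the number of values may vary, the rank (2) is assumed fixed.
parsed = tf.io.parse_single_example(
    example.SerializeToString(),
    {
        "values": tf.io.VarLenFeature(tf.float32),
        "shape": tf.io.FixedLenFeature([2], tf.int64),
    },
)
flat = tf.sparse.to_dense(parsed["values"])
restored = tf.reshape(flat, parsed["shape"]).numpy()
print(restored)
```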
Another way to write an example message for a higher-dimensional array is to convert the array into a byte string and then use the _bytes_feature function described in the TensorFlow guide to write the example message for it. The example message is then serialized and written into a TFRecord file.
import tensorflow as tf
import numpy as np

def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):  # if value is a tensor
        value = value.numpy()  # get value of tensor
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_array(array):
    array = tf.io.serialize_tensor(array)
    return array

#----------------------------------------------------------------------------------
# Create example data
array_blueprint = np.arange(4, dtype='float64').reshape(2,2)
arrays = [array_blueprint+1, array_blueprint+2, array_blueprint+3]

#----------------------------------------------------------------------------------
# Write TFRecord file
file_path = 'data.tfrecords'
with tf.io.TFRecordWriter(file_path) as writer:
    for array in arrays:
        serialized_array = serialize_array(array)
        feature = {'b_feature': _bytes_feature(serialized_array)}
        example_message = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example_message.SerializeToString())
The serialized example messages stored in the TFRecord file can be accessed via tf.data.TFRecordDataset. After the example messages have been parsed, the original array needs to be restored from the byte string it was converted to. This is possible via tf.io.parse_tensor.
# Read TFRecord file
def _parse_tfr_element(element):
    parse_dic = {
        'b_feature': tf.io.FixedLenFeature([], tf.string),  # Note that it is tf.string, not tf.float32
    }
    example_message = tf.io.parse_single_example(element, parse_dic)

    b_feature = example_message['b_feature']  # get byte string
    feature = tf.io.parse_tensor(b_feature, out_type=tf.float64)  # restore 2D array from byte string
    return feature

tfr_dataset = tf.data.TFRecordDataset('data.tfrecords')
for serialized_instance in tfr_dataset:
    print(serialized_instance)  # print serialized example messages

dataset = tfr_dataset.map(_parse_tfr_element)
for instance in dataset:
    print()
    print(instance)  # print parsed example messages with restored arrays
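Inside a dataset map, tf.io.parse_tensor cannot infer the static shape of the restored tensor. If later stages need it, the shape can be pinned with tf.ensure_shape. The sketch below assumes every stored array has shape (2, 2), as in the example data above, and works on the serialized tensors directly rather than on a TFRecord file to stay short; the helper name _parse_with_shape is hypothetical.

```python
import numpy as np
import tensorflow as tf

arrays = [np.arange(4, dtype="float64").reshape(2, 2) + i for i in (1, 2, 3)]

def _parse_with_shape(serialized):
    tensor = tf.io.parse_tensor(serialized, out_type=tf.float64)
    # Assumption: all arrays are 2x2; this raises an error otherwise.
    return tf.ensure_shape(tensor, [2, 2])

# Serialize each array to a byte string, then restore it in a dataset map.
serialized = [tf.io.serialize_tensor(a).numpy() for a in arrays]
dataset = tf.data.Dataset.from_tensor_slices(serialized).map(_parse_with_shape)

restored = [t.numpy() for t in dataset]
static_shape = dataset.element_spec.shape.as_list()
print(static_shape)
```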