Because the data for a constant tensor is embedded into the graph definition. This means the data is stored both in the client, which maintains the graph definition, and in the runtime, which allocates its own memory for all tensors.
For example, try

import tensorflow as tf

a = tf.constant([1,2])
print(tf.get_default_graph().as_graph_def())
You'll see
dtype: DT_INT32
tensor_shape {
  dim {
    size: 2
  }
}
tensor_content: "\001\000\000\000\002\000\000\000"
The tensor_content field is the raw content, the same bytes as np.array([1,2], dtype=np.int32).tobytes().
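If you want to check this directly, you can pull the TensorProto out of the Const node's value attr and compare it with the NumPy bytes. A minimal sketch using the same TF 1.x API as above (the lookup of the node by op type is just for illustration):

import numpy as np
import tensorflow as tf

a = tf.constant([1, 2])
graph_def = tf.get_default_graph().as_graph_def()

# The Const node keeps its value in the "value" attr as a TensorProto.
const_node = [n for n in graph_def.node if n.op == "Const"][0]
embedded = const_node.attr["value"].tensor.tensor_content

print(embedded == np.array([1, 2], dtype=np.int32).tobytes())  # True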
Now, to see the runtime allocation, you can run with memory logging enabled, e.g. export TF_CPP_MIN_VLOG_LEVEL=1.
If you evaluate anything that uses a (for instance, sess.run(a) in a tf.Session()), you'll see something like this:
2017-02-24 16:13:58: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "Const_1/_1" tensor { dtype: DT_INT32 shape { dim { size: 2 } } allocation_description { requested_bytes: 8 allocated_bytes: 256 allocator_name: "cuda_host_bfc" allocation_id: 1 ptr: 8605532160 } } }
This means the runtime asked to allocate 8 bytes, and TF actually allocated 256 bytes. (The choice of how much to actually allocate is somewhat arbitrary at the moment; see bfc_allocator.cc.)
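The requested 8 bytes are just the two int32 values; the jump to 256 comes from the allocator rounding small requests up. A rough sketch of that arithmetic, assuming a 256-byte minimum allocation size as in bfc_allocator.cc at the time:

import numpy as np

requested = np.array([1, 2], dtype=np.int32).nbytes   # 2 * 4 = 8 bytes
min_alloc = 256                                       # assumed BFC minimum allocation size
allocated = ((requested + min_alloc - 1) // min_alloc) * min_alloc
print(requested, allocated)                           # 8 256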
Having constants embedded in the graph makes it easier to do some graph-based optimizations like constant folding. But it also means that large constants are inefficient, and using large constants is a common cause of exceeding the 2GB limit on the size of the graph definition.
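One quick way to see the graph-size effect is to serialize the graph definition after adding a large constant; the array size below is just an illustration, but the bytes end up inside the GraphDef itself:

import numpy as np
import tensorflow as tf

big = np.zeros((1000, 1000), dtype=np.float32)   # ~4 MB of data
tf.constant(big)

graph_def = tf.get_default_graph().as_graph_def()
print(graph_def.ByteSize())   # on the order of 4 MB: the constant's bytes live in the graph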