Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Tensorflow execution time

I have a function within a Python script that I am calling multiple times (https://github.com/sankhaMukherjee/NNoptExpt/blob/dev/src/lib/NNlib/NNmodel.py). I have simplified the function significantly for this example:

def errorValW(self, X, y, weights):

    errVal = None

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        nW = len(self.allW)
        W = weights[:nW] 
        B = weights[nW:]

        for i in range(len(W)):
            sess.run(tf.assign( self.allW[i], W[i] ))

        for i in range(len(B)):
            sess.run(tf.assign( self.allB[i], B[i] ))

        errVal = sess.run(self.err, 
            feed_dict = {self.Inp: X, self.Op: y})

    return errVal

I am calling this function many times from another function. Looking at the program log, it appears that this function keeps taking longer and longer. A partial log is shown below:

21:37:12,634 - ... .errorValW ... - Finished the function [errorValW] in 1.477610e+00 seconds
21:37:14,116 - ... .errorValW ... - Finished the function [errorValW] in 1.481470e+00 seconds
21:37:15,608 - ... .errorValW ... - Finished the function [errorValW] in 1.490914e+00 seconds
21:37:17,113 - ... .errorValW ... - Finished the function [errorValW] in 1.504651e+00 seconds
21:37:18,557 - ... .errorValW ... - Finished the function [errorValW] in 1.443876e+00 seconds
21:37:20,183 - ... .errorValW ... - Finished the function [errorValW] in 1.625608e+00 seconds
21:37:21,719 - ... .errorValW ... - Finished the function [errorValW] in 1.534915e+00 seconds
... many lines later  
22:59:26,524 - ... .errorValW ... - Finished the function [errorValW] in 9.576592e+00 seconds
22:59:35,991 - ... .errorValW ... - Finished the function [errorValW] in 9.466405e+00 seconds
22:59:45,708 - ... .errorValW ... - Finished the function [errorValW] in 9.716456e+00 seconds
22:59:54,991 - ... .errorValW ... - Finished the function [errorValW] in 9.282923e+00 seconds
23:00:04,407 - ... .errorValW ... - Finished the function [errorValW] in 9.415035e+00 seconds

Has anyone else experienced anything like this? This is totally baffling to me ...

Edit:

For reference, the initializer for the class is shown below. I suspect that the graph for the result variable is progressively increasing in size. I have seen this problem when I try to save models with tf.train.Saver(tf.trainable_variables()): the size of the saved file keeps increasing. I am not sure whether I am making a mistake in defining the model in some way ...

def __init__(self, inpSize, opSize, layers, activations):

    self.inpSize = inpSize
    self.Inp     = tf.placeholder(dtype=tf.float32, shape=inpSize, name='Inp')
    self.Op      = tf.placeholder(dtype=tf.float32, shape=opSize, name='Op')

    self.allW    = []
    self.allB    = []

    self.result  = None

    prevSize = inpSize[0]
    for i, l in enumerate(layers):
        tempW = tf.Variable( 0.1*(np.random.rand(l, prevSize) - 0.5), dtype=tf.float32, name='W_{}'.format(i) )
        tempB = tf.Variable( 0, dtype=tf.float32, name='B_{}'.format(i) )

        self.allW.append( tempW )
        self.allB.append( tempB )

        if i == 0:
            self.result = tf.matmul( tempW, self.Inp ) + tempB
        else:
            self.result = tf.matmul( tempW, self.result ) + tempB

        prevSize = l

        if activations[i] is not None:
            self.result = activations[i]( self.result )

    self.err = tf.sqrt(tf.reduce_mean((self.Op - self.result)**2))


    return

1 Reply

You are calling tf.assign inside the session context. This keeps adding ops to your graph every time you execute the errorValW function, and execution slows down as the graph grows larger. As a rule of thumb, you should avoid creating TensorFlow ops while executing a model on data (since this will usually happen inside a loop, resulting in constant growth of the graph). In my personal experience, even adding "a few" ops at execution time can result in extreme slowdown.
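You can confirm this diagnosis directly by counting the ops in the default graph around each call. This is a minimal sketch of my own (not from the question's code), written against the tf.compat.v1 API so it also runs on TF 2.x:

```python
# Minimal illustration: creating tf.assign inside the session loop grows the
# graph on every iteration, while a pre-built assign op keeps it constant.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
tf.reset_default_graph()

var = tf.Variable(0.0)

def op_count():
    return len(tf.get_default_graph().get_operations())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Anti-pattern: a new assign op (plus a constant) is added per call.
    sizes_bad = []
    for i in range(3):
        sess.run(tf.assign(var, float(i)))
        sizes_bad.append(op_count())

    # Fix: build the assign op once, then run the same op repeatedly.
    ph = tf.placeholder(tf.float32, shape=[])
    assign_op = tf.assign(var, ph)
    sizes_good = []
    for i in range(3):
        sess.run(assign_op, feed_dict={ph: float(i)})
        sizes_good.append(op_count())

print(sizes_bad)   # strictly increasing
print(sizes_good)  # constant
```

In your log, the per-call time creeping from ~1.5 s to ~9.5 s is exactly what this kind of unbounded graph growth looks like.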

Note that tf.assign is an op like any other. You should define it once beforehand (when creating the model/building the graph) and then run the same op repeatedly after launching the session.

I don't know what exactly you are trying to achieve in your code snippet, but consider the following:

...
with tf.Session() as sess:
    sess.run(tf.assign(some_var, a_value))

could be replaced by

a_placeholder = tf.placeholder(type_for_a_value, shape_for_a_value)
assign_op = tf.assign(some_var, a_placeholder)
...
with tf.Session() as sess:
    sess.run(assign_op, feed_dict={a_placeholder: a_value})

where a_placeholder should have the same dtype/shape as some_var. I have to admit I haven't tested this snippet, so please let me know if there are issues, but it should be about right.

