The gradient op (tf.gradients) expects a scalar function, so by default it sums up the entries of whatever tensor you pass it. That is the default behavior simply because all of the gradient descent algorithms need that type of functionality, and stochastic gradient descent (or a variation of it) is the preferred method inside TensorFlow. You won't find any of the more advanced algorithms (BFGS, for example) because they simply haven't been implemented yet, and they would require a true Jacobian, which also hasn't been implemented. For what it's worth, here is a functioning Jacobian implementation that I wrote:
import tensorflow as tf


# Note: this name shadows Python's built-in map within this module.
def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0], ..., x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different from the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    # Loop state: the current index j and a TensorArray collecting f(x[j]).
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    # Stack the per-element results along a new leading axis.
    return fx.stack()
def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will have shape (L,m,n,...,p), where output[i] has shape (m,n,...,p), with each
    entry denoting the gradient of the i-th entry of the vectorized fx wrt the corresponding
    element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)
This implementation works on its own, but it fails when you try to nest it. For instance, if you try to compute the Hessian by using jacobian(jacobian(...)), you get some strange errors. This is being tracked as Issue 675. I am still awaiting a response on why this throws an error. I believe that there is a deep-seated bug in either the while loop implementation or the gradient implementation, but I really have no idea.
Anyway, if you just need a Jacobian, try the code above.
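To make the point about summing concrete, here is a minimal sketch in plain Python (no TensorFlow) of how a full Jacobian relates to the gradient you get when the entries are summed first. Everything in it is made up for illustration: f is a hypothetical example function, x0 is an arbitrary point, and numeric_jacobian is a hypothetical helper that uses central finite differences as a stand-in for tf.gradients. It is not what TensorFlow does internally, just a way to see the relationship.

```python
# Plain-Python illustration (no TensorFlow): the gradient of sum(f(x)) equals
# the sum of the rows of the Jacobian of f.

def f(x):
    # Hypothetical example function from R^2 to R^3.
    return [x[0] * x[1], x[0] ** 2, 3.0 * x[1]]


def numeric_jacobian(func, x, eps=1e-6):
    # Central differences: jac[i][j] approximates d func(x)[i] / d x[j].
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = func(hi), func(lo)
        for i in range(n_out):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    return jac


x0 = [2.0, 5.0]

# Full Jacobian: one row per output entry, one column per input entry.
J = numeric_jacobian(f, x0)

# Gradient of the scalar sum(f(x)), analogous to summing the entries first.
summed = numeric_jacobian(lambda x: [sum(f(x))], x0)[0]

# Summing the Jacobian over its rows gives the same vector.
row_sum = [sum(J[i][j] for i in range(len(J))) for j in range(len(x0))]

print(J)
print(summed)
print(row_sum)
```

For this f the analytic Jacobian at x0 is [[5, 2], [4, 0], [0, 3]], so both summed and row_sum should come out close to [9, 5]. The per-row information is what gets lost when the entries are summed, and that is why a separate jacobian function is needed.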