Why is my CNN implementation in C++ so slow? The Python (Anaconda) version runs at least 100 times faster than the C++ code. The only difference I can see is in CPU usage: the Anaconda version tries to use 100% of the CPU, whereas my implementation uses only 25%, as shown in Task Manager.
I have optimized my for loops as much as possible. I am using pointers to pointers to implement N-D arrays. I use C++ structs to abstract the layers, planes, and nodes of the neural network.
I create an array of planes to construct a layer and an array of layers to construct the neural network.
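For comparison with the pointer-to-pointer layout described above, here is a minimal sketch of the same kind of 2-D grid stored in one contiguous buffer. `Grid2D`, `at`, and `sum` are hypothetical names that do not appear in my code, and `float` is an assumed element type. It only illustrates the layout; it is not a measured fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original code): a 2-D grid kept in one
// contiguous buffer instead of an array of separately allocated rows.
struct Grid2D {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Grid2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    // Row-major indexing: element (i, j) lives at offset i * cols + j.
    float& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Sums every element by walking the buffer linearly, front to back.
float sum(const Grid2D& g) {
    float s = 0.0f;
    for (float v : g.data) s += v;
    return s;
}
```

With this layout, neighbouring elements of a row sit next to each other in memory, and moving to the next row is an offset computation rather than an extra pointer load.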
Here's a snippet of the innermost part of my loops in the backward pass:
    for (int k = 0; k < plnext.szx; k++)
    {
        for (int l = 0; l < plnext.szy; l++)
        {
            // Skip (k, l) pairs whose kernel offset (n - k, o - l) is out of range.
            if (!(n - k >= 0 && n - k < plnext.kx && o - l >= 0 && o - l < plnext.ky))
                continue;
            node& presnode = layers[i].arrplanes[j].nodes[k][l];
            // Propagate the delta back to the previous plane.
            prevplane.nodes[n][o].delta += presplane.kern[m][n - k][o - l].wt * presnode.delta;
            // Accumulate the weight change for this kernel entry.
            layers[i].arrplanes[j].kern[m][n - k][o - l].chwt += eta * presnode.delta * prevnode.state;
        }
    }
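One thing the snippet does is visit every `(k, l)` pair and discard most of them with the bounds check. As a sketch under assumptions, the same set of pairs can be reached by tightening the loop ranges instead. The functions `accumulate_checked` and `accumulate_bounded` below are hypothetical stand-ins that use plain `std::vector` grids rather than my structs. They only show that the two loop forms visit the same elements, not how much time this saves.

```cpp
#include <algorithm>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Hypothetical stand-in for the original inner loops: scans every (k, l)
// and skips pairs whose kernel offset (n - k, o - l) is out of range.
float accumulate_checked(const Grid& delta, const Grid& kern, int n, int o) {
    const int szx = static_cast<int>(delta.size());
    const int szy = static_cast<int>(delta[0].size());
    const int kx = static_cast<int>(kern.size());
    const int ky = static_cast<int>(kern[0].size());
    float acc = 0.0f;
    for (int k = 0; k < szx; k++) {
        for (int l = 0; l < szy; l++) {
            if (!(n - k >= 0 && n - k < kx && o - l >= 0 && o - l < ky))
                continue;
            acc += kern[n - k][o - l] * delta[k][l];
        }
    }
    return acc;
}

// Same sum, but the valid range is computed once:
// 0 <= n - k < kx  is equivalent to  n - kx + 1 <= k <= n (likewise for l).
float accumulate_bounded(const Grid& delta, const Grid& kern, int n, int o) {
    const int szx = static_cast<int>(delta.size());
    const int szy = static_cast<int>(delta[0].size());
    const int kx = static_cast<int>(kern.size());
    const int ky = static_cast<int>(kern[0].size());
    const int k_lo = std::max(0, n - kx + 1);
    const int k_hi = std::min(szx - 1, n);
    const int l_lo = std::max(0, o - ky + 1);
    const int l_hi = std::min(szy - 1, o);
    float acc = 0.0f;
    for (int k = k_lo; k <= k_hi; k++)
        for (int l = l_lo; l <= l_hi; l++)
            acc += kern[n - k][o - l] * delta[k][l];
    return acc;
}
```

Both functions add the same terms in the same order, so they return identical results. The second one runs at most `kx * ky` iterations per `(n, o)` instead of `szx * szy`.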
question from:
https://stackoverflow.com/questions/65952635/why-is-my-cnn-implmentation-in-c-slower-by-orders-of-magnitude-than-the-keras