Normally, I have had good results initializing the weights randomly in a range of roughly 0.01 to 0.5.
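A minimal sketch of that initialization, assuming a plain list-of-lists weight matrix and purely positive weights (the sign convention is not specified above, so treat it as an assumption):

```python
import random

def init_weights(n_inputs, n_neurons, low=0.01, high=0.5):
    """Draw each weight uniformly from [low, high].

    The 0.01 to 0.5 range is the one mentioned above; names and the
    list-of-lists layout are illustrative choices, not a fixed API.
    """
    return [[random.uniform(low, high) for _ in range(n_inputs)]
            for _ in range(n_neurons)]
```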
To 1: As far as I know, the local error for the output layer is normally expectedOutput - currentOutput, because this simplified form rarely fails and is accurate enough in practice. After that, for fully-connected layers, you use backpropagation to adjust the weights of the hidden layers. See Yann LeCun's paper Efficient BackProp for how to do this efficiently.
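A minimal sketch of that step for a single sigmoid output neuron, assuming a learning rate eta and the usual sigmoid-derivative factor when forming the delta (the standard backprop step, which the text above does not spell out); all names are illustrative:

```python
def backprop_output_neuron(expected, current, hidden_activations, weights, eta=0.1):
    """One gradient step for the hidden-to-output weights of one output neuron.

    expected, current  -- target and actual output of the neuron
    hidden_activations -- outputs of the parent hidden layer
    weights            -- weights from each hidden neuron to this output neuron
    eta                -- learning rate (an assumed hyperparameter)
    """
    # Local error as described above: expectedOutput - currentOutput.
    error = expected - current
    # Delta for a sigmoid unit: error * sigmoid'(net) = error * current * (1 - current).
    delta = error * current * (1.0 - current)
    # Adjust each incoming weight in proportion to its hidden activation.
    for i, h in enumerate(hidden_activations):
        weights[i] += eta * delta * h
    return delta  # this delta is then propagated back to the hidden layer
```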
To 2: To prevent the output layer from receiving an input so large that the sigmoid saturates at 1 for many epochs (because the sum over the hidden layer is too big), there is a simple, cheap, and effective hack: always divide the input of each output-layer neuron by the number of neurons in the parent (hidden) layer. That way the input stays in the interval [-1.0, 1.0] before the sigmoid transfer function is applied. In most cases this trick drastically reduces the number of epochs needed to train the network.
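A sketch of that normalization, assuming sigmoid hidden units, weights bounded in [-1, 1], and a single output neuron; function and variable names are my own:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_activation(hidden_outputs, weights):
    """Compute an output neuron's activation with the normalization hack above."""
    net = sum(w * h for w, h in zip(weights, hidden_outputs))
    # Divide by the size of the parent (hidden) layer so the net input
    # stays in [-1.0, 1.0] before the sigmoid is applied.
    net /= len(hidden_outputs)
    return sigmoid(net)
```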