Do not call .train() multiple times in your own loop that tries to do alpha arithmetic. It's unnecessary, and it's error-prone.
Specifically, in the above code, decrementing the original 0.025 alpha by 0.001 forty times results in a final alpha of (0.025 - 40*0.001) = -0.015, which would also have been negative for many of the training epochs. But a negative alpha learning-rate is nonsensical: it essentially asks the model to nudge its predictions a little bit in the wrong direction, rather than a little bit in the right direction, on every bulk training update. (Further, since model.iter is by default 5, the above code actually performs 40 * 5 = 200 training passes, which probably isn't the conscious intent. But that will just confuse readers of the code & slow training, not totally sabotage results, like the alpha mishandling.)
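To make the arithmetic concrete, here is a small illustrative check (values assumed from the example above: a 0.025 starting alpha, a 0.001 decrement, 40 epochs) showing that the manually-managed alpha crosses zero partway through training:

# Hypothetical illustration only: the per-epoch alpha values produced by the
# flawed manual schedule described above (start at 0.025, subtract 0.001 per epoch).
start_alpha, decrement, max_epochs = 0.025, 0.001, 40
for epoch in range(max_epochs):
    alpha = start_alpha - epoch * decrement
    print(f"epoch {epoch}: alpha = {alpha:.3f}")
# alpha reaches 0.000 around epoch 25, is negative for the remaining epochs,
# and a final decrement leaves the stored alpha at -0.015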
There are other common variants of this error, as well. If the alpha were instead decremented by 0.0001, the 40 decrements would only reduce the final alpha to 0.021, whereas the proper practice for this style of SGD (Stochastic Gradient Descent) with linear learning-rate decay is for the value to end very close to 0.000. And if users start tinkering with max_epochs (it is, after all, a parameter pulled out on top!) but don't also adjust the decrement every time, they are likely to far-undershoot or far-overshoot 0.000.
So don't use this pattern.
Unfortunately, many bad online examples have copied this anti-pattern from each other, and make serious errors in their own epochs and alpha handling. Please don't copy their error, and please let their authors know they're misleading people wherever this problem appears.
The above code can be improved with the much-simpler replacement:
from gensim.models.doc2vec import Doc2Vec

max_epochs = 40

model = Doc2Vec()  # of course, if non-default parameters are needed, use them here
                   # - but most users won't need to change alpha/min_alpha at all
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=max_epochs)
model.save("d2v.model")
Here, the .train() method will do exactly the requested number of epochs, smoothly reducing the internal effective alpha from its default starting value to near-zero. (It's rare to need to change the starting alpha, but even if you wanted to, just setting a new non-default value at initial model-creation is enough.)
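For example, a change of the starting alpha might look like the following sketch. (The parameter names assume a recent gensim release, where the Doc2Vec constructor accepts vector_size, alpha, min_alpha, and epochs; older versions used size and iter. The specific values shown are illustrative, not recommendations.)

from gensim.models.doc2vec import Doc2Vec

# Only if you really want a non-default learning-rate schedule:
# set it once at model creation, and still call .train() just once.
model = Doc2Vec(vector_size=100, alpha=0.05, min_alpha=0.0001, epochs=40)
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=model.epochs)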