On occasion, circumstances require us to do the following:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=my_max)  # num_words caps the vocabulary to the most frequent words
Then, invariably, we chant this mantra:
tokenizer.fit_on_texts(text)
sequences = tokenizer.texts_to_sequences(text)
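For concreteness, here is a tiny self-contained version of the pattern (the corpus and the num_words value are made up):

from keras.preprocessing.text import Tokenizer
texts = ['the cat sat', 'the cat sat on the mat']
tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
print(sequences)  # e.g. [[1, 2, 3], [1, 2, 3, 4, 1, 5]]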
While I (more or less) understand what the total effect is, I can't figure out what each one does separately, regardless of how much research I do (including, obviously, the documentation). I don't think I've ever seen one without the other.
So what does each do? Are there any circumstances where you would use either one without the other? If not, why aren't they simply combined into something like:
sequences = tokenizer.fit_on_texts_to_sequences(text)
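The only separate use I can imagine is fitting once on training texts and then converting different, unseen texts later. That's just a guess on my part, sketched below with made-up data:

from keras.preprocessing.text import Tokenizer
train_texts = ['the cat sat on the mat']
new_texts = ['the dog sat']
tokenizer = Tokenizer(num_words=10)
tokenizer.fit_on_texts(train_texts)  # learn the vocabulary once, on training data only
train_sequences = tokenizer.texts_to_sequences(train_texts)
new_sequences = tokenizer.texts_to_sequences(new_texts)  # reuse the same vocabulary
print(new_sequences)  # e.g. [[1, 3]] -- I assume 'dog' is dropped since it was never seen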
Apologies if I'm missing something obvious, but I'm pretty new at this.