python - how to reshape text data to be suitable for LSTM model in keras

Question


posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


Update 1:

The code I'm referring to is exactly the code from the book, which you can find here.

The only difference is that I don't want to have embed_size in the decoder part. That's why I think I don't need an embedding layer at all: if I add an embedding layer, I need to have embed_size in the decoder part (please correct me if I'm wrong).

Overall, I'm trying to adapt the same code without using the embedding layer, because I need to have vocab_size in the decoder part.

I think the suggestion in the comments (using one-hot encoding) could be correct; however, I ran into this error:

When I did the one-hot encoding with:

tf.keras.backend.one_hot(indices=sent_wids, classes=vocab_size)

I received this error:

in check_num_samples: you should specify the + steps_name + argument
ValueError: If your data is in the form of symbolic tensors, you should specify the steps_per_epoch argument (instead of the batch_size argument, because symbolic tensors are expected to produce batches of input data)

The way I have prepared the data is as follows:

The shape of sent_lens is (87716, 200), and I want to reshape it so that I can feed it into the LSTM. Here 200 is the sequence length and 87716 is the number of samples.
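The error message suggests that `tf.keras.backend.one_hot` returned a symbolic (graph-mode) tensor rather than an array, and `fit` cannot split such a tensor into batches by `batch_size`. A minimal sketch of one workaround, assuming the integer word ids are available as a NumPy-compatible array, is to do the one-hot encoding in NumPy so that `fit` receives a plain array. The helper names `one_hot_numpy` and `one_hot_nbytes` are hypothetical.

```python
import numpy as np


def one_hot_numpy(word_ids, vocab_size):
    """Hypothetical helper: one-hot encode integer word ids with NumPy.

    word_ids: int array of shape (num_samples, seq_len), values in [0, vocab_size).
    Returns a float32 array of shape (num_samples, seq_len, vocab_size).
    """
    word_ids = np.asarray(word_ids, dtype=np.int64)
    # Row k of the identity matrix is the one-hot vector for word id k;
    # fancy indexing selects one row per (sample, time step).
    return np.eye(vocab_size, dtype=np.float32)[word_ids]


def one_hot_nbytes(num_samples, seq_len, vocab_size, bytes_per_value=4):
    """Hypothetical helper: bytes needed to hold the full float32 one-hot array."""
    return num_samples * seq_len * vocab_size * bytes_per_value


# Tiny illustrative input: 2 samples, 3 time steps, vocabulary of 4 words.
demo = np.array([[0, 2, 3], [1, 1, 0]])
X = one_hot_numpy(demo, vocab_size=4)
print(X.shape)  # (2, 3, 4)
print(X[0, 1])  # [0. 0. 1. 0.]

# Size of the question's data (87716 x 200) per vocabulary word.
print(one_hot_nbytes(87716, 200, 1))  # 70172800
```

`tf.keras.utils.to_categorical(sent_wids, num_classes=vocab_size)` performs the same conversion. Either way, the full array grows with the vocabulary: at about 70 MB per vocabulary word for 87716 × 200 float32 values, a 10,000-word vocabulary would need roughly 700 GB, so encoding one batch at a time (for example inside a generator) may be necessary.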

Below is the code for the LSTM autoencoder:

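from tensorflow.keras.layers import Input, Bidirectional, LSTM, RepeatVector
from tensorflow.keras.models import Model

# Encoder: compress the (SEQUENCE_LEN, VOCAB_SIZE) one-hot sequence into a LATENT_SIZE vector
inputs = Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
# Decoder: repeat the latent vector for every time step and expand back to VOCAB_SIZE
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss="mse")
autoencoder.summary()
history = autoencoder.fit(Xtrain, Xtrain, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS)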
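A minimal end-to-end sketch, assuming TensorFlow 2.x and using tiny hypothetical sizes with random word ids in place of the real data. It builds the same autoencoder and shows `fit` accepting `batch_size` once the input is a NumPy one-hot array rather than a backend tensor.

```python
import numpy as np
from tensorflow.keras.layers import Input, Bidirectional, LSTM, RepeatVector
from tensorflow.keras.models import Model

# Hypothetical tiny sizes so the sketch runs quickly (the question uses
# 87716 samples and a sequence length of 200).
NUM_SAMPLES, SEQUENCE_LEN, VOCAB_SIZE, LATENT_SIZE = 64, 10, 20, 16
BATCH_SIZE, NUM_EPOCHS = 16, 2

# Random integer word ids standing in for sent_wids: (samples, time_steps).
rng = np.random.default_rng(0)
sent_wids = rng.integers(0, VOCAB_SIZE, size=(NUM_SAMPLES, SEQUENCE_LEN))

# One-hot in NumPy -> float32 array of shape (samples, time_steps, VOCAB_SIZE).
Xtrain = np.eye(VOCAB_SIZE, dtype=np.float32)[sent_wids]

# Same architecture as the question's autoencoder.
inputs = Input(shape=(SEQUENCE_LEN, VOCAB_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = LSTM(VOCAB_SIZE, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss="mse")

# Xtrain is a NumPy array, not a symbolic tensor, so batch_size is accepted.
history = autoencoder.fit(Xtrain, Xtrain, batch_size=BATCH_SIZE,
                          epochs=NUM_EPOCHS, verbose=0)

recon = autoencoder.predict(Xtrain[:2], verbose=0)
print(recon.shape)  # (2, 10, 20)
```

Because the final LSTM uses its default tanh activation, its outputs lie in (-1, 1) rather than forming a probability distribution over the vocabulary; a common variant (not part of the original code) ends with `TimeDistributed(Dense(VOCAB_SIZE, activation="softmax"))` and trains with `categorical_crossentropy`.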

Do I still need to do anything extra? If not, why can't I get this to work?

Please let me know if any part is unclear and I will explain.

Thanks for your help :)


1 Reply

深蓝 · Answer 1 · 2021-10-23T21:43:05+0000

You will need to reshape your data in the following way:

Samples. One sequence is one sample. A batch consists of one or more samples.
Time Steps. One time step is one point of observation in the sample.
Features. One feature is one observation at a time step.

(samples, time_steps, features)

Then your model should look like the following (simplified version):

from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

# define encoder
visible = Input(shape=(time_steps, features))
encoder = LSTM(100, activation='relu')(visible)
# define reconstruction decoder
decoder = RepeatVector(time_steps)(encoder)
decoder = LSTM(100, activation='relu', return_sequences=True)(decoder)
decoder = TimeDistributed(Dense(features))(decoder)
model = Model(visible, decoder)

Check this tutorial; it should be helpful for your case.

That said, you might only need to expand the dimensions of the array, adding a trailing features axis of size 1.
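A minimal sketch of that reshape with NumPy, using a small hypothetical stand-in for the (87716, 200) array of word ids. It produces the (samples, time_steps, features) layout with features = 1, which matches `Input(shape=(time_steps, features))` in the model above. Note that this feeds each raw word id in as a single number, so the network sees ids as ordered magnitudes; that is the trade-off against the much larger one-hot layout.

```python
import numpy as np

# Hypothetical stand-in for the word-id array: 3 samples, 5 time steps.
sent_wids = np.arange(15).reshape(3, 5)

# Add a trailing features axis of size 1:
# (samples, time_steps) -> (samples, time_steps, 1)
X = np.expand_dims(sent_wids, axis=-1).astype(np.float32)
print(X.shape)  # (3, 5, 1)

# An equivalent explicit reshape.
X_reshaped = sent_wids.reshape(sent_wids.shape[0], sent_wids.shape[1], 1).astype(np.float32)
print(np.array_equal(X, X_reshaped))  # True

# These are the values to pass as time_steps and features to the model.
time_steps, features = X.shape[1], X.shape[2]
print(time_steps, features)  # 5 1
```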

Check this out as well; it might clear things up.

Hope the above is helpful.
