
python - How to combine Topic Prediction and Sequence tagging (NER) in Keras (architecture advice)?

I'm currently using Keras to solve two problems separately, with the same input for both:

  1. Sequence Tagging (NER)
  2. Topic Classification

I have a dataset of sentences drawn from different topics, and each topic has its own set of named entities.

Given a new sentence, I would like the model to predict both the topic and the corresponding named entities in the sentence.

E.g.

Sentence:

"Coronavirus [ Disease ] has hit the UK [ Country ] hard, with the country recording more than 3m [ Total Cases ] cases and 90,000 [ Death Count ] deaths linked to the disease."

Topic: 
Medicine

Entities:
{
   'Coronavirus' : 'Disease',
   'UK' : 'Country',
   '3M' : 'Total_Cases',
   '90,000' : 'Death_Count'
}

Assuming I have roughly 15 topics, what architecture would solve both problems in one model?

Inputs:

  • tokenized sentence (BERT tokenizer): 512 tokens per sentence

Outputs per Sentence:

  • 512 tags (100 classes altogether)
  • sentence topic: 15 categories

Data size: ~70,000 sentences with topics. The architectures I currently use for each problem independently:

Sequence Tagging

from keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from keras.models import Model
from keras_contrib.layers import CRF  # assuming keras_contrib's CRF, which exposes crf.loss_function / crf.accuracy

inputter = Input(shape=(512,))  # max no. of BERT tokens per sentence
model = Embedding(input_dim=X_tr.max() + 1,  # input_dim = vocab size
                  output_dim=20,
                  input_length=512)(inputter)
model = Bidirectional(LSTM(units=50, return_sequences=True,
                           recurrent_dropout=0.1))(model)
model = TimeDistributed(Dense(50, activation="tanh"))(model)
crf = CRF(y_tr.shape[-1])  # no. of tag labels
out = crf(model)  # per-token output
model = Model(inputter, out)
model.compile(optimizer="adam", loss=crf.loss_function, metrics=[crf.accuracy])

Topic Classification

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

model = Sequential()
model.add(Embedding(X_tr.max() + 1, 20, input_length=512))
model.add(Bidirectional(LSTM(50, activation='tanh', recurrent_dropout=0.2)))
model.add(Dense(36))
model.add(Dense(15, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

How can I combine the many-to-many (sequence tagging) architecture with the many-to-one (topic classification) architecture?

The model should take the same input and produce both outputs (the topic and the named entities), ideally with some cross connections between the two tasks so that the sequence tagger can make a better decision.

I'm using Keras with the TensorFlow backend.
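
To make the question more concrete, here is a rough, untested sketch of the kind of shared-encoder, two-output model I have in mind (the keras_contrib CRF, the max-pooling for the topic head, the topic-to-tagger connection, and names like n_tags and topic_per_token are all my own assumptions, not a known solution):

from keras.layers import (Input, Embedding, Bidirectional, LSTM, TimeDistributed,
                          Dense, GlobalMaxPooling1D, RepeatVector, Concatenate)
from keras.models import Model
from keras_contrib.layers import CRF  # same CRF as in the tagging model above

n_tags = 100    # tag classes
n_topics = 15   # topic classes

tokens = Input(shape=(512,), name="tokens")
emb = Embedding(input_dim=X_tr.max() + 1, output_dim=20, input_length=512)(tokens)

# shared encoder feeding both heads
shared = Bidirectional(LSTM(units=50, return_sequences=True,
                            recurrent_dropout=0.1))(emb)

# many-to-one head: pool the sequence into one vector, then softmax over topics
pooled = GlobalMaxPooling1D()(shared)
topic_out = Dense(n_topics, activation="softmax", name="topic")(pooled)

# many-to-many head: broadcast the predicted topic back onto every timestep
# (one possible "cross connection"), then tag each token with a CRF
topic_per_token = RepeatVector(512)(topic_out)
tag_features = Concatenate()([shared, topic_per_token])
tag_features = TimeDistributed(Dense(50, activation="tanh"))(tag_features)
crf = CRF(n_tags, name="tags")
tag_out = crf(tag_features)

model = Model(inputs=tokens, outputs=[tag_out, topic_out])
model.compile(optimizer="adam",
              loss={"tags": crf.loss_function, "topic": "categorical_crossentropy"},
              loss_weights={"tags": 1.0, "topic": 0.5},
              metrics={"tags": [crf.accuracy], "topic": ["accuracy"]})

Training would then pass both label sets, e.g. model.fit(X_tr, {"tags": y_tags, "topic": y_topics}, ...); the loss weights and the topic-to-tagger connection are just starting guesses for balancing the two tasks.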

Question from: https://stackoverflow.com/questions/65888039/how-to-combine-topic-prediction-and-sequence-tagging-ner-in-keras-architectur


1 Reply

Waiting for answers.
