parsing - Finding meaningful sub-sentences from a sentence

Question

Welcome To Ask or Share your Answers For Others

parsing - Finding meaningful sub-sentences from a sentence

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

parsing - Finding meaningful sub-sentences from a sentence

Is there a way to to find all the sub-sentences of a sentence that still are meaningful and contain at least one subject, verb, and a predicate/object?

For example, if we have a sentence like "I am going to do a seminar on NLP at SXSW in Austin next month". We can extract the following meaningful sub-sentences from this sentence: "I am going to do a seminar", "I am going to do a seminar on NLP", "I am going to do a seminar on NLP at SXSW", "I am going to do a seminar at SXSW", "I am going to do a seminar in Austin", "I am going to do a seminar on NLP next month", etc.

Please note that there is no deduced sentences here (e.g. "There will be a NLP seminar at SXSW next month". Although this is true, we don't need this as part of this problem.) . All generated sentences are strictly part of the given sentence.

How can we approach solving this problem? I was thinking of creating annotated training data that has a set of legal sub-sentences for each sentence in the training data set. And then write some supervised learning algorithm(s) to generate a model.

I am quite new to NLP and Machine Learning, so it would be great if you guys could suggest some ways to solve this problem.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:30:38+0000

You can use dependency parser provided by Stanford CoreNLP. Collapsed output of your sentence will look like below.

nsubj(going-3, I-1)
xsubj(do-5, I-1)
aux(going-3, am-2)
root(ROOT-0, going-3)
aux(do-5, to-4)
xcomp(going-3, do-5)
det(seminar-7, a-6)
dobj(do-5, seminar-7)
prep_on(seminar-7, NLP-9)
prep_at(do-5, -11)
prep_in(do-5, Austin-13)
amod(month-15, next-14)
tmod(do-5, month-15)

The last 5 of your sentence output are optional. You can remove one or more parts that are not essential to your sentence.
Most of this optional parts are belong to prepositional and modifier e.g : prep_in, prep_do, advmod, tmod, etc. See Stanford Dependency Manual.

For example, if you remove all modifier from the output, you will get

I am going to do a seminar on NLP at SXSW in Austin.

Categories

parsing - Finding meaningful sub-sentences from a sentence

parsing - Finding meaningful sub-sentences from a sentence

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags