I'm trying to create a small English-like language for specifying tasks. The basic idea is to split a statement into verbs and the noun phrases those verbs should apply to. I'm working with NLTK, but I'm not getting the results I'd hoped for, e.g.:
>>> nltk.pos_tag(nltk.word_tokenize("select the files and copy to harddrive'"))
[('select', 'NN'), ('the', 'DT'), ('files', 'NNS'), ('and', 'CC'), ('copy', 'VB'), ('to', 'TO'), ("harddrive'", 'NNP')]
>>> nltk.pos_tag(nltk.word_tokenize("move the files to harddrive'"))
[('move', 'NN'), ('the', 'DT'), ('files', 'NNS'), ('to', 'TO'), ("harddrive'", 'NNP')]
>>> nltk.pos_tag(nltk.word_tokenize("copy the files to harddrive'"))
[('copy', 'NN'), ('the', 'DT'), ('files', 'NNS'), ('to', 'TO'), ("harddrive'", 'NNP')]
In each case it has failed to realise that the first word (select, move, and copy respectively) was intended as a verb. I know I can create custom taggers and grammars to work around this, but at the same time I'm hesitant to go reinventing the wheel when a lot of this stuff is out of my league. I would particularly prefer a solution where non-English languages could be handled as well.
So anyway, my question is one of:
Is there a better tagger for this type of grammar?
Is there a way to weight an existing tagger towards using the verb form more frequently than the noun form?
Is there a way to train a tagger?
Is there a better way altogether?
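For what it's worth, the crude workaround I have in mind (and would rather not hand-roll for every language) is just to override the tag of known command words after running the default tagger. A minimal sketch, assuming a hypothetical COMMAND_VERBS set of my own:

    import nltk

    # Hypothetical set of command words I expect at the start of a statement
    COMMAND_VERBS = {"select", "copy", "move", "delete"}

    def tag_command(sentence):
        tokens = nltk.word_tokenize(sentence)
        tagged = nltk.pos_tag(tokens)
        # Force the leading token to a verb tag if it is a known command word
        if tagged and tagged[0][0].lower() in COMMAND_VERBS:
            tagged[0] = (tagged[0][0], "VB")
        return tagged

    print(tag_command("copy the files to harddrive"))

This obviously only patches over the symptom for a fixed word list, which is why I'm asking whether there's a more principled option.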