Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


windows - NLTK v3.2: Unable to nltk.pos_tag()

Hi text mining champions,

I'm using Anaconda with NLTK v3.2 on Windows 10 (a client's environment).

When I try to POS tag, I keep getting a urllib2 error:

URLError: <urlopen error unknown url type: c>

It seems urllib2 is unable to recognize Windows paths. How can I work around this?

The command is as simple as:

nltk.pos_tag(nltk.word_tokenize("Hello World"))

Edit: There is a duplicate question; however, I think the answers given here by manan and alvas are a better fix.



1 Reply


EDITED

This issue was fixed in NLTK v3.2.1. Upgrading NLTK resolves it, e.g. pip install -U nltk.
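To check whether a given install already includes the fix, a version comparison along these lines may help. This is only a sketch: has_pos_tag_fix and version_tuple are hypothetical helper names, and the only fact taken from this answer is that the fix landed in v3.2.1.

```python
import re


def version_tuple(version):
    """Turn a string such as '3.2.1' into (3, 2, 1).

    Only the leading digits of each dot-separated part are used, and
    parsing stops at the first part that has no leading digits.
    """
    parts = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_pos_tag_fix(version):
    """Return True if the version is 3.2.1 or later (hypothetical helper)."""
    return version_tuple(version) >= (3, 2, 1)


print(has_pos_tag_fix("3.2"))
print(has_pos_tag_fix("3.2.1"))
```

With NLTK installed, the string to pass in would be nltk.__version__.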


I faced the same issue, and the error I got was as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\__init__.py", line 110, in pos_tag
    tagger = PerceptronTagger()
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 141, in __init__
    self.load(AP_MODEL_LOC)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py", line 209, in load
    self.model.weights, self.tagdict, self.classes = load(loc)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 801, in load
    opened_resource = _open(resource_url)
  File "C:\Python27\lib\site-packages\nltk-3.2-py2.7.egg\nltk\data.py", line 924, in _open
    return urlopen(resource_url)
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 391, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 414, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1206, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>

The URLError you mentioned was caused by a bug in the perceptron.py file of the NLTK library that shows up on Windows. On my machine, the file is at this location:

C:\Python27\Lib\site-packages\nltk-3.2-py2.7.egg\nltk\tag\perceptron.py

(Look for the equivalent location on your machine, wherever your Python27 folder is.)

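The bug was in the code that locates the averaged_perceptron_tagger model on your machine: the path it found was passed on without a file: prefix, so urllib2 read the drive letter of the C:\ path as a URL scheme (hence "unknown url type: c"). Lines 801 and 924 of data.py, both shown in the traceback, are where this happens.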
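The failure can be reproduced without NLTK. The sketch below uses the Python 3 urllib modules rather than the Python 2.7 urllib2 from the traceback, and MODEL_PATH is a made-up path used only for illustration. It prints the scheme parsed from the path and the reason urlopen gives for rejecting it, which should be "c" and "unknown url type: c".

```python
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical model location; the real path depends on the install.
MODEL_PATH = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"


def open_error(url):
    """Return the URLError reason raised by urlopen(url), or None if it opens."""
    try:
        urlopen(url).close()
    except URLError as err:
        return str(err.reason)
    return None


# Everything before the first colon is parsed as the URL scheme,
# so the drive letter "C" is taken for a scheme.
print(urlparse(MODEL_PATH).scheme)

# No handler is registered for that scheme, so urlopen refuses the "URL".
print(open_error(MODEL_PATH))
```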

I think the NLTK developers recently fixed this bug. Have a look at this commit, made to their code a few days back:

https://github.com/nltk/nltk/commit/d3de14e58215beebdccc7b76c044109f6197d1d9#diff-26b258372e0d13c2543de8dbb1841252

The snippet where the change was made is as follows:

        self.tagdict = {}
        self.classes = set()
        if load:
            AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
            self.load(AP_MODEL_LOC)
            # Initially it was: AP_MODEL_LOC = str(find('taggers/averaged_perceptron_tagger/'+PICKLE))

    def tag(self, tokens):

Updating the file to the most recent commit worked for me, and I was able to use the nltk.pos_tag command. I believe this will resolve your problem as well (assuming you have everything else set up).
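The effect of the added 'file:' prefix can be seen in isolation. This is a minimal sketch in Python 3, not NLTK's own code: with_file_scheme is a hypothetical helper and the path is invented for the example.

```python
from urllib.parse import urlparse


def with_file_scheme(path):
    """Prefix a filesystem path with 'file:' unless it already has one.

    Hypothetical helper; it mirrors the string concatenation in the commit.
    """
    return path if path.startswith("file:") else "file:" + path


# Hypothetical model location; the real path depends on the install.
path = r"C:\nltk_data\taggers\averaged_perceptron_tagger\averaged_perceptron_tagger.pickle"
url = with_file_scheme(path)

# The first colon now follows "file", so the drive letter is no longer
# mistaken for the URL scheme.
print(urlparse(path).scheme)
print(urlparse(url).scheme)
```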

