I have the following code
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
def sanitize(wordList):
answer = [word.translate(None, string.punctuation) for word in wordList]
answer = [lmtzr.lemmatize(word.lower()) for word in answer]
return answer
words = []
for filename in json_list:
words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
for tweet in json.load(open(filename,READ))])))])
I've tested lines 2-4 in a separate testing.py file when I wrote
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
wordList= [''the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word in wordList2]
print answer
freq = nltk.FreqDist(wordList2)
print freq
and the command prompt returns ['the','the','the'], which is what I wanted (removing punctuation).
However, when I put the exact same code in a different file, python returns a TypeError stating that
File "foo.py", line 8, in <module>
for tweet in json.load(open(filename, READ))])))])
File "foo.py", line 2, in sanitize
answer = [word.translate(None, string.punctuation) for word in wordList]
TypeError: translate() takes exactly one argument (2 given)
json_list is a list of all the file paths (I printed and check that this list is valid). I'm confused on this TypeError because everything works perfectly fine when I'm just testing it in a different file.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…