I'm an absolute beginner with Python, and I'm very stuck on this part. I wrote a function to preprocess my text data for topic modeling. The code works perfectly when I run it as standalone statements, but it does not return the expected result when I run it as a function. I would appreciate any help!
- The code I'm using is very basic and probably inefficient, but it's for an introductory class, so basic approaches are what I need!
Code:
    def clean(data):
        data_prep = []
        for data in data:
            tokenized_words = nltk.word_tokenize(data)
            text_words = [token.lower() for token in tokenized_words if token.isalnum()]
            text_words = [word for word in text_words if word not in stop_words]
            text_joined = " ".join(textwords)
            data_prep.append(text_joined)
        return data_prep
The output looks random: single items like "j", ",", "i". My data is a .txt file that I converted from a .csv file.
Edit:
I've adjusted my code based on the mistakes that were pointed out, and it is now:
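One possible explanation for single-character output, sketched under an assumption: if the .txt file was read into one long string (for example with a single read() call), then looping over it visits one character at a time rather than one document at a time. The sample text below is made up for illustration.

```python
# Made-up sample standing in for the contents of the .txt file.
text = "Jam, I like it.\nTopic models are fun."

# Looping over a string visits one character at a time.
as_characters = [piece for piece in text]
print(as_characters[:5])  # ['J', 'a', 'm', ',', ' ']

# Splitting into lines first gives one document per item.
as_rows = text.splitlines()
print(as_rows)  # ['Jam, I like it.', 'Topic models are fun.']
```

If this is what is happening, passing a list of rows to the function instead of the raw string should give one cleaned string per row.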
    def clean(data):
        data_prep = []
        for row in data:
            tokenized_words = nltk.word_tokenize(data)
            text_words = [token.lower() for token in tokenized_words if token.isalnum()]
            text_words = [word for word in text_words if word not in stop_words]
            text_joined = " ".join(text_words)
            data_prep.append(text_joined)
        return data_prep
Result: it now returns tokenized sentences, but they seem to repeat as if in a loop.
What is my mistake this time?
(See the attached image.)
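A likely cause, offered as a sketch rather than a definitive answer: inside the loop the edited code tokenizes `data` (the whole input) instead of `row`, so every pass produces the same full result. The version below tokenizes `row` and returns after the loop. To keep it runnable without NLTK downloads, `simple_tokenize` and the small `stop_words` set are hypothetical stand-ins for `nltk.word_tokenize` and the real stop word list, and it assumes `data` is a list of strings.

```python
import re

# Hypothetical stand-in for the real stop word list.
stop_words = {"the", "is", "a", "and", "i"}

def simple_tokenize(text):
    """Stand-in tokenizer: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def clean(data, tokenize=simple_tokenize):
    """Return one cleaned, space-joined string per row in data."""
    data_prep = []
    for row in data:
        # Tokenize the current row, not the whole input.
        tokenized_words = tokenize(row)
        # Keep alphanumeric tokens only, lowercased.
        text_words = [token.lower() for token in tokenized_words if token.isalnum()]
        # Drop stop words.
        text_words = [word for word in text_words if word not in stop_words]
        data_prep.append(" ".join(text_words))
    # Return after the loop so every row is processed.
    return data_prep

docs = ["The cat is a hunter.", "I like topic models, and NLTK!"]
print(clean(docs))  # ['cat hunter', 'like topic models nltk']
```

With NLTK installed and its tokenizer data downloaded, calling `clean(docs, tokenize=nltk.word_tokenize)` should behave similarly, though the exact tokens may differ.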