If you don't mind installing a new Python library, I suggest you use gensim.
The first tutorial does exactly what you ask:
# documents is a list of raw strings, one per document
# (sample strings taken from the gensim tutorial's corpus)
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
You will then need to create the dictionary for your corpus of documents and build the bag-of-words representation.
from gensim import corpora

dictionary = corpora.Dictionary(texts)
dictionary.save('/tmp/deerwester.dict')  # store the dictionary for future reference
print(dictionary)
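The bag-of-words step is one extra line; a minimal sketch, assuming texts is the tokenized list from above:

# convert each tokenized document into a sparse bag-of-words vector
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)  # each document becomes a list of (token_id, count) pairs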
You can then weight the result with tf-idf and run LDA on top of it quite easily.
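For example, a rough sketch using gensim's models module (num_topics=2 is just an illustrative value, not something from the question):

from gensim import models

# weight the bag-of-words corpus with tf-idf
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

# train a small LDA model on the weighted corpus
lda = models.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)
print(lda.print_topics())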
Have a look at tutorial 1 here.