This is my code:
import re
from whoosh.analysis import RegexAnalyzer

# Match either a single CJK character or a word that may contain inner dots (e.g. 3.141).
rex = RegexAnalyzer(re.compile(ur"([\u4e00-\u9fa5])|(\w+(\.?\w+)*)"))
a = [token.text for token in rex(u"hi 中 000 中文测试中文 there 3.141 big-time under_score")]
self.render_template('index.html', {'a': a})
It shows this on the web page:
[u'hi', u'\u4e2d', u'000', u'\u4e2d', u'\u6587', u'\u6d4b', u'\u8bd5', u'\u4e2d', u'\u6587', u'there', u'3.141', u'big', u'time', u'under_score']
But I want the page to show the Chinese characters themselves, so I changed the line to this:
a = [token.text.encode('utf-8') for token in rex(u"hi 中 000 中文测试中文 there 3.141 big-time under_score")]
and now it shows:
['hi', '\xe4\xb8\xad', '000', '\xe4\xb8\xad', '\xe6\x96\x87', '\xe6\xb5\x8b', '\xe8\xaf\x95', '\xe4\xb8\xad', '\xe6\x96\x87', 'there', '3.141', 'big', 'time', 'under_score']
So how can I get the Chinese characters to display with my code?
Thanks.
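
For reference, a minimal sketch of what may be going on, assuming the template prints the list `a` as a whole. In Python 2, turning a list into text uses the repr of each element, which escapes non-ASCII characters, so both outputs above are escaped forms of the correct characters. Rendering each item on its own (or a joined string) avoids the repr. The sketch uses plain `re` as a stand-in for whoosh's `RegexAnalyzer`, and the names `pattern`, `tokens`, and `joined` are illustrative only.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for the whoosh analyzer (an assumption for this sketch):
# the same pattern as in the question, applied with the standard re module.
pattern = re.compile(u"([\u4e00-\u9fa5])|(\\w+(\\.?\\w+)*)", re.UNICODE)

text = u"hi 中 000 中文测试中文 there 3.141 big-time under_score"

# Each match is one token: a single CJK character or a word.
tokens = [m.group(0) for m in pattern.finditer(text)]

# Join the tokens into one unicode string instead of rendering the list,
# so the output is the text itself rather than the list's repr.
joined = u" ".join(tokens)
print(joined)
```

Passing `joined` to the template, or looping over `a` in the template and printing each item, should show the characters rather than the escape sequences, provided the page is served as UTF-8.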