According to the answer to "urllib2 read to Unicode", I have to read the charset from the content-type header in order to decode the response to Unicode. However, some websites don't specify a "charset".
For example, the ['content-type'] header for this page is just "text/html", with no charset, so the following fails to convert it to Unicode:
encoding=urlResponse.headers['content-type'].split('charset=')[-1]
htmlSource = unicode(htmlSource, encoding)
TypeError: 'int' object is not callable
Is there a default encoding (for English pages, of course) that I can fall back on when no charset is found?
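To make the fallback idea concrete, here is a minimal sketch, not a definitive answer. The helper names `charset_from_content_type` and `decode_body` are hypothetical. The ISO-8859-1 default is an assumption, based on the default that HTTP/1.1 (RFC 2616) historically assigned to text/* responses sent without a charset. Many pages are actually UTF-8, so another default may suit better.

```python
def charset_from_content_type(content_type, default="ISO-8859-1"):
    """Return the charset named in a Content-Type value, or `default`.

    Hypothetical helper: only parameters after the first ';' are inspected,
    so a bare "text/html" is never mistaken for an encoding name.
    """
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        value = value.strip().strip("\"'")
        if name.strip().lower() == "charset" and value:
            return value
    return default


def decode_body(raw_bytes, content_type, default="ISO-8859-1"):
    """Decode the response bytes, falling back to `default` (an assumption)."""
    encoding = charset_from_content_type(content_type, default)
    try:
        return raw_bytes.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: use the fallback and
        # replace anything that still cannot be decoded.
        return raw_bytes.decode(default, "replace")
```

With the names from the question, this would be called as `decode_body(htmlSource, urlResponse.headers['content-type'])`. Because ISO-8859-1 maps every byte to a character, the fallback never raises, but it can produce wrong characters if the page is really in another encoding.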