I received a URL from BeautifulSoup: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions
url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'
I want to feed it back into urllib2.urlopen:
import urllib2
source = urllib2.urlopen(url).read()
The error I get:
UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence
Thus, I tried:
source = urllib2.urlopen(url.encode("utf-8")).read()
This returns page source, but it differs from what the original URL returns:
originalUrl = 'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions'
originalSource = urllib2.urlopen(originalUrl).read()
originalSource == source
The result is False. Is there a way to fix this URL? How can I convert u'\xae' back into the original ®?
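One thing that might be worth trying is percent-encoding the non-ASCII character instead of sending raw bytes. This is only a minimal sketch: `percent_encode_url` is a hypothetical helper name, and it assumes the server expects the path as UTF-8 bytes in percent-encoded form, which the question does not confirm. The import fallback is there so the same snippet runs on Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def percent_encode_url(url):
    """Hypothetical helper: percent-encode non-ASCII characters in a URL.

    Assumption: the server wants the characters as UTF-8 bytes.
    """
    # Encode to UTF-8 bytes first, then escape every byte that is not
    # URL-safe. The `safe` set leaves URL delimiters and any existing
    # %-escapes untouched.
    return quote(url.encode("utf-8"), safe="/:?=&%#")


url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'
fixed = percent_encode_url(url)
print(fixed)
# https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp%C2%AE-75-desktop-virtualization-solutions
```

The resulting `fixed` string is plain ASCII, so it can be passed to urllib2.urlopen without triggering the codec error. Whether it returns the same page as the original URL still depends on how the server decodes the path.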