Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
905 views
in Technique[技术] by (71.8m points)

python - Unknown url type error in urllib2

I have searched a lot of similar question on SO, but did not find an exact match to my case.

I am trying to download a video using python 2.7

Here is my code for downloading the video

import urllib2
from bs4 import BeautifulSoup as bs


with open('video.txt','r') as f:
    last_downloaded_video = f.read()

webpage = urllib2.urlopen('http://*.net/watch/**-'+last_downloaded_video)

soup = bs(webpage)
a = []
for link in soup.find_all('a'):
    if link.has_attr('data-video-id'):
        a.append(link)

#try just with first data-video-id

id = a[0]['data-video-id']
webpage2 = urllib2.urlopen('http://*/video/play/'+id)
soup = bs(webpage2)
string = str(soup.find_all('script')[2])
print string
url = string.split(': ')[1].split(',')[0]
url = url.replace('"','')
print url
print type(url)

video = urllib2.urlopen(url).read()
filename = "video.mp4"
with open(filename,'wb') as f:
    f.write(video)

This code gives an unknown url type error. The traceback is

Traceback (most recent call last):
  File "naruto.py", line 26, in <module>
    video = urllib2.urlopen(url).read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 427, in _open
    'unknown_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: 'http>

However, when i store the same url in a variable and attempt to download it from terminal, no error is shown. I am confused as to what the problem is. I got a similar question in python mailing list

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It's hard to tell without seeing the HTML from the page that you are scraping, however, a stray ' (single quote) character at the beginning of the URL might be the cause - this causes the same exception:

>>> import urllib2
>>> urllib2.urlopen("'http://blah.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "urllib2.py", line 404, in open
    response = self._open(req, data)
  File "urllib2.py", line 427, in _open
    'unknown_open', req)
  File "urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "urllib2.py", line 1249, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: 'http>

So, try cleaning up your URL and remove any stray quotes.

Update after OP feedback:

The results of the print statement indicate that the URL has a single quote character at the beginning and end of the URL string. There should not any quotes of any type surrounding the URL when it is passed to urlopen(). You can remove leading and trailing quotes (both single and double) from the URL string with this:

url = url.strip(''"')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...