Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - httplib.BadStatusLine: ''

As always, I frequently have issues, and I have thoroughly searched for an answer to the current one but find myself at a loss. Here are some of the places I have searched:

- How to fix httplib.BadStatusLine exception?
- Python httplib2 Handling Exceptions
- python http status code

My issue is the following. I have created a spider and want to crawl two different URLs. When I crawl each URL independently, everything works fine. However, when I try to crawl both, I get the following error: httplib.BadStatusLine: ''

I have followed some advice that I read (see the links mentioned above). Printing response.status works for each request, but response.url is never printed and the error is thrown. (I only print both values to try to identify the source of the error.)

I hope that this is clear.

I am using Scrapy and Selenium:

from scrapy import Spider
from selenium import webdriver


class PeoplePage(Spider):
    name = "peopleProfile"
    allowed_domains = ["blah.com"]
    # Let both 200 and 404 responses reach parse()
    handle_httpstatus_list = [200, 404]
    start_urls = [
        "url1",
        "url2"
    ]

    def __init__(self):
        # One Firefox instance, shared by every call to parse()
        self.driver = webdriver.Firefox()

    def parse(self, response):
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF

        # Runs at the end of every parse() call, not only the last one
        self.driver.close()


1 Reply


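According to the Python documentation, httplib.BadStatusLine is raised if a server responds with an HTTP status code that httplib does not understand. The empty string ('') in your message means the status line itself was empty, which usually indicates that the other end closed the connection without sending a response. You can catch and ignore this exception. You should also not close your driver inside parse if you are going to load more than one URL, because the same driver is reused for every response.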
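
To see what an empty status line looks like in isolation, here is a minimal sketch that does not involve Scrapy or Selenium at all. It assumes a local throwaway socket server that reads one request and then hangs up without answering; the helper names `serve_one_silent_connection` and `request_against_silent_server` are made up for this illustration. The import fallback is there because httplib was renamed to http.client in Python 3.

```python
import socket
import threading

try:
    import httplib  # Python 2, as in the question
except ImportError:
    import http.client as httplib  # Python 3 name for the same module


def serve_one_silent_connection(listener):
    """Accept one client, read its request, then close without replying."""
    conn, _ = listener.accept()
    data = b""
    while b"\r\n\r\n" not in data:  # read until the end of the request headers
        chunk = conn.recv(4096)
        if not chunk:
            break
        data += chunk
    conn.close()  # no status line is ever sent


def request_against_silent_server():
    """Return the exception raised when the peer sends no status line."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_one_silent_connection,
                              args=(listener,))
    worker.daemon = True
    worker.start()

    client = httplib.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # fails: there is no status line to parse
    except httplib.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join(5)
        listener.close()
    return None
```

Calling `request_against_silent_server()` returns the caught exception. On Python 2 it is a plain httplib.BadStatusLine; on Python 3.5 and later it is http.client.RemoteDisconnected, which is a subclass of BadStatusLine, so the same except clause catches both.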

Try this:

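import httplib


def parse(self, response):
    try:
        print response.status
        print '???????????????????????????????????'
        if response.status == 200:
            self.driver.implicitly_wait(5)
            self.driver.get(response.url)
            print response.url
            print '!!!!!!!!!!!!!!!!!!!!'

            # DO STUFF
    except httplib.BadStatusLine:
        # Ignore the error so the crawl can move on to the next URL
        pass
    # Note: no self.driver.close() here, the driver stays open for the next URL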
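
If you leave the driver open in parse, it still has to be shut down somewhere. One option is the spider's closed(reason) method, which Scrapy calls once when the spider finishes. The sketch below only illustrates that lifecycle: `FakeDriver`, `FakeResponse` and `PeoplePageSketch` are hypothetical stand-ins (for webdriver.Firefox(), a Scrapy response, and your spider) so that it runs without a browser. In the real spider you would subclass scrapy.Spider and call quit() on the real driver.

```python
class FakeDriver(object):
    """Hypothetical stand-in for selenium's webdriver.Firefox()."""

    def __init__(self):
        self.visited = []
        self.alive = True

    def get(self, url):
        if not self.alive:
            raise RuntimeError("driver used after it was shut down")
        self.visited.append(url)

    def quit(self):
        self.alive = False


class FakeResponse(object):
    """Hypothetical stand-in for a Scrapy response (status and url only)."""

    def __init__(self, url, status=200):
        self.url = url
        self.status = status


class PeoplePageSketch(object):
    """Mirrors the spider's structure; a real one would subclass scrapy.Spider."""

    def __init__(self, driver):
        self.driver = driver  # created once, shared by every parse() call

    def parse(self, response):
        if response.status == 200:
            self.driver.get(response.url)
            # DO STUFF
        # No close() here: the next URL still needs the driver.

    def closed(self, reason):
        # Scrapy calls closed(reason) once, when the spider finishes.
        self.driver.quit()


# Simulate what Scrapy would do: two responses, then the close hook.
driver = FakeDriver()
spider = PeoplePageSketch(driver)
for url in ["url1", "url2"]:
    spider.parse(FakeResponse(url))
spider.closed("finished")
```

With the shutdown moved out of parse, both URLs go through the same driver, and it is released only after the last response has been handled.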


...