I'm going through a set of pages and I'm not certain how many there are, but the current page is represented by a simple number present in the url (e.g. "http://www.website.com/page/1")
I would like to use a for loop in scrapy to increment the current guess at the page and stop when it reaches a 404. I know the response that is returned from the request contains this information, but I'm not sure how to automatically get a response from a request.
Any ideas on how to do this?
Currently my code is something along the lines of :
def start_requests(self):
baseUrl = "http://website.com/page/"
currentPage = 0
stillExists = True
while(stillExists):
currentUrl = baseUrl + str(currentPage)
test = Request(currentUrl)
if test.response.status != 404: #This is what I'm not sure of
yield test
currentPage += 1
else:
stillExists = False
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…