I'm working with the CrawlSpider class to crawl a website and I would like to modify the headers that are sent in each request. Specifically, I would like to add the referer to the request.
As per this question, I checked response.request.headers.get('Referer', None) in my response parsing function, and the Referer header is not present. I assume that means the Referer is not being submitted in the request (unless the website doesn't return it; I'm not sure about that).
I haven't been able to figure out how to modify the headers of a request. Again, my spider is derived from CrawlSpider. Overriding CrawlSpider's _requests_to_follow or specifying a process_request callback for a rule will not work because the referer is not in scope at those times.
Does anyone know how to modify request headers dynamically?
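For illustration, here is a minimal sketch of one possible approach. It assumes Scrapy 2.0 or later, where a rule's process_request callable receives both the request and the response the link was extracted from, so the referring URL is in scope. The function name add_referer and the wiring shown in the comment are hypothetical, not taken from my spider.

```python
# Sketch only: assumes Scrapy >= 2.0, where a Rule's process_request
# callable is invoked as process_request(request, response).
def add_referer(request, response):
    """Set the Referer header to the URL of the page the link was found on."""
    request.headers["Referer"] = response.url
    # The request must be returned; returning None would drop it.
    return request


# Hypothetical wiring inside a CrawlSpider subclass (names are illustrative):
#
#   rules = (
#       Rule(LinkExtractor(), callback="parse_item", follow=True,
#            process_request=add_referer),
#   )
```

Scrapy also ships a RefererMiddleware spider middleware that normally fills in this header for followed links, so it may be worth checking that the REFERER_ENABLED setting has not been turned off. Requests generated from start_urls have no referring page, so the header would be absent on those responses either way.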