I am trying to scrape data from the URLs below, but Selenium fails at driver.get(url). Sometimes the error is [Errno 104] Connection reset by peer, sometimes [Errno 111] Connection refused. On rare days it works just fine, and on my Mac with a real browser the same spider works every single time, so this doesn't appear to be a problem with the spider itself.
I have tried many solutions, such as waiting for selectors on the page, implicit waits, and using selenium-requests to pass proper request headers, but nothing seems to work.
http://www.snapdeal.com/offers/deal-of-the-day
https://paytm.com/shop/g/paytm-home/exclusive-discount-deals
I am using Python, Selenium and a headless Firefox webdriver to achieve this. The OS is CentOS 6.5.
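To make the failure mode concrete, here is a minimal sketch of a retry wrapper around driver.get(url). It is not taken from my spider: the name get_with_retry and its parameters (attempts, delay, retry_on) are hypothetical, and it assumes the connection errors surface as OSError subclasses (ConnectionResetError for errno 104, ConnectionRefusedError for errno 111 on Linux). Selenium may instead wrap them in its own exception types, in which case retry_on would need to be adjusted.

```python
import time


def get_with_retry(driver, url, attempts=3, delay=2.0, retry_on=(OSError,)):
    """Call driver.get(url), retrying on connection-level errors.

    Hypothetical helper: `driver` is any object exposing .get(url),
    for example a selenium WebDriver instance. Returns the number of
    the attempt that succeeded; re-raises the last error if all fail.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            driver.get(url)
            return attempt
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Linear back-off before the next try.
                time.sleep(delay * attempt)
    raise last_error
```

A wrapper like this only masks intermittent resets; it would not explain why the connection is refused on this machine in the first place.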
Note: I have many AJAX-heavy pages that get scraped successfully; some are below.
http://www.infibeam.com/deal-of-the-day.html, http://www.amazon.in/gp/goldbox/ref=nav_topnav_deals
I have already spent many days trying to debug the issue with no luck. Any help would be appreciated.