I use Capybara with Selenium and ChromeDriver to scrape a web page.
The main goal of each request is to download a CSV file.
Each request takes about 15-20 seconds.
When I run 2 requests at the same time, it works well.
With 3, 4, 5 or more parallel requests it fails; it looks like the file is not being downloaded.
What's wrong here?
Here's my config.
Thanks!
require 'csv'
require 'capybara'
require 'capybara/dsl'
require 'selenium-webdriver'

class Scraper
  include Capybara::DSL

  Capybara.register_driver :selenium do |app|
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-gpu')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--window-size=1920,1268')
    # Save downloads to DownloadHelpers.getpath without prompting.
    options.add_preference(:download, directory_upgrade: true,
                                      prompt_for_download: false,
                                      default_directory: DownloadHelpers.getpath)
    options.add_preference(:browser, set_download_behavior: { behavior: 'allow' })

    driver = Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)

    # Ask Chrome, via the DevTools Page.setDownloadBehavior command,
    # to allow downloads into the same directory.
    bridge = driver.browser.send(:bridge)
    path = "/session/#{bridge.session_id}/chromium/send_command"
    bridge.http.call(:post, path,
                     cmd: 'Page.setDownloadBehavior',
                     params: {
                       behavior: 'allow',
                       downloadPath: DownloadHelpers.getpath
                     })
    driver
  end

  Capybara.default_driver = :selenium
  Capybara.javascript_driver = :selenium
end
UPDATE
How I run the tasks: through a rake task.
The scraper is located inside the Rails lib folder.
Each request is made via a rake task that initializes the Rails environment and runs the scraper script.
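Since each rake task is a separate process, one thing that may be worth ruling out is several processes downloading into the same directory at once. This is only a guess: the question does not show what DownloadHelpers.getpath returns, and nothing here confirms that a shared directory is the cause. The sketch below gives each run its own temporary directory and waits until Chrome has finished writing the file. The module name IsolatedDownloads and its methods are hypothetical and not part of the original config; the .crdownload suffix is the one Chrome uses for in-progress downloads.

```ruby
require 'tmpdir'
require 'timeout'

# Hypothetical helper (not from the original config): isolates downloads
# per process and waits for a finished file to appear.
module IsolatedDownloads
  # Chrome names in-progress downloads with this suffix.
  PARTIAL_SUFFIX = '.crdownload'

  # Create a fresh directory for one scraper run so that parallel
  # runs never write into the same place.
  def self.create_dir(prefix = 'scraper-download')
    Dir.mktmpdir("#{prefix}-#{Process.pid}-")
  end

  # Full paths of files in dir that are not partial downloads.
  def self.finished_files(dir)
    Dir.children(dir)
       .reject { |name| name.end_with?(PARTIAL_SUFFIX) }
       .sort
       .map { |name| File.join(dir, name) }
  end

  # Poll dir until a finished file shows up and return its path.
  # Raises Timeout::Error if nothing appears within timeout seconds.
  def self.wait_for_file(dir, timeout: 60, interval: 0.2)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    loop do
      files = finished_files(dir)
      return files.first unless files.empty?

      if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
        raise Timeout::Error, "no finished download in #{dir} after #{timeout}s"
      end

      sleep interval
    end
  end
end
```

With something like this, the directory returned by IsolatedDownloads.create_dir would be passed both as default_directory and as downloadPath when the driver is registered, and the scraper would call IsolatedDownloads.wait_for_file(dir) after triggering the download instead of assuming the file is already there.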
question from:
https://stackoverflow.com/questions/65832416/capybara-selenium-web-scraping-parallel-requests-fail