
Capybara + Selenium + Web scraping - parallel requests fail

I use Capybara with the Selenium ChromeDriver to scrape a web page. The main goal of each request is to download a CSV file. A request takes about 15-20 seconds.

When I run 2 requests at the same time, it works well. With 3, 4, 5, etc. parallel requests it fails - it looks like the file is not being downloaded.

What's wrong here? Here's my config.

Thanks!

require 'csv'
require 'capybara'
require 'capybara/dsl'

class Scraper
  include Capybara::DSL

  Capybara.register_driver :selenium do |app|
    options = Selenium::WebDriver::Chrome::Options.new

    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-gpu')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--window-size=1920,1268')

    # Point Chrome's download preferences at the scraper's download directory
    # and suppress the "save as" prompt.
    options.add_preference(:download, directory_upgrade: true,
                                      prompt_for_download: false,
                                      default_directory: DownloadHelpers.getpath)

    options.add_preference(:browser, set_download_behavior: { behavior: 'allow' })

    driver = Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)

    # Headless Chrome blocks downloads unless told otherwise, so send the
    # DevTools Page.setDownloadBehavior command directly over the bridge.
    bridge = driver.browser.send(:bridge)

    path = '/session/:session_id/chromium/send_command'
    path[':session_id'] = bridge.session_id

    bridge.http.call(:post, path,
      cmd: 'Page.setDownloadBehavior',
      params: {
        behavior: 'allow',
        downloadPath: DownloadHelpers.getpath
      }
    )

    driver
  end

  Capybara.default_driver = :selenium
  Capybara.javascript_driver = :selenium
end
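The config references DownloadHelpers.getpath, which isn't shown in the question. A minimal sketch of what such a helper might look like, assuming all it has to do is expose (and create) the download directory:

require 'fileutils'

# Hypothetical helper, not part of the original question.
module DownloadHelpers
  PATH = File.join(Dir.pwd, 'tmp', 'downloads').freeze

  def self.getpath
    FileUtils.mkdir_p(PATH) # make sure the directory exists before Chrome writes to it
    PATH
  end
end

If several scraper processes share one directory like this, identically named downloads can overwrite each other, so a per-process subdirectory may be worth considering when running requests in parallel.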

UPDATE

How I run the tasks: through rake tasks.

The scraper is located inside the Rails lib folder. Each request is invoked via a rake task that initializes the Rails environment and runs the scraper script.
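For context, a minimal sketch of what such a rake task might look like (the file path, task name, and Scraper entry point are assumptions, not from the question):

# lib/tasks/scrape.rake (hypothetical path)
namespace :scraper do
  desc 'Run one scraping request and download the CSV'
  task run: :environment do # :environment boots the Rails app before the task body runs
    Scraper.new.call        # hypothetical entry point; the question does not show how the scraper is invoked
  end
end

Several of these tasks would then be started in parallel, e.g. bundle exec rake scraper:run from multiple shells.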

Question from: https://stackoverflow.com/questions/65832416/capybara-selenium-web-scraping-parallel-requests-fail


1 Reply


Since you're running each task in a separate process, there should be no conflict between them. That leads to the assumption that you're running into a resource (memory or CPU) limitation when trying to open multiple Rails apps and browser instances at once. These could be causing the browser to fail to open, or they could just be slowing things down enough that you see flaky/failing behavior because your scripts aren't written to handle that cleanly.
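One way to make the script tolerate slow runs, in the spirit of that last point, is to wait explicitly for the CSV to finish downloading rather than assuming it completes within a fixed time. A minimal sketch, assuming the file lands in the directory returned by DownloadHelpers.getpath (the wait_for_download helper and its timeout are illustrative, not from the original posts):

# Poll the download directory until a finished CSV appears or the timeout hits.
def wait_for_download(dir, timeout: 60)
  deadline = Time.now + timeout
  loop do
    finished    = Dir.glob(File.join(dir, '*.csv'))
    in_progress = Dir.glob(File.join(dir, '*.crdownload')) # Chrome's partial-download files
    return finished.first if finished.any? && in_progress.empty?
    raise 'Download timed out' if Time.now > deadline
    sleep 0.5
  end
end

Calling this after triggering the download gives each request up to the timeout to finish, instead of failing as soon as the machine is under load.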

