
Capybara + Selenium + Web scraping - parallel requests fail

I use Capybara with the Selenium ChromeDriver to scrape a web page. The main goal of each request is to download a CSV file. A request takes about 15-20 seconds.

When I run 2 requests at the same time, it works well. With 3, 4, 5, etc. parallel requests it fails - it looks like the file is not being downloaded.

What's wrong here? Here's my config.

Thanks!

require 'csv'
require 'capybara'
require 'capybara/dsl'

class Scraper
  include Capybara::DSL

  Capybara.register_driver :selenium do |app|
    options = Selenium::WebDriver::Chrome::Options.new

    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-gpu')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--window-size=1920,1268')

    # Point Chrome's download preferences at the scraper's download directory
    # and suppress the "save as" prompt.
    options.add_preference(:download, directory_upgrade: true,
                                      prompt_for_download: false,
                                      default_directory: DownloadHelpers.getpath)

    options.add_preference(:browser, set_download_behavior: { behavior: 'allow' })

    driver = Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)

    # Headless Chrome blocks downloads unless told otherwise, so send the
    # DevTools Page.setDownloadBehavior command directly over the bridge.
    bridge = driver.browser.send(:bridge)

    path = '/session/:session_id/chromium/send_command'
    path[':session_id'] = bridge.session_id

    bridge.http.call(:post, path,
      cmd: 'Page.setDownloadBehavior',
      params: {
        behavior: 'allow',
        downloadPath: DownloadHelpers.getpath
      }
    )

    driver
  end

  Capybara.default_driver = :selenium
  Capybara.javascript_driver = :selenium
end
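The config references DownloadHelpers.getpath, which isn't shown in the question. A minimal sketch of what such a helper might look like, assuming all it has to do is expose (and create) the download directory:

require 'fileutils'

# Hypothetical helper, not part of the original question.
module DownloadHelpers
  PATH = File.join(Dir.pwd, 'tmp', 'downloads').freeze

  def self.getpath
    FileUtils.mkdir_p(PATH) # make sure the directory exists before Chrome writes to it
    PATH
  end
end

If several scraper processes share one directory like this, identically named downloads can overwrite each other, so a per-process subdirectory may be worth considering when running requests in parallel.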

UPDATE

How I run the tasks: through rake tasks.

The scraper is located inside the Rails lib folder. Each request is invoked via a rake task that initializes the Rails environment and runs the scraper script.
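For context, a minimal sketch of what such a rake task might look like (the file path, task name, and Scraper entry point are assumptions, not from the question):

# lib/tasks/scrape.rake (hypothetical path)
namespace :scraper do
  desc 'Run one scraping request and download the CSV'
  task run: :environment do # :environment boots the Rails app before the task body runs
    Scraper.new.call        # hypothetical entry point; the question does not show how the scraper is invoked
  end
end

Several of these tasks would then be started in parallel, e.g. bundle exec rake scraper:run from multiple shells.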

Question from: https://stackoverflow.com/questions/65832416/capybara-selenium-web-scraping-parallel-requests-fail


1 Reply


Since you're running each task in a separate process, there should be no conflict between them. That leads to the assumption that you're running into a resource (memory or CPU) limitation when trying to open multiple Rails apps and browser instances at once. These could be causing the browser to fail to open, or they could just be slowing things down enough that you see flaky/failing behavior because your scripts aren't written to handle that cleanly.
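One way to make the script tolerate slow runs, in the spirit of that last point, is to wait explicitly for the CSV to finish downloading rather than assuming it completes within a fixed time. A minimal sketch, assuming the file lands in the directory returned by DownloadHelpers.getpath (the wait_for_download helper and its timeout are illustrative, not from the original posts):

# Poll the download directory until a finished CSV appears or the timeout hits.
def wait_for_download(dir, timeout: 60)
  deadline = Time.now + timeout
  loop do
    finished    = Dir.glob(File.join(dir, '*.csv'))
    in_progress = Dir.glob(File.join(dir, '*.crdownload')) # Chrome's partial-download files
    return finished.first if finished.any? && in_progress.empty?
    raise 'Download timed out' if Time.now > deadline
    sleep 0.5
  end
end

Calling this after triggering the download gives each request up to the timeout to finish, instead of failing as soon as the machine is under load.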

