python - How can I keep a PyQt5 stream open to catch dojo/domReady! JS execution?

Posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am using the example code below to scrape a website. The problem is that the site runs JavaScript behind "dojo/domReady!", so the code below finishes and grabs the HTML before the remaining page content has been adjusted and finalized.

Can anybody help me adjust the code below so that it waits 10 seconds after the page loads before grabbing the HTML as it exists at that point? The delay is arbitrary; I just want to give the remaining content time to render after the initial page load.

Example:

import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Page(QWebEnginePage):
    def __init__(self, url):
        # The QApplication must exist before the page is constructed.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()  # blocks until self.app.quit() is called

    def _on_load_finished(self):
        # toHtml() is asynchronous and returns None; the HTML arrives in the callback.
        # It is requested as soon as the load finishes, which is the problem described above.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        # Store the HTML and stop the event loop so that __init__ can return.
        self.html = html_str
        self.app.quit()


def main():
    page = Page('some_website')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    print(soup)


main()

Question from: https://stackoverflow.com/questions/65713943/how-can-i-keep-a-pyqt5-stream-open-to-catch-dojo-domready-js-execution
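
One possible way to get the "wait 10 seconds after the page loads" behaviour asked for above is to start a single-shot QTimer when loadFinished fires and only call toHtml() when that timer expires. The sketch below is not taken from the original thread. The function name fetch_rendered_html, the delay_ms parameter, and the result dictionary are hypothetical names chosen for illustration. It assumes PyQt5 with the QtWebEngine add-on is installed, and that a fixed delay is long enough for the dojo/domReady! code to finish, which is not guaranteed.

```python
import sys


def fetch_rendered_html(url, delay_ms=10_000):
    """Load url, wait delay_ms after loadFinished, then return the page HTML.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Qt is imported at call time so that importing this module does not
    # require a working Qt installation or display.
    from PyQt5.QtCore import QTimer, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    # Reuse an existing application object if there is one.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {"html": ""}

    def store_html(html_str):
        # Called asynchronously by toHtml() with the current DOM as a string.
        result["html"] = html_str
        app.quit()

    def on_load_finished(ok):
        # Keep the event loop running for delay_ms so that page scripts can
        # continue to modify the DOM, then request the HTML.
        QTimer.singleShot(delay_ms, lambda: page.toHtml(store_html))

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    app.exec_()  # returns after store_html() calls app.quit()
    return result["html"]


# Example call (the URL is a placeholder):
# html = fetch_rendered_html("https://example.com", delay_ms=10_000)
```

The timer is used instead of time.sleep() because sleeping would block the Qt event loop, and the page's JavaScript only runs while that loop is processing events. The returned string can be passed to bs.BeautifulSoup(html, 'html.parser') exactly as in the question's main().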
