Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
448 views
in Technique[技术] by (71.8m points)

python - PyQt4 to PyQt5 -> mainFrame() deprecated, need fix to load web pages

I'm doing Sentdex's PyQt4 YouTube tutorial right here. I'm trying to follow along but use PyQt5 instead. It's a simple web scraping app. I followed along with Sentdex's tutorial and I got here:

enter image description here

Now I'm trying to write the same application with PyQt5 and this is what I have:

import os
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl, QEventLoop
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from bs4 import BeautifulSoup
import requests


class Client(QWebEnginePage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self):
        self.app.quit()


url = 'https://pythonprogramming.net/parsememcparseface/'
client_response = Client(url)

#I think the issue is here at LINE 26
source = client_response.mainFrame().toHtml()

soup = BeautifulSoup(source, "html.parser")
js_test = soup.find('p', class_='jstest')
print(js_test.text)

When I run this, I get the message:

source = client_response.mainFrame().toHtml()
AttributeError: 'Client' object has no attribute 'mainFrame'

I've tried a few different solutions but none work. Any help would be appreciated.

EDIT

Logging QUrl(url) on line 15 returns this value:

PyQt5.QtCore.QUrl('https://pythonprogramming.net/parsememcparseface/')

When I try source = client_response.load(QUrl(url)) for line 26, I end up with the message:

File "test3.py", line 28, in <module> soup = BeautifulSoup(source, "html.parser") File "/Users/MYNAME/.venv/qtproject/lib/python3.6/site-packages/bs4/__init__.py", line 192, in __init__ elif len(markup) <= 256 and ( TypeError: object of type 'NoneType' has no len()

When I try source = client_response.url() I get:

soup = BeautifulSoup(source, "html.parser")
      File "/Users/MYNAME/.venv/qtproject/lib/python3.6/site-packages/bs4/__init__.py", line 192, in __init__
        elif len(markup) <= 256 and (
    TypeError: object of type 'QUrl' has no len()
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

you must call the QWebEnginePage::toHtml() inside the definition of the class. QWebEnginePage::toHtml() takes a pointer function or a lambda as a parameter, and this pointer function must in turn take a parameter of 'str' type (this is the parameter that contains the page's html). Here is sample code below.

import bs4 as bs
import sys
import urllib.request
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl

class Page(QWebEnginePage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()

    def _on_load_finished(self):
        self.html = self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        self.html = html_str
        self.app.quit()


def main():
    page = Page('https://pythonprogramming.net/parsememcparseface/')
    soup = bs.BeautifulSoup(page.html, 'html.parser')
    js_test = soup.find('p', class_='jstest')
    print js_test.text

if __name__ == '__main__': main()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...