Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
701 views
in Technique[技术] by (71.8m points)

javascript - invoking onclick event with beautifulsoup python

I am trying to fetch the links to all accomodations in Cyprus from this website: http://www.zoover.nl/cyprus

So far I can retrieve the first 15 which are already shown. So now I have to invoke the click on the "volgende"-link. However I don't know how to do that and in the source code I am not able to track down the function called to use e.g. sth like posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python

I only need the step where the "clicking" happens so I can fetch the next 15 links and so on.

Does anybody know how to help? Thanks already!

EDIT:

My code looks like this now:

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()'

and I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM Stacktrace: at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956) at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:8546) at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:9585) at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:12257) at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:12274) at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:12279) at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/[email protected]/components/command-processor.js:12221)

I'm confused :/

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

While it might be tempting to try to do this using Beautifulsoup's evaluateJavaScript method, in the end Beautifulsoup is a parser rather than an interactive web browsing client.

You should seriously consider solving this with selenium, as briefly shown in this answer. There are pretty good Python bindings available for selenium.

You could just use selenium to find the element and click it, and then pass the page on to Beautifulsoup, and use your existing code to fetch the links.

Alternatively, you could use the Javascript that's listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increments with 15 for each page, but the props has me guessing. I'd recommend not getting into this, though, and just interact with the website as a client would, using selenium. That's much more robust to changes on their side, as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...