Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
404 views
in Technique[技术] by (71.8m points)

python - How to get html with javascript rendered sourcecode by using selenium

I run a query in one web page, then I get result url. If I right click see html source, I can see the html code generated by JS. If I simply use urllib, python cannot get the JS code. So I see some solution using selenium. Here's my code:

from selenium import webdriver
url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'
driver = webdriver.PhantomJS(executable_path='C:python27scriptsphantomjs.exe')
driver.get(url)
print driver.page_source

>>> <html><head></head><body></body></html>         Obviously It's not right!!

Here's the source code I need in right click windows, (I want the INFORMATION part)

</script></div><div class="searchColRight"><div id="topActions" class="clearfix 
noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"
href="Default.aspx?    _act=VitalSearchR ...... <<INFORMATION I NEED>> ... 
to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">

        jQuery(document).ready(function() {
            jQuery(".ancestry-information-tooltip").actooltip({
href: "#AncestryInformationTooltip", orientation: "bottomleft"});
        });

So my question is: How to get the information generated by JS?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You will need to get get the document via javascript you can use seleniums execute_script function

from time import sleep # this should go at the top of the file

sleep(5)
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print html

That will get everything inside of the <html> tag


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...