Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

in Technique by (71.8m points)

linux - get browser-rendered HTML + JavaScript

I need a command-line tool (or JavaScript/PHP, though I think the command line is the only way) that renders a URL and returns the rendered content. The important part is that it must execute the JavaScript, not only render the CSS/HTML/images.

For example, a command like "renderengine http://www.google.es outputfile.html", where the content of the page (HTML parsed and JavaScript executed) is saved in outputfile.html.

I need this because I have to capture the result of a fully JavaScript-driven website like Grooveshark: the site loads everything using JavaScript/Ajax, so crawlers find nothing but an empty basic HTML template (the content is loaded afterwards via Ajax/JavaScript).

Is there any browser engine for Linux with JavaScript support (for example V8) that can output the result so it can be saved to a file?



1 Reply

by (71.8m points)
  • Selenium: a very complete solution with bindings in many languages
  • Puppeteer: headless Chrome API, usable from Node.js or as a command-line tool
  • HTTrack: command-line tool
  • Apache Nutch & webmagic: open source Java web crawlers
  • pholcus: "distributed & high concurrency" web crawler written in Go
  • Xvfb: a display server that implements the X11 display server protocol without showing any screen output. I have used it successfully with Travis CI and Protractor, for example. Alternative: XDummy
  • PhantomJS (first suggested by nvuono): can also export the rendered page in non-HTML formats (PDF, PNG...). PhantomJS development is suspended until further notice. Closely related: SlimerJS, CasperJS
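
As a rough illustration of the "renderengine URL outputfile.html" idea from the question, here is a minimal Python sketch that shells out to headless Chrome/Chromium (the browser Puppeteer drives) and saves the DOM it prints. It is only one possible approach, not what any of the tools above do internally. The function names (find_browser, build_command, render) are hypothetical, and the binary names in CANDIDATES are assumptions about how the browser is installed on your system. It relies on Chrome's --headless and --dump-dom flags.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: one of these Chrome/Chromium binary names is on PATH.
CANDIDATES = ("google-chrome", "chromium", "chromium-browser")


def find_browser():
    """Return the path of the first installed candidate binary, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(browser, url):
    """Build the headless invocation.

    --dump-dom makes Chrome print the serialized DOM to stdout
    after the page has loaded and its scripts have run.
    """
    return [browser, "--headless", "--dump-dom", url]


def render(url, output, browser, run=subprocess.run):
    """Run the browser, save the dumped DOM to `output`, return its length.

    `run` is injectable so the function can be exercised without a browser.
    """
    result = run(
        build_command(browser, url),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
        timeout=60,           # give up on pages that never finish loading
        check=True,           # raise if the browser exits with an error
    )
    Path(output).write_text(result.stdout, encoding="utf-8")
    return len(result.stdout)
```

With a browser installed, something like render("http://www.google.es", "outputfile.html", find_browser()) should write the rendered markup to outputfile.html. Content that a site fetches via Ajax well after the initial load may not be in the dump yet; in that case a driver such as Selenium or Puppeteer, which can wait for specific elements, is probably the better fit.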

And there are many Python web scraping libraries:

