Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
415 views
in Technique[技术] by (71.8m points)

php - Non-browser emulation of JavaScript - is it possible?

I have a new project I am working on that involves fetching a webpage, (using PHP and cURL) parsing the HTML and javascript out of it and then handling the data in the results.

Basically I hit a brick wall when the site uses javascript to fetch its data by AJAX. In this case, the initial data will not appear in the fetched page unless the javascript is run in a browser.

Are there any PHP libraries for this? (I suspect not, but I could be wrong.)

I would really rather build this as a server-based solution, otherwise I am forced to have to build an application for this and using mozilla and/or IE runtime libraries - which kind of defeats the purpose.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You will need:

  • one JavaScript interpreter
  • one DOM Level 2 Core and HTML implementation
  • 500g of non-standard but commonly-used DOM extensions
  • a pinch of DOM Level 2 Style (which might mean also a CSS interpreter and layout engine)
  • yoghurt pots, round-ended scissors and sticky-back plastic

Once you have assembled your components (remember to get a grown-up to help you with the sandboxing), you'll find what you have is essentially indistinguishable from a web browser.

JAVA is not part of the shell build on the server. V8/SquirrelFish is C++ code I would need to convert to PHP.

Porting a JS engine to PHP would be a huge task, and the resulting performance likely horrible. You can't even really get away with a nearly-solution on JavaScript any more, since so many pages are using hideously complex libraries like jQuery to do everything, which will require in-depth JS support.

I don't think you're going to be able to do this purely in PHP. You'll have to hook up Java/Rhino/HTMLUnit or a proper web browser like Mozilla. If your hosting environment doesn't give you the flexibility you need to compile and deploy that sort of thing, you'd have to move to a better hosting setup with a shell (preferably VPS).

If you can avoid this unpleasantness some other way, by special-casing known pages' AJAX access, do that.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...