javascript - Parsing webpages to extract contents

Question

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:26:35+0000

Suggested readings

Static pages:

java.net.URLConnection and java.net.HttpURLConnection
jsoup - HTML parser and content manipulation library
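
To give an idea of what the first item looks like in practice, here is a minimal sketch that downloads the raw HTML of a static page using only the JDK. The class name `StaticPageFetcher`, the helper names, the 10-second timeouts, the UTF-8 decoding and the `https://example.com` placeholder are all my own assumptions for illustration, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class StaticPageFetcher {

    // Reads the stream to its end and decodes the bytes as UTF-8.
    // UTF-8 is an assumption; a real page may declare another charset.
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Fetches the page body with a plain GET request.
    public static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // illustrative timeout values
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com";
        String html = fetch(url);
        System.out.println(html.length() + " characters downloaded from " + url);
    }
}
```

This only gives you the HTML text as the server sent it. Picking elements out of that text is the job of a parser such as jsoup.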

Mind you, many pages create their content dynamically with JavaScript after loading. In that case the 'static page' approach won't help, and you will need to look for tools in the "Web automation" category.
Selenium is one such toolset. It lets you command a real browser to open and navigate pages, and you may even be able to run a 'headless browser' (no UI) such as PhantomJS.

Good luck, there's lots of reading and coding ahead of you.

[edited for examples]

This technique is called web scraping; search for that term on Google to find examples. The following are offered as examples of what my searches turned up; I offer no warranties or endorsements for them.

For "static web page scraping" - here's an example using jsoup
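
As a rough illustration of the jsoup approach, the sketch below parses an HTML string and lists its links. It assumes the jsoup library is on the classpath. The class name `JsoupLinkExtractor`, the method `extractLinks`, the sample markup and the `https://example.com/` base URL are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class JsoupLinkExtractor {

    // Returns one "text -> absolute URL" entry per <a> element that has an href.
    // baseUrl is used to resolve relative links such as "/docs".
    public static List<String> extractLinks(String html, String baseUrl) {
        Document doc = Jsoup.parse(html, baseUrl);
        List<String> links = new ArrayList<>();
        for (Element anchor : doc.select("a[href]")) {
            links.add(anchor.text() + " -> " + anchor.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for a downloaded page.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href=\"/docs\">Docs</a> <a>no link here</a></body></html>";
        Document doc = Jsoup.parse(html, "https://example.com/");
        System.out.println("Title: " + doc.title());
        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

For a live page, `Jsoup.connect(url).get()` returns the same kind of `Document`, so the selector code stays unchanged.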

For "dynamic pages" - here's an example using Selenium
