So how do I scrape a website which has dynamic content?
There are a few options:
- Use Selenium, which lets you drive a real browser, let the page render, and then pull the rendered HTML source
- Sometimes you can look at the XHR requests in the browser's dev tools and fetch the data directly from the endpoint the page calls (often a JSON API)
- Sometimes the data is within the <script> tags of the HTML source. You could search through those and use json.loads() once you manipulate the text into valid JSON
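As a rough sketch of the <script> option, the snippet below pulls a JSON object out of a script tag using only the standard library. The HTML, the variable name window.__DATA__, and the helper extract_json_from_scripts are all made up for illustration; a real page will use its own names and may need more cleanup. It uses json.JSONDecoder.raw_decode() rather than json.loads() so that trailing JavaScript such as a semicolon does not have to be trimmed first.

```python
import json
import re

# Hypothetical HTML, standing in for the page source returned by the first request.
html = """
<html><body>
<script>window.__DATA__ = {"title": "Example item", "price": 19.99};</script>
</body></html>
"""


def extract_json_from_scripts(page_source, variable_name):
    """Return the object assigned to `variable_name` in a <script> tag, or None."""
    decoder = json.JSONDecoder()
    script_bodies = re.findall(
        r"<script[^>]*>(.*?)</script>", page_source, flags=re.DOTALL | re.IGNORECASE
    )
    for script_body in script_bodies:
        marker = script_body.find(variable_name)
        if marker == -1:
            continue
        # The JSON object is assumed to start at the first "{" after the name.
        start = script_body.find("{", marker)
        if start == -1:
            continue
        try:
            # raw_decode parses one JSON value and ignores whatever follows it.
            obj, _ = decoder.raw_decode(script_body, start)
        except json.JSONDecodeError:
            continue
        return obj
    return None


data = extract_json_from_scripts(html, "window.__DATA__")
print(data["price"])
```

If the embedded text is a JavaScript object literal rather than strict JSON (single quotes, unquoted keys), this will fail to parse and the text has to be cleaned up first.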
What exactly is the difference between dynamic and static content?
Dynamic means the data is loaded by additional requests (usually made by JavaScript) after the initial page request. Static means all the data is already in the response to the original request to the site
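How do I extract other information, like the price and image, from the website? And how do I target particular classes, for example a price?
See the options under your first question: once you have the rendered HTML (or the underlying JSON), you can pick out elements by their tag or class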
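For the class lookup itself, here is a minimal sketch that assumes you already have HTML containing the data (from a static page, or from a browser-rendered page). The class names price and product-image, the sample markup, and the ProductParser class are hypothetical; inspect the real page to find the right ones. It uses the standard library's html.parser to stay dependency-free, though many people use a third-party parser such as BeautifulSoup for the same job.

```python
from html.parser import HTMLParser

# Hypothetical markup; real class names will differ from site to site.
html = """
<div class="product">
  <img class="product-image" src="https://example.com/item.jpg">
  <span class="price">$19.99</span>
</div>
"""


class ProductParser(HTMLParser):
    """Collect the first price text and the product image URL by class name."""

    def __init__(self):
        super().__init__()
        self.price = None
        self.image_url = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "img" and "product-image" in classes:
            self.image_url = attrs.get("src")
        elif "price" in classes and self.price is None:
            self._in_price = True

    def handle_data(self, data):
        # Take the first non-empty text found inside the price element.
        if self._in_price and data.strip():
            self.price = data.strip()
            self._in_price = False

    def handle_endtag(self, tag):
        # Stop looking if an element closes before any text was found.
        self._in_price = False


parser = ProductParser()
parser.feed(html)
print(parser.price, parser.image_url)
```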
How would I know that data is dynamically created?
You'll know it's dynamically created if you see it in the dev tools Elements panel (the rendered page), but not in the HTML source returned by your first request. You can also check whether the data comes from additional requests by opening the dev tools and looking at Network -> XHR
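That check can also be scripted. The sketch below is one way to do it, assuming you already know a piece of text you can see in the browser: fetch the raw HTML without running any JavaScript and see whether the text is there. The function names fetch_initial_html and looks_dynamic, the User-Agent value, and the sample markup are illustrative choices, not part of any library.

```python
import urllib.request


def fetch_initial_html(url):
    """Download the HTML exactly as the server returns it; no JavaScript is run."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def looks_dynamic(initial_html, visible_text):
    """True if text seen in the browser is missing from the initial HTML."""
    return visible_text not in initial_html


# Offline demonstration: a hypothetical page whose price is filled in by JavaScript.
sample_html = '<div id="price"></div><script src="/load-price.js"></script>'
print(looks_dynamic(sample_html, "$19.99"))         # True
print(looks_dynamic(sample_html, "load-price.js"))  # False
```

For a live page you would call looks_dynamic(fetch_initial_html(url), "some text you saw in the browser"). A plain substring test is only a first hint: the text may be HTML-escaped or formatted differently in the source.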
Lastly, Amazon does offer an API to access the data. Try looking into that as well