I'm trying to parse some web pages for future use. To do this, I've used several modules, such as urllib, lxml, BeautifulSoup, and HTMLParser.
I didn't run into any problems while parsing web pages until I encountered hidden tags.
When I opened the page in Chrome and used the developer tools to inspect the page's elements, I could see the <embed> part of the code:
<embed type="..." src="..." ID="..." >
and could simply copy/paste it manually.
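One way to check whether the <embed> tag is actually in the HTML the server sends is to fetch the raw page and search it. Chrome's developer tools show the live DOM, which can include elements added by JavaScript after the page loads, while urllib only retrieves the server's response. This is a minimal sketch. The function names are hypothetical, and the regex check is only a quick diagnostic, not a full parser.

```python
import re
import urllib.request

# Case-insensitive match for an opening <embed tag; \b rejects e.g. "<embedded".
EMBED_RE = re.compile(r"<embed\b", re.IGNORECASE)


def fetch_raw_html(url: str) -> str:
    """Download the HTML exactly as the server returns it (no JavaScript is run)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def has_embed_tag(html: str) -> bool:
    """Return True if the raw markup contains at least one <embed> tag."""
    return EMBED_RE.search(html) is not None
```

For example, `has_embed_tag(fetch_raw_html(url))` returning `False` for a page where DevTools does show the tag suggests the tag is inserted client-side, so a static parser cannot see it in the downloaded HTML.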
I need to parse the ID from this hidden tag. Why can't I parse this part of the site using Python? Is there any way to parse these hidden parts?
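When the <embed> tag is present in the downloaded HTML, the standard-library HTMLParser can extract its ID attribute. This is a minimal sketch that assumes the markup is already in a string. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class EmbedIdParser(HTMLParser):
    """Collect the ID attribute of every <embed> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so "ID" arrives as "id".
        # Self-closing <embed ... /> tags also reach this method through the
        # default handle_startendtag implementation.
        if tag == "embed":
            for name, value in attrs:
                if name == "id" and value is not None:
                    self.ids.append(value)


def embed_ids(html: str) -> list[str]:
    """Return the ID values of all <embed> tags, in document order."""
    parser = EmbedIdParser()
    parser.feed(html)
    parser.close()
    return parser.ids
```

If this returns an empty list for markup fetched with urllib, the tag is probably not in the server's response at all.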
I know that server-side code such as PHP and ASP never appears in the HTML source, but I don't think that applies here.