I have read a lot of answers about web scraping that recommend BeautifulSoup, Scrapy, etc.
Is there a way to do the equivalent of saving a page's source from a web browser?
That is, is there a way in Python to point it at a website and get it to save the page's source to a text file with just the standard Python modules?
Here is where I got to:
import urllib
f = open('webpage.txt', 'w')
html = urllib.urlopen("http://www.somewebpage.com")
#somehow save the web page source
f.close()
Not much, I know, but I'm looking for code that actually pulls the source of the page so I can write it to the file. I gather that urlopen just makes a connection.
Perhaps there is a readlines() equivalent for reading lines of a web page?
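For reference, here is a minimal sketch of what the missing step could look like, assuming Python 3, where urlopen lives in urllib.request (in Python 2 it is urllib.urlopen, as in the attempt above). The object urlopen returns is file-like, so read() gives the whole body and readlines() also exists. The function name save_page_source is made up for illustration, and the bytes are written as-is rather than decoded, since the page's encoding is not known in advance.

```python
import urllib.request


def save_page_source(url, path):
    """Fetch url and write the raw response body to path.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    # urlopen returns a file-like response object; read() pulls the whole body.
    with urllib.request.urlopen(url) as response:
        source = response.read()  # bytes, exactly as the server sent them

    # Binary mode avoids having to guess the page's text encoding.
    with open(path, "wb") as f:
        f.write(source)

    return len(source)
```

Calling save_page_source("http://www.somewebpage.com", "webpage.txt") would then save the source to webpage.txt. This only fetches what the server sends; anything a browser builds afterwards with JavaScript will not be in the file.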