How can I remove all HTML from a string in Python? For example, how can I turn:
blah blah <a href="blah">link</a>
into
blah blah link
Thanks!
When a regular-expression solution hits a wall, try this short and reliable BeautifulSoup snippet. It lets a real HTML parser separate the tags from the text:
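from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

html = "<a> Keep me </a>"
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
# Collect every text node (the tags themselves are dropped) and join them
text_parts = soup.find_all(string=True)
text = ''.join(text_parts)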
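If installing a third-party package isn't an option, the standard library's `html.parser.HTMLParser` can do a similar job. This is a minimal sketch, not a drop-in replacement for BeautifulSoup. It assumes you only want the text between tags. The class name `TextExtractor` and the function `strip_html` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text found between tags."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside tags; the tags are ignored
        self.parts.append(data)


def strip_html(html):
    """Return the text content of `html` with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)


print(strip_html('blah blah <a href="blah">link</a>'))  # prints: blah blah link
```

Like any simple tag stripper, this sketch keeps the contents of `<script>` and `<style>` elements as ordinary text. Filter those out separately if your input contains them.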