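You can roll your own function using bs4 and `itertools.takewhile`. `find_all_next()` returns every tag after the start tag in document order, nested tags included. `takewhile` keeps yielding those tags until the stop condition first fails: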
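```python
from itertools import takewhile
from bs4 import BeautifulSoup

h = """<html>
<div class="class1">Included Text</div>
[...]
<h1><b>text</b></h1><span>[..]</span><div>[...]</div>
[...]
<span class="class2">
[...]</span>"""
soup = BeautifulSoup(h, "lxml")

def get_html_between(start_select, end_tag, cls):
    start = soup.select_one(start_select)
    # every tag after the start tag, in document order (nested tags included)
    all_next = start.find_all_next()
    # the start tag's own inner content comes first
    yield "".join(str(c) for c in start.contents)
    # tag.name is the element type; tag.get("name") would read a name="..." attribute
    stop = lambda tag: tag.name == end_tag and tag.get("class") == [cls]
    for t in takewhile(lambda tag: not stop(tag), all_next):
        yield t

# the closing marker in h is <span class="class2">
for ele in get_html_between("div.class1", "span", "class2"):
    print(ele)
```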
Output:
```
Included Text
<h1><b>text</b></h1>
<b>text</b>
<span>[..]
</span>
<div>[...]</div>
```
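When writing the stop condition, keep in mind that `tag.name` is the element type (`"div"`, `"span"`). By contrast, `tag.get("name")` reads an HTML attribute called `name` and is `None` for most tags. The following small sketch runs on a made-up snippet and uses bs4's built-in `"html.parser"` so lxml isn't required. It shows that difference, and it shows `takewhile` stopping at the first tag that fails the predicate:

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Made-up snippet: the <a> has an *attribute* called "name".
html = '<p>intro</p><a name="anchor">link</a><span class="class2">end</span>'
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all(True):  # True matches every tag
    # tag.name is the element type; tag.get("name") is the name="..." attribute
    print(tag.name, tag.get("name"))
# p None
# a anchor
# span None

# takewhile yields tags until the predicate is first False, then stops for good
names = [t.name for t in takewhile(lambda t: t.name != "span", soup.find_all(True))]
print(names)  # ['p', 'a']
```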
To make it a little more flexible, you can pass in the start tag and a `cond` function (or lambda). For multiple `class1` divs, just iterate over them and pass each one in:
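```python
def get_html_between(start_tag, cond):
    # the start tag's own inner content comes first
    yield "".join(str(c) for c in start_tag.contents)
    # then every following tag (nested ones included) until cond first returns False
    yield from takewhile(cond, start_tag.find_all_next())

# stop at the first tag whose class is exactly ["class2"];
# add a tag.name check here if the element type matters too
cond = lambda tag: tag.get("class") != ["class2"]

soup = BeautifulSoup(h, "lxml")
for tag in soup.select("div.class1"):
    for ele in get_html_between(tag, cond):
        print(ele)
```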
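Because `find_all_next()` descends into nested tags, `<b>text</b>` is printed again on its own after `<h1><b>text</b></h1>`. If the end marker is a sibling of the start tag, you can use `find_next_siblings()` to stay at one nesting level. That holds for the sample HTML, but it is an assumption for real pages. This traversal differs from the one in the answer above. The sketch below uses a hypothetical `is_end` helper and a hypothetical `siblings_between` function, and it parses with `"html.parser"`:

```python
from itertools import takewhile

from bs4 import BeautifulSoup

HTML = """
<div class="class1">Included Text</div>
<h1><b>text</b></h1><span>[..]</span><div>[...]</div>
<span class="class2">[...]</span>
"""

def is_end(tag):
    # Hypothetical end marker: a <span> whose class list contains "class2".
    return tag.name == "span" and "class2" in tag.get("class", [])

def siblings_between(start_tag, end=is_end):
    """Yield the tags after start_tag at the same nesting level, stopping before end."""
    # find_next_siblings() returns only Tag siblings, skipping whitespace strings
    return takewhile(lambda t: not end(t), start_tag.find_next_siblings())

soup = BeautifulSoup(HTML, "html.parser")
start = soup.select_one("div.class1")
between = [str(t) for t in siblings_between(start)]
for s in between:
    print(s)
# <h1><b>text</b></h1>
# <span>[..]</span>
# <div>[...]</div>
```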
Using your newest edit:
```
In [15]: cond = lambda tag: tag.get("class") != ["class2"]

In [16]: for tag in soup.select("div.class1"):
   ....:     for ele in get_html_between(tag, cond):
   ....:         print(ele)
   ....:     print("\n")
   ....:
Included Text
<h1><b>text</b></h1>
<b>text</b>
<span>[..]</span>
<div>[...]</div>


Included Text
<h1><b>text</b></h1>
<b>text</b>
<span>[..]</span>
<div>[...]</div>


Included Text
<h1><b>text</b></h1>
<b>text</b>
<span>[..]</span>
<div>[...]</div>
```
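If you would rather collect each section than print it, a list comprehension over `soup.select("div.class1")` gives one list per start tag. The self-contained sketch below assumes a made-up two-section document and a hypothetical `not_end` predicate. Each walk stops only at a `class2` span, so a section that lacks its end marker would run on into the next section:

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Hypothetical input: two class1 ... class2 sections back to back.
HTML = """
<div class="class1">First</div><p>a</p><span class="class2">end</span>
<div class="class1">Second</div><p>b</p><p>c</p><span class="class2">end</span>
"""

def get_html_between(start_tag, cond):
    # start tag's inner content, then every following tag while cond holds
    yield "".join(str(c) for c in start_tag.contents)
    yield from takewhile(cond, start_tag.find_all_next())

def not_end(tag):
    # hypothetical stop rule: a <span> carrying class "class2" ends a section
    return not (tag.name == "span" and "class2" in tag.get("class", []))

soup = BeautifulSoup(HTML, "html.parser")
sections = [
    [str(ele) for ele in get_html_between(tag, not_end)]
    for tag in soup.select("div.class1")
]
print(sections)
# [['First', '<p>a</p>'], ['Second', '<p>b</p>', '<p>c</p>']]
```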