I actually dealt with this myself, in the hackiest way possible: by post-processing the result.
r = re.compile(r'^(s*)', re.MULTILINE)
def prettify_2space(s, encoding=None, formatter="minimal"):
return r.sub(r'11', s.prettify(encoding, formatter))
Actually, I monkeypatched prettify_2space
in place of prettify
in the class. That's not essential to the solution, but let's do it anyway, and make the indent width a parameter instead of hardcoding it to 2:
orig_prettify = bs4.BeautifulSoup.prettify
r = re.compile(r'^(s*)', re.MULTILINE)
def prettify(self, encoding=None, formatter="minimal", indent_width=4):
return r.sub(r'1' * indent_width, orig_prettify(self, encoding, formatter))
bs4.BeautifulSoup.prettify = prettify
So:
x = '''<section><article><h1></h1><p></p></article></section>'''
soup = bs4.BeautifulSoup(x)
print(soup.prettify(indent_width=3))
… gives:
<html>
<body>
<section>
<article>
<h1>
</h1>
<p>
</p>
</article>
</section>
</body>
</html>
Obviously if you want to patch Tag.prettify
as well as BeautifulSoup.prettify
, you have to do the same thing there. (You might want to create a generic wrapper that you can apply to both, instead of repeating yourself.) And if there are any other prettify
methods, same deal.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…