I'm trying to parse a string with multiple lines.
Suppose it is:
text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''
I want to use the finditer method of the re module to get one dictionary per section, like:
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}
I tried the following:
import re
re_sections = re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+)", re.DOTALL)
sections_it = re_sections.finditer(text)
for m in sections_it:
    print(m.groupdict())
But this results in:
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\nSection2\nstuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}
So section_data also swallows Section2 and everything after it.
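This happens because `.+` is greedy and, under re.DOTALL, `.` also matches newlines, so the first match runs to the end of the string. A minimal sketch of one common fix, assuming every header sits on its own line and looks like `Section` followed by a digit: make the quantifier lazy and stop at a lookahead for the next header line or the end of the string (`\Z`). re.MULTILINE is added so that `^` matches at the start of each line.

```python
import re

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

# re.DOTALL: "." also matches "\n"; re.MULTILINE: "^" matches at every line start.
# The lazy ".+?" grows only until the lookahead sees the next header line
# (assumed to look like "Section<digit>") or the end of the string.
re_sections = re.compile(
    r"^(?P<section>Section\d)\s*(?P<section_data>.+?)(?=^Section\d|\Z)",
    re.DOTALL | re.MULTILINE,
)

results = [m.groupdict() for m in re_sections.finditer(text)]
for d in results:
    print(d)
```

Each match now ends right before the next header, so finditer yields one dictionary per section.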
I also tried telling the second group to match everything except the text of the first group, but that produces no output at all:
re_sections = re.compile(r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL)
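The pattern above cannot match: `(?P=section)` is a backreference, so it requires the same text as the section group a second time (for example "Section1" again); it does not mean "anything except". In addition, without re.MULTILINE, `^` only matches at the very start of the string. The usual way to express "anything that is not a header" is a tempered token, which consumes one character at a time only where a header line does not begin. This is a sketch, again assuming headers look like `Section` plus a digit at the start of a line:

```python
import re

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

# The attempt from the question: the backreference demands "Section1" twice,
# and "^" (no MULTILINE) can only match at position 0, so search finds nothing.
attempt = re.compile(r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL)
print(attempt.search(text))  # None

# Tempered token: take one character ("." also matches "\n" under DOTALL) at a
# time, as long as a header line does not start at the current position.
tempered = re.compile(
    r"^(?P<section>Section\d)\s*(?P<section_data>(?:(?!^Section\d).)*)",
    re.DOTALL | re.MULTILINE,
)
results = [m.groupdict() for m in tempered.finditer(text)]
for d in results:
    print(d)
```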
I know I could use the following regex, but I'm looking for a version in which I don't have to spell out what the second group looks like:
re_sections = re.compile(r"(?P<section>Section\d)\s+(?P<section_data>[a-z12\s]+)", re.DOTALL)
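If the goal is to describe only the headers and never the data, another option is to let finditer find just the header lines and take the text between consecutive matches with plain string slicing. A minimal sketch; `split_sections` is a hypothetical helper name, and the default header pattern is an assumption based on the sample text:

```python
import re

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''


def split_sections(text, header=r"^(Section\d)\s*"):
    """Hypothetical helper: only the header is described by a regex; each
    section's data is whatever lies between one header and the next."""
    headers = list(re.finditer(header, text, re.MULTILINE))
    sections = []
    for i, m in enumerate(headers):
        start = m.end()  # just after the header and its trailing whitespace
        # Stop where the next header begins, or at the end of the string.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        sections.append({'section': m.group(1), 'section_data': text[start:end]})
    return sections


results = split_sections(text)
for d in results:
    print(d)
```

Because the data is never matched by a regex, nothing in it (including lines that happen to look unusual) needs to be anticipated; only the header pattern has to be right.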
Thank you very much!