If there are no nested brackets, you can just do this:
re.findall(r'(.*?)\[.*?\]', example_str)
However, you don't even really need a regex here. Just split on brackets:
(s.split(']')[-1] for s in example_str.split('['))
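As a quick sanity check, here is a minimal sketch comparing the two approaches. The sample value of example_str is made up for illustration and is not taken from the question:

```python
import re

# Hypothetical sample input (an assumption, not the asker's real data).
example_str = 'abc [def] ghi [jkl]'

# Regex approach: capture the text in front of each bracketed chunk.
via_regex = re.findall(r'(.*?)\[.*?\]', example_str)

# Split approach: cut at each '[', then keep whatever follows the last ']'.
via_split = [s.split(']')[-1] for s in example_str.split('[')]

print(via_regex)  # ['abc ', ' ghi ']
print(via_split)  # ['abc ', ' ghi ', '']
```

Note that when the string ends with a bracket, the split version also yields a trailing empty string, which you may want to drop.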
The only reason your attempt didn't work:
re.findall(r"(.*?)\[.*\]+", example_str)
… is that you were doing a greedy match within the brackets, which means it matched everything from the first open bracket to the last close bracket, instead of just the first pair of brackets.
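To see the difference, here is a small sketch that runs the greedy and non-greedy versions side by side on a made-up sample string (the value of example_str is an assumption for illustration):

```python
import re

# Hypothetical sample input with two bracketed chunks.
example_str = 'abc [def] ghi [jkl]'

# Greedy .* inside the brackets: one match that runs from the first '['
# all the way to the last ']'.
greedy = re.findall(r'(.*?)\[.*\]+', example_str)

# Non-greedy .*? inside the brackets: one match per bracket pair.
lazy = re.findall(r'(.*?)\[.*?\]', example_str)

print(greedy)  # ['abc ']
print(lazy)    # ['abc ', ' ghi ']
```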
Also, the + on the end seems wrong. If you had 'abc [def][ghi] jkl[mno]', would you want to get back ['abc ', '', ' jkl'], or ['abc ', ' jkl']? If the former, don't add the +. If it's the latter, do, but then you need to put the whole bracketed pattern in a non-capturing group, because otherwise the + applies only to the closing bracket: r'(.*?)(?:\[.*?\])+'.
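A short sketch of both options, using the sample string from the paragraph above (the variable names are just for illustration):

```python
import re

example_str = 'abc [def][ghi] jkl[mno]'

# No '+': every bracket pair ends a match, so the empty text between
# '[def]' and '[ghi]' shows up as ''.
without_plus = re.findall(r'(.*?)\[.*?\]', example_str)

# '+' on a non-capturing group: a run of adjacent bracket pairs is
# consumed as a single separator.
with_plus = re.findall(r'(.*?)(?:\[.*?\])+', example_str)

print(without_plus)  # ['abc ', '', ' jkl']
print(with_plus)     # ['abc ', ' jkl']
```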
If there might be additional text after the last bracket, the split method will work fine, or you could use re.split instead of re.findall … but if you want to adjust your original regex to work with that, you can.
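For the split-based options, a minimal sketch might look like this. The sample string with trailing text is hypothetical:

```python
import re

# Hypothetical sample input with text after the last bracket.
example_str = 'abc [def] ghi [jkl] mno'

# Plain str.split, as before: the trailing text is kept.
via_str_split = [s.split(']')[-1] for s in example_str.split('[')]

# re.split treats each bracketed chunk as a separator.
via_re_split = re.split(r'\[.*?\]', example_str)

print(via_str_split)  # ['abc ', ' ghi ', ' mno']
print(via_re_split)   # ['abc ', ' ghi ', ' mno']
```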
In English, what you want is any (non-greedy) substring before a bracket-enclosed substring or the end of the string, right?
So, you need an alternation between \[.*?\] and $. Of course you need to group that in order to write the alternation, and you don't want to capture the group. So:
re.findall(r"(.*?)(?:\[.*?\]|$)", example_str)
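Here is a rough sketch of that pattern on a made-up string with trailing text (the value of example_str is an assumption for illustration):

```python
import re

# Hypothetical sample input with text after the last bracket.
example_str = 'abc [def] ghi [jkl] mno'

# Each match ends at either a bracketed chunk or the end of the string.
parts = re.findall(r'(.*?)(?:\[.*?\]|$)', example_str)

print(parts)  # ['abc ', ' ghi ', ' mno', '']
```

The final '' comes from an empty match at the very end of the string, where $ matches again. If that is unwanted, dropping the last element is one option.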