I have a naive "parser" that simply does something like:
[x.split('=') for x in mystring.split(',')]
However, mystring can be something like:
'foo=bar,breakfast=spam,eggs'
Obviously, the naive splitter will not handle this. I am limited to the Python 2.6 standard library,
so, for example, pyparsing cannot be used.
The expected output is:
[('foo', 'bar'), ('breakfast', 'spam,eggs')]
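To make the failure concrete, here is a small sketch of what the naive split returns for that input (the variable name naive is only for illustration):

```python
mystring = 'foo=bar,breakfast=spam,eggs'

# Split on every comma first, then on '=' inside each piece.
naive = [x.split('=') for x in mystring.split(',')]

# 'eggs' becomes its own one-element item instead of staying
# attached to the value of 'breakfast':
# naive == [['foo', 'bar'], ['breakfast', 'spam'], ['eggs']]
```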
I'm trying to do this with regex, but am facing the following problems:
My first attempt
r'([a-z_]+)=(.+),?'
gave me
[('foo', 'bar,breakfast=spam,eggs')]
Obviously, making .+ non-greedy does not solve the problem.
So I'm guessing I have to somehow make the last comma (or $) mandatory.
Doing just that does not really work either:
r'([a-z_]+)=(.+?)(?:,|$)'
With that pattern, the part after the comma in a value that contains one is dropped,
e.g. [('foo', 'bar'), ('breakfast', 'spam')]
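The attempts described above can be reproduced with a short sketch like this one (the names greedy, lazy and anchored are only labels for the three variants):

```python
import re

mystring = 'foo=bar,breakfast=spam,eggs'

# Greedy .+ swallows everything after the first '='.
greedy = re.findall(r'([a-z_]+)=(.+),?', mystring)
# greedy == [('foo', 'bar,breakfast=spam,eggs')]

# Lazy .+? with an optional comma stops after a single character.
lazy = re.findall(r'([a-z_]+)=(.+?),?', mystring)
# lazy == [('foo', 'b'), ('breakfast', 's')]

# Requiring a comma or end-of-string cuts the value at the first comma.
anchored = re.findall(r'([a-z_]+)=(.+?)(?:,|$)', mystring)
# anchored == [('foo', 'bar'), ('breakfast', 'spam')]
```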
I think I must use some sort of look-behind(?) operation.
The Question(s)
1. Which one do I use? or
2. How do I do that/this?
Edit:
Based on daramarak's answer below, I ended up doing pretty much the same thing as abarnert later suggested, in a slightly more verbose form:
vals = [x.rsplit(',', 1) for x in (data.split('='))]
ret = list()
while vals:
    value = vals.pop()[0]
    key = vals[-1].pop()
    ret.append((key, value))
    if len(vals[-1]) == 0:
        break
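A forward-order variant of the same split-on-'=' idea might look like the sketch below. The helper name parse_pairs is hypothetical and not taken from any answer. It leaves the final chunk unsplit so that a comma in the last value survives, and it assumes that values never contain '=' and that keys never contain ','.

```python
def parse_pairs(data):
    # Hypothetical helper. After splitting on '=', the first chunk is the
    # first key, the last chunk is the last value, and every chunk in
    # between has the form "<previous value>,<next key>".
    chunks = data.split('=')
    key = chunks[0]
    pairs = []
    for chunk in chunks[1:-1]:
        # Only the LAST comma separates the value from the next key.
        value, next_key = chunk.rsplit(',', 1)
        pairs.append((key, value))
        key = next_key
    if len(chunks) > 1:
        # The final chunk is a value only, so it is kept whole.
        pairs.append((key, chunks[-1]))
    return pairs
```

Called as parse_pairs('foo=bar,breakfast=spam,eggs'), this should return the pairs in their original order. Input that breaks the stated assumptions (for example a middle chunk without a comma) raises ValueError rather than being silently misparsed.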
EDIT 2:
Just to satisfy my curiosity, is this actually possible with pure regular expressions? I.e., so that re.findall() would return a list of 2-tuples?
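One candidate that seems to work uses a lookahead rather than a look-behind: the value is matched lazily and must be followed either by a comma plus the next key and '=', or by the end of the string. This is only a sketch; it assumes keys match [a-z_]+ and that no value contains text that looks like ",key=". The name PAIR_RE is made up for the example.

```python
import re

# Assumption: keys match [a-z_]+ and a value ends right before
# ",<key>=" or at the end of the string. The lookahead (?=...) checks
# for that boundary without consuming it.
PAIR_RE = re.compile(r'([a-z_]+)=(.+?)(?=,[a-z_]+=|$)')

pairs = PAIR_RE.findall('foo=bar,breakfast=spam,eggs')
# pairs == [('foo', 'bar'), ('breakfast', 'spam,eggs')]
```

Because the lookahead does not consume the comma, the next match simply starts at the following key.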