Parsing an arbitrarily nested structure can be done with pyparsing by defining a placeholder to hold the nested part, using the Forward class. In this case, you are just parsing simple name-value pairs, where the value could itself be a nested structure containing name-value pairs.
name :: word of alphanumeric characters
entry :: name '=' value
struct :: '{' entry* '}'
value :: real | integer | quotedstring | struct
This translates to pyparsing almost verbatim. To define value, which can recursively contain values, we first create a Forward() placeholder, which can be used as part of the definition of entry. Then once we have defined all the possible types of values, we use the '<<' operator to insert this definition into the value expression:
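from pyparsing import (Dict, Forward, Group, OneOrMore, Regex, Suppress, Word, ZeroOrMore, alphanums, alphas, quotedString, removeQuotes)
EQ,LBRACE,RBRACE = map(Suppress,"={}")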
name = Word(alphas, alphanums+"_")
value = Forward()
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
quotedString.setParseAction(removeQuotes)
struct = Group(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (quotedString | struct | real | integer)
The parse actions on real and integer convert these elements from strings to floats or ints at parse time, so the values can be used as their actual types immediately after parsing (no need to post-process to do string-to-other-type conversion).
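As a quick check of the conversion described above, here is a minimal sketch that exercises only the real and integer expressions. The names number and wrong_order are introduced just for illustration. It also shows why real is listed before integer in the value expression: '|' takes the first alternative that matches.

```python
from pyparsing import Regex

# Each parse action receives the matched tokens and returns the converted value.
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))

# '|' tries alternatives left to right and stops at the first match,
# so real has to come before integer.
number = real | integer
wrong_order = integer | real  # illustration only: integer grabs the leading digits

as_float = number.parseString("7.5")[0]
as_int = number.parseString("-12")[0]
truncated = wrong_order.parseString("7.5")[0]

print(repr(as_float), repr(as_int), repr(truncated))
```

With the alternatives reversed, the input "7.5" is read as the int 7 and the rest is left unparsed, which is why the order in the value expression matters.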
Your sample is a collection of one or more entries, so we use that to parse the total input:
result = OneOrMore(entry).parseString(sample)
We can access the parsed data as a nested list, but it is not so pretty to display. This code uses pprint to pretty-print a formatted nested list:
from pprint import pprint
pprint(result.asList())
Giving:
[['company', 'My Company'],
 ['phone', '555-5555'],
 ['people',
  [['person',
    [['name', 'Bob'],
     ['location', 'Seattle'],
     ['settings', [['size', 1], ['color', 'red']]]]],
   ['person',
    [['name', 'Joe'],
     ['location', 'Seattle'],
     ['settings', [['size', 2], ['color', 'blue']]]]]]]]
Notice that all the strings are just strings with no enclosing quotation marks, and the ints are actual ints.
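To try the Group-based parser end to end, the pieces above can be put together roughly as follows. The sample text here is an assumption, reconstructed to match the shape of the output shown and cut down to one person; your real input may differ. The qstring name is also introduced only for this sketch, so that the parse action is attached to a copy rather than to pyparsing's shared quotedString.

```python
from pprint import pprint
from pyparsing import (Forward, Group, OneOrMore, Regex, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, quotedString, removeQuotes)

# Hypothetical input, reconstructed from the output shown above.
sample = """
company = "My Company"
phone = "555-5555"
people = {
    person = {
        name = "Bob"
        location = "Seattle"
        settings = { size = 1 color = "red" }
    }
}
"""

EQ, LBRACE, RBRACE = map(Suppress, "={}")
name = Word(alphas, alphanums + "_")
value = Forward()                      # placeholder, filled in below
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))
# Work on a copy so the module-level quotedString is left unmodified.
qstring = quotedString.copy().setParseAction(removeQuotes)
struct = Group(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (qstring | struct | real | integer)

result = OneOrMore(entry).parseString(sample)
parsed = result.asList()
pprint(parsed)
```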
We can do just a little better than this, by recognizing that the entry format actually defines a name-value pair suitable for accessing like a Python dict. Our parser can do this with just a few minor changes:
Change the struct definition to:
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)
and the overall parser to:
result = Dict(OneOrMore(entry)).parseString(sample)
The Dict class treats the parsed contents as a name followed by a value, which can be done recursively. With these changes, we can now access the data in result like elements in a dict:
print(result['phone'])
or like attributes in an object:
print(result.company)
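For a self-contained version of the Dict-based parser, something like the following sketch should work. The one-line sample is an assumption chosen to keep the example short, and qstring is a name introduced here so that the shared quotedString is not modified. It shows both key-style and attribute-style access, including one level of nesting.

```python
from pyparsing import (Dict, Forward, Group, OneOrMore, Regex, Suppress, Word,
                       ZeroOrMore, alphanums, alphas, quotedString, removeQuotes)

# Hypothetical input: two simple entries and one nested struct.
sample = 'company = "My Company" phone = "555-5555" settings = { size = 1 color = "red" }'

EQ, LBRACE, RBRACE = map(Suppress, "={}")
name = Word(alphas, alphanums + "_")
value = Forward()
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))
qstring = quotedString.copy().setParseAction(removeQuotes)
# Dict instead of Group: each entry's first token becomes a key.
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (qstring | struct | real | integer)

result = Dict(OneOrMore(entry)).parseString(sample)

phone = result["phone"]        # key-style access
company = result.company       # attribute-style access
size = result.settings.size    # nested struct, converted to int by the parse action

print(phone, company, size)
```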
Use the dump() method to view the contents of a structure or substructure:
for person in result.people:
    print(person.dump())
    print()
prints:
['person', ['name', 'Bob'], ['location', 'Seattle'], ['settings', ['size', 1], ['color', 'red']]]
- location: Seattle
- name: Bob
- settings: [['size', 1], ['color', 'red']]
  - color: red
  - size: 1
['person', ['name', 'Joe'], ['location', 'Seattle'], ['settings', ['size', 2], ['color', 'blue']]]
- location: Seattle
- name: Joe
- settings: [['size', 2], ['color', 'blue']]
  - color: blue
  - size: 2