Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
152 views
in Technique[技术] by (71.8m points)

python - Parsing nested JSON and writing it to CSV

I'm struggling with this problem. I have a JSON file and needs ti put it out to CSV, its fine if the structure is kind of flat with no deep nested items.

But in this case the nested RACES is messing me up.

How would I go about getting the data in a format like this:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME

for each object and each race in the object?

{
    "1": {
        "VENUE": "JOEBURG",
        "COUNTRY": "HAE",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    },
    "2": {
        "VENUE": "FOOBURG",
        "COUNTRY": "ABA",
        "ITW": "XAD",
        "RACES": {
            "1": {
                "NO": 1,
                "TIME": "12:35"
            },
            "2": {
                "NO": 2,
                "TIME": "13:10"
            },
            "3": {
                "NO": 3,
                "TIME": "13:40"
            },
            "4": {
                "NO": 4,
                "TIME": "14:10"
            },
            "5": {
                "NO": 5,
                "TIME": "14:55"
            },
            "6": {
                "NO": 6,
                "TIME": "15:30"
            },
            "7": {
                "NO": 7,
                "TIME": "16:05"
            },
            "8": {
                "NO": 8,
                "TIME": "16:40"
            }
        }
    }, ...
}

I would like to output this to CSV like this:

VENUE, COUNTRY, ITW, RACES__NO, RACES__TIME
JOEBERG, HAE, XAD, 1, 12:35
JOEBERG, HAE, XAD, 2, 13:10
JOEBERG, HAE, XAD, 3, 13:40
...
...
FOOBURG, ABA, XAD, 1, 12:35
FOOBURG, ABA, XAD, 2, 13:10

So first I get the correct keys:

self.keys = self.data.keys()
keys = ["DATA_KEY"]
for key in self.keys:
    if type(self.data[key]) == dict:
        for k in self.data[key].keys():
            if k not in keys:
                if type(self.data[key][k]) == unicode:
                    keys.append(k)
                elif type(self.data[key][k]) == dict:
                    self.subkey = k
                    for sk in self.data[key][k].values():
                        for subkey in sk.keys():
                            subkey = "%s__%s" % (self.subkey, subkey)
                            if subkey not in keys:
                                keys.append(subkey)

Then add the data:

But how?

This should be a fun one for you skilled forloopers. ;-)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'd collect keys only for the first object, then assume that the rest of the format is consistent.

The following code also limits the nested object to just one; you did not specify what should happen when there is more than one. Having two or more nested structures of equal length could work (you'd 'zip' those together), but if you have structures of differing length you need to make an explicit choice how to handle those; zip with empty columns to pad, or to write out the product of those entries (A x B rows, repeating information from A each time you find a B entry).

import csv
from operator import itemgetter


with open(outputfile, 'wb') as outf:
    writer = None  # will be set to a csv.DictWriter later

    for key, item in sorted(data.items(), key=itemgetter(0)):
        row = {}
        nested_name, nested_items = '', {}
        for k, v in item.items():
            if not isinstance(v, dict):
                row[k] = v
            else:
                assert not nested_items, 'Only one nested structure is supported'
                nested_name, nested_items = k, v

        if writer is None:
            # build fields for each first key of each nested item first
            fields = sorted(row)

            # sorted keys of first item in key sorted order
            nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
            fields.extend('__'.join((nested_name, k)) for k in nested_keys)

            writer = csv.DictWriter(outf, fields)
            writer.writeheader()

        for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
            row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
            writer.writerow(row)

For your sample input, this produces:

COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
HAE,XAD,JOEBURG,1,12:35
HAE,XAD,JOEBURG,2,13:10
HAE,XAD,JOEBURG,3,13:40
HAE,XAD,JOEBURG,4,14:10
HAE,XAD,JOEBURG,5,14:55
HAE,XAD,JOEBURG,6,15:30
HAE,XAD,JOEBURG,7,16:05
HAE,XAD,JOEBURG,8,16:40
ABA,XAD,FOOBURG,1,12:35
ABA,XAD,FOOBURG,2,13:10
ABA,XAD,FOOBURG,3,13:40
ABA,XAD,FOOBURG,4,14:10
ABA,XAD,FOOBURG,5,14:55
ABA,XAD,FOOBURG,6,15:30
ABA,XAD,FOOBURG,7,16:05
ABA,XAD,FOOBURG,8,16:40

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...