Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Create nested JSON from CSV

I already read "Create nested JSON from flat csv", but it didn't help in my case.

I have a fairly big spreadsheet, created with Google Docs, consisting of 11 rows and 74 columns (some columns are empty).

I created an example on Google Drive. When exported as a CSV it looks like this:

id,name,email,phone,picture01,picture02,picture03,status
1,Alice,[email protected],2131232,"image01_01
[this is an image]",image01_02,image01_03,single
2,Bob,[email protected],2854839,image02_01,"image02_02
[description to image 2]",,married
3,Frank,[email protected],987987,image03_01,image03_02,,single
4,Shawn,[email protected],,image04_01,,,single

Now I would like to have a JSON structure that looks like this:

{
    "persons": [
        {
            "type": "config.profile",
            "id": "1",
            "email": "[email protected]",
            "pictureId": "p01",
            "statusId": "s01"
        },
        {
            "type": "config.pictures",
            "id": "p01",
            "album": [
                {
                    "image": "image01_01",
                    "description": "this is an image"
                },
                {
                    "image": "image01_02",
                    "description": ""
                },
                {
                    "image": "image01_03",
                    "description": ""
                }
            ]
        },
        {
            "type": "config.status",
            "id": "s01",
            "status": "single"
        },
        {
            "type": "config.profile",
            "id": "2",
            "email": "[email protected]",
            "pictureId": "p02",
            "statusId": "s02"
        },
        {
            "type": "config.pictures",
            "id": "p02",
            "album": [
                {
                    "image": "image02_01",
                    "description": ""
                },
                {
                    "image": "image02_02",
                    "description": "description to image 2"
                }
            ]
        },
        {
            "type": "config.status",
            "id": "s02",
            "status": "married"
        }
    ]
}

And so on for the other rows.

My theoretical approach would be to go through the CSV file row by row. (Here the first problem starts: usually one row is one line, but sometimes a row spans several lines, so do I need to count the commas?) Each row corresponds to a config.profile block containing the id, email, pictureId, and statusId (the latter two are generated from the row number).
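A minimal sketch of the config.profile part, assuming each row is already available as a dict keyed by the header names (for example from csv.DictReader) and that the generated ids are the 1-based row number zero-padded to two digits, as in p01 and s01 above. The function name profile_block is hypothetical.

```python
def profile_block(row, row_number):
    """Build the config.profile block for one CSV row.

    row: dict mapping header names to cell values (e.g. from csv.DictReader)
    row_number: 1-based position of the row; pictureId/statusId are derived
    from it, zero-padded to two digits (assumed from the sample output).
    """
    return {
        "type": "config.profile",
        "id": row["id"],
        "email": row["email"],
        "pictureId": f"p{row_number:02d}",
        "statusId": f"s{row_number:02d}",
    }


# Hypothetical usage with a hand-written row
print(profile_block({"id": "1", "email": "alice@example.com"}, 1))
```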

Then, for each row, a config.pictures block is generated whose id is the pictureId from the config.profile block. The album is an array with one element per picture given.
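For the album, the sample export suggests that a picture cell holds the image name, optionally followed by a line break and a description in square brackets. Assuming that holds for every cell, and that the picture columns are named picture01, picture02, and so on, a sketch could look like this (parse_picture and album_from_row are hypothetical names):

```python
def parse_picture(cell):
    """Split 'image01_01\n[this is an image]' into image name and description."""
    image, _, rest = cell.partition("\n")
    description = rest.strip()
    # Drop the surrounding square brackets if present
    if description.startswith("[") and description.endswith("]"):
        description = description[1:-1]
    return {"image": image.strip(), "description": description}


def album_from_row(row):
    """Collect the non-empty picture columns of one row, in column order.

    Sorting the names works for picture01..picture99 because they are zero-padded.
    """
    picture_columns = sorted(key for key in row if key and key.startswith("picture"))
    return [parse_picture(row[key]) for key in picture_columns
            if (row[key] or "").strip()]


row = {"id": "2", "picture01": "image02_01",
       "picture02": "image02_02\n[description to image 2]", "picture03": ""}
print(album_from_row(row))
```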

Lastly, each row has a config.status block whose id is, again, the statusId from the config.profile block, and a single status entry holding that row's status.

I'm entirely clueless about how to create the nested and conditional JSON file.

So far I have only got to the point where I convert the CSV to valid JSON, without any nesting and without the additional information that is not directly given in the CSV, such as type, pictureId, statusId, and so on.

Any help is appreciated. If it is easier to do this in another scripting language (like Ruby), I would gladly switch.

Before anyone thinks this is homework or whatnot: it is not. I just want to automate an otherwise very tiresome copy-and-paste task.


1 Reply


Python's csv module handles reading the CSV nicely, including line breaks inside quoted fields, so you don't need to count commas yourself.

import csv

with open('my_csv.csv', newline='') as csv_file:  # newline='' lets csv handle quoted line breaks
    for row in csv.reader(csv_file):
        pass  # do work with row, a list of field strings

A csv.reader object is an iterator, so you can loop over the rows of the CSV with a for loop. Each row is a list of strings, so you can get each field as row[0], row[1], etc. Be aware that the first row it yields is the header row (which, in your case, just contains the field names).
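A quick way to convince yourself that quoted line breaks are handled: feed part of your sample (with placeholder email addresses instead of the obfuscated ones) to csv.reader through io.StringIO instead of a file.

```python
import csv
import io

# Header plus two records; each record contains one quoted line break
sample = (
    "id,name,email,phone,picture01,picture02,picture03,status\n"
    '1,Alice,alice@example.com,2131232,"image01_01\n[this is an image]",image01_02,image01_03,single\n'
    '2,Bob,bob@example.com,2854839,image02_01,"image02_02\n[description to image 2]",,married\n'
)

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

print(len(data))         # 2 records, even though the text spans 5 lines
print(repr(data[0][4]))  # the picture01 cell, line break included
```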

Since the first row contains the field names, we can use csv.DictReader instead, so that fields in each row can be accessed as row['id'], row['name'], etc. It also consumes the header row for us, so it doesn't show up as data:

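import csv

with open('my_csv.csv', newline='') as csv_file:  # newline='' lets csv handle quoted line breaks
    for row in csv.DictReader(csv_file):
        pass  # do work with row, a dict keyed by the header names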
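DictReader also gives you an empty string for an empty cell, which makes it easy to skip unused picture columns later. A small check on one sample row held in memory (placeholder email again):

```python
import csv
import io

sample = (
    "id,name,email,phone,picture01,picture02,picture03,status\n"
    "4,Shawn,shawn@example.com,,image04_01,,,single\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["name"], repr(row["phone"]), row["status"])  # Shawn '' single
```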

For the JSON export, use the json module: json.dumps() takes Python data structures such as lists and dictionaries and returns the corresponding JSON string:

import json
my_data = {'id': 123, 'name': 'Test User', 'emails': ['[email protected]', '[email protected]']}
my_data_json = json.dumps(my_data)
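json.dumps() produces compact, single-line output by default; to get the indented layout shown in the question, pass indent. A small sketch using placeholder data:

```python
import json

my_data = {"id": 123, "name": "Test User", "emails": ["user@example.com"]}

compact = json.dumps(my_data)           # everything on one line
pretty = json.dumps(my_data, indent=4)  # four-space indentation, one item per line
print(pretty)

# Parsing either string back gives the same Python data
assert json.loads(compact) == json.loads(pretty) == my_data
```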

If you want to generate JSON output exactly as you posted, you'd do something like:

import csv
import json

output = {'persons': []}
with open('my_csv.csv', newline='') as csv_file:
    for person in csv.DictReader(csv_file):
        output['persons'].append({
            'type': 'config.profile',
            'id': person['id'],
            # ...add other fields (email etc) here...
        })

        # ...do similar for config.pictures, config.status, etc...

output_json = json.dumps(output)

output_json will contain the JSON output that you want.
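Filling in the placeholders, a complete sketch could look like the following. It assumes the format of your sample: picture columns named picture01, picture02, and so on; each picture cell holds the image name, optionally followed by a line break and a [description]; and pictureId/statusId are built from the 1-based row number, zero-padded to two digits. The function names and file names are illustrative, not part of any library.

```python
import csv
import json


def parse_picture(cell):
    """'image01_01\n[this is an image]' -> {'image': ..., 'description': ...}"""
    image, _, rest = cell.partition("\n")
    description = rest.strip()
    if description.startswith("[") and description.endswith("]"):
        description = description[1:-1]
    return {"image": image.strip(), "description": description}


def person_blocks(person, row_number):
    """Return the config.profile, config.pictures and config.status blocks for one row."""
    picture_id = f"p{row_number:02d}"
    status_id = f"s{row_number:02d}"
    # Zero-padded names sort correctly; empty picture cells are skipped
    picture_columns = sorted(k for k in person if k and k.startswith("picture"))
    album = [parse_picture(person[k]) for k in picture_columns
             if (person[k] or "").strip()]
    return [
        {"type": "config.profile", "id": person["id"], "email": person["email"],
         "pictureId": picture_id, "statusId": status_id},
        {"type": "config.pictures", "id": picture_id, "album": album},
        {"type": "config.status", "id": status_id, "status": person["status"]},
    ]


def csv_to_persons(lines):
    """Build {'persons': [...]} from a CSV file object (or any iterable of lines)."""
    output = {"persons": []}
    # start=1 so the first data row gets p01/s01
    for row_number, person in enumerate(csv.DictReader(lines), start=1):
        output["persons"].extend(person_blocks(person, row_number))
    return output


def convert(csv_path, json_path):
    """Read csv_path and write the nested structure to json_path, pretty-printed."""
    with open(csv_path, newline="") as csv_file:
        output = csv_to_persons(csv_file)
    with open(json_path, "w", encoding="utf-8") as json_file:
        json.dump(output, json_file, indent=4)
```

For your file, calling convert('my_csv.csv', 'persons.json') would write the result. Keeping the per-row logic in person_blocks makes it easy to add fields such as name or phone later without touching the CSV reading.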

However, I'd suggest you carefully consider the structure of the JSON output you're after. At the moment, you're defining an outer dictionary that serves no purpose, and you're adding all your 'config' blocks directly under 'persons', so profiles, pictures, and statuses end up mixed in one flat list. You may want to reconsider this.

