I'm trying to flatten deeply nested JSON files.
I have 22 JSON files that I want to gather into one pandas DataFrame. I managed to flatten them to the second level with json_normalize, but I am not able to parse them any further. Some of the files have more than 5 levels of nesting.
I want to extract the _id, the actType, and all the text data located in the different levels of "children". An example of the JSON file follows. I'd really appreciate your help!
{
  "_id": "test1",
  "actType": "FINDING",
  "entries": [{
    "text": "U Ergebnis:",
    "isDocumentationNode": false,
    "children": [{
      "text": "U3: Standartext",
      "isDocumentationNode": true,
      "children": []
    }, {
      "text": "Brückner durchgeführt o.p.B.",
      "isDocumentationNode": true,
      "children": []
    }, {
      "text": "Normale k?rperliche und altersgerecht Entwicklung",
      "isDocumentationNode": true,
      "children": [{
        "text": "J1/2",
        "isDocumentationNode": false,
        "children": [{
          "text": "Schule:",
          "isDocumentationNode": true,
          "children": [{
            "text": "Ziel Abitur",
            "isDocumentationNode": true,
            "children": [{
              "text": "l?uft",
              "isDocumentationNode": true,
              "children": []
            }, {
              "text": "gef?hrdet",
              "isDocumentationNode": true,
              "children": []
            }, {
              "text": "l?uft",
              "isDocumentationNode": true,
              "children": []
            }, {
              "text": "gef?hrdet",
              "isDocumentationNode": true,
              "children": []
            }
            ]
          }
          ]
        }
        ]
      }
      ]
    }
    ]
  }
  ]
}
import pandas as pd
# load file
df = pd.read_json('test.json')
# display(df)
_id actType entries
0 test1 FINDING {'text': 'U Ergebnis:', 'isDocumentationNode': False, 'children': [{'text': 'U3: Standartext', 'isDocumentationNode': True, 'children': []}, {'text': 'Brückner durchgeführt o.p.B.', 'isDocumentationNode': True, 'children': []}, {'text': 'Normale k?rperliche und altersgerecht Entwicklung', 'isDocumentationNode': True, 'children': [{'text': 'J1/2', 'isDocumentationNode': False, 'children': [{'text': 'Schule:', 'isDocumentationNode': True, 'children': [{'text': 'Ziel Abitur', 'isDocumentationNode': True, 'children': [{'text': 'l?uft', 'isDocumentationNode': True, 'children': []}, {'text': 'gef?hrdet', 'isDocumentationNode': True, 'children': []}, {'text': 'l?uft', 'isDocumentationNode': True, 'children': []}, {'text': 'gef?hrdet', 'isDocumentationNode': True, 'children': []}]}]}]}]}]}
This results in a nested dict in the 'entries' column, but I need a flat, wide DataFrame with all keys as columns.
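One possible way to get such a wide layout is to walk the structure recursively and build a column name from the path to each value. The sketch below is only an illustration under some assumptions: the helper names `flatten`, `to_frame` and `load_all`, the dot-separated column naming, and the shortened sample document are all made up for this example, and empty "children" lists are simply dropped.

```python
import glob
import json


def flatten(node, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict keyed by path."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        # Scalar leaf: store it under the path built so far.
        return {prefix: node}

    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    # An empty dict or list has no items, so it contributes no columns.
    return flat


def to_frame(docs):
    """Build one wide DataFrame with one row per document (hypothetical helper)."""
    import pandas as pd  # imported here so flatten() is usable without pandas
    return pd.DataFrame([flatten(doc) for doc in docs])


def load_all(pattern):
    """Load every JSON file matching a glob pattern (the pattern is an assumption)."""
    docs = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            docs.append(json.load(fh))
    return to_frame(docs)


# Shortened version of the example document, for illustration only.
doc = {
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
        "text": "U Ergebnis:",
        "isDocumentationNode": False,
        "children": [
            {"text": "U3: Standartext", "isDocumentationNode": True, "children": []},
        ],
    }],
}

flat = flatten(doc)
for column, value in flat.items():
    print(column, "=", value)
```

With this naming, the nested text values end up in columns such as `entries.0.text` and `entries.0.children.0.text`, so they can be selected by filtering the columns that end in `.text`. Files with different nesting produce different column sets, and pandas fills the missing cells with NaN when the rows are combined.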