I am new to python/pandas and I am having some issues with converting a nested JSON to pandas dataframe. I am sending a query to a database and getting a JSON string back.
It's a deeply nested JSON string that contains several arrays. The response from the database contains thousands of rows. Here is the general structure of one row in the JSON string:
{
"ID": "123456",
"profile": {
"criteria": [
{
"type": "type1",
"name": "name1",
"value": "7",
"properties": []
},
{
"type": "type2",
"name": "name2",
"value": "6",
"properties": [
{
"type": "MAX",
"name": "",
"value": "100"
},
{
"type": "MIN",
"name": "",
"value": "5"
}
]
},
{
"type": "type3",
"name": "name3",
"value": "5",
"properties": []
}
]
}
}
{
"ID": "456789",
"profile": {
"criteria": [
{
"type": "type4",
"name": "name4",
"value": "6",
"properties": []
}
]
}
}
I want to flatten this JSON string using python pandas. I had problems using json_normalize since this is a deeply nested JSON string:
from cassandra.cluster import Cluster
import pandas as pd
from pandas.io.json import json_normalize
def pandas_factory(colnames, rows):
return pd.DataFrame(rows, columns=colnames)
cluster = Cluster(['xxx.xx.x.xx'], port=yyyy)
session = cluster.connect('nnnn')
session.row_factory = pandas_factory
json_string = session.execute('select json ......')
df = json_string ._current_rows
df_normalized= json_normalize(df)
print(df_normalized)
When i run this code, i get a Key error:
KeyError: 0
I need help converting this JSON string to a dataframe with only some selected columns that looks something like this: (The rest of the data can be skipped)
ID | criteria | type | name | value |
123456 1 type1 name1 7
123456 2 type2 name2 6
123456 3 type3 name3 5
456789 1 type4 name4 6
I tried to find similar problems on here but I can't seem to apply it to my JSON string.
Any help is appreciated! :)
EDIT:
The json string that is retured is a query response object: ResultSet . I think thats why I'm having some issues with using:
json_string= session.execute('select json profile from visning')
temp = json.loads(json_string)
and getting the error:
TypeError: the JSON object must be str, not 'ResultSet'
EDIT #2:
Just to see what I'm working with, I printed the the result query by using:
for line in session.execute('select json.....'):
print(line)
and got something like this:
Row(json='{"ID": null, "profile": null}')
Row(json='{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}')
Row(json='{"ID": "456", "profile": {"criteria": []}}')
Row(json='{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}')
Row(json='{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}')
The issue I'm having is converting this structure to a json string that can be used in json.loads():
json_string= session.execute('select json profile from visning')
json_list = list(json_string)
string= ''.join(list(map(str, json_list)))
temp = json.loads(string) <-- creates error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
EDIT #3:
As requested below in the comments, printing
for line in session.execute('select json.....'):
print((line.json))
gets the output:
{"ID": null, "profile": null}
{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}
{"ID": "456", "profile": {"criteria": []}}
{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}
{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}
See Question&Answers more detail:
os