From my understanding, the "split" param is meant to allow batches to be written.
E.g.
import io

# df is an existing DataFrame with columns A, B, C
with io.open('sample_predictions_json_split.json', 'w') as out:
    for i in range(0, len(df), 3):
        # write each 3-row batch as one 'split'-oriented JSON object per line
        out.write(df[i:i+3].to_json(orient='split'))
        out.write('\n')
This yields a file that looks like this:
{"columns":["A","B","C"],"index":[0,1,2],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[3,4,5],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[6,7,8],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
The advantage here is that the file can be split by line, so batches of data can be distributed to different servers for processing.
How do I read all the data back into a single DataFrame?

pd.read_json('sample_predictions_json_split.json', lines=True)

does not work as expected. I can of course just use a for loop with io (see the sketch below), but I feel Pandas should have a way to do this.
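
A minimal sketch of that manual loop, assuming each line of the file is a complete "split" JSON object (io.StringIO wraps each line because recent Pandas versions expect a file-like object rather than a raw JSON string):

import io
import pandas as pd

frames = []
with io.open('sample_predictions_json_split.json') as f:
    for line in f:
        # each line is an independent 'split'-oriented JSON object
        frames.append(pd.read_json(io.StringIO(line), orient='split'))

# the batch indexes (0-2, 3-5, 6-8) are disjoint, so a plain concat
# restores the original frame
df_all = pd.concat(frames)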
question from:
https://stackoverflow.com/questions/65922938/pandas-reading-concatenated-jsons-written-by-df-to-jsonorient-split