From my understanding, the "split" param is meant to allow batches to be written.
E.g.
import io

# df is an existing DataFrame with columns A, B, C
with io.open('sample_predictions_json_split.json', 'w') as out:
    for i in range(0, len(df), 3):
        # write each 3-row batch as one 'split'-oriented JSON object per line
        out.write(df[i:i+3].to_json(orient='split'))
        out.write('\n')
This yields a file that looks like this:
{"columns":["A","B","C"],"index":[0,1,2],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[3,4,5],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[6,7,8],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
The advantage here is that the file can be split by line, so batches of data can be distributed to different servers for processing.
How do I read all the data back into a single DataFrame?

pd.read_json('sample_predictions_json_split.json', lines=True)

does not work as expected. I can of course just use a for loop with io (see the sketch below), but I feel Pandas should have a way to do this.
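
A minimal sketch of that manual loop, assuming each line of the file is a complete "split" JSON object (io.StringIO wraps each line because recent Pandas versions expect a file-like object rather than a raw JSON string):

import io
import pandas as pd

frames = []
with io.open('sample_predictions_json_split.json') as f:
    for line in f:
        # each line is an independent 'split'-oriented JSON object
        frames.append(pd.read_json(io.StringIO(line), orient='split'))

# the batch indexes (0-2, 3-5, 6-8) are disjoint, so a plain concat
# restores the original frame
df_all = pd.concat(frames)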
question from:
https://stackoverflow.com/questions/65922938/pandas-reading-concatenated-jsons-written-by-df-to-jsonorient-split