Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas reading concatenated jsons written by df.to_json(orient='split')

From my understanding, orient='split' is meant to allow batches to be written.

E.g.

import io

with io.open('sample_predictions_json_split.json', 'w') as out:
    for i in range(0, len(df), 3):
        # Write each 3-row batch as one self-contained JSON object per line.
        out.write(df[i:i+3].to_json(orient='split'))
        out.write('\n')

This yields a file that looks like this:

{"columns":["A","B","C"],"index":[0,1,2],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[3,4,5],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}
{"columns":["A","B","C"],"index":[6,7,8],"data":[[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169],[0.7152170382,0.8430748346,0.1486081169]]}

The advantage here is that the file can be split by line and distributed to different servers for processing batches of data.
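
Because every line is a complete JSON document, a worker only needs its own line to rebuild its batch. A minimal sketch of that idea, where the function name process_batch, the per-column mean, and the sample values are all hypothetical:

```python
import io

import pandas as pd


def process_batch(line):
    """Parse one orient='split' line and return the per-column means.

    Hypothetical worker function; the mean stands in for any real processing.
    """
    batch = pd.read_json(io.StringIO(line), orient="split")
    return batch.mean()


# One line of the file, as a worker would receive it (made-up values).
line = (
    '{"columns":["A","B","C"],"index":[3,4,5],'
    '"data":[[0.25,0.5,0.75],[1.25,1.5,1.75],[2.25,2.5,2.75]]}'
)
means = process_batch(line)
print(means)
```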

How do I read all the data back into a single dataframe? pd.read_json('sample_predictions_json_split.json', lines=True) does not work as expected. I could of course just use a for loop with io, but I feel Pandas should have a way to do this.
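
For context, as far as I know lines=True expects the JSON Lines layout that to_json(orient='records', lines=True) produces, i.e. one row per line, which would explain why it does not interpret the columns/index/data keys of split output. If the writing side can be changed, the variant below may be an option. It is only a sketch: the file name and sample values are assumptions, and orient='records' does not store the index, so the frame comes back with a fresh 0..n-1 index.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data with fractional values in every column.
df = pd.DataFrame(
    {
        "A": [i + 0.25 for i in range(9)],
        "B": [i + 0.5 for i in range(9)],
        "C": [i + 0.75 for i in range(9)],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batches.jsonl")
    with open(path, "w") as out:
        for i in range(0, len(df), 3):
            # One JSON object per row; each batch contributes three lines.
            chunk = df[i:i + 3].to_json(orient="records", lines=True)
            # Make sure batches are newline-separated regardless of pandas version.
            out.write(chunk if chunk.endswith("\n") else chunk + "\n")
    # lines=True reads one record per line into a single DataFrame.
    round_trip = pd.read_json(path, lines=True)

print(round_trip.shape)
```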

question from:https://stackoverflow.com/questions/65922938/pandas-reading-concatenated-jsons-written-by-df-to-jsonorient-split

1 Reply

You can read the file line by line, parse each line with orient='split', and combine the pieces once with pd.concat (df.append was removed in pandas 2.0). Without orient='split', the "columns", "index" and "data" keys are read as column names instead of being used to rebuild the frame.

In [455]: from io import StringIO

In [456]: with open('f.json') as f:
     ...:     frames = [pd.read_json(StringIO(line), orient='split') for line in f]

In [457]: df = pd.concat(frames)

In [458]: df
Out[458]: 
          A         B         C
0  0.715217  0.843075  0.148608
1  0.715217  0.843075  0.148608
2  0.715217  0.843075  0.148608
3  0.715217  0.843075  0.148608
4  0.715217  0.843075  0.148608
5  0.715217  0.843075  0.148608
6  0.715217  0.843075  0.148608
7  0.715217  0.843075  0.148608
8  0.715217  0.843075  0.148608
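
For a version that can be run end to end, here is a minimal sketch of the line-by-line approach that passes orient='split' for each line, so that the columns, index and data keys are interpreted as such rather than as column names. The helper name read_split_batches, the temporary file name, and the sample values are assumptions, not something from the original post or from pandas itself.

```python
import io
import os
import tempfile

import pandas as pd


def read_split_batches(path):
    """Rebuild one DataFrame from a file with one orient='split' JSON object per line.

    Hypothetical helper, not a pandas API.
    """
    with open(path) as f:
        # Skip blank lines; wrap each line in StringIO so read_json gets a buffer.
        frames = [
            pd.read_json(io.StringIO(line), orient="split")
            for line in f
            if line.strip()
        ]
    # Concatenate once at the end instead of growing a frame inside the loop.
    return pd.concat(frames)


# Made-up sample data with fractional values in every column.
original = pd.DataFrame(
    {
        "A": [i + 0.25 for i in range(9)],
        "B": [i + 0.5 for i in range(9)],
        "C": [i + 0.75 for i in range(9)],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "batches.json")
    # Write 3-row batches, one JSON object per line, as in the question.
    with open(path, "w") as out:
        for i in range(0, len(original), 3):
            out.write(original[i:i + 3].to_json(orient="split"))
            out.write("\n")
    restored = read_split_batches(path)

print(restored.shape)
```

Since each batch carries its own index, the restored frame keeps the original row labels without needing ignore_index.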
