Note: Line separated json is now supported in read_json
(since 0.19.0):
In [31]: pd.read_json('{"a":1,"b":2}
{"a":3,"b":4}', lines=True)
Out[31]:
a b
0 1 2
1 3 4
or with a file/filepath rather than a json string:
pd.read_json(json_file, lines=True)
It's going to depend on the size of you DataFrames which is faster, but another option is to use str.join
to smash your multi line "JSON" (Note: it's not valid json), into valid json and use read_json:
In [11]: '[%s]' % ','.join(test.splitlines())
Out[11]: '[{"a":1,"b":2},{"a":3,"b":4}]'
For this tiny example this is slower, if around 100 it's the similar, signicant gains if it's larger...
In [21]: %timeit pd.read_json('[%s]' % ','.join(test.splitlines()))
1000 loops, best of 3: 977 μs per loop
In [22]: %timeit l=[ json.loads(l) for l in test.splitlines()]; df = pd.DataFrame(l)
1000 loops, best of 3: 282 μs per loop
In [23]: test_100 = '
'.join([test] * 100)
In [24]: %timeit pd.read_json('[%s]' % ','.join(test_100.splitlines()))
1000 loops, best of 3: 1.25 ms per loop
In [25]: %timeit l = [json.loads(l) for l in test_100.splitlines()]; df = pd.DataFrame(l)
1000 loops, best of 3: 1.25 ms per loop
In [26]: test_1000 = '
'.join([test] * 1000)
In [27]: %timeit l = [json.loads(l) for l in test_1000.splitlines()]; df = pd.DataFrame(l)
100 loops, best of 3: 9.78 ms per loop
In [28]: %timeit pd.read_json('[%s]' % ','.join(test_1000.splitlines()))
100 loops, best of 3: 3.36 ms per loop
Note: of that time the join is surprisingly fast.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…