There are many ways you could accomplish this, I spent some time evaluating performance on a not-so-large (70k) dataframe. Although @der_die_das_jojo's answer is functional, it's also pretty slow.
The answer suggested by this question actually turns out to be about 5x faster on a large dataframe.
On my test dataframe (df
):
Above method:
%time [ v.dropna().to_dict() for k,v in df.iterrows() ]
CPU times: user 51.2 s, sys: 0 ns, total: 51.2 s
Wall time: 50.9 s
Another slow method:
%time df.apply(lambda x: [x.dropna()], axis=1).to_dict(orient='rows')
CPU times: user 1min 8s, sys: 880 ms, total: 1min 8s
Wall time: 1min 8s
Fastest method I could find:
%time [ {k:v for k,v in m.items() if pd.notnull(v)} for m in df.to_dict(orient='rows')]
CPU times: user 14.5 s, sys: 176 ms, total: 14.7 s
Wall time: 14.7 s
The format of this output is a row-oriented dictionary, you may need to make adjustments if you want the column-oriented form in the question.
Very interested if anyone finds an even faster answer to this question.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…