Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python 3.x - Dask: Add list to a column value like pandas does

I am a bit new to Dask. I have a large CSV file and a large list. The number of rows in the CSV equals the length of the list. I am trying to create a new column in the Dask dataframe from the list. In pandas this is pretty straightforward, but in Dask I am having a hard time creating the new column. I am avoiding pandas because my data is 15GB+.

Please see my attempts below.

CSV data

name,text,address
john,some text here,MD
tim,some text here too,WA

Code tried

import dask.dataframe as dd
import numpy as np

ls = ['one','two']

ddf = dd.read_csv('../data/test.csv')
ddf.head()

Try #1: 
ddf['new'] = ls # TypeError: Column assignment doesn't support type list

Try #2: What should be passed here for condlist?
ddf['new'] = np.select(choicelist=ls) # TypeError: _select_dispatcher() missing 1 required positional argument: 'condlist'

Looking for this output:

   name                text address new
0  john      some text here      MD one
1   tim  some text here too      WA two


1 Reply


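Try converting the list to a Dask collection first and then assigning it as the new column, like this:

# Wrap the list in a Dask collection so it can be assigned as a column
ls = dd.from_array(np.array(['one','two']))
ddf['new'] = ls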
