Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python/Pandas: How to process a column of data like a dictionary

I have a CSV like this:

cel_id|PDCP.RxBytesUl
1001-1234-1|5QI1:0.0001;5QI2:0.0002;5QI3:0.0003;5QI4:0.0004;5QI5:0.0005;5QI6:0.0006;5QI7:0.0007;5QI8:0.0008;5QI9:0.0009
1001-1234-2|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0000;5QI8:0.0000;5QI9:0.0128
1001-1234-4|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0010;5QI8:0.0030;5QI9:0.0020

I would like to sum, for each row, the values packed into the "PDCP.RxBytesUl" column:

PDCP.RxBytesUl = 5QI1+5QI2+5QI3+5QI4+5QI5+5QI6+5QI7+5QI8+5QI9

The final result should look like this:

cel_id       PDCP.RxBytesUl
1001-1234-1  0.0045
1001-1234-2  0.0163
1001-1234-4  0.0095

At first I wanted to convert this column into a dict, but the format is not valid dict syntax, and I have no idea how to proceed. Please help me, thank you.

Question from: https://stackoverflow.com/questions/65868607/python-pandashow-to-process-a-column-of-data-like-a-dictionary


1 Reply

You can use a regex-based solution:

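import pandas as pd

df = pd.read_csv('input.csv', delimiter='|')

# Extract every number that follows a ':' and sum the matches per original row
df['sum'] = df['PDCP.RxBytesUl'].str.extractall(r':(\d+(?:\.\d+)?)').astype('float').unstack().sum(axis=1)
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)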

df:

    cel_id      sum
0   1001-1234-1 0.0045
1   1001-1234-2 0.0163
2   1001-1234-4 0.0095

Better code, suggested by Shubham :)

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df['sum'] = df['PDCP.RxBytesUl'].str.extractall(r':([^;]+)')[0].astype('float').groupby(level=0).sum()
df.drop('PDCP.RxBytesUl', axis=1, inplace=True)
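
Since the question mentions wanting a dict, here is a rough sketch of that route without any regex. It is not part of the original answer: the helper names parse_kv and sum_kv are made up for illustration, the sample rows are copied from the question, and only the standard library is used so the snippet runs on its own.

```python
import csv
import io

# Sample rows copied from the question; in practice this would be the CSV file.
SAMPLE = """cel_id|PDCP.RxBytesUl
1001-1234-1|5QI1:0.0001;5QI2:0.0002;5QI3:0.0003;5QI4:0.0004;5QI5:0.0005;5QI6:0.0006;5QI7:0.0007;5QI8:0.0008;5QI9:0.0009
1001-1234-2|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0000;5QI8:0.0000;5QI9:0.0128
1001-1234-4|5QI1:0.0001;5QI2:0.0003;5QI3:0.0005;5QI4:0.0007;5QI5:0.0009;5QI6:0.0010;5QI7:0.0010;5QI8:0.0030;5QI9:0.0020
"""


def parse_kv(field):
    """Turn '5QI1:0.0001;5QI2:0.0002' into {'5QI1': 0.0001, '5QI2': 0.0002}."""
    # Split on ';' into 'key:value' items, then on the first ':' into key and value.
    pairs = (item.split(":", 1) for item in field.split(";") if item)
    return {key: float(value) for key, value in pairs}


def sum_kv(field):
    """Sum every value in one 'key:value;key:value' field."""
    return sum(parse_kv(field).values())


# Read the '|'-delimited rows and total each cell's field, rounded to 4 places.
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"))
totals = {row["cel_id"]: round(sum_kv(row["PDCP.RxBytesUl"]), 4) for row in rows}
print(totals)
```

With the DataFrame from the answer above, the same helper should be usable as df['sum'] = df['PDCP.RxBytesUl'].map(sum_kv). This will likely be slower than extractall on large files, but parse_kv keeps the 5QI keys available if they are needed later.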


...