Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
131 views
in Technique[技术] by (71.8m points)

python - Adding a DataFrame column with len() of another column's values

I'm having a problem trying to get a character count column of the string values in another column, and haven't figured out how to do it efficiently.

for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index]))

This apparently involves first creating a column of nulls and then rewriting it, and it takes a really long time on my data set. So what's the most effective way of getting something like

'string'     'char_length'
abcd          4
abcde         5

I've checked around quite a bit, but I haven't been able to figure it out.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Pandas has a vectorised string method for this: str.len(). To create the new column you can write:

df['char_length'] = df['string'].str.len()

For example:

>>> df
  string
0   abcd
1  abcde

>>> df['char_length'] = df['string'].str.len()
>>> df
  string  char_length
0   abcd            4
1  abcde            5

This should be considerably faster than looping over the DataFrame with a Python for loop.

Many other familiar string methods from Python have been introduced to Pandas. For example, lower (for converting to lowercase letters), count for counting occurrences of a particular substring, and replace for swapping one substring with another.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...