Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Replacing newlines with spaces for str columns through pandas dataframe

Given an example dataframe in which the columns labelled 2 and 3 hold free text, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How can I replace newlines with spaces in specific columns of a pandas dataframe?

I have tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.


1 Reply


Use replace: first strip the leading and trailing whitespace, then replace the remaining newlines with spaces:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n',  ' ', regex=True)
print (df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
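
The replace call above runs over every column of the frame. If only specific columns should be touched, as the question asks, one possible variant is to go through the .str accessor column by column. This is only a sketch, not part of the original answer: it assumes df is built from the question's lol list, and text_cols is a hypothetical name for the list of column labels to clean.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Assumption: the free-text columns are the ones labelled 2 and 3.
text_cols = [2, 3]

for col in text_cols:
    # Strip leading/trailing whitespace first, then turn the
    # remaining newlines into single spaces (plain string match, no regex).
    df[col] = df[col].str.strip().str.replace('\n', ' ', regex=False)

print(df)
```

Stripping before replacing matters here: a trailing newline is removed outright instead of becoming a trailing space. The numeric columns 0 and 1 are never passed through the string methods, so they keep their integer values.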
