Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How can I remove extra whitespace from strings when parsing a csv file in Pandas?

I have the following file named 'data.csv':

    1997,Ford,E350
    1997, Ford , E350
    1997,Ford,E350,"Super, luxurious truck"
    1997,Ford,E350,"Super ""luxurious"" truck"
    1997,Ford,E350," Super luxurious truck "
    "1997",Ford,E350
    1997,Ford,E350
    2000,Mercury,Cougar

And I would like to parse it into a pandas DataFrame so that the DataFrame looks as follows:

       Year     Make   Model              Description
    0  1997     Ford    E350                     None
    1  1997     Ford    E350                     None
    2  1997     Ford    E350   Super, luxurious truck
    3  1997     Ford    E350  Super "luxurious" truck
    4  1997     Ford    E350    Super luxurious truck
    5  1997     Ford    E350                     None
    6  1997     Ford    E350                     None
    7  2000  Mercury  Cougar                     None

The best I could do was:

    pd.read_table("data.csv", sep=r',', names=["Year", "Make", "Model", "Description"])

Which gets me:

       Year     Make   Model              Description
    0  1997     Ford    E350                     None
    1  1997    Ford     E350                     None
    2  1997     Ford    E350   Super, luxurious truck
    3  1997     Ford    E350  Super "luxurious" truck
    4  1997     Ford    E350   Super luxurious truck 
    5  1997     Ford    E350                     None
    6  1997     Ford    E350                     None
    7  2000  Mercury  Cougar                     None

How can I get the DataFrame without the extra leading and trailing whitespace?

1 Reply

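You could use converters, which pandas applies to the values of the named columns as the file is parsed:

    import pandas as pd

    def strip(text):
        # Strip surrounding whitespace; pass non-string values through unchanged.
        try:
            return text.strip()
        except AttributeError:
            return text

    def make_int(text):
        # Drop stray quotes and spaces before converting the year to an int.
        return int(text.strip('" '))

    table = pd.read_table("data.csv", sep=',',
                          names=["Year", "Make", "Model", "Description"],
                          converters={'Description': strip,
                                      'Model': strip,
                                      'Make': strip,
                                      'Year': make_int})
    print(table)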

yields

       Year     Make   Model              Description
    0  1997     Ford    E350                     None
    1  1997     Ford    E350                     None
    2  1997     Ford    E350   Super, luxurious truck
    3  1997     Ford    E350  Super "luxurious" truck
    4  1997     Ford    E350    Super luxurious truck
    5  1997     Ford    E350                     None
    6  1997     Ford    E350                     None
    7  2000  Mercury  Cougar                     None
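
As an alternative to converters, the text columns can also be stripped after parsing with the `.str.strip()` accessor. The following is only a minimal sketch, not part of the original answer: the helper name `read_stripped` and the inline `CSV_TEXT` (a copy of the question's data.csv, used so the example runs without a file on disk) are assumptions made for illustration.

```python
import io

import pandas as pd

# Inline copy of the question's data.csv (assumption: same content as the file).
CSV_TEXT = '''1997,Ford,E350
1997, Ford , E350
1997,Ford,E350,"Super, luxurious truck"
1997,Ford,E350,"Super ""luxurious"" truck"
1997,Ford,E350," Super luxurious truck "
"1997",Ford,E350
1997,Ford,E350
2000,Mercury,Cougar
'''

TEXT_COLUMNS = ["Make", "Model", "Description"]


def read_stripped(source):
    """Parse the CSV, then strip whitespace from the text columns.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(source, names=["Year", "Make", "Model", "Description"])
    for col in TEXT_COLUMNS:
        # .str.strip() removes leading and trailing whitespace from each
        # string and leaves missing values as missing.
        df[col] = df[col].str.strip()
    return df


df = read_stripped(io.StringIO(CSV_TEXT))
print(df)
```

Passing `"data.csv"` instead of the `StringIO` object should read the real file the same way. Note that the `skipinitialspace=True` option of `read_csv` only skips spaces directly after a delimiter, so on its own it would not remove the trailing space in ` Ford `.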
