python - How to remove newline characters in csv columns, without removing the endrow newline character?

Question

Welcome To Ask or Share your Answers For Others

python - How to remove newline characters in csv columns, without removing the endrow newline character?

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to remove newline characters in csv columns, without removing the endrow newline character?

So I have this dataset where there are sometimes random newline characters entered into some cells, and I need to delete them.

this is what I've tried:

with open ('filepath') as inf, open('filepath', 'w') as outf:
    for line in inf:
        outf.write(line.replace('
', ''))

Unfortunately, this removed ALL newline characters, including the ones at the end of the row, which turns my csv file into a big one-liner

Does anyone know how I can only delete the random newline characters and not the 'real' endline characters?

Edit: If it helps, each 'real' new line starts with a 6 digit string of numbers (besides for the header line). Maybe some regex pattern that looks ahead to detect if there's some number string could work?

Edit2: I've tried using pandas to edit it with:

df = pd.read_csv(filepath)

for i in df.columns:
    if df[i].dtype==np.object:
        df[i] = df[i].str.replace('
','')

weirdly, this works if I copy the stuff inside the .csv into a new text file, but it doesn't work on my original csv file, and I'm not sure why.

Final Edit:

So big thanks to DDS for his help. Managed to get it to work using this:

num_cols = 48

buf = ""

with open (filepath) as inf, open (filepath, 'w') as outf:
    for line in inf:
        if len(line.split(',')) < num_cols:
            buf += line.replace('
', '')
            if len(buf.split(',')) == num_cols:
                outf.write(buf+'
')
            else: continue
            buf = ""
        else:
            outf.write(line)

question from:https://stackoverflow.com/questions/65932262/how-to-remove-newline-characters-in-csv-columns-without-removing-the-endrow-new

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:59:55+0000

Assuming you know the number of fields per line and no field contains the csv separator (comma): you could do like this:

    number_of_columns_in_the_table = 5 #assuming a line has 5 columns
    with open ('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            #check if the number of "splits equals the nummber of fields"
            if len(line.split(',')) < number_of_columns_in_the_table
               
 outf.write(line.replace('
', ''))
            else:
                outf.write(line)

EDIT

number_of_columns_in_the_table = 5 #assuming a line has 5 columns
    with open ('filepath') as inf, open('filepath', 'w') as outf:
        for line in inf:
            #check if the number of "splits equals the nummber of fields"
            if len(line.split(',')) < number_of_columns_in_the_table
               buf += line.replace('
', '');
           if len(line.split(',')) == number_of_columns_in_the_table
               outf.write( buf)
            else:
                outf.write(line)

Categories

python - How to remove newline characters in csv columns, without removing the endrow newline character?

python - How to remove newline characters in csv columns, without removing the endrow newline character?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags