Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 2.7 - Pandas Bad Lines Warning Capture

Is there any way in pandas to capture the warning produced by setting error_bad_lines=False and warn_bad_lines=True? For instance, the following script:

import pandas as pd
from StringIO import StringIO
data = StringIO("""a,b,c
1,2,3
4,5,6
6,7,8,9
1,2,5
3,4,5""")
pd.read_csv(data, warn_bad_lines=True, error_bad_lines=False)

produces the warning:

Skipping line 4: expected 3 fields, saw 4

I'd like to store this output in a string so that I can eventually write it to a log file and keep track of the records being skipped.

I tried using the warnings module, but this "warning" doesn't seem to be a warning in the traditional sense. I'm using Python 2.7 and pandas 0.16.

Any help would be greatly appreciated.

1 Reply

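I don't think this is implemented in pandas (source1, source2). As far as I can tell, the "Skipping line ..." message is printed straight to stderr rather than issued through the warnings module, which is why catching warnings doesn't work.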

My solutions:

1. Pre- or post-processing: re-read the file with the csv module and report the rows whose field count is wrong

from __future__ import print_function  # same print() syntax on Python 2.7 and 3
import csv

import pandas as pd

df = pd.read_csv('data.csv', warn_bad_lines=True, error_bad_lines=False)

# Option A: compare each row's length with a fixed, expected field count
RECOMMENDED = 3

with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        if len(row) != RECOMMENDED:
            # reader.line_num is the line of the file the row came from
            print("Line %d has %d fields: %r" % (reader.line_num, len(row), row))

# Option B: compare each row's length with the number of columns pandas kept
lencols = len(df.columns)
print(lencols)

with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        if len(row) != lencols:
            print("Line %d has %d fields: %r" % (reader.line_num, len(row), row))
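To get the skipped records into a log file, as the question asks, the same csv check can hand each bad row to Python's logging module instead of printing it. This is a sketch under stated assumptions, not a pandas feature: the function name find_bad_rows and the logger name "bad_lines" are hypothetical, the expected field count defaults to the header length, and blank lines are ignored because read_csv skips them by default (skip_blank_lines=True).

```python
import csv
import logging

# Hypothetical logger name; configure a handler/file for it as you like.
logger = logging.getLogger("bad_lines")


def find_bad_rows(csv_file, expected=None, delimiter=","):
    """Return (line_number, row) for every row whose field count differs
    from `expected` (by default, the number of header fields).

    `csv_file` can be an open file or any iterable of text lines.
    Line numbers are 1-based and include the header, matching the
    numbering in messages such as "Skipping line 4: ...".
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    if expected is None:
        expected = len(header)

    bad = []
    for row in reader:
        if not row:
            continue  # blank line; read_csv skips these by default
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            logger.warning("Skipping line %d: expected %d fields, saw %d",
                           reader.line_num, expected, len(row))
            bad.append((reader.line_num, row))
    return bad
```

For example, calling `logging.basicConfig(filename='skipped.log', level=logging.WARNING)` once and then `find_bad_rows(open('data.csv'))` writes one line per skipped record to skipped.log, while the returned list can be inspected in code.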

2. Redirect sys.stdout and sys.stderr to a log file while read_csv runs

import sys

import pandas as pd


class RedirectStdStreams(object):
    """Temporarily replace sys.stdout and sys.stderr inside a `with` block."""

    def __init__(self, stdout=None, stderr=None):
        self._stdout = stdout or sys.stdout
        self._stderr = stderr or sys.stderr

    def __enter__(self):
        self.old_stdout, self.old_stderr = sys.stdout, sys.stderr
        self.old_stdout.flush()
        self.old_stderr.flush()
        sys.stdout, sys.stderr = self._stdout, self._stderr

    def __exit__(self, exc_type, exc_value, traceback):
        self._stdout.flush()
        self._stderr.flush()
        # restore the original streams, even if read_csv raised
        sys.stdout = self.old_stdout
        sys.stderr = self.old_stderr


if __name__ == '__main__':
    # the "Skipping line ..." messages go to log.txt instead of the console
    with open('log.txt', 'w') as log_file:
        # replaces sys.stdout, sys.stderr, see http://stackoverflow.com/a/6796752/2901002
        with RedirectStdStreams(stdout=log_file, stderr=log_file):
            df = pd.read_csv('data.csv', warn_bad_lines=True, error_bad_lines=False)
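If the messages should end up in a string rather than a file, the same redirection can target an in-memory buffer. This is a minimal sketch, assuming (as the redirect approach relies on) that pandas writes the message through Python's sys.stderr; anything written directly to the underlying C-level file descriptor would not be caught. The name captured_stderr is hypothetical, and the import fallback is there so the sketch runs on both Python 2.7 and 3.

```python
import sys
from contextlib import contextmanager

try:
    from StringIO import StringIO  # Python 2.7, as used in the question
except ImportError:
    from io import StringIO  # Python 3


@contextmanager
def captured_stderr():
    """Collect everything written to sys.stderr inside the block in memory.

    Yields the buffer; call .getvalue() on it to get the captured text.
    """
    buffer = StringIO()
    old_stderr = sys.stderr
    sys.stderr = buffer
    try:
        yield buffer
    finally:
        # put the real stream back even if the wrapped code raised
        sys.stderr = old_stderr
```

Usage with the question's call: `with captured_stderr() as buf: df = pd.read_csv('data.csv', warn_bad_lines=True, error_bad_lines=False)`, after which `buf.getvalue()` holds the "Skipping line ..." text, ready to be written to a log file.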

...