Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


database - Converting a text file (900 MB, 300 columns, pipe-delimited) to a pandas DataFrame - DtypeWarning, memory errors

I have a 900 MB pipe-delimited text file that I need to load into a pandas DataFrame and, ultimately, ingest into a Postgres database.

I've tried reading it in chunks with a loop, but it didn't work:

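import pandas as pd

df = pd.DataFrame()
for chunk in pd.read_csv(r"my_file.txt", sep='|', chunksize=1000):
    # appends each 1,000-row chunk to one growing DataFrame, so the whole file still ends up in memory
    df = pd.concat([df, chunk], ignore_index=True)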
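
Because `pd.concat` inside that loop copies the growing DataFrame on every iteration, the loop still ends with the entire file in memory, only more slowly. Since the end goal is a database, a minimal sketch of an alternative is to write each chunk out as soon as it is read, so only one chunk is held in memory at a time. Assumptions: `load_in_chunks` and the table name `my_table` are hypothetical; the standard-library `sqlite3` module stands in for Postgres so the sketch runs anywhere, and for Postgres you would pass a SQLAlchemy engine to `DataFrame.to_sql` instead.

```python
import pandas as pd


def load_in_chunks(path, conn, table="my_table", sep="|", chunksize=50_000):
    """Stream a delimited file into a database table one chunk at a time.

    `table` is a hypothetical placeholder name. dtype=str reads every column
    as text, so separate chunks cannot disagree about a column's type.
    Returns the number of data rows written.
    """
    total = 0
    reader = pd.read_csv(path, sep=sep, chunksize=chunksize, dtype=str)
    for i, chunk in enumerate(reader):
        # the first chunk (re)creates the table, later chunks append to it
        mode = "replace" if i == 0 else "append"
        chunk.to_sql(table, conn, if_exists=mode, index=False)
        total += len(chunk)
    return total
```

Usage would look like `load_in_chunks(r"my_file.txt", sqlite3.connect("my_file.db"))`. Loading everything as text defers type conversion to the database, where it can be done in SQL after the load; that is a design choice for this sketch, not a requirement.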

What else should I try? Any help for a n00b is much appreciated. Thank you!

EDIT (my original question was poorly detailed, so I'm adding more information to be less of an idiot when asking for help :) ): when I try to read the entire file and check the number of rows using:

import pandas as pd

data = pd.read_csv(r"my_file.txt", sep='|')
print('Total rows: {0}'.format(len(data)))
print(list(data))  # column names
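
If that first step is only meant to check the row count and column names, the file does not have to be loaded in one piece. A minimal sketch, assuming the file has a single header row: iterate over chunks, take the column names from the first chunk, and sum the chunk lengths. `count_rows_and_columns` is a hypothetical helper name, and `dtype=str` is used only to skip type inference, which does not matter for counting.

```python
import pandas as pd


def count_rows_and_columns(path_or_buf, sep="|", chunksize=100_000):
    """Count data rows and collect column names without loading the whole file."""
    n_rows = 0
    columns = None
    for chunk in pd.read_csv(path_or_buf, sep=sep, chunksize=chunksize, dtype=str):
        if columns is None:
            columns = list(chunk.columns)  # every chunk shares the same header
        n_rows += len(chunk)
    return n_rows, columns
```

Usage: `n_rows, columns = count_rows_and_columns(r"my_file.txt")`, then print them as in the snippet above.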

I get a DtypeWarning on ~50 of the ~300 columns, asking me to specify the dtype option on import. I also get a MemoryError:

MemoryError: Unable to allocate 410. MiB for an array with shape (277, 388455) and data type object
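
The numbers in this message can be checked. pandas keeps columns that share a dtype in one 2-D block shaped (columns, rows), so `(277, 388455)` appears to mean 277 text (object) columns by 388,455 rows. An object array stores one pointer per cell, so its size is cells × pointer size. The arithmetic below shows that 4-byte pointers reproduce the reported 410 MiB, while 8-byte pointers would give about 821 MiB. That suggests, as an inference rather than a certainty, a 32-bit Python build, whose process is limited to a few GiB of address space regardless of how much RAM the machine has; `struct.calcsize("P")` reports the pointer size of the interpreter running it. `object_array_mib` is a hypothetical helper.

```python
import struct

MIB = 1024 ** 2


def object_array_mib(n_cols, n_rows, pointer_bytes):
    """Size in MiB of a 2-D object array: one pointer per cell (the strings themselves are not counted)."""
    return n_cols * n_rows * pointer_bytes / MIB


# shape reported by the MemoryError
print(round(object_array_mib(277, 388455, 4), 1))  # → 410.5 (4-byte pointers, 32-bit build)
print(round(object_array_mib(277, 388455, 8), 1))  # → 820.9 (8-byte pointers, 64-bit build)

# pointer size of the interpreter running this code: 4 on 32-bit, 8 on 64-bit
print(struct.calcsize("P"))
```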

Out of curiosity, I tried reading different numbers of rows with nrows to see where the DtypeWarning and the MemoryError first appear. I can read the first 2,000 rows without either. I can read the first 240,000 rows without a MemoryError, but with the DtypeWarning on ~50 of the 300 columns.
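
With the default `low_memory=True`, pandas parses the file in internal pieces and infers each column's type per piece; when one piece of a column holds only numbers and a later piece holds text, the column ends up with mixed types and the DtypeWarning is raised. That is consistent with the warning appearing only once enough rows are read. A rough sketch, assuming an explicit `chunksize` is a reasonable proxy for those internal pieces, lists the columns whose inferred dtype changes from chunk to chunk. `find_mixed_columns` is a hypothetical name, and columns that are numeric but have missing values in only some chunks (int64 in one, float64 in another) are flagged as well, so treat the result as candidates.

```python
from collections import defaultdict

import pandas as pd


def find_mixed_columns(path_or_buf, sep="|", chunksize=100_000):
    """Return column names whose inferred dtype differs between chunks."""
    seen = defaultdict(set)  # column name -> set of dtype names seen so far
    for chunk in pd.read_csv(path_or_buf, sep=sep, chunksize=chunksize):
        for col, dtype in chunk.dtypes.items():
            seen[col].add(str(dtype))
    return sorted(col for col, dtypes in seen.items() if len(dtypes) > 1)
```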

Will I need to specify the dtype in read_csv() for each column to avoid the warning?
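
According to the pandas `read_csv` documentation, `dtype` accepts either one type for every column (for example `dtype=str`) or a dict mapping column names to types, so listing all ~300 columns should not be necessary. `low_memory=False` also silences the warning, at the cost of more memory while parsing, which is the scarce resource here. A minimal sketch, assuming the 0-based positions of the affected columns are known (the warning message lists the affected columns), forces only those columns to text and leaves type inference on for the rest. `read_with_text_columns` is a hypothetical helper, and it takes a file path because it reads the header and then the file in two passes.

```python
import pandas as pd


def read_with_text_columns(path, text_positions, sep="|", **kwargs):
    """Read a delimited file, forcing the columns at the given 0-based positions to text.

    All other columns keep pandas' normal type inference. Extra keyword
    arguments are passed through to pd.read_csv.
    """
    header = pd.read_csv(path, sep=sep, nrows=0).columns  # reads only the header line
    dtype = {header[i]: str for i in text_positions}
    return pd.read_csv(path, sep=sep, dtype=dtype, **kwargs)
```

Reading a column as text also preserves values such as leading zeros (`"007"`), which numeric inference would turn into `7`.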

Additionally, I'm unsure how to handle the MemoryError; as one commenter pointed out, 900 MB isn't exactly massive.
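
One reason a 900 MB file can need several times that in RAM: each value in an object (text) column is a separate Python string object, and in CPython even a one-character string carries a few dozen bytes of overhead plus a pointer in the column, so short fields grow a lot once loaded. A minimal sketch for checking this, assuming the first rows are representative of the rest of the file, measures a sample with `DataFrame.memory_usage(deep=True)` and scales it to the full row count. `estimate_memory_mib` is a hypothetical helper, and `total_rows` must be supplied (for example from a chunked row count).

```python
import pandas as pd


def estimate_memory_mib(path_or_buf, total_rows, sep="|", sample_rows=10_000):
    """Rough in-memory size (MiB) of the full DataFrame, extrapolated from a sample.

    Reads only the first `sample_rows` rows and scales linearly, so it assumes
    those rows are representative of the whole file.
    """
    sample = pd.read_csv(path_or_buf, sep=sep, nrows=sample_rows)
    # deep=True counts the Python string objects behind object (text) columns,
    # not just the pointers stored in the column
    sample_bytes = sample.memory_usage(deep=True, index=False).sum()
    per_row = sample_bytes / max(len(sample), 1)
    return per_row * total_rows / 1024 ** 2
```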

Question from: https://stackoverflow.com/questions/65859914/converting-text-file-900mb-300-cols-pipe-delim-to-pandas-df-dtype-warning
