I would like to stream data from a custom object into a pandas DataFrame. According to the documentation, any object with a read() method can be passed to read_csv. However, even after implementing this method, I still get this error:
ValueError: Invalid file path or buffer object type: <class '__main__.DataFile'>
Here is a simple version of the object, and how I am calling it:
class DataFile(object):
    def __init__(self, files):
        self.files = files

    def read(self):
        for file_name in self.files:
            with open(file_name, 'r') as file:
                for line in file:
                    yield line

import pandas as pd

hours = ['file1.csv', 'file2.csv', 'file3.csv']
data = DataFile(hours)
df = pd.read_csv(data)
Am I missing something, or is it just not possible to use a custom generator in pandas? When I call the read() method directly, it works just fine.
EDIT:
The reason I want to use a custom object rather than concatenating the dataframes together is to see if it is possible to reduce memory usage. I have used the gensim library in the past, and it makes it really easy to use custom data objects, so I was hoping to find some similar approach.
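For illustration, here is a minimal sketch of what a file-like wrapper could look like. It rests on two assumptions: that pandas treats an object as file-like only if it has a read (or write) method and also defines __iter__, and that read(size) is expected to return a string rather than a generator. The class name ChainedFiles and the small temporary demo files are hypothetical, and the input files are assumed to share the same columns and to have no header rows. This has not been checked against any particular pandas release.

```python
import os
import tempfile


class ChainedFiles:
    """Hypothetical file-like view over several text files, read in order."""

    def __init__(self, paths):
        self._lines = self._iter_lines(paths)
        self._buffer = ""

    @staticmethod
    def _iter_lines(paths):
        # Lazily yield lines so that only one file is open at a time.
        for path in paths:
            with open(path, "r", newline="") as handle:
                yield from handle

    def __iter__(self):
        return self

    def __next__(self):
        # Hand back any text left over from a previous read() first.
        if self._buffer:
            line, self._buffer = self._buffer, ""
            return line
        return next(self._lines)

    def read(self, size=-1):
        # Return up to `size` characters as a str; "" signals end of data.
        if size is None or size < 0:
            data = self._buffer + "".join(self._lines)
            self._buffer = ""
            return data
        while len(self._buffer) < size:
            try:
                self._buffer += next(self._lines)
            except StopIteration:
                break
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Demo with two small temporary files (no header rows, same columns).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.csv", "1,2\n3,4\n"), ("b.csv", "5,6\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", newline="") as handle:
            handle.write(text)
        paths.append(path)
    first_chunk = ChainedFiles(paths).read(5)
    everything = ChainedFiles(paths).read()
    lines = list(ChainedFiles(paths))
```

Under those assumptions, something like pd.read_csv(ChainedFiles(hours), header=None) would be the intended call. Note that read_csv still builds the complete DataFrame in memory, so a wrapper like this only avoids holding the intermediate per-file frames.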