Here is a little function that has been handy for me, using some pandas
features that might not have been available when the question was originally posed:
def load_large_dta(fname):
import sys
reader = pd.read_stata(fname, iterator=True)
df = pd.DataFrame()
try:
chunk = reader.get_chunk(100*1000)
while len(chunk) > 0:
df = df.append(chunk, ignore_index=True)
chunk = reader.get_chunk(100*1000)
print '.',
sys.stdout.flush()
except (StopIteration, KeyboardInterrupt):
pass
print '
loaded {} rows'.format(len(df))
return df
I loaded an 11G Stata file in 100 minutes with this, and it's nice to have something to play with if I get tired of waiting and hit cntl-c
.
This notebook shows it in action.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…