I have a question very similar to this one, but I need to take it a step further by saving the split data frames to CSV.
import pandas as pd
import numpy as np
import os
df = pd.DataFrame({
    'CITY': np.random.choice(['PHOENIX', 'ATLANTA', 'CHICAGO', 'MIAMI', 'DENVER'], 1000),
    'DAY': np.random.choice(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], 1000),
    'TIME_BIN': np.random.randint(1, 86400, size=1000),
    'COUNT': np.random.randint(1, 700, size=1000)})
df['TIME_BIN'] = pd.to_datetime(df['TIME_BIN'], unit='s').dt.round('10min').dt.strftime('%H:%M:%S')
print(df)
OUTPUT:
CITY COUNT DAY TIME_BIN
0 ATLANTA 476 Thursday 12:20:00
1 PHOENIX 50 Saturday 15:40:00
2 MIAMI 250 Friday 08:20:00
3 CHICAGO 358 Monday 15:40:00
4 PHOENIX 217 Thursday 22:10:00
5 MIAMI 12 Thursday 21:40:00
6 DENVER 22 Friday 10:30:00
7 CHICAGO 645 Sunday 23:40:00
8 MIAMI 188 Sunday 08:40:00
I want to make a separate data frame for each city and save it as a .csv file. The code below works, but how do I do it Pythonically without explicitly naming each city? The real data set has about 20 cities, so I don't want to paste this 20 times. I think the code below can be reduced to 1-2 lines with a for loop, but I don't know what it would look like. Something like "for city in df['CITY']"?
df_phoenix = df[df['CITY'] == "PHOENIX"]
df_atlanta = df[df['CITY'] == "ATLANTA"]
df_chicago = df[df['CITY'] == "CHICAGO"]
df_phoenix.to_csv(os.getcwd() + "/data_phoenix.csv")
df_atlanta.to_csv(os.getcwd() + "/data_atlanta.csv")
df_chicago.to_csv(os.getcwd() + "/data_chicago.csv")
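For reference, here is a minimal sketch of the kind of loop I have in mind, assuming `df.groupby('CITY')` yields one (city, sub-frame) pair per distinct city. The small stand-in frame and the temporary `out_dir` folder are placeholders for illustration only; the code above writes to `os.getcwd()` instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for the frame built above (assumption: only CITY and COUNT matter here).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'CITY': rng.choice(['PHOENIX', 'ATLANTA', 'CHICAGO'], 30),
    'COUNT': rng.integers(1, 700, size=30),
})

# Hypothetical output folder so the sketch does not write into the working directory.
out_dir = tempfile.mkdtemp()

# Iterating over a groupby gives each distinct city together with its rows.
written = []
for city, city_df in df.groupby('CITY'):
    path = os.path.join(out_dir, f"data_{city.lower()}.csv")
    city_df.to_csv(path)
    written.append(path)
```

This should produce one file per distinct value in CITY without naming any city in the code, so it would not need to change when the real data set has about 20 cities.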