I have a dataframe of customers with records for shipments they received. Unfortunately, these can overlap. I'm trying to reduce rows so that I can see dates of consecutive use. Is there any way to do this besides a brute force iterrows implementation?
Here's a sample and what I'd like to do:
df = pd.DataFrame([['A','2011-02-07','2011-02-22',1],['A','2011-02-14','2011-03-10',2],['A','2011-03-07','2011-03-15',3],['A','2011-03-18','2011-03-25',4]], columns = ['Cust','startDate','endDate','shipNo'])
df
condensedDf = df.groupby(['Cust']).apply(reductionFunction)
condensedDF
the reductionFunction will group the first 3 records into one, because in each case the start date of the next is before the end date of the prior. I'm essentially turning multiple records that overlap into one record.
Thoughts on a good "pythonic" implementation? I could do a nasty while loop within each group, but I'd prefer not to...
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…