I have a fairly complicated question. I need to select rows from a data frame within a certain set of start and end dates, and then sum those values and put them in a new dataframe.
So I start off with with data frame, df
:
import random
dates = pd.date_range('20150101 020000',periods=1000)
df = pd.DataFrame({'_id': random.choice(range(0, 1000)),
'time_stamp': dates,
'value': random.choice(range(2,60))
})
and define some start and end dates:
import pandas as pd
start_date = ["2-13-16", "2-23-16", "3-17-16", "3-24-16", "3-26-16", "5-17-16", "5-25-16", "10-10-16", "10-18-16", "10-23-16", "10-31-16", "11-7-16", "11-14-16", "11-22-16", "1-23-17", "1-29-17", "2-06-17", "3-11-17", "3-23-17", "6-21-17", "6-28-17"]
end_date = pd.DatetimeIndex(start_date) + pd.DateOffset(7)
Then what needs to happen is that I need to create a new data frame with weekly_sum
which sums the value
column of df
which occur in between the the start_date
and end_date
.
So for example, the first row of the new data frame would return the sum of the values
between 2-13-16 and 2-20-16. I imagine I'd use groupby.sum()
or something similar.
It might look like this:
id start_date end_date weekly_sum
65 2016-02-13 2016-02-20 100
Any direction is greatly appreciated!
P.S. I know my use of random.choice is a little wonky so if you have a better way of generating random numbers, I'd love to see it!
See Question&Answers more detail:
os