I have a df like:
id | start_date | end_date | price
1 | 2020-10-01 | 2020-10-3 | 1
1 | 2020-10-03 | 2020-10-4 | 1
2 | 2020-10-04 | 2020-10-6 | 2
3 | 2020-10-05 | 2020-10-5 | 3
Columns "start_date" and "end_date" are datetime64[ns].
I want to create a "date" column from the date range.
Easiest way is creating a pandas.date_range(start_date, end_date, freq="D"), then using .explode().
End result should look like:
id | start_date | end_date | price | date
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-01
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-02
1 | 2020-10-01 | 2020-10-3 | 1 | 2020-10-03
1 | 2020-10-03 | 2020-10-4 | 1 | 2020-10-03
1 | 2020-10-03 | 2020-10-4 | 1 | 2020-10-04
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-04
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-05
2 | 2020-10-04 | 2020-10-6 | 2 | 2020-10-06
3 | 2020-10-05 | 2020-10-5 | 3 | 2020-10-05
Tried so far:
df["daterange"] = pd.date_range(df["start_date"], df["end_date"])
TypeError: Cannot convert input [0 2020-10-01
1 2020-10-01
for row in df.itertuples():
df["daterange"] = pd.date_range(start=row.start_date, end=row.end_date)
ValueError: Length of values (3) does not match length of index (9)
Lambdas, apply, melt, etc. is way too slow for my dataframe size and is not useable!
/edit
Fastet method I found so far:
https://github.com/Garve/scikit-bonus
skbonus.pandas.preprocessing.DateTimeExploder(
"date",
start_column="start_date",
end_column="end_date",
frequency="d",
drop=False,
)
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…