import pandas as pd

df2 = pd.DataFrame({'person_id':[11,11,11,11,11,12,12,13,13,14,14,14,14],
'admit_date':['01/01/2011','01/01/2009','12/31/2013','12/31/2017','04/03/2014','08/04/2016',
'03/05/2014','02/07/2011','08/08/2016','12/31/2017','05/01/2011','05/21/2014','07/12/2016']})
# Reshape to long format: columns become person_id, variable, dates
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'])
What I would like to do is:
a) Exclude/filter out records from the data frame if a subject has both Dec 31st and Jan 1st in its records. Please note that the year doesn't matter.
If a subject has either Dec 31st or Jan 1st, we leave them as is.
But if they have both Dec 31st and Jan 1st, we remove one of them (either Dec 31st or Jan 1st). Note that they could have multiple entries with the same date as well, like person_id = 11.
So far I could only come up with the below:
# This also drops the row when a subject has only Dec 31st, and it only matches the year 2017.
# How can I compare on month and day and ignore the year?
df2_new = df2['dates'] != '2017-12-31'
df2[df2_new]
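For the "ignore the year" part of the attempt above, one option is to compare only the month-and-day portion of each date. The snippet below is a minimal sketch on a small hypothetical series (the names `dates`, `month_day` and `is_dec31` are illustrative, not from the original code); it uses the `.dt.strftime` accessor to build a year-free key.

```python
import pandas as pd

# Hypothetical sample: two Dec 31st dates in different years, plus two other dates
dates = pd.to_datetime(
    pd.Series(['12/31/2013', '12/31/2017', '01/01/2011', '04/03/2014']),
    format='%m/%d/%Y',
)

# Keep only month and day, so the year no longer takes part in the comparison
month_day = dates.dt.strftime('%m-%d')

# True for every Dec 31st, whatever the year
is_dec31 = month_day == '12-31'
```

On its own this mask still flags every Dec 31st row, so it would need to be combined with a per-person check before filtering.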
My expected output is as follows.
For person_id = 11, we drop 12-31 because both 12-31 and 01-01 appear in their records, whereas for person_id = 14 we don't drop 12-31 because only 12-31 appears in its records.
We drop 12-31 only when both 12-31 and 01-01 appear in a person's records.
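The rule described above could be sketched in pandas roughly as follows. This is one possible approach, not necessarily the best one: it assumes Dec 31st is the date to drop when both are present, and the helper names (`is_dec31`, `is_jan01`, `has_dec31`, `has_jan01`, `result`) are introduced here only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    'person_id': [11, 11, 11, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14],
    'admit_date': ['01/01/2011', '01/01/2009', '12/31/2013', '12/31/2017',
                   '04/03/2014', '08/04/2016', '03/05/2014', '02/07/2011',
                   '08/08/2016', '12/31/2017', '05/01/2011', '05/21/2014',
                   '07/12/2016'],
})
df2 = df2.melt('person_id', value_name='dates')
df2['dates'] = pd.to_datetime(df2['dates'], format='%m/%d/%Y')

# Row-level flags that look only at month and day, so the year is ignored
is_dec31 = (df2['dates'].dt.month == 12) & (df2['dates'].dt.day == 31)
is_jan01 = (df2['dates'].dt.month == 1) & (df2['dates'].dt.day == 1)

# Person-level flags, broadcast back to every row of that person
has_dec31 = is_dec31.groupby(df2['person_id']).transform('any')
has_jan01 = is_jan01.groupby(df2['person_id']).transform('any')

# Drop Dec 31st rows only for people who also have a Jan 1st row
result = df2[~(is_dec31 & has_dec31 & has_jan01)]
```

With the sample data, both Dec 31st rows of person_id = 11 are removed (duplicates included), while the single Dec 31st row of person_id = 14 is kept because that person has no Jan 1st entry.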