First you need to convert the installed_date column to datetime:
df['installed_date'] = pd.to_datetime(df['installed_date'])
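A minimal sketch of why the conversion matters, using two made-up values and assuming the raw column holds month/day/year strings (the actual input format is not shown here, so the `format` argument is an assumption):

```python
import pandas as pd

# Hypothetical raw values; the month/day/year layout is an assumption.
raw = pd.Series(["9/15/2010", "10/1/2010"])

# As strings, values are compared character by character,
# so "9/..." sorts after "10/..." even though it is the earlier date.
latest_as_text = raw.max()

# After parsing, the comparison is chronological.
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
latest_as_date = parsed.max()

print(latest_as_text)
print(latest_as_date)
```

Without the conversion, sorting or taking the maximum can silently pick the wrong row whenever the date strings are not in a sortable layout such as YYYY-MM-DD.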
Then you can use one of the options below:
Option 1: sort the values on installed_date, then drop_duplicates, keeping only the last row per software_id.
df.sort_values('installed_date').drop_duplicates('software_id', keep='last')
Option 2: group the dataframe on software_id and aggregate using idxmax to get the index of the most recent date per software_id group, then use loc with this index to select the required rows:
idx = df.groupby('software_id')['installed_date'].idxmax()
df.loc[idx]
Result:
   software_id                               software_name installed_date software_version
1         8331  Intel(R) Graphics Media Accelerator Driver     2010-09-15     8.15.10.2008
5         8332                     Wireless Switch Utility     2011-01-25       4.3.1400.0
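To try both options end to end, here is a self-contained sketch on a small made-up frame. Only the column names are taken from the answer; the rows and version strings are invented for illustration.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the answer.
df = pd.DataFrame({
    "software_id": [8331, 8331, 8332, 8332],
    "software_version": ["1.1", "1.0", "2.0", "2.1"],
    "installed_date": ["2010-09-15", "2010-03-01", "2010-11-02", "2011-01-25"],
})
df["installed_date"] = pd.to_datetime(df["installed_date"])

# Option 1: sort by date, then keep the last (latest) row per software_id.
by_sort = df.sort_values("installed_date").drop_duplicates("software_id", keep="last")

# Option 2: look up the row label of the latest date in each group.
idx = df.groupby("software_id")["installed_date"].idxmax()
by_idxmax = df.loc[idx]

# With one latest date per group, both options select the same row labels.
print(sorted(by_sort.index) == sorted(by_idxmax.index))
```

If several rows of the same software_id share the latest date, the two options may pick different rows, since idxmax returns the first matching label while Option 1 keeps whichever tied row ends up last after sorting.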