python - How to group dataframe by hour using timestamp with Pandas

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have the following dataframe structure that is indexed with a timestamp:

            neg    neu    norm     pol       pos    date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000
1520353342  0.121  0.879  -0.2960  0.347851  0.000
1520353342  0.217  0.783  -0.6124  0.465833  0.000

I create a date from the timestamp:

data_frame['date'] = [datetime.datetime.fromtimestamp(d) for d in data_frame.time]

Result:

            neg    neu    norm     pol       pos    date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000  2018-03-06 10:22:21
1520353342  0.121  0.879  -0.2960  0.347851  0.000  2018-03-06 10:22:22
1520353342  0.217  0.783  -0.6124  0.465833  0.000  2018-03-06 10:22:22

I want to group by hour and take the mean of every column, except the timestamp, which should be the start of the hour each group covers. This is the result I want to achieve:

            neg       neu       norm      pol       pos
time
1520352000  0.027989  0.893233  0.122535  0.221079  0.078779
1520355600  0.028861  0.899321  0.103698  0.209353  0.071811

The closest I have gotten so far has been with this answer:

data = data.groupby(data.date.dt.hour).mean()

Results:

      neg       neu       norm      pol       pos
date
0     0.027989  0.893233  0.122535  0.221079  0.078779
1     0.028861  0.899321  0.103698  0.209353  0.071811

But I can't figure out how to keep a timestamp that reflects the hour where each group started.

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:27:39+0000

I came across this gem, pd.DataFrame.resample, after I posted my round-to-hour solution.

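import pandas as pd

# Construct example dataframe with a DatetimeIndex (one row every 25 minutes)
times = pd.date_range('1/1/2018', periods=5, freq='25min')
values = [4, 8, 3, 4, 1]
df = pd.DataFrame({'val': values}, index=times)

# Resample by hour and calculate medians; each row is labelled with the start of its hour
# ('h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2)
df.resample('h').median()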
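
Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. The numbers are made up for illustration, only two of the columns are included, and the index is treated as UTC epoch seconds, which is an assumption on my part. The hour labels are converted back to epoch seconds at the end so the index matches the desired output.

```python
import pandas as pd

# Toy data shaped like the question's frame: integer epoch seconds as the index.
# The values are invented for illustration.
df = pd.DataFrame(
    {"neg": [0.0, 0.2, 0.4], "pos": [1.0, 0.5, 0.0]},
    index=pd.Index([1520353341, 1520353342, 1520356000], name="time"),
)

# Interpret the index as (UTC-based) datetimes so resample can bin it by hour.
by_hour = df.set_index(pd.to_datetime(df.index, unit="s")).resample("h").mean()

# Convert the hour labels back to epoch seconds (the start of each hour).
by_hour.index = (by_hour.index - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
by_hour.index.name = "time"

print(by_hour)
```

One thing to keep in mind: resample emits a row for every hour between the first and last timestamp, so hours with no data show up as NaN rows unless you drop them afterwards.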

Or you can use groupby with Grouper if you don't want times as index:

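# Keep the times in a regular column and point Grouper at it with key=
# (level= would only apply if the times were in the index)
df = pd.DataFrame({'val': values, 'times': times})
df.groupby(pd.Grouper(key='times', freq='h')).median()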
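
If the index is already in epoch seconds, as in the question, another option is to skip the datetime conversion and floor each timestamp to a multiple of 3600 with integer arithmetic. This is a different technique from resample or Grouper, sketched here with made-up numbers and only two of the columns. It assumes the index holds whole seconds and that UTC-aligned hours are what you want.

```python
import pandas as pd

# Toy data shaped like the question's frame: integer epoch seconds as the index.
# The values are invented for illustration.
df = pd.DataFrame(
    {"neg": [0.0, 0.2, 0.4], "pos": [1.0, 0.5, 0.0]},
    index=pd.Index([1520353341, 1520353342, 1520356000], name="time"),
)

# Floor each epoch-second timestamp to the start of its hour (3600 seconds).
hour_start = df.index // 3600 * 3600

# Group rows that share an hour start and average every column.
by_hour = df.groupby(hour_start).mean()
by_hour.index.name = "time"

print(by_hour)
```

Unlike resample, this only produces rows for hours that actually contain data, and the group labels are already the timestamps of the hour starts.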
