Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to assign a fixed value to all hours of a day in pandas

I have a half-hourly DataFrame with two columns. For each day, I would like to take all of that day's rows, do a calculation that returns a single number, and assign that number to every half-hour of that day. Below is some example code:

import numpy as np
import pandas as pd

# half-hourly timestamps from 2003-01-01 08:30 to 2003-01-05 00:00
dates = pd.date_range("2003-01-01 08:30:00", "2003-01-05", freq="30min")
data = np.transpose(np.array([np.random.rand(dates.shape[0]), np.random.rand(dates.shape[0]) * 100]))
data[0:50, 0] = np.nan  # my actual dataframe includes NaN
df = pd.DataFrame(data=data, index=dates, columns=["DATA1", "DATA2"])
print(df)
                        DATA1      DATA2
2003-01-01 08:30:00       NaN  79.990866
2003-01-01 09:00:00       NaN   5.461791
2003-01-01 09:30:00       NaN  68.892447
2003-01-01 10:00:00       NaN  44.823338
2003-01-01 10:30:00       NaN  57.860309
...                       ...        ...
2003-01-04 22:00:00  0.394574  31.943657
2003-01-04 22:30:00  0.140950  78.275981

Then I would like to apply the following function, which returns one number:

def my_f(data1, data2):
    # keep the DATA1 values where DATA2 > 20, then take their median
    y = data1[data2 > 20]
    return np.median(y)

This function selects the DATA1 values for which DATA2 > 20 and then takes their median. How can I create a third column (say, result) and assign this single daily value to every half-hour of that day?

My guess is I should use something like this:

daily_tmp = df.resample('D').apply(my_f)
df['result'] = daily_tmp.reindex(df.index, method='ffill')

If this approach is correct, how can I pass my_f, which takes two arguments, to resample().apply()? Or is there another way to do this?
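One way to give my_f both columns is to group by calendar day with groupby instead of resample: groupby(...).apply hands each day's rows to the function as a whole DataFrame, so the lambda can pick out DATA1 and DATA2 itself. The sketch below assumes a small, deterministic, hypothetical frame (not the question's random data) so the daily medians are easy to check; df.index.normalize() is used as the day key, and the daily result is broadcast back with reindex.

```python
import numpy as np
import pandas as pd

def my_f(data1, data2):
    # the question's function: median of DATA1 where DATA2 > 20
    y = data1[data2 > 20]
    return np.median(y)

# hypothetical half-hourly data: 79 rows from 2003-01-01 08:30 to 2003-01-02 23:30
dates = pd.date_range("2003-01-01 08:30:00", periods=79, freq="30min")
df = pd.DataFrame(
    {"DATA1": np.arange(len(dates), dtype=float),
     "DATA2": np.where(np.arange(len(dates)) % 2 == 0, 50.0, 10.0)},
    index=dates,
)

# group by calendar day (midnight timestamp); each group g is a DataFrame
daily = df.groupby(df.index.normalize()).apply(lambda g: my_f(g["DATA1"], g["DATA2"]))

# broadcast each day's value onto every half-hour row of that day
df["result"] = daily.reindex(df.index.normalize()).to_numpy()
print(df.head())
```

Because daily is indexed by each day's midnight timestamp, reindexing it with df.index.normalize() looks up the same day for every row, so no forward-fill is needed.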

Question from: https://stackoverflow.com/questions/65947049/how-to-assign-a-fix-value-to-all-hour-of-a-day-in-pandas

1 Reply

My solution assumes that you have a fairly small dataset. Please let me know if that is not the case.

I would decompose your goal as follows: (1) group the data by day, (2) compute the function for each day, and (3) assign the resulting value to every half-hour of that day.

# label each row with its calendar day
df['day'] = df.index.map(lambda x: x.strftime('%Y-%m-%d'))
# compute the daily value and store it in a dict keyed by day
mapping = {}
for day, data_for_the_day in df.groupby(by='day'):
    # the columns are named DATA1/DATA2; pass both of them to my_f
    mapping[day] = my_f(data_for_the_day['DATA1'], data_for_the_day['DATA2'])

# assign each day's value to every half-hour row of that day
df['result'] = df['day'].map(mapping)

That's not the neatest solution, but it is straightforward, easy to understand, and works well on small datasets.
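For larger datasets, a vectorized alternative (a different technique from the loop above) is to mask DATA1 wherever the condition fails and then take a per-day median with groupby(...).transform, which returns a value for every original row. This is a sketch on hypothetical data; note that pandas' median skips NaN, whereas np.median in my_f returns NaN if any selected DATA1 value is missing, so the two can differ on days that contain NaN.

```python
import numpy as np
import pandas as pd

# hypothetical half-hourly data: 79 rows from 2003-01-01 08:30 to 2003-01-02 23:30
dates = pd.date_range("2003-01-01 08:30:00", periods=79, freq="30min")
df = pd.DataFrame(
    {"DATA1": np.arange(len(dates), dtype=float),
     "DATA2": np.where(np.arange(len(dates)) % 2 == 0, 50.0, 10.0)},
    index=dates,
)
df.iloc[0:4, 0] = np.nan  # some missing DATA1 values, as in the question

# keep DATA1 only where DATA2 > 20; everything else becomes NaN
masked = df["DATA1"].where(df["DATA2"] > 20)

# per-day median (NaN skipped), broadcast back to every row of that day
df["result"] = masked.groupby(df.index.normalize()).transform("median")
print(df.head())
```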

...