I have a df as shown below:
B_ID No_Show Session slot_num Cumulative_no_show
1 0.4 S1 1 0.4
2 0.3 S1 2 0.7
3 0.8 S1 3 1.5
4 0.3 S1 4 1.8
5 0.6 S1 5 2.4
6 0.8 S1 6 3.2
7 0.9 S1 7 4.1
8 0.4 S1 8 4.5
9 0.6 S1 9 5.1
12 0.9 S2 1 0.9
13 0.5 S2 2 1.4
14 0.3 S2 3 1.7
15 0.7 S2 4 2.4
20 0.7 S2 5 3.1
16 0.6 S2 6 3.7
17 0.8 S2 7 4.5
19 0.3 S2 8 4.8
The code to create the above df is shown below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'B_ID': [1,2,3,4,5,6,7,8,9,12,13,14,15,20,16,17,19],
'No_Show': [0.4,0.3,0.8,0.3,0.6,0.8,0.9,0.4,0.6,0.9,0.5,0.3,0.7,0.7,0.6,0.8,0.3],
'Session': ['S1','S1','S1','S1','S1','S1','S1','S1','S1','S2','S2','S2','S2','S2','S2','S2','S2'],
'slot_num': [1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8],
})
df['Cumulative_no_show'] = df.groupby(['Session'])['No_Show'].cumsum()
I also have a list called walkin_no_show = [0.3, 0.4, 0.3, 0.4, 0.3, 0.4, ...] (and so on, with length 1000).
From the above, whenever u_cumulative > 0.8, create a new row just below that row with
df['No_Show'] = walkin_no_show[i]
Its Session and slot_num should be the same as in the previous row. Also create a new column called u_cumulative; in the new row its value is the previous u_cumulative minus (1 - walkin_no_show[i]), and the later rows of the session continue accumulating from that reduced value.
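To make the rule concrete, here is a minimal sketch of the arithmetic for a single session, written as a plain Python loop. The helper name trace_session is hypothetical, and two things are assumptions read off the expected output rather than stated in the text: the walk-in counter restarts at walkin_no_show[0] for each session, and no walk-in is added after the last slot of a session.

```python
def trace_session(no_shows, walkin_no_show, threshold=0.8):
    """Return (label, no_show, u_cumulative) tuples for one session."""
    rows = []
    u = 0.0            # running u_cumulative
    walkin_id = 0      # number of walk-ins inserted so far in this session
    last = len(no_shows) - 1
    for pos, ns in enumerate(no_shows):
        u += ns
        rows.append((f"slot{pos + 1}", ns, round(u, 2)))
        # Assumption: the last slot of a session never gets a walk-in
        if u > threshold and pos < last:
            w = walkin_no_show[walkin_id]
            walkin_id += 1
            u -= 1 - w  # subtract (1 - walk-in no-show) from the running value
            rows.append((f"walkin{walkin_id}", w, round(u, 2)))
    return rows


# First five slots of session S1 from the table above
rows = trace_session([0.4, 0.3, 0.8, 0.3, 0.6], [0.3, 0.4, 0.3, 0.4])
for label, ns, u in rows:
    print(label, ns, u)
```

The values are rounded only for display; the comparison against the threshold uses the unrounded running value.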
Expected Output:
B_ID No_Show Session slot_num Cumulative_no_show u_cumulative
1 0.4 S1 1 0.4 0.4
2 0.3 S1 2 0.7 0.7
3 0.8 S1 3 1.5 1.5
walkin1 0.3 S1 3 1.5 0.8
4 0.3 S1 4 1.8 1.1
walkin2 0.4 S1 4 1.8 0.5
5 0.6 S1 5 2.4 1.1
walkin3 0.3 S1 5 2.4 0.4
6 0.8 S1 6 3.2 1.2
walkin4 0.4 S1 6 3.2 0.6
7 0.9 S1 7 4.1 1.5
walkin5 0.3 S1 7 4.1 0.8
8 0.4 S1 8 4.5 1.2
walkin6 0.4 S1 8 4.5 0.6
9 0.6 S1 9 5.1 1.2
12 0.9 S2 1 0.9 0.9
walkin1 0.3 S2 1 0.9 0.2
13 0.5 S2 2 1.4 0.7
14 0.3 S2 3 1.7 1.0
walkin2 0.4 S2 3 1.7 0.4
15 0.7 S2 4 2.4 1.1
walkin3 0.3 S2 4 2.4 0.4
20 0.7 S2 5 3.1 1.1
walkin4 0.4 S2 5 3.1 0.5
16 0.6 S2 6 3.7 1.1
walkin5 0.3 S2 6 3.7 0.4
17 0.8 S2 7 4.5 1.2
walkin6 0.4 S2 7 4.5 0.6
19 0.3 S2 8 4.8 0.9
I tried the code below, which is a minor edit of the answer given by @Ben.T to my earlier question:
create new rows based the values of one of the column in pandas or numpy
Thanks @Ben.T, full credit to you.
def create_u_columns(ser):
    # work on a float copy so the original column is not modified
    arr_ns = ser.to_numpy(dtype=float, copy=True)
    # array of walk-in ids, used later to insert the new rows
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 1
    for i in range(len(arr_ns) - 1):
        if arr_ns[i] > 0.8:
            # subtract (1 - walk-in no-show) from all later rows
            arr_ns[i+1:] -= (1 - walkin_no_show[walkin_id - 1])
            # remember which walk-in follows this row
            arr_idx[i] = walkin_id
            walkin_id += 1
    # return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx': arr_idx}, index=ser.index)

# group_keys=False keeps the original row index so the result aligns with df
df[['u_cumulative', 'mask_idx']] = df.groupby('Session', group_keys=False)['Cumulative_no_show'].apply(create_u_columns)
# select the rows that are followed by a walk-in
df_toAdd = df.loc[df['mask_idx'].astype(bool), :].copy()
# replace the values as wanted (walk-in ids start at 1, list positions at 0)
df_toAdd['No_Show'] = np.asarray(walkin_no_show)[df_toAdd['mask_idx'].to_numpy() - 1]
df_toAdd['B_ID'] = 'walkin' + df_toAdd['mask_idx'].astype(str)
df_toAdd['u_cumulative'] -= 1 - df_toAdd['No_Show']
# add 0.5 to index for later sort
df_toAdd.index += 0.5
# a name such as new_df_0.8 is not a valid Python identifier, so use new_df_08
new_df_08 = (pd.concat([df, df_toAdd]).sort_index()
             .reset_index(drop=True).drop('mask_idx', axis=1))
I would also like to iterate over a list of thresholds, so that the 0.8 in the condition (arr_ns[i] > 0.8) takes each value in [0.8, 0.9, 1.0], and create one df per threshold, i.e. new_df_0.8, new_df_0.9 and new_df_1.0.
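One possible way to loop over the thresholds is to wrap the logic in a function that takes the threshold as a parameter and to store the results in a dict keyed by threshold, since names like new_df_0.8 cannot be used as Python variables. The sketch below is not the vectorised approach from the attempt above; it is a plain row-by-row loop. The function name add_walkins, the dict name new_dfs and the alternating walkin_no_show list are assumptions made for illustration, as is the rule that the walk-in counter restarts in each session and that the last slot of a session gets no walk-in (both read off the expected output).

```python
import numpy as np
import pandas as pd


def add_walkins(df, walkin_no_show, threshold):
    """Return a new DataFrame with walk-in rows inserted and a u_cumulative column."""
    walkins = np.asarray(walkin_no_show, dtype=float)
    out = []
    for _, grp in df.groupby('Session', sort=False):
        u = 0.0          # running u_cumulative within the session
        walkin_id = 0    # assumption: walk-in numbering restarts per session
        last_pos = len(grp) - 1
        for pos, row in enumerate(grp.to_dict('records')):
            u += row['No_Show']
            out.append({**row, 'u_cumulative': u})
            # assumption: no walk-in is added after the last slot of a session
            if u > threshold and pos < last_pos:
                w = walkins[walkin_id]
                walkin_id += 1
                u -= 1 - w
                # the walk-in row copies Session, slot_num and Cumulative_no_show
                out.append({**row, 'B_ID': f'walkin{walkin_id}',
                            'No_Show': w, 'u_cumulative': u})
    return pd.DataFrame(out)


df = pd.DataFrame({
    'B_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    'No_Show': [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    'Session': ['S1'] * 9 + ['S2'] * 8,
    'slot_num': [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8],
})
df['Cumulative_no_show'] = df.groupby('Session')['No_Show'].cumsum()

# assumption: the list keeps alternating 0.3, 0.4 up to length 1000
walkin_no_show = [0.3, 0.4] * 500

thresholds = [0.8, 0.9, 1.0]
new_dfs = {thr: add_walkins(df, walkin_no_show, thr) for thr in thresholds}

print(new_dfs[0.8].round(2))
```

Each result is then available as new_dfs[0.8], new_dfs[0.9] and new_dfs[1.0]. Because the values are floats, a cumulative value that lands almost exactly on a threshold may fall on either side of the comparison, so rounding before comparing may be worth considering.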