Create new rows based on the values of one of the columns in pandas or NumPy

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a data frame, shown below, that contains doctor's appointment data.

B_ID   No_Show   Session  slot_num  Cumulative_no_show
    1     0.4       S1        1       0.4   
    2     0.3       S1        2       0.7      
    3     0.8       S1        3       1.5        
    4     0.3       S1        4       1.8       
    5     0.6       S1        5       2.4         
    6     0.8       S1        6       3.2       
    7     0.9       S1        7       4.1        
    8     0.4       S1        8       4.5   
    9     0.6       S1        9       5.1     
    12    0.9       S2        1       0.9    
    13    0.5       S2        2       1.4       
    14    0.3       S2        3       1.7        
    15    0.7       S2        4       2.4         
    20    0.7       S2        5       3.1          
    16    0.6       S2        6       3.7       
    17    0.8       S2        7       4.5        
    19    0.3       S2        8       4.8

From the above, create a new column called u_cumulative. Whenever u_cumulative > 0.8, create a new row just below that row with No_Show = 0.0 and the same Session and slot_num as the row above. The new row's u_cumulative is the value of the row above minus 1, and the rows that follow in the same session continue from that reduced value.

Expected Output:

B_ID   No_Show   Session  slot_num  Cumulative_no_show    u_cumulative
    1     0.4       S1        1       0.4                 0.4
    2     0.3       S1        2       0.7                 0.7
    3     0.8       S1        3       1.5                 1.5
walkin1   0.0       S1        3       1.5                 0.5
    4     0.3       S1        4       1.8                 0.8      
    5     0.6       S1        5       2.4                 1.4
walkin2   0.0       S1        5       2.4                 0.4    
    6     0.8       S1        6       3.2                 1.2
walkin3   0.0       S1        6       3.2                 0.2      
    7     0.9       S1        7       4.1                 1.1
walkin4   0.0       S1        7       4.1                 0.1               
    8     0.4       S1        8       4.5                 0.5   
    9     0.6       S1        9       5.1                 1.1
walkin5   0.0       S1        7       5.1                 0.1
    12    0.9       S2        1       0.9                 0.9
walkin1   0.0       S2        1       0.9                -0.1
    13    0.5       S2        2       1.4                 0.4    
    14    0.3       S2        3       1.7                 0.7       
    15    0.7       S2        4       2.4                 1.4
walkin2   0.0       S2        4       2.4                 0.4      
    20    0.7       S2        5       3.1                 1.1
walkin3   0.0       S2        5       3.1                 0.1       
    16    0.6       S2        6       3.7                 0.7                    
    17    0.8       S2        7       4.5                 1.5
walkin4   0.0       S2        7       4.5                 0.5       
    19    0.3       S2        8       4.8                 0.8

I tried the following to calculate u_cumulative:

def create_u_columns (ser):
    arr_ns = ser.to_numpy()
    arr_sn = np.ones(len(ser))
    for i in range(len(arr_ns)-1):
        if arr_ns[i]>0.6:
            # remove 1 to u_no_show
            arr_ns[i+1:] -= 1
        else:
            # increment u_slot_num
            arr_sn[i+1:] += 1
    #return a dataframe with both columns
    return pd.DataFrame({'U_slot_num':arr_sn, 'U_No_show': arr_ns}, index=ser.index)

df[['U_slot_num', 'u_cumulative']] = df.groupby(['Session'])['Cumulative_no_show'].apply(create_u_columns)

But I am not able to create the new rows based on the logic explained above.

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:25:53+0000

You can do it by slightly modifying the function so that it also creates a counter column marking where the walk-in rows will be added later:

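import numpy as np
import pandas as pd

def create_u_columns(ser):
    # work on a copy so the original column is left untouched
    arr_ns = ser.to_numpy(copy=True)
    # walk-in number to insert after each row (0 = no walk-in)
    arr_idx = np.zeros(len(ser), dtype=int)
    walkin_id = 1
    # note: the last row of each session is not checked
    for i in range(len(arr_ns)-1):
        if arr_ns[i] > 0.8:
            # subtract 1 from all later values of u_cumulative
            arr_ns[i+1:] -= 1
            # remember which walk-in goes after this row
            arr_idx[i] = walkin_id
            walkin_id += 1
    # return a dataframe with both columns
    return pd.DataFrame({'u_cumulative': arr_ns, 'mask_idx': arr_idx}, index=ser.index)

# group_keys=False keeps the original index so the result aligns with df
df[['u_cumulative', 'mask_idx']] = df.groupby('Session', group_keys=False)['Cumulative_no_show'].apply(create_u_columns)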

Now you need to work on the rows that need to be added:

# select the rows
df_toAdd = df.loc[df['mask_idx'].astype(bool), :].copy()
# replace the values as wanted
df_toAdd['No_Show'] = 0
df_toAdd['B_ID'] = 'walkin'+df_toAdd['mask_idx'].astype(str)
df_toAdd['u_cumulative'] -= 1
# add 0.5 to index for later sort
df_toAdd.index += 0.5
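
The half-step index trick can be seen in isolation on a toy frame. This is only a minimal sketch: the names toy, extra and combined and the flag column are made up for illustration, and it assumes the frame has a default integer index.

```python
import pandas as pd

# Toy frame (hypothetical values) with a default integer index 0, 1, 2.
toy = pd.DataFrame({"B_ID": [1, 2, 3], "flag": [False, True, False]})

# Copy the flagged rows and shift their index labels by half a step,
# so label 1 becomes 1.5.
extra = toy.loc[toy["flag"]].copy()
extra["B_ID"] = "walkin"
extra.index = extra.index + 0.5

# Sorting on the index puts each copy right after the row it came from;
# reset_index then restores a clean 0..n-1 index.
combined = pd.concat([toy, extra]).sort_index().reset_index(drop=True)
print(combined)
```

Here the copied row ends up directly after the row with B_ID 2, because 1.5 sorts between 1 and 2.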

Now you just need to concat this data frame to the original one, then sort_index, reset_index if you want a cleaner index, and drop the extra column created earlier:

new_df = (pd.concat([df, df_toAdd]).sort_index()
            .reset_index(drop=True).drop('mask_idx', axis=1))

print (new_df)
       B_ID  No_Show Session  slot_num  Cumulative_no_show  u_cumulative
0         1      0.4      S1         1                 0.4           0.4
1         2      0.3      S1         2                 0.7           0.7
2         3      0.8      S1         3                 1.5           1.5
3   walkin1      0.0      S1         3                 1.5           0.5
4         4      0.3      S1         4                 1.8           0.8
5         5      0.6      S1         5                 2.4           1.4
6   walkin2      0.0      S1         5                 2.4           0.4
7         6      0.8      S1         6                 3.2           1.2
8   walkin3      0.0      S1         6                 3.2           0.2
9         7      0.9      S1         7                 4.1           1.1
10  walkin4      0.0      S1         7                 4.1           0.1
11        8      0.4      S1         8                 4.5           0.5
12        9      0.6      S1         9                 5.1           1.1
13       12      0.9      S2         1                 0.9           0.9
14  walkin1      0.0      S2         1                 0.9          -0.1
15       13      0.5      S2         2                 1.4           0.4
16       14      0.3      S2         3                 1.7           0.7
17       15      0.7      S2         4                 2.4           1.4
18  walkin2      0.0      S2         4                 2.4           0.4
19       20      0.7      S2         5                 3.1           1.1
20  walkin3      0.0      S2         5                 3.1           0.1
21       16      0.6      S2         6                 3.7           0.7
22       17      0.8      S2         7                 4.5           1.5
23  walkin4      0.0      S2         7                 4.5           0.5
24       19      0.3      S2         8                 4.8           0.8
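
Note that the printed result has no walk-in row after B_ID 9, although the expected output in the question lists a walkin5 there: the loop in create_u_columns stops one row before the end of each session. If that last row should be checked too, one possible variant is sketched below. It is not the method above but a plain row-by-row loop; the helper name add_walkins and its threshold parameter are hypothetical, the sample data is copied from the question, and the rounding to 10 decimals is an assumption added to keep floating-point noise away from the 0.8 comparison.

```python
import pandas as pd

def add_walkins(df, threshold=0.8):
    """Hypothetical helper: insert a walk-in row after every row whose
    u_cumulative exceeds the threshold, including the last row of a session."""
    pieces = []
    for _, grp in df.groupby("Session", sort=False):
        rows = []
        shift = 0  # number of walk-ins inserted so far in this session
        for rec in grp.to_dict(orient="records"):
            # Assumption: round to avoid float noise such as 0.7999999999
            u = round(rec["Cumulative_no_show"] - shift, 10)
            rows.append({**rec, "u_cumulative": u})
            if u > threshold:
                shift += 1
                rows.append({**rec,
                             "B_ID": f"walkin{shift}",
                             "No_Show": 0.0,
                             "u_cumulative": round(u - 1, 10)})
        pieces.append(pd.DataFrame(rows))
    return pd.concat(pieces, ignore_index=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "B_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 20, 16, 17, 19],
    "No_Show": [0.4, 0.3, 0.8, 0.3, 0.6, 0.8, 0.9, 0.4, 0.6,
                0.9, 0.5, 0.3, 0.7, 0.7, 0.6, 0.8, 0.3],
    "Session": ["S1"] * 9 + ["S2"] * 8,
    "slot_num": list(range(1, 10)) + list(range(1, 9)),
})
# Running total of No_Show within each session, as in the question's table.
df["Cumulative_no_show"] = df.groupby("Session")["No_Show"].cumsum().round(1)

result = add_walkins(df)
print(result)
```

With the sample data this yields the same rows as the output above plus one extra walk-in after B_ID 9. The walk-in copies the slot_num of the row it follows.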
