To calculate a volume weighted moving average (VWMA) I am collecting a sum(price*volume) and dividing it by the sum(volume).
I need a faster way to get a value from the previous row and add it to a value on the current row.
I have the following dataframe:
import pandas as pd
from itertools import repeat
df = pd.DataFrame({'dtime': ['16:00', '15:00', '14:00', '13:00', '12:00', '11:00', '10:00', '09:00', '08:00', '07:00', '06:00', '05:00', '04:00', '03:00', '02:00', '01:00'],
'time': [1800, 1740, 1680, 1620, 1560, 1500, 1440, 1380, 1320, 1260, 1200, 1140, 1080, 1020, 960, 900],
'price': [100.1, 102.7, 108.5, 105.3, 107.1, 103.4, 101.8, 102.7, 101.6, 99.8, 100.2, 97.7, 99.3, 100.1, 102.5, 103.9],
'volume': [6.0, 6.5, 5.4, 6.3, 6.4, 7.1, 6.7, 6.2, 5.7, 1.2, 2.4, 3.9, 5.2, 8.9, 7.2, 6.5]
}, columns = ['dtime', 'time', 'price', 'volume']).set_index('dtime')
df.insert(df.shape[1], "PV", df['price']*df['volume'])
df.insert(df.shape[1], "flag", list(repeat(0.0,len(df))))
df.insert(df.shape[1], "PVsum_2", list(repeat(0.0,len(df))))
df.insert(df.shape[1], "Vsum_2", list(repeat(0.0,len(df))))
df.insert(df.shape[1], "VWMA_2", list(repeat(0.0,len(df))))
Which is
df =
time price volume PV flag PVsum_2 Vsum_2 VWMA_2
dtime
16:00 1800 100.1 6.0 600.60 0.0 0.0 0.0 0.0
15:00 1740 102.7 6.5 667.55 0.0 0.0 0.0 0.0
14:00 1680 108.5 5.4 585.90 0.0 0.0 0.0 0.0
13:00 1620 105.3 6.3 663.39 0.0 0.0 0.0 0.0
12:00 1560 107.1 6.4 685.44 0.0 0.0 0.0 0.0
11:00 1500 103.4 7.1 734.14 0.0 0.0 0.0 0.0
10:00 1440 101.8 6.7 682.06 0.0 0.0 0.0 0.0
09:00 1380 102.7 6.2 636.74 0.0 0.0 0.0 0.0
08:00 1320 101.6 5.7 579.12 0.0 0.0 0.0 0.0
07:00 1260 99.8 1.2 119.76 0.0 0.0 0.0 0.0
06:00 1200 100.2 2.4 240.48 0.0 0.0 0.0 0.0
05:00 1140 97.7 3.9 381.03 0.0 0.0 0.0 0.0
04:00 1080 99.3 5.2 516.36 0.0 0.0 0.0 0.0
03:00 1020 100.1 8.9 890.89 0.0 0.0 0.0 0.0
02:00 960 102.5 7.2 738.00 0.0 0.0 0.0 0.0
01:00 900 103.9 6.5 675.35 0.0 0.0 0.0 0.0
Right now I am using a for loop to check each row if 'flag' is set.
#----pseudo code----
#for each row in df (from bottom to top, excluding the very bottom row)
# if flag[row] is not set:
# PVsum_2[row] = PV[row] + PV[row + 1]
# Vsum_2[row] = volume[row] + volume[row + 1]
# VWMA_2[row] = PVsum_2[row] / Vsum_2[row]
# flag[row] = 1.0
#----pseudo code----
my_dict = {'dtime' : 0,
'time' : 1,
'price' : 2,
"volume" : 3,
'PV' : 4,
'check' : 5,
'PVsum_2': 6,
'Vsum_2' : 7,
'VWMA_2' : 8}
for row in reversed(range(len(df)-1)):
# if flag value is not set (i.e. flag == 0)
if not df['flag'][row]:
# sum of current and previous PV (price*volume) values
a = df['PV'][row] + df['PV'][row+1]
df.iloc[row, my_dict['PVsum_2']-1] = a
# sum of current and previous volumes
b = df['volume'][row] + df['volume'][row+1]
df.iloc[row, my_dict['Vsum_2']-1] = b
# PVsum_2 / Vsum_2
c = (a / b) if b != 0.0 else 0.0
df.iloc[row, my_dict['VWMA_2']-1] = c
# set check value to 1.0
df.iloc[row, my_dict['flag']-1] = 1.0
but this takes too long on large sets of data (500+ rows)
I'm looking for something faster and more elegant.
The dataframe should look like this when it is done (notice the bottom row has not been altered):
df =
time price volume PV flag PVsum_2 Vsum_2 VWMA_2
dtime
16:00 1800 100.1 6.0 600.60 1.0 1268.15 12.5 101.452000
15:00 1740 102.7 6.5 667.55 1.0 1253.45 11.9 105.331933
14:00 1680 108.5 5.4 585.90 1.0 1249.29 11.7 106.776923
13:00 1620 105.3 6.3 663.39 1.0 1348.83 12.7 106.207087
12:00 1560 107.1 6.4 685.44 1.0 1419.58 13.5 105.154074
11:00 1500 103.4 7.1 734.14 1.0 1416.20 13.8 102.623188
10:00 1440 101.8 6.7 682.06 1.0 1318.80 12.9 102.232558
09:00 1380 102.7 6.2 636.74 1.0 1215.86 11.9 102.173109
08:00 1320 101.6 5.7 579.12 1.0 698.88 6.9 101.286957
07:00 1260 99.8 1.2 119.76 1.0 360.24 3.6 100.066667
06:00 1200 100.2 2.4 240.48 1.0 621.51 6.3 98.652381
05:00 1140 97.7 3.9 381.03 1.0 897.39 9.1 98.614286
04:00 1080 99.3 5.2 516.36 1.0 1407.25 14.1 99.804965
03:00 1020 100.1 8.9 890.89 1.0 1628.89 16.1 101.173292
02:00 960 102.5 7.2 738.00 1.0 1413.35 13.7 103.164234
01:00 900 103.9 6.5 675.35 0.0 0.00 0.0 0.000000
Eventually new data will be added to the top of the data frame as seen below, and will need to be updated again.
df =
time price volume PV flag PVsum_2 Vsum_2 VWMA_2
dtime
19:00 1980 100.1 6.0 600.60 0.0 0.0 0.0 0.0
18:00 1920 102.7 6.5 667.55 0.0 0.0 0.0 0.0
17:00 1860 108.5 5.4 585.90 0.0 0.0 0.0 0.0
16:00 1800 100.1 6.0 600.60 1.0 1268.15 12.5 101.452000
15:00 1740 102.7 6.5 667.55 1.0 1253.45 11.9 105.331933
14:00 1680 108.5 5.4 585.90 1.0 1249.29 11.7 106.776923
13:00 1620 105.3 6.3 663.39 1.0 1348.83 12.7 106.207087
12:00 1560 107.1 6.4 685.44 1.0 1419.58 13.5 105.154074
11:00 1500 103.4 7.1 734.14 1.0 1416.20 13.8 102.623188
10:00 1440 101.8 6.7 682.06 1.0 1318.80 12.9 102.232558
09:00 1380 102.7 6.2 636.74 1.0 1215.86 11.9 102.173109
08:00 1320 101.6 5.7 579.12 1.0 698.88 6.9 101.286957
07:00 1260 99.8 1.2 119.76 1.0 360.24 3.6 100.066667
06:00 1200 100.2 2.4 240.48 1.0 621.51 6.3 98.652381
05:00 1140 97.7 3.9 381.03 1.0 897.39 9.1 98.614286
04:00 1080 99.3 5.2 516.36 1.0 1407.25 14.1 99.804965
03:00 1020 100.1 8.9 890.89 1.0 1628.89 16.1 101.173292
02:00 960 102.5 7.2 738.00 1.0 1413.35 13.7 103.164234
01:00 900 103.9 6.5 675.35 0.0 0.00 0.0 0.000000