vectorization - Efficient computation of a loop of integrals in Python

Question

Welcome To Ask or Share your Answers For Others

vectorization - Efficient computation of a loop of integrals in Python

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

vectorization - Efficient computation of a loop of integrals in Python

I was wondering how to speed up the following code in where I compute a probability function which involves nummerical integrals and then I compute some confidence margins.

Some possibilities that I have thought about are Numba or vectorization of the code

EDIT: I have made minor modifications because there was a mistake. I am looking for some modifications that provide major time improvements (I know that there are some minor changes that would provide some minor time improvements, such as repeated functions, but I am not concerned about them) The code is:

# -*- coding: utf-8 -*-
"""
Created on Tue Jan 26 17:05:46 2021

@author: Ignacio
"""

import numpy as np
from scipy.integrate import simps

def pdf(V,alfa_points):
    alfa=np.linspace(0,2*np.pi,alfa_points)
    return simps(1/np.sqrt(2*np.pi)/np.sqrt(sigma_R2)*np.exp(-(V*np.cos(alfa)-eR)**2/2/sigma_R2)*1/np.sqrt(2*np.pi)/np.sqrt(sigma_I2)*np.exp(-(V*np.sin(alfa)-eI)**2/2/sigma_I2),alfa)
    
def find_nearest(array,value):
    array=np.asarray(array)
    idx = (np.abs(array-value)).argmin()
    return array[idx]

N = 20
n=np.linspace(0,N-1,N)
d=1
sigma_An=0.1
sigma_Pn=0.2

An=np.ones(N)
Pn=np.zeros(N)

Vs=np.linspace(0,30,1000)
inc=np.max(Vs)/len(Vs)
th=np.linspace(0,np.pi/2,250)
R=np.sum(An*np.cos(Pn+2*np.pi*np.sin(th[:,np.newaxis])*n*d),axis=1)
I=np.sum(An*np.sin(Pn+2*np.pi*np.sin(th[:,np.newaxis])*n*d),axis=1)

fmin=np.zeros(len(th))
fmax=np.zeros(len(th))
for tt in range(len(th)):
    eR=np.exp(-sigma_Pn**2/2)*np.sum(An*np.cos(Pn+2*np.pi*np.sin(th[tt])*n*d))
    eI=np.exp(-sigma_Pn**2/2)*np.sum(An*np.sin(Pn+2*np.pi*np.sin(th[tt])*n*d)) 
    
    sigma_R2=1/2*np.sum(An*sigma_An**2)+1/2*(1-np.exp(-sigma_Pn**2))*np.sum(An**2)+1/2*np.sum(np.cos(2*(Pn+2*np.pi*np.sin(th[tt])*n*d))*((An**2+sigma_An**2)*np.exp(-2*sigma_Pn**2)-An**2*np.exp(-sigma_Pn**2)))
    sigma_I2=1/2*np.sum(An*sigma_An**2)+1/2*(1-np.exp(-sigma_Pn**2))*np.sum(An**2)-1/2*np.sum(np.cos(2*(Pn+2*np.pi*np.sin(th[tt])*n*d))*((An**2+sigma_An**2)*np.exp(-2*sigma_Pn**2)-An**2*np.exp(-sigma_Pn**2))) 
    
    PDF=np.zeros(len(Vs))
    for vv in range(len(Vs)):
        PDF[vv]=pdf(Vs[vv],100)
    total=simps(PDF,Vs)
    values=np.cumsum(PDF)*inc/total
    xval_05=find_nearest(values,0.05)
    fmin[tt]=Vs[values==xval_05]
    xval_95=find_nearest(values,0.95)
    fmax[tt]=Vs[values==xval_95]

question from:https://stackoverflow.com/questions/65903777/efficient-computation-of-a-loop-of-integrals-in-python

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:54:42+0000

This version's speedup: 31x

A simple profiling (%%prun) reveals that most of the time is spent in simps.

You are in control of the integration done in pdf(): for example, you can use the trapeze method instead of Simpson with negligible numerical difference if you increase a bit the resolution of alpha. In fact, the higher resolution obtained by a higher sampling of alpha more than makes up for the difference between simps and trapeze (see picture at the bottom as for why). This is by far the highest speedup. We go one bit further by implementing the trapeze method ourselves instead of using scipy, since it is so simple. This alone yields marginal gain, but opens the door for a more drastic optimization (below, about pdf2D.

Also, the remaining simps(PDF, ...) goes faster when it knows that the dx step is constant, so we can just say so instead of passing the whole alpha array.

You can avoid doing the loop to compute PDF and use np.vectorize(pdf) directly on Vs, or better (as in the code below), do a 2-D version of that calculation.

There are some other minor things (such as using an index directly fmin[tt] = Vs[closest(values, 0.05)] instead of finding the index, returning the value, and then using a boolean mask for where values == xval_05), or taking all the constants (including alpha) outside functions and avoid recalculating every time.

This above gives us a 5.2x improvement. There is a number of things I don't understand in your code, e.g. why having An (ones) and Pn (zeros)?

But, importantly, another ~6x speedup comes from the observation that, since we are implementing our own trapeze method by using numpy primitives, we can actually do it in 2D in one go for the whole PDF.

The final speed up of the code below is 31x. I believe that a better understanding of "the big picture" of what you want to do would yield additional, perhaps substantial, speed gains.

Modified code:

import numpy as np
from scipy.integrate import simps


alpha_points = 200  # more points as we'll use trapeze not simps
alpha = np.linspace(0, 2*np.pi, alpha_points)
cosalpha = np.cos(alpha)
sinalpha = np.sin(alpha)
d_alpha = np.mean(np.diff(alpha))  # constant dx
coeff = 1 / np.sqrt(2*np.pi)
Vs=np.linspace(0,30,1000)
d_Vs = np.mean(np.diff(Vs))  # constant dx
inc=np.max(Vs)/len(Vs)

def f2D(Vs, eR, sigma_R2, eI, sigma_I2):
    a = coeff / np.sqrt(sigma_R2)
    b = coeff / np.sqrt(sigma_I2)
    y = a * np.exp(-(np.outer(cosalpha, Vs) - eR)**2 / 2 / sigma_R2) * b * np.exp(-(np.outer(sinalpha, Vs) - eI)**2 / 2 / sigma_I2)
    return y

def pdf2D(Vs, eR, sigma_R2, eI, sigma_I2):
    y = f2D(Vs, eR, sigma_R2, eI, sigma_I2)
    s = y.sum(axis=0) - (y[0] + y[-1]) / 2  # our own impl of trapeze, on 2D y
    return s * d_alpha

def closest(a, val):
    return np.abs(a - val).argmin()

N = 20
n = np.linspace(0,N-1,N)
d = 1
sigma_An = 0.1
sigma_Pn = 0.2

An=np.ones(N)
Pn=np.zeros(N)

th = np.linspace(0,np.pi/2,250)
R = np.sum(An*np.cos(Pn+2*np.pi*np.sin(th[:,np.newaxis])*n*d),axis=1)
I = np.sum(An*np.sin(Pn+2*np.pi*np.sin(th[:,np.newaxis])*n*d),axis=1)

fmin=np.zeros(len(th))
fmax=np.zeros(len(th))
for tt in range(len(th)):
    eR=np.exp(-sigma_Pn**2/2)*np.sum(An*np.cos(Pn+2*np.pi*np.sin(th[tt])*n*d))
    eI=np.exp(-sigma_Pn**2/2)*np.sum(An*np.sin(Pn+2*np.pi*np.sin(th[tt])*n*d)) 
    
    sigma_R2=1/2*np.sum(An*sigma_An**2)+1/2*(1-np.exp(-sigma_Pn**2))*np.sum(An**2)+1/2*np.sum(np.cos(2*(Pn+2*np.pi*np.sin(th[tt])*n*d))*((An**2+sigma_An**2)*np.exp(-2*sigma_Pn**2)-An**2*np.exp(-sigma_Pn**2)))
    sigma_I2=1/2*np.sum(An*sigma_An**2)+1/2*(1-np.exp(-sigma_Pn**2))*np.sum(An**2)-1/2*np.sum(np.cos(2*(Pn+2*np.pi*np.sin(th[tt])*n*d))*((An**2+sigma_An**2)*np.exp(-2*sigma_Pn**2)-An**2*np.exp(-sigma_Pn**2))) 
    
    PDF=pdf2D(Vs, eR, sigma_R2, eI, sigma_I2)
    total = simps(PDF, dx=d_Vs)
    values = np.cumsum(PDF) * inc / total
    fmin[tt] = Vs[closest(values, 0.05)]
    fmax[tt] = Vs[closest(values, 0.95)]

Note: most of the fmin and fmax are np.allclose() compared with the original function, but some of them have a small error: after some digging, it turns out that the implementation here is more precise as that function f() can be pretty abrupt, and more alpha points actually help (and more than compensate the minuscule lack of precision due to using trapeze instead of Simpson).

For example, at index tt=244, vv=400:

Categories

vectorization - Efficient computation of a loop of integrals in Python

vectorization - Efficient computation of a loop of integrals in Python

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags