How to make this function code run quick in R - should be around 1 second and its around 19 now

Question

Welcome To Ask or Share your Answers For Others

How to make this function code run quick in R - should be around 1 second and its around 19 now

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

How to make this function code run quick in R - should be around 1 second and its around 19 now

#BEGIN CODE
my.kernel <- function(Yt){
  for (i in 1:length(Yt)) {
    Yt[i] <- ifelse(abs(Yt[i]) <= 1, (35/32)*(1 - Yt[i]^2)^3, 0)}
  Yt}

# Print results

my.kernel.density.estimator <- function(y,Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + (1/(length(Yt)*h))*my.kernel((Yt[i]-y)/h)}
  result}

# Print results

my.loglik.cv <- function(Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + log(my.kernel.density.estimator(Yt[i],Yt[-i],h))}
  result}

 # Print the results
 # END CODE

Yt, h and y can be any vector/number. Here is one example.

Yt<- seq(0, 10, 0.01)
h <- 1
y<- 1

The main point is to understand how to make it run faster.

question from:https://stackoverflow.com/questions/65869207/how-to-make-this-function-code-run-quick-in-r-should-be-around-1-second-and-it

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:24:26+0000

In R mathematical operations are vectorized. In other words you do not need to apply the same mathematical operation on each vector element separately, you can perform it on all vector elements simultaneously.

The function

my.kernel <- function(Yt){
  for (i in 1:length(Yt)) {
    Yt[i] <- ifelse(abs(Yt[i]) <= 1, (35/32)*(1 - Yt[i]^2)^3, 0)}
  Yt}

can be rewritten as

my.kernel.vec <- function(x) ifelse(abs(x) <= 1, (35/32)*(1 - x^2)^3, 0)

Yt <- seq(0, 10, 0.01)
h <- 1
y <- 1

all.equal(my.kernel(Yt),
          my.kernel.vec(Yt))
#output
TRUE

the difference in speed is not minor:

library(microbenchmark)


microbenchmark(my.kernel(Yt),
               my.kernel.vec(Yt))

Unit: microseconds
              expr    min     lq     mean  median     uq    max neval cld
     my.kernel(Yt) 1110.8 1179.2 1438.136 1311.35 1708.9 6756.4   100   b
 my.kernel.vec(Yt)   54.3   66.3  104.204   70.20   74.3 3495.4   100  a

That is quite of a speed up.

Similarly

my.kernel.density.estimator <- function(y,Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + (1/(length(Yt)*h))*my.kernel((Yt[i]-y)/h)}
  result}

can be changed to utilize R vectorized operations

my.kernel.density.estimator.vec <- function(y,Yt,h) sum((1/(length(Yt)*h))*my.kernel.vec((Yt-y)/h))


all.equal(my.kernel.density.estimator.vec(1, Yt, 1),
          my.kernel.density.estimator(1, Yt, 1))
#output
TRUE

microbenchmark(my.kernel.density.estimator.vec(1, Yt, 1),
               my.kernel.density.estimator(1, Yt, 1))

Unit: microseconds
                                      expr    min     lq     mean  median      uq    max neval cld
 my.kernel.density.estimator.vec(1, Yt, 1)   57.8   59.6  101.918   63.10   70.25 3716.4   100  a 
     my.kernel.density.estimator(1, Yt, 1) 2110.8 2163.6 2285.316 2231.35 2283.20 7826.7   100   b

Finally in

my.loglik.cv <- function(Yt,h){
  result <- 0
  for(i in 1:length(Yt)){
    result <- result + log(my.kernel.density.estimator(Yt[i],Yt[-i],h))}
  result}

you need to loop in order to create vectors Yt[i] and Yt[-i] so I left it as is.

microbenchmark(my.loglik.cv.vec(Yt, 1),
               my.loglik.cv(Yt, 1), times = 10)

Unit: milliseconds
                    expr       min        lq       mean     median        uq       max neval cld
 my.loglik.cv.vec(Yt, 1)   59.1957   59.6794   79.13856   90.46365   92.7877   93.4487    10  a 
     my.loglik.cv(Yt, 1) 2240.7176 2280.7737 2309.83982 2299.39885 2343.6714 2412.8111    10   b

Not to mention the speedup on larger vectors:

Yt <- seq(0, 10, 0.001)

microbenchmark(my.loglik.cv.vec(Yt, 1),
               my.loglik.cv(Yt, 1), times = 1)
Unit: seconds
                    expr        min         lq       mean     median         uq        max neval
 my.loglik.cv.vec(Yt, 1)   5.460431   5.460431   5.460431   5.460431   5.460431   5.460431     1
     my.loglik.cv(Yt, 1) 230.221194 230.221194 230.221194 230.221194 230.221194 230.221194     1

Categories

How to make this function code run quick in R - should be around 1 second and its around 19 now

How to make this function code run quick in R - should be around 1 second and its around 19 now

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags