"Vectorization" in R, is a vector processing in R's interpreter's view. Take the function cumsum
as an example. On entry, R interpreter sees that a vector x
is passed into this function. However, the work is then passed to C language that R interpreter can not analyze / track. While C is doing work, R is just waiting. By the time that R's interpreter comes back to work, a vector has been processed. So in R's view, it has issued a single instruction but processed a vector. This is an analogy to the concept of SIMD - "single instruction, multiple data".
Not just the cumsum
function that takes a vector and returns a vector is seen as "vectorization" in R, functions like sum
that takes a vector and returns a scalar is also a "vectorization".
Simply put: whenever R calls some compiled code for a loop, it is a "vectorization". If you wonder why this kind of "vectorization" is useful, it is because a loop written by a compiled language is faster than a loop written in an interpreted language. The C loop is translated to machine language that a CPU can understand. However, if a CPU wants to execute an R loop, it needs R's interpreter's help to read it, iteration by iteration. This is like, if you know Chinese (the hardest human language), you can respond to someone speaking Chinese to you faster; otherwise, you need a translator to first translator Chinese to you sentence after sentence in English, then you respond in English, and the translator make it back to Chinese sentence by sentence. The effectiveness of communication is largely reduced.
x <- runif(1e+7)
## R loop
system.time({
sumx <- 0
for (x0 in x) sumx <- sumx + x0
sumx
})
# user system elapsed
# 1.388 0.000 1.347
## C loop
system.time(sum(x))
# user system elapsed
# 0.032 0.000 0.030
Be aware that "vectorization" in R is just an analogy to SIMD but not a real one. A real SIMD uses CPU's vector registers for computations hence is a true parallel computing via data parallelism. R is not a language where you can program CPU registers; you have to write compiled code or assembly code for that purpose.
R's "vectorization" does not care how a loop written in a compiled language is really executed; after all that is beyond R's interpreter's knowledge. Regarding whether these compiled code will be executed with SIMD, read Does R leverage SIMD when doing vectorized calculations?
More on "vectorization" in R
I am not a Julia user, but Bogumi? Kamiński has demonstrated an impressive feature of that language: loop fusion. Julia can do this, because, as he points out, "vectorization in Julia is implemented in Julia", not outside the language.
This reveals a downside of R's vectorization: speed often comes at a price of memory usage. I am not saying that Julia won't have this problem (as I don't use it, I don't know), but this is definitely true for R.
Here is an example: Fastest way to compute row-wise dot products between two skinny tall matrices in R. rowSums(A * B)
is a "vectorization" in R, as both "*"
and rowSums
are coded in C language as a loop. However, R can not fuse them into a single C loop to avoid generating the temporary matrix C = A * B
into RAM.
Another example is R's recycling rule or any computations relying on such rule. For example, when you add a scalar a
to a matrix A
by A + a
, what really happens is that a
is first replicated to be a matrix B
that has the same dimension with A
, i.e., B <- matrix(a, nrow(A), ncol(A))
, then an addition between two matrices are calculated: A + B
. Clearly the generation of the temporary matrix B
is undesired, but sorry, you can't do it better unless you write your own C function for A + a
and call it in R. This is described as "such a fusion is possible only if explicitly implemented" in Bogumi? Kamiński's answer.
To deal with the memory effects of many temporary results, R has a sophisticated mechanism called "garbage collection". It helps, but memory can still explode if you generate some really big temporary result somewhere in your code. A good example is the function outer
. I have written many answers using this function, but it is particularly memory-unfriendly.
I might have been off-topic in this edit, as I begin to discuss the side effect of "vectorization". Use it with care.
- Put memory usage in mind; there might be a more memory efficient vectorized implementation. For example, as mentioned in the linked thread on row-wise dot products between two matrices,
c(crossprod(x, y))
is better than sum(x * y)
.
- Be prepared to use CRAN R packages that have compiled code. If you find existing vectorized functions in R limited to do your task, explore CRAN for possible R packages that can do it. You can ask a question with your coding bottleneck on Stack Overflow, and somebody may point you to the right function in the right package.
- Be happy to write your own compiled code.