Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
284 views
in Technique[技术] by (71.8m points)

r - Ordering Permutation in Rcpp i.e. base::order()

I have a ton of code using the base::order() command and I am really too lazy to code around that in rcpp. Since Rcpp only supports sort, but not order, I spent 2 minutes creating this function:

// [[Rcpp::export]]
Rcpp::NumericVector order_cpp(Rcpp::NumericVector invec){
  int leng = invec.size();
  NumericVector y = clone(invec);
  for(int i=0; i<leng; ++i){
    y[sum(invec<invec[i])] = i+1;
  }
  return(y);
}

It somehow works. If the vectors are containing unique numbers, I get the same result as order(). If they are not unique, results are different, but not wrong (no unique solution really).

Using it:

c=sample(1:1000,500)
all.equal(order(c),order_cpp(c))
microbenchmark(order(c),order_cpp(c))

Unit: microseconds
         expr      min       lq   median       uq      max neval
     order(c)   33.507   36.223   38.035   41.356   78.785   100
 order_cpp(c) 2372.889 2427.071 2466.312 2501.932 2746.586   100

Ouch! I need an efficient algorithm. Ok, so I dug up a bubblesort implementation and adapted it:

 // [[Rcpp::export]]
Rcpp::NumericVector bubble_order_cpp2(Rcpp::NumericVector vec){                                  
       double tmp = 0;
       int n = vec.size();
              Rcpp::NumericVector outvec = clone(vec);
       for (int i = 0; i <n; ++i){
         outvec[i]=static_cast<double>(i)+1.0;
       }

       int no_swaps;
       int passes;
       passes = 0;
       while(true) {
           no_swaps = 0;
           for (int i = 0; i < n - 1 - passes; ++i) {
               if(vec[i] > vec[i+1]) {
                   no_swaps++;
                   tmp = vec[i];
                   vec[i] = vec[i+1];
                   vec[i+1] = tmp;
                   tmp = outvec[i]; 
                   outvec[i] = outvec[i+1];  
                   outvec[i+1] = tmp; 
               };
           };
           if(no_swaps == 0) break;
           passes++;
       };       
       return(outvec);
}

Well, it's better - but not great:

microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
                 expr      min       lq    median        uq      max neval
             order(c)   33.809   38.034   40.1475   43.3170   72.144   100
         order_cpp(c) 2339.080 2435.675 2478.5385 2526.8350 3535.637   100
 bubble_order_cpp2(c)  219.752  231.977  234.5430  241.1840  322.383   100
              sort(c)   59.467   64.749   68.2205   75.4645  148.815   100
          c[order(c)]   38.336   41.204   44.3735   48.1460   93.878   100

Another finding: It's faster to order than to sort.

Well, then for shorter vectors:

c=sample(1:100)
 microbenchmark(order(c),order_cpp(c),bubble_order_cpp2(c),sort(c),c[order(c)])
Unit: microseconds
                 expr    min       lq   median       uq     max neval
             order(c) 10.566  11.4710  12.8300  14.1880  63.089   100
         order_cpp(c) 95.689 100.8200 102.7825 107.3105 198.018   100
 bubble_order_cpp2(c)  9.962  11.1700  12.0750  13.2830  64.598   100
              sort(c) 39.242  41.5065  42.5620  46.3355 155.758   100
          c[order(c)] 11.773  12.6790  13.5840  15.9990  82.710   100

Oh well, I have overlooked an RcppArmadillo function:

// [[Rcpp::export]]
Rcpp::NumericVector ordera(arma::vec x) {
  return(Rcpp::as<Rcpp::NumericVector>(Rcpp::wrap(arma::sort_index( x )+1)) );
}

microbenchmark(order(c),order_(c),ordera(c))
Unit: microseconds
      expr   min     lq median     uq    max neval
  order(c) 9.660 11.169 11.773 12.377 46.185   100
 order_(c) 4.529  5.133  5.736  6.038 34.413   100
 ordera(c) 4.227  4.830  5.434  6.038 60.976   100
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a simple version leveraging Rcpp sugar to implement an order function. We put in a check for duplicates so that we guarantee that things work 'as expected'. (There is also a bug with Rcpp's sort method when there are NAs, so that may want to be checked as well -- this will be fixed by the next release).

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
  if (is_true(any(duplicated(x)))) {
    Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
  }
  NumericVector sorted = clone(x).sort();
  return match(sorted, x);
}

/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
  order(x),
  order_(x)
)
*/

gives me

> Rcpp::sourceCpp('~/test-order.cpp')

> set.seed(456)

> library(microbenchmark)

> x <- runif(1E5)

> identical( order(x), order_(x) )
[1] TRUE

> microbenchmark(
+   order(x),
+   order_(x)
+ )
Unit: milliseconds
      expr      min       lq   median       uq      max neval
  order(x) 15.48007 15.69709 15.86823 16.21142 17.22293   100
 order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372   100
> 

Of course, if you're comfortable with the output not matching R, you can remove the duplicated check -- x[order_(x)] will still be properly sorted; more specifically, all(x[order(x)] == x[order_(x)]) should return TRUE.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...