optimization - How can I rank observations in-group faster?

Question

Welcome To Ask or Share your Answers For Others

optimization - How can I rank observations in-group faster?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

optimization - How can I rank observations in-group faster?

I have a really simple problem, but I'm probably not thinking vector-y enough to solve it efficiently. I tried two different approaches and they've been looping on two different computers for a long time now. I wish I could say the competition made it more exciting, but ... bleh.

rank observations in group

I have long data (many rows per person, one row per person-observation) and I basically want a variable, that tells me how often the person has been observed already.

I have the first two columns and want the third one:

person  wave   obs
pers1   1999   1
pers1   2000   2
pers1   2003   3
pers2   1998   1
pers2   2001   2

Now I'm using two loop-approaches. Both are excruciatingly slow (150k rows). I'm sure I'm missing something, but my search queries didn't really help me yet (hard to phrase the problem).

Thanks for any pointers!

# ordered dataset by persnr and year of observation
person.obs <- person.obs[order(person.obs$PERSNR,person.obs$wave) , ]

person.obs$n.obs = 0

# first approach: loop through people and assign range
unp = unique(person.obs$PERSNR)
unplength = length(unp)
for(i in 1:unplength) {
   print(unp[i])
   person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs = 
1:length(person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs)
    i=i+1
   gc()
}

# second approach: loop through rows and reset counter at new person
pnr = 0
for(i in 1:length(person.obs[,2])) {
  if(pnr!=person.obs[i,]$PERSNR) { pnr = person.obs[i,]$PERSNR
  e = 0
  }
  e=e+1
  person.obs[i,]$n.obs = e
  i=i+1
  gc()
}

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:31:13+0000

The answer from Marek in this question has proven very useful in the past. I wrote it down and use it almost daily since it was fast and efficient. We'll use ave() and seq_along().

foo <-data.frame(person=c(rep("pers1",3),rep("pers2",2)),year=c(1999,2000,2003,1998,2011))

foo <- transform(foo, obs = ave(rep(NA, nrow(foo)), person, FUN = seq_along))
foo

  person year obs
1  pers1 1999   1
2  pers1 2000   2
3  pers1 2003   3
4  pers2 1998   1
5  pers2 2011   2

Another option using plyr

library(plyr)
ddply(foo, "person", transform, obs2 = seq_along(person))

  person year obs obs2
1  pers1 1999   1    1
2  pers1 2000   2    2
3  pers1 2003   3    3
4  pers2 1998   1    1
5  pers2 2011   2    2

Categories

optimization - How can I rank observations in-group faster?

optimization - How can I rank observations in-group faster?

rank observations in group

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags