Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
575 views
in Technique[技术] by (71.8m points)

r - Cumulative number of unique values in a column up to current row

I have a data frame, donorInfo, with donor information:

id        giftdate     giftamt
002       2001-01-05     25.00
033       2001-05-08     50.00
054       2001-09-22    125.00
125       2001-11-05     40.00
042       2001-12-04     75.00
...           ...         ...

I'd like to create a column that shows the cumulative number of unique donor id's up to that date. I think it's something like:

donorInfo$numUnique <- apply/lapply (donorInfo, 1, FUN=nrow(unique(donorInfo$id)))

unfortunately this isn't working and I'm wondering how to remedy things. Thanks for any suggestions.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can do this with duplicated() and cumsum() (taking advantage of the fact that Boolean-valued logical vectors can be coerced to numeric vectors):

# Example data.frame with some duplicated ids
df <- read.table(text="
id   giftdate giftamt
 2 2001-01-05      25
33 2001-05-08      50
 2 2001-09-22     125
33 2001-11-05      40
42 2001-12-04      75", header=T)

cumsum(!duplicated(df$id))
# [1] 1 2 2 2 3

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...