Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dataframe - Counting Number of Times Each Row is Duplicated in R

I want to count the number of times each row appears in my dataset, which has five columns. I tried using table(), but it seems to work only for counting duplicates within a single column, not across several columns, because I get the error

attempt to make a table with >= 2^31 elements

As a quick example, say my dataframe is as follows:

dat <- data.frame(
  SSN    = c(204, 401, 204, 666, 401),
  Name   = c("Blossum", "Buttercup", "Blossum", "MojoJojo", "Buttercup"),
  Age    = c(7, 8, 7, 43, 8),
  Gender = c(0, 0, 0, 1, 0)
)

How do I add another column with how many times each row appears in this dataframe?
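A likely reason for the error, sketched with the example data: table() does accept several columns, but it builds a full contingency table with one cell for every combination of distinct values across all columns, not one cell per observed row. For the four example columns that is 3 × 3 × 3 × 2 = 54 cells for just 5 rows; with five columns that each have many distinct values (SSNs and names, for instance), the product can pass the 2^31 limit. This only illustrates the mechanism; the distinct counts of the real dataset are not known here.

```r
# Example data from the question
dat <- data.frame(
  SSN    = c(204, 401, 204, 666, 401),
  Name   = c("Blossum", "Buttercup", "Blossum", "MojoJojo", "Buttercup"),
  Age    = c(7, 8, 7, 43, 8),
  Gender = c(0, 0, 0, 1, 0)
)

# Number of distinct values in each column
distinct_per_col <- sapply(dat, function(col) length(unique(col)))

# table() on a data frame cross-tabulates every column, allocating one
# cell per combination of values, not one cell per observed row
tab <- table(dat)
cells <- length(tab)

print(distinct_per_col)       # SSN 3, Name 3, Age 3, Gender 2
print(cells)                  # 54 cells for only 5 rows
print(prod(distinct_per_col)) # the same product of distinct counts
```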


1 Reply

With dplyr, you can group by all columns and count the rows in each group:

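library(dplyr)  # group_by(across(...)) needs dplyr >= 1.0.0

dat %>%
  group_by(across(everything())) %>%  # one group per distinct row
  mutate(n = n())                     # n = number of rows in that group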
# # A tibble: 5 x 5
# # Groups:   SSN, Name, Age, Gender [3]
#     SSN Name        Age Gender     n
#   <dbl> <chr>     <dbl>  <dbl> <int>
# 1   204 Blossum       7      0     2
# 2   401 Buttercup     8      0     2
# 3   204 Blossum       7      0     2
# 4   666 MojoJojo     43      1     1
# 5   401 Buttercup     8      0     2

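(mutate(n = n()) has a shortcut, add_tally(), if you prefer. Use summarize(n = n()) or count() instead if you want to collapse the data frame to its unique rows along with their counts. Note that the result above is still grouped, as the "# Groups:" line shows; add ungroup() if you don't want that.)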
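If you would rather avoid a package dependency, here is a base R sketch. It pastes each row into a single string key and counts the keys; because table() then sees only one vector, it needs just one cell per distinct row, so the combinatorial blow-up described in the question should not occur. The helper name count_rows is hypothetical, and the sketch assumes no column contains the separator character "\u001f". Keys are built from the printed form of each value, so doubles that differ only beyond about 15 significant digits would be treated as equal.

```r
# Hypothetical helper: add a column `n` giving how often each row occurs.
count_rows <- function(df, sep = "\u001f") {
  # One string key per row; assumes `sep` never occurs in the data
  key <- do.call(paste, c(df, sep = sep))
  # table() over a single character vector: one cell per distinct key
  counts <- table(key)
  # Look up each row's count by its key; as.vector() drops the names
  df$n <- as.vector(counts[key])
  df
}

dat <- data.frame(
  SSN    = c(204, 401, 204, 666, 401),
  Name   = c("Blossum", "Buttercup", "Blossum", "MojoJojo", "Buttercup"),
  Age    = c(7, 8, 7, 43, 8),
  Gender = c(0, 0, 0, 1, 0)
)

with_counts <- count_rows(dat)
print(with_counts)   # n is 2, 2, 2, 1, 2

# Collapsed version (one row per distinct row), similar to count()
unique_counts <- with_counts[!duplicated(dat), ]
print(unique_counts)
```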

...