Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

"Adding missing grouping variables" message in dplyr in R

Part of my script used to run fine, but recently it has started printing an odd message, after which many of my other functions do not work properly. For each site I am trying to select the 8th and 23rd positions in a ranked list of values, to find the 25th and 75th percentile values for each day of the year over 30 years. My approach was as follows (adapted for the four-row sample dataset; slice(3) would normally be slice(23) for my full 30-year dataset):

library("dplyr")

mydata <- structure(list(station_number = structure(c(1L, 1L, 1L, 1L), .Label = "01AD002", class = "factor"),
year = 1981:1984, month = c(1L, 1L, 1L, 1L), day = c(1L,
1L, 1L, 1L), value = c(113, 8.329999924, 15.60000038, 149
)), .Names = c("station_number", "year", "month", "day", "value"), class = "data.frame", row.names = c(NA, -4L))

  value <- mydata$value
  qu25 <- mydata %>% 
          group_by(month, day, station_number) %>% 
          arrange(desc(value)) %>% 
          slice(3) %>% 
          select(value)

Before, I would be left with a table that had one value per site describing the 25th percentile (since arrange(desc(value)) orders the values from highest to lowest). However, now when I run these lines, I get a message:

Adding missing grouping variables: `month`, `day`, `station_number`

This message doesn’t make sense to me, as the grouping variables are clearly present in my table. Also, again, this was working fine until recently. I have tried:

  • detach("package:plyr") – since I have it loaded before dplyr
  • dplyr::group_by – using the namespaced call directly in the group_by line
  • uninstalling and reinstalling dplyr, although this was for another issue I was having

Any idea why I might be receiving this message and why it may have stopped working?

Thanks for any help.

Update: Added dput example with one site, but values for January 1st for multiple years. The hope would be that the positional value is returned once grouped, for instance slice(3) would hopefully return the 15.6 value for this smaller subset.


1 Reply


For consistency, dplyr always keeps the grouping variables of a grouped data frame, so when select(value) is executed they are added back and the message is printed. It is an informational message, not an error. Calling ungroup() before select() should resolve it:

qu25 <- mydata %>% 
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>% 
  slice(2) %>% 
  ungroup() %>%
  select(value)

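The result is then returned without the message (slice(2) gives the second-highest value, 113; slice(3) would give the 15.6 value asked for in the question):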

> mydata %>% 
+   group_by(month, day, station_number) %>%
+   arrange(desc(value)) %>% 
+   slice(2) %>% 
+   ungroup() %>%
+   select(value)
# A tibble: 1 x 1
  value
  <dbl>
1   113
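
If the site and date identifiers are needed alongside the value (one row per site, as described in the question), another option is to name the grouping columns in select() yourself instead of ungrouping. The following is a minimal sketch using the sample data from the question, rebuilt here with data.frame() so the block is self-contained. The object names grouped, with_keys, and values_only are illustrative and not from the original post.

```r
library(dplyr)

# Sample data from the question, rebuilt so this block runs on its own
mydata <- data.frame(
  station_number = factor(rep("01AD002", 4)),
  year  = 1981:1984,
  month = 1L,
  day   = 1L,
  value = c(113, 8.329999924, 15.60000038, 149)
)

# Rank values from highest to lowest and take the 3rd row of each group
grouped <- mydata %>%
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>%
  slice(3)

# Option 1: name the grouping columns explicitly, so nothing has to be
# added back and the identifiers stay attached to the value
with_keys <- grouped %>%
  select(month, day, station_number, value)

# Option 2 (as in the answer above): drop the grouping, keep only the value
values_only <- grouped %>%
  ungroup() %>%
  select(value)
```

Option 1 keeps the result grouped, so later steps that should not operate per group may still need an ungroup() call. Option 2 returns a plain one-column tibble.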
