Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
380 views
in Technique[技术] by (71.8m points)

r - purrr:map and glm - issues with call

This issue is related to Pipe '.' dot causes trouble in glm call.

purrr:map is wonderful for subgroup analysis and/or model comparison. However, when using glm, the call is messed up and causing issues, e.g. when computing pseudo-R2s. The reason is that update doesn't work with the ugly call, and thus pscl::pR2 cannot compute the log-likelihood of the base model.

pacman::p_load(tidyverse)

#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default

#nest data, students and non-students
Default_nested = Default %>% group_by(student) %>% nest 

#fit glms
formul= default ~income+balance

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 

#pscl::pR2 throwing error
pacman::p_load(pscl)
glms %>% mutate(pr2=map(model,pR2))

Now we can take a look at the first submodel. The call looks strange (formula=..1) even though formula contains the right formula.

> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
   default balance income
   <fct>     <dbl>  <dbl>
 1 No         730. 44362.

What is the cleanest way to be able to use pscl::pR2 when you have many (more than 2 in this example) glm objects in your tibble?

Edit:

Overview of solution strategies:

(A) "fix" the glm object, so that update can be applied to it:

glms %>% mutate(model = map(model,function(x){x$call = call2("glm",formula=x$formula,data=quote(Default),family='binomial');x})) %>%
  mutate(pr2=map(model,pR2)) %>% unnest(pr2)

This 'runs', however, the computed R2 is off. So this solution strategy is probably a dead-end.

(B) Write a wrapper for `glm, as proposed by Artem. This should work fine. Downside: the calls look ugly.

I expanded on Artem's proposed solution to create glm3.

glm3 <- function(formula,data,family) {  
  eval(rlang::expr( glm(!!rlang::enexpr(data),
                        formula=!!formula,
                        family=!!family ) ))}

glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) )
glms3 %>% unnest(pr2)

(C) In this particular case (pseudo R2s), simply write a better pseudo-r2 function. Since it's probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm function.

psr2glm=function(glmobj){

  L.base=
    logLik(
      glm(formula = reformulate('1',gsub( " .*$", "", deparse(glmobj$formula) )),
          data=glmobj$data,
          family = glmobj$family))

  n=length(glmobj$residuals)

  L.full=logLik(glmobj)
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)

  return(data.frame(McFadden = 1-L.full/L.base, 
                    CoxSnell = 1 - exp(-G2/n),
                    Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))))

}

It works:

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)

I consider proposing changes to DescTools:::PseudoR2, however, I first need to check if the solution is general.

The key to this idea is to skip update and instead directly call glm. All required information pieces are within the glm object, even within purrr::map. Nice side effect of using psr2glm: unnest's output looks nice.

(D) Change either glm or update. Given that the glm object actually contains all necessary information, one could consider the observed behavior a bug. So it should be fixed in base R.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

One way is to define a wrapper for glm() that puts data directly inside the call by manually constructing the expression and then evaluating it:

glm2 <- function(.df, ...) {
  eval(rlang::expr(glm(!!rlang::enexpr(.df),!!!list(...)))) }

glms <- Default_nested %>%
    mutate( model = map(data,glm2,formula=formul,family="binomial"),
            pr2   = map(model,pscl::pR2) )
# # A tibble: 2 x 4
#   student data                 model  pr2      
#   <fct>   <list>               <list> <list>   
# 1 No      <tibble [7,056 × 3]> <glm>  <dbl [6]>
# 2 Yes     <tibble [2,944 × 3]> <glm>  <dbl [6]>

Validation:

## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[1]] )     # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[2]] )     # TRUE

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...