Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
297 views
in Technique[技术] by (71.8m points)

Subset by Chi Sq Values from GAM results in R

I've run a for-loop in R that generates models for a binomial GAM for 200 different random data combinations (200 different set.seed values).

The for-loop and GAMs run just fine, and they store the models in the appropriate list, model[[i]], with each list element representing a model for a certain set.seed iteration.

I can run summary() on an individual list element (model[[5]], for example) and get something like this:

Approximate significance of smooth terms:
                      edf Ref.df Chi.sq  p-value    
s(Petal.Width)  1.133e+00      9  5.414 0.008701 ** 
s(Sepal.Length) 8.643e-01      9  2.338 0.067509 .  

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.361   Deviance explained = 32.7%
-REML = 83.084  Scale est. = 1         n = 160

Since I've got 200 of these elements in my model list, I was wondering if there's a quick way to go through and count how many times (out of the total 200) that Chi.sq value is equal to 0 for the Petal.Width variable. Basically, since I have the GAMs set up with bs = "cs", the number of times that Chi.sq is equal to 0 represents how often the variable-selection process removed that variable from the model.

Here's a cleaned-up version of the code I used for the for-loop to build the model:


a <- c(1:200)
model <- list()


for (i in a) {
  
#In my version there's some extra code here that subsets iris based on i 
  
  model[[i]] <- gam(Species~ s(Petal.Width, bs="cs") + 
                      s(Sepal.Length, bs="cs"),
                    data = iris,
                    family = binomial,
                    method = "REML")

}
question from:https://stackoverflow.com/questions/65926846/subset-by-chi-sq-values-from-gam-results-in-r

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I've found the tidy() function from the broom package to be helpful in this type of situation.
here's your model (I cut down to number of iterations to 10 to make it go faster,
also it was throwing warnings using mgcv::gam())

a <- c(1:10)
model <- list()

for (i in a) {
  model[[i]] <- gam(Species~ s(Petal.Width, bs="cs") + 
                      s(Sepal.Length, bs="cs"),
                    data = iris,
                    family = binomial,
                    method = "REML")
}

If you needed or wanted the list of all the models, then this second loop will extract the statistic from each model and combine them into a dataframe:

results_df <- data.frame(term=c("Petal.Width","Sepal.Length"))
for(m in model){
  results_df <- bind_cols(results_df, model=tidy(m)$statistic)
}
results_df

with output:

          term    model...2    model...3    model...4    model...5    model...6
1  Petal.Width 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09
2 Sepal.Length 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16
     model...7    model...8    model...9   model...10   model...11
1 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09
2 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16

you can rename the columns to make them nicer, etc.
also you could combine the modeling and the combining of the statistics if that's all you want to keep.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...