Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
308 views
in Technique[技术] by (71.8m points)

r - lapply with anonymous function call to svytable results in object 'x' not found

I have a survey data set that I'm creating contingency tables for. Each column in the data frame is a question and generally speaking, the questions tend to group together. So to make life easy, I've been using lapply to loop through sections and return the contingency tables with the following code:

> out <- lapply(dat[,162:170], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)
> out
$r3a_1
            
x               1    2
  Don't Know  1.9  1.4
  No         14.2  4.9
  Yes        83.9 93.7

$r3a_2
            
x               1    2
  Don't Know  2.7  1.7
  No         14.8  6.6
  Yes        82.4 91.6

etc...

As you can see, I'm looping through columns 162:170 and creating a prop table that shows the different responses between groups 1 and 2.

However, I'd like to weight this data. So I'm using the survey package to create a simple weighted survey design object called dat_weight and using svytable() instead of table(). I can run the updated code on a single column manually:

> round(prop.table(svytable(~dat[,162] + dat$seg_2, dat_weight),2),3)*100 
            dat$seg_2
dat[, 162]      1    2
  Don't Know  2.5  2.7
  No         16.5  5.4
  Yes        80.9 91.9

However, when I try to use lapply it doesn't work:

> out <- lapply(dat[,162:170], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)

Error in eval(expr, envir, enclos) : object 'x' not found 

Clearly the anonymous function call and svytable aren't playing nicely together. I've tried creating a for loop which doesn't work either. I'm guessing this has something to do with scoping but I'm at a loss as to how to fix it.

Surely there has to be a way to loop through chunks of this survey and avoid having to create a unique line of code for each column. Any help would be greatly appreciated.

Edit to add some sample data:

> library("survey")
> dat <- structure(list(r3a_1 = structure(c(3L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Don't Know", 
"No", "Yes"), class = "factor"), r3a_2 = structure(c(3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_3 = structure(c(3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_4 = structure(c(3L, 
    2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 2L, 
    3L, 3L, 3L, 1L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_5 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L, 2L, 
    2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 1L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_6 = structure(c(3L, 
    3L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 3L, 
    2L, 3L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_7 = structure(c(1L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 2L, 
    3L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), r3a_8 = structure(c(3L, 
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 
    2L, 2L, 3L, 3L), .Label = c("Don't Know", "No", "Yes"), class = "factor"), 
    r3a_9 = structure(c(1L, 3L, 2L, 2L, 3L, 2L, 2L, 3L, 3L, 3L, 
    3L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 3L, 3L), .Label = c("Don't Know", 
    "No", "Yes"), class = "factor"), weight = c(0.34, 0.34, 0.34, 
    0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.34, 0.43, 
    0.43, 0.43, 0.34, 0.34, 0.34, 0.34, 0.34), seg_2 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L), .Label = c("1", "2"), class = "factor")), .Names = c("r3a_1", 
"r3a_2", "r3a_3", "r3a_4", "r3a_5", "r3a_6", "r3a_7", "r3a_8", 
"r3a_9", "weight", "seg_2"), row.names = c(NA, 20L), class = "data.frame")

> dat_weight <- svydesign(ids = ~1, weights = ~weight, data = dat)

From there you can get the weighted and unweighted tables:

round(prop.table(table(dat[,1],dat$seg_2),2),3)*100  #unweighted

round(prop.table(svytable(~dat[,1] + dat$seg_2, dat_weight),2),3)*100   #weighted

However, this works:

lapply(dat[,1:9], function(x) round(prop.table(table(x,dat$seg_2),2),3)*100)

While this doesn't:

lapply(dat[,1:9], function(x) round(prop.table(svytable(~x + dat$seg_2, dat_weight),2),3)*100)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Ok, well, it seems the svytable function is picky and will only look up data in the design object. It doesn't seem to look for x in the enclosing environment. So an alternative approach is to dynamically build the formula. So instead of passing in the columns of data themselves, we pass in names of columns form the data.frame. Then we plug those into the formula and then they are resolved by the design object which points to the original data.frame. Here's a bit of working code using your sample data

lapply(names(dat)[1:9], function(x) round(prop.table(
    svytable(bquote(~.(as.name(x)) + seg_2), dat_weight),
2),3)*100)

So here we use bquote to build the formula. The .() allows us to plug in expressions and here we take the character value in x and convert it to a proper name object. Thus is goes from "r3a_9" to r3a_9.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...