Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Kruskal-Wallis test: create lapply function to subset data.frame?

I have a data set of values (val) grouped by two categories (distance and phase). I would like to run a Kruskal-Wallis test within each phase, where val is the dependent variable, distance is the grouping factor, and phase splits my data into 3 groups.

As such, I need to specify the subset of the data within the Kruskal-Wallis test and then apply the test to each of the groups. But I cannot get my subsetting to work!

The R help for kruskal.test describes subset as an optional vector specifying a subset of observations to be used. But how do I pass this correctly inside my lapply call?

My dummy data:

# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)

df<-data.frame(val, distance, phase)

# get unique groups
ii<-unique(df$phase)

# get basic statistics per group
aggregate(val ~ distance + phase, df, mean)

# run Kruskal test, specify the subset
kruskal.test(df$val ~ df$distance,
             subset = phase == "c")

This works well, so my subset seems to be correctly specified as a vector. But how do I use this in an lapply call?

# DOES not work!!
lapply(ii, kruskal.test(df$val ~ df$distance,
                        subset = df$phase == as.character(ii))) 

My overall goal is to create a function from kruskal.test, and save all statistics for each group into one table.

All help is highly appreciated.


1 Reply


Usually you would start by splitting the data frame by group, and then lapply over the pieces.

Something like

lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })

would yield a list, indexed by the phase, of the results of kruskal.test.
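To get from that list to the single table the question asks for, one option is to pull the same fields out of each result and row-bind them. The following is only a minimal sketch, assuming the dummy data from the question; the set.seed call and the names results and stats_table are my own additions for illustration. statistic, parameter and p.value are components of the htest object that kruskal.test returns.

```r
# Recreate the dummy data from the question (seed added only for reproducibility)
set.seed(1)
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase    <- rep(c("a", "b", "c"), 20)
df       <- data.frame(val, distance, phase)

# One Kruskal-Wallis test per phase; split() names the list by phase
results <- lapply(split(df, df$phase), function(d) {
  kruskal.test(val ~ distance, data = d)
})

# Extract the key numbers from each htest object and bind them into one table
stats_table <- do.call(rbind, lapply(names(results), function(p) {
  r <- results[[p]]
  data.frame(phase     = p,
             statistic = unname(r$statistic),  # Kruskal-Wallis chi-squared
             df        = unname(r$parameter),  # degrees of freedom
             p.value   = r$p.value)
}))

stats_table
```

Each row of stats_table then holds the test statistic, degrees of freedom and p-value for one phase. If more fields are needed, they can be added to the inner data.frame call in the same way.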

Your final expression does not work because lapply expects a function as its second argument. Writing kruskal.test(...) there calls the test immediately, so lapply receives the result of running that test rather than a function. If you wrap the call in a function definition that takes the index as its argument, it works; it is just a little less idiomatic.

lapply(ii, function(i) { kruskal.test(df$val ~ df$distance, subset=df$phase==i )})
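One practical difference from the split version: lapply only carries names over from its input, and ii is an unnamed vector, so the list returned by the index-based call has no names. A small sketch of one way around that, assuming the dummy data from the question, is to name the vector first with setNames (the seed and the name by_phase are only illustrative):

```r
# Recreate the dummy data from the question (seed added only for reproducibility)
set.seed(1)
val      <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase    <- rep(c("a", "b", "c"), 20)
df       <- data.frame(val, distance, phase)
ii       <- unique(df$phase)

# Naming the input vector makes lapply return a list named by phase
by_phase <- lapply(setNames(ii, ii), function(i) {
  kruskal.test(df$val ~ df$distance, subset = df$phase == i)
})

names(by_phase)
```

With names in place, a single result can be looked up as by_phase[["c"]], the same way as with the list produced by the split approach.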
