Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Extend contingency table with proportions (percentages)

I have a contingency table of counts, and I want to extend it with the corresponding proportion for each group.

Some sample data (the tips data set, which ships with the reshape2 package; current versions of ggplot2 do not include it):

library(reshape2)

head(tips, 3)
#   total_bill tip    sex smoker day   time size
# 1         17 1.0 Female     No Sun Dinner    2
# 2         10 1.7   Male     No Sun Dinner    3
# 3         21 3.5   Male     No Sun Dinner    3

First, use table to count smokers vs. non-smokers, and nrow to count the total number of subjects:

table(tips$smoker)
#  No Yes 
# 151  93 

nrow(tips)
# [1] 244

Then I want to calculate the percentage of smokers vs. non-smokers. Something like this (ugly code):

# percentage of smokers
options(digits = 2)

transform(as.data.frame(table(tips$smoker)), percentage_column = Freq / nrow(tips) * 100)
#   Var1 Freq percentage_column
# 1   No  151                62
# 2  Yes   93                38

Is there a better way to do this?

Even better would be a way to do this for a set of columns that I enumerate (e.g., smoker, day, and time), with reasonably nicely formatted output.


1 Reply

If it's conciseness you're after, you might like:

prop.table(table(tips$smoker))

then multiply by 100 and round if you like. Or, to get something closer to your exact output:

tbl <- table(tips$smoker)
cbind(tbl,prop.table(tbl))
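
The "scale by 100 and round" step can be sketched roughly as follows. To keep the snippet self-contained it does not load tips; instead it rebuilds a stand-in smoker factor from the counts shown in the question (151 No, 93 Yes). The names smoker, pct, and out are illustrative only.

```r
# Stand-in for tips$smoker, rebuilt from the counts in the question
# (an assumption made so the example runs without extra packages).
smoker <- factor(rep(c("No", "Yes"), times = c(151, 93)))

tbl <- table(smoker)                    # counts per level
pct <- round(prop.table(tbl) * 100, 2)  # proportions scaled to percentages

# Combine counts and percentages in a data frame, similar to the
# layout of the transform() output in the question.
out <- data.frame(
  smoker     = names(tbl),
  count      = as.vector(tbl),
  percentage = as.vector(pct)
)
out
#   smoker count percentage
# 1     No   151      61.89
# 2    Yes    93      38.11
```

With the real data, replacing smoker with tips$smoker should give the same numbers.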

If you want to do this for multiple columns, there are many directions you could go, depending on what you consider clean-looking output. Here's one option:

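# Build a two-column matrix of counts and percentages for one factor
tblFun <- function(x) {
    tbl <- table(x)
    # prop.table gives proportions; scale to percent and round to 2 digits
    res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
    colnames(res) <- c('Count', 'Percentage')
    res
}

# Columns 3:6 of tips are sex, smoker, day, and time;
# apply tblFun to each and stack the results row-wise
do.call(rbind, lapply(tips[3:6], tblFun))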
       Count Percentage
Female    87      35.66
Male     157      64.34
No       151      61.89
Yes       93      38.11
Fri       19       7.79
Sat       87      35.66
Sun       76      31.15
Thur      62      25.41
Dinner   176      72.13
Lunch     68      27.87

If you don't like stacking the different tables on top of each other, you can ditch the do.call and leave them in a list.
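
A minimal sketch of the list variant, selecting columns by name as the question asked. The small data frame df below is a hypothetical stand-in so the snippet runs on its own; with the real data you would pass tips and something like c("smoker", "day", "time") instead. tblFun is repeated from above so the block is self-contained.

```r
# Hypothetical stand-in data (not the real tips data set)
df <- data.frame(
  smoker = factor(c("No", "Yes", "No", "No")),
  time   = factor(c("Dinner", "Dinner", "Lunch", "Dinner"))
)

# Same helper as in the answer: counts plus rounded percentages
tblFun <- function(x) {
  tbl <- table(x)
  res <- cbind(tbl, round(prop.table(tbl) * 100, 2))
  colnames(res) <- c("Count", "Percentage")
  res
}

# Enumerate the columns of interest by name and keep one table per column
cols <- c("smoker", "time")
res_list <- lapply(df[cols], tblFun)

# Each element is a matrix named after its column
res_list$smoker
#     Count Percentage
# No      3         75
# Yes     1         25
```

Keeping the results in a named list makes it clear which levels belong to which variable, which the stacked rbind output loses.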

