dataframe - Make contingency table from rows in R

Question

Welcome To Ask or Share your Answers For Others

dataframe - Make contingency table from rows in R

posted Oct 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

dataframe - Make contingency table from rows in R

I have a covid data frame with 376 columns, 7 rows with covid infection numbers of 376 different days in 7 countries. I've matched them different severity categories and now I'm trying to make a contingency table containing the severity categories as columns and countries as rows. I've written a function and it works but I'm still wondering if there is a more elegant solution maybe including such a thing as a table() function with an aggregation for each row.

My code:

  severity <- function(x,countries){
    sev = c("Leicht","Mittel","Schwer")
    res=matrix(ncol=3,nrow=7)
    colnames(res) = sev
    rownames(res) = countries
    for (i in 1:nrow(x)){
      for (s in 1:length(sev)){
        res[i,s]=length(x2[i,x2[i,]==sev[s]])
      }
    }
    return(res)
}
r = severity(x2,covid_world2[,1]) #covid_world2 countains the countrynames, x2 the data with the categories
x = rbind(r,"Z" = colSums(r))
ctable=cbind(x,"S" = rowSums(x))

This is just an example of the first two rows in x2 (namely, representing the countries Canada,Germany)

dput(head(covid_world2[, 1:20]))

output is:

structure(list(Country = c("Canada", "France", "Germany", "Italy", 
"Japan", "United Kingdom"), X1_22_20 = c(0, 0, 0, 0, 1.58132191886678e-08, 
0), X1_23_20 = c(0, 0, 0, 0, 1.58132191886678e-08, 0), X1_24_20 = c(0, 
3.06403006266968e-08, 0, 0, 1.58132191886678e-08, 0), X1_25_20 = c(0, 
4.59604509400452e-08, 0, 0, 1.58132191886678e-08, 0), X1_26_20 = c(2.6495573093152e-08, 
4.59604509400452e-08, 0, 0, 3.16264383773357e-08, 0), X1_27_20 = c(2.6495573093152e-08, 
4.59604509400452e-08, 1.19354613321966e-08, 0, 3.16264383773357e-08, 
0), X1_28_20 = c(5.2991146186304e-08, 6.12806012533936e-08, 4.77418453287863e-08, 
0, 5.53462671603374e-08, 0), X1_29_20 = c(5.2991146186304e-08, 
7.6600751566742e-08, 4.77418453287863e-08, 0, 5.53462671603374e-08, 
0), X1_30_20 = c(5.2991146186304e-08, 7.6600751566742e-08, 4.77418453287863e-08, 
0, 8.69727055376731e-08, 0), X1_31_20 = c(1.05982292372608e-07, 
7.6600751566742e-08, 5.96773066609828e-08, 3.3078723093808e-08, 
1.18599143915009e-07, 2.94611506927399e-08), X02_01_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 9.54836906575725e-08, 3.3078723093808e-08, 
1.58132191886678e-07, 2.94611506927399e-08), X02_02_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.19354613321966e-07, 3.3078723093808e-08, 
1.58132191886678e-07, 2.94611506927399e-08), X02_03_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.58132191886678e-07, 1.1784460277096e-07), X02_04_2020 = c(1.05982292372608e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.73945411075346e-07, 1.1784460277096e-07), X02_05_2020 = c(1.3247786546576e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_06_2020 = c(1.3247786546576e-07, 
9.19209018800904e-08, 1.43225535986359e-07, 3.3078723093808e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_07_2020 = c(1.85469011652064e-07, 
9.19209018800904e-08, 1.55160997318555e-07, 4.9618084640712e-08, 
1.8185202066968e-07, 1.3257517811733e-07), X02_08_2020 = c(1.85469011652064e-07, 
1.68521653446832e-07, 1.55160997318555e-07, 4.9618084640712e-08, 
1.89758630264014e-07, 1.9149747950281e-07), X02_09_2020 = c(1.85469011652064e-07, 
1.68521653446832e-07, 1.67096458650752e-07, 4.9618084640712e-08, 
1.89758630264014e-07, 2.06228054849179e-07)), row.names = c(NA, 
6L), class = "data.frame")

question from:https://stackoverflow.com/questions/66049824/make-contingency-table-from-rows-in-r

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:16:02+0000

library(dplyr)
library(tidyr)

severity <- c("Leicht", "Mitte", "Schwer")
ranges <- c(0, 0.01, 0.05, Inf)

df %>% 
  dplyr::mutate(across(starts_with("X"), ~cut(., ranges, right = F, labels = severity))) %>% 
  tidyr::pivot_longer(cols = -Country,
                      names_to = "Date",
                      values_to = "Severity") %>% 
  dplyr::mutate(across(2, ~ as.Date(., tryFormats = "X%m_%d_%y"))) %>% # can delete line
  tidyr::pivot_wider(id_cols = Country,
                     names_from = Severity,
                     values_from = Severity,
                     values_fn = length)

 Country        Leicht
  <chr>           <int>
1 Canada             19
2 France             19
3 Germany            19
4 Italy              19
5 Japan              19
6 United Kingdom     19

How it works

mutate will apply the function cut across all columns that start with "X". Cut uses the provided ranges (excluding the upper bound) and labels values within those range with the appropriate label from the vector severity.

cut(0.01, range, right = F)
[1] [0.01,0.05)
Levels: [0,0.01) [0.01,0.05) [0.05,Inf)

You can see that 0.01 is included in the range [0.01,0.05) because the right bound is not inclusive due to right = F. Adding label = severity to this function will apply the proper label of "Mitte"

tidyr::pivot_longer will pivot all columns except Country to a long format. The column names will be stored in a new column called Date. The values of the columns after mutate will be stored in a new column called Severity.
The final mutate changes the column Date to a date vector. So "X2_01_20" becomes a date 2020-02-01. Note: you can delete this pipe, if you only care about the tabulation done in the next step. I included it in case you needed to look at things by date.
tidyr::pivot_wider will count the severity values for each country

Categories

dataframe - Make contingency table from rows in R

dataframe - Make contingency table from rows in R

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags