Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Grouping Rows in GTSummary

I am trying to group some rows/variables (both categorical and continuous) to help with the table readability in a large dataset.

Here is the dummy dataset:

library(gtsummary)
library(tidyverse)
library(gt)
set.seed(11012021)

# Create Dataset
PIR <- 
  tibble(
    siteidn = sample(c("1324", "1329", "1333", "1334"), 5000, replace = TRUE, prob = c(0.2, 0.45, 0.15, 0.2)) %>% factor(),
    countryname = sample(c("NZ", "Australia"), 5000, replace = TRUE, prob = c(0.3, 0.7)) %>% factor(),
    hospt = sample(c("Metropolitan", "Rural"), 5000, replace = TRUE, prob = c(0.65, 0.35)) %>% factor(),
    age = rnorm(5000, mean = 60, sd = 20),
    apache2 = rnorm(5000, mean = 18.5, sd=10),
    apache3 = rnorm(5000, mean = 55, sd=20),
    mechvent = sample(c("Yes", "No"), 5000, replace = TRUE, prob = c(0.4, 0.6)) %>% factor(),
    sex = sample(c("Female", "Male"), 5000, replace = TRUE) %>% factor(),
    patient = TRUE
  ) %>%
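  mutate(patient_id = row_number()) %>%
  # Flag the first row of each site (used to count ICUs)
  group_by(siteidn) %>%
  mutate(count_site = row_number() == 1L) %>%
  ungroup() %>%
  # Flag the first row of each patient (used to count patients)
  group_by(patient_id) %>%
  mutate(count_pt = row_number() == 1L) %>%
  ungroup()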

Then I use the following code to generate my table:

t1 <- PIR %>% 
  select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, mechvent, countryname) %>% 
  tbl_summary(
    by = countryname,
    missing = "no", 
    statistic = list(
      patientn ~ "{n}",
      siten ~ "{n}",
      age ~ "{mean} ({sd})",
      apache2 ~ "{mean} ({sd})",
      mechvent ~ "{n} ({p}%)",
      sex ~ "{n} ({p}%)",
      apache3 ~ "{mean} ({sd})"),
    label = list(
      siten = "Number of ICUs",
      patientn = "Number of Patients",
      age = "Age",
      apache2 = "APACHE II Score",
      mechvent = "Mechanical Ventilation",
      sex = "Sex",
      apache3 = "APACHE III Score")) %>% 
  modify_header(stat_by = "**{level}**") %>%
  add_overall(col_label = "**Overall**")
  
t2 <- PIR %>% 
  select(patientn = count_pt, siten = count_site, age, sex, apache2, apache3, mechvent, hospt) %>% 
  tbl_summary(
    by = hospt,
    missing = "no", 
    statistic = list(
      patientn ~ "{n}",
      siten ~ "{n}",
      age ~ "{mean} ({sd})",
      apache2 ~ "{mean} ({sd})",
      mechvent ~ "{n} ({p}%)",
      sex ~ "{n} ({p}%)",
      apache3 ~ "{mean} ({sd})"),
    label = list(
      siten = "Number of ICUs",
      patientn = "Number of Patients",
      age = "Age",
      apache2 = "APACHE II Score",
      mechvent = "Mechanical Ventilation",
      sex = "Sex",
      apache3 = "APACHE III Score")) %>%  
  modify_header(stat_by = "**{level}**")

tbl <-
  tbl_merge(
    tbls = list(t1, t2),
    tab_spanner = c("**Country**", "**Hospital Type**")
  ) %>%
  modify_spanning_header(stat_0_1 ~ NA) %>%
  modify_footnote(everything() ~ NA)

This produces the following table:

Table 1

I would like to group certain rows together for ease of reading. Ideally, I would like the table to look like this:

Table 1 Ideal

I have attempted using the gt package, with the following code:

tbl <-
  tbl_merge(
    tbls = list(t1, t2),
    tab_spanner = c("**Country**", "**Hospital Type**")
  ) %>%
  modify_spanning_header(stat_0_1 ~ NA) %>%
  modify_footnote(everything() ~ NA) %>% 
  as_gt() %>%
  gt::tab_row_group(
    group = "Severity of Illness Scores",
    rows = 7:8) %>%  
  gt::tab_row_group(
    group = "Patient Demographics",
    rows = 3:6) %>%  
  gt::tab_row_group(
    group = "Numbers",
    rows = 1:2)

This produces the desired table:

Table 1 Sections

There are a couple of issues I'm having with the way that I'm doing this.

  1. When I try to use the row names (variables), I get an error (Can't subset columns that don't exist...). Is there a way to do this by using the variable names? With larger tables, assigning row groups by row number becomes error-prone. This is particularly true when a single ungrouped variable loses its place because it is moved to the end to account for the grouped rows.

  2. Is there a way to do this prior to piping into tbl_summary? Although I like the output of this table, I use Word as my output document for statistical reports, and my collaborators and I would like to be able to format the tables in Word if need be. I usually use gtsummary::as_flextable for table output.

Thanks again,

Ben



1 Reply

Question 1: "Is there a way to do this by using the variable names?"

There are two ways to go about this: (1) build a separate table for each group, then stack them; or (2) add a grouping column to .$table_body, then group the tibble by the new variable.

library(gtsummary)
library(dplyr)
packageVersion("gtsummary")
#> '1.3.6'

# Method 1 - Stack separate tables
t1 <- trial %>% select(age) %>% tbl_summary()
t2 <- trial %>% select(grade) %>% tbl_summary()

tbl1 <-
  tbl_stack(
    list(t1, t2), 
    group_header = c("Demographics", "Tumor Characteristics")
  ) %>%
  modify_footnote(all_stat_cols() ~ NA)

# Method 2 - build a grouping variable
tbl2 <-
  trial %>%
  select(age, grade) %>%
  tbl_summary() %>%
  modify_table_body(
    mutate,
    group_variable = case_when(variable == "age" ~ "Demographics",
                               variable == "grade" ~ "Tumor Characteristics")
  ) %>%
  modify_table_body(group_by, group_variable)

[Image: the resulting tables with group header rows]
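
As a rough sketch of how Method 1 might extend to several variables per section and a by column (closer to the table in the question), the example below uses the trial data shipped with gtsummary. The choice of variables and the section labels are illustrative assumptions, not part of the original answer, and the rendered output may differ between gtsummary versions.

```r
library(gtsummary)
library(dplyr)

# Section 1: patient-level variables, split by treatment arm
patient_tbl <- trial %>%
  select(trt, age, marker) %>%
  tbl_summary(by = trt, missing = "no")

# Section 2: tumor-level variables, split by the same column
tumor_tbl <- trial %>%
  select(trt, grade, stage) %>%
  tbl_summary(by = trt, missing = "no")

# Stack the sections; each table gets its own group header
stacked <- tbl_stack(
  list(patient_tbl, tumor_tbl),
  group_header = c("Patient Characteristics", "Tumor Characteristics")
)
```

Because each section is built by selecting variables by name, no row indices are needed, and adding a variable to a section does not shift the others.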

Question 2: "Is there a way to do this prior to piping into tbl_summary?"

The examples above modify the table before it is converted to gt format, so you can also export them to flextable. However, flextable does not have the same built-in header row functionality (or at least I am unaware of it, and I don't use it in as_flex_table()), so the output would look like the table below. I recommend installing the dev version of gt from GitHub and exporting to RTF (which Word can open); many updates have been made to the RTF output in recent months, and it may work for you.

[Image: the same table exported to flextable]
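
A minimal sketch of the two export routes mentioned above, assuming the flextable and gt packages are installed. The output files are placeholders written to a temporary directory, and how the group headers render in Word depends on the package versions in use.

```r
library(gtsummary)
library(dplyr)

# A small stacked table with group headers (same idea as Method 1)
t_age   <- trial %>% select(age) %>% tbl_summary()
t_grade <- trial %>% select(grade) %>% tbl_summary()
tbl <- tbl_stack(
  list(t_age, t_grade),
  group_header = c("Demographics", "Tumor Characteristics")
)

# Route 1: convert to flextable and save as a Word document
ft <- as_flex_table(tbl)
docx_path <- tempfile(fileext = ".docx")
flextable::save_as_docx(ft, path = docx_path)

# Route 2: convert to gt and save as RTF, which Word can open
rtf_path <- tempfile(fileext = ".rtf")
gt::gtsave(as_gt(tbl), filename = rtf_path)
```

Both files can then be opened and edited in Word, so it is worth comparing them to see which route gives the more usable group headers.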

