Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
316 views
in Technique[技术] by (71.8m points)

summary - R gtsummary Row with Categorical Variable Totals

I have a dataset of approximately 700,000 patients where I have hospital site IDs (factor variable). I would like to create a row where the number of hospitals is visible (this is separate to the number of patients). I have 3 categorical variables as my columns in addition to an overall column.

At the moment, there is a separate row for each hospital id with the number of patients in each site for each category.

My code is as follows:

t1 <- PIR %>% 
  select(siteidn, countryname) %>% 
    tbl_summary(by = countryname ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>%
  bold_labels() %>% 
  italicize_levels() %>% 
  add_overall()

t2 <- PIR %>% 
  select(siteidn, hospt) %>% 
    tbl_summary(by = hospt ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>% 
      bold_labels() %>% 
      italicize_levels()

t3 <- PIR %>% 
  select(siteidn, iculevelname) %>% 
    tbl_summary(by = iculevelname ,missing = "no",
                label = list(
                 siteidn = "Number of ICUs"),
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)")) %>% 
      bold_labels() %>% 
      italicize_levels()

tbl_merge(
  tbls = list(t1, t2, t3),
  tab_spanner = c("**Country**", "**Hospital Type**", "**ICU Level**"))

This produces the following table:

Table 1

As can be seen, there is a separate row for each hospital ID. I'd like to have a single row where there are totals of the number of hospitals in each tier (i.e. total number of hospitals in Aus, NZ, Metropolitan, etc).

My questions are:

  1. Is there a way to get a total row for a factor variable that is not the patient number?
  2. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?
  3. Is there a way to create a row for the number of patients and not have those details in the headings?

Thanks all for your time.

Ben

ADDIT: Here is an image of what I would like the table to look like. I apologise for it's crudeness. I would like to have just one row for the factor variable of total Number of ICUs, rather than have a row of each ICU with the number of patients in it (Red Ink).

Additionally, is there a way to group the 2 rows under a common heading similar to the factor variables (Green Ink).

I appreciate that my R skills are rudementary. Thank you all for your patience!

Ben

Table_Ideal

question from:https://stackoverflow.com/questions/65623256/r-gtsummary-row-with-categorical-variable-totals

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I agree with Ben, always good to include a dataset we can run on our machine, and an example of what you would like the output look like. Below is a code example that adresses most of your questions.

  1. Is there a way to get a total row for a factor variable that is not the patient number?

I am not sure what you're looking for here. More details please.

  1. Is it possible to have an overall column inserted after merging the tables (so that the overall column does not come under the Country heading)?

Yes, you can use the modify_spanning_header() function to remove the header above the Overall column.

  1. Is there a way to create a row for the number of patients and not have those details in the headings?

Yes, if you create a new column in your dataset that is TRUE for all observations, we can summarize that column and report the N.

Also, if you're only doing cross tabulations of a single variable, you should look into the tbl_cross() function. It adds the total rows automatically.

library(gtsummary)
library(tidyverse)
set.seed(20210108)

# create dummy dataset
PIR <- 
  tibble(
    siteidn = sample(c("1325", "1324", "1329"), 100, replace = TRUE) %>% factor(),
    countryname = sample(c("NZ", "Australia"), 100, replace = TRUE) %>% factor(),
    hospt = sample(c("Metro", "Rural"), 100, replace = TRUE) %>% factor(),
    patient = TRUE
  ) %>%
  group_by(siteidn) %>%
  mutate(
    count_site = row_number() == 1L # one TRUE per site
  ) %>%
  ungroup() %>%
  labelled::set_variable_labels(siteidn = "Number of ICUs", # Assigning labels 
                                patient = "N")

t1 <- PIR %>% 
  select(patient, siteidn, countryname) %>% 
  tbl_summary(
    by = countryname,
    missing = "no", 
    statistic = patient ~ "{n}" # only print N for the top row
  ) %>% 
  modify_header(stat_by = "**{level}**") %>% # Remove the Ns from the header row
  add_overall(col_label = "**Overall**")
t2 <- PIR %>% 
  select(patient, siteidn, hospt) %>% 
  tbl_summary(
    by = hospt,
    missing = "no", 
    statistic = patient ~ "{n}" # only print N for the top row
  ) %>%
  modify_header(stat_by = "**{level}**") # Remove the Ns from the header row

tbl <-
  tbl_merge(
    tbls = list(t1, t2),
    tab_spanner = c("**Country**", "**Hospital Type**")
  ) %>%
  bold_labels() %>% 
  italicize_levels() %>%
  # remove spanning header for overall column, use `show_header_names(tbl)` to print column names
  modify_spanning_header(stat_0_1 ~ NA) %>%
  modify_footnote(everything() ~ NA) # remove footnote, as it's not informative in this setting

enter image description here

EDIT: After clarification from original poster, adding another example of how one could present the Ns.

The table below shows two ways to show the Ns for the patients and the number of sites. The first is on two lines with two variables, and the last line is a way the information can be presented on a single line.

t1 <- PIR %>% 
  select(patient, site_only = count_site, combination = count_site, countryname) %>% 
  tbl_summary(
    by = countryname,
    missing = "no", 
    statistic = list(c(patient, site_only) ~ "{n}", 
                     combination ~ "Site N {n}; Total N {N}")
  )

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...