Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

R: ggplots stored in a plot list should respect variable values at the time of plot generation within a for loop

I have an elaborate plotting routine that generates box plots with additional scatter layers and adds them to a plot list.

The routine generates correct plots if they are printed directly inside the for loop via print(current_plot_complete).

However, if they are added to a plot list during the for loop and the list is printed only at the end, the plots are incorrect: the final loop values are used to generate all plots (instead of the values current at the time each plot is generated). This seems to be default ggplot2 behavior, and I am looking for a way to circumvent it in the current use case.

The issue seems to lie in y = eval(parse(text=(paste0(COL_i)))), where the global environment (and thus the final value of COL_i) is used instead of the value at the time of loop execution.
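
A stripped-down sketch that reproduces the behavior, assuming only ggplot2 and a toy data frame (the names df, col, plots and built_a are illustrative, not from the MWE). The .data[[col]] mapping resolves col the same way the eval(parse(...)) call resolves COL_i: lazily, when the plot is built, not when aes() is called.

```r
library(ggplot2)

# Toy data; the column names "a" and "d" mirror choice_COLs in the MWE
df <- data.frame(a = 1:5, d = 101:105)

plots <- list()
for (col in c("a", "d")) {
  # aes() stores the expression .data[[col]] together with the environment
  # the loop runs in (here the global environment); `col` is not looked up yet
  plots[[col]] <- ggplot(df, aes(x = 1, y = .data[[col]])) + geom_point()
}

# Building the "a" plot after the loop looks up `col` now, when it is "d",
# so the built layer contains the values of column "d"
built_a <- layer_data(plots[["a"]], 1)
print(built_a$y)
```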

I tried various approaches to make eval() use the correct variable values, e.g. local(…) or specifying the environment, but without success.
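
For reference, a sketch of how local() can work here, assuming the same toy data frame as above (df, this_col and plots are illustrative names). The key is that the current value is copied into the new environment and aes() is called inside local(), so each plot captures its own environment rather than the global one.

```r
library(ggplot2)

df <- data.frame(a = 1:5, d = 101:105)

plots <- list()
for (col in c("a", "d")) {
  plots[[col]] <- local({
    # local() evaluates this block in a fresh environment per iteration;
    # `this_col` is assigned immediately, so it keeps the current value
    this_col <- col
    # aes() captures this fresh environment, not the global one
    ggplot(df, aes(x = 1, y = .data[[this_col]])) + geom_point()
  })
}

# The "a" plot now builds with the values of column "a"
built_a <- layer_data(plots[["a"]], 1)
print(built_a$y)
```

Wrapping the loop body in a function and calling it from the for loop relies on the same mechanism, since every function call also gets its own environment.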

A very simplified MWE is provided below.

MWE

The original routine is much more elaborate than this MWE, so the for loop cannot easily be replaced with members of the apply family.

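library(ggplot2)  # ggplot(), geom_boxplot(), geom_point()
library(dplyr)    # %>%, filter(), select(), all_of()
library(tidyr)    # gather()

set.seed(1)       # make the random data reproducible

# create some random data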
data_temp <- data.frame(
"a" = sample(x = 1:100, size  = 50),
"b" = rnorm(n = 50, mean = 45, sd = 1),
"c" = sample(x = 20:70, size  = 50), 
"d" = rnorm(n = 50, mean = 40, sd = 15),
"e" = rnorm(n = 50, mean = 50, sd = 10),
"f" = rnorm(n = 50, mean = 45, sd = 1),
"g" = sample(x = 20:70, size  = 50)
)
COLs_current <- c("a", "b", "c", "d", "e") # define COLs of data to include in box plots
choice_COLs <- c("a", "d")      # define COLs of data to add scatter to

plot_list <- list(NA)
plot_index <- 1

for (COL_i in choice_COLs) {

  COL_i_index <- which(COL_i == COLs_current)

  # Generate "basis boxplot" (to plot scatterplot on top)
  boxplot_scores <- data_temp %>% 
    gather(COL, score, all_of(COLs_current)) %>%
    ggplot(aes(x = COL, y = score)) +
    geom_boxplot() 

  # Get relevant data of COL_i for scattering: data of 4th quartile
  quartile_values <- quantile(data_temp[[COL_i]])
  threshold <- quartile_values["75%"]           # threshold = 3. quartile value
  data_temp_filtered <- data_temp %>%
    filter(data_temp[[COL_i]] > threshold) %>%  # filter the data of the 4th quartile
    dplyr::select(all_of(COLs_current))         # keep only the box-plot columns

  # Create layer of scatter for 4th quartile of COL_i
  scatter_COL_i <- geom_point(data=data_temp_filtered, mapping = aes(x = COL_i_index, y = eval(parse(text=(paste0(COL_i))))), color= "orange")

  # add geom objects to create final plot for COL_i
  current_plot_complete <- boxplot_scores + scatter_COL_i 

  print(current_plot_complete)

  plot_list[[plot_index]] <- current_plot_complete 
  plot_index <- plot_index + 1
}

plot_list

1 Reply

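I think the problem is that ggplot2 uses lazy evaluation: aes() stores the mapping expressions together with the environment they were created in (here the global environment) and only evaluates them when the plot is built, i.e. when it is printed. By the time the list is printed, the loop variables (COL_i, COL_i_index) hold their final values, and those are used to render every plot in the list.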
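
A common way around this, sketched here on a toy data frame (df, cols, col and col_index are illustrative stand-ins for data_temp, COLs_current, COL_i and COL_i_index), is to inject the current values into the mapping with the !! operator, which aes() supports. The stored mapping then contains the values themselves and no longer refers back to the loop variables.

```r
library(ggplot2)

df <- data.frame(a = 1:5, d = 101:105)
cols <- c("a", "d")

plots <- list()
for (col in cols) {
  col_index <- which(col == cols)
  # !! substitutes the current values when aes() is called:
  # x becomes the constant index, y becomes .data[["a"]] or .data[["d"]]
  plots[[col]] <- ggplot(df) +
    geom_point(aes(x = !!col_index, y = .data[[!!col]]))
}

# The "a" plot keeps x = 1 and the values of column "a"
built_a <- layer_data(plots[["a"]], 1)
print(built_a$x)
print(built_a$y)
```

In the MWE, the corresponding change would presumably be mapping = aes(x = !!COL_i_index, y = .data[[!!COL_i]]) in the scatter layer, which also removes the need for eval(parse(...)).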

This post is relevant.

...