Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to get the min and max values of a column?

I have a dataset where the values are collapsed, so each row holds multiple comma-separated values in a single column.

For example:

Gene   Score1                      
Gene1  NA, NA, NA, 0.03, -0.3 
Gene2  NA, 0.2, 0.1, ., .   

I want to create two new columns holding the min and max of the values in that column. In reality I have 70 such columns, so I wrote code to produce all of the min and max columns at once:

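library(data.table)
library(stringr)

get_range <- function(x) {
  # Split each string on a comma plus whitespace, then let type.convert pick a type
  x <- type.convert(str_split(x, ",\\s+", simplify = TRUE), na.strings = ".")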
  x <- t(apply(x, 1L, function(i) {
    i <- i[!is.na(i)]
    if (length(i) < 1L) c(NA_real_, NA_real_) else range(i)
  }))
  dimnames(x)[[2L]] <- c("min", "max")
  x
}

dt <- dt[, c(Gene = .(Gene), lapply(.SD, get_range)), .SDcols = -"Gene"]

However, the min and max columns this code produces look like this:

Gene   Score1.min  Score1.max                     
Gene1    1             5 
Gene2    3             5

The expected output is:

Gene   Score1.min  Score1.max                     
Gene1    -0.3          0.03 
Gene2    0.1           0.2

These values are nothing like the ones I started with, and I have no idea how my code produces them. Is something in my code causing the values to no longer be treated as the numbers they originally were?

Input data:

structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3", 
"NA, 0.2, 0.1, ., .")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))
Question from: https://stackoverflow.com/questions/65940361/how-to-get-the-min-and-max-values-of-a-column


1 Reply

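type.convert only treats the strings listed in na.strings as missing values. The default is "NA". Because you set na.strings = ".", the string "NA" is no longer treated as missing, so the data contain non-numeric text and are converted to a factor with "NA" as one of its levels. The 1, 5 and 3, 5 in your output are consistent with the integer codes of those factor levels rather than the original numbers. Both markers appear in your data, so you need na.strings = c(".", "NA").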

## The string split result is `character`, of course, with both `"."` and `"NA"` values
(ss = str_split(dt$Score1, ",\\s+", simplify = TRUE))
#      [,1] [,2]  [,3]  [,4]   [,5]  
# [1,] "NA" "NA"  "NA"  "0.03" "-0.3"
# [2,] "NA" "0.2" "0.1" "."    "."   

## What you have creates a factor with `"NA"` as a level
## (as.is = FALSE is spelled out here so the factor conversion is explicit)
type.convert(ss, na.strings = c("."), as.is = FALSE)
#      [,1] [,2] [,3] [,4] [,5]
# [1,] NA   NA   NA   0.03 -0.3
# [2,] NA   0.2  0.1  <NA> <NA>
# Levels: -0.3 0.03 0.1 0.2 NA
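
To see where 1, 5 and 3, 5 could come from, here is a minimal sketch that rebuilds the two rows by hand as factors with the levels printed above and looks at their integer codes. The names `lv`, `row1`, `row2`, `codes1`, and `codes2` are made up for illustration, and this only shows that the codes match the reported output; it is not a trace of what `apply` does internally.

```r
# Levels exactly as printed in the factor output above; "NA" is an ordinary level.
lv <- c("-0.3", "0.03", "0.1", "0.2", "NA")

# Row 1: every entry is a real level, including the string "NA".
row1 <- factor(c("NA", "NA", "NA", "0.03", "-0.3"), levels = lv)

# Row 2: the two "." entries became true missing values (<NA>).
row2 <- factor(c("NA", "0.2", "0.1", NA, NA), levels = lv)

# Integer codes are positions in the level vector, not the numbers themselves.
codes1 <- as.integer(row1)
codes2 <- as.integer(row2)

range(codes1)                # 1 5
range(codes2, na.rm = TRUE)  # 3 5
```

Those ranges match the Score1.min and Score1.max values reported in the question.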

## Here is the solution to get it to be numeric with `type.convert`
## (as.is = TRUE is passed explicitly; the result is numeric either way)
type.convert(ss, na.strings = c(".", "NA"), as.is = TRUE)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA   NA   NA 0.03 -0.3
# [2,]   NA  0.2  0.1   NA   NA
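
Putting the fix together, the following is a rough base-R sketch of a range helper that treats both "." and "NA" as missing. It is not the asker's original function: `get_range_base` and `scores` are hypothetical names, `strsplit` stands in for `str_split` so that no packages are needed, and each row is converted on its own instead of as one matrix.

```r
# Hypothetical helper (an illustration, not the original get_range):
# returns a two-column matrix of per-row minima and maxima.
get_range_base <- function(x) {
  # Split each collapsed string on a comma followed by optional whitespace.
  parts <- strsplit(x, ",\\s*")
  out <- t(vapply(parts, function(p) {
    # Both "." and "NA" count as missing; as.is = TRUE prevents factor conversion.
    v <- type.convert(p, na.strings = c(".", "NA"), as.is = TRUE)
    # Drop missing values and coerce to double (integer or all-NA input included).
    v <- as.numeric(v[!is.na(v)])
    if (length(v) < 1L) c(NA_real_, NA_real_) else range(v)
  }, numeric(2)))
  colnames(out) <- c("min", "max")
  out
}

scores <- c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1, ., .")
res <- get_range_base(scores)
res
```

For the sample data this gives -0.3 and 0.03 for the first row and 0.1 and 0.2 for the second, which is the expected output from the question. A row with no usable values yields NA for both columns.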
