r - subset rows with (1) ALL and (2) ANY columns larger than a specific value

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a data frame with an id column and some (potentially many) value columns, here 'v1' and 'v2':

```r
df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))
#   id v1 v2
# 1  1  0  9
# 2  2 15 32
# 3  3  9  6
# 4  4 12 17
# 5  5  7 11
```

How can I extract rows where ALL values are larger than a certain value, say 10, which should return:
```
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
```
How can I extract rows where ANY (at least one) value is larger than 10, which should return:
```
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
# 5  5  7 11
```


深蓝 · Answer 1 · 2021-10-23T18:34:52+0000

See the functions all() and any() for the first and second parts of your question, respectively. The apply() function runs a function over the rows or columns of a matrix (MARGIN = 1 means rows, MARGIN = 2 means columns); a data frame passed to apply() is first converted to a matrix, which is fine here because all the remaining columns are numeric. Note that I call apply() on df[, -1] so that the id column is left out of the comparisons.
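To make MARGIN concrete, here is a small sketch on an illustrative matrix (the matrix `m` is made up for this example and is not part of the question's data). With MARGIN = 1 the function receives one row at a time; with MARGIN = 2 it receives one column at a time.

```r
# Illustrative 2 x 3 matrix; matrix() fills column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, MARGIN = 1, sum)  # one value per row:    9 12
col_sums <- apply(m, MARGIN = 2, sum)  # one value per column: 3 7 11

print(row_sums)
print(col_sums)
```

In the answer below, MARGIN = 1 is what makes all() or any() see one row of df at a time.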

Part 1:

```r
df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))
df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
```

Part 2:

```r
df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
# 5  5  7 11
```
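Parts 1 and 2 differ only in whether all() or any() is called, so both can be folded into one function. The sketch below uses a hypothetical helper name, `filter_rows_by_threshold()`, and assumes by default that every column except `id` should be tested; it is not part of base R and is just the same apply() approach wrapped with parameters.

```r
# Hypothetical helper (not base R): keep the rows of `data` for which
# `test` (all or any) is TRUE for `value > threshold` over `cols`.
filter_rows_by_threshold <- function(data, threshold, test = all,
                                     cols = setdiff(names(data), "id")) {
  # apply() with MARGIN = 1 passes each row of the selected columns as x
  keep <- apply(data[, cols, drop = FALSE], MARGIN = 1,
                function(x) test(x > threshold))
  # Use the logical vector to subset rows, keeping all columns
  data[keep, , drop = FALSE]
}

df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))

all_rows <- filter_rows_by_threshold(df, 10, test = all)  # rows with id 2, 4
any_rows <- filter_rows_by_threshold(df, 10, test = any)  # rows with id 2, 4, 5
print(all_rows)
print(any_rows)
```

Passing the function itself (`all` or `any`) as an argument keeps a single code path for both parts of the question.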

To see what is going on: inside apply(), x is one row at a time, and x > 10 returns a logical vector indicating whether each element of that row is greater than 10. all() returns TRUE if all elements of its input vector are TRUE, and FALSE otherwise. any() returns TRUE if at least one element of its input is TRUE, and FALSE if all are FALSE.
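As an illustration, here is that logic for a single row, using the values of row 5 of df (v1 = 7, v2 = 11) typed in by hand rather than extracted by apply(). This row is the one where the ALL and ANY answers disagree.

```r
x <- c(v1 = 7, v2 = 11)  # values of row 5 of df, id column excluded
gt <- x > 10             # named logical vector: v1 FALSE, v2 TRUE
print(gt)
print(all(gt))           # FALSE: not every comparison is TRUE
print(any(gt))           # TRUE: at least one comparison is TRUE
```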

Each apply() call returns one logical value per row:

```r
apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
# [1] FALSE  TRUE FALSE  TRUE FALSE
apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
# [1] FALSE  TRUE FALSE  TRUE  TRUE
```

and I then use that logical vector to subset the rows of df, as shown above.
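A vectorized variant is also possible; this is an alternative to the answer's apply() approach, not part of it. Comparing the value columns of a data frame with `> 10` yields a logical matrix, and rowSums() then counts how many columns exceed the threshold in each row: the ALL case is a count equal to the number of value columns, and the ANY case is a count greater than zero. If the data can contain NA, rowSums() returns NA for such rows unless `na.rm = TRUE` is set.

```r
df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))

# Logical matrix: one row per row of df, one column per value column
gt <- df[, -1, drop = FALSE] > 10

# Number of value columns above 10 in each row: 0 2 0 2 1
n_above <- rowSums(gt)

all_rows <- df[n_above == ncol(gt), ]  # every value column above 10
any_rows <- df[n_above > 0, ]          # at least one value column above 10
print(all_rows)
print(any_rows)
```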
