Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
448 views
in Technique[技术] by (71.8m points)

r - NA matches NA, but is not equal to NA. Why?

In R Language Definition, NA values are briefly described, a portion of which says

... In particular, FALSE & NA is FALSE, TRUE | NA is TRUE. NA is not equal to any other value or to itself; testing for NA is done using is.na. However, an NA value will match another NA value in match.

Regarding the statement "NA is not equal to any other value or to itself",


Updated: The question, revised again, is

What is the reasoning, if any, behind NA matching NA in match, and nowhere else in the language?

It doesn't make sense to me that a missing value, unknown by anyone (or it would not be missing), would match another missing value of the same type. Since I posted this, I came across something in example(match) that provides some reasoning. Character coercion changes its type. I can erase it completely if I like.

match(NA, NA)
# [1] 1
match(NA, NA_real_)
# [1] 1
match(NA_character_, NA_real_)
# [1] 1
match(paste(NA), NA)
# [1] NA
gsub("NA", "", NA)
# [1] NA
gsub("NA", "", paste(NA))
# [1] ""
is.na(NA)
# [1] TRUE
is.na(paste(NA))
# [1] FALSE

Apologies for stirring the pot, but some of the documentation is unclear about this. It might boil down to the R parser/deparser and the fact that you can turn anything into a text character object in R.


Original Post:

Now referring to "However, an NA value will match another NA value in match."

If NA is it not equal to itself, why is it matched with itself in match? and also in identical? Is this done on purpose?

NA == NA  ## expecting TRUE
# [1] NA
NA != NA
# [1] NA
x <- NA
x == x
# [1] NA
match(NA, NA)
# [1] 1
identical(NA, NA)
# [1] TRUE
all.equal(NA, NA)
# [1] TRUE
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It's a matter of convention. There are good reasons for the way == works. NA is a special value in R that is supposed to represent data that is missing and should be treated differently from the rest of data. There are innumerable very subtle bugs that could come up if we started comparing missing values as if they were known or as if two missing values were equal to each other.

Think of NA as meaning "I don't know what's there". The correct answer to 3 > NA is obviously NA because we don't know if the missing value is larger than 3 or not. Well, it's the same for NA == NA. They are both missing values but the true values could be quite different, so the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of potentially introducing bugs that would later end up being published and embarrassing you, it doesn't allow comparison operators to think NA is a value.

match() was written with a more specific purpose in mind: finding the indexes of matching values. If you ask the question "Should I match 3 with NA", a reasonable answer is "no." Different (and very useful) convention, and justified because R pretty much knows what you are trying to do when you invoke match(). Now, should we match NA with NA for this purpose? It could be argued.

Come to think of it, I suppose it is a a little odd that the authors of match() chose to allow NA to match to itself by default. You can imagine cases where you might use match() to find NA rows in table along with other values, but it's dangerous. You just have to be a bit more careful about knowing whether you have any NA values in x and only permitting them if you really wanted to. You can change this behavior by specifying incomparables=NA when calling match().


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...