I think the way to think about the difference between labels and levels (ignoring the labels() function that Tommy describes in his answer) is that levels is intended to tell R which values to look for in the input (x) and what order to use in the levels of the resulting factor object, and labels is to change the values of the levels after the input has been coded as a factor ... as suggested by Tommy's answer, there is no part of the factor object returned by factor() that is called labels ... just the levels, which have been adjusted by the labels argument ... (clear as mud).
For example:
> f <- factor(x=c("a","b","c"),levels=c("c","d","e"))
> f
[1] <NA> <NA> c
Levels: c d e
> str(f)
Factor w/ 3 levels "c","d","e": NA NA 1
Because the first two elements of x were not found in levels, the first two elements of f are NA. Because "d" and "e" were included in levels, they show up in the levels of f even though they did not occur in x.
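The other job of levels, fixing the order, can be seen in a small sketch. The sizes vector and its values are made up for illustration and are not part of the original question:

```r
# Hypothetical input: values and desired order are invented for this sketch.
sizes <- c("small", "large", "medium", "small")

# Without levels, R uses the sorted unique values of the input.
f_default <- factor(sizes)
levels(f_default)

# With levels, the order supplied is the order stored in the factor.
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_ordered)

# The integer codes give each element's position in levels.
as.integer(f_ordered)
```

The input is the same in both calls; only the order of the levels (and hence the integer codes) changes.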
Now with labels:
> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("C","D","E"))
> f
[1] <NA> <NA> C
Levels: C D E
After R figures out what should be in the factor, it re-codes the levels. One can of course use this to do brain-frying things such as:
> f <- factor(c("a","b","c"),levels=c("c","d","e"),labels=c("a","b","c"))
> f
[1] <NA> <NA> a
Levels: a b c
Another way to think about labels is that factor(x,levels=L1,labels=L2) is equivalent to
f <- factor(x,levels=L1)
levels(f) <- L2
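As a rough check of that equivalence, here is a sketch that builds the factor both ways and compares the results. The particular values of x, L1 and L2 are just the ones from the earlier example, and f1 and f2 are names introduced here:

```r
x  <- c("a", "b", "c")
L1 <- c("c", "d", "e")
L2 <- c("C", "D", "E")

# One step: levels and labels supplied together.
f1 <- factor(x, levels = L1, labels = L2)

# Two steps: code the input against L1, then rename the levels.
f2 <- factor(x, levels = L1)
levels(f2) <- L2

# Compare the level names and the underlying integer codes.
identical(levels(f1), levels(f2))
identical(as.integer(f1), as.integer(f2))

# Inspect which attributes the factor carries.
names(attributes(f1))
```

Looking at the attribute names is a quick way to confirm that the result stores levels but nothing called labels.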
I think an appropriately phrased version of this example might be nice for Pat Burns's R inferno -- there are plenty of factor puzzles in section 8.2, but not this particular one ...