Here's an example:
#Create a data frame
> d<- data.frame(a=1:3, b=2:4)
> d
a b
1 1 2
2 2 3
3 3 4
#currently, there are no levels in the `a` column, since it's numeric as you point out.
> levels(d$a)
NULL
#Convert that column to a factor
> d$a <- factor(d$a)
> d
a b
1 1 2
2 2 3
3 3 4
#Now it has levels.
> levels(d$a)
[1] "1" "2" "3"
You can also handle this when reading in your data. See the colClasses
and stringsAsFactors
parameters in e.g. readCSV()
.
Note that, computationally, factoring such columns won't help you much, and may actually slow down your program (albeit negligibly). Using a factor will require that all values are mapped to IDs behind the scenes, so any print of your data.frame requires a lookup on those levels -- an extra step which takes time.
Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…