Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
242 views
in Technique[技术] by (71.8m points)

Replace contents of factor column in R dataframe

I need to replace the levels of a factor column in a dataframe. Using the iris dataset as an example, how would I replace any cells which contain virginica with setosa in the Species column?

I expected the following to work, but it generates a warning message and simply inserts NAs:

iris$Species[iris$Species == 'virginica'] <- 'setosa'
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I bet the problem is when you are trying to replace values with a new one, one that is not currently part of the existing factor's levels:

levels(iris$Species)
# [1] "setosa"     "versicolor" "virginica" 

Your example was bad, this works:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L,  :
#   invalid factor level, NAs generated

It will work if you first increase your factor levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'

If you want to replace "species A" with "species B" you'd be better off with

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...