Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - error in running factor() on a column of a data frame

I have a data frame with several columns. I want to run the factor() function on one of them, a column named my_col. Initially I did it this way:

df[,"my_col"]<-factor((df[,"my_col"]))

It gave the following error

Error: 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

A similar question on Stack Overflow helped me solve the problem.

However, if I use the following code instead of the first method, it works without any error:

df$"my_col"<-factor(df$"my_col")

Why's that? Is there a difference between accessing a column via df$vec_name and df[,vec_name]?

Update:

str(df)
Classes 'tbl_df', 'tbl' and 'data.frame':   160 obs. of  8 variables:
$ area     : int  1 1 1 1 1 1 1 1 1 1 ...
$ temp     : int  1 1 1 1 1 1 1 1 1 1 ...
$ size     : int  1 1 1 1 1 1 1 1 1 1 ...
$ storage  : int  1 1 1 1 1 2 2 2 2 2 ...
$ my_col   : int  1 2 3 4 5 1 2 3 4 5 ...
$ texture  : num  2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
$ flavor   : num  3.2 2.5 2.8 2.9 2.8 3 3.1 3 3.2 2.8 ...
$ moistness: num  3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...

1 Reply


Your data is a tbl_df. I don't have your data, but we can look at an example using mtcars.

library(dplyr)

tbl_df(mtcars)[, "mpg"]
# Source: local data frame [32 x 1]
# 
#      mpg
#    (dbl)
# 1   21.0
# 2   21.0
# 3   22.8
# 4   21.4
# 5   18.7
# 6   18.1
# 7   14.3
# 8   24.4
# 9   22.8
# 10  19.2
# ..   ...

The result is still a data frame, whereas base R would have dropped it to an atomic vector. dplyr:::`[.tbl_df` does not drop single-column selections the way [.data.frame from base R does. A one-column data frame is a list rather than an atomic vector, which is why we can't run factor() on it.

factor(tbl_df(mtcars)[, "mpg"])
# Error in sort.list(y) : 'x' must be atomic for 'sort.list'
# Have you called 'sort' on a list?
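
To see the difference side by side, here is a minimal sketch comparing the two `[` methods on the same data. It assumes the tibble package is installed and uses as_tibble() rather than tbl_df(), since newer dplyr releases deprecate tbl_df(); the variable names are only illustrative.

```r
library(tibble)  # assumption: tibble is installed; it provides as_tibble()

base_df <- mtcars             # plain base R data frame
tbl     <- as_tibble(mtcars)  # same data as a tbl_df

# Base data frames drop a single selected column to an atomic vector
base_col <- base_df[, "mpg"]
is.atomic(base_col)       # TRUE
is.data.frame(base_col)   # FALSE

# Tibbles keep the selection as a one-column data frame (a list underneath)
tbl_col <- tbl[, "mpg"]
is.data.frame(tbl_col)    # TRUE
is.list(tbl_col)          # TRUE

# Either behaviour can be requested explicitly with the drop argument
is.data.frame(base_df[, "mpg", drop = FALSE])  # TRUE
is.atomic(tbl[, "mpg", drop = TRUE])           # TRUE
```

Because factor() ends up calling sort.list() on its input, only the atomic results above are safe to pass to it.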

So you'll need to use [[, as in df[["my_col"]], or just use $.

df[["my_col"]] <- factor(df[["my_col"]])

Note: When you use the $ operator you can do it without the quotes around the column name.

df$my_col <- factor(df$my_col)
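
As a quick check, the conversion can be tried on a small stand-in for the data. This is only a sketch: the tibble below is hypothetical, built from the first ten values shown in the str() output of the question, and it assumes the tibble package is installed.

```r
library(tibble)  # assumption: tibble is installed

# Hypothetical stand-in for the asker's data (first ten rows of two columns)
df <- tibble(
  my_col  = rep(1:5, times = 2),
  texture = c(2.9, 2.3, 2.5, 2.1, 1.9, 1.8, 2.6, 3, 2.2, 2)
)

# [[ extracts the column as an atomic vector, so factor() accepts it
df[["my_col"]] <- factor(df[["my_col"]])

is.factor(df$my_col)   # TRUE
levels(df$my_col)      # "1" "2" "3" "4" "5"
```

The same assignment written with $ gives an identical result; the only requirement is that factor() receives the column itself and not a one-column data frame.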
