Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
514 views
in Technique[技术] by (71.8m points)

r - Changing values when converting column type to numeric

I have a data file with the format from above.
I loaded it into R, and tried to plot a histogram with the values from the dist column and I have got the error "x must be numeric".Therefore I tried to change the format.

> head(data)

    V1        V2
1 type gene_dist
2    A     64667
3    A     76486
4    A     97416
5    A     30876
6    A     88018

> summary(data)
    V1            V2     
 A   : 67   100    :  1  
 B   :122   100906 :  1  
 type:  1   102349 :  1  
            1033   :  1  
            10544  :  1  
            10745  :  1  
            (Other):184  

I tried to set the format for the column using sapply but the values are changed:

> data[,2]<-sapply(data[,2],as.numeric)

> head(data)
    V1  V2
1 type 190
2    A 146
3    A 166
4    A 189

summary(data)
    V1            V2        
 A   : 67   Min.   :  1.00  
 B   :122   1st Qu.: 48.25  
 type:  1   Median : 95.50  
            Mean   : 95.50  
            3rd Qu.:142.75  
            Max.   :190.00 

Does anyone know why is this happening?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It looks like your second column is a factor. You need to use as.character before as.numeric. This is because factors are stored internally as integers with a table to give the factor level labels. Just using as.numeric will only give the internal integer codes. There is no need to use sapply since these functions are vectorized.

data[,2] <- as.numeric(as.character(data[,2]))

It is likely that the column is a factor because there are some non-numeric characters in some of the entries. Any such entries will be converted to NA with the appropriate warning, but you may want to investigate this in your raw data.

As a side note, data is a poor (though not invalid) choice for a variable name since there is a base function of the same name.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...