Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

bigdata - R {ff}: How to add a new column that depends on other elements in the same row of an ffdf object?

I have an ffdf object (23M x 4) and a named character vector, specimen_type, whose values are "TUMOR" or "NORMAL". Each value is named by a unique icgc_specimen_id, so the vector tells me whether a given specimen is a normal cell or a tumor cell.

> head(expresion,4)
ffdf (all open) dim=c(23939146,4), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
                               PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix PhysicalIsMatrix PhysicalElementNo
icgc_donor_id                 icgc_donor_id      integer       integer FALSE           FALSE            FALSE                 1
icgc_specimen_id           icgc_specimen_id      integer       integer FALSE           FALSE            FALSE                 2
gene_id                             gene_id      integer       integer FALSE           FALSE            FALSE                 3
normalized_read_count normalized_read_count       double        double FALSE           FALSE            FALSE                 4
                      PhysicalFirstCol PhysicalLastCol PhysicalIsOpen
icgc_donor_id                        1               1           TRUE
icgc_specimen_id                     1               1           TRUE
gene_id                              1               1           TRUE
normalized_read_count                1               1           TRUE
ffdf data
         icgc_donor_id icgc_specimen_id      gene_id normalized_read_count
1         DO3868           SP8217       SERINC1               9.276133e-05
2         DO3868           SP8217       SERINC2               1.925742e-04
3         DO3868           SP8217       SERINC3               2.531452e-05
4         DO3868           SP8217       SERINC4               4.811070e-07
5         DO3868           SP8217       SERINC5               4.402422e-07
6         DO3868           SP8217       SERP1                 7.620133e-05
7         DO3868           SP8217       SNX13                 1.088022e-05
8         DO3868           SP8217       SNX10                 5.652351e-06
:                    :                :            :                     :
23939139  DO2341           SP5052       FCRLB                 8.290500e-07
23939140  DO2341           SP5052       FDFT1                 7.108729e-05
23939141  DO2341           SP5052       FDPSL2A               7.999602e-08
23939142  DO2341           SP5052       GRIPAP1               6.532955e-05
23939143  DO2341           SP5052       GRINL1A               1.156511e-05
23939144  DO2341           SP5052       GRIP1                 2.465546e-06
23939145  DO2341           SP5052       GRIP2                 1.486814e-06
23939146  DO2341           SP5052       GRK1                  1.678295e-08
> head(specimen_type)
SP3358  SP6685 SP12716  SP8109 SP12780  SP8097 
"TUMOR" "TUMOR" "TUMOR" "TUMOR" "TUMOR" "TUMOR" 

I want to add a column called sp_type to the ffdf, so that each row shows whether it comes from a tumor or a normal cell.

In a normal data frame I would do:

expresion$sp_type <- specimen_type[expresion$icgc_specimen_id]

I can't find a way to do the same in an ffdf object.
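
For reference, the in-memory lookup described above can be sketched on toy data. The object expresion_df and its values are made up for illustration. One assumption worth flagging: if icgc_specimen_id is stored as a factor, indexing a named vector with it directly uses the factor's integer codes rather than the names, so the sketch converts to character first.

```r
# Toy stand-ins for the real objects (values are made up for illustration).
specimen_type <- c(SP8217 = "TUMOR", SP5052 = "NORMAL")
expresion_df <- data.frame(
  icgc_specimen_id = factor(c("SP8217", "SP8217", "SP5052")),
  normalized_read_count = c(9.3e-05, 1.9e-04, 8.3e-07)
)

# Index by name: convert the factor to character first, otherwise R
# indexes by the factor's integer codes instead of the specimen names.
ids <- as.character(expresion_df$icgc_specimen_id)
expresion_df$sp_type <- unname(specimen_type[ids])
```

The same name-based lookup is what needs to happen on the ffdf, only without pulling all 23M rows into memory at once.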


1 Reply


I would write something like this:

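library(ETLUtils)  # provides recoder()
library(ffbase)    # provides with() for ffdf objects

# with() on an ffdf evaluates the expression chunk by chunk and returns
# an ff object; recoder() replaces each value found in `from` with the
# corresponding value in `to`.
expresion$sp_type <- with(
  expresion[c('icgc_specimen_id')],
  recoder(as.character(icgc_specimen_id),
          from = names(specimen_type),
          to = specimen_type)
)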
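
To make the idea concrete, here is a minimal base-R sketch of the per-chunk recoding that the snippet above relies on. It is not the ffbase or ETLUtils implementation: a plain factor stands in for the ffdf column, match() stands in for the from/to lookup, and chunk_size is a hypothetical value chosen only to keep the toy example small.

```r
# Toy stand-ins; a plain factor plays the role of the ffdf column here.
specimen_type <- c(SP8217 = "TUMOR", SP5052 = "NORMAL")
ids <- factor(c("SP8217", "SP8217", "SP5052", "SP5052", "SP8217"))

chunk_size <- 2  # hypothetical; real chunks would hold many more rows
starts <- seq(1, length(ids), by = chunk_size)

# Preallocate the result as a factor with fixed levels, so every chunk
# uses the same coding even if it contains only one specimen type.
sp_type <- factor(rep(NA_character_, length(ids)),
                  levels = c("NORMAL", "TUMOR"))

for (s in starts) {
  rows <- s:min(s + chunk_size - 1, length(ids))
  # from -> to lookup: locate each id among the names, take that value
  pos <- match(as.character(ids[rows]), names(specimen_type))
  sp_type[rows] <- specimen_type[pos]
}
```

Only one chunk of ids is converted to character at a time, which is the point of working chunk-wise on data that does not fit in RAM.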
