Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Push up and tighten a data frame: general solution

I want to push up (metaphorically) the data frame in order to get rid of the gaps (NA values).

My Data:

> dput(df1)
structure(list(ID = c("CN1-1", "CN1-1", "CN1-1", "CN1-10", "CN1-10", 
"CN1-10", "CN1-11", "CN1-11", "CN1-11", "CN1-12", "CN1-12", "CN1-12", 
"CN1-13", "CN1-13", "CN1-13"), v1 = c(0.37673, NA, NA, 1.019972, 
NA, NA, 0.515152, NA, NA, 0.375139, NA, NA, 0.508125, NA, NA), 
    v2 = c(NA, 0.732, NA, NA, 0, NA, NA, 0.748, NA, NA, 0.466, 
    NA, NA, 0.57, NA), v2 = c(NA, NA, 0.357, NA, NA, 0.816, NA, 
    NA, 0.519, NA, NA, 0.206, NA, NA, 0.464)), .Names = c("ID", 
"v1", "v2", "v2"), row.names = c(NA, 15L), class = "data.frame")
> 

Looks like:

       ID       v1    v2    v2
1   CN1-1 0.376730    NA    NA
2   CN1-1       NA 0.732    NA
3   CN1-1       NA    NA 0.357
4  CN1-10 1.019972    NA    NA
5  CN1-10       NA 0.000    NA
6  CN1-10       NA    NA 0.816
7  CN1-11 0.515152    NA    NA
8  CN1-11       NA 0.748    NA
9  CN1-11       NA    NA 0.519
10 CN1-12 0.375139    NA    NA
11 CN1-12       NA 0.466    NA
12 CN1-12       NA    NA 0.206
13 CN1-13 0.508125    NA    NA
14 CN1-13       NA 0.570    NA
15 CN1-13       NA    NA 0.464

Please note: I'm not sure whether the pattern is consistent across all rows. It is also possible that one or more variables appear two or more times per ID group.

Desired output:

       ID       v1    v2    v2
1   CN1-1 0.376730 0.732 0.357
2  CN1-10 1.019972 0.000 0.816

...

My idea was to melt, drop all NA values, and then dcast. Is there a better approach?
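For reference, a minimal base R sketch of that melt / drop NA / cast idea. It assumes the value columns have unique names (v1, v2, v3 here, unlike the duplicated v2 in df1) and that each variable occurs at most once per ID; repeated values would need an additional within-group counter. The names `long` and `wide` are only illustrative.

```r
# Small sample in the same shape as df1, with unique column names (assumption)
df <- data.frame(
  ID = c("CN1-1", "CN1-1", "CN1-1", "CN1-10", "CN1-10", "CN1-10"),
  v1 = c(0.37673, NA, NA, 1.019972, NA, NA),
  v2 = c(NA, 0.732, NA, NA, 0, NA),
  v3 = c(NA, NA, 0.357, NA, NA, 0.816),
  stringsAsFactors = FALSE
)

value_cols <- names(df)[-1]

# "Melt": one row per (ID, variable, value), stacked column by column
long <- data.frame(
  ID = rep(df$ID, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(df)),
  value = unlist(df[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the gaps
long <- long[!is.na(long$value), ]

# "Cast": back to one row per ID, one column per variable
wide <- reshape(long, idvar = "ID", timevar = "variable", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.v1 -> v1
rownames(wide) <- NULL
wide
```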

EDIT:

A duplicated case could look like this:

16 CN1-x 0.508125    NA    NA
17 CN1-x       NA 0.570    NA
18 CN1-x       NA    NA 0.464
19 CN1-x       NA    NA 0.134

1 Reply

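# Split by ID, collapse each value column to one number per group, then re-stack.
# Note: sum(na.rm = TRUE) adds repeated values together and returns 0 for an all-NA column.
do.call(rbind,
        lapply(split(df1, df1$ID), function(a)
            data.frame(ID = a$ID[1], lapply(a[-1], sum, na.rm = TRUE))))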
#           ID       v1    v2  v2.1
#CN1-1   CN1-1 0.376730 0.732 0.357
#CN1-10 CN1-10 1.019972 0.000 0.816
#CN1-11 CN1-11 0.515152 0.748 0.519
#CN1-12 CN1-12 0.375139 0.466 0.206
#CN1-13 CN1-13 0.508125 0.570 0.464
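
If a variable can occur more than once per ID (as in the CN1-x rows of the question), summing would merge the repeated values into one number. The following is only a sketch of one possible alternative, not part of the answer above: a hypothetical helper `push_up()` that keeps every non-NA value, stacks them at the top of each group, and pads shorter columns with NA. It assumes the ID column is named ID and comes first, and the sample uses unique column names (v1, v2, v3).

```r
# Hypothetical helper: compact each ID group by dropping NAs column-wise
push_up <- function(df) {
  pieces <- lapply(split(df, df$ID), function(g) {
    # Non-NA values of every value column in this group
    vals <- lapply(g[-1], function(x) x[!is.na(x)])
    # Rows needed: the longest column, but at least one row per ID
    n <- max(1L, lengths(vals))
    # Indexing past the end of a vector pads it with NA
    padded <- lapply(vals, function(x) x[seq_len(n)])
    data.frame(ID = g$ID[1], padded, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Sample with a repeated v3 value for CN1-x
df <- data.frame(
  ID = c("CN1-1", "CN1-1", "CN1-1", "CN1-x", "CN1-x", "CN1-x", "CN1-x"),
  v1 = c(0.37673, NA, NA, 0.508125, NA, NA, NA),
  v2 = c(NA, 0.732, NA, NA, 0.570, NA, NA),
  v3 = c(NA, NA, 0.357, NA, NA, 0.464, 0.134),
  stringsAsFactors = FALSE
)

res <- push_up(df)
res
```

With this sample, CN1-1 collapses to a single row, while CN1-x keeps two rows: the second one holds only the extra v3 value, with NA in v1 and v2.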
