Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
387 views
in Technique[技术] by (71.8m points)

r - Combining vectors of unequal length into a data frame

I have a list of vectors which are time series of inequal length. My ultimate goal is to plot the time series in a ggplot2 graph. I guess I am better off first merging the vectors in a dataframe (where the shorter vectors will be expanded with NAs), also because I want to export the data in a tabular format such as .csv to be perused by other people.

I have a list that contains the names of all the vectors. It is fine that the column titles be set by the first vector, which is the longest. E.g.:

> mylist
[[1]]
[1] "vector1"

[[2]]
[1] "vector2"

[[3]]
[1] "vector3"

etc.

I know the way to go is to use Hadley's plyr package but I guess the problem is that my list contains the names of the vectors, not the vectors themselves, so if I type:

do.call(rbind, mylist)

I get a one-column df containing the names of the dfs I wanted to merge.

> do.call(rbind, actives)
      [,1]           
 [1,] "vector1" 
 [2,] "vector2" 
 [3,] "vector3" 
 [4,] "vector4" 
 [5,] "vector5" 
 [6,] "vector6" 
 [7,] "vector7" 
 [8,] "vector8" 
 [9,] "vector9" 
[10,] "vector10"

etc.

Even if I create a list with the object themselves, I get an empty dataframe :

mylist <- list(vector1, vector2)
mylist
[[1]]
        1         2         3         4         5         6         7         8         9        10        11        12 
0.1875000 0.2954545 0.3295455 0.2840909 0.3011364 0.3863636 0.3863636 0.3295455 0.2954545 0.3295455 0.3238636 0.2443182 
       13        14        15        16        17        18        19        20        21        22        23        24 
0.2386364 0.2386364 0.3238636 0.2784091 0.3181818 0.3238636 0.3693182 0.3579545 0.2954545 0.3125000 0.3068182 0.3125000 
       25        26        27        28        29        30        31        32        33        34        35        36 
0.2727273 0.2897727 0.2897727 0.2727273 0.2840909 0.3352273 0.3181818 0.3181818 0.3409091 0.3465909 0.3238636 0.3125000 
       37        38        39        40        41        42        43        44        45        46        47        48 
0.3125000 0.3068182 0.2897727 0.2727273 0.2840909 0.3011364 0.3181818 0.2329545 0.3068182 0.2386364 0.2556818 0.2215909 
       49        50        51        52        53        54        55        56        57        58        59        60 
0.2784091 0.2784091 0.2613636 0.2329545 0.2443182 0.2727273 0.2784091 0.2727273 0.2556818 0.2500000 0.2159091 0.2329545 
       61 
0.2556818 

[[2]]
        1         2         3         4         5         6         7         8         9        10        11        12 
0.2824427 0.3664122 0.3053435 0.3091603 0.3435115 0.3244275 0.3320611 0.3129771 0.3091603 0.3129771 0.2519084 0.2557252 
       13        14        15        16        17        18        19        20        21        22        23        24 
0.2595420 0.2671756 0.2748092 0.2633588 0.2862595 0.3549618 0.2786260 0.2633588 0.2938931 0.2900763 0.2480916 0.2748092 
       25        26        27        28        29        30        31        32        33        34        35        36 
0.2786260 0.2862595 0.2862595 0.2709924 0.2748092 0.3396947 0.2977099 0.2977099 0.2824427 0.3053435 0.3129771 0.2977099 
       37        38        39        40        41        42        43        44        45        46        47        48 
0.3320611 0.3053435 0.2709924 0.2671756 0.2786260 0.3015267 0.2824427 0.2786260 0.2595420 0.2595420 0.2442748 0.2099237 
       49        50        51        52        53        54        55        56        57        58        59        60 
0.2022901 0.2251908 0.2099237 0.2213740 0.2213740 0.2480916 0.2366412 0.2251908 0.2442748 0.2022901 0.1793893 0.2022901 

but

do.call(rbind.fill, mylist)
data frame with 0 columns and 0 rows

I have tried converting the vectors to dataframes, but there is no cbind.fill function, so plyr complains that the dataframes are of different length.

So my questions are:

  • Is this the best approach? Keep in mind that the goals are a) a ggplot2 graph and b) a table with the time series, to be viewed outside of R

  • What is the best way to get a list of objects starting with a list of the names of those objects?

  • What the best type of graph to highlight the patterns of 60 timeseries? The scale is the same, but I predict there'll be a lot of overplotting. Since this is a cohort analysis, it might be useful to use color to highlight the different cohorts in terms of recency (as a continuous variable). But how to avoid overplotting? The differences will be minimal so faceting might leave the viewer unable to grasp the difference.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I think that you may be approaching this the wrong way:

If you have time series of unequal length then the absolute best thing to do is to keep them as time series and merge them. Most time series packages allow this. So you will end up with a multi-variate time series and each value will be properly associated with the same date.

So put your time series into zoo objects, merge them, then use my qplot.zoo function to plot them. That will deal with switching from zoo into a long data frame.

Here's an example:

> z1 <- zoo(1:8, 1:8)
> z2 <- zoo(2:8, 2:8)
> z3 <- zoo(4:8, 4:8)
> nm <- list("z1", "z2", "z3")
> z <- zoo()
> for(i in 1:length(nm)) z <- merge(z, get(nm[[i]]))
> names(z) <- unlist(nm)
> z
  z1 z2 z3
1  1 NA NA
2  2  2 NA
3  3  3 NA
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
> 
> x.df <- data.frame(dates=index(x), coredata(x))
> x.df <- melt(x.df, id="dates", variable="val")
> ggplot(na.omit(x.df), aes(x=dates, y=value, group=val, colour=val)) + geom_line() + opts(legend.position = "none")

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...