I conducted a study that, in retrospect (one lives, one learns :-)), appears to generate multilevel data. Now I'm trying to restructure the dataset from wide to long so that I can analyse it using e.g. lme4.
In doing so, I encounter an, um, challenge that I've run into a few times before, but for which I've never found a good solution. I've searched again this time, but I'm probably using the wrong keywords, or this problem is much rarer than I thought.
Basically, in this dataset, the variable names indicate which measure each column holds. I asked participants to grade (rate) interventions (could be anything really). Each intervention is in one of 6 behavioral domains. In addition, participants rated each intervention either when it was presented on its own, or simultaneously with one other intervention, or with two other interventions. There were three types of interventions, and they were all rated both before (t0) and after (t1) I presented participants with some information.
So, in effect, I have a dataframe that can be regenerated like this:
### Elements of the variable names
measurementMomentsVector <- c("t0", "t1");
interventionTypesVector <- c("fear", "know", "scd");
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");
### Generate a vector with all variable names
variableNames <-
  apply(expand.grid(measurementMomentsVector,
                    interventionTypesVector,
                    nrOfInterventionsSimultaneouslyVector,
                    behaviorDomainsVector),
        1, paste0, collapse="_");
### Generate 5 'participants' worth of data
wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));
### Assign names
names(wideData) <- variableNames;
### Add unique id variable for every participant
wideData$id <- 1:5;
So using head(wideData)[, 1:5]
you can see roughly what the dataframe looks like:
  t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
1     -0.9338191      0.9747453      1.0069036      0.3500103  -0.844699708
2      0.8921867      1.3687834     -1.2005791      0.2747955   1.316768219
3      1.6200200      0.5245470     -1.2910586      1.3211912  -0.174795144
4      0.1543738      0.7535642      0.4726131     -0.3464789  -0.009190702
5     -1.3676692     -0.4491574     -2.0902003     -0.3484678  -2.537501824
Now, I want to convert this data to a long dataframe with 6 variables, for example 'id', 'measurementMoment', 'interventionType', 'nrOfInterventionsSimultaneously', 'behaviorDomain', and 'evaluation'. The first variable denotes the participant to whom a record belongs, the last variable is the score (rating, grade, evaluation) that participant gave a specific intervention, and the four variables in between indicate exactly which intervention is being rated.
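To make the target structure concrete, here is a small sketch of how a single column name would break down into those four identifying variables. It only uses base R's strsplit(); the example column name and the object name parts are chosen purely for illustration.

```r
### Split one wide column name into its four components
parts <- strsplit("t1_know_2_alc", "_", fixed = TRUE)[[1]]

### Label each component with the long-format variable it should end up in
names(parts) <- c("measurementMoment", "interventionType",
                  "nrOfInterventionsSimultaneously", "behaviorDomain")
print(parts)
```

Every record in the long dataframe would carry these four values, plus the participant's id and the rating itself.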
I can probably write some 'custom' code just for this problem, but I expect R 'has something for this'. I've been playing around with reshape2 and base R's reshape(), e.g.:
longData <- reshape(wideData, varying=1:(ncol(wideData)-1),
                    idvar="id",
                    sep="_", direction="long")
But it doesn't manage to guess the time-varying variables:
Error in guess(varying) :
failed to guess time-varying variables from their names
I have struggled with this a few times now, and I haven't managed to find any answers online. I really need to move on now, so I thought I'd try this as a last effort before resorting to writing something custom-made :-)
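In case it helps to see what I mean by 'custom-made', below is a minimal sketch of such a fallback in base R. It assumes every column name consists of exactly four parts separated by underscores; ratingColumns and nameParts are helper names made up for this sketch, and the example data is regenerated so that the block runs on its own.

```r
### Rebuild a small example dataset so the sketch runs on its own
variableNames <- apply(expand.grid(c("t0", "t1"),
                                   c("fear", "know", "scd"),
                                   c(1, 2, 3),
                                   c("diet", "pox", "alc", "smoking", "traff", "adh")),
                       1, paste0, collapse = "_")
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

### Stack all rating columns, keeping track of id and the source column name
ratingColumns <- setdiff(names(wideData), "id")
longData <- data.frame(
  id = rep(wideData$id, times = length(ratingColumns)),
  variable = rep(ratingColumns, each = nrow(wideData)),
  evaluation = unlist(wideData[ratingColumns], use.names = FALSE),
  stringsAsFactors = FALSE
)

### Split the column names on "_" into the four identifying variables
### (assumption: exactly four underscore-separated parts per name)
nameParts <- do.call(rbind, strsplit(longData$variable, "_", fixed = TRUE))
longData$measurementMoment <- nameParts[, 1]
longData$interventionType <- nameParts[, 2]
longData$nrOfInterventionsSimultaneously <- as.integer(nameParts[, 3])
longData$behaviorDomain <- nameParts[, 4]

### Drop the helper column now that its information has been split out
longData$variable <- NULL
```

This gives one row per participant per rating (5 x 108 = 540 rows here), but it hard-codes the naming scheme, which is why I'd prefer an existing function.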
I would greatly appreciate any pointers anybody can give!!!