Inspired by a comment from @gsk3 on a question about reshaping data, I started doing a little bit of experimentation with reshaping data where the variable names have character suffixes instead of numeric suffixes.
As an example, I'll load the dadmomw
dataset from one of the UCLA ATS Stata learning webpages (see "Example 4" on the webpage).
Here's what the dataset looks like:
library(foreign)
dadmom <- read.dta("https://stats.idre.ucla.edu/stat/stata/modules/dadmomw.dat")
dadmom
# famid named incd namem incm
# 1 1 Bill 30000 Bess 15000
# 2 2 Art 22000 Amy 18000
# 3 3 Paul 25000 Pat 50000
When trying to reshape from this wide format to long, I run into a problem. Here's what I do to reshape the data.
reshape(dadmom, direction="long", idvar=1, varying=2:5,
sep="", v.names=c("name", "inc"), timevar="dadmom",
times=c("d", "m"))
# famid dadmom name inc
# 1.d 1 d 30000 Bill
# 2.d 2 d 22000 Art
# 3.d 3 d 25000 Paul
# 1.m 1 m 15000 Bess
# 2.m 2 m 18000 Amy
# 3.m 3 m 50000 Pat
Note the swapped column names for "name" and "inc"; changing v.names
to c("inc", "name")
doesn't solve the problem.
reshape
seems very picky about wanting the columns to be named in a fairly standard way. For example, I can reshape the data correctly (and easily) if I first rename the columns:
dadmom2 <- dadmom # Just so we can continue experimenting with the original data
# Change the names of the last four variables to include a "."
names(dadmom2)[2:5] <- gsub("(d$|m$)", "\.\1", names(dadmom2)[2:5])
reshape(dadmom2, direction="long", idvar=1, varying=2:5,
timevar="dadmom")
# famid dadmom name inc
# 1.d 1 d Bill 30000
# 2.d 2 d Art 22000
# 3.d 3 d Paul 25000
# 1.m 1 m Bess 15000
# 2.m 2 m Amy 18000
# 3.m 3 m Pat 50000
My questions are:
- Why is R swapping the columns in the example I've provided?
- Can I get to this result with base R
reshape
without changing the variable names before reshaping?
- Are there other approaches that could be considered instead of
reshape
?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…