r - Subsetting from a Data Frame

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm early in the process of learning R. Say I have a data frame with a column named "Gender". If I want to retrieve all rows where Gender is "female" there are at least two ways I can do this:

FemaleSmokers <- df[df$Gender=="female", , drop = FALSE]
FemaleSmokers <- subset(df, Gender=="female")

1) Is there a best practice on when to use one over the other? 2) In the first approach, why do I need to preface the column with the name of the data frame when R should know which data frame I am working with?

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:31:41+0000

I hope this worked example helps.

df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))
df
   Name Gender
1  mark   Male
2   joe   Male
3 cathy Female
4  zoya Female

Subsetting a data frame (df) is done with
df[row, column]
For example, df[1:2, 1:2]
  Name Gender
1 mark   Male
2  joe   Male
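
One detail of df[row, column] that the question's first line relies on is the drop argument. A minimal sketch, recreating the example df so it runs on its own (stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version; names_vec and names_df are illustrative names only):

```r
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"),
                 stringsAsFactors = FALSE)

# Selecting a single column simplifies the result to a plain vector by default
names_vec <- df[1:2, "Name"]
class(names_vec)   # "character"

# drop = FALSE keeps the result as a data frame with one column
names_df <- df[1:2, "Name", drop = FALSE]
class(names_df)    # "data.frame"
```

When only rows are filtered and the data frame has several columns, drop = FALSE changes nothing, but it guards against surprises if the data frame ever has a single column.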

In your case, we are evaluating a condition on the data frame:
# both are valid
df[df$Gender == "Female", ]
df[df[, 2] == "Female", ]

which is nothing but indexing df by row position or by a logical vector:

df[c(3, 4), ]
df[c(FALSE, FALSE, TRUE, TRUE), ]

because the condition itself evaluates to a logical vector:

df$Gender == "Female"
[1] FALSE FALSE  TRUE  TRUE

df[c(3, 4), ] selects rows 3 and 4 and all columns. So you are extracting a column, evaluating the condition on it, and passing the result as the row index. To extract a specific column from a data frame we use $ on the data frame; see help("$") and help("[").
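
The equivalence between indexing by condition, by logical vector and by row position can be checked directly. A minimal sketch, recreating the example df so it runs on its own; is_female, rows, by_logical and by_position are illustrative names only:

```r
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"),
                 stringsAsFactors = FALSE)

# The condition is evaluated first and yields one TRUE/FALSE per row
is_female <- df$Gender == "Female"

# which() converts the logical vector into the positions of its TRUE values
rows <- which(is_female)

# Both index types select the same rows
by_logical  <- df[is_female, ]
by_position <- df[rows, ]
identical(by_logical, by_position)   # TRUE
```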

One more useful resource: http://www.ats.ucla.edu/stat/r/modules/subsetting.htm

Coming back to your second question, why you have to preface the column with df when R should already know which data frame you are working with: I can't give a better explanation than the one above. You need to extract the column so that the condition can be evaluated on it and the rows where it is TRUE can be passed as the index. Inside df[...], the condition is ordinary R code, so a bare Gender is looked up as a regular variable rather than as a column of df.
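
To make the scoping point concrete, and to touch on the first question, here is a minimal sketch in base R. It assumes no variable called Gender exists in your workspace; bare_attempt, df_na and the extra row "sam" are made up for illustration. The help page for subset() describes it as a convenience function intended for interactive use and recommends [ for programming, so treat the comparison below as a starting point rather than a rule.

```r
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"),
                 stringsAsFactors = FALSE)

# A bare column name inside `[` is looked up as an ordinary variable,
# so this fails with an "object not found" error (captured here as text)
bare_attempt <- tryCatch(df[Gender == "Female", ],
                         error = function(e) conditionMessage(e))

# subset() and with() evaluate the expression with df's columns in scope
via_subset <- subset(df, Gender == "Female")
via_with   <- df[with(df, Gender == "Female"), ]

# The two approaches differ when the column contains NA
df_na <- rbind(df, data.frame(Name = "sam", Gender = NA_character_,
                              stringsAsFactors = FALSE))
nrow(df_na[df_na$Gender == "Female", ])   # 3: the NA comparison adds an all-NA row
nrow(subset(df_na, Gender == "Female"))   # 2: subset() treats NA as FALSE
```

If you use [ on data that may contain NA, wrapping the condition in which() gives the same rows as subset().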

But I have good news: there is a place where things work the way you expect, where columns are referred to as variables. It is the data.table package. Because columns are treated as variables inside DT[...], the syntax for indexing, joining and other data manipulation is easy to follow. It is an amazing package and easy to master.

library(data.table)
DT <- data.table(df)
DT
    Name Gender
1:  mark   Male
2:   joe   Male
3: cathy Female
4:  zoya Female

DT[Gender == "Female"]
    Name Gender
1: cathy Female
2:  zoya Female

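Yes, you don't need to preface the column with the data frame name again; you just pass the column names. Best of all, data.table is generally faster and more memory-efficient than data.frame, particularly on large data, and I find it easier to use. I hope this helps.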
