I hope this worked example helps.
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))
Name Gender
1 mark Male
2 joe Male
3 cathy Female
4 zoya Female
Subsetting a data frame (df) is done with
df[row, column]
For example, df[1:2, 1:2] returns the first two rows and the first two columns:
Name Gender
1 mark Male
2 joe Male
In your case, we are evaluating a condition on the data frame:
# both are valid
df[df$Gender == "Female", ] or df[df[, 2] == "Female", ]
which is nothing but indexing df as
df[c(3, 4), ] or df[c(FALSE, FALSE, TRUE, TRUE), ]
df$Gender == "Female"
[1] FALSE FALSE TRUE TRUE
df[c(3,4),]
which selects rows 3 and 4, and all columns.
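The equivalence described above can be checked directly. This is only a minimal sketch in base R; the names is_female, female_rows, by_logical, by_position and by_literal are made up for illustration.

```r
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

# Logical vector with one TRUE/FALSE per row of df
is_female <- df$Gender == "Female"

# Positions of the rows where the condition is TRUE (here: 3 and 4)
female_rows <- which(is_female)

# Three ways of asking for the same rows, keeping all columns
by_logical  <- df[is_female, ]
by_position <- df[female_rows, ]
by_literal  <- df[c(3, 4), ]

print(by_logical)
```

Whichever form you use, R only ever sees a vector of row indexes inside the brackets; the condition is just a convenient way to build that vector.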
So you are extracting the column as a vector, evaluating the condition on it, and passing the result as the row index. To extract a specific column from a data frame we use $; see help("$") and help("[").
One more useful resource: http://www.ats.ucla.edu/stat/r/modules/subsetting.htm
Coming back to your question: why preface the column with df when R should already know which data frame you are working with? I cannot give a better explanation than the one above: you need to extract the column so that the condition can be evaluated on it, and the rows where it evaluates to TRUE are then passed as the row index. Presumably this is because, inside the brackets of a data frame, columns are not treated as variables.
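To see why the prefix is needed, here is a small sketch of what happens without it, assuming no object called Gender exists in your workspace. The name result is hypothetical, and tryCatch() is used only so that the error does not stop the script.

```r
df <- data.frame(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

# The index expression is evaluated before `[` ever looks at df,
# so a bare `Gender` is searched for in the workspace, not in df's columns.
result <- tryCatch(
  df[Gender == "Female", ],
  error = function(e) conditionMessage(e)  # capture the error message as text
)

print(result)
```

If the lookup fails, result holds the error message rather than a data frame, which shows that the column has to be reached through df.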
But I have good news: there is a place where things work the way you expect, and columns are referred to as variables. It is data.table. Because columns are treated as variables, the syntax for indexing, joining and other data manipulation is easier to understand. It is an amazing package, and easy to master.
library(data.table)
DT <- data.table(df)
Name Gender
1: mark Male
2: joe Male
3: cathy Female
4: zoya Female
DT[Gender == "Female"]
Name Gender
1: cathy Female
2: zoya Female
Yes, you don't need to preface the column with the table name again; you just pass the column names. The best part is that data.table is generally faster and more memory-efficient than data.frame, especially on large data, and it is easy to use.
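The same idea carries over to the second argument of DT[...], where columns can also be used as plain variables. This is a rough sketch that assumes the data.table package is installed; female_names and n_female are illustrative names.

```r
library(data.table)

DT <- data.table(Name = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

# First argument filters rows, second argument picks what to return;
# both refer to columns by bare name.
female_names <- DT[Gender == "Female", Name]

# .N is data.table's built-in symbol for the number of rows in the subset
n_female <- DT[Gender == "Female", .N]

print(female_names)
print(n_female)
```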
I hope it helps.