I have to build a multiple regression model, but I'm facing some problems and cannot figure out what went wrong. Initially, I have a data frame like this:
> str(house)
'data.frame': 77955 obs. of 10 variables:
$ Type : chr "Apartment" "Apartment" "Apartment" "Apartment" ...
$ Postal.District: int 1 1 3 5 3 3 5 5 5 5 ...
$ Market.Segment : chr "CCR" "CCR" "RCR" "OCR" ...
$ Tenure : num 2 2 2 2 2 2 2 2 2 2 ...
$ Type.of.Sale : chr "Resale" "Resale" "New Sale" "New Sale" ...
$ No..of.Units : int 1 1 1 1 1 1 1 1 1 1 ...
$ Price.... : int 3548000 3490000 1987000 1745000 1227000 1702000 1899000 704380 1129960 1145540 ...
$ Area..Sqft. : int 1518 1518 1055 1044 635 700 1249 441 764 764 ...
$ Type.of.Area : chr "Strata" "Strata" "Strata" "Strata" ...
$ Floor.Level : chr "46 to 50" "46 to 50" "26 to 30" "06 to 10" ...
I converted all character columns to factors, and then converted those factors to numeric, since numeric values are necessary to compute correlations.
#Convert to factor
house$Floor.Level <- factor(house$Floor.Level)
house$Type <- factor(house$Type)
house$Market.Segment <- factor(house$Market.Segment)
house$Type.of.Sale <- factor(house$Type.of.Sale)
house$Type.of.Area <- factor(house$Type.of.Area)
house$Tenure<-factor(house$Tenure)
house$Area..Sqft.<-factor(house$Area..Sqft.)
#Convert to numeric
house$Floor.Level <- as.numeric(house$Floor.Level)
house$Type <- as.numeric(house$Type)
house$Market.Segment <- as.numeric(house$Market.Segment)
house$Type.of.Sale <- as.numeric(house$Type.of.Sale)
house$Type.of.Area <- as.numeric(house$Type.of.Area)
house$Tenure<-as.numeric(house$Tenure)
house$Area..Sqft.<-as.numeric(house$Area..Sqft.)
house$Postal.District<- as.numeric(house$Postal.District)
house$No..of.Units<- as.numeric(house$No..of.Units)
house$Price....<- as.numeric(house$Price....)
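As a side note on what this conversion does, here is a minimal sketch on toy values (the numbers are made up for illustration, not taken from the real data). Calling as.numeric() on a factor returns the integer level codes, not the original values, which matters for a column such as Area..Sqft. that was numeric to begin with.

```r
# Toy values (hypothetical, for illustration only)
area <- c(1518, 1055, 635, 1055)

# as.numeric() on a factor yields the level codes (position among the sorted unique values)
codes <- as.numeric(factor(area))
print(codes)

# Going through the labels recovers the original numbers
recovered <- as.numeric(as.character(factor(area)))
print(recovered)
```

For character columns such as Market.Segment, the codes follow the alphabetical order of the levels, so the resulting numbers carry no inherent magnitude.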
I then split the data into train and test sets and use stepAIC to find the model:
#Split data into Train and Test dataset
#Shuffle all 77955 rows, then select 80% for train dataset
data <- house[sample(nrow(house), 77955), ]
split <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
train <- data[split, ]
test <- data[-split, ]
library(MASS) #provides stepAIC()
fit <- lm(No..of.Units ~ ., data=train)
step <- stepAIC(fit, direction="both", trace = TRUE)
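For reference, a minimal sketch of how the object returned by stepAIC() can be inspected, assuming the MASS package is installed. It uses the built-in mtcars data as a stand-in for train, so the column names below are not from the housing data.

```r
library(MASS)  # stepAIC() lives in MASS

# mtcars is a stand-in for the train data (assumption for illustration)
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
step_fit <- stepAIC(full_fit, direction = "both", trace = FALSE)

# stepAIC() returns the selected model as a fitted lm object,
# so its formula and summary statistics can be read off directly
selected_formula <- formula(step_fit)
adj_r2_selected <- summary(step_fit)$adj.r.squared
print(selected_formula)
print(adj_r2_selected)
```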
This is the part that gives me an error when I run it. I use $adj.r.squared to retrieve the adjusted R-squared value:
#train data
summary(No..of.Units ~ Type + Postal.District + Market.Segment + Tenure +
Type.of.Sale + Price.... + Area..Sqft. + Type.of.Area + Floor.Level,
data = train)$adj.r.squared
#test data
summary(No..of.Units ~ Type + Postal.District + Market.Segment + Tenure +
Type.of.Sale + Price.... + Area..Sqft. + Type.of.Area + Floor.Level,
data = test)$adj.r.squared
I want to find out how all the factors affect the number of units sold, which is why I used it as the dependent variable. However, it returns this error:
Error in summary(No..of.Units ~ Type + Postal.District + Market.Segment + :
$ operator is invalid for atomic vectors
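A minimal sketch of what seems to trigger this message, assuming no attached package supplies its own summary() method for formulas. The built-in mtcars data stands in for train, so the variable names are illustrative only. The point is that summary() is applied to a bare formula rather than to a fitted lm object.

```r
# mtcars stands in for train (assumption for illustration)
f <- mpg ~ wt + hp

# summary() of a bare formula falls through to the default method,
# which returns an atomic character vector, so `$` cannot be used on it
formula_summary <- summary(f)
print(is.atomic(formula_summary))

# Fitting the model first gives a summary.lm object, which has adj.r.squared
model <- lm(f, data = mtcars)
adj_r2 <- summary(model)$adj.r.squared
print(adj_r2)
```

Applied to the code above, this would mean wrapping each formula in lm(..., data = train) or lm(..., data = test) before calling summary().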
Sorry if there is too much or too little code here. I'm quite new to R, so I have no idea which part of my code produces the error. Do let me know if there is anything else I should add, and thank you for your help.