I've seen this question come up a lot but have yet to find a satisfactory solution, particularly for my case.
I am running partial least squares regression in R using the pls package, and would then like to calculate the root mean square error of prediction with RMSEP() on new data using the fitted model. This throws an error, and I believe it is specifically because I am writing the model formula as follows:
plsr( Y ~ X[whatever , whatever ] ...
where I need to index specific parts of dataframe$X. Here is an example:
library(pls)
library(caTools)  # provides sample.split()
data(gasoline)
#Split dataframe between training and testing data
set.seed(123)
split <- sample.split(gasoline$octane, SplitRatio = 0.70)
gasoline$train <- split
gas.fit <- plsr(octane ~ NIR[ ,1:10] + NIR[ ,20:30],
                ncomp = 10,
                data = gasoline[gasoline$train ,],
                validation = "LOO",
                scale = FALSE,
                center = TRUE,
                method = "simpls"
)
#I can use RMSEP() on the fitted model
RMSEP(gas.fit)
#I can use the fitted model to predict octane of my test set
predict(gas.fit, newdata = gasoline[!gasoline$train ,])
#But I cannot get the RMSEP of the test predictions
RMSEP(gas.fit, estimate = "test", newdata = gasoline[!gasoline$train ,])
This last command throws the error:
Error in eval(predvars, data, env) : object 'NIR' not found
What I know:
I know the object 'NIR' should be present, since I've opted to combine train and test data into a single dataframe.
RMSEP() works fine on models of the form plsr(Y ~ X[whatever, whatever]) as long as newdata is not supplied.
predict() works fine in both cases.
What I've tried:
Mevik & Wehrens (2007) insist we use the format
plsr( octane ~ NIR,
...
data = gasoline
...)
and not
plsr( gasoline$octane ~ gasoline$NIR,
which is more akin to what I am doing in my example, but not exactly the same. Even so, I've tried the following adjustment:
gas.fit <- plsr(octane ~ NIR,
                ncomp = 10,
                data = c(
                  gasoline[gasoline$train ,]$NIR[ , 1:10], gasoline[gasoline$train ,]$NIR[ ,20:30]
                ),
                validation = "LOO",
                scale = FALSE,
                center = TRUE,
                method = "simpls"
)
But this does not work either (the error is 'envir' not of length one). It also means I would have to add gasoline$octane to data as well, which further violates the length requirement.
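For comparison, here is a minimal sketch of one way the column selection might be moved out of the formula and into the data, so that the formula stays in the plain octane ~ NIR form. It is only a possible workaround, not a confirmed fix. The names keep, gas.sub and is.train are hypothetical, the split uses base sample() instead of sample.split(), and scale and center are left at their defaults.

```r
library(pls)

data(gasoline)

# Hypothetical column selection, mirroring NIR[, 1:10] + NIR[, 20:30]
keep <- c(1:10, 20:30)

# Build a data frame whose NIR column already holds only the selected columns.
# I() keeps the matrix as a single column, the same layout gasoline itself uses.
gas.sub <- data.frame(octane = gasoline$octane,
                      NIR = I(gasoline$NIR[, keep]))

# Roughly 70/30 train/test split as a logical vector (assumption: base sample())
set.seed(123)
n <- nrow(gas.sub)
is.train <- seq_len(n) %in% sample(n, size = round(0.7 * n))

# The formula no longer contains any indexing
gas.fit <- plsr(octane ~ NIR,
                ncomp = 10,
                data = gas.sub[is.train, ],
                validation = "LOO",
                method = "simpls")

# Test-set RMSEP, using the same data frame layout for newdata
gas.rmsep <- RMSEP(gas.fit, estimate = "test", newdata = gas.sub[!is.train, ])
print(gas.rmsep)
```

Because the training and test rows come from the same gas.sub data frame, both contain a column literally named NIR with the same 21 columns, which is what the formula refers to.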
I'd really like to find a solution for this approach, as my end goal is to put the plsr() call in a for() loop of the form:
gas.fit <- plsr(octane ~ NIR[ ,i:(i+20)],
as part of a Moving Window PLSR algorithm.
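To make the intended loop concrete, the following is a rough sketch of what a moving-window version could look like if each window is cut out of NIR before the model is fitted, rather than inside the formula. The window width of 20 comes from i:(i+20) above. The step of 50 between window starts, the choice of 5 components, and the names width, starts, n.comp, win and test.rmsep are all assumptions made for illustration.

```r
library(pls)

data(gasoline)

# Train/test split as a logical vector (assumption: base sample(), ~70% train)
set.seed(123)
n <- nrow(gasoline)
is.train <- seq_len(n) %in% sample(n, size = round(0.7 * n))

width  <- 20   # window covers columns i:(i + width)
n.comp <- 5    # hypothetical number of components per window
starts <- seq(1, ncol(gasoline$NIR) - width, by = 50)  # hypothetical step

test.rmsep <- numeric(length(starts))
for (k in seq_along(starts)) {
  i <- starts[k]

  # Cut the window out of NIR first, so the formula can stay octane ~ NIR
  win <- data.frame(octane = gasoline$octane,
                    NIR = I(gasoline$NIR[, i:(i + width)]))

  fit <- plsr(octane ~ NIR, ncomp = n.comp,
              data = win[is.train, ], method = "simpls")

  res <- RMSEP(fit, estimate = "test", newdata = win[!is.train, ])

  # val is indexed by estimate x response x model; the first model is the
  # intercept-only one, so n.comp components sit at position n.comp + 1
  test.rmsep[k] <- res$val[1, 1, n.comp + 1]
}

print(data.frame(start = starts, rmsep = test.rmsep))
```

Rebuilding the small data frame inside the loop costs little for a data set of this size, and it keeps the training and test rows of every window aligned.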
question from:
https://stackoverflow.com/questions/65836553/error-in-evalpredvars-data-env-object-not-found-in-r-pls