Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Tidymodels Tuning Recipe Parameters

With tidymodels, I really like being able to tune not only model parameters but also recipe steps, for example the number of components in step_pls(). My problem is limiting the range of values that gets searched. For example, with step_umap() I would like to restrict the search space to 2:5 components. When I replace step_pls() with step_umap() in the code below, the session crashes because it tries to build a UMAP embedding with around 50 components. So my question is: when using grid_random() or grid_max_entropy(), how can I limit the search range for a specific tuning parameter?

Note: I also tried something like param_grid %>% grid_random(size = 5, num_comp() %>% range_set(c(3, 5))), but the range seems to be ignored.

Thanks

# Load Packages -----------------------------------------------------------
library(tidyverse)
library(lubridate)
library(tidymodels)
library(rsample)
library(themis)
library(recipes)
library(embed)
# Load Data ---------------------------------------------------------------
data<-read_csv("....data.csv")

# Modelling - Data Partition ----------------------------------------------
split_prop <- 0.80
init_split <- initial_time_split(data, prop = split_prop)

set_train<-training(init_split)
set_test<-testing(init_split)

# Modelling - Resamples ---------------------------------------------------
valid_folds <- rsample::vfold_cv(set_train,v=5)

# Modelling - Data Transf -------------------------------------------------
recip_train <- recipe(label ~ .,
                      data = set_train)%>%
  step_normalize(all_predictors())%>%
  step_pls(all_predictors(),outcome = "label",num_comp = tune())

# Modelling - Model Specs ---------------------------------------------------
model_glm <- linear_reg()%>%
  set_args(penalty=tune(),
           mixture=tune())%>%
  set_mode("regression") %>%
  set_engine("glmnet")

# Workflow ------------------------------------------------------------------
wflw <- workflow() %>%
    add_recipe(recip_train) %>%
    add_model(model_glm)

# Modelling - Tuning Control -------------------------------------------------
ctr_tune <- control_grid(
  verbose = TRUE,
  allow_par = TRUE,
  extract = NULL,
  save_pred = TRUE,
  pkgs = NULL
  )

param_grid<-wflw %>%
    parameters()%>% 
    finalize(set_train)%>%  
    grid_max_entropy(size = 5)

# Modelling - Tuning ---------------------------------------------------------
tuning <- tune_grid(object = wflw,
                    resamples =  valid_folds,
                    grid = param_grid,
                    control = ctr_tune,
                    metrics = metric_set(rmse))

1 Reply


If you have a specific range for num_comp in mind, I wouldn't bother extracting the parameters from the workflow. I would build the tuning grid directly from the parameter objects:

library(dials)
#> Loading required package: scales
grid_max_entropy(penalty(),
                 mixture(),
                 num_comp(range = c(2, 5)),
                 size = 5)
#> # A tibble: 5 x 3
#>         penalty mixture num_comp
#>           <dbl>   <dbl>    <int>
#> 1 0.00161        0.721         5
#> 2 0.751          0.376         4
#> 3 0.00000000974  0.395         3
#> 4 0.000107       0.0747        4
#> 5 0.0000000451   0.906         3

Created on 2020-07-19 by the reprex package (v0.3.0)
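
If you would rather keep a parameter-set based setup like the one in the question, another option is to narrow a single parameter with update() before building the grid. The following is only a minimal sketch: the parameter set is built by hand here as a stand-in for what wflw %>% parameters() would return (an assumption, so that the example runs without the question's data), and grid_random() is used in place of grid_max_entropy().

```r
library(dials)

# Hand-built stand-in for the workflow's parameter set (assumption: the ids
# match the tune() placeholders used in the question's recipe and model).
param_set <- parameters(
  list(penalty = penalty(), mixture = mixture(), num_comp = num_comp())
)

# Narrow only num_comp; penalty and mixture keep their default ranges.
param_set <- update(param_set, num_comp = num_comp(range = c(2L, 5L)))

set.seed(123)
param_grid <- grid_random(param_set, size = 5)

# Every sampled num_comp value should fall inside the requested 2..5 range.
range(param_grid$num_comp)
```

Either way, the resulting tibble can be passed to tune_grid() through its grid argument, as in the question's code.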

