I am working with R. I am following this tutorial (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) and am learning how to optimize functions using the "genetic algorithm".
The entire process is illustrated in the code below:
Part 1: Generate some sample data ("train_data")
Part 2: Define the "fitness function": the objective of my problem is to generate 7 random numbers
[1] (between 80 and 100)
[2] (between 85 and 120)
[3] (between 80 and 100)
[4] (between 90 and 120)
[5] (between 90 and 140)
[6] (between 180 and 400)
[7] (between 365 and 720)
and use these numbers to perform a series of data manipulation procedures on the train data. At the end of these data manipulation procedures, a "total" mean variable is calculated.
Part 3: The purpose of the "genetic algorithm" is to find the set of these 7 numbers that produce the largest value of the "total".
Below, I illustrate this entire process :
Part 1
#load libraries
library(dplyr)
library(GA)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
Part 2
#define fitness function
fitness <- function(x) {
    x1 = x[1]
    x2 = x1 + x[2]
    x3 = x[3]
    x4 = x3 + x[4]

    #bin data according to random criteria
    train_data <- train_data %>%
        mutate(cat = ifelse(a1 <= x1 & b1 <= x3, "a",
                     ifelse(a1 <= x2 & b1 <= x4, "b", "c")))
    train_data$cat = as.factor(train_data$cat)

    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)

    #calculate quantile ("quant") for each bin
    table_a = data.frame(a_table %>% group_by(cat) %>%
        mutate(quant = ifelse(c1 > x[5], 1, 0)))
    table_b = data.frame(b_table %>% group_by(cat) %>%
        mutate(quant = ifelse(c1 > x[6], 1, 0)))
    table_c = data.frame(c_table %>% group_by(cat) %>%
        mutate(quant = ifelse(c1 > x[7], 1, 0)))

    #group all tables
    final_table = rbind(table_a, table_b, table_c)

    # calculate the total mean : this is what needs to be optimized
    mean = mean(final_table$quant)
}
Part 3
#run the genetic algorithm (20 iterations to keep it short):
GA <- ga(type = "real-valued",
         fitness = fitness,
         lower = c(80, 1, 80, 1, 90, 180, 365),
         upper = c(100, 20, 100, 20, 140, 400, 720),
         popSize = 50, maxiter = 20, run = 20)
The above code (Parts 1, 2 and 3) all runs fine.
Problem: Now, I am trying to produce some of the visual plots from the tutorial:
First Plot - This Works:
plot(GA)
But I can't seem to produce the other plots from the tutorial:
Second Plot (using any 2 variables): Does Not Work
lbound <- c(80,80,80,80,0,0,0)
ubound <- c(120,120,120,120,1,1,1)
curve(fitness, from = lbound, to = ubound, n = 1000)
points(GA@solution, GA@fitnessValue, col = 2, pch = 19)
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
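A possible reading of this failure, offered as a sketch rather than a confirmed diagnosis: curve() draws a function of a single variable and hands it a whole vector of x-values at once, whereas the fitness function here expects one length-7 parameter vector. One way around that is to hold six parameters fixed and vary only one. The example below is self-contained and uses base R only. The names fitness_base, base_point and slice are hypothetical helpers, and fitness_base is a base-R restatement of the binning logic above, not the original dplyr function.

```r
set.seed(1)

# Stand-in data with the same columns as train_data in Part 1
train_data <- data.frame(
  a1 = rnorm(1000, 100, 10),
  b1 = rnorm(1000, 100, 5),
  c1 = sample.int(1000, 1000, replace = TRUE)
)

# Hypothetical base-R restatement of the fitness logic (assumption:
# same binning and thresholds as the dplyr version above)
fitness_base <- function(x, data = train_data) {
  x1 <- x[1]
  x2 <- x1 + x[2]
  x3 <- x[3]
  x4 <- x3 + x[4]
  cat <- ifelse(data$a1 <= x1 & data$b1 <= x3, "a",
         ifelse(data$a1 <= x2 & data$b1 <= x4, "b", "c"))
  # Pick the per-row threshold x[5], x[6] or x[7] according to the bin
  threshold <- setNames(x[5:7], c("a", "b", "c"))[cat]
  mean(data$c1 > threshold)
}

# Hypothetical reference point: the six parameters not being varied stay here
base_point <- c(100, 10, 100, 10, 115, 290, 540)

# Evaluate the fitness along one coordinate, one scalar value at a time
slice <- function(values, index, point = base_point) {
  sapply(values, function(v) {
    p <- point
    p[index] <- v
    fitness_base(p)
  })
}

# curve() now receives an expression that returns one y per x
res <- curve(slice(x, index = 5), from = 90, to = 140, n = 200,
             xlab = "x[5]", ylab = "fitness")
```

The same wrapper could be pointed at any other coordinate by changing index and the from/to range. Overlaying the GA result would then mean plotting only the matching coordinate of GA@solution against GA@fitnessValue, since points() needs x and y of equal length.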
Third Plot (using any 2 variables): Does Not Work
x <- random_2 <- seq(80, 120, by = 0.1)
f <- outer(x1, x2, fitness)
persp3D(x1, x2, fitness, theta = 50, phi = 20, col.palette = bl2gr.colors)
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
Error in z[-1, -1] : object of type 'closure' is not subsettable
Fourth Plot (using any 2 variables): Does Not Work
filled.contour(x1, x2, fitness, color.palette = bl2gr.colors)
Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument
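For the two surface plots, a sketch of one possible direction, under stated assumptions: both persp-style and filled.contour-style functions take a numeric matrix z of fitness values on a grid, not the fitness function itself, and outer() calls its function with two equal-length vectors, so a scalar function has to be vectorized first. The example below is self-contained and uses base R graphics only. fitness_base, base_point and fitness_2d are hypothetical names, the choice of x[1] and x[3] as the two plotted variables is arbitrary, and fitness_base is a base-R restatement of the binning logic rather than the original function.

```r
set.seed(1)

# Stand-in data with the same columns as train_data in Part 1
train_data <- data.frame(
  a1 = rnorm(1000, 100, 10),
  b1 = rnorm(1000, 100, 5),
  c1 = sample.int(1000, 1000, replace = TRUE)
)

# Hypothetical base-R restatement of the fitness logic
fitness_base <- function(x, data = train_data) {
  x1 <- x[1]
  x2 <- x1 + x[2]
  x3 <- x[3]
  x4 <- x3 + x[4]
  cat <- ifelse(data$a1 <= x1 & data$b1 <= x3, "a",
         ifelse(data$a1 <= x2 & data$b1 <= x4, "b", "c"))
  threshold <- setNames(x[5:7], c("a", "b", "c"))[cat]
  mean(data$c1 > threshold)
}

# Hypothetical reference point: the five parameters not plotted stay fixed
base_point <- c(100, 10, 100, 10, 115, 290, 540)

# Scalar wrapper over the two chosen coordinates, x[1] and x[3]
fitness_2d <- function(u, v) {
  p <- base_point
  p[1] <- u
  p[3] <- v
  fitness_base(p)
}

x1_grid <- seq(80, 100, length.out = 40)
x3_grid <- seq(80, 100, length.out = 40)

# Vectorize() lets outer() evaluate the scalar wrapper pair by pair
z <- outer(x1_grid, x3_grid, Vectorize(fitness_2d))

# Both plots receive the matrix z, not the function
persp(x1_grid, x3_grid, z, theta = 50, phi = 20,
      xlab = "x[1]", ylab = "x[3]", zlab = "fitness")
filled.contour(x1_grid, x3_grid, z)
```

If I read the tutorial correctly, its persp3D and filled.contour calls are likewise given the matrix returned by outer(), so the same z could presumably be passed to persp3D(x1_grid, x3_grid, z, theta = 50, phi = 20, col.palette = bl2gr.colors) once GA is loaded.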
Can someone please show me how to fix these errors?
Thanks