While setting k
is the correct way to go, fx = TRUE
is definitely not right: it will force using pure regression spline without penalization.
locations of knots
For penalized regression spline, the exact locations are not important, as long as:
k
is adequately big;
- the spread of knots has good, reasonable coverage.
By default:
- natural cubic regression spline
bs = 'cr'
places knots by quantile;
- B-splines family (
bs = 'bs'
, bs = 'ps'
, bs = 'ad'
) place knots evenly.
Compare the following:
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)
You can see the difference in knots placement.
Setting your own knots locations:
You can also provide your customized knots locations via the knots
argument of gam()
(yes, knots are not fed to s()
, but to gam()
). For example, you can do evenly spaced knots for cr
:
xlim <- range(x) ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k =20),
knots = list(x = seq(xlim[1], xlim[2], length = 20)))
Now you can see that:
my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)
However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. Also, the number of knots you provided must match k
in the s()
.