r - mgcv gam() error: model has more coefficients than data

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am fitting a GAM (generalized additive model) to my dataset, which has 32 observations of 6 predictor variables and a response variable (power). I am using the gam() function from the mgcv package. Whenever I try to fit the model, I get this error message:

Error in gam(formula.hh, data = data, na.action = na.exclude,  : 
  Model has more coefficients than data

From this error message, I infer that I have more predictor variables than observations. I guess this error is generated during the cross-validation procedure. Is there any way to handle this error?

I am using the following code:

library(mgcv)
formula.hh <- as.formula(power ~ s(temperature) 
                                + s(prevday1) + s(prevday2)
                                + s(prev_2_hour) + s(prev_instant1))
model <- gam(formula.hh, data = data, na.action = na.exclude)

Here is the data, as produced by the dput() function:

> dput(data)
data <- structure(list(power = c(250.615931666667, 252.675878333333, 
1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 1746.3731, 
549.216605, 1425.42390166667, 1900.32575, 1766.18103333333), 
    temperature = c(31, 30, 28, 28, 27, 31, 32, 32, 30.5, 33, 
    33, 30, 32, 24, 30, 26, 28, 32, 34, 25, 32, 33, 35, 36, 36, 
    37, 35, 33, 35, 33, 35, 32), prevday1 = c(NA, 250.615931666667, 
    252.675878333333, 1578.209605, 186.636575166667, 1062.07912666667, 
    1031.481235, 1584.38902166667, 276.973836666667, 401.620463333333, 
    1622.50827666667, 273.825153333333, 1511.37474333333, 291.460865, 
    215.138178333333, 247.509348333333, 1140.21383833333, 1680.63441666667, 
    1742.44168333333, 592.162706166667, 1610.7307, 615.857495, 
    1664.13551, 464.973065, 1956.2482, 1767.94469333333, 1869.02678333333, 
    1806.731, 1746.3731, 549.216605, 1425.42390166667, 1900.32575
    ), prevday2 = c(NA, NA, 250.615931666667, 252.675878333333, 
    1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
    1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
    273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
    247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
    592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
    1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 
    1746.3731, 549.216605, 1425.42390166667), prev_instant1 = c(NA, 
    237.211388333333, 455.932271666667, 367.837349666667, 1230.40137333333, 
    1080.74080166667, 1898.06056666667, 326.103031666667, 302.770571666667, 
    1859.65283333333, 281.700161666667, 1684.32288333333, 291.448878333333, 
    214.838578333333, 254.042623333333, 1380.14074333333, 824.437228333333, 
    1660.46284666667, 268.004111666667, 1715.02763333333, 1853.08503333333, 
    1821.31845, 1173.91945333333, 1859.87353333333, 1887.67635, 
    1760.29563333333, 1876.05421666667, 1743.10665, 366.382048333333, 
    1185.16379, 1713.98534666667, 1746.36006666667), prev_instant2 = c(NA, 
    275.55167, 242.638122833333, 220.635857, 1784.77271666667, 
    1195.45020333333, 590.114391666667, 310.141536666667, 1397.3184605, 
    1747.44398333333, 260.10318, 1521.77355833333, 283.317726666667, 
    206.678135, 231.428693833333, 235.600631666667, 232.455201666667, 
    281.422625, 256.470893333333, 1613.82088333333, 1564.34841666667, 
    1795.03498333333, 1551.64725666667, 1517.69289833333, 1596.66556166667, 
    2767.82433333333, 2949.38005, 328.691775, 389.83789, 1805.71815333333, 
    1153.97645666667, 1752.75968333333), prev_2_hour = c(NA, 
    219.024983, 313.393630708333, 263.748829166667, 931.193606666667, 
    699.399163791667, 754.018962083334, 272.22309625, 595.954508875, 
    1597.21487208333, 512.64361, 1236.42579666667, 281.200373333334, 
    196.983981666666, 230.327737625, 525.483920416666, 391.120302791667, 
    610.101280416667, 247.710625543785, 978.741044166665, 979.658926666667, 
    1189.25306041667, 814.840889166667, 989.059700416665, 1352.2367025, 
    1770.20417833333, 1847.11590666667, 843.191556416666, 363.50806625, 
    904.924465041666, 841.746712500002, 1747.73452958333)), .Names = c("power", 
"temperature", "prevday1", "prevday2", "prev_instant1", "prev_instant2", 
"prev_2_hour"), class = "data.frame", row.names = c(NA, 32L))

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:36:25+0000

"This dataset has 32 observations."

Actually, only 30, because two rows contain NA values.
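To illustrate why lagged predictors reduce the usable sample size, here is a minimal sketch on a toy data frame. The names toy, lag1, and lag2 are made up for this illustration and are not part of the question's data.

```r
# Toy series with two lagged copies, mimicking the prevday1 / prevday2 pattern.
x <- c(10, 20, 30, 40, 50)
toy <- data.frame(
  x    = x,
  lag1 = c(NA, head(x, -1)),      # first row has no previous value
  lag2 = c(NA, NA, head(x, -2))   # first two rows have no value two steps back
)

# Rows with any NA are dropped before fitting, so only complete rows count.
n_complete <- sum(complete.cases(toy))
n_complete  # 3 of the 5 rows are usable
```

For the question's data, something like sum(complete.cases(data[, all.vars(formula.hh)])) should report the 30 usable rows mentioned above.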

"From this error message, I infer that I have more predictor variables than observations."

Yes, although strictly it is the number of coefficients, not the number of predictor variables, that exceeds the number of observations. By default, s() chooses a basis dimension (or rank) of 10 for a 1D smoother, giving 10 raw parameters. The centering constraint (see ?identifiability) removes one parameter, leaving 9 per smooth. You have 5 smooths, so the smooth terms contribute 45 parameters, plus an intercept, for 46 in total. This is greater than your 30 observations.
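The count above can be written out as a short calculation. This is only arithmetic on the numbers stated in the answer (default k = 10, one parameter lost per smooth, 5 smooths, 30 complete rows); the variable names are illustrative.

```r
k_default <- 10      # default basis dimension of s() for a 1D smooth
n_smooths <- 5       # smooth terms in formula.hh
n_obs     <- 32 - 2  # two rows are dropped because they contain NA

# Each smooth loses one coefficient to the centering constraint;
# the model also has one intercept.
n_coef <- n_smooths * (k_default - 1) + 1

n_coef          # 46
n_coef > n_obs  # TRUE, hence "Model has more coefficients than data"
```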

"I guess this error is generated during cross-validation procedures."

No. This error is detected as soon as the GAM formula has been interpreted and the model frame has been constructed. Even before the real basis construction, we already know n (the number of observations) and p (the number of parameters).

"Is there any way to handle this error?"

Reduce k manually rather than using the default. Note, however, that the minimum k for a cubic spline is 3. For example, use s(temperature, bs = 'cr', k = 3). I have also set bs = 'cr' to use a natural cubic spline instead of the default bs = 'tp' (thin-plate regression spline). You can of course keep 'tp', but for a 1D smooth "cr" is a natural choice.
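As a rough sketch of this fix, the example below uses simulated data rather than the question's data frame. The names sim, x1 to x5, and y are assumptions made for the illustration, and the response is an arbitrary made-up function. It first captures the error produced by the default k, then refits with k = 3 for every smooth.

```r
library(mgcv)

set.seed(1)
n <- 30  # same number of usable rows as in the question
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n),
                  x4 = runif(n), x5 = runif(n))
sim$y <- sin(2 * pi * sim$x1) + sim$x2^2 + rnorm(n, sd = 0.3)

# Default k = 10: 5 * 9 + 1 = 46 coefficients for 30 rows, so gam() stops.
default_msg <- tryCatch({
  gam(y ~ s(x1) + s(x2) + s(x3) + s(x4) + s(x5), data = sim)
  NA_character_  # only reached if no error was raised
}, error = function(e) conditionMessage(e))
default_msg

# Reduced basis: each cubic regression spline keeps 2 coefficients after
# centering, so the model has 5 * 2 + 1 = 11 coefficients.
small <- gam(y ~ s(x1, bs = "cr", k = 3) + s(x2, bs = "cr", k = 3) +
                 s(x3, bs = "cr", k = 3) + s(x4, bs = "cr", k = 3) +
                 s(x5, bs = "cr", k = 3),
             data = sim)
length(coef(small))  # 11
```

With only 30 rows, any k up to about 6 per smooth would still keep the coefficient count below n here, but a basis that small leaves little room for wiggliness, so the fitted smooths should be checked rather than trusted blindly.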
