Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
1.3k views
in Technique[技术] by (71.8m points)

r - mgcv gam() error: model has more coefficients than data

I am fitting GAMs (generalized additive models) to my dataset. This dataset has 32 observations, with 6 predictor variables and a response variable (namely power). I am using the gam() function of the mgcv package to fit the models. Whenever I try to fit a model, I get this error message:

Error in gam(formula.hh, data = data, na.action = na.exclude,  : 
  Model has more coefficients than data

From this error message, I infer that I have more predictor variables as compared to the number of observations. I guess this error is generated during cross-validation procedures. Is there any way to handle this error?

I am using the following code:

library(mgcv)
formula.hh <- as.formula(power ~ s(temperature) 
                                + s(prevday1) + s(prevday2)
                                + s(prev_2_hour) + s(prev_instant1))
model <- gam(formula.hh, data = data, na.action = na.exclude)

Here is the data, as produced by the dput() function:

> dput(data)
data <- structure(list(power = c(250.615931666667, 252.675878333333, 
1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 1746.3731, 
549.216605, 1425.42390166667, 1900.32575, 1766.18103333333), 
    temperature = c(31, 30, 28, 28, 27, 31, 32, 32, 30.5, 33, 
    33, 30, 32, 24, 30, 26, 28, 32, 34, 25, 32, 33, 35, 36, 36, 
    37, 35, 33, 35, 33, 35, 32), prevday1 = c(NA, 250.615931666667, 
    252.675878333333, 1578.209605, 186.636575166667, 1062.07912666667, 
    1031.481235, 1584.38902166667, 276.973836666667, 401.620463333333, 
    1622.50827666667, 273.825153333333, 1511.37474333333, 291.460865, 
    215.138178333333, 247.509348333333, 1140.21383833333, 1680.63441666667, 
    1742.44168333333, 592.162706166667, 1610.7307, 615.857495, 
    1664.13551, 464.973065, 1956.2482, 1767.94469333333, 1869.02678333333, 
    1806.731, 1746.3731, 549.216605, 1425.42390166667, 1900.32575
    ), prevday2 = c(NA, NA, 250.615931666667, 252.675878333333, 
    1578.209605, 186.636575166667, 1062.07912666667, 1031.481235, 
    1584.38902166667, 276.973836666667, 401.620463333333, 1622.50827666667, 
    273.825153333333, 1511.37474333333, 291.460865, 215.138178333333, 
    247.509348333333, 1140.21383833333, 1680.63441666667, 1742.44168333333, 
    592.162706166667, 1610.7307, 615.857495, 1664.13551, 464.973065, 
    1956.2482, 1767.94469333333, 1869.02678333333, 1806.731, 
    1746.3731, 549.216605, 1425.42390166667), prev_instant1 = c(NA, 
    237.211388333333, 455.932271666667, 367.837349666667, 1230.40137333333, 
    1080.74080166667, 1898.06056666667, 326.103031666667, 302.770571666667, 
    1859.65283333333, 281.700161666667, 1684.32288333333, 291.448878333333, 
    214.838578333333, 254.042623333333, 1380.14074333333, 824.437228333333, 
    1660.46284666667, 268.004111666667, 1715.02763333333, 1853.08503333333, 
    1821.31845, 1173.91945333333, 1859.87353333333, 1887.67635, 
    1760.29563333333, 1876.05421666667, 1743.10665, 366.382048333333, 
    1185.16379, 1713.98534666667, 1746.36006666667), prev_instant2 = c(NA, 
    275.55167, 242.638122833333, 220.635857, 1784.77271666667, 
    1195.45020333333, 590.114391666667, 310.141536666667, 1397.3184605, 
    1747.44398333333, 260.10318, 1521.77355833333, 283.317726666667, 
    206.678135, 231.428693833333, 235.600631666667, 232.455201666667, 
    281.422625, 256.470893333333, 1613.82088333333, 1564.34841666667, 
    1795.03498333333, 1551.64725666667, 1517.69289833333, 1596.66556166667, 
    2767.82433333333, 2949.38005, 328.691775, 389.83789, 1805.71815333333, 
    1153.97645666667, 1752.75968333333), prev_2_hour = c(NA, 
    219.024983, 313.393630708333, 263.748829166667, 931.193606666667, 
    699.399163791667, 754.018962083334, 272.22309625, 595.954508875, 
    1597.21487208333, 512.64361, 1236.42579666667, 281.200373333334, 
    196.983981666666, 230.327737625, 525.483920416666, 391.120302791667, 
    610.101280416667, 247.710625543785, 978.741044166665, 979.658926666667, 
    1189.25306041667, 814.840889166667, 989.059700416665, 1352.2367025, 
    1770.20417833333, 1847.11590666667, 843.191556416666, 363.50806625, 
    904.924465041666, 841.746712500002, 1747.73452958333)), .Names = c("power", 
"temperature", "prevday1", "prevday2", "prev_instant1", "prev_instant2", 
"prev_2_hour"), class = "data.frame", row.names = c(NA, 32L))


1 Reply

0 votes
by (71.8m points)

This dataset has 32 observations.

Actually, only 30 are usable: the first two rows contain NA in the lagged predictors, so they are dropped when the model frame is built.
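
As a rough illustration of why lagged columns cost you rows, here is a minimal sketch on a made-up five-row frame (the `toy` data frame and its values are hypothetical, not your data). It counts the rows that survive once rows with any NA are excluded:

```r
# Hypothetical toy series; the lag columns are built the same way as prevday1/prevday2
power <- c(250.6, 252.7, 1578.2, 186.6, 1062.1)

toy <- data.frame(
  power    = power,
  prevday1 = c(NA, head(power, -1)),      # lag 1: first row has no predecessor
  prevday2 = c(NA, NA, head(power, -2))   # lag 2: first two rows have none
)

# Rows with no NA in any column are the ones a model can actually use
n_used <- sum(complete.cases(toy))
print(n_used)
```

Running `sum(complete.cases(data[, all.vars(formula.hh)]))` on your own data frame is the equivalent check for the variables in your formula.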

From this error message, I infer that I have more predictor variables as compared to the number of observations.

Yes, or more precisely, more coefficients than observations. By default, s() sets the basis dimension (or rank) k to 10 for a 1D smoother, giving 10 raw parameters. The centering constraint (see ?identifiability) removes one, so you are left with 9 parameters for each smooth. Note that you have 5 smooths! So you have 45 parameters for the smooth terms, plus an intercept, which makes 46. This is greater than your 30 data.
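
The count above can be written out as a few lines of arithmetic. This is only a sketch of the bookkeeping, assuming the default k = 10 and one centering constraint per smooth as described; the variable names are mine:

```r
n_smooths       <- 5    # s(temperature), s(prevday1), s(prevday2), s(prev_2_hour), s(prev_instant1)
k_default       <- 10   # default basis dimension of a 1D s() term
coef_per_smooth <- k_default - 1   # one coefficient lost to the centering constraint

p <- n_smooths * coef_per_smooth + 1   # + 1 for the intercept
n <- 30                                # rows left after dropping those with NA

print(c(p = p, n = n))
print(p > n)   # TRUE means the model has more coefficients than data
```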

I guess this error is generated during cross-validation procedures.

No. This error is raised during model setup, once the GAM formula has been interpreted and the model frame has been constructed, before any fitting or smoothing parameter selection takes place. At that stage we already know n (number of data) and p (number of parameters).
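
If you want to see this without your own data, the following sketch should reproduce the error on simulated data. Everything here is an assumption for illustration: the `sim` data frame is random noise with the same column names and 30 complete rows, not your measurements.

```r
library(mgcv)

set.seed(42)
n <- 30   # same number of usable rows as in the question

# Simulated stand-in for the real data (hypothetical values)
sim <- data.frame(
  temperature   = runif(n, 24, 37),
  prevday1      = runif(n, 180, 2000),
  prevday2      = runif(n, 180, 2000),
  prev_2_hour   = runif(n, 190, 1900),
  prev_instant1 = runif(n, 200, 1900)
)
sim$power <- 40 * sim$temperature + 0.3 * sim$prevday1 + rnorm(n, sd = 100)

# Same formula as in the question, with the default basis dimension for every smooth
f_default <- power ~ s(temperature) + s(prevday1) + s(prevday2) +
  s(prev_2_hour) + s(prev_instant1)

# Capture the error message instead of stopping the script
msg <- tryCatch({
  gam(f_default, data = sim)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```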

Is there any way to handle this error?

Reduce k manually rather than relying on the default. Note, however, that for a cubic spline the minimum k is 3. For example, use s(temperature, bs = 'cr', k = 3). I have also set bs = 'cr' to use a cubic regression spline instead of the default bs = 'tp' (thin-plate regression spline). You can of course keep 'tp', but for a 1D smooth 'cr' is a natural choice.
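
A minimal sketch of the reduced model follows. It uses simulated data (the `sim` data frame is hypothetical, with your column names and 30 rows), and k = 3 for every smooth is just the smallest allowed value, not a recommendation for your data. With k = 3 each smooth contributes 2 coefficients after centering, so the model has 5 * 2 + 1 = 11 coefficients, well below 30.

```r
library(mgcv)

set.seed(1)
n <- 30

# Simulated stand-in for the real data (hypothetical values)
sim <- data.frame(
  temperature   = runif(n, 24, 37),
  prevday1      = runif(n, 180, 2000),
  prevday2      = runif(n, 180, 2000),
  prev_2_hour   = runif(n, 190, 1900),
  prev_instant1 = runif(n, 200, 1900)
)
sim$power <- 40 * sim$temperature + 0.3 * sim$prevday1 + rnorm(n, sd = 100)

# Every smooth gets a small cubic regression spline basis
formula.small <- power ~ s(temperature, bs = "cr", k = 3) +
  s(prevday1, bs = "cr", k = 3) +
  s(prevday2, bs = "cr", k = 3) +
  s(prev_2_hour, bs = "cr", k = 3) +
  s(prev_instant1, bs = "cr", k = 3)

model <- gam(formula.small, data = sim)

# Number of coefficients actually estimated (intercept + 2 per smooth)
n_coef <- length(coef(model))
print(n_coef)
```

With your own data you would replace `sim` by `data` and keep `na.action = na.exclude` as in your original call. You do not have to use the same k for every term: as long as the total stays below the number of complete rows, you can give more basis functions to the predictors you expect to have the most nonlinear effect.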

