Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
374 views
in Technique[技术] by (71.8m points)

r - How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

I want to persist a lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like to know why my idioms for reading in dput output in general is not behaving as I'd expect.

Below are examples of making a simple fit, and successful and unsuccessful recreations of the model:

dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit is not dependent upon `dat_train existence`

dat_score <- data.frame(x=c(1.5, 3.5))

## This works (of course)
predict(fit, dat_score)
#    1    2 
# 1.52 3.48

Saving to binary file works:

## http://stackoverflow.com/questions/5118074/reusing-a-model-built-in-r
saveRDS(fit, "model.RDS")
fit2 <- readRDS("model.RDS")
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

So does this (dput it in the R session not to a file):

fit2 <- eval(dput(fit))
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

But if I persist file to disk, I cannot figure out how to get back into normal shape:

dput(fit, file = "model.R")
fit3 <- source("model.R")$value

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit3, dat_score)
# Error in predict(fit3, dat_score): object 'fit3' not found

Trying to be explicit with the eval does not work either:

## http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string
dput(fit, file="model.R")
fit4 <- eval(parse(text=paste(readLines("model.R"), collapse=" ")))

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit4, dat_score)
# Error in predict(fit4, dat_score): object 'fit4' not found

In both cases above, I expect fit3 and fit4 to both work, but they don't recompile into a lm object that I can use with predict().

Can anyone advise me on how I can persist a model to a file with a structure(...) ASCII-like structure, and then re-read it back in as a lm object I can use in predict()? And why my current methods are not working?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Step 1:

You need to control de-parsing options:

dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R") 

You can read more on all possible options in ?.deparseOpts.


The "quoteExpressions" wraps all calls / expressions / languages with quote, so that they are not evaluated when you later re-parse it. Note:

  • source is doing parsing;
  • call field in your fitted "lm" object is a call:

    fit$call
    # lm(formula = z ~ x, data = dat_train)
    

So, without "quoteExpressions", R will try to evaluate lm call during parsing. And if we evaluate it, it is fitting a linear model, and R will aim to find dat_train, which will not exist in your new R session.


The "showAttributes" is another mandatory option, as "lm" object has class attributes. You certainly don't want to discard all class attributes and only export a plain "list" object, right? Moreover, many elements in a "lm" object, like model (the model frame), qr (the compact QR matrix) and terms (terms info), etc all have attributes. You want to keep them all.


If you don't set control, the default setting with:

control = c("keepNA", "keepInteger", "showAttributes")

will be used. As you can see, there is no "quoteExpressions", so you will get into trouble.

You can also specify "keepInteger" and "keepNA", but I don't see the need for "lm" object.

------

Step 2:

The above step will get source working correctly. You can recover your model:

fit1 <- source("model.R")$value

However, it is not yet ready for generic functions like summary and predict to work. Why?

The critical issue is the terms object in fit1 is not really a "terms" object, but only a formula (it is even not a formula, but only a "language" object without "formula" class!). Just compare fit$terms and fit1$terms, and you will see the difference. Don't be surprised; we've set "quoteExpressions" earlier. While that is definitely helpful to prevent evaluation of call, it has side-effect for terms. So we need to reconstruct terms as best as we can.

Fortunately, it is sufficient to do:

fit1$terms <- terms.formula(fit1$terms)

Though this still does not recover all information in fit$terms (like variable classes are missing), it is readily a valid "terms" object.

Why is a "terms" object critical? Because all generic functions rely on it. You may not need to know more on this, as it is really technical, so I will stop here.

Once this is done, we can successfully use predict (and summary, too):

predict(fit1)  ## no `newdata` given, using model frame `fit1$model`
#   1    2    3    4 
#1.03 2.01 2.99 3.97 

predict(fit1, dat_score)  ## with `newdata`
#   1    2 
#1.52 3.48 

-------

Conclusion remark:

Although I have shown you how to get things work, I don't really recommend you doing this in general. An "lm" object will be pretty large when you fit a model to a large dataset, for example, residuals, fitted.values are long vectors, and qr and model are huge matrices / data frames. So think about this.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...