Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - What are the caveats of using source versus parse & eval?

Short version

Can I replace

source(filename, local = TRUE, encoding = 'UTF-8')

with

eval(parse(filename, encoding = 'UTF-8'))

without any risk of breakage, to make UTF-8 source files work on Windows?

Long version

I am currently loading specific source files via

source(filename, local = TRUE, encoding = 'UTF-8')

However, it is well known that this does not work on Windows, full stop.

As a workaround, Joe Cheng suggested using instead

eval(parse(filename, encoding = 'UTF-8'))
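
For illustration, the workaround can be wrapped in a small helper. This is only a sketch: `source_utf8` and `load_locally` are hypothetical names, not part of base R, and the helper assumes that evaluating in the caller's frame is an adequate stand-in for `local = TRUE`.

```r
# Hypothetical helper (not part of base R): parse a UTF-8 file and evaluate
# it in the caller's frame, roughly mimicking source(..., local = TRUE).
source_utf8 <- function(filename, envir = parent.frame()) {
  exprs <- parse(filename, encoding = "UTF-8")
  invisible(eval(exprs, envir))
}

# Usage: write a small script to a temporary file and load it inside a function.
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- a + 1"), script)

load_locally <- function() {
  source_utf8(script)
  b  # 'b' was defined in this function's frame by the script
}
result <- load_locally()
```

The default `envir = parent.frame()` is evaluated lazily inside the helper, so it resolves to the frame of whichever function called `source_utf8`.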

This seems to work quite well¹, but even after consulting the source code of source, I don’t understand how they differ in one crucial detail:

Neither source nor sys.source simply parses and then evals the file content. Instead, they parse the file content, iterate manually over the parsed expressions, and eval them one by one. I do not understand why this would be necessary in sys.source (source at least uses it to show verbose diagnostics, if so instructed; but sys.source does nothing of the kind):

for (i in seq_along(exprs)) eval(exprs[i], envir)

What is the purpose of evaling statements separately? And why is it iterating over indices instead of directly over the sub-expressions? What other caveats are there?

To clarify: I am not concerned about the additional parameters of source and parse, some of which may be set via options.


¹ The reason that source is tripped up by the encoding but parse isn’t boils down to the fact that source attempts to convert the input text. parse does no such thing: it reads the file’s byte content as-is and simply marks its Encoding as UTF-8 in memory.
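
The marking behaviour described in this footnote can be checked with a small experiment. This is a sketch under assumptions: the temporary file and the variable names are made up for illustration, and the file is written as raw UTF-8 bytes so that no conversion happens on the way out.

```r
# Write a one-line script containing a non-ASCII string literal as raw UTF-8 bytes.
script <- tempfile(fileext = ".R")
con <- file(script, open = "wb")
writeLines("s <- \"\u00e9\"", con, useBytes = TRUE)
close(con)

# Parse with the encoding declared, then evaluate in a scratch environment.
env <- new.env()
eval(parse(script, encoding = "UTF-8"), env)

Encoding(env$s)              # how the parsed string is marked in memory
identical(env$s, "\u00e9")   # compare with the same character written as an escape
```

Note that only strings containing non-ASCII characters carry an encoding mark; pure ASCII strings always report "unknown".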


1 Reply


This is not a full answer, as it primarily addresses the seq_along part of the question, but it is too lengthy to post as a comment.

One key difference between using seq_along followed by [ and simply using for (i in x) (which I believe is similar to seq_along followed by [[ instead of [) is that the former keeps each element as a length-one expression vector, together with its attributes such as srcref. Here is an example to illustrate the difference:

> txt <- "x <- 1 + 1
+ # abnormal expression
+   2 *
+     3
+ "
> x <- parse(text=txt, keep.source=TRUE)
> 
> for(i in x) print(i)
x <- 1 + 1
2 * 3
> for(i in seq_along(x)) print(x[i])
expression(x <- 1 + 1)
expression(2 *
    3)

Alternatively:

> attributes(x[[2]])
NULL
> attributes(x[2])
$srcref
$srcref[[1]]
2 *
    3

As for whether this has any practical impact compared to eval(parse(..., keep.source = TRUE)): I can only say that it could, but I can't imagine a situation where it does.
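
For a simple script, at least, the two approaches can be compared directly. The sketch below relies on the fact that eval, when given an expression vector, evaluates its elements in order and returns the value of the last one; the script text and variable names are made up for illustration.

```r
exprs <- parse(text = "x <- 1\ny <- x + 1\nx + y", keep.source = TRUE)

# Evaluate the whole expression vector in one call.
env_all <- new.env()
value_all <- eval(exprs, env_all)

# Evaluate element by element, as in the loop from sys.source.
env_loop <- new.env()
value_loop <- NULL
for (i in seq_along(exprs)) value_loop <- eval(exprs[i], env_loop)

# Compare the final values and the variables each approach created.
same_value <- identical(value_all, value_loop)
same_vars <- identical(mget(c("x", "y"), envir = env_all),
                       mget(c("x", "y"), envir = env_loop))
```

This only covers plain assignments and arithmetic. It says nothing about code that inspects srcref information or the call stack while it runs.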

Note that subsetting the expression element by element also subsets the srcref attribute accordingly, which could conceivably be useful (...maybe?).

