Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - When to use 'with' function and why is it good?

What are the benefits of using with()? In the help file it mentions it evaluates the expression in an environment it creates from the data. What are the benefits of this? Is it faster to create an environment and evaluate it in there as opposed to just evaluating it in the global environment? Or is there something else I'm missing?

1 Reply

with is a wrapper for functions with no data argument

There are many functions that work on data frames and take a data argument so that you don't need to retype the name of the data frame every time you reference a column. lm, plot.formula, subset, and transform are just a few examples.

with is a general purpose wrapper to let you use any function as if it had a data argument.

Using the mtcars data set, we could fit a model with or without using the data argument:

# this is obviously annoying
mod = lm(mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$wt)

# this is nicer
mod = lm(mpg ~ cyl + disp + wt, data = mtcars)

However, if (for some strange reason) we wanted to find the mean of cyl + disp + wt, there is a problem because mean doesn't have a data argument like lm does. This is the issue that with addresses:

# without with(), we would be stuck here:
z = mean(mtcars$cyl + mtcars$disp + mtcars$wt)

# using with(), we can clean this up:
z = with(mtcars, mean(cyl + disp + wt))

Wrapping foo() in with(data, foo(...)) lets us use any function foo as if it had a data argument. That is, we can use unquoted column names and avoid repeating data_name$column_name or data_name[, "column_name"].

When to use with

Use with whenever you like interactively (R console) and in R scripts to save typing and make your code clearer. The more frequently you would need to re-type your data frame name for a single command (and the longer your data frame name is!), the greater the benefit of using with.
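As a small illustration of the typing saved, here is a sketch using table, which (like mean) has no data argument. The variable names are just for this example.

```r
# Without with(): the data frame name is repeated for every column
counts_long <- table(mtcars$cyl, mtcars$gear)

# With with(): columns are referenced by bare name
counts_short <- with(mtcars, table(cyl, gear))

# Both contain the same counts; the with() version also labels
# the dimensions with the column names, because table() sees
# the bare symbols cyl and gear
print(counts_short)
```

The counts are the same either way; the with() call is simply shorter and, in this case, gives nicer labels.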

Also note that with isn't limited to data frames. From ?with:

For the default with method this may be an environment, a list, a data frame, or an integer as in sys.call.

I don't often work with environments, but when I do I find with very handy.
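A minimal sketch of with on a list and on an environment. The names (params, env, and their contents) are made up for illustration.

```r
# with() on a plain list: elements are visible by name
params <- list(principal = 1000, rate = 0.05, years = 10)
final_value <- with(params, principal * (1 + rate)^years)

# with() on an environment: same idea
env <- new.env()
assign("a", 2, envir = env)
assign("b", 5, envir = env)
product <- with(env, a * b)
```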

When you need pieces of a result for one line only

As @Rich Scriven suggests in comments, with can be very useful when you need to use the results of something like rle. If you only need the results once, then his example with(rle(data), lengths[values > 1]) lets you use the rle(data) results anonymously.
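To make that concrete, here is the same pattern on a small made-up vector (x is just sample data, not from the original comment):

```r
x <- c(1, 1, 3, 3, 3, 2, 5, 5)

# rle(x) returns a list with elements `lengths` and `values`;
# with() lets us use both without storing the rle result
run_lengths <- with(rle(x), lengths[values > 1])

# The two-step equivalent needs a temporary variable
r <- rle(x)
run_lengths_2 <- r$lengths[r$values > 1]
```

Inside with, lengths refers to the element of the rle result, not the base function of the same name.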

When to avoid with

When there is a data argument

Many functions that have a data argument use it for more than just easier syntax when you call it. Most modeling functions (like lm), and many others too (ggplot!) do a lot with the provided data. If you use with instead of a data argument, you'll limit the features available to you. If there is a data argument, use the data argument, not with.
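One small, easy-to-check difference (only an illustration, not a full list of what the data argument does): the fitted coefficients are the same, but the call stored in the model only records the data when it was passed through the data argument.

```r
fit_data <- lm(mpg ~ wt, data = mtcars)
fit_with <- with(mtcars, lm(mpg ~ wt))

# Same fit either way
same_coefs <- isTRUE(all.equal(coef(fit_data), coef(fit_with)))

# The stored call remembers the data only in the first case
data_in_call <- deparse(fit_data$call$data)   # name of the data frame
no_data_in_call <- is.null(fit_with$call$data)
```

Anything that later relies on the recorded data (printing the call, or tools that look it up) has less to work with in the with version.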

Adding to the environment

In my example above, the result was assigned to the global environment (z = with(...)). To make an assignment inside the list/environment/data, you can use within. (In the case of data frames, transform is also good.)
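A quick sketch of both options, adding a made-up column (power_to_weight is a name I chose for the example):

```r
# within() evaluates the expression and returns a modified copy of the data
cars_within <- within(mtcars, {
  power_to_weight <- hp / wt
})

# transform() does the same for data frames using name = value pairs
cars_transform <- transform(mtcars, power_to_weight = hp / wt)

# Neither call changes mtcars itself; assign the result to keep it
```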

In packages

Don't use with in R packages. There is a warning in help(subset) that could apply just about as well to with:

Warning This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

If you build an R package using with, when you check it you will probably get warnings or notes about variables with no visible binding. That is likely to make the package unacceptable to CRAN.
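For package code, a sketch of the standard-evaluation alternative: pass column names as strings and subset with [. The helper mean_of_row_sums is hypothetical, written only for this example.

```r
# Hypothetical helper: column names are passed as character strings,
# so no bare (unbound) variable names appear in the function body
mean_of_row_sums <- function(data, cols) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)))
  mean(rowSums(data[, cols, drop = FALSE]))
}

pkg_style <- mean_of_row_sums(mtcars, c("cyl", "disp", "wt"))

# Same value as the interactive version from earlier
interactive_style <- with(mtcars, mean(cyl + disp + wt))
```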

Alternatives to with

Don't use attach

Many (mostly dated) R tutorials use attach to avoid re-typing data frame names by making columns accessible from the global environment. attach is widely considered to be bad practice and should be avoided. One of the main dangers of attach is that data columns can become out of sync if they are modified individually. with avoids this pitfall because it is invoked one expression at a time. There are many, many questions on Stack Overflow where new users are following an old tutorial and run into problems because of attach. The easy solution is always the same: don't use attach.
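Here is a minimal sketch of the out-of-sync problem, shown only to illustrate the danger. The data frame scores is made up for the example.

```r
scores <- data.frame(score = c(1, 2, 3))

attach(scores)                     # puts a copy of the columns on the search path
scores$score <- scores$score * 10  # modify the data frame itself

stale <- score                     # still the old values from the attached copy
detach(scores)                     # clean up the search path

fresh <- with(scores, score)       # with() always sees the current data
```

After the update, the bare name score still gives 1, 2, 3 while the data frame holds 10, 20, 30.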

Using with all the time seems too repetitive

If you are doing many steps of data manipulation, you may find yourself beginning every line of code with with(my_data, .... You might think this repetition is almost as bad as not using with. Both the data.table and dplyr packages offer efficient data manipulation with non-repetitive syntax. I'd encourage you to learn to use one of them. Both have excellent documentation.

