I think Hadley would be the best person to explain this to you, but I will give it a shot.
`%.%` is a binary operator called the chain operator. In R you can define pretty much any binary operator of your own by wrapping its name in the special character `%` (for example `%myop%`). From what I have seen, we mostly use this to create easier, "chainable" syntaxes (just as `x + y` reads much better than `sum(x, y)`). You can do really cool stuff with them; see this cool example here.
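As a minimal sketch of defining your own operator (the name `%paste%` is hypothetical, not part of any package), any function of two arguments assigned to a name of the form `%...%` can be used infix:

```r
# Hypothetical operator: paste two strings together with a space
`%paste%` <- function(a, b) paste(a, b)

"chain" %paste% "of" %paste% "words"
# [1] "chain of words"
```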
What is the purpose of `%.%` in dplyr? To make it easier for you to express yourself, reducing the gap between what you want to do and how you express it.
Taking the example from the introduction to dplyr, suppose you want to group flights by year, month and day, select those variables plus the arrival and departure delays, summarise each delay by its mean, and then keep only the days on which either mean delay is over 30. Without `%.%`, you would have to write something like this:
```r
filter(
  summarise(
    select(
      group_by(hflights, Year, Month, DayofMonth),
      Year:DayofMonth, ArrDelay, DepDelay
    ),
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ),
  arr > 30 | dep > 30
)
```
It does the job, but it is hard to write and hard to read: the steps appear inside out, and each function's arguments end up far away from its name. With the chain operator `%.%` you can write the same thing in a friendlier syntax:
```r
hflights %.%
  group_by(Year, Month, DayofMonth) %.%
  select(Year:DayofMonth, ArrDelay, DepDelay) %.%
  summarise(
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ) %.%
  filter(arr > 30 | dep > 30)
```
It is easier both to write and read!
And how does that work?
Let's take a look at the definitions. First for `%.%`:
```r
`%.%` <- function(x, y) {
  # capture both sides unevaluated and hand them to chain_q()
  chain_q(list(substitute(x), substitute(y)), env = parent.frame())
}
```
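The key ingredient in that definition is `substitute()`, which returns the expression passed as an argument instead of its value. A small base-R sketch (the function `capture_args` is hypothetical, written only for illustration); since nothing is evaluated, `hflights`, `group_by` and `select` do not even need to exist:

```r
# Hypothetical helper: return both arguments as unevaluated expressions
capture_args <- function(x, y) {
  list(substitute(x), substitute(y))
}

args <- capture_args(group_by(hflights, Year), select(Year, ArrDelay))
args[[2]]
# select(Year, ArrDelay)
class(args[[2]])
# [1] "call"
```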
It uses another function called `chain_q`. So let's look at it:
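```r
chain_q <- function(calls, env = parent.frame()) {
  if (length(calls) == 0) return()
  if (length(calls) == 1) return(eval(calls[[1]], env))

  # child of the caller's environment, used to hold intermediate results
  e <- new.env(parent = env)
  # evaluate the first call (e.g. the data) in the caller's environment
  e$`__prev` <- eval(calls[[1]], env)

  for (call in calls[-1]) {
    # f(a, b) becomes f(`__prev`, a, b)
    new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
    e$`__prev` <- eval(new_call, e)
  }

  # the result of the last call in the chain
  e$`__prev`
}
```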
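If you want to experiment without dplyr, here is a stripped-down sketch of the same idea as a single two-argument operator. The name `%then%` and the toy data frame are hypothetical; this is not dplyr's code, just the same trick (capture the right-hand call, insert the left-hand value as its first argument, evaluate in a child environment) in fewer lines:

```r
# Hypothetical, simplified chain operator (not dplyr's implementation).
# y must be a call such as f(...), not a bare function name.
`%then%` <- function(x, y) {
  rhs <- substitute(y)                      # unevaluated right-hand call
  e <- new.env(parent = parent.frame())     # child of the caller's environment
  e$`__prev` <- x                           # forces (evaluates) the left-hand side
  new_call <- as.call(c(rhs[[1]], quote(`__prev`), as.list(rhs[-1])))
  eval(new_call, e)                         # e.g. head(`__prev`, 3)
}

df <- data.frame(id = 1:10, group = rep(c("a", "b"), 5))

res <- df %then% subset(group == "a") %then% head(3)
res$id
# [1] 1 3 5
```

Unlike `chain_q()`, this version lets R evaluate the left-hand side as an ordinary argument instead of calling `eval()` on it; longer chains still work because the left-hand side of the outer `%then%` is itself a `%then%` call.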
What does that do?
To simplify things, let's assume you called `group_by(hflights, Year, Month, DayofMonth) %.% select(Year:DayofMonth, ArrDelay, DepDelay)`.
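Your calls `x` and `y` are then `group_by(hflights, Year, Month, DayofMonth)` and `select(Year:DayofMonth, ArrDelay, DepDelay)`, respectively, captured unevaluated by `substitute()`. So the function creates a new environment called `e` (`e <- new.env(parent = env)`) and saves in it an object called `__prev` holding the result of evaluating the first call (``e$`__prev` <- eval(calls[[1]], env)``). Then, for each remaining call, it builds a new call whose first argument is the previous result, that is, `__prev`. In our case that would be ``select(`__prev`, Year:DayofMonth, ArrDelay, DepDelay)``. The new call is evaluated in `e`, so `__prev` is found there, while everything else (functions, other variables) is looked up in the caller's environment, because `env` is the parent of `e`. Each result overwrites `__prev`, so the loop "chains" the calls, and the final value of `__prev` is returned.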
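The two ingredients of that loop can be tried in isolation with base R. The `select()` call below is only built, never evaluated, so neither dplyr nor hflights is needed; the variable `offset` is a made-up name for illustration:

```r
# 1. Rewriting a call: insert `__prev` as its first argument
cl <- quote(select(Year:DayofMonth, ArrDelay, DepDelay))
new_call <- as.call(c(cl[[1]], quote(`__prev`), as.list(cl[-1])))
new_call
# select(`__prev`, Year:DayofMonth, ArrDelay, DepDelay)

# 2. Evaluating in a child environment: `__prev` is found in e,
#    while offset is looked up in e's parent
offset <- 100
e <- new.env(parent = environment())
e$`__prev` <- 1:3
eval(quote(`__prev` + offset), e)
# [1] 101 102 103
```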
Since user-defined binary operators are left-associative, `x %.% y %.% z` is evaluated as `(x %.% y) %.% z`, so you can keep stacking calls one after another and express very complex manipulations in a very readable way.
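One way to check the grouping with base R alone is to look at how a chain is parsed. `quote()` does not evaluate anything, so `%.%`, `a`, `b` and `c` need not be defined:

```r
# a %.% b %.% c is parsed as (a %.% b) %.% c
expr <- quote(a %.% b %.% c)
expr[[1]]   # the operator: `%.%`
expr[[2]]   # left operand: a %.% b
expr[[3]]   # right operand: c
```

So the outermost `%.%` receives the whole `a %.% b` expression as its first argument, and evaluating that first argument runs the inner chain.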