DISCLAIMER : This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or to only read the (sort of) conclusion.
I've tried a bit of tracing on [<-.data.frame, as suggested by Ari B. Friedman. Debugging starts on line 162 of the function, where there is a test to determine if value (the replacement value argument) is not a list.
Case 1: value is not a list
In that case it is treated as a vector. Matrices and arrays are treated as a single vector, as the help page says:
Note that when the replacement value is an array (including a matrix)
it is not treated as a series of columns (as ‘data.frame’ and
‘as.data.frame’ do) but inserted as a single column.
If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.
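The single-column rule can be checked with a small sketch. Here cars1 is assumed to be a six-row subset of the built-in mtcars data, which is only a stand-in for the data frame used in the question.

```r
# Hypothetical stand-in for cars1: six rows of the built-in mtcars data.
cars1 <- mtcars[1:6, c("mpg", "cyl", "disp")]

# length(value) is 2 and 6 %% 2 == 0, so value is recycled down the column.
cars1[, "mpg"] <- c(1, 2)
print(cars1$mpg)

# length(value) is 4 and 6 %% 4 != 0, so the assignment is rejected.
res <- tryCatch(
  {
    cars1[, "mpg"] <- c(1, 2, 3, 4)
    "ok"
  },
  error = function(e) "error"
)
print(res)
```

The tryCatch wrapper is only there so that the failing assignment does not interrupt the script.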
If several columns of the data frame are selected in the LHS, then the constraint is a bit more complex: the total number of elements to be replaced, i.e. the number of rows * the number of columns, must be equal to or a multiple of length(value).
The exact test is the following:
(m < n * p && (m == 0L || (n * p)%%m))
where n is the number of rows, p the number of columns, and m the length of value. If the condition is FALSE, then value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
If value is NULL, then the condition is TRUE as m==0, and the function stops with an error.
Note that the problem occurs for every value of length 0. For example,
cars1[,c("mpg")] <- numeric(0)
works, whereas :
cars1[,c("mpg","disp")] <- numeric(0)
fails in the same way as cars1[,c("mpg","disp")] <- NULL
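To see which lengths pass the multi-column test, the condition quoted above can be evaluated on its own. This is only a sketch: would_stop is a hypothetical helper name, the values of n and p are arbitrary, and the comparison with 0 is written out explicitly, which is equivalent to the original expression.

```r
# Hypothetical helper: TRUE means the test quoted above would stop with an error.
# n = number of rows, p = number of columns, m = length(value).
would_stop <- function(m, n, p) {
  m < n * p && (m == 0L || (n * p) %% m != 0)
}

n <- 10L  # arbitrary number of rows
p <- 2L   # two columns selected in the LHS

print(would_stop(0L, n, p))   # zero-length value such as NULL or numeric(0)
print(would_stop(1L, n, p))   # a single value, recycled 20 times
print(would_stop(3L, n, p))   # 20 is not a multiple of 3
print(would_stop(20L, n, p))  # exact fit
```

The short-circuit on m == 0L matters: it flags a zero-length value before the modulo is ever computed.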
Case 2: value is a list
If value is a list, then it is used to replace several columns at the same time. For example:
cars1[,c("mpg","disp")] <- list(1,2)
will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
There is a sort of "double recycling" which happens here:
- first, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, then a classic recycling is done.
- second, for each element of the value list, its length must either be at least the number of rows to be replaced or divide that number evenly. If it is less, the element is recycled to match the number of rows. If it is more, a warning is displayed.
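Both levels of recycling can be seen in one assignment. As before, this is a sketch in which cars1 is assumed to be a six-row subset of mtcars rather than the original data.

```r
# Hypothetical stand-in for cars1: six rows of the built-in mtcars data.
cars1 <- mtcars[1:6, c("mpg", "cyl", "disp")]

# One list element for two columns: the list is recycled across the columns,
# and its single element c(1, 2) is recycled down the six rows.
cars1[, c("mpg", "disp")] <- list(c(1, 2))
print(cars1)
```

Both mpg and disp end up holding the same alternating values, while cyl is left untouched.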
When the value in RHS is list(NULL), nothing really happens, as recycling is impossible (rep(NULL, 10) is always NULL). But the code continues and in the end each column to be replaced is assigned NULL, i.e. is removed.
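A short sketch of this removal, again assuming cars1 is a six-row subset of mtcars:

```r
# Hypothetical stand-in for cars1: six rows of the built-in mtcars data.
cars1 <- mtcars[1:6, c("mpg", "cyl", "disp")]

# list(NULL) has length 1, so it passes the list branch and is recycled
# across both selected columns; each of them is then assigned NULL.
cars1[, c("mpg", "disp")] <- list(NULL)
print(names(cars1))
```

Only the column that was not selected remains, and the number of rows is unchanged.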
Summary and (sort of) conclusion
data.frame and list behave differently because of the specific constraint on data frames, where each element must be of the same length. Removing several columns by assigning NULL fails not because of the NULL value by itself, but because NULL is of length 0. The error comes from a test which verifies that the number of elements to be replaced (number of rows * number of columns) is a multiple of the length of the assigned value.
Handling the case of value=NULL for multiple columns doesn't seem difficult (by adding about four lines of simple code), but it requires treating NULL as a special case. I'm not able to determine whether it is left unhandled because doing so would break the logic of the function implementation, or because it would have side effects I'm not aware of.