So this is VERY strange. RODBC seems to drop the time portion of DateTime SQL columns if the result set is large enough. (The queries are running against an SQL Server 2012 machine, and, yes, when I run them on the SQL Server side they produce identical and proper results, regardless of how many rows are returned.)
For example, the following works perfectly:
myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
In R, the following works as expected:
> results$MyDateTimeColumn[3]
[1] "2013-07-01 00:01:22 PDT"
which is a valid POSIXct
date time. However, when somewhere between 10,000 and 100,000 rows are returned, suddenly the time portion disappears:
myconn <- odbcConnect(dsnName, uid, pwd)
bigResults <- sqlQuery(myconn, "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)
(same code, simply a larger number of rows returned; NOTE: the exact same row has now lost its time component), R responds:
> bigResults$MyDateTimeColumn[3]
[1] "2013-07-01 PDT"
Note that the time is now missing (this is not a different row; it's the exact same row as previous), as the following shows:
>strptime(results$TriggerTime[3], "%Y-%m-%d %H:%M:%S")
[1] "2013-07-01 00:01:22"
>strptime(bigResults$TriggerTime[3], "%Y-%m-%d %H:%M:%S")
[1] NA
Obviously the work-around is either incremental query-with-append or export-to-CSV-and-import-to-R, but this seems very odd. Anyone ever seen anything like this?
Config: I'm using the latest version of RODBC (1.3-10) and can duplicate the behavior on both an R installation running on Windows x64 and an R installation running on Mac OS X 10.9 (Mavericks).
EDIT #2 Adding output of dput()
to compare the objects, per request:
> dput(results[1:10,]$MyDateTimeColumn)
structure(c(1396909903.347, 1396909894.587, 1396909430.903, 1396907996.9, 1396907590.02, 1396906077.887, 1396906071.99, 1396905537.36, 1396905531.413, 1396905231.787), class = c("POSIXct", "POSIXt"), tzone = "")
> dput(bigResults[1:10,]$MyDateTimeColumn)
structure(c(1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000), class = c("POSIXct", "POSIXt"), tzone = "")
It would appear that the underlying data are actually changing as a result of the number of rows returned by the query, which is downright strange.
See Question&Answers more detail:
os