Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

timezone - Convert date string that contains time zone to POSIXct in R

I have a vector of dates in this format (the first 6 elements are shown):

 Dates<-c(
   "Sun Oct 04 20:33:05 EEST 2015",
   "Sun Oct 04 20:49:23 EEST 2015",
   "Sun Oct 04 21:05:25 EEST 2015",
   "Mon Sep 28 10:02:38 IDT 2015", 
   "Mon Sep 28 10:17:50 IDT 2015",
   "Mon Sep 28 10:39:48 IDT 2015")

I tried to parse Dates with the as.Date() function:

as.Date(Dates,format = "%a %b %d %H:%M:%S %Z %Y")

but this fails because R supports %Z only for output, not for input. The time zones also differ across the vector. What are the alternatives for parsing these strings correctly with respect to their time zones?

1 Reply

This solution relies on some simplifying assumptions. If your vector has many elements, a practical approach is to use a database of time zone offsets to convert every time to a single reference time zone, such as GMT. The time zone data I used is the timezone.csv file from https://timezonedb.com/download
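The lookup idea can be tried without downloading the CSV. Below is a minimal sketch, assuming a small hand-entered table of offsets in seconds east of GMT (the names offsets and to_gmt are hypothetical, and the values, 3 hours for both EEST and IDT, should be checked against your own source). It drops the abbreviation, reads the clock time as if it were GMT, and then subtracts the zone's offset.

```r
# ASSUMPTION: hypothetical hand-entered table of offsets, in seconds east of GMT
offsets <- c(EEST = 3 * 3600, IDT = 3 * 3600)

# Convert strings laid out as "Www Mmm dd HH:MM:SS TZ YYYY" to POSIXct in GMT
to_gmt <- function(x, offsets) {
  # %a and %b depend on the locale, so parse in the C locale and restore it afterwards
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  # Split on single spaces; the 5th field is the time zone abbreviation
  fields <- strsplit(x, " ", fixed = TRUE)
  abbr <- vapply(fields, function(f) f[5], character(1))
  no_tz <- vapply(fields, function(f) paste(f[-5], collapse = " "), character(1))

  # Read the local clock time as if it were GMT ...
  local_as_gmt <- as.POSIXct(no_tz, format = "%a %b %d %H:%M:%S %Y", tz = "GMT")
  # ... then subtract the zone's offset; unknown abbreviations give NA
  local_as_gmt - unname(offsets[abbr])
}
```

Abbreviations missing from the table come back as NA, which makes gaps in the lookup easy to spot.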

#Create sample data
Dates<-c(
  "Sun Oct 04 20:33:05 EEST 2015",
  "Sun Oct 04 20:49:23 EEST 2015",
  "Sun Oct 04 21:05:25 EEST 2015",
  "Mon Sep 28 10:02:38 IDT 2015", 
  "Mon Sep 28 10:17:50 IDT 2015",
  "Mon Sep 28 10:39:48 IDT 2015")

#Separate the time zone abbreviation from the date/time info
#(assumes every string has the fixed layout "Www Mmm dd HH:MM:SS TZ YYYY")
no_timezone <- paste(substr(Dates, 1, 19), substr(Dates, nchar(Dates)-3, nchar(Dates)))
timezone <- data.frame(abbreviation = substr(Dates, 21, nchar(Dates)-5))

#Reference the time zone database to get offsets from GMT (in seconds)
timezone_db <- read.csv(file="timezonedb/timezone.csv", header=FALSE)
colnames(timezone_db) <- c("zone_id", "abbreviation", "time_start", "gmt_offset", "dst")
#Do not filter on dst == 0: abbreviations such as EEST and IDT are themselves
#daylight saving zones, so dropping DST rows would leave their offsets missing (NA)
timezone_db <- unique(timezone_db[,c("abbreviation", "gmt_offset")])
#Keep only the first offset listed for each abbreviation
timezone_db <- timezone_db[!duplicated(timezone_db$abbreviation), ]

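#Look up each offset with match(), which keeps the order of Dates
#(merge() sorts its result by the key column, which can misalign the rows)
gmt_offset <- timezone_db$gmt_offset[match(timezone$abbreviation, timezone_db$abbreviation)]
#Read the local clock time as if it were GMT (%a and %b need an English locale)
gmt_time <- as.POSIXct(no_timezone, format = "%a %b %d %H:%M:%S %Y", tz="GMT")

#Final data: subtract each zone's offset to get the true GMT time
Dates_final <- gmt_time - gmt_offset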
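A different technique, not the one this answer uses: strptime() cannot read %Z, but it does accept numeric offsets with %z on input. The sketch below assumes a hypothetical hand-entered table (abbr_to_z) mapping each abbreviation to a "+hhmm" string; it swaps the abbreviation for its offset and lets the parser apply it.

```r
# ASSUMPTION: hypothetical hand-entered table mapping abbreviations to "+hhmm" offsets
abbr_to_z <- c(EEST = "+0300", IDT = "+0300")

parse_with_z <- function(x, table) {
  # %a and %b depend on the locale, so parse in the C locale and restore it afterwards
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")

  # Same fixed-width slicing as the answer: the abbreviation sits between time and year
  abbr <- substr(x, 21, nchar(x) - 5)
  year <- substr(x, nchar(x) - 3, nchar(x))
  # Replace the abbreviation with its numeric offset; unknown ones become "NA" and fail
  x_z <- paste(substr(x, 1, 19), unname(table[abbr]), year)

  # %z is read on input, and the resulting instant is expressed in tz = "GMT"
  as.POSIXct(x_z, format = "%a %b %d %H:%M:%S %z %Y", tz = "GMT")
}
```

This avoids manual arithmetic on seconds, at the cost of keeping the offsets as strings.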

Depending on how exact your data needs to be, be careful to adjust for daylight saving time if necessary. Also, I don't know much about time zones, but I noticed that some abbreviations map to more than one offset. In the original database, CLT (Chile time) ranges from 3 to 5 hours behind GMT, probably because the file records each zone's historical rule changes (see its time_start column).
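To see which abbreviations are ambiguous before relying on a "first offset" rule, you can count the distinct offsets per abbreviation. A minimal sketch; the function name is hypothetical, and it assumes a data frame with the abbreviation and gmt_offset columns used in the answer's code.

```r
# Return the abbreviations that appear with more than one distinct GMT offset
ambiguous_abbreviations <- function(db) {
  n_offsets <- tapply(db$gmt_offset, db$abbreviation, function(v) length(unique(v)))
  names(n_offsets)[n_offsets > 1]
}
```

Running it on the full database before the duplicated() step lists every abbreviation whose offset depends on which row is kept.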

For this exercise, my code simply takes the first offset listed for each abbreviation in the database and makes no further daylight saving adjustment. This may be sufficient if your work doesn't require such precision, but you should QA and validate your results either way.
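One simple validation step is to stop when an abbreviation has no offset, since those rows would otherwise turn into NA silently. A sketch with a hypothetical helper, check_offsets(), which takes the vector of abbreviations and the matching vector of looked-up offsets:

```r
# Stop with a clear message if any abbreviation has no matching offset
check_offsets <- function(abbr, offsets) {
  unmatched <- unique(abbr[is.na(offsets)])
  if (length(unmatched) > 0) {
    stop("No GMT offset found for: ", paste(unmatched, collapse = ", "))
  }
  invisible(TRUE)
}
```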

Note also that this solution handles date changes: because the offset is subtracted from a full date-time value, a time shifted from 1am back to 11pm also moves back one calendar day.
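A quick check of the date rollover, assuming a clock time of 01:00 in a zone two hours ahead of GMT (for example EET): reading it as GMT and subtracting 2 * 3600 seconds lands at 23:00 on the previous day.

```r
# Clock time 01:00 in a zone at GMT+2, read as if it were GMT
local_as_gmt <- as.POSIXct("2015-12-05 01:00:00", tz = "GMT")
# Subtracting the offset (in seconds) moves the instant, and the date, back
gmt <- local_as_gmt - 2 * 3600
format(gmt, "%Y-%m-%d %H:%M:%S")
# "2015-12-04 23:00:00"
```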

