This is a tricky one. My approach uses regular expressions and the (?(DEFINE)...) syntax, which is only supported by the newer regex module (a third-party package, not the standard-library re). Essentially, DEFINE lets us define subroutines before matching them, so first of all we define all the bricks needed for our date-guessing function:
    bricks = re.compile(r"""
                (?(DEFINE)
                    (?P<year_def>[12]\d{3})
                    (?P<year_short_def>\d{2})
                    (?P<month_def>January|February|March|April|May|June|
                    July|August|September|October|November|December)
                    (?P<month_short_def>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
                    (?P<day_def>(?:0[1-9]|[1-9]|[12][0-9]|3[01]))
                    (?P<weekday_def>(?:Mon|Tue|Wednes|Thurs|Fri|Satur|Sun)day)
                    (?P<weekday_short_def>Mon|Tue|Wed|Thu|Fri|Sat|Sun)
                    (?P<hms_def>\d{2}:\d{2}:\d{2})
                    (?P<hm_def>\d{2}:\d{2})
                    (?P<ms_def>\d{5,6})
                    (?P<delim_def>([-/., ]+|(?<=\d|^)T))
                )

                # actually match them
                (?P<hms>^(?&hms_def)$)|(?P<year>^(?&year_def)$)|(?P<month>^(?&month_def)$)|(?P<month_short>^(?&month_short_def)$)|(?P<day>^(?&day_def)$)|
                (?P<weekday>^(?&weekday_def)$)|(?P<weekday_short>^(?&weekday_short_def)$)|(?P<hm>^(?&hm_def)$)|(?P<delim>^(?&delim_def)$)|(?P<ms>^(?&ms_def)$)
                """, re.VERBOSE)
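If installing regex is not an option, the classification step can be roughly approximated with the standard-library re module, which has no DEFINE: keep the bricks in a dict and build the anchored alternation from it. This is only a sketch of that substitute, not the approach used in this answer; BRICKS, classifier and classify are hypothetical names, and the set of bricks is deliberately reduced.

```python
import re

# Hypothetical, reduced set of bricks; the keys mirror the group names above.
BRICKS = {
    'hms': r'\d{2}:\d{2}:\d{2}',
    'year': r'[12]\d{3}',
    'month': (r'January|February|March|April|May|June|'
              r'July|August|September|October|November|December'),
    'month_short': r'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec',
    'day': r'0[1-9]|[12][0-9]|3[01]|[1-9]',
    'hm': r'\d{2}:\d{2}',
}

# One anchored named group per brick, joined into a single alternation.
# The inner (?:...) keeps each named group free of nested capturing groups.
classifier = re.compile('|'.join(
    '(?P<{}>^(?:{})$)'.format(name, pattern) for name, pattern in BRICKS.items()
))


def classify(token):
    """Return the name of the first brick matching the whole token, or None."""
    match = classifier.match(token)
    return match.lastgroup if match else None


for token in ('2006', 'August', 'Aug', '08:42:51', '16:30', '9', 'xyz'):
    print(token, '->', classify(token))
```

Because the alternatives are tried in dict order, an ambiguous token such as 'May' is reported as 'month' rather than 'month_short'.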
After this, we need to think of possible delimiters:
    # delim
    delim = re.compile(r'([-/., ]+|(?<=\d)T)')
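To see what this pattern does, here is a small illustration using only the standard-library re module (the pattern needs nothing regex-specific). Because the whole pattern sits in a capturing group, split() returns the delimiters along with the parts, which is what later allows the format string to be reassembled.

```python
import re

# Same delimiter pattern as above; the capturing group makes split() keep the delimiters.
delim = re.compile(r'([-/., ]+|(?<=\d)T)')

print(delim.split('2006-11-02'))
# ['2006', '-', '11', '-', '02']

print(delim.split('Aug 9, 1995'))
# ['Aug', ' ', '9', ', ', '1995']

# The lookbehind only treats a 'T' as a delimiter when it follows a digit.
print(delim.split('2017-06-07T22:02:05.001811'))
# ['2017', '-', '06', '-', '07', 'T', '22:02:05', '.', '001811']
```

Note that the colon is not a delimiter, so times such as 22:02:05 stay in one piece and are matched by the hms brick.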
Format mapping:
    # formats
    formats = {'ms': '%f', 'year': '%Y', 'month': '%B', 'month_dec': '%m', 'day': '%d',
               'weekday': '%A', 'hms': '%H:%M:%S', 'weekday_short': '%a',
               'month_short': '%b', 'hm': '%H:%M', 'delim': ''}
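As a quick reminder of what these codes produce, the following sketch formats one fixed timestamp with each of them. The timestamp is arbitrary and chosen only for illustration; the names printed for %A, %a, %B and %b depend on the active locale.

```python
from datetime import datetime

# An arbitrary, fixed timestamp chosen for illustration.
moment = datetime(2006, 8, 10, 8, 42, 51)

# Print what each code from the mapping above expands to.
for code in ('%Y', '%m', '%d', '%H:%M:%S', '%H:%M', '%f', '%A', '%a', '%B', '%b'):
    print(code, '->', moment.strftime(code))

# The purely numeric codes are locale-independent.
numeric = moment.strftime('%Y-%m-%d %H:%M:%S.%f')
print(numeric)
# 2006-08-10 08:42:51.000000
```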
The function GuessFormat() splits the string at the delimiters, tries to match each part against the bricks and outputs the corresponding format code for strftime():
    def GuessFormat(datestring):

        # define the bricks
        bricks = re.compile(r"""
                    (?(DEFINE)
                        (?P<year_def>[12]\d{3})
                        (?P<year_short_def>\d{2})
                        (?P<month_def>January|February|March|April|May|June|
                        July|August|September|October|November|December)
                        (?P<month_short_def>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
                        (?P<day_def>(?:0[1-9]|[1-9]|[12][0-9]|3[01]))
                        (?P<weekday_def>(?:Mon|Tue|Wednes|Thurs|Fri|Satur|Sun)day)
                        (?P<weekday_short_def>Mon|Tue|Wed|Thu|Fri|Sat|Sun)
                        (?P<hms_def>T?\d{2}:\d{2}:\d{2})
                        (?P<hm_def>T?\d{2}:\d{2})
                        (?P<ms_def>\d{5,6})
                        (?P<delim_def>([-/., ]+|(?<=\d|^)T))
                    )

                    # actually match them
                    (?P<hms>^(?&hms_def)$)|(?P<year>^(?&year_def)$)|(?P<month>^(?&month_def)$)|(?P<month_short>^(?&month_short_def)$)|(?P<day>^(?&day_def)$)|
                    (?P<weekday>^(?&weekday_def)$)|(?P<weekday_short>^(?&weekday_short_def)$)|(?P<hm>^(?&hm_def)$)|(?P<delim>^(?&delim_def)$)|(?P<ms>^(?&ms_def)$)
                    """, re.VERBOSE)

        # delim (the capturing group keeps the delimiters in the split result)
        delim = re.compile(r'([-/., ]+|(?<=\d)T)')

        # formats
        formats = {'ms': '%f', 'year': '%Y', 'month': '%B', 'month_dec': '%m', 'day': '%d',
                   'weekday': '%A', 'hms': '%H:%M:%S', 'weekday_short': '%a',
                   'month_short': '%b', 'hm': '%H:%M', 'delim': ''}

        parts = delim.split(datestring)
        out = []
        for index, part in enumerate(parts):
            try:
                # keep only the named group that actually matched
                brick = dict(filter(lambda x: x[1] is not None, bricks.match(part).groupdict().items()))
                key = next(iter(brick))

                # ambiguities: a day-like number in the second position is read as a month
                if key == 'day' and index == 2:
                    key = 'month_dec'

                # delimiters are copied verbatim, everything else becomes a format code
                item = part if key == 'delim' else formats[key]
                out.append(item)
            except AttributeError:
                # no brick matched (match() returned None): keep the part unchanged
                out.append(part)

        return "".join(out)
A test at the end:
    import regex as re
    from datetime import datetime

    datestrings = [datetime.now().isoformat(), '2006-11-02', 'Thursday, 10 August 2006 08:42:51',
                   'August 9, 1995', 'Aug 9, 1995', 'Thu, 01 Jan 1970 00:00:00', '21/11/06 16:30',
                   '06 Jun 2017 20:33:10']

    # test
    for dt in datestrings:
        print("Date: {}, Format: {}".format(dt, GuessFormat(dt)))
This yields:
    Date: 2017-06-07T22:02:05.001811, Format: %Y-%m-%dT%H:%M:%S.%f
    Date: 2006-11-02, Format: %Y-%m-%d
    Date: Thursday, 10 August 2006 08:42:51, Format: %A, %m %B %Y %H:%M:%S
    Date: August 9, 1995, Format: %B %m, %Y
    Date: Aug 9, 1995, Format: %b %m, %Y
    Date: Thu, 01 Jan 1970 00:00:00, Format: %a, %m %b %Y %H:%M:%S
    Date: 21/11/06 16:30, Format: %d/%m/%d %H:%M
    Date: 06 Jun 2017 20:33:10, Format: %d %b %Y %H:%M:%S
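One possible way to sanity-check a guessed format is a round trip through the standard library: parse the string with the format and format it again. The helper below is a hypothetical addition (round_trips is not part of the answer's code), sketched on the assumption that an exact reproduction of the input is the desired criterion.

```python
import re
from datetime import datetime


def round_trips(datestring, fmt):
    """Return True if parsing with fmt and formatting again reproduces datestring."""
    try:
        return datetime.strptime(datestring, fmt).strftime(fmt) == datestring
    except (ValueError, re.error):
        # ValueError: the string does not fit the format.
        # re.error is caught defensively, in case a malformed format
        # (for example a repeated directive) surfaces as a regex error.
        return False


print(round_trips('2006-11-02', '%Y-%m-%d'))
# True
print(round_trips('2017-06-07T22:02:05.001811', '%Y-%m-%dT%H:%M:%S.%f'))
# True
print(round_trips('2006-11-02', '%d/%m/%Y'))
# False
```

The check is strict: a day written without zero padding, such as the 9 in 'August 9, 1995', parses with %d but is formatted back as 09, so it does not round-trip.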