I'm currently working on a report using PANDAS that takes data from an API and formats it for some people. The main issue revolves around a column called "Status Date". This column pulls data from two API fields: one that displays when an application is filed, and one that displays when the application is awarded. As of right now we're just dealing with filed scholarships, and I have code written that merges the two fields based on which one is null:
merged.loc[merged["Status Date"].isnull(),'Status Date'] = merged["Categorized At"]
This does the job. However, both API fields are hardcoded with UTC+0000.
This is an issue, because out UTC offset is -0600 and the API pulls in +0000. This would cause an issue where any scholarships filed (the data we're working with) after 18:00:00 would be categorized into the next day. For example:
Status Date
12/5/20
when it should be:
Status Date
12/4/20
To fix this, I used Pytz to write a code that would turn anything in this column into a DateTime object and localize it:
merged['Status Date'] = pd.to_datetime(merged['Status Date'], errors='coerce')
merged['Status Date'] = merged['Status Date'].dt.tz_localize('UTC').dt.tz_convert('America/Chicago').dt.date
However, this managed to create new issues. While anything that was categorized after 18:00:00 (like the above example) was fixed, there are now some dates that are pulling the previous date. For example:
Status Date
1/4/21
should be
Status Date
1/5/21
I'm completely lost with how to fix this. I can't tell if there's any major issues with my code, and we need this fixed soon.
question from:
https://stackoverflow.com/questions/65941344/completely-lost-on-an-issue-with-timezones-using-pytz-and-python 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…