python - Pandas: create timestamp from 3 columns: Month, Day, Hour

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm using Python 2.7, pandas 0.14.1-2, and numpy 1.8.1-1. I have to use Python 2.7 because I'm coupling it with something that doesn't work on Python 3.

I'm trying to analyze a CSV file that outputs Month, Day and Hour in separate columns; it looks something like the following:

Month Day Hour Value
1     1   1    105
1     1   2    30
1     1   3    85
1     1   4    52
1     1   5    65

I basically want to create a timestamp from those columns, using 2005 as the year, and set this new timestamp column as the index. I've read a lot of similar questions (here and here), but they all rely on doing it during read_csv(). I don't have a year column, so I don't think this applies to me (short of loading the dataframe, inserting a column, writing it out, and calling read_csv() again, which seems convoluted).
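
A minimal sketch of that goal, assuming a current pandas on Python 3: the DataFrame form of pd.to_datetime used here (columns named year/month/day/hour) is not available in the pandas 0.14 mentioned above, so this would not run on the asker's setup. The sample rows are read from an inline string via io.StringIO, and the whitespace separator is an assumption about the file's layout.

```python
import io

import pandas as pd

# The sample rows from the question, typed in inline. Assumption: the file is
# whitespace-separated like this; use sep="," for a comma-separated CSV.
raw = """Month Day Hour Value
1 1 1 105
1 1 2 30
1 1 3 85
1 1 4 52
1 1 5 65
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# pd.to_datetime can assemble timestamps from a DataFrame whose columns are
# named year/month/day/hour; the scalar 2005 is broadcast to every row.
parts = pd.DataFrame({
    "year": 2005,
    "month": df["Month"],
    "day": df["Day"],
    "hour": df["Hour"],
})

# Use the assembled timestamps as the index, named "Datetime".
df = df.set_index(pd.to_datetime(parts).rename("Datetime"))
print(df)
```

This never writes the file back out or re-reads it; the year is supplied only at the point where the timestamp is built.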

After loading the dataframe, I insert a Year column at position 0 with df.insert(0, "Year", 2005).

So now I've got

Year Month Day Hour Value
2005 1     1   1    105
2005 1     1   2    30
2005 1     1   3    85
2005 1     1   4    52
2005 1     1   5    65

df.dtypes tells me that all columns are int64.
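
To reproduce the column insertion and the dtype check, here is a small sketch that builds the sample rows directly instead of reading the file (an assumption made so the snippet is self-contained). Note that the attribute is df.dtypes.

```python
import pandas as pd

# The question's sample rows, built in memory rather than loaded from CSV.
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1],
    "Day": [1, 1, 1, 1, 1],
    "Hour": [1, 2, 3, 4, 5],
    "Value": [105, 30, 85, 52, 65],
})

# Insert a constant Year column at position 0, as in the question.
df.insert(0, "Year", 2005)

# DataFrame.dtypes (not df.types) lists one dtype per column.
print(df.dtypes)
```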

Then I've tried doing this:

df['Datetime'] = pd.to_datetime(df.Year*1000000 + df.Month*10000 + df.Day+100 + df.Hour, format="%Y%M%d%H")

But I'm getting "TypeError: 'long' object is unsliceable"

On the other hand, the following runs without errors.

df['Datetime'] = pd.to_datetime(df.Year*10000 + df.Month*100 + df.Day, format="%Y%M%d")
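
For comparison, a sketch of the single-step integer-packing idea with two changes: the day is multiplied by 100 (adding 100, as in the first attempt, does not produce a YYYYMMDDHH layout), and the month directive is %m, since %M means minutes in strptime-style formats. The packed numbers are converted to strings before parsing. This assumes a current pandas on Python 3 and uses a few made-up rows for illustration; whether it avoids the TypeError on pandas 0.14 / Python 2.7 is untested.

```python
import pandas as pd

# A few illustrative rows (made up, not from the question's file).
df = pd.DataFrame({
    "Year": [2005, 2005, 2005],
    "Month": [1, 6, 12],
    "Day": [1, 15, 31],
    "Hour": [1, 12, 23],
})

# Pack into YYYYMMDDHH: every field is shifted by a multiplication,
# so Day must be multiplied by 100 rather than having 100 added.
packed = df.Year * 1000000 + df.Month * 10000 + df.Day * 100 + df.Hour

# %m is month (%M would be minutes); parse the packed values as strings.
df["Datetime"] = pd.to_datetime(packed.astype(str), format="%Y%m%d%H")
print(df)
```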

Since Python 2.7 doesn't like %Y%M%d%H, as pointed out by @EdChum, I've tried doing it in two steps: first creating a datetime without the hours, then adding the hours afterwards. But the output is not what I wanted:

In [1]: # Do it without hours first (otherwise doesn't work in Python 2.7)
df['Datetime'] = pd.to_datetime(df.Year*10000 + df.Month*100 + df.Day, format="%Y%M%d")

In [2]: df['Datetime']
Out [2]:
0    2005-01-01 00:01:00
1    2005-01-01 00:01:00
...
13   2005-01-01 00:01:00
14   2005-01-01 00:01:00
...
8745   2005-01-31 00:12:00
8746   2005-01-31 00:12:00
...
8758   2005-01-31 00:12:00
8759   2005-01-31 00:12:00

Row 8758, for example, is supposed to be 2005-12-31. What is wrong here?
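
The output pattern is consistent with the format string rather than the data: in strptime-style formats %M is minutes and %m is month, so the month value lands in the minutes field and the month defaults to January. A stdlib-only sketch reproducing the last row:

```python
from datetime import datetime

# Year*10000 + Month*100 + Day for 31 December 2005, as a string.
packed = "20051231"

# With %M the "12" is read as minutes and the month defaults to January,
# matching the 2005-01-31 00:12:00 values in the output.
wrong = datetime.strptime(packed, "%Y%M%d")

# With %m the "12" is read as the month, as intended.
right = datetime.strptime(packed, "%Y%m%d")

print(wrong)  # 2005-01-31 00:12:00
print(right)  # 2005-12-31 00:00:00
```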

Once I resolve that, I'll be able to re-add the hours:

In [3]: # Then add the hours
df['Datetime'] = df['Datetime'] + pd.to_timedelta(df['Hour'], unit="h")
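
With %m in place of %M, the two-step approach should give the intended timestamps. A sketch, assuming a current pandas on Python 3 and a few made-up rows:

```python
import pandas as pd

# Illustrative rows (made up, not from the question's file).
df = pd.DataFrame({
    "Year": [2005, 2005, 2005],
    "Month": [1, 1, 12],
    "Day": [1, 1, 31],
    "Hour": [1, 2, 23],
})

# Step 1: the date only, parsed with %m (month) rather than %M (minutes).
dates = df.Year * 10000 + df.Month * 100 + df.Day
df["Datetime"] = pd.to_datetime(dates.astype(str), format="%Y%m%d")

# Step 2: add the hours as a timedelta, as in the question.
df["Datetime"] = df["Datetime"] + pd.to_timedelta(df["Hour"], unit="h")
print(df["Datetime"])
```

One possible advantage of the timedelta step: if the file's hours run from 1 to 24 (the sample starts at 1, so this is a guess), an Hour of 24 simply rolls over to midnight of the next day, whereas %H only accepts 00 to 23.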

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:14:49+0000

Letting the pandas parser do the heavy lifting (as in the first answer) is obviously the best option if you are reading the data from a CSV file. If you are getting or calculating the numbers some other way, try:

import datetime  # needed for datetime.datetime in the lambda
df['DateTime'] = df[['Year', 'Month', 'Day', 'Hour']].apply(lambda s: datetime.datetime(*s), axis=1)  # one datetime per row

I find that this is still easy to read and very flexible.
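
A self-contained run of that row-wise apply on the question's sample rows, assuming a current pandas on Python 3; the final set_index call is an addition that matches the question's goal of indexing by timestamp.

```python
import datetime

import pandas as pd

# The question's sample rows, with the Year column already inserted.
df = pd.DataFrame({
    "Year": [2005] * 5,
    "Month": [1] * 5,
    "Day": [1] * 5,
    "Hour": [1, 2, 3, 4, 5],
    "Value": [105, 30, 85, 52, 65],
})

# Row-wise: unpack (Year, Month, Day, Hour) into datetime.datetime.
df["DateTime"] = df[["Year", "Month", "Day", "Hour"]].apply(
    lambda s: datetime.datetime(*s), axis=1
)

# Use the new column as the index.
df = df.set_index("DateTime")
print(df["Value"])
```

Because apply with axis=1 calls the lambda once per row in Python, it is typically slower than building the timestamps column-wise, though for about 8760 hourly rows that is unlikely to matter.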
