Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
216 views
in Technique[技术] by (71.8m points)

python - pandas 0.21.0 Timestamp compatibility issue with matplotlib

I just updated pandas from 0.17.1 to 0.21.0 to take advantage of some new functionalities, and ran into compatibility issue with matplotlib (which I also updated to latest 2.1.0). In particular, the Timestamp object seems to be changed significantly.

I happen to have another machine still running the older versions of pandas(0.17.1)/matplotlib(1.5.1) which I used to compared the differences:

Both versions show my DataFrame index to be dtype='datetime64[ns]

DatetimeIndex(['2017-03-13', '2017-03-14', ... '2017-11-17'], type='datetime64[ns]', name='dates', length=170, freq=None)

But when calling type(df.index[0]), 0.17.1 gives pandas.tslib.Timestamp and 0.21.0 gives pandas._libs.tslib.Timestamp.

When plotting with df.index as x-axis:

plt.plot(df.index, df['data'])

matplotlibs by default formats the x-axis labels as dates for pandas 0.17.1 but fails to recognize it for pandas 0.21.0 and simply gives raw number 1.5e18 (epoch time in nanosec).

I also have a customized cursor that reports clicked location on the graph by using matplotlib.dates.DateFormatter on the x-value which fails for 0.21.0 with:

OverflowError: signed integer is greater than maximum

I can see in debug the reported x-value is around 736500 (i.e. day count since year 0) for 0.17.1 but is around 1.5e18 (i.e. nanosec epoch time) for 0.21.0.

I am surprised at this break of compatibility between matplotlib and pandas as they are obviously used together by most people. Am I missing something in the way I call the plot function above for the newer versions?

Update as I mentioned above, I prefer directly calling plot with a given axes object but just for the heck of it, I tried calling the plot method of the DataFrame itself df.plot(). As soon as this is done, all subsequent plots correctly recognize the Timestamp within the same python session. It's as if an environment variable is set, because I can reload another DataFrame or create another axes with subplots and no where does the 1.5e18 show up. This really smells like a bug as the latest pandas doc says pandas:

The plot method on Series and DataFrame is just a simple wrapper around plt.plot()

But clearly it does something to the python session such that subsequent plots deal with the Timestamp index properly.

In fact, simply running the example at the above pandas link:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

Depending on whether ts.plot() is called or not, the following plot either correctly formats x-axis as dates or not:

plt.plot(ts.index,ts)
plt.show()

Once a member plot is called, subsequently calling plt.plot on new Series or DataFrame will autoformat correctly without needing to call the member plot method again.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is an issue with pandas datetimes and matplotlib coming from the recent release of pandas 0.21, which does not register its converters any more at import. Once you use those converters once (within pandas) they'll be registered and automatically used by matplotlib as well.

A workaround would be to register them manually,

import pandas.plotting._converter as pandacnv
pandacnv.register()

In any case the issue is well known at both pandas and matplotlib side, so there will be some kind of fix for the next releases. Pandas is thinking about readding the register in an upcomming release. So this issue may be there only temporarily. An option is also to revert to pandas 0.20.x where this should not occur.

Update: this is no longer an issue with current versions of matplotlib (2.2.2)/pandas(0.23.1), and likely many that have been released since roughly December 2017, when this was fixed.

Update 2: As of pandas 0.24 or higher the recommended way to register the converters is

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

or if pandas is already imported as pd,

pd.plotting.register_matplotlib_converters()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...