I am trying to plot some data in pandas and the inbuilt plot function conveniently plots one line per column. What I want to do is to manually assign each line a color based on a classification I make.
The following works:
df = pd.DataFrame({'1': [1, 2, 3, 4], '2': [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=['1','2'])
df.plot(color = s)
But when my indices are integers it no longer works and throws as KeyError:
df = pd.DataFrame({1: [1, 2, 3, 4], 2: [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=[1,2])
df.plot(color = s)
The way I understand it is that when an integer index is used it somehow has to start from 0. That is my guess since the following works as well:
df = pd.DataFrame({0: [1, 2, 3, 4], 1: [1, 2, 1, 2]})
s = pd.Series(['c','y'], index=[1,0])
df.plot(color = s)
My question is:
- What is happening here?
- Assuming I have an integer index that does not start from 0 or is not formed of successive numbers, how can I make this work without having to convert the index to string or reindex starting from 0?
EDIT:
I realised that even in the first case, the code doesn't do what I expected it to do.
It seems like pandas matches the index of DataFrame and Series only if both are integer indices starting from 0. If that isn't the case, a KeyError is thrown or if the index is a str the order of the elements is used.
Is this correct? And is there a way to match the Series and DataFrame indices? Or do I have to make sure I pass a list of colours in the right order?
See Question&Answers more detail:
os