Here is a solution using Pandas and Matplotlib that gives more fine-grained control.
First, here is a function that generates a random dataframe for testing. Importantly, it creates three columns that generalize to more abstract problems:
- my_timestamp is a datetime column containing timestamps
- my_series is the string label to which you want to apply the groupby
- my_value is a numeric value recorded for my_series at time my_timestamp
Replace these column names with the ones from your own dataframe.
    import random

    import pandas as pd


    def generate_random_data(N=100):
        '''
        Returns a dataframe with N rows of random data.
        '''
        list_of_lists = []
        labels = ['foo', 'bar', 'baz']
        epoch = 1515617110  # starting Unix timestamp, in seconds
        for _ in range(N):
            key = random.choice(labels)
            # Give each label its own value range so the lines are easy to tell apart.
            if key == 'foo':
                value = random.randint(1, 10)
            elif key == 'bar':
                value = random.randint(50, 60)
            else:
                value = random.randint(80, 90)
            # Advance time by a random step so the timestamps keep increasing.
            epoch += random.randint(5000, 30000)
            list_of_lists.append([key, epoch, value])
        df = pd.DataFrame(list_of_lists, columns=['my_series', 'epoch', 'my_value'])
        df['my_timestamp'] = pd.to_datetime(df['epoch'], unit='s')
        df = df[['my_timestamp', 'my_series', 'my_value']]
        return df
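If you are starting from your own data instead of the generator, the main requirement is that the timestamp column has a datetime dtype. Here is a minimal sketch, assuming a hypothetical dataframe whose columns are named time, sensor, and reading and whose timestamps are stored as epoch seconds (the names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical input data; the column names and values are illustrative only.
raw = pd.DataFrame({
    'time': [1515789910, 1515617110, 1515703510],
    'sensor': ['foo', 'foo', 'bar'],
    'reading': [7, 3, 55],
})

# Convert epoch seconds to datetimes so the column can be used on a date axis.
raw['time'] = pd.to_datetime(raw['time'], unit='s')

# Sort chronologically so each group's line is drawn from left to right.
raw = raw.sort_values('time').reset_index(drop=True)

print(raw.dtypes)
```

The names time, sensor, and reading would then be what you pass as the column-name arguments to the plotting function.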
Here is some example data that was generated:
Now, the following function runs the groupby and plots one line per group on a shared time axis.
    import matplotlib.pyplot as plt
    from matplotlib.dates import DayLocator, DateFormatter


    def plot_gb_time_series(df, ts_name, gb_name, value_name, figsize=(20, 7), title=None):
        '''
        Runs groupby on Pandas dataframe and produces a time series chart.

        Parameters
        ----------
        df : Pandas dataframe
        ts_name : string
            The name of the df column that has the datetime timestamp x-axis values.
        gb_name : string
            The name of the df column to perform group-by.
        value_name : string
            The name of the df column for the y-axis.
        figsize : tuple of two integers
            Figure size of the resulting plot, e.g. (20, 7)
        title : string
            Optional title
        '''
        xtick_locator = DayLocator(interval=1)  # one major tick per day
        xtick_dateformatter = DateFormatter('%m/%d/%Y')
        fig, ax = plt.subplots(figsize=figsize)
        # Group by the column name itself (not a one-element list) so that
        # each key is a plain label rather than a one-element tuple.
        for key, grp in df.groupby(gb_name):
            # Draw every group as its own line on the same axes.
            ax = grp.plot(ax=ax, kind='line', x=ts_name, y=value_name, label=key, marker='o')
        ax.xaxis.set_major_locator(xtick_locator)
        ax.xaxis.set_major_formatter(xtick_dateformatter)
        ax.autoscale_view()
        ax.legend(loc='upper left')
        plt.xticks(rotation=90)
        plt.grid()
        plt.xlabel('')
        # Leave 25% headroom above the largest value so the legend does not cover the data.
        plt.ylim(0, df[value_name].max() * 1.25)
        plt.ylabel(value_name)
        if title is not None:
            plt.title(title)
        plt.show()
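To see what the loop in the function iterates over, here is a small sketch on a hand-made dataframe (the values are made up for illustration). Iterating over a groupby object yields one (key, sub-dataframe) pair per distinct label, and each sub-dataframe becomes one line on the shared axes. Passing the column name as a plain string, as here, gives plain string keys; recent pandas versions return one-element tuples when a one-element list is passed instead.

```python
import pandas as pd

# Tiny illustrative dataframe with the same three columns as above.
df = pd.DataFrame({
    'my_timestamp': pd.to_datetime(
        ['2018-01-10', '2018-01-11', '2018-01-12', '2018-01-13']),
    'my_series': ['foo', 'bar', 'foo', 'bar'],
    'my_value': [3, 55, 7, 58],
})

group_sizes = {}
for key, grp in df.groupby('my_series'):
    # key is the label; grp contains only the rows carrying that label.
    group_sizes[key] = len(grp)
    print(key, grp['my_value'].tolist())
# prints:
# bar [55, 58]
# foo [3, 7]
```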
Here is an example invocation:
    df = generate_random_data()
    plot_gb_time_series(df, 'my_timestamp', 'my_series', 'my_value',
                        figsize=(10, 5), title="Random data")
And here is the resulting time series plot: