Plotting time-series data

Most of the data that we want to plot with Matplotlib will be in tabular format. In the second demo of this week, we will make some plots to display daily reservoir levels of Fall Creek Reservoir in the Willamette National Forest.


Daily reservoir levels

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data/fall_creek_levels.csv')
df
date level
0 2020-11-05 677.94
1 2020-11-06 691.36
2 2020-11-07 693.29
3 2020-11-08 694.50
4 2020-11-09 694.56
... ... ...
582 2022-06-10 829.63
583 2022-06-11 830.73
584 2022-06-12 830.90
585 2022-06-13 830.65
586 2022-06-14 830.19

587 rows × 2 columns

We can see that we have a table with two columns. The first column contains dates (from Nov 5, 2020 to June 14, 2022) and the second column contains reservoir level (in feet). Before we plot, we need to make sure that Pandas has interpreted the date column as dates.

df.dtypes
date      object
level    float64
dtype: object

While the level column has been interpreted as float64, the date column has not been recognized as dates. We have to tell Pandas that this column contains dates by using the keyword argument parse_dates when we read the file.

df = pd.read_csv('data/fall_creek_levels.csv', parse_dates=['date'])
df
date level
0 2020-11-05 677.94
1 2020-11-06 691.36
2 2020-11-07 693.29
3 2020-11-08 694.50
4 2020-11-09 694.56
... ... ...
582 2022-06-10 829.63
583 2022-06-11 830.73
584 2022-06-12 830.90
585 2022-06-13 830.65
586 2022-06-14 830.19

587 rows × 2 columns

Now when we display the column data types, we find that the date column has been interpreted as datetime64[ns], the NumPy datetime type.

df.dtypes
date     datetime64[ns]
level           float64
dtype: object
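If a file has already been read without parse_dates, the column can also be converted afterwards with pd.to_datetime. The following is a minimal sketch under that assumption; the in-memory CSV text and the name df_demo are stand-ins for illustration, not part of the demo files.

```python
import io
import pandas as pd

# Three-row in-memory sample standing in for data/fall_creek_levels.csv
csv_text = """date,level
2020-11-05,677.94
2020-11-06,691.36
2020-11-07,693.29
"""

# Read without parse_dates: the date column is loaded as text
df_demo = pd.read_csv(io.StringIO(csv_text))
was_datetime = pd.api.types.is_datetime64_any_dtype(df_demo['date'])

# Convert the text column to datetimes after loading
df_demo['date'] = pd.to_datetime(df_demo['date'])
is_datetime = pd.api.types.is_datetime64_any_dtype(df_demo['date'])

print(was_datetime, is_datetime)
```

Either route should give the same result; parse_dates simply saves a step when we know in advance which columns hold dates.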

Note

parse_dates will automatically recognize and convert many common date formats. But if the dates and times are formatted unusually, we might have to specify the format. In pandas 2.0 and later, we can do that by passing a strftime-style format string to the date_format keyword argument, which replaces the deprecated date_parser argument.

df = pd.read_csv('data/fall_creek_levels.csv', parse_dates=['date'], date_format='%Y-%m-%d')
df['date']
0     2020-11-05
1     2020-11-06
2     2020-11-07
3     2020-11-08
4     2020-11-09
         ...    
582   2022-06-10
583   2022-06-11
584   2022-06-12
585   2022-06-13
586   2022-06-14
Name: date, Length: 587, dtype: datetime64[ns]
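An explicit format matters most when the date strings are ambiguous. The sketch below uses made-up day/month/year strings (not taken from the reservoir file) to show how a format string removes the ambiguity when converting with pd.to_datetime.

```python
import pandas as pd

# Hypothetical day/month/year strings: '05/11/2020' could be read as
# 5 November or 11 May unless we state the format
raw = pd.Series(['05/11/2020', '06/11/2020', '07/11/2020'])

# The format string tells pandas that the day comes first
dates = pd.to_datetime(raw, format='%d/%m/%Y')

print(dates.dt.month.tolist())
```

With the format given, all three values are interpreted as dates in November.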

Now that we have read our dataset as a DataFrame, we can plot it easily.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['date'].values, df['level'].values, linewidth=2)
ax.set_title('Fall Creek Reservoir levels', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel('Level (ft)', fontsize=14)
ax.grid()
plt.show()
../_images/06b-demo_13_0.png

This looks great, but the tick labels on the x-axis are difficult to read. We can edit the tick labels using the formatters in the matplotlib.dates module, imported here as mdates.

import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['date'].values, df['level'].values, linewidth=2)
ax.set_title('Fall Creek Reservoir levels', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel('Level (ft)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y')) # <-----

plt.show()
../_images/06b-demo_16_0.png

Note

We formatted the dates using an abbreviated month name (%b), followed by a newline (\n), followed by the year (%Y). See the table of strftime format codes in the Python datetime documentation for more options.
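A formatter controls how tick labels are written, while a locator controls where the ticks are placed. The sketch below combines DateFormatter with MonthLocator on a synthetic daily series; the values are made up and the three-month interval is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic daily series (made-up values, not the reservoir data)
dates = pd.date_range('2021-01-01', '2021-12-31', freq='D')
values = np.sin(np.arange(len(dates)) / 58.0)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates.values, values)

# Place one major tick on the first day of every third month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))

# Label each tick with the abbreviated month over the year
formatter = mdates.DateFormatter('%b\n%Y')
ax.xaxis.set_major_formatter(formatter)

# A formatter can also be called directly on a Matplotlib date number
example_label = formatter(mdates.date2num(pd.Timestamp('2021-04-01').to_pydatetime()))
plt.close(fig)
```

Calling the formatter directly, as in the last step, is a quick way to check a format string before applying it to a figure.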

5-minute air temperature

Next we will plot some time-series data with higher temporal resolution. This file contains air temperatures from a U.S. Climate Reference Network weather station near Corvallis.

df = pd.read_csv('data/corvallis_air_temp.csv')
df
date time air_temp
0 20220610 10 24.3
1 20220610 15 24.0
2 20220610 20 23.7
3 20220610 25 23.1
4 20220610 30 22.7
... ... ... ...
1650 20220615 1740 13.9
1651 20220615 1745 14.7
1652 20220615 1750 14.6
1653 20220615 1755 14.9
1654 20220615 1800 14.8

1655 rows × 3 columns

When we have a look at the data, we find that the dates are in one column and the times (as HHMM values without leading zeros) are in another. So we will read both columns as text, pad the times to four digits, and combine them into a single datetime column that displays in ISO 8601 format (i.e. yyyy-mm-dd hh:mm:ss).

df = pd.read_csv('data/corvallis_air_temp.csv', dtype={'date': str, 'time': str})
# Pad times such as '10' to '0010' so that they parse as hours and minutes (00:10)
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'].str.zfill(4), format='%Y%m%d %H%M')
df = df[['datetime', 'air_temp']]
df
datetime air_temp
0 2022-06-10 00:10:00 24.3
1 2022-06-10 00:15:00 24.0
2 2022-06-10 00:20:00 23.7
3 2022-06-10 00:25:00 23.1
4 2022-06-10 00:30:00 22.7
... ... ...
1650 2022-06-15 17:40:00 13.9
1651 2022-06-15 17:45:00 14.7
1652 2022-06-15 17:50:00 14.6
1653 2022-06-15 17:55:00 14.9
1654 2022-06-15 18:00:00 14.8

1655 rows × 2 columns

df.dtypes
datetime    datetime64[ns]
air_temp           float64
dtype: object
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['datetime'].values, df['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))

plt.show()
../_images/06b-demo_23_0.png

To plot air temperatures for June 13, we could slice the DataFrame using the [start:end] syntax we learnt in Week 1.

df_slice = df[863:1150]
df_slice
datetime air_temp
863 2022-06-13 00:05:00 13.1
864 2022-06-13 00:10:00 13.0
865 2022-06-13 00:15:00 12.7
866 2022-06-13 00:20:00 12.6
867 2022-06-13 00:25:00 12.5
... ... ...
1145 2022-06-13 23:35:00 14.9
1146 2022-06-13 23:40:00 14.9
1147 2022-06-13 23:45:00 14.7
1148 2022-06-13 23:50:00 14.5
1149 2022-06-13 23:55:00 15.0

287 rows × 2 columns

But this is a little unwieldy because we have to manually find the right index for the start of June 13. A better way of doing this would be to slice by date and time. We can only do this if we make our datetime column the index column.

df.set_index('datetime', inplace=True)
df
air_temp
datetime
2022-06-10 00:10:00 24.3
2022-06-10 00:15:00 24.0
2022-06-10 00:20:00 23.7
2022-06-10 00:25:00 23.1
2022-06-10 00:30:00 22.7
... ...
2022-06-15 17:40:00 13.9
2022-06-15 17:45:00 14.7
2022-06-15 17:50:00 14.6
2022-06-15 17:55:00 14.9
2022-06-15 18:00:00 14.8

1655 rows × 1 columns

Now we can slice the DataFrame using datetimes. This time we have to use the .loc indexer to make it clear that we are referring to the index labels of the DataFrame.

df_slice = df.loc['2022-06-13 00:00:00':'2022-06-13 23:55:00']
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_slice.index.values, df_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (UTC)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

plt.show()
../_images/06b-demo_30_0.png

Note

Now that we have set the index column to datetime, we refer to the x-axis as df_slice.index.
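With a datetime index, .loc also accepts a partial date string, which selects every row on that day without spelling out the start and end times. The sketch below illustrates this on a synthetic 5-minute series; the values and the name df_demo are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series covering three days (made-up values)
idx = pd.date_range('2022-06-12 00:00', '2022-06-14 23:55', freq='5min')
df_demo = pd.DataFrame({'air_temp': np.linspace(10, 20, len(idx))}, index=idx)

# Explicit start and end timestamps; both endpoints are included
explicit = df_demo.loc['2022-06-13 00:00:00':'2022-06-13 23:55:00']

# Partial-string indexing: a date alone selects every row on that day
whole_day = df_demo.loc['2022-06-13']

print(len(explicit), len(whole_day))
```

Unlike integer slicing with [start:end], label-based slicing with .loc includes the end label.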

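Strangely, the lowest air temperatures occur at noon. This is because our data are in UTC. So we need to convert to Pacific time by subtracting eight hours from our datetime index. Note that eight hours is the offset for Pacific Standard Time; during daylight saving time, which includes June, the offset is seven hours, so the times below are one hour behind local clock time.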

df['pacific_time'] = df.index + pd.DateOffset(hours=-8)
df
air_temp pacific_time
datetime
2022-06-10 00:10:00 24.3 2022-06-09 16:10:00
2022-06-10 00:15:00 24.0 2022-06-09 16:15:00
2022-06-10 00:20:00 23.7 2022-06-09 16:20:00
2022-06-10 00:25:00 23.1 2022-06-09 16:25:00
2022-06-10 00:30:00 22.7 2022-06-09 16:30:00
... ... ...
2022-06-15 17:40:00 13.9 2022-06-15 09:40:00
2022-06-15 17:45:00 14.7 2022-06-15 09:45:00
2022-06-15 17:50:00 14.6 2022-06-15 09:50:00
2022-06-15 17:55:00 14.9 2022-06-15 09:55:00
2022-06-15 18:00:00 14.8 2022-06-15 10:00:00

1655 rows × 2 columns

df.set_index('pacific_time', inplace=True)
df
air_temp
pacific_time
2022-06-09 16:10:00 24.3
2022-06-09 16:15:00 24.0
2022-06-09 16:20:00 23.7
2022-06-09 16:25:00 23.1
2022-06-09 16:30:00 22.7
... ...
2022-06-15 09:40:00 13.9
2022-06-15 09:45:00 14.7
2022-06-15 09:50:00 14.6
2022-06-15 09:55:00 14.9
2022-06-15 10:00:00 14.8

1655 rows × 1 columns

Now we can slice using the same syntax as before.

df_pacific_slice = df.loc['2022-06-13 00:00:00':'2022-06-13 23:55:00']

And we produce a more logical figure showing highest air temperatures at around 1 pm.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])

plt.show()
../_images/06b-demo_38_0.png
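A fixed offset does not account for daylight saving time. As an alternative, pandas can attach a time zone to the index and convert it, so the correct offset is chosen for each date. The sketch below assumes the timestamps are recorded in UTC and uses the IANA zone name 'America/Los_Angeles' for Pacific time; the hourly data are synthetic.

```python
import pandas as pd

# Synthetic hourly timestamps recorded in UTC (made-up values, one day in June)
utc_index = pd.date_range('2022-06-13 00:00', periods=24, freq='60min')
df_demo = pd.DataFrame({'air_temp': range(24)}, index=utc_index)

# Mark the naive timestamps as UTC, then convert to the local time zone
local = df_demo.tz_localize('UTC').tz_convert('America/Los_Angeles')

# Inspect the first converted timestamp and its offset from UTC in hours
first_local = local.index[0]
offset_hours = first_local.utcoffset().total_seconds() / 3600

print(first_local, offset_hours)
```

The resulting index is time zone aware, so slicing with .loc works as before and the tick labels follow local clock time.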

Add some extra information

We can add additional information to our plots to make specific points. For example, we could add a dashed vertical line (vlines) to show when the maximum air temperature occurred, or a dashed horizontal line (hlines) to show the value of the maximum air temperature.

Note

The hlines function has the following syntax: Axes.hlines(y, xmin, xmax, colors=None, linestyles='solid', label='', *, data=None, **kwargs). The vlines function takes the analogous arguments x, ymin and ymax.

# Identify the time and value of the maximum air temperature
highest_temp_idx = df_pacific_slice['air_temp'].idxmax()
highest_temp = df_pacific_slice['air_temp'].max()
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])
ax.set_ylim(0, 20)

ax.vlines(highest_temp_idx, 0, highest_temp, color='k', ls='dashed')
ax.hlines(highest_temp, xmin=df_pacific_slice.index[0], xmax=highest_temp_idx, color='k', ls='dashed')
plt.show()
../_images/06b-demo_42_0.png
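The guide lines rely on the difference between idxmax, which returns the index label of the largest value, and max, which returns the value itself. The sketch below shows this on four made-up temperature readings and then draws the two dashed lines from those results.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up temperatures at four times on one day
times = pd.to_datetime(['2022-06-13 06:00', '2022-06-13 10:00',
                        '2022-06-13 14:00', '2022-06-13 18:00'])
temps = pd.Series([9.5, 14.0, 18.2, 15.1], index=times)

peak_time = temps.idxmax()   # index label (a Timestamp) of the largest value
peak_value = temps.max()     # the largest value itself

fig, ax = plt.subplots()
ax.plot(temps.index.values, temps.values)

# Vertical line from the x-axis up to the peak, horizontal line from the left edge to the peak
vline = ax.vlines(peak_time, 0, peak_value, colors='k', linestyles='dashed')
hline = ax.hlines(peak_value, xmin=temps.index[0], xmax=peak_time, colors='k', linestyles='dashed')
plt.close(fig)

print(peak_time, peak_value)
```

Because the index holds datetimes, peak_time can be passed directly as an x-coordinate.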

We can also add some text to our plots using the annotate method to make them even more informative. In its simplest form, the text is placed at xy. Optionally, the text can be displayed at another position, xytext. An arrow pointing from the text to the annotated point xy can then be added by defining arrowprops.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])
ax.set_ylim(0, 20)

ax.vlines(highest_temp_idx, 0, highest_temp, color='k', ls='dashed')
ax.hlines(highest_temp, xmin=df_pacific_slice.index[0], xmax=highest_temp_idx, color='k', ls='dashed')

ax.annotate(f"{highest_temp:.1f} C at {highest_temp_idx.strftime('%H:%M')}",
            xy=(highest_temp_idx, highest_temp), 
            xytext=(highest_temp_idx+pd.DateOffset(hours=2), highest_temp+1),
            arrowprops=dict(facecolor='black', shrink=0.05, width=1, headwidth=8), fontsize=12)
plt.show()
../_images/06b-demo_44_0.png
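The two forms of annotate can be compared side by side on a small example. The sketch below uses made-up numeric data rather than the air temperature series, and the arrow style '->' is just one of the available options.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data with a peak at x = 2
x = [0, 1, 2, 3, 4]
y = [1.0, 3.0, 4.5, 2.0, 0.5]

fig, ax = plt.subplots()
ax.plot(x, y)

# Simplest form: the text is drawn at the annotated point itself
plain = ax.annotate('peak', xy=(2, 4.5))

# With xytext and arrowprops: the text is offset and an arrow points back to xy
arrowed = ax.annotate('peak: 4.5', xy=(2, 4.5), xytext=(3, 4.0),
                      arrowprops=dict(arrowstyle='->'))
plt.close(fig)
```

Only the second call creates an arrow; without arrowprops the text is shown on its own.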

Summary

In this demo, we were introduced to the power of Pandas for manipulating and plotting data. Next week, we will demonstrate some of the other things we can do using this library.