Plotting timeseries data

Most of the data that we want to plot with Matplotlib will be in tabular format. In the second demo of this week, we will make some plots to display daily reservoir levels of Fall Creek Reservoir in the Willamette National Forest.

Daily reservoir levels

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data/fall_creek_levels.csv')
df

	date	level
0	2020-11-05	677.94
1	2020-11-06	691.36
2	2020-11-07	693.29
3	2020-11-08	694.50
4	2020-11-09	694.56
...	...	...
582	2022-06-10	829.63
583	2022-06-11	830.73
584	2022-06-12	830.90
585	2022-06-13	830.65
586	2022-06-14	830.19

587 rows × 2 columns

We can see that we have a table with two columns. The first column contains dates (from Nov 5, 2020 to June 14, 2022) and the second column contains the reservoir level (in feet). Before we plot, we need to make sure that Pandas has interpreted the date column as dates.

df.dtypes

date      object
level    float64
dtype: object

While the level column has been interpreted as a float64, the date column has not been recognized as dates. We have to alert Pandas that this column contains dates using the keyword argument parse_dates when we read the file.

df = pd.read_csv('data/fall_creek_levels.csv', parse_dates=['date'])
df

	date	level
0	2020-11-05	677.94
1	2020-11-06	691.36
2	2020-11-07	693.29
3	2020-11-08	694.50
4	2020-11-09	694.56
...	...	...
582	2022-06-10	829.63
583	2022-06-11	830.73
584	2022-06-12	830.90
585	2022-06-13	830.65
586	2022-06-14	830.19

587 rows × 2 columns

Now when we display the column data types, we find that the date column has been interpreted as a NumPy datetime.

df.dtypes

date     datetime64[ns]
level           float64
dtype: object
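If the file has already been loaded, the same conversion can be done afterwards with pd.to_datetime. The sketch below is only an illustration: it uses a three-row in-memory CSV (rows copied from the table above) in place of the real file so that it runs on its own.

```python
import io
import pandas as pd

# A three-row stand-in for data/fall_creek_levels.csv (values from the table above)
csv_text = "date,level\n2020-11-05,677.94\n2020-11-06,691.36\n2020-11-07,693.29\n"

sample = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to datetimes after reading
sample['date'] = pd.to_datetime(sample['date'])

print(sample.dtypes)
print(sample['date'].dt.year.tolist())
```

Once the column holds datetimes, the .dt accessor gives access to components such as the year or month.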

Note

parse_dates will automatically recognize and convert many common date formats. But if the dates and times are formatted unusually, we might have to specify the format. In recent versions of Pandas we can do that with the date_format keyword argument, which takes a strftime-style format string (see below). Older code often passed a parser function through date_parser instead, but that argument is now deprecated.

df = pd.read_csv('data/fall_creek_levels.csv', parse_dates=['date'], date_format='%Y-%m-%d')

df['date']

0     2020-11-05
1     2020-11-06
2     2020-11-07
3     2020-11-08
4     2020-11-09
         ...    
582   2022-06-10
583   2022-06-11
584   2022-06-12
585   2022-06-13
586   2022-06-14
Name: date, Length: 587, dtype: datetime64[ns]
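To illustrate why an explicit format can matter, suppose (hypothetically) the dates had been written day-first, such as 05/11/2020. Left to guess, a parser could read that as May 11. Passing a format string to pd.to_datetime removes the ambiguity. The values below are invented for the example and do not come from the reservoir file.

```python
import pandas as pd

# Hypothetical day-first dates (not from the reservoir file)
raw = pd.Series(['05/11/2020', '06/11/2020', '07/11/2020'])

# %d = day, %m = month, %Y = four-digit year
dates = pd.to_datetime(raw, format='%d/%m/%Y')

print(dates)
```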

Now that we have read our dataset as a DataFrame, we can plot it easily.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['date'].values, df['level'].values, linewidth=2)
ax.set_title('Fall Creek Reservoir levels', fontsize=14)
ax.set_ylabel('Level (ft)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
plt.show()

This looks great, but the tick labels on the x-axis are difficult to read. We can edit the tick labels using the matplotlib.dates module, which we import as mdates.

import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['date'].values, df['level'].values, linewidth=2)
ax.set_title('Fall Creek Reservoir levels', fontsize=14)
ax.set_ylabel('Level (ft)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
# New line: label each tick with the month abbreviation above the year
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))

plt.show()

Note

We formatted the dates using an abbreviated month name (%b) followed by a new line (\n) followed by the year (%Y). These are standard strftime format codes; see the table of format codes in the Python datetime documentation for more options.
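The formatter controls how each tick label is written; where the ticks fall is controlled separately by a locator. The following is a minimal sketch, using a synthetic sine curve in place of the reservoir file, that asks for a tick every three months with mdates.MonthLocator. The data and variable names are made up for the illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic daily series covering the same period as the reservoir record
dates = pd.date_range('2020-11-05', '2022-06-14', freq='D')
levels = 750 + 80 * np.sin(np.linspace(0, 3 * np.pi, len(dates)))

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates.values, levels, linewidth=2)
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))   # tick spacing: 3 months
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))  # month above year
fig.canvas.draw()

tick_labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_labels)
```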

5-minute air temperature

Next we will plot some time-series data with higher temporal resolution. This file contains air temperatures from a U.S. Climate Reference Network weather station near Corvallis.

df = pd.read_csv('data/corvallis_air_temp.csv')
df

	date	time	air_temp
0	20220610	10	24.3
1	20220610	15	24.0
2	20220610	20	23.7
3	20220610	25	23.1
4	20220610	30	22.7
...	...	...	...
1650	20220615	1740	13.9
1651	20220615	1745	14.7
1652	20220615	1750	14.6
1653	20220615	1755	14.9
1654	20220615	1800	14.8

1655 rows × 3 columns

When we have a look at the data, we find that the dates are in one column and the times are in another, and that neither is in a form that Pandas recognizes as a datetime. So we read both columns as text, join them, and convert the result to datetimes, which Pandas displays in ISO 8601 format (i.e. yyyy-mm-dd hh:mm:ss).

df = pd.read_csv('data/corvallis_air_temp.csv', dtype={'date': str, 'time': str})
# The time column is HHMM; zfill pads values such as '10' to '0010'
df['datetime'] = pd.to_datetime(df['date'] + df['time'].str.zfill(4), format='%Y%m%d%H%M')
df = df[['datetime', 'air_temp']]
df

	datetime	air_temp
0	2022-06-10 00:10:00	24.3
1	2022-06-10 00:15:00	24.0
2	2022-06-10 00:20:00	23.7
3	2022-06-10 00:25:00	23.1
4	2022-06-10 00:30:00	22.7
...	...	...
1650	2022-06-15 17:40:00	13.9
1651	2022-06-15 17:45:00	14.7
1652	2022-06-15 17:50:00	14.6
1653	2022-06-15 17:55:00	14.9
1654	2022-06-15 18:00:00	14.8

1655 rows × 2 columns

df.dtypes

datetime    datetime64[ns]
air_temp           float64
dtype: object
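One way to check how separate date and time columns combine is to try it on a tiny in-memory sample. This sketch assumes, as the tables above suggest, that the time column holds hours and minutes as HHMM, so that 10 means 00:10. It reads both columns as text, pads the time to four digits, and parses the joined string with an explicit format. The three rows are copied from the table above.

```python
import io
import pandas as pd

csv_text = "date,time,air_temp\n20220610,10,24.3\n20220610,15,24.0\n20220615,1740,13.9\n"

# Read date and time as strings so that nothing is interpreted as a number
sample = pd.read_csv(io.StringIO(csv_text), dtype={'date': str, 'time': str})

# Pad '10' to '0010', join with the date, then parse as YYYYmmddHHMM
stamp = sample['date'] + sample['time'].str.zfill(4)
sample['datetime'] = pd.to_datetime(stamp, format='%Y%m%d%H%M')

print(sample[['datetime', 'air_temp']])
```

Without the padding, a value such as 10 could not be told apart from ten o'clock, which is why the explicit format is used here.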

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df['datetime'].values, df['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
# Raw string (r'...') so that the backslash in \circ is passed through to Matplotlib
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))

plt.show()

To plot air temperatures for June 13, we could slice the DataFrame using the [start:end] syntax we learnt in Week 1.

df_slice = df[863:1150]
df_slice

	datetime	air_temp
863	2022-06-13 00:05:00	13.1
864	2022-06-13 00:10:00	13.0
865	2022-06-13 00:15:00	12.7
866	2022-06-13 00:20:00	12.6
867	2022-06-13 00:25:00	12.5
...	...	...
1145	2022-06-13 23:35:00	14.9
1146	2022-06-13 23:40:00	14.9
1147	2022-06-13 23:45:00	14.7
1148	2022-06-13 23:50:00	14.5
1149	2022-06-13 23:55:00	15.0

287 rows × 2 columns

But this is a little unwieldy because we have to manually find the right index for the start of June 13. A better way of doing this would be to slice by date and time. We can only do this if we make our datetime column the index column.

df.set_index('datetime', inplace=True)
df

	air_temp
datetime
2022-06-10 00:10:00	24.3
2022-06-10 00:15:00	24.0
2022-06-10 00:20:00	23.7
2022-06-10 00:25:00	23.1
2022-06-10 00:30:00	22.7
...	...
2022-06-15 17:40:00	13.9
2022-06-15 17:45:00	14.7
2022-06-15 17:50:00	14.6
2022-06-15 17:55:00	14.9
2022-06-15 18:00:00	14.8

1655 rows × 1 columns

Now we can slice the DataFrame using datetimes. This time we have to use the .loc indexer to make it clear that we are referring to the index labels of the DataFrame.

df_slice = df.loc['2022-06-13 00:00:00':'2022-06-13 23:55:00']

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_slice.index.values, df_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
# Raw string (r'...') so that the backslash in \circ is passed through to Matplotlib
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (UTC)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

plt.show()

Note

Now that we have set the index column to datetime, we refer to the x-axis as df_slice.index.
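With a datetime index, a partial date string is enough to select a whole day, so the first and last timestamps do not have to be spelled out. A minimal sketch with a synthetic 5-minute series (the temperatures are random numbers, not station data, and the names demo and june_13 are made up):

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute index spanning the same period as the station file
index = pd.date_range('2022-06-10 00:10', '2022-06-15 18:00', freq='5min')
rng = np.random.default_rng(0)
demo = pd.DataFrame({'air_temp': rng.normal(15, 3, len(index))}, index=index)

# Partial string indexing: every row whose timestamp falls on June 13
june_13 = demo.loc['2022-06-13']

print(june_13.index[0], june_13.index[-1], len(june_13))
```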

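Strangely, the lowest air temperatures occur at noon. This is because our data are in UTC. So we need to convert to Pacific Standard Time (UTC-8) by subtracting eight hours from our datetime index. Note that clocks in Oregon run on daylight time (UTC-7) in June, so local clock time is one hour ahead of the standard time used here.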

df['pacific_time'] = df.index + pd.DateOffset(hours=-8)
df

	air_temp	pacific_time
datetime
2022-06-10 00:10:00	24.3	2022-06-09 16:10:00
2022-06-10 00:15:00	24.0	2022-06-09 16:15:00
2022-06-10 00:20:00	23.7	2022-06-09 16:20:00
2022-06-10 00:25:00	23.1	2022-06-09 16:25:00
2022-06-10 00:30:00	22.7	2022-06-09 16:30:00
...	...	...
2022-06-15 17:40:00	13.9	2022-06-15 09:40:00
2022-06-15 17:45:00	14.7	2022-06-15 09:45:00
2022-06-15 17:50:00	14.6	2022-06-15 09:50:00
2022-06-15 17:55:00	14.9	2022-06-15 09:55:00
2022-06-15 18:00:00	14.8	2022-06-15 10:00:00

1655 rows × 2 columns

df.set_index('pacific_time', inplace=True)
df

	air_temp
pacific_time
2022-06-09 16:10:00	24.3
2022-06-09 16:15:00	24.0
2022-06-09 16:20:00	23.7
2022-06-09 16:25:00	23.1
2022-06-09 16:30:00	22.7
...	...
2022-06-15 09:40:00	13.9
2022-06-15 09:45:00	14.7
2022-06-15 09:50:00	14.6
2022-06-15 09:55:00	14.9
2022-06-15 10:00:00	14.8

1655 rows × 1 columns
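Subtracting a fixed eight hours gives Pacific Standard Time. If clock time including daylight saving is wanted instead, Pandas can do the conversion from time zone names. A minimal sketch, assuming the timestamps are in UTC and using the IANA zone name America/Los_Angeles (the three timestamps are invented for the example):

```python
import pandas as pd

# Naive timestamps that we know (by assumption) to be in UTC
utc_naive = pd.DatetimeIndex(['2022-06-13 19:00', '2022-06-13 20:00', '2022-01-13 20:00'])

# Attach the UTC zone, then convert to US Pacific time
pacific = utc_naive.tz_localize('UTC').tz_convert('America/Los_Angeles')

print(pacific)
```

The two June timestamps shift by seven hours and the January one by eight, because the offset follows daylight saving automatically.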

Now we can slice using the same syntax as before.

df_pacific_slice = df.loc['2022-06-13 00:00:00':'2022-06-13 23:55:00']

And we produce a more logical figure showing highest air temperatures at around 1 pm.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
# Raw string (r'...') so that the backslash in \circ is passed through to Matplotlib
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])

plt.show()

Add some extra information

We can add additional information to our plots to make specific points. For example, we could add a dashed vertical line (vlines) to show when the maximum air temperature occurred or a dashed horizontal line (hlines) to show the value of the maximum air temperature.

Note

The hlines function has the following syntax: Axes.hlines(y, xmin, xmax, colors=None, linestyles='solid', label='', *, data=None, **kwargs). The vlines function is analogous, taking x, ymin and ymax.

# Identify the time and value of the maximum air temperature
highest_temp_idx = df_pacific_slice['air_temp'].idxmax()
highest_temp = df_pacific_slice['air_temp'].max()

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
# Raw string (r'...') so that the backslash in \circ is passed through to Matplotlib
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])
ax.set_ylim(0, 20)

ax.vlines(highest_temp_idx, 0, highest_temp, color='k', ls='dashed')
ax.hlines(highest_temp, xmin=df_pacific_slice.index[0], xmax=highest_temp_idx, color='k', ls='dashed')
plt.show()
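When a line should span the whole axes rather than stop at the data point, axvline and axhline are a shorter alternative to vlines and hlines. The sketch below uses a small synthetic series (invented values, hypothetical variable names) to show how the results of idxmax and max feed into them.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic hourly temperatures (invented values)
index = pd.date_range('2022-06-13 10:00', periods=6, freq='60min')
temps = pd.Series([14.0, 16.5, 18.2, 19.1, 18.4, 17.0], index=index)

peak_time = temps.idxmax()  # index label (a timestamp) of the largest value
peak_temp = temps.max()     # the largest value itself

fig, ax = plt.subplots()
ax.plot(temps.index.values, temps.values)
ax.axvline(peak_time, color='k', ls='dashed')  # full-height vertical line
ax.axhline(peak_temp, color='k', ls='dashed')  # full-width horizontal line

print(peak_time, peak_temp)
```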

We can also add some text to our plots using the annotate function to make them even more informative. In its simplest form, the text is placed at xy. Optionally, the text can be displayed in another position, xytext. An arrow pointing from the text to the annotated point xy can then be added by defining arrowprops.

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df_pacific_slice.index.values, df_pacific_slice['air_temp'].values, linewidth=2)
ax.set_title('Corvallis air temperatures', fontsize=14)
# Raw string (r'...') so that the backslash in \circ is passed through to Matplotlib
ax.set_ylabel(r'Air temperature ($^\circ$C)', fontsize=14)
ax.set_xlabel('Time (PT)', fontsize=14)
ax.tick_params(axis='both', labelsize=14)
ax.grid()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_xlim(df_pacific_slice.index[0], df_pacific_slice.index[-1])
ax.set_ylim(0, 20)

ax.vlines(highest_temp_idx, 0, highest_temp, color='k', ls='dashed')
ax.hlines(highest_temp, xmin=df_pacific_slice.index[0], xmax=highest_temp_idx, color='k', ls='dashed')

ax.annotate(f'{highest_temp:.1f} C at {highest_temp_idx:%H:%M}',
            xy=(highest_temp_idx, highest_temp), 
            xytext=(highest_temp_idx+pd.DateOffset(hours=2), highest_temp+1),
            arrowprops=dict(facecolor='black', shrink=0.05, width=1, headwidth=8), fontsize=12)
plt.show()
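Placing xytext in data coordinates requires date arithmetic. An alternative sketch, using the textcoords='offset points' option of annotate, positions the label a fixed distance from the point regardless of the axis units. The series is synthetic and the variable names are made up for the illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic hourly temperatures (invented values)
index = pd.date_range('2022-06-13 10:00', periods=6, freq='60min')
temps = pd.Series([14.0, 16.5, 18.2, 19.1, 18.4, 17.0], index=index)
peak_time, peak_temp = temps.idxmax(), temps.max()

fig, ax = plt.subplots()
ax.plot(temps.index.values, temps.values)

label = f'{peak_temp:.1f} C at {peak_time:%H:%M}'
note = ax.annotate(label,
                   xy=(peak_time, peak_temp),   # annotated point, in data coordinates
                   xytext=(30, -30),            # label offset from xy, in points
                   textcoords='offset points',
                   arrowprops=dict(arrowstyle='->'))
print(label)
```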

Summary

In this demo, we were introduced to the power of Pandas for manipulating and plotting data. Next week, we will demonstrate some of the other things we can do using this library.

Programming for spatial data science
