For our Remote Sensing final project, my group and I analyzed air quality trends in the US, and California more specifically, by using AQI datasets from the EPA.
Air quality serves as an important indicator of human health and the environment. Under the Clean Air Act, the U.S. Environmental Protection Agency (EPA) established National Ambient Air Quality Standards (NAAQS) for six criteria pollutants. Five of them are used in the U.S. Air Quality Index (AQI) for reporting air quality: ozone, particulate matter (PM2.5 and PM10), sulfur dioxide, carbon monoxide, and nitrogen dioxide.
The purpose of this project is to assess how air quality in the United States has changed over time from 1980 to 2020. We will use daily Air Quality Index (AQI) data by US county and metropolitan areas for our analysis. This data is made available by the US EPA and is freely accessible.
We use pre-generated data files in .csv format from the US EPA for Daily AQI by County which can be found here: https://aqs.epa.gov/aqsweb/airdata/download_files.html#AQI
The data include daily AQI readings for each year from January 1, 1980 through May 5, 2021. The Daily AQI by County dataset includes observations for all 50 states plus the District of Columbia, Mexico, Virgin Islands, Canada, Guam, and Puerto Rico.
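As a rough sketch of how the yearly files could be combined into one table, the snippet below reads every CSV in a local folder and stacks them. The function name `load_daily_aqi`, the folder layout, and the file name pattern `daily_aqi_by_county_*.csv` are assumptions about how the downloads were unzipped, not something stated in this post. Reading every column as text is also an assumption, chosen so that codes such as State Code keep any leading zeros.

```python
import glob
import os

import pandas as pd


def load_daily_aqi(data_dir, pattern="daily_aqi_by_county_*.csv"):
    """Stack all yearly Daily AQI by County CSV files found in data_dir.

    The file name pattern is an assumption about how the unzipped
    EPA downloads are named locally.
    """
    paths = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {data_dir}")
    # Read every column as text; numeric columns are converted later.
    frames = [pd.read_csv(path, dtype=str) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage, assuming the files live in ./data:
# aqi = load_daily_aqi("data")
```

Because every column comes in as text, the AQI column still needs to be converted to a number before any comparison against a threshold.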
Our variable of interest is AQI. The six defining parameters that contribute to AQI are Ozone, CO, NO2, SO2, PM 2.5, and PM 10. Daily summary AQI recordings include either: 1) The aggregate of all sub-daily measurements taken at the monitor, or 2) The single sample value if the monitor takes a single, daily sample (e.g., there is only one sample with a 24-hour duration). In the second case, the mean and max daily sample will have the same value.
To calculate AQI, the concentration of each pollutant in the air is measured and converted into a number running from zero upwards by using a standard index or scale. The calculated number for each pollutant is termed a sub-index. The highest sub-index for any given hour is recorded as the AQI for that hour or aggregated per day. AQI runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. The pollutant with the highest sub-index over a 24-hour period is referred to as the ‘Defining Parameter.’ The AQI categories are Good (0 to 50), Moderate (51 to 100), Unhealthy for Sensitive Groups (101 to 150), Unhealthy (151 to 200), Very Unhealthy (201 to 300), and Hazardous (301 and above).
For more information on how AQI is calculated please visit: https://www.airnow.gov/sites/default/files/2020-05/aqi-technical-assistance-document-sept2018.pdf
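The calculation described above can be sketched in a few lines: each concentration is linearly interpolated between two breakpoints to give a sub-index, the largest sub-index becomes the AQI, and its pollutant becomes the defining parameter. This is a minimal illustration, not EPA's implementation. The function names are hypothetical, and the breakpoint numbers in the example call are made up for illustration; the real breakpoints are listed in the technical assistance document linked above.

```python
# Upper bound (inclusive) of each AQI category on the index scale.
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def sub_index(conc, c_lo, c_hi, i_lo, i_hi):
    """Linearly interpolate a concentration onto the index scale.

    c_lo/c_hi are the concentration breakpoints that bracket conc,
    i_lo/i_hi are the index values assigned to those breakpoints.
    """
    return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)


def overall_aqi(sub_indices):
    """Return (AQI, defining parameter) from a dict of sub-indices."""
    pollutant = max(sub_indices, key=sub_indices.get)
    return sub_indices[pollutant], pollutant


def category(aqi_value):
    """Map an AQI value to its category name."""
    for upper, name in AQI_CATEGORIES:
        if aqi_value <= upper:
            return name
    return "Hazardous"  # values reported beyond 500


# Illustrative (made-up) breakpoints: concentrations 35 to 55 map to index 101 to 150.
pm25_index = sub_index(40.0, 35.0, 55.0, 101, 150)
aqi_value, defining = overall_aqi({"Ozone": 80, "PM2.5": pm25_index, "CO": 20})
print(aqi_value, defining, category(aqi_value))
```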
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime

aqi.to_csv("aqi_us.csv")
We are interested in exploring the trends in air quality from 1980-2021, specifically how each pollutant contributes to AQI readings over time. We do this by finding the number of air quality readings above 150 in each year, separated by the defining parameters for easy comparison.
We extend this analysis to see if California follows the same trend as the US. Since California has extreme fire seasons and many big cities, both of which negatively affect air quality, we have chosen California as a focal point to explore local patterns of air quality.
Understanding long-term trends in air quality can be useful in many settings. Federal officials, such as those at the EPA, as well as local government officials tasked with improving air quality must know which pollutant is most responsible for lowered air quality. By separating the data as we do below, they can evaluate which pollutant needs to be reduced the most to improve the overall air quality of the region. This analysis can also benefit health workers. By understanding the local trends in air quality, they can better understand some of the health challenges their patients may be presenting with.
# format date as datetime
aqi['Date'] = pd.to_datetime(aqi['Date'], format='%Y-%m-%d')

# convert Category column to category data type
aqi['Category'] = aqi.Category.astype('category')

# convert object to integer
aqi['AQI'] = pd.to_numeric(aqi['AQI'])

# check data types
aqi.dtypes
State Name object
county Name object
State Code object
County Code object
Date datetime64[ns]
AQI int64
Category category
Defining Parameter object
Defining Site object
Number of Sites Reporting object
dtype: object
# add a year column for grouping (assumed step; not shown in the original listing)
aqi['year'] = aqi['Date'].dt.year

# subset data for readings in the unhealthy range and above (greater than 150)
aqi_unhealthy = aqi[aqi['AQI'] > 150]

# subsets for each parameter for above 150
aqi_ozone_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'Ozone']
aqi_pm25_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'PM2.5']
aqi_pm10_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'PM10']
aqi_so2_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'SO2']
aqi_co_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'CO']
aqi_no2_unhealthy = aqi_unhealthy[aqi_unhealthy['Defining Parameter'] == 'NO2']

# counts for each subset
counts_aqi_unhealthy = aqi_unhealthy.groupby('year').size().reset_index(name='count')
counts_ozone_unhealthy = aqi_ozone_unhealthy.groupby('year').size().reset_index(name='count')
counts_pm25_unhealthy = aqi_pm25_unhealthy.groupby('year').size().reset_index(name='count')
counts_pm10_unhealthy = aqi_pm10_unhealthy.groupby('year').size().reset_index(name='count')
counts_so2_unhealthy = aqi_so2_unhealthy.groupby('year').size().reset_index(name='count')
counts_co_unhealthy = aqi_co_unhealthy.groupby('year').size().reset_index(name='count')
counts_no2_unhealthy = aqi_no2_unhealthy.groupby('year').size().reset_index(name='count')
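The same per-year counts can also be built in one pass by grouping on both the year and the defining parameter. The sketch below is an alternative to the separate subsets, not the code used for the figures here. The function name `unhealthy_counts` and the small `sample` table are made up for illustration; only the column names `Date`, `AQI`, and `Defining Parameter` come from the dataset.

```python
import pandas as pd


def unhealthy_counts(aqi, threshold=150):
    """Count readings above threshold per year and defining parameter."""
    df = aqi[aqi["AQI"] > threshold].copy()
    df["year"] = pd.to_datetime(df["Date"]).dt.year
    counts = (
        df.groupby(["year", "Defining Parameter"])
        .size()
        .unstack(fill_value=0)  # one column per defining parameter
    )
    counts["All Parameters"] = counts.sum(axis=1)
    return counts


# Made-up sample rows, only to show the shape of the result.
sample = pd.DataFrame({
    "Date": ["2019-08-01", "2020-09-01", "2020-09-02", "2020-07-04", "2020-01-01"],
    "AQI": [160, 210, 320, 155, 40],
    "Defining Parameter": ["Ozone", "PM2.5", "PM2.5", "Ozone", "PM2.5"],
})
counts = unhealthy_counts(sample)
print(counts)
```

The result has one row per year and one column per defining parameter, so calling `counts.plot()` would draw one line per pollutant without a separate `plt.plot` call for each.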
# plot the number of AQI Readings above 150 by Year in the US
plt.plot(counts_aqi_unhealthy['year'], counts_aqi_unhealthy['count'],
         label='All Parameters', ls='dashed', color='black')

plt.plot(counts_ozone_unhealthy['year'], counts_ozone_unhealthy['count'],
         label='Ozone')
plt.plot(counts_pm25_unhealthy['year'], counts_pm25_unhealthy['count'],
         label='PM 2.5')
plt.plot(counts_pm10_unhealthy['year'], counts_pm10_unhealthy['count'],
         label='PM 10')
plt.plot(counts_so2_unhealthy['year'], counts_so2_unhealthy['count'],
         label='SO2')
plt.plot(counts_co_unhealthy['year'], counts_co_unhealthy['count'],
         label='CO')
plt.plot(counts_no2_unhealthy['year'], counts_no2_unhealthy['count'],
         label='NO2')

# plt.legend is used because no ax object is created in this block
plt.legend(title='Defining Parameter')
plt.title('Number of AQI Readings above 150 by Year in the US')
plt.xlabel('Year')
plt.ylabel('Count')
From this line graph, we can see that air quality in the US has generally improved since the 1980s. There have been some spikes, however, and in recent years PM 2.5 appears to contribute significantly to these spikes in the number of AQI readings above 150.
When looking at California’s AQI trends, we can see that the state generally follows the same downward trend in AQI readings above 150 as the entire US. However, the count does spike in recent years. These spikes appear to be driven by an increase in readings above 150 with PM 2.5 as the defining parameter.
# subset all data for CA and PM2.5 and make new column for year
aqi_ca = aqi[aqi['State Name'] == 'California']
# .copy() avoids a SettingWithCopyWarning when the Year column is added
aqi_ca_pm25 = aqi_ca[aqi_ca['Defining Parameter'] == 'PM2.5'].copy()
aqi_ca_pm25['Year'] = pd.DatetimeIndex(aqi_ca_pm25['Date']).year
# plot the CA AQI Readings with PM2.5 as the Defining Parameter
fig, ax = plt.subplots(figsize=(12, 8), dpi=300)

ax.axhspan(0, 50, facecolor='green', alpha=0.3)
ax.axhspan(50, 100, facecolor='yellow', alpha=0.3)
ax.axhspan(100, 150, facecolor='orange', alpha=0.3)
ax.axhspan(150, 200, facecolor='red', alpha=0.3)
ax.axhspan(200, 300, facecolor='purple', alpha=0.3)
ax.axhspan(300, 600, facecolor='maroon', alpha=0.3)

plt.scatter(aqi_ca_pm25['Year'], aqi_ca_pm25['AQI'])
plt.ylim(0, 600)
plt.title("CA AQI Readings with PM2.5 as the Defining Parameter")
plt.xlabel("Year")
plt.ylabel("AQI")
When looking at all California AQI readings with PM 2.5 as the defining parameter, we can see a very clear uptick in high AQI readings. Before 2000, there were very few readings above 150, but over time readings in the “Unhealthy”, “Very Unhealthy”, and “Hazardous” ranges have become much more abundant. Additionally, between 2015 and 2021 we see very high readings above 300, showing that the worst readings have become more extreme in addition to unhealthy readings becoming more common in general.
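One way to put numbers on this visual impression is to bin each reading into its AQI category and count readings per category and year. The sketch below is only an illustration: the function name `category_counts_by_year` and the `sample` rows are made up, and it assumes a table with `Year` and `AQI` columns like `aqi_ca_pm25`.

```python
import pandas as pd

# Bin edges on the index scale; each bin includes its upper edge.
BINS = [0, 50, 100, 150, 200, 300, float("inf")]
LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def category_counts_by_year(df):
    """Count readings per AQI category for each year."""
    cats = pd.cut(df["AQI"], bins=BINS, labels=LABELS, include_lowest=True)
    # observed=False keeps categories with zero readings as columns of zeros.
    return df.groupby(["Year", cats], observed=False).size().unstack(fill_value=0)


# Made-up sample rows, only to show the shape of the result.
sample = pd.DataFrame({
    "Year": [1995, 2018, 2020, 2020],
    "AQI": [45, 160, 320, 210],
})
by_category = category_counts_by_year(sample)
print(by_category)
```

Applied to the California PM 2.5 subset, the "Hazardous" column of such a table would show directly how many readings above 300 fall in each year.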
One possible explanation for the increase in hazardous AQI readings due to PM 2.5 since 2015 is that California’s wildfire seasons have generally become more severe. 2017 and 2020 have the highest readings. The 2017 season was the most destructive wildfire season on record at the time, with 1.2 million acres burned. The largest fire in the 2017 season was the Thomas Fire in Ventura and Santa Barbara Counties, which was California’s largest modern wildfire at the time. In 2020, 4.2 million acres were burned, more than 4% of the state’s land, making 2020 the largest wildfire season recorded in California’s modern history. The August Complex Fire alone burned more than 1 million acres, making it the first “gigafire” on record.
After looking at air quality trends in the US and in California, we can see that air quality in general has improved despite some recent spikes largely due to PM 2.5. For further analysis, it would be interesting to see if high AQI readings follow seasonal patterns each year. We have speculated that the recent spikes in hazardous AQI readings due to PM 2.5 could be due to large wildfire events, but further analysis would be needed to confirm this.
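A first look at seasonality could be as simple as counting high readings by calendar month. The sketch below is a starting point under stated assumptions, not a finished analysis: the function name `monthly_unhealthy_counts` and the `sample` rows are made up, and it assumes a table with `Date` and `AQI` columns.

```python
import pandas as pd


def monthly_unhealthy_counts(df, threshold=150):
    """Count readings above threshold for each calendar month (1 to 12)."""
    high = df[df["AQI"] > threshold]
    months = pd.to_datetime(high["Date"]).dt.month
    # reindex so that months with no high readings appear as zero
    return months.value_counts().reindex(range(1, 13), fill_value=0)


# Made-up sample rows, only to show the shape of the result.
sample = pd.DataFrame({
    "Date": ["2020-08-20", "2020-09-10", "2020-09-11", "2020-01-05"],
    "AQI": [180, 250, 310, 60],
})
monthly = monthly_unhealthy_counts(sample)
print(monthly)
```

If wildfire smoke drives the PM 2.5 spikes, we would expect such counts for California to concentrate in the late summer and fall months, though that would still need to be checked against fire records.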
For attribution, please cite this work as
Cruz (2022, Jan. 18). Felicia Cruz: Analyzing Air Quality Trends in the US with EPA AQI Data. Retrieved from https://fmcruz23.github.io/posts/2022-01-18-aqi/
BibTeX citation
@misc{cruz2022analyzing, author = {Cruz, Felicia}, title = {Felicia Cruz: Analyzing Air Quality Trends in the US with EPA AQI Data}, url = {https://fmcruz23.github.io/posts/2022-01-18-aqi/}, year = {2022} }