Using the dataRetrieval package to analyze USGS datasets
During my EDS 213 course, “Metadata Standards, Data Modeling and Data Semantics,” I learned how to use an API to programmatically retrieve data.
Using the dataRetrieval package to access USGS water gauge data, I will look at the Ventura River and Santa Paula Creek gauges to explore rain events and patterns.
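Before any queries, a few packages need to be attached. This is a minimal setup sketch; dplyr and ggplot2 could just as well be loaded through the tidyverse meta-package.

# packages used throughout this post
library(dataRetrieval)  # readNWISdv(), readNWISuv()
library(dplyr)          # select(), mutate(), case_when()
library(ggplot2)        # time series plots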
# Query both sites at once by passing a vector to the siteNumbers argument
# Ventura is 11118500
# Santa Paula is 11113500
# limit to 2021 (through November 1)
combined <- readNWISdv(siteNumbers = c("11118500", "11113500"),
                       parameterCd = "00060",
                       startDate = "2021-01-01",
                       endDate = "2021-11-01")
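The data frames returned by dataRetrieval carry their query metadata along as attributes, which makes for a quick sanity check on what actually came back. This is an optional aside, not part of the original workflow:

# optional check: confirm the sites and parameter that were returned
attr(combined, "siteInfo")  # station names and coordinates
readNWISpCode("00060")      # "00060" is discharge, in cubic feet per second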
# rename the discharge column and drop the unnecessary variables
combined$discharge <- combined$X_00060_00003
combined <- combined %>%
  select(site_no, Date, discharge) %>%
  mutate(site_no = case_when(site_no == "11118500" ~ "Ventura",
                             TRUE ~ "Santa Paula"))
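As an aside, dataRetrieval also ships a helper, renameNWISColumns(), that turns cryptic names like X_00060_00003 into readable ones; applied to the raw readNWISdv() output, it would replace the manual rename above:

# alternative to the manual rename, applied to the raw readNWISdv() result
combined <- renameNWISColumns(combined)  # X_00060_00003 becomes Flow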
One of my favorite things about living in the Santa Barbara area is the beautiful weather. With sunny skies and warm temperatures most days, I have become quite sensitive to even small drops in temperature or brief rain episodes. In 2021, we have experienced very little rain thus far. From the plot below, we can see that the highest level of water discharge for either site was less than 15 cubic feet per second. Besides this spike, levels were consistently under 5 cubic feet per second.
# make a time series plot with both sites
ggplot(data = combined, aes(x = Date, y = discharge)) +
  geom_line(aes(color = site_no)) +
  ylim(0, 15) +
  labs(title = "Water Discharge, Santa Paula and Ventura",
       subtitle = "2021-01-01 - 2021-11-01",
       color = "Site",
       y = "Discharge (cubic ft/s)")
The Thomas Fire in December of 2017 ended my first quarter of college early and postponed my first finals week. Once the fire was contained after the holidays, I thought I would be returning to a somewhat normal college experience…until the Montecito Mudslides happened.
Just as the Thomas Fire was nearing complete containment, heavy rainfall on January 9, 2018 washed away massive amounts of mud and debris loosened from the areas burned by the fire. These mudslides sadly resulted in 23 deaths and the destruction of around 100 residences. When I returned from a quick weekend trip home in January, Highway 101 was closed because of the mudslides, so getting back to school meant a short boat trip from Ventura Harbor to Santa Barbara.
In the time series plot above, I used the readNWISdv function to retrieve daily values of water discharge. By using the readNWISuv function instead, I can download instantaneous values, with readings every 15 minutes on a given day. Filtering for January 9, 2018, I can look at the timing and intensity of the runoff on this particular day.
# getting instantaneous data for January 9, 2018 to look at water discharge
combined_jan_9_18 <- readNWISuv(siteNumbers = c("11118500", "11113500"),
                                parameterCd = "00060",
                                startDate = "2018-01-09",
                                endDate = "2018-01-09",
                                tz = "America/Los_Angeles")
# rename the discharge column and keep only the variables needed for plotting
combined_jan_9_18$discharge <- combined_jan_9_18$X_00060_00000
combined_jan_9_18 <- combined_jan_9_18 %>%
  select(site_no, dateTime, discharge) %>%
  mutate(site_no = case_when(site_no == "11118500" ~ "Ventura",
                             TRUE ~ "Santa Paula"))
ggplot(combined_jan_9_18, aes(x = dateTime, y = discharge)) +
  geom_line(aes(color = site_no)) +
  labs(y = "Discharge (cubic ft/s)",
       color = "Site",
       title = "Water Discharge",
       subtitle = "2018-01-09",
       x = "Time")
From the above plot, we can see that water discharge peaked at around 6,000 cubic feet per second at about 7 am on January 9. Compared to 2021's maximum so far of roughly 15 cubic feet per second, this highlights the intensity of the rainfall that triggered the Montecito Mudslides in 2018. In a county with as little rainfall as Santa Barbara's, a storm of this magnitude is rare, but when it did arrive, the burn scars left by the Thomas Fire made the devastating mudslides practically inevitable.
Using an API to access and download data was extremely simple and fast. Instead of downloading massive CSV files from the internet and then reading them into an R script, a package like dataRetrieval (or metajam, which is used to access DataONE datasets) streamlines the whole process. It also makes it quick and easy to edit your code if your dates or regions of interest change, rather than going back to a website to download another couple of CSV files, as the sketch below shows.
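To make that concrete, the whole query can be wrapped in a small helper so that changing sites or dates becomes a one-line edit. get_daily_discharge() is a hypothetical name of my own, not a dataRetrieval function:

# hypothetical helper: fetch and tidy daily discharge for any sites and dates
get_daily_discharge <- function(sites, start, end) {
  readNWISdv(siteNumbers = sites,
             parameterCd = "00060",  # discharge, cubic feet per second
             startDate = start,
             endDate = end) %>%
    renameNWISColumns() %>%  # X_00060_00003 becomes Flow
    select(site_no, Date, Flow)
}

# a different window or set of gauges is now a one-line change
combined <- get_daily_discharge(c("11118500", "11113500"),
                                "2021-01-01", "2021-11-01")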