if (!require("pacman")) install.packages("pacman")
::p_load(
pacman# Data handling
dplyr, janitor, stringr, tidyr, forcats, lubridate, purrr,# Visualization
ggplot2, patchwork,# Spatial data
sf,
units, terra)
Severe Heat Days using the Universal Thermal Comfort Index
Author: Marcial Yangali (marcialy@colmex.mx)
Reviewers: Landy Sánchez and Emerson Baptista
Overview
This Code & Data is part of the CACHE research initiative on Aging, Climate, and Health, specifically the demonstration project on Heat, Disability in Older Adults, and Care. This entry shows how to construct heat measures using the Universal Thermal Comfort Index (UTCI). We detail the data wrangling, analysis, and visualization process in R used to construct the variable days of severe heat define as any day reaching 32° C of UTCI or above (strong, very strong, or extreme heat stress). We demonstrate this process with data from Mexico, but, of course, the procedure can be replicated for any other part of the world.
Required R Packages 🛠️
We begin by loading the necessary R packages. These support data manipulation (mainly dplyr
but also janitor
, stringr
, tidyr
, lubridate
and forcats
), visualization (ggplot2
, patchwork
), and geospatial operations (sf
, units
, terra
).
Heat Data
1. Universal Thermal Climate Index (UTCI)
The Universal Thermal Climate Index (UTCI) is a bioclimatic index that evaluates the physiological comfort of the human body under certain meteorological conditions (Brödel et al. 2012). The UTCI is derived from the UTCI-Fiala model (Fiala et al., 2012), which integrates a dynamic thermoregulation framework of the human body along with a clothing insulation model that varies with temperature. Overall, UTCI offers an assessment of the “apparent” temperature that our body would perceive in specific environmental conditions, specifically, air temperature, wind, humidity, and radiation.
As the UTCI incorporates more variables affecting thermal comfort, it seeks to provide a more realistic picture of the thermal stress perceived by the humans and their behavioral responses, particularly in outdoor conditions. Compared to other heat indices, UTCI is more robust and flexible in capturing thermal stress across a wide range of climates and extreme weather conditions. Due to its adaptability and comprehensive design, UTCI was selected as the primary measure to evaluate heat stress in our study across Mexican municipalities.
The UTCI data comes from Thermal comfort indices derived from ERA5 reanalysis, also known as ERA5-HEAT (Human thErmAl comforT) produced by Copernicus Climate Change Service1. This dataset provides hourly global raster images based on ERA5 reanalysis. For our analysis, we downloaded from the daily UTCI raster files Data Store for the period from May 1st to May 31st, 2019. Since our focus is on Mexico, we specified a sub-region extraction using bounding coordinates that encompass the national territory: 33.0°N (North), -118.5°E (East), 14.0°N (South), and -86.0°E (West). All downloaded data were organized into a dedicated subfolder named UTCI within the main in directory.
# We save the names of the raster images in a list
<- list.files("in/UTCI", pattern = "\\.nc$", full.names = TRUE) files
2. Geographic Data – Mexican Municipalities and States
In order to spatially visualize the distribution of heat exposure, we use official geographic boundaries at the municipal and state levels from the 2020 Census. We are using the shapefiles 00mun
and 00ent
from the downloaded folder: mg_2020_integrado/conjunto_de_datos
.
# Municipal geometries
<- read_sf("in/GEO/00mun.shp") |>
mapa st_transform(st_crs(4326)) |>
clean_names()
# State level geometries
<- read_sf("in/GEO/00ent.shp") |>
mapa2 st_transform(st_crs(4326))
From Raster to Vector Data
We begin by exploring one of the raster images we have downloaded:
# 1. Open a NetCDF file and store it as object 'r'
<- rast(files[1]) # SpatRaster object
r
# 1.1. Visualize the raster layer
plot(r[[1]])
The image reveals the pixel size (0.25 x 0.25 degrees). We will use these dimensions to create a regular grid of square polygons and examine how they intersect with the geometries of Mexico.
# 2. Get the center coordinates of each cell
<- as.data.frame(r, xy = TRUE, cells = TRUE)[, c("x", "y", "cell")]
coords
# 3. Create polygons (0.25° x 0.25° squares) centered on each point
<- 0.25 # resolution in degrees
res
# Use purrr::pmap to build each polygon
<- coords |>
polys mutate(
geometry = purrr::pmap(list(x, y), function(x, y) {
st_polygon(list(matrix(c(
- res/2, y - res/2,
x + res/2, y - res/2,
x + res/2, y + res/2,
x - res/2, y + res/2,
x - res/2, y - res/2
x ncol = 2, byrow = TRUE)))
),
})
)
# 4. Convert to an sf object
<- st_sf(ID = polys$cell, geometry = st_sfc(polys$geometry, crs = 4326))
grid
|>
grid ggplot() +
geom_sf() +
geom_sf(data = mapa, fill = NA, color = "darkgreen")
Weighted Values
To obtain a summary value per municipality (e.g., daily mean or maximum UTCI), we must account for all pixels within each municipal boundary. However, values must be adjusted according to the proportion of each pixel that lies within a given municipality — the weight. These weights are calculated using st_intersection()
. Due to the complexity of municipal polygons, this operation may take considerable time. It is highly recommended to perform it once and save the result:
⚠️ This is the time-consuming step:
<- st_intersection(
wgrid |> select(ID),
grid |> select(CVEGEO))
mapa
st_write(wgrid, "in/wgrid.shp")
Once the intersection is completed, we calculate the proportion of each polygon’s area used as weight:
<- st_read("in/wgrid.shp")
wgrid
<- wgrid |>
weights_df mutate(
area = st_area(geometry),
weight = as.numeric(area) / as.numeric(st_area(wgrid[match(ID, wgrid$ID), ]))
|>
) st_drop_geometry() |>
select(ID, CVEGEO, weight)
Creating a Function to Process UTCI Files
Now we define a function to process a NetCDF file and compute weighted UTCI values for each municipality:
<- function(path_nc, weights_df) {
process_utci_with_intersection <- terra::rast(path_nc)
r <- max(r) - 273.15 # Convert from Kelvin to Celsius
utci_day
# Extract values
<- as.data.frame(utci_day, xy = TRUE, cells = TRUE)
df_utci
<- df_utci |>
df_utci rename(ID = cell, MAX = 4) # The fourth column contains the value
# Convert to sf
<- st_as_sf(df_utci, coords = c("x", "y"), crs = crs(r)) |>
utci_sf st_transform(4326)
# Join with weights
<- inner_join(utci_sf, weights_df,
df_merge by = "ID")
# Aggregate by municipality
<- df_merge |>
df_mun group_by(CVEGEO) |>
summarise(
MAX = sum(MAX * weight, na.rm = TRUE) / sum(weight[!is.na(MAX)]),
.groups = "drop"
)
# Extract and add date
<- stringr::str_extract(basename(path_nc), "\\d{8}")
fecha $DATE <- as.Date(fecha, format = "%Y%m%d")
df_mun
return(df_mun)
}
Applying the Function to 31 Days of Data
We apply the function to the 31 UTCI raster files downloaded for May 2019:
# Apply function
<- purrr::map_dfr(
UTCI_mun
files,~process_utci_with_intersection(.x, weights_df))
Finally, we extract the day, month, and year from the DATE variable. This step will be especially useful as we incorporate additional months and years into the analysis, allowing for easier temporal aggregation, filtering, and visualization across multiple time scales.
<- UTCI_mun |>
UTCI_mun mutate(DATE = as.Date(DATE)) |>
mutate(
YEAR = year(DATE),
MONTH = month(DATE),
DAY = day(DATE)
|>
) st_drop_geometry()
📈 Plot 1: UTCI Distribution – Average, Percentiles and Heat Threshold
We plot:
Mean daily maximun UTCI
25th and 75th percentiles (in light gray)
A fixed 32°C threshold (horizontal red line)
Code
# Percentile bands and thresholds
|>
UTCI_mun group_by(YEAR, MONTH, DAY) |>
summarise(
MeanMAX = mean(MAX),
MAX25 = quantile(MAX, .25, na.rm = TRUE),
MAX75 = quantile(MAX, .75, na.rm = TRUE)
|>
) ungroup() |>
ggplot() +
geom_line(aes(x = DAY, y = MeanMAX), linewidth = 0.85) +
geom_line(aes(x = DAY, y = MAX25), color = "gray") +
geom_line(aes(x = DAY, y = MAX75), color = "gray") +
geom_hline(yintercept = 32, color = "red", linewidth = 0.85, alpha = 0.9) +
scale_y_continuous(limits = c(17, 40),
breaks = seq(10,40,1)) +
scale_x_continuous(breaks = seq(1, 31, by = 1)) +
theme_bw() +
labs(
x = "Day of the Month",
y = "UTCI (Daily Maximum)",
title = "Average UTCI in Mexico by Day of May (2019)",
subtitle = "Red: Absolute threshold (32°C)",
caption = "Source: UTCI based on ERA5 reanalysis data. Own elaboration."
+
) theme(strip.background = element_rect(fill = "white"),
strip.text = element_text(face = "bold", size = 11))
📌 During May, the absolute threshold (32°C) is exceeded by all the daily averages.
🗺️ Map 1: Average UTCI by Municipality
As a first spatial exploration, we plot the average daily maximum UTCI for each municipality. This provides a baseline spatial understanding of where thermal comfort conditions tend to be most extreme.
Code
# Average MAX UTCI
<- UTCI_mun |>
summary group_by(YEAR) |>
summarise(Mean = mean(MAX, na.rm = TRUE)) |>
ungroup() |>
st_drop_geometry()
# Map
<- mapa |>
map rename(CVEGEO = cvegeo) |>
# Fast merge
left_join(UTCI_mun |>
group_by(CVEGEO, YEAR) |>
summarise(Mean = mean(MAX, na.rm = TRUE)) |>
ungroup() |>
mutate(CVEGEO = str_pad(CVEGEO, width = 5,
side = "left", pad = "0")),
by = c("CVEGEO")) |>
# Plot:
ggplot() +
geom_sf(aes(fill = Mean), color = NA) +
geom_sf(data = mapa2, fill = NA, color = "white", linewidth = 0.2) +
scale_fill_viridis_c(option = "magma",
direction = -1,
name = "UTCI") +
theme_void() +
theme(legend.position = c(.01,.3),
plot.background = element_rect(fill = "white",
color = "white"),
plot.margin = unit(c(.2, .2, .2, .2),
"cm"))
# Distribution
<- UTCI_mun |>
dist group_by(CVEGEO, YEAR) |>
summarise(Mean = mean(MAX, na.rm = TRUE)) |>
ungroup() |>
mutate(CVEGEO = str_pad(CVEGEO, width = 5,
side = "left", pad = "0")) |>
# Plot:
ggplot() +
geom_density(aes(x = Mean, y = ..count..),
fill = "#5d1f61",
color = "#5d1f61",
linewidth = .8,
alpha = .5) +
geom_vline(xintercept = summary$Mean[1],
color = "#5d1f61", alpha = .85)+
scale_x_continuous(breaks = seq(20,40,1))+
labs(subtitle = "Number of Municipalities by degree of UTCI:",
color = NULL,
fill = NULL,
y = NULL,
x = NULL)+
theme_minimal() +
theme(legend.position = c(.9,.7))
# Paste Maps and Distribution using patchwork library:
/dist+plot_layout(heights = c(2.4,1.2))+
mapplot_annotation(title = "Average Daily MAX UTCI (°C) by Municipality in May 2019",
caption = "Source: ERA5 reanalysis, Copernicus. Own elaboration using municipal geometries from INEGI.")
📌 This map reveals consistent regional patterns, with higher levels of heat stress in coastal and southeastern areas of Mexico.
Defining the Heat Measure
In this analysis, we focus on a classic absolute measure of extreme heat, which allows for global comparability. While it is possible to construct relative measures (to capture local anomalies), here we focus solely on a fixed threshold to better align with international heat stress standards.
🌡️ Absolute Heat Threshold
We define extreme heat days as those in which the UTCI reaches or exceeds 32°C, corresponding to the “Strong Heat Stress” category in international classifications. This fixed threshold enables temporal and spatial comparisons, independent of local climate history or adaptation levels.
UTCI heat stress classifications:
Moderate Heat Stress (26–32°)
Strong Heat Stress (32–38°)
Very Strong Heat Stress (38–46°)
Extreme Heat Stress (>46°C)
We identify all days meeting or exceeding the 32°C threshold:
<- UTCI_mun |>
UTCI_mun group_by(CVEGEO) |>
mutate(absoluto = case_when(MAX >= 32 ~ 1,
TRUE ~ 0)) |>
ungroup()
Municipal-Level Distributions of Heat Days: We create a summarize dataset by calculate the total number of heat days per municipality (UTCI_mun_summary
). However, our complete dataset (UTCI_mun
) will still be useful.
# Total heat days per municipality and year
<- UTCI_mun |>
UTCI_mun_summary group_by(CVEGEO, YEAR) |>
summarise(
Absoluto = sum(absoluto, na.rm = TRUE),
Mean = mean(MAX, na.rm = TRUE),
|>
) ungroup() |>
mutate(CVEGEO = str_pad(CVEGEO, width = 5,
side = "left", pad = "0"))
📈 Plot 2: Number of Municipalities Exceeding Threshold
The goal for the following plot is to show, for each day of May 2019, how many municipalities exceeded the absolute threshold (32°C). This gives us a sense of geographic spread of extreme heat across the country.
Code
|>
UTCI_mun group_by(YEAR, MONTH, DAY) |>
summarise(absoluto = sum(absoluto, na.rm = T)) |>
ungroup() |>
mutate(fecha = as.Date(paste(YEAR, MONTH, DAY, sep = "-"),
format = "%Y-%m-%d"),
day_of_year = as.numeric(format(fecha, "%j"))) |>
ggplot() +
geom_line(aes(x = DAY, y = absoluto)) +
scale_y_continuous(breaks = seq(0, 1800, by = 250),
limits = c(0, 1800)) +
scale_x_continuous(breaks = seq(1, 31, by = 1)) +
theme_bw() +
labs(
x = "Month of the Year",
y = "Number of Municipalities",
title = "Municipalities Exceeding Heat Threshold in 2020",
caption = "Source: UTCI based on ERA5 reanalysis data. Own elaboration."
+
) theme(strip.background = element_rect(fill = "white"),
strip.text = element_text(face = "bold", size = 11))
🗺️ Map 2: Municipal Heat Days
We visualize how the number of extreme heat days varies across municipalities. This map displays the total number of severe heat days (≥32°C) in each municipality.
Code
# Map
|>
mapa rename(CVEGEO = cvegeo) |>
left_join(UTCI_mun_summary |>
mutate(CVEGEO = str_pad(CVEGEO, width = 5,
side = "left", pad = "0")),
by = c("CVEGEO")) |>
# Plot:
ggplot() +
geom_sf(aes(fill = Absoluto), color = NA) +
geom_sf(data = mapa2, fill = NA, color = "white", linewidth = 0.2) +
scale_fill_viridis_c(option = "magma",
direction = -1,
name = "Days:") +
theme_void() +
theme(legend.position = c(.1,.3),
plot.background = element_rect(fill = "white",
color = "white"),
plot.margin = unit(c(.2, .2, .2, .2),
"cm"))+
# Jitter plot
|>
mapa rename(CVEGEO = cvegeo) |>
left_join(UTCI_mun_summary |>
mutate(CVEGEO = str_pad(CVEGEO, width = 5,
side = "left", pad = "0")),
by = c("CVEGEO")) |>
st_drop_geometry() |>
left_join(mapa2 |>
clean_names() |>
rename(name_state = nomgeo),
by = "cve_ent") |>
group_by(name_state) |>
mutate(MeanAbs = mean(Absoluto, na.rm = T)) |>
ggplot() +
geom_jitter(aes(x = Absoluto,
y = fct_reorder(name_state, MeanAbs),
group = name_state,
color = Absoluto),
size = .17,
alpha = .8) +
scale_color_viridis_c(option = "magma",
direction = -1,
name = "Days:") +
theme_bw() +
theme(legend.position = "none")+
labs(x = NULL, y = NULL)+
plot_layout(widths = c(4,1))+
plot_annotation(title = "Severe Heat Days by Absolute Measure",
caption = "ERA5 reanalysis, Copernicus. Own elaboration using municipal geometries from INEGI.")
📌 Regions in northern deserts (e.g., Sonora, Coahuila) and along the Pacific and Gulf coasts consistently experience the highest number of extreme heat days. These patterns are persistent and align with the mean UTCI trends shown in previous maps.
Final Remarks
By focusing on an absolute threshold (UTCI ≥ 32°C), this entry lays the groundwork for internationally comparable indicators of heat exposure. Future work will expand on these foundations by incorporating relative thresholds, population vulnerability, and health system capacity.
This Code & Data entry is part of ongoing research by the CACHE initiative, and aims to promote transparency, reproducibility, and collaboration around climate-demographic methods.
Footnotes
Di Napoli C., Barnard C., Prudhomme C., Cloke HL and Pappenberger F. (2020): Thermal comfort indices derived from ERA5 reanalysis. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.553b7518 (Accessed on 01-11-2024)↩︎