Severe Heat Days using the Universal Thermal Comfort Index

Author: Marcial Yangali (marcialy@colmex.mx)

Reviewers: Landy Sánchez and Emerson Baptista

Overview

This Code & Data is part of the CACHE research initiative on Aging, Climate, and Health, specifically the demonstration project on Heat, Disability in Older Adults, and Care. This entry shows how to construct heat measures using the Universal Thermal Comfort Index (UTCI). We detail the data wrangling, analysis, and visualization process in R used to construct the variable days of severe heat define as any day reaching 32° C of UTCI or above (strong, very strong, or extreme heat stress). We demonstrate this process with data from Mexico, but, of course, the procedure can be replicated for any other part of the world.

Required R Packages 🛠️

We begin by loading the necessary R packages. These support data manipulation (mainly dplyr but also janitor, stringr , tidyr, lubridate and forcats), visualization (ggplot2, patchwork), and geospatial operations (sf, units, terra).

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  # Data handling
  dplyr, janitor, stringr, tidyr, forcats, lubridate, purrr,
  # Visualization
  ggplot2, patchwork,
  # Spatial data
  sf,
  units,
  terra)

Heat Data

1. Universal Thermal Climate Index (UTCI)

The Universal Thermal Climate Index (UTCI) is a bioclimatic index that evaluates the physiological comfort of the human body under certain meteorological conditions (Brödel et al. 2012). The UTCI is derived from the UTCI-Fiala model (Fiala et al., 2012), which integrates a dynamic thermoregulation framework of the human body along with a clothing insulation model that varies with temperature. Overall, UTCI offers an assessment of the “apparent” temperature that our body would perceive in specific environmental conditions, specifically, air temperature, wind, humidity, and radiation.

As the UTCI incorporates more variables affecting thermal comfort, it seeks to provide a more realistic picture of the thermal stress perceived by the humans and their behavioral responses, particularly in outdoor conditions. Compared to other heat indices, UTCI is more robust and flexible in capturing thermal stress across a wide range of climates and extreme weather conditions. Due to its adaptability and comprehensive design, UTCI was selected as the primary measure to evaluate heat stress in our study across Mexican municipalities.

The UTCI data comes from Thermal comfort indices derived from ERA5 reanalysis, also known as ERA5-HEAT (Human thErmAl comforT) produced by Copernicus Climate Change Service1. This dataset provides hourly global raster images based on ERA5 reanalysis. For our analysis, we downloaded from the daily UTCI raster files Data Store for the period from May 1st to May 31st, 2019. Since our focus is on Mexico, we specified a sub-region extraction using bounding coordinates that encompass the national territory: 33.0°N (North), -118.5°E (East), 14.0°N (South), and -86.0°E (West). All downloaded data were organized into a dedicated subfolder named UTCI within the main in directory.

# We save the names of the raster images in a list
files <- list.files("in/UTCI", pattern = "\\.nc$", full.names = TRUE)

2. Geographic Data – Mexican Municipalities and States

In order to spatially visualize the distribution of heat exposure, we use official geographic boundaries at the municipal and state levels from the 2020 Census. We are using the shapefiles 00mun and 00ent from the downloaded folder: mg_2020_integrado/conjunto_de_datos.

# Municipal geometries
mapa <- read_sf("in/GEO/00mun.shp") |>
  st_transform(st_crs(4326)) |>  
  clean_names()
# State level geometries
mapa2 <- read_sf("in/GEO/00ent.shp") |>
  st_transform(st_crs(4326))

From Raster to Vector Data

We begin by exploring one of the raster images we have downloaded:

# 1. Open a NetCDF file and store it as object 'r'
r <- rast(files[1]) # SpatRaster object

# 1.1. Visualize the raster layer
plot(r[[1]]) 

The image reveals the pixel size (0.25 x 0.25 degrees). We will use these dimensions to create a regular grid of square polygons and examine how they intersect with the geometries of Mexico.

# 2. Get the center coordinates of each cell
coords <- as.data.frame(r, xy = TRUE, cells = TRUE)[, c("x", "y", "cell")]

# 3. Create polygons (0.25° x 0.25° squares) centered on each point
res <- 0.25  # resolution in degrees

# Use purrr::pmap to build each polygon
polys <- coords |> 
  mutate(
    geometry = purrr::pmap(list(x, y), function(x, y) {
      st_polygon(list(matrix(c(
        x - res/2, y - res/2,
        x + res/2, y - res/2,
        x + res/2, y + res/2,
        x - res/2, y + res/2,
        x - res/2, y - res/2
      ), ncol = 2, byrow = TRUE)))
    })
  )

# 4. Convert to an sf object
grid <- st_sf(ID = polys$cell, geometry = st_sfc(polys$geometry, crs = 4326))

grid |> 
  ggplot() +
  geom_sf() +
  geom_sf(data = mapa, fill = NA, color = "darkgreen")

Weighted Values

To obtain a summary value per municipality (e.g., daily mean or maximum UTCI), we must account for all pixels within each municipal boundary. However, values must be adjusted according to the proportion of each pixel that lies within a given municipality — the weight. These weights are calculated using st_intersection(). Due to the complexity of municipal polygons, this operation may take considerable time. It is highly recommended to perform it once and save the result:

⚠️ This is the time-consuming step:

wgrid <- st_intersection(
     grid |> select(ID),
     mapa |> select(CVEGEO))

st_write(wgrid, "in/wgrid.shp")

Once the intersection is completed, we calculate the proportion of each polygon’s area used as weight:

wgrid <- st_read("in/wgrid.shp")

weights_df <- wgrid |> 
    mutate(
      area = st_area(geometry),
      weight = as.numeric(area) / as.numeric(st_area(wgrid[match(ID, wgrid$ID), ]))
    ) |> 
    st_drop_geometry() |> 
    select(ID, CVEGEO, weight)

Creating a Function to Process UTCI Files

Now we define a function to process a NetCDF file and compute weighted UTCI values for each municipality:

process_utci_with_intersection <- function(path_nc, weights_df) {
  r <- terra::rast(path_nc)
  utci_day <- max(r) - 273.15  # Convert from Kelvin to Celsius

  # Extract values
  df_utci <- as.data.frame(utci_day, xy = TRUE, cells = TRUE)

  df_utci <- df_utci |> 
    rename(ID = cell, MAX = 4)  # The fourth column contains the value

  # Convert to sf
  utci_sf <- st_as_sf(df_utci, coords = c("x", "y"), crs = crs(r)) |>
    st_transform(4326)

  # Join with weights
  df_merge <- inner_join(utci_sf, weights_df,
                         by = "ID")

  # Aggregate by municipality
  df_mun <- df_merge |>
    group_by(CVEGEO) |>
    summarise(
      MAX = sum(MAX * weight, na.rm = TRUE) / sum(weight[!is.na(MAX)]),
      .groups = "drop"
    )

  # Extract and add date
  fecha <- stringr::str_extract(basename(path_nc), "\\d{8}")
  df_mun$DATE <- as.Date(fecha, format = "%Y%m%d")

  return(df_mun)
}

Applying the Function to 31 Days of Data

We apply the function to the 31 UTCI raster files downloaded for May 2019:

# Apply function
UTCI_mun <- purrr::map_dfr(
    files,
    ~process_utci_with_intersection(.x, weights_df))

Finally, we extract the day, month, and year from the DATE variable. This step will be especially useful as we incorporate additional months and years into the analysis, allowing for easier temporal aggregation, filtering, and visualization across multiple time scales.

UTCI_mun <- UTCI_mun |>
  mutate(DATE = as.Date(DATE)) |> 
  mutate(
    YEAR = year(DATE),
    MONTH = month(DATE),
    DAY = day(DATE)
  ) |> 
  st_drop_geometry()

📈 Plot 1: UTCI Distribution – Average, Percentiles and Heat Threshold

We plot:

  • Mean daily maximun UTCI

  • 25th and 75th percentiles (in light gray)

  • A fixed 32°C threshold (horizontal red line)

Code
# Percentile bands and thresholds
UTCI_mun |>
  group_by(YEAR, MONTH, DAY) |>
  summarise(
    MeanMAX = mean(MAX),
    MAX25 = quantile(MAX, .25, na.rm = TRUE),
    MAX75 = quantile(MAX, .75, na.rm = TRUE)
  ) |>
  ungroup() |> 
  ggplot() +
  geom_line(aes(x = DAY, y = MeanMAX), linewidth = 0.85) +
  geom_line(aes(x = DAY, y = MAX25), color = "gray") +
  geom_line(aes(x = DAY, y = MAX75), color = "gray") +
  geom_hline(yintercept = 32, color = "red", linewidth = 0.85, alpha = 0.9) +
  scale_y_continuous(limits = c(17, 40),
                     breaks = seq(10,40,1)) +
  scale_x_continuous(breaks = seq(1, 31, by = 1)) +
  theme_bw() +
  labs(
    x = "Day of the Month",
    y = "UTCI (Daily Maximum)",
    title = "Average UTCI in Mexico by Day of May (2019)",
    subtitle = "Red: Absolute threshold (32°C)",
    caption = "Source: UTCI based on ERA5 reanalysis data. Own elaboration."
  ) +
  theme(strip.background = element_rect(fill = "white"),
        strip.text = element_text(face = "bold", size = 11))

📌 During May, the absolute threshold (32°C) is exceeded by all the daily averages.

🗺️ Map 1: Average UTCI by Municipality

As a first spatial exploration, we plot the average daily maximum UTCI for each municipality. This provides a baseline spatial understanding of where thermal comfort conditions tend to be most extreme.

Code
# Average MAX UTCI
summary <- UTCI_mun |> 
  group_by(YEAR) |> 
  summarise(Mean = mean(MAX, na.rm = TRUE)) |> 
  ungroup() |> 
    st_drop_geometry()
    
# Map
map <- mapa |> 
    rename(CVEGEO = cvegeo) |> 
    # Fast merge
  left_join(UTCI_mun |> 
  group_by(CVEGEO, YEAR) |> 
  summarise(Mean = mean(MAX, na.rm = TRUE)) |> 
  ungroup() |> 
      mutate(CVEGEO = str_pad(CVEGEO, width = 5, 
                            side = "left", pad = "0")), 
  by = c("CVEGEO")) |>
    # Plot:
    ggplot() +
  geom_sf(aes(fill = Mean), color = NA) +
  geom_sf(data = mapa2, fill = NA, color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       name = "UTCI") +
    theme_void() +
    theme(legend.position = c(.01,.3),
          plot.background = element_rect(fill = "white",
                                         color = "white"),
          plot.margin = unit(c(.2, .2, .2, .2),
                             "cm"))

# Distribution
dist <- UTCI_mun |> 
  group_by(CVEGEO, YEAR) |> 
  summarise(Mean = mean(MAX, na.rm = TRUE)) |> 
  ungroup() |> 
      mutate(CVEGEO = str_pad(CVEGEO, width = 5, 
                            side = "left", pad = "0")) |>
    # Plot:
    ggplot() +
    geom_density(aes(x = Mean, y = ..count..),
                 fill = "#5d1f61",
                 color = "#5d1f61",
                 linewidth = .8,
                 alpha = .5) +
    geom_vline(xintercept = summary$Mean[1], 
               color = "#5d1f61", alpha = .85)+
    scale_x_continuous(breaks = seq(20,40,1))+
  labs(subtitle = "Number of Municipalities by degree of UTCI:",
      color = NULL,
      fill = NULL,
       y = NULL,
       x = NULL)+
    theme_minimal() +
    theme(legend.position = c(.9,.7))
# Paste Maps and Distribution using patchwork library:
map/dist+plot_layout(heights = c(2.4,1.2))+
    plot_annotation(title = "Average Daily MAX UTCI (°C) by Municipality in May 2019",
       caption = "Source: ERA5 reanalysis, Copernicus. Own elaboration using municipal geometries from INEGI.")

📌 This map reveals consistent regional patterns, with higher levels of heat stress in coastal and southeastern areas of Mexico.

Defining the Heat Measure

In this analysis, we focus on a classic absolute measure of extreme heat, which allows for global comparability. While it is possible to construct relative measures (to capture local anomalies), here we focus solely on a fixed threshold to better align with international heat stress standards.

🌡️ Absolute Heat Threshold

We define extreme heat days as those in which the UTCI reaches or exceeds 32°C, corresponding to the “Strong Heat Stress” category in international classifications. This fixed threshold enables temporal and spatial comparisons, independent of local climate history or adaptation levels.

UTCI heat stress classifications:

  • Moderate Heat Stress (26–32°)​

  • Strong Heat Stress (32–38°)​

  • Very Strong Heat Stress (38–46°)​

  • Extreme Heat Stress (>46°C)

We identify all days meeting or exceeding the 32°C threshold:

UTCI_mun <- UTCI_mun |> 
  group_by(CVEGEO) |> 
  mutate(absoluto = case_when(MAX >= 32 ~ 1,
                              TRUE ~ 0)) |> 
  ungroup()

Municipal-Level Distributions of Heat Days: We create a summarize dataset by calculate the total number of heat days per municipality (UTCI_mun_summary). However, our complete dataset (UTCI_mun) will still be useful.

# Total heat days per municipality and year
UTCI_mun_summary <- UTCI_mun |> 
  group_by(CVEGEO, YEAR) |> 
  summarise(
    Absoluto = sum(absoluto, na.rm = TRUE),
    Mean = mean(MAX, na.rm = TRUE),
  ) |> 
  ungroup() |> 
    mutate(CVEGEO = str_pad(CVEGEO, width = 5, 
                            side = "left", pad = "0"))

📈 Plot 2: Number of Municipalities Exceeding Threshold

The goal for the following plot is to show, for each day of May 2019, how many municipalities exceeded the absolute threshold (32°C). This gives us a sense of geographic spread of extreme heat across the country.

Code
UTCI_mun |>
  group_by(YEAR, MONTH, DAY) |> 
  summarise(absoluto = sum(absoluto, na.rm = T)) |> 
  ungroup() |> 
  mutate(fecha = as.Date(paste(YEAR, MONTH, DAY, sep = "-"),
                         format = "%Y-%m-%d"),
         day_of_year = as.numeric(format(fecha, "%j"))) |> 
  ggplot() +
  geom_line(aes(x = DAY, y = absoluto)) +
  scale_y_continuous(breaks = seq(0, 1800, by = 250),
                     limits = c(0, 1800)) +
  scale_x_continuous(breaks = seq(1, 31, by = 1)) +
  theme_bw() +
  labs(
    x = "Month of the Year",
    y = "Number of Municipalities",
    title = "Municipalities Exceeding Heat Threshold in 2020",
    caption = "Source: UTCI based on ERA5 reanalysis data. Own elaboration."
  ) +
  theme(strip.background = element_rect(fill = "white"),
        strip.text = element_text(face = "bold", size = 11))

🗺️ Map 2: Municipal Heat Days

We visualize how the number of extreme heat days varies across municipalities. This map displays the total number of severe heat days (≥32°C) in each municipality.

Code
# Map
mapa |> 
    rename(CVEGEO = cvegeo) |> 
  left_join(UTCI_mun_summary |> 
      mutate(CVEGEO = str_pad(CVEGEO, width = 5, 
                            side = "left", pad = "0")), 
  by = c("CVEGEO")) |>
    # Plot:
    ggplot() +
  geom_sf(aes(fill = Absoluto), color = NA) +
  geom_sf(data = mapa2, fill = NA, color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(option = "magma",
                       direction = -1,
                       name = "Days:") +
    theme_void() +
    theme(legend.position = c(.1,.3),
          plot.background = element_rect(fill = "white",
                                         color = "white"),
          plot.margin = unit(c(.2, .2, .2, .2),
                             "cm"))+
# Jitter plot
mapa |> 
    rename(CVEGEO = cvegeo) |> 
  left_join(UTCI_mun_summary |> 
      mutate(CVEGEO = str_pad(CVEGEO, width = 5, 
                            side = "left", pad = "0")), 
  by = c("CVEGEO")) |>
    st_drop_geometry() |> 
    left_join(mapa2 |> 
                 clean_names() |> 
                  rename(name_state = nomgeo),
              by = "cve_ent") |> 
    group_by(name_state) |> 
    mutate(MeanAbs = mean(Absoluto, na.rm = T)) |> 
    ggplot() +
  geom_jitter(aes(x = Absoluto,
                  y = fct_reorder(name_state, MeanAbs),
                  group = name_state,
      color = Absoluto),
      size = .17,
      alpha = .8) +
  scale_color_viridis_c(option = "magma",
                       direction = -1,
                       name = "Days:") +
    theme_bw() +
    theme(legend.position = "none")+
    labs(x = NULL, y = NULL)+
    plot_layout(widths = c(4,1))+
    plot_annotation(title = "Severe Heat Days by Absolute Measure",
                    caption = "ERA5 reanalysis, Copernicus. Own elaboration using municipal geometries from INEGI.")

📌 Regions in northern deserts (e.g., Sonora, Coahuila) and along the Pacific and Gulf coasts consistently experience the highest number of extreme heat days. These patterns are persistent and align with the mean UTCI trends shown in previous maps.

Final Remarks

By focusing on an absolute threshold (UTCI ≥ 32°C), this entry lays the groundwork for internationally comparable indicators of heat exposure. Future work will expand on these foundations by incorporating relative thresholds, population vulnerability, and health system capacity.

This Code & Data entry is part of ongoing research by the CACHE initiative, and aims to promote transparency, reproducibility, and collaboration around climate-demographic methods.

Footnotes

  1. Di Napoli C., Barnard C., Prudhomme C., Cloke HL and Pappenberger F. (2020): Thermal comfort indices derived from ERA5 reanalysis. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.553b7518 (Accessed on 01-11-2024)↩︎