Aging and disability in the Mexican Population

Author: Marcial Yangali (marcialy@colmex.mx) and Mariana Ramos

Reviewers: Emerson Baptista and Landy Sánchez

Overview

This Code & Data is part of the CACHE research initiative on Aging, Climate, and Health, specifically the demonstration project on Heat, Disability in Older Adults, and Care. Older adults are particularly vulnerable to heat due to age-related physiological changes and chronic health conditions. Those challenges are even greater for those experiencing a disability, because of compromise health, reduced everyday functionally and difficulties to received help during extreme weather events.

This entry shows how to construct disability measures using the international recommentation of the Washington Group. We detail the data wrangling, analysis, and visualization process in R used to construct a disability measure that accounts for multiple disabilities and severity degrees. We show estimations for Mexico using census population data. Similar procedures can be used to analyze disability for any other source who meet the Washington group criterion.

Required R Packages 🛠️

We begin by loading the necessary R packages. These support data manipulation (mainly dplyr but also stringr, ipumsr, tidyr, srvyr, gtsummary, purr, and labelled), visualization (ggplot2, patchwork, geofacet, scales and plotly), and geospatial operations (sf, biscale, ggtern).

if (!require("pacman")) install.packages("pacman")
pacman::p_load(
    #IPUMS
    ipumsr,
    #Manipulation & analysis
    dplyr, tidyr, srvyr, gtsummary, labelled, purr,
    # Data Viz
    ggplot2, patchwork, geofacet, plotly, scales,
    # For Spatial Data
    sf, biscale, ggtern)

Data

Mexican Population Census 2020 (Sample from IPUMS)

We use the 2020 Mexican Population Census. In 2020, Mexico adopted the Washington Group short set questions (WGSS) to measure disability in the Population and Housing Census. Those questions identify functional limitations in six domains: seeing, hearing, walking, remembering, self-care, and communication (see below for a detailed discussion). Our data comes from IPUMS International. IPUMS provides the option to select variables based on the WGSS, which are labeled with the prefix ‘WG’ and are part of the harmonized variable set.

# Read the IPUMS metadata file (DDI = Data Documentation Initiative)
ddi <- read_ipums_ddi("in/IPUMS_MEXICO/ipumsi_00006.xml")
# Load microdata using the metadata
data <- read_ipums_micro(ddi)

Estimate the effective sample size as a percentage of the total population (%):

nrow(data)/sum(data$PERWT)*100

[1] 5.982092

Measures of disability: the Washington Group Categories

Washington Group Short Set of Questions on Disability (WGSS)

The Washington Group on Disability Statistics developed the WG Short Set (WGSS) of six questions for use in national censuses and surveys to assess disability condition. These questions reflect advancements in the conceptualization and understanding of disability and are based on the World Health Organization’s International Classification of Functioning, Disability, and Health (ICF).

This approach breaks from previous ones that medicalized disabilities, viewing them solely as a condition defined by limitations in bodily functions of the individual. The ICF introduces a bio-psychosocial model where disability is the result of the interaction between a person’s capabilities (limitations in functioning) and environmental barriers (physical, social, cultural, etc.) that may hinder their participation in society.

In the 2020 Mexican Housing and Population Census, this approach was implemented for measuring disability seeing, hearing, walking, remembering, self-care, and communication. Using questions such as:

In your daily life, how much diﬀiculty does (NAME) have to:

see, even wearing glasses?
hear, even using a hearing aid?
walk, go up or down?
remember or concentrate?
bathing, dressing, or eating?
speak or communicate?

And the answers are:

He/She does it with no diﬀiculty
He/She does it with some diﬀiculty
He/She does it with a lot of diﬀiculty
He/She cannot do it at all

Response can be analyzed by severity of the difficulty, and there is an extra questions regarding mental problems.

Converting WG variables into categorical labels

To work with readable response categories (rather than numeric codes), we first convert WG variables into labeled factors.

# Convert the six WG variables into factor variables to display category labels
data <- data |> 
    mutate(WGCARE = as_factor(WGCARE),
           WGCOGN = as_factor(WGCOGN), 
           WGCOMM = as_factor(WGCOMM),
           WGHEAR = as_factor(WGHEAR),
           WGMOBIL = as_factor(WGMOBIL),
           WGVISION = as_factor(WGVISION))

# Check frequency distribution for one variable as an example
table(data$WGCARE)


        No difficulty       Some difficulty   A lot of difficulty 
              7356695                 68476                 41216 
     Cannot do at all               Unknown NIU (not in universe) 
                33636                  5194                     0

Code

# names(GEOLEV1)
pop_mun <- data |>
    mutate(AGE65 = case_when(AGE >= 65 ~ "65+",
                             T ~ "65-")) |> 
    group_by(GEOLEV2, AGE65) |>
    summarise(Count = sum(PERWT), .groups = "drop")

save(pop_mun, file = "pop_mun_IPUMS.RDATA")
sum(pop_mun$Count)

[1] 125461406

Separating the population by disability status

We divide the dataset into two groups:

People without disabilities (all responses are “No difficulty”)
People with at least one functional limitation

# People with no disability at all (all answers are "No difficulty")
popnodis <- data |> 
  filter(if_all(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION),
                ~ . == "No difficulty"))

# People with at least one functional limitation
popdis <- data |> 
  filter(if_any(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION),
                ~ . != "No difficulty"))

Estimate the proportion of people with disabilities (unweighted and weighted):

# Unweighted percentage
nrow(popdis) / nrow(data) * 100

[1] 16.5003

# Weighted percentage using census sample weights (PERWT)
sum(popdis$PERWT) / sum(data$PERWT) * 100

[1] 15.71811

Absolute and Relative Frequencies of WG Categories

We now collapse responses into three groups by the severity of the difficulty:

“No difficulty”
“Milder”: corresponds to “Some difficulty”
“Severe”: includes both “A lot of difficulty” and “Cannot do at all”

# Save original variable labels
labels <- var_label(popdis)

# Collapse response categories
popdis <- popdis |> 
  mutate(across(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION), 
                ~ case_when(
                  . == "Some difficulty" ~ "Milder",
                  . %in% c("A lot of difficulty", "Cannot do at all") ~ "Severe",
                  TRUE ~ as.character(.))))

# Restore original labels
var_label(popdis) <- labels

Now, we compute relative frequencies using survey weights:

# Create survey design object with weights
popdis_svy <- popdis |> 
  as_survey_design(weights = PERWT)

# Generate a weighted summary table of WG domains
popdis_svy |> 
  select(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION) |> 
  tbl_svysummary(
    percent = "column",
    missing = "ifany") |> 
  as_gt()

Characteristic	N = 19,720,160¹
Difficulty with self-care (Washington group)
Milder	1,049,606 (5.3%)
No difficulty	17,365,336 (88%)
Severe	1,133,094 (5.7%)
Unknown	172,124 (0.9%)
Difficulty remembering or concentrating (Washington group)
Milder	3,304,022 (17%)
No difficulty	15,150,712 (77%)
Severe	1,088,698 (5.5%)
Unknown	176,728 (0.9%)
Difficulty communicating (Washington group)
Milder	974,950 (4.9%)
No difficulty	17,683,708 (90%)
Severe	887,600 (4.5%)
Unknown	173,902 (0.9%)
Difficulty hearing (Washington group)
Milder	3,499,498 (18%)
No difficulty	14,776,140 (75%)
Severe	1,271,674 (6.4%)
Unknown	172,848 (0.9%)
Difficulty walking or climbing stairs (Washington group)
Milder	5,387,834 (27%)
No difficulty	11,240,080 (57%)
Severe	2,921,346 (15%)
Unknown	170,900 (0.9%)
Difficulty seeing (Washington group)
Milder	9,181,344 (47%)
No difficulty	7,801,176 (40%)
Severe	2,568,836 (13%)
Unknown	168,804 (0.9%)
¹ n (%)

Counting Multiple Disabilities

Since one person can experience multiple disabilities, we compute the number of severe and milder limitations per person:

# Count how many domains are classified as severe/milder for each individual
popdis <- popdis %>% 
  mutate(
    num_disabilities_severe = rowSums(select(., starts_with("WG")) == "Severe"),
    num_disabilities_milder = rowSums(select(., starts_with("WG")) == "Milder"))

We then redefine milder disability cases to exclude individuals who also have at least one severe limitation:

popdis <- popdis |> 
    mutate(num_disabilities_milder = case_when(
        num_disabilities_severe >= 1 ~ 0,
        T ~ num_disabilities_milder))

Reshaping the Data by Disability Type, Age, and Sex

We aggregate the population by:

Disability type (WG domain, multiple, milder, none)
Age group (5-year intervals)
Sex
State (GEOLEV1)

# List of WG disability variables
disabilities <- names(popdis)[grepl("^WG", names(popdis))]

# Define 5-year age groups
age_breaks <- c(seq(0, 100, by = 5), Inf)
age_labels <- c(paste(seq(0, 95, by = 5), seq(4, 99, by = 5), sep = "-"), "100+")

# Create data for individuals with only one severe disability
df_one_severe <- purrr::map_dfr(disabilities, function(dis) {
  popdis |>
    filter(num_disabilities_severe == 1,
           !!sym(dis) == "Severe") |>
    mutate(
      AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
      Disability_Type = dis,
      Disability_Level = "Severe"
    ) |>
    group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
    summarise(Count = sum(PERWT), .groups = "drop")
})

# Individuals with more than one severe disability
df_mult_severe <- popdis |>
  filter(num_disabilities_severe > 1) |>
  mutate(
    AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
    Disability_Type = "DIS_MULT"
  ) |>
  group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
  summarise(Count = sum(PERWT), .groups = "drop")

# Individuals with only milder disabilities
df_milder <- popdis |>
  filter(num_disabilities_milder >= 1) |>
  mutate(
    AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
    Disability_Type = "DIS_MILDER"
  ) |>
  group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
  summarise(Count = sum(PERWT), .groups = "drop")

# Individuals with no disability
df_nodis <- popnodis |>
  mutate(
    AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
    Disability_Type = "NO_DIS"
  ) |>
  group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
  summarise(Count = sum(PERWT), .groups = "drop")

We combine all the groups into a single dataframe:

df_filtered <- bind_rows(df_one_severe, df_mult_severe, df_milder, df_nodis)

Finally, we pivot and organize the final dataset:

# Wide format for easier comparison
df_filtered <- df_filtered |> 
  pivot_wider(names_from = Disability_Type, values_from = Count) |> 
  mutate(across(everything(), ~ replace_na(.x, 0)))

# Final dataset with totals for severe, milder, and no disability
POP_DIS <- df_filtered |> 
  mutate(
    POPDIS_SEVERE = WGCARE + WGCOGN + WGCOMM + WGHEAR + WGMOBIL + WGVISION + DIS_MULT
  ) |> 
  select(GEOLEV1, SEX, AGE5, POPDIS_SEVERE, 
         POPDIS_MILDER = DIS_MILDER, 
         NO_DIS)

Population Pyramids by Disability Status

Now, we visualize the age and sex structure of the population by disability status using population pyramids. A Population Pyramid is a visual depiction of the age and gender distribution within a population. It is fundamentally a bar chart in which each bar corresponds to a particular age group, usually displaying males on the left and females on the right. The height of each bar reflects the quantity or percentage of individuals belonging to that specific age and gender category.

Preparing the Data

# Summarize total population by age group, sex, and disability status
PYRAMID <- POP_DIS |> 
  group_by(AGE5, SEX) |> 
  summarise(
    POPDIS_SEVERE = sum(POPDIS_SEVERE),
    POPDIS_MILDER = sum(POPDIS_MILDER),
    POPNO_DIS = sum(NO_DIS)
  ) |> 
  ungroup() |> 
  # Reshape to long format for plotting
  pivot_longer(cols = POPDIS_SEVERE:POPNO_DIS,
               names_to = "Type",
               values_to = "Pop")

Figure 1. National Population Pyramid (Absolute Numbers)

This plot displays the age-sex structure of the total population, broken down by disability type. It includes non-disabled individuals to contextualize the composition.

Code

PYRAMID |> 
    mutate(Pop = case_when(SEX == 1 ~ Pop*-1,
                           T~Pop)) |> 
    mutate(Type = case_when(
        Type == "POPDIS_SEVERE" ~ "Severe",
        Type == "POPDIS_MILDER" ~ "Milder",
        Type == "POPNO_DIS" ~ "None")) |> 
    mutate(Type = factor(Type, 
                            levels = c(
                                "None",
                                "Milder",
                                "Severe"))) |> 
    ggplot()+
    aes(x = AGE5, y = Pop/1000000)+
    geom_bar(aes(group = Type, fill = Type),
             stat = "identity")+
    coord_flip()+
    scale_fill_viridis_d()+
    scale_y_continuous(labels = abs)+
    theme_minimal()+
    labs(x = NULL, y = "Population in millions",
         fill = "Disability status")

Figure 2. National Population Pyramid (Relative Shares)

This version of the pyramid shows the relative composition of each age group by disability status.

Code

PYRAMID |> 
  mutate(
    Pop = if_else(SEX == 1, -Pop, Pop),
    Type = case_when(
      Type == "POPDIS_SEVERE" ~ "Severe",
      Type == "POPDIS_MILDER" ~ "Milder",
      Type == "POPNO_DIS" ~ "None"
    ),
    Type = factor(Type, levels = c("None", "Milder", "Severe"))
  ) |> 
  ggplot(aes(x = AGE5, y = Pop / 1e6, fill = Type)) +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  scale_y_continuous(labels = abs, name = "Proportion") +
  scale_fill_viridis_d() +
  theme_minimal() +
  labs(x = NULL, fill = "Disability status")

State-Level Analysis

We repeat the same approach at the state level using facet_geo() from geofacet.

Loading and Joining Spatial Data

We use official geographic boundaries at the state level from the 2020 Census. We are using the shapefile 00ent from the downloaded folder: mg_2020_integrado/conjunto_de_datos.

# Load shapefile with state geometries
mapa <- read_sf("in/GEO/00ent.shp") |> 
  st_transform(st_crs(4326))

# Add state code variable
POP_DIS <- POP_DIS |> 
  mutate(CVE_ENT = substr(GEOLEV1, 5, 6))

# Group and reshape pyramid data by state
PYRAMID_STATES <- POP_DIS |> 
  group_by(CVE_ENT, AGE5, SEX) |> 
  summarise(
    POPDIS_SEVERE = sum(POPDIS_SEVERE),
    POPDIS_MILDER = sum(POPDIS_MILDER),
    POPNO_DIS = sum(NO_DIS)
  ) |> 
  ungroup() |> 
  pivot_longer(cols = POPDIS_SEVERE:POPNO_DIS,
               names_to = "Type",
               values_to = "Pop") |> 
  left_join(mapa |> st_drop_geometry(), by = "CVE_ENT")

Figure 3. State-Level Pyramids (Absolute)

Code

gridmx <- "mx_state_grid3"
PYRAMID_STATES |> 
    mutate(Pop = case_when(SEX == 1 ~ Pop*-1,
                           T~Pop)) |> 
    mutate(Type = case_when(
        Type == "POPDIS_SEVERE" ~ "Severe",
        Type == "POPDIS_MILDER" ~ "Milder",
        Type == "POPNO_DIS" ~ "None")) |> 
    mutate(Type = factor(Type, 
                            levels = c(
                                "None",
                                "Milder",
                                "Severe"))) |> 
    mutate(CVE_ENT = as.numeric(CVE_ENT)) |> 
    ggplot()+
    aes(x = AGE5, y = Pop/1000000)+
    geom_bar(aes(group = Type, fill = Type),
             stat = "identity")+
    coord_flip()+
    scale_fill_viridis_d()+
    scale_y_continuous(labels = abs)+
    facet_geo(~CVE_ENT, scales = "free_x",
              grid = gridmx, label = "name_iso")+
    theme_minimal()+
    labs(x = NULL, y = "Population in millions")+
    theme(legend.position = c(.8,.8))

Figure 4. State-Level Pyramids (Relative Shares)

Code

PYRAMID_STATES |> 
    mutate(Pop = case_when(SEX == 1 ~ Pop*-1,
                           T~Pop)) |> 
    mutate(Type = case_when(
        Type == "POPDIS_SEVERE" ~ "Severe",
        Type == "POPDIS_MILDER" ~ "Milder",
        Type == "POPNO_DIS" ~ "None")) |> 
    mutate(Type = factor(Type, 
                            levels = c(
                                "None",
                                "Milder",
                                "Severe"))) |> 
    mutate(CVE_ENT = as.numeric(CVE_ENT)) |> 
    ggplot()+
    aes(x = AGE5, y = Pop/1000000)+
    geom_bar(aes(group = Type, fill = Type),
             stat = "identity",
             position = "fill")+
    coord_flip()+
    scale_fill_viridis_d()+
    scale_y_continuous(labels = abs)+
    facet_geo(~CVE_ENT, scales = "free_x", grid = gridmx, label = "name_iso")+
    theme_minimal()+
    labs(x = NULL, y = "Proportion")+
    theme(legend.position = c(.8,.8))

Map 1. Prevalence of Severe Disability by State

This map shows the proportion of people with severe disability in each state, based on weighted counts.

PROP_STATES_DIS <- POP_DIS |> 
    group_by(CVE_ENT) |> 
    summarise(POPDIS_SEVERE = sum(POPDIS_SEVERE),
              POPDIS_MILDER = sum(POPDIS_MILDER),
              POPNO_DIS = sum(NO_DIS)) |> 
    ungroup() |> 
    mutate(PROPDIS = POPDIS_SEVERE/(POPDIS_SEVERE+POPDIS_MILDER+POPNO_DIS)*100) |> 
    left_join(mapa,
              by = "CVE_ENT") |> 
    st_as_sf()

Code

PROP_STATES_DIS |> 
  ggplot() +
  geom_sf(aes(fill = PROPDIS)) +
  scale_fill_viridis_c(
    name = "Severe Disability (%)",
    option = "D") +
  theme_void() +
  theme(
    legend.position = "bottom",
    legend.key.width = unit(1.5, "cm"),  
    legend.key.height = unit(0.5, "cm"))

Figure 5. Scatterplot: Aging vs Disability

We now explore the relationship between disability prevalence and population aging at the state level.

Code

PROP_STATES_65 <- POP_DIS |> 
    mutate(AGE65 = case_when(AGE5 %in% c(
        "65-69", "70-74", "75-79", "80-84", 
        "85-89", "90-94", "95-99", "100+"
        ) ~ "more65",
        T ~ "less65")) |> 
    group_by(CVE_ENT, AGE65) |> 
    summarise(POP = sum(
        POPDIS_SEVERE+POPDIS_MILDER+NO_DIS)) |> 
    ungroup() |> 
    pivot_wider(names_from = AGE65,
                values_from = POP) |> 
    mutate(PROP65 = more65/(less65+more65)*100)

PROP_STATES <- PROP_STATES_DIS |> 
    left_join(PROP_STATES_65,
              by = "CVE_ENT") |> 
    mutate(POP_TOTAL = less65+more65) |> 
  mutate(
    tooltip = paste0(
      "<b>", NOMGEO, "</b><br>",
      "Severe Disability: ", round(PROPDIS, 1),
      "%<br>",
      "Population 65+: ", round(PROP65, 1), "%<br>",
      "Total Population: ", comma(POP_TOTAL))
  )

scatterplot <- ggplot(PROP_STATES, aes(
    x = PROPDIS,
    y = PROP65,
    size = POP_TOTAL,
    color = NOMGEO,
    text = tooltip
  )) +
  geom_point(alpha = 0.7) +
    scale_x_continuous(breaks = seq(0,7,1),
                       limits = c(0,7))+
    scale_y_continuous(breaks = seq(0,13,1),
                       limits = c(0,13))+
  theme_minimal() +
  labs(
    x = "Severe Disability Prevalence (%)",
    y = "Population Aged 65 and Over (%)"
  ) +
  theme(legend.position = "none")

# Convert to interactive plot
plotly::ggplotly(scatterplot, tooltip = "text")

Integrating Extreme Heat Exposure

We now incorporate the number of extreme heat days per state to explore their relationship with disability prevalence in the older population, through bivariate choropleth map.

Loading Heat Exposure Data

State-level data on extreme heat exposure (i.e., number of days exceeding the Universal Thermal Climate Index threshold) is incorporated in this section.The computation of extreme heat days was performed following the steps described in a separate Code & Data entry: Exploring Severe Heat Days using the Universal Thermal Comfort Index.

load("in/UTCI_states_2020.RData")

🔗 Note: The full methodology for calculating UTCI-based thresholds and spatial aggregation is detailed in the linked entry. This allows full reproducibility from raster data to state-level indicators.

Map 2. Bivariate Map: Disability and Extreme Heat

# Merge demographic and climate data
POP_UTCI_bivar <- PROP_STATES |> 
  left_join(UTCI_states, by = "CVE_ENT") |> 
  st_as_sf()

# Classify values into bivariate groups
POP_UTCI_bivar <- bi_class(
  POP_UTCI_bivar,
  x = Absoluto,       # Number of extreme heat days
  y = PROPDIS,        # Prevalence of severe disability
  style = "quantile", # Can be changed to "fisher", "equal", etc.
  dim = 3
)

Generate custom bivariate legend

legend <- bi_legend(
  pal = "GrPink",
  dim = 3,
  xlab = "Extreme Heat Days",
  ylab = "Disability (%)",
  size = 8)

Plot bivariate choropleth map

Code

POP_UTCI_bivar |> 
  ggplot() +
  geom_sf(aes(fill = bi_class), color = NA) +
  bi_scale_fill(pal = "GrPink", dim = 3) +
  theme_void() +
  theme(legend.position = "none") +
  # Position legend in the Gulf of Mexico (empty space)
  annotation_custom(
    ggplotGrob(legend),
    xmin = -95, xmax = -86,  # Longitude range for Gulf
    ymin = 24, ymax = 32     # Latitude range for Gulf
  )

Final Remarks

This Code & Data entry explores the demographic composition of people with disabilities in Mexico, using harmonized indicators based on the Washington Group Short Set. By combining census microdata with spatial and age-disaggregated analysis, it highlights how severe disability accumulates in older age groups and varies across states.

Building on this foundation, we linked state-level disability prevalence with exposure to extreme heat, pointing to areas of compounded vulnerability. This entry is part of the ongoing research by the CACHE initiative and seeks to support reproducible and collaborative research at the intersection of aging, disability, and climate risk.