if (!require("pacman")) install.packages("pacman")
::p_load(
pacman#IPUMS
ipumsr,#Manipulation & analysis
dplyr, tidyr, srvyr, gtsummary, labelled, purr,# Data Viz
ggplot2, patchwork, geofacet, plotly, scales,# For Spatial Data
sf, biscale, ggtern)
Aging and disability in the Mexican Population
Author: Marcial Yangali (marcialy@colmex.mx) and Mariana Ramos
Reviewers: Emerson Baptista and Landy Sánchez
Overview
This Code & Data is part of the CACHE research initiative on Aging, Climate, and Health, specifically the demonstration project on Heat, Disability in Older Adults, and Care. Older adults are particularly vulnerable to heat due to age-related physiological changes and chronic health conditions. Those challenges are even greater for those experiencing a disability, because of compromise health, reduced everyday functionally and difficulties to received help during extreme weather events.
This entry shows how to construct disability measures using the international recommentation of the Washington Group. We detail the data wrangling, analysis, and visualization process in R used to construct a disability measure that accounts for multiple disabilities and severity degrees. We show estimations for Mexico using census population data. Similar procedures can be used to analyze disability for any other source who meet the Washington group criterion.
Required R Packages 🛠️
We begin by loading the necessary R packages. These support data manipulation (mainly dplyr
but also stringr
, ipumsr
,
tidyr
, srvyr
, gtsummary
, purr
, and labelled
), visualization (ggplot2
, patchwork
, geofacet
, scales
and plotly
), and geospatial operations (sf
, biscale
, ggtern
).
Data
Mexican Population Census 2020 (Sample from IPUMS)
We use the 2020 Mexican Population Census. In 2020, Mexico adopted the Washington Group short set questions (WGSS) to measure disability in the Population and Housing Census. Those questions identify functional limitations in six domains: seeing, hearing, walking, remembering, self-care, and communication (see below for a detailed discussion). Our data comes from IPUMS International. IPUMS provides the option to select variables based on the WGSS, which are labeled with the prefix ‘WG’ and are part of the harmonized variable set.
# Read the IPUMS metadata file (DDI = Data Documentation Initiative)
<- read_ipums_ddi("in/IPUMS_MEXICO/ipumsi_00006.xml")
ddi # Load microdata using the metadata
<- read_ipums_micro(ddi) data
Estimate the effective sample size as a percentage of the total population (%):
nrow(data)/sum(data$PERWT)*100
[1] 5.982092
Measures of disability: the Washington Group Categories
Washington Group Short Set of Questions on Disability (WGSS)
The Washington Group on Disability Statistics developed the WG Short Set (WGSS) of six questions for use in national censuses and surveys to assess disability condition. These questions reflect advancements in the conceptualization and understanding of disability and are based on the World Health Organization’s International Classification of Functioning, Disability, and Health (ICF).
This approach breaks from previous ones that medicalized disabilities, viewing them solely as a condition defined by limitations in bodily functions of the individual. The ICF introduces a bio-psychosocial model where disability is the result of the interaction between a person’s capabilities (limitations in functioning) and environmental barriers (physical, social, cultural, etc.) that may hinder their participation in society.
In the 2020 Mexican Housing and Population Census, this approach was implemented for measuring disability seeing, hearing, walking, remembering, self-care, and communication. Using questions such as:
In your daily life, how much difficulty does (NAME) have to:
see, even wearing glasses?
hear, even using a hearing aid?
walk, go up or down?
remember or concentrate?
bathing, dressing, or eating?
speak or communicate?
And the answers are:
He/She does it with no difficulty
He/She does it with some difficulty
He/She does it with a lot of difficulty
He/She cannot do it at all
Response can be analyzed by severity of the difficulty, and there is an extra questions regarding mental problems.
Converting WG variables into categorical labels
To work with readable response categories (rather than numeric codes), we first convert WG variables into labeled factors.
# Convert the six WG variables into factor variables to display category labels
<- data |>
data mutate(WGCARE = as_factor(WGCARE),
WGCOGN = as_factor(WGCOGN),
WGCOMM = as_factor(WGCOMM),
WGHEAR = as_factor(WGHEAR),
WGMOBIL = as_factor(WGMOBIL),
WGVISION = as_factor(WGVISION))
# Check frequency distribution for one variable as an example
table(data$WGCARE)
No difficulty Some difficulty A lot of difficulty
7356695 68476 41216
Cannot do at all Unknown NIU (not in universe)
33636 5194 0
Code
# names(GEOLEV1)
<- data |>
pop_mun mutate(AGE65 = case_when(AGE >= 65 ~ "65+",
~ "65-")) |>
T group_by(GEOLEV2, AGE65) |>
summarise(Count = sum(PERWT), .groups = "drop")
save(pop_mun, file = "pop_mun_IPUMS.RDATA")
sum(pop_mun$Count)
[1] 125461406
Separating the population by disability status
We divide the dataset into two groups:
People without disabilities (all responses are “No difficulty”)
People with at least one functional limitation
# People with no disability at all (all answers are "No difficulty")
<- data |>
popnodis filter(if_all(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION),
~ . == "No difficulty"))
# People with at least one functional limitation
<- data |>
popdis filter(if_any(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION),
~ . != "No difficulty"))
Estimate the proportion of people with disabilities (unweighted and weighted):
# Unweighted percentage
nrow(popdis) / nrow(data) * 100
[1] 16.5003
# Weighted percentage using census sample weights (PERWT)
sum(popdis$PERWT) / sum(data$PERWT) * 100
[1] 15.71811
Absolute and Relative Frequencies of WG Categories
We now collapse responses into three groups by the severity of the difficulty:
“No difficulty”
“Milder”: corresponds to “Some difficulty”
“Severe”: includes both “A lot of difficulty” and “Cannot do at all”
# Save original variable labels
<- var_label(popdis)
labels
# Collapse response categories
<- popdis |>
popdis mutate(across(c(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION),
~ case_when(
== "Some difficulty" ~ "Milder",
. %in% c("A lot of difficulty", "Cannot do at all") ~ "Severe",
. TRUE ~ as.character(.))))
# Restore original labels
var_label(popdis) <- labels
Now, we compute relative frequencies using survey weights:
# Create survey design object with weights
<- popdis |>
popdis_svy as_survey_design(weights = PERWT)
# Generate a weighted summary table of WG domains
|>
popdis_svy select(WGCARE, WGCOGN, WGCOMM, WGHEAR, WGMOBIL, WGVISION) |>
tbl_svysummary(
percent = "column",
missing = "ifany") |>
as_gt()
Characteristic | N = 19,720,1601 |
---|---|
Difficulty with self-care (Washington group) | |
Milder | 1,049,606 (5.3%) |
No difficulty | 17,365,336 (88%) |
Severe | 1,133,094 (5.7%) |
Unknown | 172,124 (0.9%) |
Difficulty remembering or concentrating (Washington group) | |
Milder | 3,304,022 (17%) |
No difficulty | 15,150,712 (77%) |
Severe | 1,088,698 (5.5%) |
Unknown | 176,728 (0.9%) |
Difficulty communicating (Washington group) | |
Milder | 974,950 (4.9%) |
No difficulty | 17,683,708 (90%) |
Severe | 887,600 (4.5%) |
Unknown | 173,902 (0.9%) |
Difficulty hearing (Washington group) | |
Milder | 3,499,498 (18%) |
No difficulty | 14,776,140 (75%) |
Severe | 1,271,674 (6.4%) |
Unknown | 172,848 (0.9%) |
Difficulty walking or climbing stairs (Washington group) | |
Milder | 5,387,834 (27%) |
No difficulty | 11,240,080 (57%) |
Severe | 2,921,346 (15%) |
Unknown | 170,900 (0.9%) |
Difficulty seeing (Washington group) | |
Milder | 9,181,344 (47%) |
No difficulty | 7,801,176 (40%) |
Severe | 2,568,836 (13%) |
Unknown | 168,804 (0.9%) |
1 n (%) |
Counting Multiple Disabilities
Since one person can experience multiple disabilities, we compute the number of severe and milder limitations per person:
# Count how many domains are classified as severe/milder for each individual
<- popdis %>%
popdis mutate(
num_disabilities_severe = rowSums(select(., starts_with("WG")) == "Severe"),
num_disabilities_milder = rowSums(select(., starts_with("WG")) == "Milder"))
We then redefine milder disability cases to exclude individuals who also have at least one severe limitation:
<- popdis |>
popdis mutate(num_disabilities_milder = case_when(
>= 1 ~ 0,
num_disabilities_severe ~ num_disabilities_milder)) T
Reshaping the Data by Disability Type, Age, and Sex
We aggregate the population by:
Disability type (WG domain, multiple, milder, none)
Age group (5-year intervals)
Sex
State (GEOLEV1)
# List of WG disability variables
<- names(popdis)[grepl("^WG", names(popdis))]
disabilities
# Define 5-year age groups
<- c(seq(0, 100, by = 5), Inf)
age_breaks <- c(paste(seq(0, 95, by = 5), seq(4, 99, by = 5), sep = "-"), "100+")
age_labels
# Create data for individuals with only one severe disability
<- purrr::map_dfr(disabilities, function(dis) {
df_one_severe |>
popdis filter(num_disabilities_severe == 1,
!!sym(dis) == "Severe") |>
mutate(
AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
Disability_Type = dis,
Disability_Level = "Severe"
|>
) group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
summarise(Count = sum(PERWT), .groups = "drop")
})
# Individuals with more than one severe disability
<- popdis |>
df_mult_severe filter(num_disabilities_severe > 1) |>
mutate(
AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
Disability_Type = "DIS_MULT"
|>
) group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
summarise(Count = sum(PERWT), .groups = "drop")
# Individuals with only milder disabilities
<- popdis |>
df_milder filter(num_disabilities_milder >= 1) |>
mutate(
AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
Disability_Type = "DIS_MILDER"
|>
) group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
summarise(Count = sum(PERWT), .groups = "drop")
# Individuals with no disability
<- popnodis |>
df_nodis mutate(
AGE5 = cut(AGE, breaks = age_breaks, labels = age_labels, right = FALSE),
Disability_Type = "NO_DIS"
|>
) group_by(GEOLEV1, SEX, Disability_Type, AGE5) |>
summarise(Count = sum(PERWT), .groups = "drop")
We combine all the groups into a single dataframe:
<- bind_rows(df_one_severe, df_mult_severe, df_milder, df_nodis) df_filtered
Finally, we pivot and organize the final dataset:
# Wide format for easier comparison
<- df_filtered |>
df_filtered pivot_wider(names_from = Disability_Type, values_from = Count) |>
mutate(across(everything(), ~ replace_na(.x, 0)))
# Final dataset with totals for severe, milder, and no disability
<- df_filtered |>
POP_DIS mutate(
POPDIS_SEVERE = WGCARE + WGCOGN + WGCOMM + WGHEAR + WGMOBIL + WGVISION + DIS_MULT
|>
) select(GEOLEV1, SEX, AGE5, POPDIS_SEVERE,
POPDIS_MILDER = DIS_MILDER,
NO_DIS)
Population Pyramids by Disability Status
Now, we visualize the age and sex structure of the population by disability status using population pyramids. A Population Pyramid is a visual depiction of the age and gender distribution within a population. It is fundamentally a bar chart in which each bar corresponds to a particular age group, usually displaying males on the left and females on the right. The height of each bar reflects the quantity or percentage of individuals belonging to that specific age and gender category.
Preparing the Data
# Summarize total population by age group, sex, and disability status
<- POP_DIS |>
PYRAMID group_by(AGE5, SEX) |>
summarise(
POPDIS_SEVERE = sum(POPDIS_SEVERE),
POPDIS_MILDER = sum(POPDIS_MILDER),
POPNO_DIS = sum(NO_DIS)
|>
) ungroup() |>
# Reshape to long format for plotting
pivot_longer(cols = POPDIS_SEVERE:POPNO_DIS,
names_to = "Type",
values_to = "Pop")
Figure 1. National Population Pyramid (Absolute Numbers)
This plot displays the age-sex structure of the total population, broken down by disability type. It includes non-disabled individuals to contextualize the composition.
Code
|>
PYRAMID mutate(Pop = case_when(SEX == 1 ~ Pop*-1,
~Pop)) |>
Tmutate(Type = case_when(
== "POPDIS_SEVERE" ~ "Severe",
Type == "POPDIS_MILDER" ~ "Milder",
Type == "POPNO_DIS" ~ "None")) |>
Type mutate(Type = factor(Type,
levels = c(
"None",
"Milder",
"Severe"))) |>
ggplot()+
aes(x = AGE5, y = Pop/1000000)+
geom_bar(aes(group = Type, fill = Type),
stat = "identity")+
coord_flip()+
scale_fill_viridis_d()+
scale_y_continuous(labels = abs)+
theme_minimal()+
labs(x = NULL, y = "Population in millions",
fill = "Disability status")
State-Level Analysis
We repeat the same approach at the state level using facet_geo()
from geofacet.
Loading and Joining Spatial Data
We use official geographic boundaries at the state level from the 2020 Census. We are using the shapefile 00ent
from the downloaded folder: mg_2020_integrado/conjunto_de_datos
.
# Load shapefile with state geometries
<- read_sf("in/GEO/00ent.shp") |>
mapa st_transform(st_crs(4326))
# Add state code variable
<- POP_DIS |>
POP_DIS mutate(CVE_ENT = substr(GEOLEV1, 5, 6))
# Group and reshape pyramid data by state
<- POP_DIS |>
PYRAMID_STATES group_by(CVE_ENT, AGE5, SEX) |>
summarise(
POPDIS_SEVERE = sum(POPDIS_SEVERE),
POPDIS_MILDER = sum(POPDIS_MILDER),
POPNO_DIS = sum(NO_DIS)
|>
) ungroup() |>
pivot_longer(cols = POPDIS_SEVERE:POPNO_DIS,
names_to = "Type",
values_to = "Pop") |>
left_join(mapa |> st_drop_geometry(), by = "CVE_ENT")
Figure 3. State-Level Pyramids (Absolute)
Code
<- "mx_state_grid3"
gridmx |>
PYRAMID_STATES mutate(Pop = case_when(SEX == 1 ~ Pop*-1,
~Pop)) |>
Tmutate(Type = case_when(
== "POPDIS_SEVERE" ~ "Severe",
Type == "POPDIS_MILDER" ~ "Milder",
Type == "POPNO_DIS" ~ "None")) |>
Type mutate(Type = factor(Type,
levels = c(
"None",
"Milder",
"Severe"))) |>
mutate(CVE_ENT = as.numeric(CVE_ENT)) |>
ggplot()+
aes(x = AGE5, y = Pop/1000000)+
geom_bar(aes(group = Type, fill = Type),
stat = "identity")+
coord_flip()+
scale_fill_viridis_d()+
scale_y_continuous(labels = abs)+
facet_geo(~CVE_ENT, scales = "free_x",
grid = gridmx, label = "name_iso")+
theme_minimal()+
labs(x = NULL, y = "Population in millions")+
theme(legend.position = c(.8,.8))
Figure 5. Scatterplot: Aging vs Disability
We now explore the relationship between disability prevalence and population aging at the state level.
Code
<- POP_DIS |>
PROP_STATES_65 mutate(AGE65 = case_when(AGE5 %in% c(
"65-69", "70-74", "75-79", "80-84",
"85-89", "90-94", "95-99", "100+"
~ "more65",
) ~ "less65")) |>
T group_by(CVE_ENT, AGE65) |>
summarise(POP = sum(
+POPDIS_MILDER+NO_DIS)) |>
POPDIS_SEVEREungroup() |>
pivot_wider(names_from = AGE65,
values_from = POP) |>
mutate(PROP65 = more65/(less65+more65)*100)
<- PROP_STATES_DIS |>
PROP_STATES left_join(PROP_STATES_65,
by = "CVE_ENT") |>
mutate(POP_TOTAL = less65+more65) |>
mutate(
tooltip = paste0(
"<b>", NOMGEO, "</b><br>",
"Severe Disability: ", round(PROPDIS, 1),
"%<br>",
"Population 65+: ", round(PROP65, 1), "%<br>",
"Total Population: ", comma(POP_TOTAL))
)
<- ggplot(PROP_STATES, aes(
scatterplot x = PROPDIS,
y = PROP65,
size = POP_TOTAL,
color = NOMGEO,
text = tooltip
+
)) geom_point(alpha = 0.7) +
scale_x_continuous(breaks = seq(0,7,1),
limits = c(0,7))+
scale_y_continuous(breaks = seq(0,13,1),
limits = c(0,13))+
theme_minimal() +
labs(
x = "Severe Disability Prevalence (%)",
y = "Population Aged 65 and Over (%)"
+
) theme(legend.position = "none")
# Convert to interactive plot
::ggplotly(scatterplot, tooltip = "text") plotly
Integrating Extreme Heat Exposure
We now incorporate the number of extreme heat days per state to explore their relationship with disability prevalence in the older population, through bivariate choropleth map.
Loading Heat Exposure Data
State-level data on extreme heat exposure (i.e., number of days exceeding the Universal Thermal Climate Index threshold) is incorporated in this section.The computation of extreme heat days was performed following the steps described in a separate Code & Data entry: Exploring Severe Heat Days using the Universal Thermal Comfort Index.
load("in/UTCI_states_2020.RData")
🔗 Note: The full methodology for calculating UTCI-based thresholds and spatial aggregation is detailed in the linked entry. This allows full reproducibility from raster data to state-level indicators.
Map 2. Bivariate Map: Disability and Extreme Heat
# Merge demographic and climate data
<- PROP_STATES |>
POP_UTCI_bivar left_join(UTCI_states, by = "CVE_ENT") |>
st_as_sf()
# Classify values into bivariate groups
<- bi_class(
POP_UTCI_bivar
POP_UTCI_bivar,x = Absoluto, # Number of extreme heat days
y = PROPDIS, # Prevalence of severe disability
style = "quantile", # Can be changed to "fisher", "equal", etc.
dim = 3
)
Generate custom bivariate legend
<- bi_legend(
legend pal = "GrPink",
dim = 3,
xlab = "Extreme Heat Days",
ylab = "Disability (%)",
size = 8)
Plot bivariate choropleth map
Code
|>
POP_UTCI_bivar ggplot() +
geom_sf(aes(fill = bi_class), color = NA) +
bi_scale_fill(pal = "GrPink", dim = 3) +
theme_void() +
theme(legend.position = "none") +
# Position legend in the Gulf of Mexico (empty space)
annotation_custom(
ggplotGrob(legend),
xmin = -95, xmax = -86, # Longitude range for Gulf
ymin = 24, ymax = 32 # Latitude range for Gulf
)
Final Remarks
This Code & Data entry explores the demographic composition of people with disabilities in Mexico, using harmonized indicators based on the Washington Group Short Set. By combining census microdata with spatial and age-disaggregated analysis, it highlights how severe disability accumulates in older age groups and varies across states.
Building on this foundation, we linked state-level disability prevalence with exposure to extreme heat, pointing to areas of compounded vulnerability. This entry is part of the ongoing research by the CACHE initiative and seeks to support reproducible and collaborative research at the intersection of aging, disability, and climate risk.