Urban Planning & Data Science: My Work at IMEPLAN

English
IMEPLAN
Author

Manuel Solano

Published

October 13, 2024

In this blog, I will present the projects I developed while I was at IMEPLAN, an organization focused on urban and regional planning in the Guadalajara Metropolitan Area (GMA). IMEPLAN plays a crucial role in shaping the future of the GMA by guiding development policies, infrastructure planning, and resource management to improve the quality of life for its citizens.

While working at IMEPLAN, I had the opportunity to contribute to various data-driven projects focused on enhancing urban planning efficiency and decision-making. My work primarily involved automating spatial analysis methodologies, optimizing resource allocation, and reducing costs through the use of open-source tools. Below, I walk through some key projects, such as school proximity analysis and supply-demand assessments, which helped inform critical decisions about infrastructure and public services. These projects highlight the potential of data science and geospatial analysis in shaping the future of urban environments.

1. Huff Probabilistic Catchment Areas of Schools

For demonstrative purposes, instead of analyzing all the schools, I will use a subset of them to illustrate how the code works and explain the methodology behind it. This first project will focus on computing the Huff model, creating a raster to represent the probabilistic catchment areas, and visualizing the results. It’s a stepping stone towards more comprehensive analyses that will include broader datasets and additional public services.

The Huff model, a probabilistic spatial interaction model, offers a powerful tool for this type of analysis. By estimating the likelihood that a student will attend a specific school based on distance and school capacity, the Huff model can help urban planners and policymakers make informed decisions on school placements, expansions, and resource allocation. This blog will explore the application of Huff catchment areas to public schools in Jalisco, offering insights into how this model can aid in addressing educational inequalities and improving access to schooling in the region.

library(sf)
library(dplyr)
library(SpatialPosition)

schools <- st_read("media/example_points.shp")

# Study area: convex hull of the school points, buffered by 500 m
convex_hull_amg <- st_sf(geometry = st_convex_hull(st_union(schools)))
convex_hull_buffered <- st_buffer(convex_hull_amg, dist = 500)

# Regular grid of unknown points over the study area (default resolution)
mygrid <- CreateGrid(w = convex_hull_buffered, resolution = NULL, returnclass = "sf")

# Distance matrix between the schools (known points) and the grid
matdist <- CreateDistMatrix(knownpts = schools, unknownpts = mygrid, longlat = TRUE)

\[ \text{Interaction} = \exp\left(-\alpha \cdot \text{mDistance}^{\beta}\right) \]

Modeling spatial interactions means quantifying distance friction or impedance. The role of distance can be interpreted as a disincentive to access desired destinations or opportunities (e.g., jobs, shops). At the very place of the opportunity, the interaction function equals 1, meaning that potential access is 100%. Far away from the opportunity, the interaction function tends to 0, meaning that potential access is 0%. The span is defined as the distance at which the interaction function falls to 0.5 (50%). From the individual's point of view, this function may be seen as a degree of availability of a given opportunity. From the opportunity's point of view (a store, for example), the interaction function may be seen as a decreasing catchment area: attraction is maximal close to the opportunity and decreases progressively with distance.
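
Before running the full model, it is worth checking this behavior numerically. The short sketch below is mine, not part of the original workflow; it assumes SpatialPosition's parameterization, in which alpha is derived from the span as alpha = log(2) / span^beta, as described in the package documentation.

# Quick sanity check of the exponential interaction function
span <- 450   # meters, the value used in the huff() call below
beta <- 2
alpha <- log(2) / span^beta   # assumption: SpatialPosition's derivation of alpha

interaction_fun <- function(d) exp(-alpha * d^beta)

interaction_fun(0)      # 1.00 -> full access at the opportunity itself
interaction_fun(450)    # 0.50 -> the function falls to 50% exactly at the span
interaction_fun(1500)   # ~0.0005 -> negligible access far away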

One way to visualize and analyze the accessibility and influence of each school is by drawing probabilistic catchment areas. These areas represent the likelihood that a student will choose one school over another based on proximity and school capacity.

The Huff model, commonly used in geomarketing to identify regions of competition between shopping malls, can be adapted here to education. It helps identify regions where schools compete for students, enabling planners to make strategic decisions on school expansions, resource allocation, or areas where new facilities may be needed. The fuzzy boundaries between catchment areas illustrate where students are likely to face a choice between two or more schools.

To create these probabilistic catchment areas, we follow three main steps:

  1. Compute the Huff model values for each school, based on its attractiveness and distance from population centers.

  2. Create a raster using the rasterHuff function, which transforms these probabilities into a grid format that covers the entire region.

  3. Plot the raster using plotHuff, visualizing the catchment areas and the regions where schools compete for students.

catchHuff <- huff(
  knownpts = schools,
  unknownpts = mygrid,
  matdist = matdist,
  varname = "capacdd",      # school capacity field
  typefct = "exponential",
  span = 450,
  beta = 2,
  returnclass = "sf",
  longlat = TRUE
)

rasterCatch <- rasterHuff(x = catchHuff)
plotHuff(x = rasterCatch)
plot(st_geometry(schools), pch = 20, col = "red", add = TRUE)
mtext("Probabilistic Catchment Areas \nof Public Schools",
      side = 3, cex = 1.5, line = 0.5)

mtext(text = "distance function: exponential, span = 0.45 km, beta = 2",
      side = 1, line = 0.5)

2. Predicting School Enrollment Probabilities

An alternative formulation of the Huff model¹ allows us to predict the probability of a customer choosing a specific good or service based on the distance between the customer and the firm.

We can extrapolate these concepts to predict school enrollment probabilities, where the “customers” are children and the “firms” are schools. This approach is particularly useful in urban planning because it helps planners understand how the spatial distribution of schools affects enrollment patterns. By predicting which schools children are more likely to attend based on proximity, urban planners can optimize the allocation of educational resources, identify areas underserved by schools, and plan future school locations to improve accessibility and balance capacity with demand.

The model is given by the following expression:

\[ P_{ij} = \frac{\frac{S_j}{T_{ij}^{\lambda}}}{\sum_{j=1}^{n} \frac{S_j}{T_{ij}^{\lambda}}} \]

where \(P_{ij}\) = the probability of a consumer at a given point of origin \(i\) traveling to a particular shopping center \(j\);

\(S_j\) = the size of a shopping center \(j\) (measured in terms of the square footage of selling area devoted to the sale of a particular class of goods);

\(T_{ij}\) = the travel time involved in getting from a consumer’s travel base \(i\) to a given shopping center \(j\); and

\(\lambda\) = a parameter which is to be estimated empirically to reflect the effect of travel time on various kinds of shopping trips.

The expected number of consumers at a given place of origin \(i\) that shop at a particular shopping center \(j\) is equal to the number of consumers at \(i\) multiplied by the probability that a consumer at \(i\) will select \(j\) for shopping. That is,

\[ E_{ij} = P_{ij} \cdot C_i \]

where \(E_{ij}\) = the expected number of consumers at \(i\) that are likely to travel to shopping center \(j\); and

\(C_i\) = the number of consumers at \(i\).

I am currently working on a Python implementation.
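
In the meantime, the two formulas are easy to prototype in R. The sketch below is a toy illustration with made-up numbers and hypothetical names (S, T, C, lambda); it is not IMEPLAN data or production code.

# Huff probabilities and expected enrollment for a small toy example.
# Rows are origins i (blocks); columns are destinations j (schools).
S <- c(school_A = 300, school_B = 500)        # S_j: school size/capacity
T <- rbind(block_1 = c(5, 15),                # T_ij: travel times (minutes)
           block_2 = c(12, 6))
lambda <- 2                                   # travel-time exponent (toy value)

U <- sweep(T^(-lambda), 2, S, `*`)            # utility S_j / T_ij^lambda
P <- U / rowSums(U)                           # P_ij: probability of choosing school j

C <- c(block_1 = 120, block_2 = 80)           # C_i: children living in block i
E <- P * C                                    # E_ij = P_ij * C_i
colSums(E)                                    # expected enrollment per school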

3. School Proximity Analysis

In this project, I automated a methodology developed by the Urban Planning Department of IMEPLAN, which focuses on analyzing school proximity and accessibility. By using free and open-source tools to generate isochrones (polygons representing areas reachable within a specified distance or time), I saved the department $2,500 annually in software licensing fees and reduced analysis time by 30%.

library(sf)
library(dplyr)
library(leaflet)
library(mapboxapi)
library(BAMMtools)  # getJenksBreaks()

# Store the Mapbox public token in .Renviron (install = TRUE) and read it back
mb_access_token("pk.eyJ1IjoibWFub2xvbWFwcyIsImEiOiJjbHozNWpwamcxZjh0MmpwcW43OGdwN3I0In0.ELXy2MolTMyHKdVX8SWJgA", install = TRUE, overwrite = TRUE)

token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")

Read data

primarias <- st_read('data/primarias.shp')
primarias <- st_transform(primarias, st_crs('EPSG:4326'))
primarias <- primarias %>% rename(id = Id_prmr)  # rename the id column

manzanas <- st_read('data/manzanas.shp')
manzanas <- st_transform(manzanas, st_crs('EPSG:4326'))

Overlapping Isochrone Polygons

The first part of the project involved generating overlapping isochrone polygons at distances of 500 and 1100 meters, which were the suggested walkable distances for school accessibility.

resultados <- list()

for (i in 1:nrow(primarias)) {

  primaria <- primarias[i, ]

  # Compute the 500 m and 1100 m walking isochrones
  isocrona_500 <- mb_isochrone(primaria,
                               profile = "walking",
                               distance = 500,  # meters
                               geometry = "polygon")

  isocrona_1100 <- mb_isochrone(primaria,
                                profile = "walking",
                                distance = 1100,  # meters
                                geometry = "polygon")

  # Attach the school id to its isochrones
  isocrona_500$id <- primaria$id
  isocrona_1100$id <- primaria$id

  # Combine the two isochrones into a single sf object (overlapping)
  poligonos_sf <- rbind(isocrona_500, isocrona_1100)

  # Store the polygons in the results list
  resultados[[i]] <- poligonos_sf
}

# Combine all polygons from the list into one final sf object
sf_final <- do.call(rbind, resultados)

m <- leaflet(sf_final) %>%
  addProviderTiles(providers$MapBox, options = providerTileOptions(
    id = "mapbox/streets-v11",
    accessToken = Sys.getenv("MAPBOX_PUBLIC_TOKEN")
  )) %>%
  addPolygons(color = "#3388ff", weight = 1, opacity = 1, fillOpacity = 0.2)

# Add the legend
m <- addLegend(m, position = "bottomright",
               colors = c("#3388ff"),
               labels = c("500 m and 1100 m isochrones"),
               title = "Overlapping isochrones",
               opacity = 1)

# Display the map
m

Proximity

The second part was a proximity analysis. Here, I intersected city blocks with the isochrones and classified each block based on the degree of intersection. The classification criteria were as follows:

  • Outside Coverage (C_clase = 0)
  • Partially Covered (C_clase = 0.5)
  • Fully Covered (C_clase = 1)
manzanas$C_clase <- 0

# Blocks touching a 1100 m isochrone are partially covered...
intersecciones_1100 <- st_intersects(manzanas, sf_final[sf_final$distance == 1100, ])
manzanas$C_clase[lengths(intersecciones_1100) > 0] <- 0.5

# ...and blocks touching a 500 m isochrone are fully covered
intersecciones_500 <- st_intersects(manzanas, sf_final[sf_final$distance == 500, ])
manzanas$C_clase[lengths(intersecciones_500) > 0] <- 1

# Label each coverage class
manzanas <- manzanas %>%
  mutate(C_pond = case_when(
    C_clase == 0 ~ "Fuera de Cobertura",
    C_clase == 0.5 ~ "Medianamente Cubierto",
    C_clase == 1 ~ "Cubierto"
  ))

Frequencies

# Count the number of isochrones intersecting each block
intersecciones <- st_intersects(manzanas, sf_final)
conteo_intersecciones <- lengths(intersecciones)
manzanas$conteo_intersecciones <- conteo_intersecciones  # add to the manzanas df

# Compute Jenks natural breaks (4 break points -> 3 classes)
jenks_breaks <- getJenksBreaks(manzanas$conteo_intersecciones, k = 4)

# Add the classification to the manzanas df
manzanas$categoria_jenks <- cut(manzanas$conteo_intersecciones,
                                breaks = jenks_breaks,
                                include.lowest = TRUE,
                                labels = FALSE)

# Label each frequency class
manzanas <- manzanas %>%
  mutate(F_pond = case_when(
    categoria_jenks == 1 ~ "Poco Frecuente",
    categoria_jenks == 2 ~ "Frecuente",
    categoria_jenks == 3 ~ "Muy Frecuente"
  ))

# Numeric weight for each frequency class
manzanas <- manzanas %>%
  mutate(F_clase = case_when(
    categoria_jenks == 1 ~ 0.5,
    categoria_jenks == 2 ~ 0.75,
    categoria_jenks == 3 ~ 1
  ))

Finally, I calculated a proximity index for each block:

# Proximity index: 70% coverage class, 30% frequency class
manzanas$I_prox <- manzanas$C_clase * 0.7 + manzanas$F_clase * 0.3

This proximity index combines the coverage classification (`C_clase`, weighted 0.7) with the frequency classification (`F_clase`, weighted 0.3) to provide a more comprehensive measure of accessibility. For example, a fully covered block (`C_clase` = 1) in a low-frequency zone (`F_clase` = 0.5) scores 0.7 + 0.15 = 0.85. This aids urban planning decisions regarding school locations and resource allocation.

# Save the blocks with their proximity results as a shapefile
# st_write(manzanas, "data/manzanas_proximidad.shp")

4. Supply-Demand Index of Elementary Schools

In this project, I performed a kernel density analysis to assess facility sufficiency, analyzing over 1,200 schools and more than 56,000 blocks. This analysis significantly improved resource allocation for urban planners by identifying areas where schools are either oversupplied or underserved.

The methodology was automated, allowing the process to be easily adapted for other types of facilities, such as healthcare centers or public services. By leveraging open-source tools, this approach enables urban planners to efficiently evaluate facility distribution, ensuring that resources are allocated where they are most needed, enhancing the overall planning process.

library(qgisprocess)  # Connect to QGIS
library(terra)        # Raster manipulation
library(sf)           # Shapefile manipulation
library(dplyr)        # Data manipulation

qgis_configure()
Sys.setenv("QGIS_ALLOW_OVERWRITE" = "TRUE")

alg_id <- "qgis:heatmapkerneldensityestimation"

Supply (Schools)

# Define input and output paths
equip_layer <- "C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/datos/primarias/AMG_Equipamientos_primaria.shp"

output_path <- "C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/primarias/resultados/kernel_primarias.tif"

input_layer <- st_read(equip_layer, quiet = TRUE)

# Define parameters
params <- list(
  INPUT = equip_layer,
  RADIUS = 1100,
  RADIUS_FIELD = NULL,
  PIXEL_SIZE = 20,
  WEIGHT_FIELD = "EXISTENTES",
  KERNEL = 0,             # Or use "Quartic (biweight)" if required
  DECAY = 0,
  OUTPUT_VALUE = 0,       # Or use "Density" if required
  OUTPUT = output_path
)

# Correctly pass the algorithm ID and parameters using do.call
result <- tryCatch({
  do.call(qgis_run_algorithm, c(list(algorithm = alg_id), params))
}, error = function(e) {
  message("An error occurred:")
  message(e$message)
  NULL
})

# Check if the result is valid
if (!is.null(result)) {
  message("Algorithm executed successfully.")
} else {
  message("Algorithm execution failed.")
}

Next, we extend the raster so it covers the full metropolitan boundary.

# Read the raster and the boundary shapefile
raster <- rast("resultados/kernel_primarias.tif")
shapefile <- st_read("C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/datos/Limite_metropolitano/Limite_metropolitano.shp")

# Make sure both share the same CRS
shapefile <- st_transform(shapefile, crs(raster))

# Get the extent (bounding box) of the shapefile
bbox <- ext(shapefile)

# Extend the raster to cover the full shapefile extent with NoData values
extended_raster <- extend(raster, bbox)

# Set the NoData value to -9999 (optional)
NAflag(extended_raster) <- -9999

# Replace any remaining NoData values (-9999) with zeros
extended_raster[extended_raster == -9999] <- 0

# Replace any NA values in the raster with zeros
extended_raster[is.na(extended_raster)] <- 0

# Save the resulting raster
writeRaster(extended_raster, "resultados/kernel_primarias_limite_amg.tif", overwrite = TRUE)

Demand

# Define input and output paths
equip_layer <- "C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/datos/primarias/manzanas.shp"
output_path <- "resultados/kernel_manzanas.tif"

input_layer <- st_read(equip_layer, quiet = TRUE)

# Define parameters
params <- list(
  INPUT = equip_layer,
  RADIUS = 1100,
  RADIUS_FIELD = NULL,
  PIXEL_SIZE = 20,
  WEIGHT_FIELD = "demanda",
  KERNEL = 0,             # Or use "Quartic (biweight)" if required
  DECAY = 0,
  OUTPUT_VALUE = 0,       # Or use "Density" if required
  OUTPUT = output_path
)

# Correctly pass the algorithm ID and parameters using do.call
result <- tryCatch({
  do.call(qgis_run_algorithm, c(list(algorithm = alg_id), params))
}, error = function(e) {
  message("An error occurred:")
  message(e$message)
  NULL
})

# Check if the result is valid
if (!is.null(result)) {
  message("Algorithm executed successfully.")
} else {
  message("Algorithm execution failed.")
}

Again, we extend the raster to the full metropolitan boundary.

# Read the raster and the boundary shapefile
raster <- rast("resultados/kernel_manzanas.tif")
shapefile <- st_read("C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/datos/Limite_metropolitano/Limite_metropolitano.shp")

# Make sure both share the same CRS
shapefile <- st_transform(shapefile, crs(raster))

# Get the extent (bounding box) of the shapefile
bbox <- ext(shapefile)

# Extend the raster to cover the full shapefile extent with NoData values
extended_raster <- extend(raster, bbox)

# Set the NoData value to -9999 (optional)
NAflag(extended_raster) <- -9999

# Replace any remaining NoData values (-9999) with zeros
extended_raster[extended_raster == -9999] <- 0

# Replace any NA values in the raster with zeros
extended_raster[is.na(extended_raster)] <- 0

# Save the resulting raster
writeRaster(extended_raster, "resultados/kernel_manzanas_limite_amg.tif", overwrite = TRUE)

Normalized Index

# Load the rasters
primarias <- rast("resultados/kernel_primarias_limite_amg.tif")
manzanas <- rast("resultados/kernel_manzanas_limite_amg.tif")

# Use the supply raster (primarias) as the reference for alignment
manzanas_aligned <- resample(manzanas, primarias, method = "bilinear")

# Verify the alignment
# compareGeom(primarias, manzanas_aligned)  # should return TRUE

# Subtract demand from supply
resultado <- primarias - manzanas_aligned

writeRaster(resultado, "resultados/resultado_resta.tif", overwrite = TRUE)

Depending on the subsystem, we apply the normalization formula with different parameters. It is a logistic transform that maps the raw supply-demand difference onto the interval (-1, 1), with a perfectly balanced block mapping to 0. For elementary schools the formula looks like this:

\[ \text{indice\_norm} = \frac{2}{1 + e^{-0.20 \cdot \text{result}}} - 1 \]

# Apply the normalization to the raster
indice_norm <- (2 / (1 + exp(-0.20 * resultado))) - 1

# Save the normalized raster
writeRaster(indice_norm, "resultados/resultado_normalizado.tif", overwrite = TRUE)

The classification assigns each block (manzana) to one of five categories based on the intersected index (valor_intersectado). The categories are defined as follows:

  1. Proximidad media con alto déficit de equipamiento: For values between -1 and -0.60, this represents areas with medium proximity but a high deficit of infrastructure or equipment.

  2. Proximidad alta con déficit de equipamiento: For values between -0.60 and -0.20, this represents areas with high proximity but still a deficit of equipment.

  3. Proximidad alta con balance de equipamiento: For values between -0.20 and 0.20, indicating a balance between proximity and equipment availability.

  4. Proximidad alta con superávit de equipamientos: For values between 0.20 and 0.60, this reflects areas with high proximity and a surplus of equipment.

  5. Proximidad muy alta con alto superávit de equipamiento: For values between 0.60 and 1, representing areas with very high proximity and a significant surplus of equipment.

Anything outside these ranges is categorized as “Sin categoría” to handle undefined or missing cases.

# Load the normalized raster
raster_normalizado <- rast("resultados/resultado_normalizado.tif")

# Load the blocks shapefile
manzanas <- st_read("C:/Users/manue/IMEPLAN/Intraurbanos_Suficiencia/datos/primarias/manzanas.shp")

# Make sure both share the same CRS;
# reproject the blocks to match the raster if necessary
if (!st_crs(manzanas) == crs(raster_normalizado)) {
  manzanas <- st_transform(manzanas, crs(raster_normalizado))
}

# Convert the shapefile to points (centroids; boundary points would also work)
manzanas_centroid <- st_centroid(manzanas)

# Extract the raster values at the point locations
valores_extraidos <- terra::extract(raster_normalizado, vect(manzanas_centroid))

# `valores_extraidos` is a data.frame with two columns: point ID and raster value.
# Join the extracted values back to the original shapefile
manzanas$valor_intersectado <- valores_extraidos[, 2]

# Create the category column following the ranges above
manzanas <- manzanas %>%
  mutate(categoria = case_when(
    valor_intersectado >= -1 & valor_intersectado < -0.60 ~ "Proximidad media con alto déficit de equipamiento",
    valor_intersectado >= -0.60 & valor_intersectado < -0.20 ~ "Proximidad alta con déficit de equipamiento",
    valor_intersectado >= -0.20 & valor_intersectado <= 0.20 ~ "Proximidad alta con balance de equipamiento",
    valor_intersectado > 0.20 & valor_intersectado <= 0.60 ~ "Proximidad alta con superávit de equipamientos",
    valor_intersectado > 0.60 & valor_intersectado <= 1 ~ "Proximidad muy alta con alto superávit de equipamiento",
    TRUE ~ "Sin categoría"
  ))

# Numeric code for each category
manzanas <- manzanas %>%
  mutate(CV_Categoria = case_when(
    valor_intersectado >= -1 & valor_intersectado < -0.60 ~ 1,
    valor_intersectado >= -0.60 & valor_intersectado < -0.20 ~ 2,
    valor_intersectado >= -0.20 & valor_intersectado <= 0.20 ~ 3,
    valor_intersectado > 0.20 & valor_intersectado <= 0.60 ~ 4,
    valor_intersectado > 0.60 & valor_intersectado <= 1 ~ 5,
    TRUE ~ NA_real_  # numeric NA for missing values
  ))

# Save the resulting shapefile with the new columns
st_write(manzanas, "resultados/manzanas_resultados.shp", delete_layer = TRUE)

This process generated the following key outputs:

  1. Density Kernel Analysis for Elementary Schools (raster): A spatial representation of the distribution of elementary schools, providing insights into the areas with high and low concentration of educational facilities.

  2. Density Kernel Analysis for Blocks Where the Objective Population (Children) Resides (raster): A spatial analysis identifying the distribution of the target population, which helps in understanding the demand for educational services in different areas.

  3. Difference Between Both Density Kernel Analyses (raster): A comparison between the supply (schools) and demand (children), highlighting the areas where there is a surplus or deficit of educational facilities relative to the population.

  4. Final File with Evaluation Values for Each Block (shapefile): A shapefile that assigns an evaluation value to each block, indicating how well the needs of the population are being met by the existing infrastructure in that area.

This process was automated for all subsystems of public facilities: schools, health centers, and markets. By automating the analysis, it ensures scalability and repeatability, making it easier to apply across different facility types and geographical areas. The automation also allows for consistent evaluation and more efficient resource allocation across various public service sectors.
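
To give a sense of how that automation looks in practice, here is a hedged sketch of the kernel workflow wrapped into a single reusable function. The function name, its defaults, and the omission of the raster-extension step are my own simplifications for this post, not the exact IMEPLAN implementation.

# Reusable supply-demand workflow, parameterized by subsystem
library(qgisprocess)
library(terra)

run_sufficiency_index <- function(supply_shp, demand_shp,
                                  supply_weight, demand_weight,
                                  decay = 0.20, radius = 1100, pixel = 20,
                                  out_dir = "resultados") {
  # Kernel density estimation for one layer, returning the raster
  kde <- function(layer, weight, out) {
    qgis_run_algorithm(
      "qgis:heatmapkerneldensityestimation",
      INPUT = layer, RADIUS = radius, PIXEL_SIZE = pixel,
      WEIGHT_FIELD = weight, KERNEL = 0, DECAY = 0,
      OUTPUT_VALUE = 0, OUTPUT = out
    )
    rast(out)
  }
  supply <- kde(supply_shp, supply_weight, file.path(out_dir, "kernel_supply.tif"))
  demand <- kde(demand_shp, demand_weight, file.path(out_dir, "kernel_demand.tif"))
  demand <- resample(demand, supply, method = "bilinear")  # align the grids
  (2 / (1 + exp(-decay * (supply - demand)))) - 1          # normalized index in (-1, 1)
}

# For elementary schools this would be called as:
# idx <- run_sufficiency_index("datos/primarias/AMG_Equipamientos_primaria.shp",
#                              "datos/primarias/manzanas.shp",
#                              "EXISTENTES", "demanda")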


Graphical Interface

A significant achievement of the project is the development of a Graphical Interface designed specifically for urban planners, created using Shiny in R. This interface enables users to input and customize data, facilitating the evaluation of educational facilities based on various parameters. By offering an intuitive and adaptable tool, it supports replicability and the exploration of new planning scenarios, making it invaluable for urban planning analysis.

The interface’s success in Jalisco demonstrates its potential to be implemented in other states across Mexico, contributing to strategic urban development efforts and enhancing the ability of planners to make data-driven decisions for future growth and infrastructure planning.
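
To illustrate the shape of such a tool, here is a minimal Shiny sketch in the same spirit: the user picks a supply layer and tunes the kernel and normalization parameters, and the app recomputes the index. The widget names and layout are illustrative assumptions, not the actual IMEPLAN application.

library(shiny)

ui <- fluidPage(
  titlePanel("Supply-Demand Index Explorer"),
  sidebarLayout(
    sidebarPanel(
      fileInput("supply", "Supply layer (zipped shapefile)"),
      numericInput("radius", "Kernel radius (m)", value = 1100, min = 100),
      numericInput("decay", "Normalization decay", value = 0.20, step = 0.05),
      actionButton("run", "Run analysis")
    ),
    mainPanel(plotOutput("index_map"))
  )
)

server <- function(input, output) {
  result <- eventReactive(input$run, {
    # Unzip the upload and run the section 4 workflow with input$radius
    # and input$decay, returning the normalized SpatRaster (omitted here)
  })
  output$index_map <- renderPlot({
    req(result())
    terra::plot(result(), main = "Normalized supply-demand index")
  })
}

shinyApp(ui, server)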

Conclusion

My time at IMEPLAN was a transformative experience where I had the opportunity to apply data science and geospatial analysis to real-world urban challenges. The projects I worked on not only enhanced urban planning workflows but also demonstrated the value of open-source tools in driving cost-effective and scalable solutions.

The culmination of this work is a GitHub repository containing the spatial evaluation app, complete with documentation to ensure its replicability and usability. This deliverable reflects my commitment to fostering transparency, collaboration, and innovation in urban planning. I am proud to have contributed to IMEPLAN’s mission of improving the quality of life in the Guadalajara Metropolitan Area and look forward to seeing how this work impacts future decision-making processes.

Footnotes

  1. Huff, D. L. (1964). Defining and estimating a trading area. The Journal of Marketing, 28(3), 34-38. https://doi.org/10.2307/1249154