An Approach to Detecting Informal Settlements in Mexico
“Satellite images are the dreams of society.
Wherever the hieroglyphics of any spatial image are deciphered,
there the basis of social reality presents itself.”
SIEGFRIED KRACAUER
Constructions and perspectives
Motivation
In Latin America, urban growth often occurs through the irregular occupation of land, with houses built in risk-prone areas such as slopes or riverbeds.
This phenomenon, prevalent in countries like Mexico, presents a challenge for governments, which need automated tools to identify and map these informal settlements in order to provide essential services or relocate residents to safe areas.
Data Acquisition
For this project, data on informal settlements in the city of Monterrey, Nuevo León, Mexico, was provided: polygons marking where an informal settlement exists. However, this information alone cannot achieve much.
It is necessary to combine this data with other geospatial data. Satellite images are ideal for this project.
But what kind of satellite images?
Ideally, they would be high-resolution images with additional spectral bands, meaning they carry extra information such as thermal or vegetation data. The problem is that this type of imagery comes at a high economic cost, so for the practical purposes of this project it was discarded; the aim is to achieve reasonably good results with free data.
The solution I found was to create a Python script that drives QGIS (specifically the “Convert map to raster” tool) to access a Google Satellite layer and export high-resolution rasters (0.3 m per pixel).
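As a rough illustration, the download step can be scripted from the QGIS Python console as sketched below; the XYZ endpoint is the commonly used Google Satellite URL, the extent is a hypothetical UTM 14N bounding box, and the exact parameters of the “Convert map to raster” algorithm may vary slightly across QGIS versions:

```python
# Minimal sketch; run from the QGIS Python console.
import processing
from qgis.core import QgsProject, QgsRasterLayer

# Google Satellite as an XYZ tile layer (commonly used endpoint; usage terms apply).
uri = "type=xyz&url=https://mt1.google.com/vt/lyrs%3Ds%26x%3D{x}%26y%3D{y}%26z%3D{z}"
google_sat = QgsRasterLayer(uri, "Google Satellite", "wms")
QgsProject.instance().addMapLayer(google_sat)

# Render the layer to a GeoTIFF at 0.3 m per pixel over a hypothetical
# extent expressed in EPSG:32614 (UTM zone 14N, meters).
processing.run("native:rasterize", {
    "EXTENT": "360000,366000,2838000,2844000 [EPSG:32614]",
    "MAP_UNITS_PER_PIXEL": 0.3,
    "MAKE_BACKGROUND_TRANSPARENT": False,
    "LAYERS": [google_sat],
    "OUTPUT": "/tmp/monterrey_tile.tif",
})
```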
How to Detect Informal Settlements?
Given the nature of our data, we know we need a model that learns the characteristics of images and makes predictions about them. This describes an image segmentation task using convolutional neural networks: convolutions extract features from the image, and segmentation predicts, pixel by pixel, where the learned pattern appears.
U-Net Model
The algorithm chosen for this project is the U-Net neural network architecture, introduced in 2015 at the University of Freiburg for semantic segmentation of biomedical images. The architecture is also well suited to image segmentation outside the medical field, and there is literature on its use with satellite imagery. Another advantage is that it can produce quite good results with few training images, which mitigates a practical constraint of this project: satellite images are large files, so storage space is limited.
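To make the architecture concrete, here is a deliberately tiny PyTorch sketch of the U-Net idea: an encoder path, a decoder path, and a skip connection between them. It is only an illustration of the mechanism, not the implementation used later in this project:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: downsample, upsample, and one skip connection."""
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        self.enc = conv_block(in_ch, 64)                     # encoder features
        self.down = nn.MaxPool2d(2)                          # halve resolution
        self.mid = conv_block(64, 128)                       # bottleneck
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # restore resolution
        self.dec = conv_block(128, 64)       # 128 = 64 (skip) + 64 (upsampled)
        self.head = nn.Conv2d(64, n_classes, 1)              # per-pixel scores

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        # The skip connection concatenates encoder and decoder features,
        # which is what lets the network localize per-pixel predictions.
        return self.head(self.dec(torch.cat([e, u], dim=1)))

logits = TinyUNet()(torch.rand(1, 3, 256, 256))  # -> shape (1, 2, 256, 256)
```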
Data Preprocessing
High-resolution rasters are heavy files, so instead of downloading the entire bounding box of the Monterrey metropolitan area, I created spatial clusters of the polygons using K-means clustering.
This process was relatively straightforward; I chose 30 clusters so that each cluster was “small” enough to manage efficiently.
DBSCAN is more commonly used in geospatial analysis, but K-means seemed to perform better on my dataset.
With this many clusters the visualization ends up cluttered, but that is not relevant, since the goal is only to manage the data download efficiently; a sketch of the clustering step follows below.
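A minimal sketch of the clustering step, assuming geopandas and scikit-learn; the file name and CRS are hypothetical:

```python
# Cluster settlement-polygon centroids so each download covers a compact area.
import geopandas as gpd
from sklearn.cluster import KMeans

# Hypothetical input file; reproject to a metric CRS (UTM 14N) first.
polygons = gpd.read_file("informal_settlements.shp").to_crs(epsg=32614)

# Use centroid coordinates as the clustering features.
centroids = polygons.geometry.centroid
coords = list(zip(centroids.x, centroids.y))

kmeans = KMeans(n_clusters=30, random_state=42, n_init=10)
polygons["cluster"] = kmeans.fit_predict(coords)

# One bounding box per cluster then drives the raster download.
for cluster_id, group in polygons.groupby("cluster"):
    print(cluster_id, group.total_bounds)  # (minx, miny, maxx, maxy)
```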
From this point on, ArcGIS Pro was used to preprocess the data, with the aim of using its built-in U-Net implementation for image segmentation.
First, the Export Training Data For Deep Learning tool is used to export image chips and their matching label masks from the rasters and the settlement polygons.
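The same step can be scripted with arcpy's Image Analyst module; in this sketch the tile size, stride, and paths are illustrative rather than the parameters actually used:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Export image chips and matching label masks for pixel classification.
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster="monterrey_cluster_01.tif",       # hypothetical cluster raster
    out_folder=r"C:\data\training_chips",
    in_class_data="informal_settlements.shp",   # settlement polygons as labels
    image_chip_format="TIFF",
    tile_size_x=256,
    tile_size_y=256,
    stride_x=128,
    stride_y=128,
    metadata_format="Classified_Tiles",         # format for pixel classification
)
```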
Model Training
We create a model with our data, search for a suitable learning rate, and train for a fixed number of epochs.
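In code, this corresponds to the arcgis.learn API that ArcGIS Pro uses under the hood; the path and hyperparameters here are assumptions, not the values from the actual run:

```python
from arcgis.learn import prepare_data, UnetClassifier

# Load the exported chips (hypothetical path and batch size).
data = prepare_data(r"C:\data\training_chips", batch_size=8)

model = UnetClassifier(data)
lr = model.lr_find()           # learning-rate finder (suggests a rate)
model.fit(epochs=20, lr=lr)    # train for a fixed number of epochs

model.save("unet_informal_settlements")
```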
The accuracy obtained was 76%, which was not ideal; an accuracy above 85% was expected.
Classify Pixels
The Classify Pixels Using Deep Learning tool runs the trained model across a raster and performs the segmentation, producing a classified raster in which each pixel is labeled as informal settlement or background.
This classified raster is what the following steps turn into the desired vector output.
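A sketch of this inference step via arcpy; the model path, raster name, and arguments are illustrative:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Run the trained U-Net over one raster tile; returns a classified raster.
classified = arcpy.ia.ClassifyPixelsUsingDeepLearning(
    in_raster="monterrey_tile_042.tif",                          # hypothetical tile
    in_model_definition=r"C:\models\unet_informal_settlements.emd",
    arguments="padding 0;batch_size 4",
)
classified.save(r"C:\data\predictions\tile_042_classified.tif")
```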
Majority Filter
The Majority Filter tool is used to reduce the noise in the predictions.
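For example (Spatial Analyst; file names are illustrative):

```python
import arcpy
from arcpy.sa import MajorityFilter
arcpy.CheckOutExtension("Spatial")

# Replace isolated predicted pixels with the majority value of their 8 neighbors.
smoothed = MajorityFilter(r"C:\data\predictions\tile_042_classified.tif",
                          "EIGHT", "MAJORITY")
smoothed.save(r"C:\data\predictions\tile_042_smoothed.tif")
```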
Raster to Polygon
The Raster to Polygon tool is used to convert the segmentation into polygons.
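For example (file names are illustrative):

```python
import arcpy

# Vectorize the smoothed classification; SIMPLIFY smooths jagged pixel edges.
arcpy.conversion.RasterToPolygon(
    in_raster=r"C:\data\predictions\tile_042_smoothed.tif",
    out_polygon_features=r"C:\data\predictions\tile_042_polygons.shp",
    simplify="SIMPLIFY",
)
```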
Predictions
The Monterrey metropolitan area is very extensive, so I divided the space into 3,000 × 3,000 m squares and ran the prediction for each square, as sketched below.
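A sketch of the tiling, assuming shapely and geopandas and a hypothetical metropolitan extent in UTM 14N:

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import box

# Hypothetical bounding box of the metropolitan area (EPSG:32614, meters).
minx, miny, maxx, maxy = 300000, 2800000, 400000, 2880000

# Tile the extent into 3,000 x 3,000 m squares.
tiles = [
    box(x, y, x + 3000, y + 3000)
    for x in np.arange(minx, maxx, 3000)
    for y in np.arange(miny, maxy, 3000)
]
grid = gpd.GeoDataFrame(geometry=tiles, crs="EPSG:32614")
grid.to_file("prediction_grid.shp")  # one prediction run per square
```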
Conclusions
The conclusion is that this approach is not good enough: it produces many false positives and fails to generalize a clear pattern that would allow informal settlements to be identified automatically. With more robust data (more spectral bands), it is plausible that valuable results could be achieved, helping us understand more about urbanization dynamics in Mexico and Latin America.
Acknowledgements
I would like to express my sincere gratitude to the Center for the Future of Cities for facilitating the acquisition of data from the government of Monterrey and for providing the support I needed to explore this project. Such collaborations are essential for advancing our understanding of our cities, and for improving them, through data science.