An AI foundation model for airborne change detection

The Digital Reference map of Flanders (GRB) is a crucial foundation for creating a comprehensive digital twin of Flanders. It contains detailed information on every civil infrastructure object, including buildings, roads, waterways, parcels, and more, all meticulously labeled. However, because civil infrastructure constantly evolves, it has become imperative to optimize the maintenance operations of the GRB. Discover how we use AI and machine learning to streamline the identification of modifications and set up a more efficient and accurate process.

A foundation model for airborne imagery

Aerial imagery is the primary data source for identifying these changes in the built environment, and with the remarkable advancements in AI technology, the possibilities for automatic change detection have increased tremendously. By improving services like the GRB, policy makers can easily access up-to-date and reliable data, which are crucial for urban planning, resource management, and sustainable growth. In collaboration with the GRB and EODaS teams of Digital Flanders, we set out to develop a change detection algorithm specifically tailored for airborne imagery.

DINO [1] is a self-supervised method, developed by Meta AI Research, for training a vision transformer (ViT) model to process image and video data. Unlike large language models, DINO is tailored for visual tasks like classification, retrieval, understanding, and segmentation. Because it is pretrained without supervision on a diverse image dataset, DINO's versatile features perform well on new data without extra tuning.
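
To make "self-supervised" concrete, here is a minimal sketch of the self-distillation objective described in the DINO paper: a student network is trained to match the centered and sharpened output of a teacher network, and the teacher's weights follow the student as an exponential moving average. Everything below is illustrative only. The function names, the toy logits and the temperature and momentum values are assumptions, and real training operates on batches of augmented image crops rather than single vectors.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher and student distributions.

    The teacher output is centered (subtract a running mean) and
    sharpened (low temperature) to avoid collapse. The temperatures
    here are illustrative values.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = softmax(centered, teacher_temp)
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move the teacher weights slowly towards the student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]


# Toy example: two views of the same tile should give matching outputs.
center = [0.0, 0.0, 0.0]
teacher_out = [2.0, 0.0, 0.0]
loss_agree = dino_loss([2.0, 0.0, 0.0], teacher_out, center)
loss_disagree = dino_loss([0.0, 2.0, 0.0], teacher_out, center)
new_teacher = ema_update([1.0], [0.0], momentum=0.9)
```

In this toy case the loss is small when the student agrees with the teacher and large when it does not, which is the signal that drives training without any labels.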

Our team adapted the original DINO model architecture to multispectral data and trained a customized foundation model from the ground up on a carefully curated, diverse dataset of very-high-resolution (VHR) airborne imagery tiles over Flanders. The dataset's variety and multi-year coverage help the model generalize, which supports accurate and robust change detection results.


Heterogeneous dataset for the unsupervised training of a foundation model for Flanders.

Achieving precision in built environment monitoring

To capture changes in the built environment at a suitable granularity, we selected a tile size of 16 m x 16 m on which change detection and classification are performed. Why? This choice strikes a good balance: it offers sufficient detail for typical residential buildings while still including contextual elements of the surroundings, such as gardens and road infrastructure. With the customized model, we generate highly informative features that correspond to the semantic content of each individual tile. This allows us to group similar tiles together in a powerful feature space, which forms the backbone of our change detection process. By applying the model to successive years of airborne imagery and calculating a similarity metric between the resulting features, we can create dynamic change maps (see figure below) that reveal the evolving urban landscape in high detail.


Example of change detection between successive years.
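
The change-map step described above can be sketched as follows. This is a minimal illustration and not the production pipeline: it assumes that each tile has already been turned into a feature vector by the foundation model, and it uses cosine similarity with a fixed threshold as the similarity metric. The article does not name the metric or threshold actually used, so both are assumptions, as are the function names and toy feature values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar
    return dot / (norm_a * norm_b)


def change_map(features_t0, features_t1, threshold=0.5):
    """Compare tile features from two years.

    features_t0, features_t1: dict mapping a (row, col) tile index to
    the feature vector of that tile in the respective year.
    Returns a dict mapping each shared tile index to a tuple
    (change_score, changed), with change_score = 1 - cosine similarity.
    The threshold is a hypothetical value chosen for this toy example.
    """
    changes = {}
    for tile, f0 in features_t0.items():
        f1 = features_t1.get(tile)
        if f1 is None:
            continue  # tile not covered in the second year
        score = 1.0 - cosine_similarity(f0, f1)
        changes[tile] = (score, score > threshold)
    return changes


# Toy features for two tiles in two successive years.
features_2022 = {(0, 0): [1.0, 0.0, 0.0], (0, 1): [0.0, 1.0, 0.0]}
features_2023 = {(0, 0): [1.0, 0.0, 0.0], (0, 1): [1.0, 0.0, 0.0]}
result = change_map(features_2022, features_2023)
```

Here tile (0, 0) keeps the same features and is not flagged, while tile (0, 1) moves to a different region of the feature space and is flagged as changed.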

Change detection beyond built environment

The strength of our model lies in its strong representation power and the fact that we can train it fully unsupervised, without any explicit ground truth provided. This unique capability enables us to capture changes not only within a built environment but also across a broader landscape. It even allows us to gain valuable insights into environmental changes and land use patterns that extend beyond urban areas.

Through fine-tuning (see figure below), that is, training a lightweight AI/ML classification model on top of the foundation model, we can filter out regions that are not relevant to the built environment monitoring process, e.g. agriculture, water and forest areas. This allows us to focus on the most pertinent areas, which streamlines our efforts and helps the change detection algorithm operate accurately and effectively.


Principle of fine-tuning a foundation model.
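
As an illustration of the principle above, the sketch below trains a very small classifier on top of fixed tile features and uses it to keep only the tiles that are relevant for built environment monitoring. A nearest-centroid classifier stands in for the lightweight model here. The article does not specify which classifier is used, so this choice, the class names, the function names and the two-dimensional toy features are all assumptions.

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Compute the mean feature vector per class from annotated tiles."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# A handful of annotated tiles (toy features from the frozen model).
annotated_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
annotated_labels = ["built-up", "built-up", "forest", "forest"]
centroids = fit_centroids(annotated_features, annotated_labels)

# Keep only the tiles classified as built-up for change detection.
tiles = {(0, 0): [0.7, 0.3], (0, 1): [0.3, 0.7]}
relevant = [tile for tile, vec in tiles.items()
            if predict(centroids, vec) == "built-up"]
```

Because the foundation model stays fixed and only the small classifier is fitted, a few annotations per class are enough to get started, and adding or changing classes only requires refitting the centroids.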

Effortless training and rapid classification

Imagine a world where training a machine learning model to classify tiles according to a custom set of classes is a breeze. With only a limited number of annotations, we can achieve this on top of our foundation model. The demonstration video below shows how simple and fast it is to train and apply a custom fine-tuned model for four land cover classes. Within minutes, we can categorize various land cover types, which changes the way we process and analyze geospatial data.


Data-driven insights to support decision-making

With unsupervised training, we unlock new dimensions of understanding, going beyond the confines of human-labeled data to provide detailed, accurate and up-to-date data. Combining unsupervised learning with fine-tuning lets us train custom models for specific applications efficiently and accurately, all in a matter of minutes.

The fusion of cutting-edge AI technologies and geospatial data empowers policy makers, urban planners, and environmentalists with data-driven insights, fostering smarter decision-making for a sustainable future.

VITO Remote Sensing and Digitaal Vlaanderen (teams GRB and EODaS) will evaluate whether this AI technology can be introduced into their parcel monitoring strategy for Flanders or into the operational processes of GRB mutation detection, e.g. to assist GIS operators in their GRB change management tasks.

1) DINO: Emerging Properties in Self-Supervised Vision Transformers, arXiv:2104.14294