What is Major Tom?
Major Tom is an open-access dataset developed by ESA's Φ-lab. It divides the globe into a 10 km by 10 km grid, assigning a high-resolution Sentinel-2 patch to each cell. Each patch is a multispectral cube with 12 bands, each stored as a separate GeoTIFF file (B01–B12). This structured approach allows standardised benchmarking and fair comparisons of AI models across geographies and tasks.
For this demonstration, we focused on a regional subset covering Flanders and parts of the Netherlands. We made these patches available in both their original and CORSA-compressed forms.
Enter CORSA: Compression Meets Intelligence
CORSA isn't just another image compression tool. It’s built on a Vector Quantized Variational Auto-Encoder (VQVAE) architecture, trained to compress multispectral satellite images while preserving their semantic content. Instead of storing the original image, CORSA represents it through indices pointing to a learned codebook of feature vectors—drastically reducing storage size while maintaining rich visual information.
Figure 1: Reconstructing original image from compressed CORSA embedding.
This dual role of CORSA—compression and feature extraction—means developers can train downstream models (like land cover classifiers or change detectors) directly on the compressed features, bypassing the need to decode or reprocess the full original image.
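To make the codebook idea concrete, here is a minimal NumPy sketch of vector quantisation as described above. It is an illustration only, not CORSA's implementation: the codebook size (64), vector dimension (8), the random "encoder output" and the function names `quantize` and `lookup` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 codebook vectors of dimension 8, a 4x4 grid of latent vectors.
codebook = rng.normal(size=(64, 8))
latents = rng.normal(size=(4, 4, 8))  # random stand-in for an encoder output


def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook vector."""
    flat = latents.reshape(-1, codebook.shape[1])
    # Squared Euclidean distance between every latent and every codebook vector
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).reshape(latents.shape[:-1])


def lookup(indices, codebook):
    """Recover the quantised feature map from the stored indices."""
    return codebook[indices]


indices = quantize(latents, codebook)  # small integer grid: this is what gets stored
features = lookup(indices, codebook)   # shape (4, 4, 8): input for a decoder or downstream model
```

In this toy setup each index fits in 6 bits, whereas the float32 vector it points to takes 8 × 32 bits. The same `features` array can feed either a decoder (to reconstruct the image) or a downstream model, which is the dual role described above.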
CORSA + Terrascope = Ready-to-Use AI Stack
Terrascope is the Belgian open EO platform, funded by the Belgian Science Policy, offering on-demand access to a wide range of geospatial datasets and processing capabilities. By integrating CORSA outputs as a public data collection, Terrascope now enables anyone to build EO applications using precomputed embeddings, with no GPU required and no multi-gigabyte downloads.
On Terrascope, we published:
- Original Sentinel-2 Major Tom patches over Flanders and the Netherlands.
- CORSA-compressed versions (feature maps) of the same patches.
- A sample Jupyter Notebook that shows how to load, visualise, and use these embeddings in a downstream land use classification task.
Speed and Efficiency: A Quick Comparison
Let’s take a group of 27 grid cells located between Antwerp and Rotterdam (see Figure 2) and compare performance:
Figure 2: Map showing Major Tom grid patches over Flanders and the Netherlands.
Format | File Size | Download Time | Reconstruct Time per Tile |
Original S2 (12 bands) | 382.8 MB | 60.6 s | 0.2 s |
CORSA (2 feature levels) | 10.1 MB | 5.9 s | 1.8 s (decode) + 0.5 s (scaling) |
Figure 3: Bar chart comparing download time and file size for original vs CORSA format.
That's:
- 10× faster downloads
- ~38× smaller files (382.8 MB vs 10.1 MB)
- And more importantly: ready-to-use feature vectors without needing to retrain a model.
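As a quick check, the ratios can be recomputed directly from the table values. The dictionary names below are only for illustration.

```python
# Values copied from the comparison table above
original = {"size_mb": 382.8, "download_s": 60.6}
corsa = {"size_mb": 10.1, "download_s": 5.9}

size_ratio = original["size_mb"] / corsa["size_mb"]
download_ratio = original["download_s"] / corsa["download_s"]

print(f"file size ratio: {size_ratio:.1f}x")        # file size ratio: 37.9x
print(f"download speed-up: {download_ratio:.1f}x")  # download speed-up: 10.3x
```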
From Colour to Classification
To explore the semantic structure of CORSA embeddings, we visualised them in two ways:
- Codebook-inherent colour: Based on the 3D arrangement of vectors during training.
- t-SNE-based colourisation: A non-linear projection of codebook vectors into 3D space, normalised and mapped to RGB.
Figure 4: Side-by-side visualisation of the “571U_29R” patch using codebook colour and t-SNE colourisation.
These visualisations give a striking view of the ‘semantic texture’ of the Earth, as learned by CORSA.
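The "normalised and mapped to RGB" step can be sketched as follows. This assumes the 3D projection of the codebook has already been computed (for example with a t-SNE implementation); a random array stands in for it here, and the function name `codebook_to_rgb` and all array sizes are hypothetical.

```python
import numpy as np


def codebook_to_rgb(coords_3d):
    """Min-max normalise each axis of an (N, 3) projection to [0, 1], then scale to 8-bit RGB."""
    lo = coords_3d.min(axis=0)
    span = coords_3d.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on a flat axis
    return np.round((coords_3d - lo) / span * 255).astype(np.uint8)


rng = np.random.default_rng(0)
coords_3d = rng.normal(size=(64, 3))            # stand-in for a 3D projection of 64 codebook vectors
index_map = rng.integers(0, 64, size=(16, 16))  # stand-in for one tile's codebook indices

palette = codebook_to_rgb(coords_3d)  # one RGB colour per codebook entry
rgb_image = palette[index_map]        # shape (16, 16, 3), dtype uint8
```

Because the colour is assigned per codebook entry, pixels that share an index always share a colour, and entries that sit close together in the projection get similar colours.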
As a toy example, we trained a lightweight land cover classification model using only 541 samples from the Dynamic World dataset, leveraging CORSA embeddings directly as input. This drastically reduces the need for annotated data and training time—perfect for rapid prototyping or deployment in low-resource settings.
Figure 5: Land cover classification map for grid cell 571U_29R using CORSA features.
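To illustrate what training "directly on embeddings" can look like, here is a deliberately simple nearest-centroid classifier in NumPy. It is not the model used for Figure 5: the classifier choice, the number of classes, the feature dimension and the synthetic data are all assumptions made for the sketch. Only the sample count (541) is taken from the text.

```python
import numpy as np


def fit_centroids(features, labels, n_classes):
    """Mean feature vector per class; features has shape (n_samples, dim)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])


def predict(features, centroids):
    """Assign each feature vector to the class with the nearest centroid."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


rng = np.random.default_rng(0)
n_classes, dim = 3, 8
# Synthetic stand-in for labelled embedding vectors: one well-separated cluster per class
true_means = np.eye(n_classes, dim) * 10.0
labels = rng.integers(0, n_classes, size=541)
features = true_means[labels] + rng.normal(size=(541, dim))

centroids = fit_centroids(features, labels, n_classes)
accuracy = (predict(features, centroids) == labels).mean()
```

To classify a whole tile, a feature map of shape (H, W, dim) can be reshaped to (H × W, dim), passed to `predict`, and reshaped back to (H, W).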
Why CORSA is Unique: One Solution, Many Wins
CORSA stands apart from traditional compression or AI feature extractors because it solves multiple challenges at once:
- Storage-efficient: Achieves 25–40× compression on Sentinel-2 imagery
- Bandwidth-friendly: Smaller files = faster downloads
- Energy-saving: Reduces server-side and client-side compute
- Model-ready: Feature embeddings usable out of the box
- Few-shot friendly: Enables training with fewer labels
- Sensor-adaptable: Can be retrained for other satellites or sensors in a self-supervised way
Toward an Inclusive Future for AI4EO
What we’re seeing is a transition in the EO world: from data hoarding to data accessibility, from big compute to smart compute. By combining CORSA’s intelligence-preserving compression with the cloud-native accessibility of Terrascope, we make it easier for more people—researchers, NGOs, startups, and students—to work with remote sensing data and build impactful AI solutions.
This is a step toward the democratisation of AI4EO—bringing down barriers like cost, compute, and data availability to unlock innovation for all.
Figure 6: Walkthrough of the Terrascope notebook.
Join Us at Living Planet Symposium
Curious to learn more about data compression and how it can support your work in Earth observation (EO)? Visit the VITO booth (U31) at the Living Planet Symposium 2025 in Vienna from 23 to 27 June. Our Remote Sensing experts are looking forward to answering your questions and showing how CORSA can support data accessibility. And don't miss our presentations, demo, and poster on Thursday 26 June and Friday 27 June to learn more about the latest CORSA updates:
Timing | Type / Session | Topic | Speaker | Location |
Thursday, 26 June 14:00-15:30 | Oral Presentation D.02.06 | From Edge to Insights: Transforming Earth Observation with Lightweight Foundation Models and Embeddings-as-a-Service | Tanja Van Achteren | Hall G1 |
Thursday, 26 June 15:45-16:15 | Demo at VITO Booth | From Orbit to Insights: CORSA Live on Edge, Insights via Terrascope Compressed Embeddings. In Collaboration with Unibap. | Tanja Van Achteren | VITO Booth (U31), EO Arena |
Thursday, 26 June 17:45-19:00 | Poster D.04.03 | Unlocking ML and Foundation Models within openEO | Hans Vanrompay | X5 - Poster Area |
Friday, 27 June 14:30-16:00 | Oral Presentation C.01.03 | Efficient On-Board Processing Using a Shared AI Backbone Across Multiple Tasks | Bart Beusen, Andreas Luyts | Room 1.85/1.86 |
Let’s connect in Vienna and discuss EO intelligence! Cannot make it to Vienna? Feel free to contact us online.
