CORSA, EO data compression and processing with AI

With the huge amount of data being collected by several Earth observation missions, limited downlink capacity and storage have become a major obstacle to fully exploiting all this valuable information. Within the CORSA project, funded by the European Space Agency's PhiLab EO Open Science for Society framework, we have developed a novel AI-based method for near-lossless image compression that lets us optimize the use of available data storage and perform various image analysis tasks directly on the compressed image vectors. Discover more about CORSA and how it can make the use of data from Earth observation missions more efficient and effective than ever before.

Smart EO data refining 

Traditional Earth observation data flows often rely on basic processing chains in which each step largely ignores the rest of the flow. As a result, unnecessary data, such as image areas covered by clouds, is still compressed and transmitted. Edge computing, which brings computation closer to the data source, can help resolve these bottlenecks by reducing the volume of data that needs to be transmitted.
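To make the idea concrete, the following Python sketch shows the simplest form of such edge processing: tiles whose cloud fraction exceeds a threshold are dropped before compression and downlink. The function names, the 0/1 cloud-mask format and the 0.8 threshold are hypothetical illustrations, not part of CORSA; in practice the mask would come from an on-board cloud detector.

```python
def cloud_fraction(mask):
    """Fraction of pixels flagged as cloud (1) in a 2-D 0/1 mask."""
    total = sum(len(row) for row in mask)
    cloudy = sum(sum(row) for row in mask)
    return cloudy / total


def select_tiles_for_downlink(tiles, max_cloud_fraction=0.8):
    """Keep only tiles that are clear enough to be worth compressing and sending.

    tiles: list of (tile_id, cloud_mask) pairs (hypothetical format).
    max_cloud_fraction: hypothetical threshold, tuned per mission in practice.
    """
    return [tile_id for tile_id, mask in tiles
            if cloud_fraction(mask) <= max_cloud_fraction]


tiles = [
    ("A", [[0, 0], [0, 0]]),  # clear
    ("B", [[1, 1], [1, 1]]),  # fully cloudy
    ("C", [[1, 0], [0, 1]]),  # half cloudy
]
print(select_tiles_for_downlink(tiles))  # ['A', 'C']
```

A tile-level filter is deliberately crude; a processing chain that knows which areas matter for the end use could skip or mask regions within a tile instead.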

A generic approach to this problem builds on self-supervised learning, an artificial intelligence (AI) technique in which models learn meaningful representations of the semantic content of images without humans having to label specific feature classes. The models can be trained on the ground and incrementally finetuned on board, so there is no need to downlink large amounts of training data.
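No labels are needed because the training target is the input itself. In its simplest, generic form (a plain autoencoder objective shown for illustration, not CORSA's exact loss), an encoder $E_\theta$ and a decoder $D_\phi$ are trained on $N$ unlabelled images $x_i$ to minimise the reconstruction error:

```latex
\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert x_i - D_\phi\big(E_\theta(x_i)\big) \right\rVert_2^2
```

Since only the images themselves appear in this loss, it can in principle also be evaluated on newly acquired data on board, which fits the idea of incremental finetuning without downlinking training data.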

A major advantage of this approach is that the models can be trained for the intended information extraction task, so that the data reduction is tuned to what that task needs. Furthermore, after transmission, the compact image representations can either be fully reconstructed into images or serve as an excellent starting point for efficient further information extraction, for example with lightweight AI/ML models trained on limited human-labelled data.


The CORSA concept

The algorithm

The CORSA algorithm is a foundation model for Sentinel-2 data: a large AI model pretrained in a self-supervised way on a large amount of heterogeneous data, which can then be finetuned for different tasks, much like the foundation models behind ChatGPT and DALL-E that generate text and images. Specifically, CORSA uses a custom VQ-VAE (vector-quantized variational autoencoder) architecture, which compresses Sentinel-2 tiles into compact feature vectors using a codebook. This codebook acts like a dictionary of semantic features and enables near-lossless reconstruction of Sentinel-2 imagery: the larger the codebook, the finer the image details that are retained. The multiple layers of the CORSA algorithm are not arbitrary; together they form a hierarchical representation of the spatial information in the original image.
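The codebook mechanism can be illustrated with a minimal Python sketch of the vector-quantization step used in VQ-VAEs in general: each encoder output vector is replaced by the index of its nearest codebook entry, and decoding starts by looking those indices up again. The toy 2-D codebooks and feature vectors are hypothetical; CORSA's actual codebook size, feature dimension and hierarchical layers are not reproduced here, and in a real model the codebook is learned and a neural decoder turns the looked-up vectors back into an image.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
            for vec in vectors]


def dequantize(indices, codebook):
    """Look the indices up again to recover the quantized feature vectors."""
    return [codebook[i] for i in indices]


def quantization_error(vectors, codebook):
    """Mean squared distance between each vector and its quantized version."""
    recon = dequantize(quantize(vectors, codebook), codebook)
    return sum(squared_distance(v, r) for v, r in zip(vectors, recon)) / len(vectors)


def index_bits(num_indices, codebook_size):
    """Bits needed to store the indices with a fixed-length code."""
    return num_indices * math.ceil(math.log2(codebook_size))


features = [(0.1, 0.05), (0.9, 0.95), (0.2, 0.7)]  # hypothetical encoder outputs
small_codebook = [(0.0, 0.0), (1.0, 1.0)]
large_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(quantize(features, large_codebook))             # [0, 3, 2]
print(quantization_error(features, small_codebook))   # larger error
print(quantization_error(features, large_codebook))   # smaller error
print(index_bits(len(features), len(large_codebook)))  # 6
```

The trade-off behind "the larger the codebook, the finer the details" shows up directly: the four-entry codebook lowers the quantization error, but each index then costs more bits (log2 of the codebook size, rounded up for a fixed-length code).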


The CORSA architecture

The impact on image quality and downstream applications

The CORSA project has shown that four-band (RGB + NIR) Sentinel-2 data can be compressed by a factor of 20 with minimal loss in image quality and without the blocky artifacts often seen with traditional compression methods like JPEG2000, as illustrated in the CORSA architecture figure above.
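To put a factor of 20 in perspective, the short Python calculation below estimates storage for one full Sentinel-2 tile (10980 × 10980 pixels at 10 m resolution). It assumes, as a hypothetical baseline, that the four bands are stored uncompressed as 16-bit integers; the ratio reported for CORSA may be defined against a different baseline.

```python
def raw_size_bytes(height, width, bands, bits_per_sample):
    """Uncompressed size of a multi-band raster in bytes."""
    return height * width * bands * bits_per_sample // 8


TILE_PIXELS = 10980   # Sentinel-2 tile width/height at 10 m
BANDS = 4             # R, G, B, NIR
BITS_PER_SAMPLE = 16  # assumption: uncompressed uint16 storage
RATIO = 20            # compression factor quoted in the text

raw = raw_size_bytes(TILE_PIXELS, TILE_PIXELS, BANDS, BITS_PER_SAMPLE)
compressed = raw / RATIO
bits_per_pixel = BANDS * BITS_PER_SAMPLE / RATIO

print(f"raw: {raw / 1e6:.1f} MB")                # raw: 964.5 MB
print(f"compressed: {compressed / 1e6:.1f} MB")  # compressed: 48.2 MB
print(f"{bits_per_pixel} bits per pixel")        # 3.2 bits per pixel
```

Under this assumption, each 4-band pixel drops from 64 bits to about 3.2 bits on average.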

Next, we explored how compression and reconstruction affect a complex downstream task: agricultural parcel delineation through image segmentation. A semantic segmentation model for cultivated land that we had previously developed (as part of the AI4EO Sentinel-2 challenge) was applied both to the original data and to data compressed and reconstructed with the CORSA algorithm. The results showed minimal impact on the accuracy of the detected cultivated land; only small details were affected, as visible in the images below. This indicates that CORSA compression has little negative effect on agricultural parcel delineation.


Evaluating the Impact of Image Quality Loss on Agricultural Parcel Delineation; left: ground truth, middle: segmentation on original data, right: segmentation on reconstructed data (see the full paper for details)
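One common way to quantify a comparison like this is the intersection over union (IoU) between binary masks, computed against the ground truth and between the two predictions. The sketch below uses tiny hypothetical 4 × 4 masks; it illustrates the metric only and does not reproduce the evaluation protocol or the numbers from the paper.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    return intersection / union if union else 1.0  # two empty masks agree perfectly


# Hypothetical masks: 1 = cultivated land, 0 = other.
ground_truth = [[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]]
pred_original = [[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 1, 1],
                 [0, 0, 1, 0]]
pred_reconstructed = [[1, 1, 0, 0],
                      [1, 0, 0, 0],
                      [0, 0, 1, 1],
                      [0, 0, 1, 0]]

print(iou(ground_truth, pred_original))        # 0.875
print(iou(ground_truth, pred_reconstructed))   # 0.75
print(iou(pred_original, pred_reconstructed))  # about 0.857
```

Comparing the two predictions directly, in addition to each against the ground truth, isolates the effect of compression from the segmentation model's own errors.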

Finally, we investigated whether lightweight AI/ML classification and segmentation models can be trained directly on the compressed features instead of on reconstructed images. In the Geo.Informed project, funded by Research Foundation-Flanders (FWO), we compared a custom U-Net trained on RGBNIR Sentinel-2 data with a network trained on CORSA compressed features for water detection. The image below shows that both approaches successfully segmented water. These findings highlight CORSA's potential to substantially improve the efficiency of developing and operating end-to-end Earth observation value-added services.


Finetuning a model for water detection on top of CORSA features; left: U-Net on original data, right: custom model on CORSA compressed features (see the full paper for details)
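The idea of training directly on compressed features can be sketched with a deliberately minimal classifier. Below, a nearest-centroid model (a stand-in for illustration, not the custom model used in the paper) labels hypothetical 2-D feature vectors, one per latent cell, as water (1) or not water (0). Real CORSA features have more dimensions and a spatial layout, and the training data here is invented.

```python
def fit_centroids(features, labels):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((v - c) ** 2 for v, c in zip(vec, centroids[label])))


# Hypothetical compressed feature vectors (one per latent cell) with water labels.
train_features = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.7)]
train_labels = [1, 1, 0, 0]  # 1 = water, 0 = not water

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, (0.7, 0.3)))  # 1
print(predict(centroids, (0.3, 0.6)))  # 0
```

Because the classifier only sees compact feature vectors rather than full-resolution pixels, both training and inference stay cheap; a practical model would be more expressive than this stand-in.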

CORSA, a key technology for future EO missions

We have shown that CORSA is an efficient and versatile algorithm that can be used both for compression and as a generic pretrained model for downstream applications.

Looking ahead, we are excited to take CORSA to new heights by:

  • developing an embedded version for onboard implementation in AI-enabled space missions,
  • customizing it for efficient, large-scale land cover/land use classification and change detection,
  • and expanding it to new sensors, particularly hyperspectral data.

These advancements will further solidify CORSA's role as a key technology for future Earth Observation missions.


Copyright images:

  • Header image: Sentinel-2 in space - © ESA-ATG Medialab
  • WorldCover images in the CORSA concept: © ESA WorldCover project 2020/ Contains modified Copernicus Sentinel data (2020) processed by ESA WorldCover consortium