Multi-temporal and multi-modal Earth Observation latent space decoding using physically aware Deep Learning

This offer is part of the RELEO (REpresentation Learning for Earth Observation) chair of ANITI-2, the follow-on of the Interdisciplinary Artificial Intelligence Institute in the frame of the French ANR “AI Clusters”. The PhD will be funded by CNES (the French Space Agency) and Thales Alenia Space.

RELEO aims at building an AI foundation model for the exploitation of Earth Observation Satellite Image Time Series (EO SITS). This model will fuse multi-modal data (optical, SAR, thermal) into AI-ready chunks of latent features (also known as embeddings), in which the traditional spatial, temporal and spectral dimensions of Earth Observation data have been collapsed. These latent chunks will benefit from the complementarity and the correlations of the different data sources and will provide essentialized (fully encoding the useful information) and strongly compressed information. The fusion will be done with deep neural networks whose training will be guided by physical models of the observed processes (bio/geo-physical models) and by image formation models (radiative transfer and sensor models).

About the PhD

The proposed PhD subject will contribute to RELEO’s WP3, whose goal is to decode the dimensionless compressed embeddings to produce time series of input variables for the physical models used to constrain the training problem through self-supervision. Some of these variables will be Essential Climate and Biodiversity Variables [1], such as soil moisture, land surface temperature or leaf area index. The work consists in building a neural decoder for the data cube unfolding: going from the compressed multi-modal data fusion back to the temporal and spatial resolutions needed by downstream applications. The decoder will be trained by optimizing the outputs of the above-mentioned physical models. Reference bio/geo-physical products from operational mono-sensor processing chains will be used together with field surveys to validate the decoder outputs.

To give a concrete example, one can start with two sources of data, high-resolution optical Sentinel-2 (10 m resolution every 5 days) and radar Sentinel-1 (10 m resolution every 6 days) time series covering a full country like France. These are encoded as spatio-temporal chunks of 1 km × 1 km × 10 days, each represented as a vector of 256 features. The goal is to decode these vectors in order to produce weekly Leaf Area Index maps at 10 m resolution.
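The sizes involved in this example can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate in Python: the number of channels per acquisition (`S2_BANDS`, `S1_CHANNELS`) is an assumption that is not stated in the offer, and acquisitions are counted by simple integer division of the chunk duration by the revisit period.

```python
# Back-of-the-envelope sizing of one spatio-temporal chunk.
# Figures taken from the example above:
CHUNK_SIDE_M = 1000     # 1 km x 1 km chunk
PIXEL_M = 10            # 10 m pixels
CHUNK_DAYS = 10         # 10-day temporal window
EMBEDDING_DIM = 256     # features per encoded chunk

# Assumptions (NOT stated in the offer): channels kept per acquisition.
S2_BANDS = 10           # hypothetical subset of Sentinel-2 spectral bands
S1_CHANNELS = 2         # hypothetical dual-polarisation Sentinel-1 input


def acquisitions(revisit_days: int, window_days: int = CHUNK_DAYS) -> int:
    """Number of full revisit periods that fit in the chunk window."""
    return window_days // revisit_days


pixels = (CHUNK_SIDE_M // PIXEL_M) ** 2                   # pixels per date
s2_values = pixels * S2_BANDS * acquisitions(5)           # optical values
s1_values = pixels * S1_CHANNELS * acquisitions(6)        # radar values
raw_values = s2_values + s1_values
compression_ratio = raw_values / EMBEDDING_DIM

print(f"pixels per date       : {pixels}")
print(f"raw values per chunk  : {raw_values}")
print(f"values per latent dim : {compression_ratio:.1f}")
```

Under these assumptions, a chunk gathers 220,000 raw values, so a 256-feature embedding would correspond to a compression factor of roughly 860. Different band selections or revisit counts would change this figure.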

The originality of the proposed approach is two-fold.

• First of all, the use of physical models to constrain the learning (PINN, Physics-Informed Neural Networks [2]) is very recent in EO [3, 4], and its extension to the multi-modal case is a challenging problem. It will, however, allow the development of downstream application processors which are independent of the availability of a particular sensor.

• Second, the generation of data at tailored temporal and spatial resolutions (including spatial resolutions not acquired at the chosen time stamps) is expected to improve the accuracy of the downstream products by avoiding repeated resampling steps, which are known to degrade the input data. This will entail the development of neural architectures which are able to produce image time series from embeddings which have neither explicit spatial nor temporal dimensions. Generative models based on diffusion approaches [5], [6], normalizing flows [7] or similar approaches will be investigated.
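To make the idea of decoding a dimensionless embedding at an arbitrary resolution more tangible, here is a minimal NumPy sketch. It uses a coordinate-conditioned MLP with random, untrained weights, which is a deliberately simpler stand-in and not one of the generative models named above (diffusion, normalizing flows). All function names, layer sizes and the Fourier encoding of the (x, y, t) coordinates are illustrative assumptions; only the 256-feature embedding size comes from the offer.

```python
import numpy as np


def fourier_features(coords, n_freqs=4):
    """Encode (N, 3) coordinates in [0, 1] as sines/cosines -> (N, 3*2*n_freqs)."""
    freqs = 2.0 ** np.arange(n_freqs) * np.pi
    angles = coords[:, :, None] * freqs                    # (N, 3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(len(coords), -1)


def init_decoder(embed_dim, coord_dim, hidden=64, seed=0):
    """Random (untrained) weights of a one-hidden-layer MLP; hypothetical sizes."""
    rng = np.random.default_rng(seed)
    in_dim = embed_dim + coord_dim
    return {
        "w1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)),
        "b2": np.zeros(1),
    }


def decode(params, embedding, coords):
    """Predict one scalar (e.g. a LAI-like value) per queried (x, y, t)."""
    feats = fourier_features(coords)
    z = np.broadcast_to(embedding, (len(coords), embedding.shape[0]))
    x = np.concatenate([z, feats], axis=1)
    h = np.tanh(x @ params["w1"] + params["b1"])
    return (h @ params["w2"] + params["b2"])[:, 0]


def query_grid(side, n_dates):
    """Pixel-centre (x, y, t) coordinates, normalised to the chunk extent."""
    xs = (np.arange(side) + 0.5) / side
    ts = (np.arange(n_dates) + 0.5) / n_dates
    t, y, x = np.meshgrid(ts, xs, xs, indexing="ij")
    return np.stack([x.ravel(), y.ravel(), t.ravel()], axis=1)


embedding = np.random.default_rng(1).normal(size=256)     # one encoded chunk
params = init_decoder(embed_dim=256, coord_dim=3 * 2 * 4)

# The same embedding is decoded on two different output grids.
coarse = decode(params, embedding, query_grid(side=10, n_dates=1)).reshape(1, 10, 10)
fine = decode(params, embedding, query_grid(side=100, n_dates=2)).reshape(2, 100, 100)
print(coarse.shape, fine.shape)
```

The point of the sketch is the interface rather than the model: the output grid is chosen at query time, so no resampling of a fixed-resolution product is needed. In the PhD, the decoder would be trained through the physical models and would likely be generative rather than a plain deterministic MLP.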

Work plan

1 – State of the art on foundation models and PINN starting from what is already available at CESBIO.

2 – Method development: latent space decoding for on-demand resolution using physical model guided training.

3 – Validation and assessment using reference products and field surveys.

Work environment

The PhD will take place at CESBIO in Toulouse. The PhD candidate will be integrated into the Observation Systems team and, more precisely, within the AI unit.

The team works on CNES’ high performance computing (HPC) infrastructure (250 nodes with 8000 CPUs, 53 GPUs), which also hosts a full mirror of all Sentinel-1 and Sentinel-2 data.

Application procedure

Candidate profile: Master's degree in at least one of the following areas: applied mathematics, measurement physics, optimization, machine learning.

Skills in and eagerness for computer programming in the areas of scientific computing or machine learning.

Send a curriculum vitae, a motivation letter and recommendation letters to Jordi Inglada before February 28, 2024.

PhD position: Marginal representation of streamed multi-source remote sensing data using Gaussian process prior variational auto-encoder

This offer is part of the RELEO (REpresentation Learning for Earth Observation) chair of ANITI-2, the follow-on of the Interdisciplinary Artificial Intelligence Institute in the frame of the French ANR “AI Clusters”.

Over the last ten years, Earth Observation (EO) has made enormous progress in terms of spatial and temporal resolution, data availability and open policies for end-users. The increasing availability of complementary imaging sensors makes it possible to observe the state variables and processes of land ecosystems at different spatio-temporal scales. Big EO data can thus enable the design of new land monitoring systems providing critical information to guide climate change monitoring, mitigation and adaptation. One key piece of information is the land cover state and trend of continental surfaces; see Figure 1 for an example of a land cover map.


Conventional machine learning methods are not well adapted to the complexity of multi-modal, multi-resolution Satellite Image Time Series (SITS) with irregular sampling, and are therefore not suitable for extracting and processing all the relevant information. On the other hand, methods based on deep neural networks have proven very effective at learning low-dimensional representations of complex data for several tasks and have high potential for EO data, but they often come from the Computer Vision (CV) and natural language processing (NLP) communities and need to be extended to handle the specificities of Earth Observation data.

Previous works at the CESBIO lab have shown that generative encoder-decoder architectures such as Variational Auto-Encoder (VAE) models or U-Net models perform very well for a variety of EO tasks: estimation of biophysical parameters or Sentinel-1 to Sentinel-2 translation, to cite a few.

However, such approaches appear to be inadequate to handle data coming from more than two sources and acquired at different temporal and spatial resolutions, as prioritized in the RELEO chair within ANITI.

In particular, the generative capability of these models may generalize poorly to unseen regions or time periods. Processing such streams of data requires jointly encoding all sources into a structured latent space where the complementary information carried by each source can be embedded, while ensuring long-term encoding of newly acquired data (possibly from new sensors).

Besides, VAEs usually assume independence between samples and require that all latent variables are generated even if some data source is missing. These assumptions are generally made for the sake of simplicity and computational efficiency of the training and inference steps. However, assuming independence of samples amounts to ignoring the correlation between adjacent pixels in the spatial and/or temporal domains. The second requirement is commonly addressed through masking strategies. Because of the deterministic nature of such neural network architectures, they do not properly encode the uncertainty related to missing data. Furthermore, they are not able to impute the resulting missing information in the latent space.

The objective of this PhD is to learn a low-dimensional probabilistic representation of multi-source EO data that addresses the problem of streamed multi-source data. Specifically, the candidate will investigate the relevance of a Gaussian process (GP) prior. Adopting this GP prior is expected to model correlations for multiple (possibly infinite) sources of data in the latent space, i.e., when the number of EO sources may vary during training or inference. Furthermore, building on the conditioning property inherent to GPs, it might be possible to reconstruct missing data in the learned latent space.
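The conditioning property mentioned above can be illustrated with a few lines of NumPy. This is a minimal sketch under strong assumptions that are not part of the offer: a single scalar latent variable, a one-dimensional time axis, a squared-exponential (RBF) kernel with hand-picked hyperparameters, and made-up observation dates and values. It only shows how a GP prior gives both an imputed value and an uncertainty for a date at which no data was acquired.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=5.0, variance=1.0):
    """Squared-exponential covariance between two sets of dates (in days)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


def gp_condition(t_obs, z_obs, t_new, noise=1e-4):
    """Posterior mean and covariance of the latent at t_new given (t_obs, z_obs)."""
    k_oo = rbf_kernel(t_obs, t_obs) + noise * np.eye(len(t_obs))
    k_no = rbf_kernel(t_new, t_obs)
    k_nn = rbf_kernel(t_new, t_new)
    chol = np.linalg.cholesky(k_oo)                        # k_oo = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, z_obs))
    mean = k_no @ alpha
    v = np.linalg.solve(chol, k_no.T)
    cov = k_nn - v.T @ v
    return mean, cov


# Hypothetical latent values observed on five dates; day 15 is missing.
t_obs = np.array([0.0, 5.0, 10.0, 20.0, 25.0])
z_obs = np.sin(t_obs / 8.0)
t_missing = np.array([15.0])

mean, cov = gp_condition(t_obs, z_obs, t_missing)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print(f"imputed latent at day 15: {mean[0]:.3f} +/- {std[0]:.3f}")
```

Unlike a masking strategy, the missing date receives an explicit posterior variance, which is the kind of uncertainty the deterministic architectures discussed above do not provide. A GP prior VAE would apply this idea to multi-dimensional latents with kernels over space, time and sensor, which is part of the research question.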

Lastly, as for any generative deep model, sampling from the latent space is straightforward.

The methodological challenges that will be considered during the PhD are three-fold:

  1. Defining a probabilistic generative model with a GP prior for the latent space. This prior should properly encode time and space for any sensor. A possible starting point is an adaptation of existing works on multi-view object reconstruction, such as [1] or [2].
  2. Developing a scalable training algorithm. Conventional GPs are known to scale poorly, but solutions based on sparse and variational approximations have been successfully employed (see [3] for a past work co-funded by CNES).
  3. Maintaining fast inference, as with VAEs. This requires amortizing the parameters of the model during the training process.

This model will be evaluated and compared to state-of-the-art generative representation learning algorithms for the different scenarios identified in the RELEO chair hosted by ANITI. In particular, the learned latent representations will be used to generate Essential Climate Variables in order to monitor land use and land cover changes (see Figure 1), as well as vegetation state and trend, carbon cycle and water cycle.

Scientific environment

The PhD student will benefit from a favorable context and will be able to rely on the most recent results and advances in machine learning and Earth observation signal & image processing. He/she will be mainly co-advised by the following researchers within the CESBIO lab:

  • Mathieu Fauvel, INRAe Researcher
  • Nicolas Dobigeon, Professor at Toulouse INP
  • Julien Michel, CNES Engineer

He/she will take advantage of the lab's scientific activities (e.g. scientific seminars), as well as the ANITI dynamic in Toulouse. Also, the PhD is funded jointly by CNES and CS-Group and, as such, the recruit will benefit from their expertise during the PhD.

Candidate background

Master's or engineering school students majoring in applied mathematics, computer science or electrical engineering. The candidate must have a solid background in at least one of the following subjects:

  • Statistical signal and image processing
  • Machine learning or data science
  • Remote sensing data processing

A good knowledge of English and scientific programming skills (Python, git) are required. A broader interest in Earth observation will be appreciated.

Application procedure

Applicants are invited to send (as PDF files):

  • A detailed curriculum vitae
  • Official transcripts from each institution attended (in French or English)

You will be contacted if your profile meets the expectations. The review of applications will close in mid-March 2024.
