Certified programming framework for machine learning applications
Advisors: Claire Pagetti & Aurélien Plyer (Onera), Adrien Gauffriau (Airbus)
Contacts: email@example.com / firstname.lastname@example.org
Net Salary: 2096€ per month with some teaching (64 hours per year on average)
Duration: 36 months
Machine learning is gaining importance in safety-critical domains, including aeronautics. However, because such applications do not yet reach classical safety confidence levels and are not developed with accepted development processes [BCM+15, ABH+18], many research and engineering activities must be conducted before they can be embedded in aircraft.
Among them, the question of how to implement a neural network safely and reliably on adequate hardware is of vital importance. Indeed, certification requirements, in particular those of DO-178C [RTC11], impose strong guarantees on the quality of the code and expect the designer to compute the WCET (Worst-Case Execution Time).
The scope of the PhD is the real-time implementation of neural networks on hardware platforms. Thus, the purpose of the PhD is to answer the following questions:
- how to choose adequate COTS (Commercial Off-The-Shelf) hardware that offers sufficient computing performance and fulfills aeronautical constraints (e.g. dissipation), and which COTS is an adequate choice for which type of ML method;
- how to define an execution model [PMN+16] on the hardware so that a tight WCET can be computed for a Machine Learning model or application;
- how to code a Machine Learning model efficiently and compile it for the target;
- how to parallelize the execution if the platform offers parallelism.
First of all, the aeronautical inputs will be clarified. The PhD candidate will investigate Airbus needs and future aircraft systems. The study will be restricted to so-called supervised learning (for instance, no reinforcement learning); that is, the embedded function performs inference only (model training is done offline, on the ground, in specialized frameworks). The families of machine learning considered will be Deep Neural Networks, Ensemble Methods and Convolutional Neural Networks.
The first objective is to review existing COTS technologies available on the market to execute neural network applications. A non-exhaustive preliminary list is: NXP QorIQ family (e.g. T1042 [NXP15]), Kalray Coolidge [Kal19], NVIDIA GPU (e.g. Turing [NVI18]), Intel Movidius Neural Compute Stick [Int18] or TPU (e.g. Coral Dev Board [Goo19]). A design space exploration will be carried out on those platforms with the use cases. Beyond the pure hardware part, it is also important to investigate the associated frameworks (e.g. TensorFlow), their available code generation approaches and their compilation procedures. Their export formats are of particular interest, since they are the entry points to the embedded function generation phase. In addition to these technical considerations, the investigation will also take certification and industrial constraints into account.
From the exploration, one or two COTS will be chosen. The second objective is to define, for the candidates, a so-called execution model, which is a set of rules to program and configure the platform in order to reduce unpredictable behavior. In the real-time community, predictability is the ability to compute a tight WCET. Recent generations of processors embed highly complex, and often undocumented, mechanisms that make it hard to assess the maximal number of cycles required to execute a sequential program [WEE+08]. One solution to overcome this problem is to reduce the potential unpredictable behavior by restricting the execution according to predefined programming rules (e.g. TDMA, time-division multiple access, to a shared resource). The execution will take place in a minimalist environment, e.g. bare metal, in order to fully understand and predict the low-level behavior of the program and the corresponding computational resource requirements.
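To illustrate why such programming rules help, here is a minimal sketch of TDMA arbitration. It assumes a hypothetical platform where each core owns one fixed-length slot per period and an access to the shared resource may only start at the beginning of its owner's slot. The slot length, the number of cores and the function names are illustrative assumptions and are not taken from any cited execution model. Under these assumptions, the waiting time of an access is bounded by a constant that does not depend on what the other cores do.

```python
def tdma_wait(t: int, core: int, n_cores: int, slot_len: int) -> int:
    """Cycles a request issued at time t by `core` waits until its slot starts.

    Assumption: slots are laid out in core order and repeat every
    n_cores * slot_len cycles; an access can only start at a slot boundary.
    """
    period = n_cores * slot_len
    slot_start = core * slot_len  # offset of this core's slot within the period
    return (slot_start - t) % period


def worst_case_wait(core: int, n_cores: int, slot_len: int) -> int:
    """Exhaustively search one period for the largest waiting time."""
    period = n_cores * slot_len
    return max(tdma_wait(t, core, n_cores, slot_len) for t in range(period))


if __name__ == "__main__":
    # Hypothetical configuration: 4 cores, 10-cycle slots.
    for c in range(4):
        print(f"core {c}: worst-case wait = {worst_case_wait(c, 4, 10)} cycles")
```

In this toy model the bound is the same for every core (one period minus one cycle), which is the kind of interference-independent figure a WCET analysis can rely on.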
Once the target and its associated execution model have been defined, the last part of the PhD will consist in developing an automatic and verified framework to generate low-level code (e.g. C code) and its associated binaries for the target. By verified, we mean that the semantics of the C code must be equivalent to the output of the neural network design framework (e.g. TensorFlow or PyTorch), and that the semantics of the execution must be equivalent to that of the code. The overall aim is to understand the efficiency of various hardware architectures for different Machine Learning methods, and to evaluate our capability to embed these methods on these platforms with avionics constraints in mind.
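The generation step can be pictured at its smallest scale with the sketch below. It emits C source for a single dense layer followed by ReLU, with the weights stored as constants and loop bounds fixed at generation time, next to a Python reference of the same computation that plays the role of a semantic oracle. Everything here is a hypothetical illustration: the function names are invented, no TensorFlow or PyTorch export format is read, and comparing outputs is only testing, whereas the framework envisioned in the proposal would require an actual equivalence argument. Note also that the emitted C uses single-precision floats while the Python reference uses doubles, so the two can only agree up to a tolerance that a real flow would have to bound.

```python
from typing import List


def dense_relu_reference(w: List[List[float]], b: List[float],
                         x: List[float]) -> List[float]:
    """Reference semantics y = relu(W x + b), same accumulation order as the C code."""
    y = []
    for row, bias in zip(w, b):
        acc = bias
        for wij, xj in zip(row, x):
            acc += wij * xj
        y.append(max(0.0, acc))
    return y


def emit_dense_relu_c(name: str, w: List[List[float]], b: List[float]) -> str:
    """Emit C source for one dense+ReLU layer with constant loop bounds."""
    n_out, n_in = len(w), len(w[0])
    rows = ",\n".join(
        "    {" + ", ".join(f"{float(v)!r}f" for v in row) + "}" for row in w
    )
    bias = ", ".join(f"{float(v)!r}f" for v in b)
    return (
        f"static const float {name}_w[{n_out}][{n_in}] = {{\n{rows}\n}};\n"
        f"static const float {name}_b[{n_out}] = {{{bias}}};\n\n"
        f"void {name}(const float x[{n_in}], float y[{n_out}]) {{\n"
        f"    for (int i = 0; i < {n_out}; ++i) {{\n"
        f"        float acc = {name}_b[i];\n"
        f"        for (int j = 0; j < {n_in}; ++j) {{\n"
        f"            acc += {name}_w[i][j] * x[j];\n"
        f"        }}\n"
        f"        y[i] = acc > 0.0f ? acc : 0.0f;\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    weights = [[1.0, -2.0], [0.5, 0.25]]  # hypothetical trained parameters
    biases = [0.0, 1.0]
    print(emit_dense_relu_c("layer0", weights, biases))
    print(dense_relu_reference(weights, biases, [1.0, 1.0]))
```

Fixing the loop bounds in the generated source is a deliberate choice: it keeps the control flow independent of the input data, which simplifies timing analysis on the target.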
- [ASK+18] Tahmid Abtahi, Colin Shea, Amey Kulkarni, and Tinoosh Mohsenin. Accelerating Convolutional Neural Network With FFT on Embedded Hardware. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018.
- [ABH+18] E. Alves, D. Bhatt, B. Hall, K. Driscoll, A. Murugesan, and J. Rushby. Considerations in Assuring Safety of Increasingly Autonomous Systems. NASA, NASA/CR-2018-220080, 2018.
- [BCM+15] S. Bhattacharyya, D. Cofer, D. J. Musliner, J. Mueller, and E. Engstrom. Certification Considerations for Adaptive Systems. NASA, NASA/CR-2015-218702, 2015.
- [Goo19] Google. Coral Dev Board. 2019. https://coral.withgoogle.com/docs/dev-board/datasheet/
- [Int18] Intel. Intel Movidius Neural Compute Stick. https://software.intel.com/en-us/neural-compute-stick
- [Joo13] Mohammad Hadi Jooybar. Deterministic Execution on GPU Architectures. Master's thesis, 2013.
- [Kal19] Kalray. MPPA-3 – Coolidge. 2019. https://www.kalrayinc.com/release-of-third-generation-mppa-processorcoolidge/
- [NVI18] NVIDIA (Emmett Kilgariff, Henry Moreton, Nick Stam, and Brandon Bell). Turing. 2018. https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/
- [NXP15] Freescale Semiconductor. QorIQ T1042, T1022 Data Sheet. 2015. https://4donline.ihs.com/images/VipMasterIC/IC/PHGL/PHGL-S-A0002440345/PHGL-S-A0002440345-1.pdf
- [PMN+16] Quentin Perret, Pascal Maurère, Eric Noulard, Claire Pagetti, Pascal Sainrat, and Benoit Triquet. Temporal Isolation of Hard Real-Time Applications on Many-Core Processors. RTAS 2016: 37-47.
- [RTC11] RTCA, Inc. DO-178C / ED-12C – Software Considerations in Airborne Systems and Equipment Certification, 2011.
- [WEE+08] R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, C. Ferdinand, R. Heckmann, T. Mitra, F. Mueller, I. Puaut, P. Puschner, J. Staschulat, and P. Stenstrom. The worst-case execution-time problem – overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems, 7(3):36:1–36:53, May 2008.
Artificial Intelligence for Ecosystem Monitoring using Remote Sensing and Digital Agriculture Data
Advisors: Mathieu Fauvel, CESBIO, Inra Toulouse, France; Jordi Inglada, CESBIO, Cnes Toulouse, France; Mickaël Savinaud, CS, Toulouse, France
In recent years, the advent of Earth observation satellite missions with short revisit times and increased spatial resolution has led to an unprecedented amount of remote sensing images of heterogeneous physical nature (e.g., optical and radar Sentinel time series) at various scales (e.g., submetric, decametric). Furthermore, satellite image archives, such as Spot Heritage, are made available by many space agencies. Such massive data extend the existing satellite and in-situ acquisition systems used to understand, explain and predict the states and trends of our environment.
At the same time, digital agriculture gathers more and more data with the development of sensors, robotics and machinery. Such data, combined with weather information, provide rich information that is complementary to remote sensing data. The joint use of these two sources of valuable information is crucial to enhance the knowledge of our environment and our ability to make decisions. However, the novel complexity of the data makes conventional analytical methods ill-suited to extracting and processing all the relevant information from the massive flow of data.
In order to address the challenges raised by such application domains, the interdisciplinary institute in artificial intelligence of Toulouse, named the Artificial and Natural Intelligence Toulouse Institute (ANITI), of which CNES is a partner, has been proposed to develop innovative solutions using theoretical advances in core AI scientific areas. The CESBIO lab, with J. Inglada and M. Fauvel, is part of the ANITI Chair entitled “Data-driven approximate Bayesian computation for fusion-based inference from heterogeneous (remote sensing) data”, held by Prof. N. Dobigeon.
Two main challenges arising from the ANITI core tracks apply directly to this PhD proposal, co-funded by CS:
- Integration of massive multi-source, multi-scale satellite images (optical and radar image time series, very high spatial resolution) and in-situ/field data (digital agriculture, meteorological or crowd-sourced data) in learning algorithms through large-scale distributed optimization.
- Explainable and interpretable models.
The PhD thesis objectives are twofold. First, it aims to integrate multi-scale and multi-source data from Earth observation systems and from digital agriculture into learning algorithms. Second, the definition of spatially constrained and interpretable models will be considered.
The theoretical foundation of the PhD work will be based on the latest advances in Gaussian Processes (GP) and kernel algorithms. Such methods have regained attention from the machine learning community thanks to recent developments in optimization techniques, which allow GP to be used in large-scale scenarios with many millions of points [1,2]. Current research is bridging the gap between Deep Neural Networks and GP, by adding theoretical results in large-scale learning and interpretability from Bayesian modeling.
With current algorithms, it is not possible to integrate point data into the processing of satellite image time series because of the different spatial sampling (point versus grid sampling). The construction of appropriate latent subspaces will be considered to properly use heterogeneous data by means of appropriate vector-valued kernel functions and multi-output GP.
The construction of spatially interpretable models will be considered through a constrained spatial stratification. Large-scale analysis of remote sensing data often resorts to such stratification (e.g., eco-climatic stratification): several models, one for each stratum, are learned independently. However, no constraints are imposed and the models could behave differently at the spatial region boundaries. The objective is to include specific spatial constraints in the learning step to ensure a smooth transition between two (or more) spatial regions, i.e., to ensure similar predictions for models at boundaries.
One key step of the proposed methodology is the learning step, i.e., the optimization of the various parameters of the model. Using GP with spatial constraints on massive data sets requires solving large-scale non-convex problems, which is not a trivial task. Hence, specific developments in computational statistics and optimization will be conducted to solve such problems efficiently. In particular, distributed optimization strategies will be considered to cope with the possible dissemination of the data over multiple data centers.
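One classic scheme for this kind of setting is consensus ADMM (alternating direction method of multipliers), which is the subject of the Boyd et al. entry in the reference list of this proposal. The sketch below is only an illustration on a toy convex problem and is not the method the thesis will develop. It assumes that each hypothetical data center i holds a private value a_i and a local cost (x - a_i)^2 / 2, and that the centers must agree on a common x while exchanging only their current estimates. The function name, the penalty `rho` and the iteration count are assumptions made for the example. The problems targeted in the thesis are non-convex, a case in which ADMM does not come with the same convergence guarantees as in this example.

```python
from typing import List


def consensus_admm(a: List[float], rho: float = 1.0, n_iter: int = 100) -> float:
    """Scaled-form consensus ADMM for min_x sum_i (x - a_i)^2 / 2.

    Each entry of `a` stands for the private data of one (hypothetical)
    data center. Returns the consensus variable z after n_iter iterations.
    """
    n = len(a)
    z = 0.0            # global consensus variable
    u = [0.0] * n      # scaled dual variables, one per data center
    for _ in range(n_iter):
        # Local step: each center solves its own small problem in closed form.
        x = [(ai + rho * (z - ui)) / (1.0 + rho) for ai, ui in zip(a, u)]
        # Global step: average the local estimates (plus duals).
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate the disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z


if __name__ == "__main__":
    local_values = [1.0, 2.0, 6.0]  # toy private data, one value per center
    print(consensus_admm(local_values))  # approaches the mean of the values
```

For this toy cost the consensus value is simply the mean of the private values, which makes the result easy to check; only the local estimates and dual variables are exchanged, never the raw data.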
The validation context will be the land-cover production chain of CESBIO, iota2, and its annual land-cover map OSO (http://osr-cesbio.ups-tlse.fr/~oso/). The proposed models will be integrated and validated with respect to the current standard of large-scale land-cover map production. In terms of products, land-cover maps as well as biodiversity indices will be considered.
Requirements – The candidate must have a solid background in at least one of the following areas:
- Statistical signal and image processing,
- Machine learning,
- Optimization.
A good knowledge of English and scientific programming (Python, C/C++) is required.
Contact – Candidates should send an e-mail to email@example.com, firstname.lastname@example.org & email@example.com containing:
- Full CV,
- Motivation letter,
- Contact information for 2 references and/or recommendation letters.
The thesis is scheduled to start in September 2020. Applications are open until the position is filled. The recruit will be registered in the doctoral school ED173 “Geosciences, Astrophysics, Space and Environmental Sciences” or ED475 “Mathematics, Informatics and Telecommunications”.
Practical details – The position comes with health insurance and other social benefits. The recruit will be located in the CESBIO lab, in Toulouse, and will interact with people involved in the project (CS, ANITI, CNES & INRA). French is not mandatory.
CESBIO – Research at CESBIO aims to develop knowledge on continental biosphere dynamics and functioning at various temporal and spatial scales, and as such participates in the specification of space missions and the processing of remotely sensed data. CESBIO is or has been PI for 2 ESA satellite missions (SMOS, the Soil Moisture and Ocean Salinity satellite, and BIOMASS, a P-band SAR system to be launched in 2020) and for the French-Israeli Venus satellite (2-day revisit, 10 m resolution, optical sensor for vegetation monitoring, launched in 2017). CESBIO has developed the iota2 processing chain for the operational production of land-cover maps at the national French scale. It therefore has strong experience in upscaling learning and classification processes. Over the last two years, CESBIO has been committed to collecting feedback, tailoring iota2 outputs for various end-users, and disseminating it to several research institutes in France.
ANITI – The ambition of the ANITI project is to develop a new generation of artificial intelligence called hybrid AI, combining data-driven machine learning techniques with symbolic and formal methods for expressing properties and constraints and carrying out logical reasoning. This approach will provide better guarantees in terms of reliability, robustness and the ability to explain and interpret the results of the algorithms used, while ensuring social acceptability and economic viability. Such guarantees are required by many applications targeted by the project, such as autonomous vehicles of the future. Starting operations this autumn, ANITI will bring together more than 200 researchers from universities, engineering schools, scientific and technological research organizations, and about thirty companies in the Toulouse region.
Communication & Systèmes – CS Systèmes d’Information is a French mid-sized company (ETI) of more than 2000 people and a major European player in the integration of systems, including space and simulation systems. Within CS-SI, the BU Space, in charge of space-related activities, is composed of more than 300 people and has been working for 35 years for major players in the space sector in France and in Europe: CNES, ESA, Airbus Defense & Space and Thales, among others. Within the BU Space, the PDA department brings its competences and expertise in image quality control, processing, production and operational exploitation of geospatial data (satellite images, geographic databases, etc.). The department’s activities cover almost the entire space value chain, from sensor to applications. This broad positioning makes it possible to have, within a single department, a multidisciplinary team with different but fully complementary profiles and experiences. As part of this positioning, CS SI works on various R&D topics, of which AI is one of the main axes. In addition, CS SI has defined a highly structured open source policy that runs through both its R&D activities and its development of operational applications. For example, the Remote Sensing and Toolbox team, part of the PDA department, develops state-of-the-art open source image processing software such as Orfeo ToolBox, SNAP, S2P, …
[1] Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Kilian Q. Weinberger, and Andrew Gordon Wilson. Exact Gaussian processes on a million data points. arXiv preprint arXiv:1903.08114, 2019.
[2] Marc Deisenroth and Jun Wei Ng. Distributed Gaussian processes. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1481–1490, Lille, France, 07–09 Jul 2015. PMLR.
[3] Julien Mairal. Large-Scale Machine Learning and Applications. Habilitation à diriger des recherches, UGA - Université Grenoble Alpes, October 2017.
[4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, January 2011.
How to apply?
Send your detailed CV, a cover letter and a copy of your diplomas to firstname.lastname@example.org
Samples of your scientific publications and letters of recommendation will be a plus.