ANITI Principal Investigators' M2 internship topics
Each year, ANITI PIs and their research teams offer internship topics specifically for ANITI scholarship recipients. Here are the proposed M2 internship topics for 2025:
M2 internship co-supervised by RTE and LAAS (Starting in early 2025): polynomial optimization for time-delayed power systems, supervised by Didier Henrion (LAAS), Victor Magron (LAAS), Patrick Panciatici (RTE) and Manuel Ruiz (RTE). Please see the following description.
M2 internship co-supervised by RTE and LAAS (Starting in early 2025): learning-based optimization for power systems, supervised by Milan Korda (LAAS), Victor Magron (LAAS), Balthazar Donon (RTE) and Patrick Panciatici (RTE). Please see the following description.
“Towards flexible and adaptive LLMs”, Chaire ANITI C3PO (PI: Rufin VANRULLEN, vanrullen@cnrs.fr)

Transformers have become foundational in modern AI. However, their rigid layer-by-layer processing in a fixed order may limit their flexibility and adaptability to varying contexts. This internship project (part of the ANITI C3PO chair and the ERC Advanced grant GLoW) proposes a routing mechanism that dynamically recruits and combines Transformer layers based on input context. This routing mechanism, inspired by the way processes in the brain are dynamically recruited depending on relevance, could unlock previously untapped flexibility and efficiency in Transformer architectures.

Central to this project is the residual stream (Elhage et al., 2021), which carries representations forward across layers, each successive layer applying its “transformation” to the stream. By dynamically selecting which layers update the residual stream, the model can adapt its computation to specific tasks, much like the brain's ability to recruit relevant processes in real time. This perspective aligns with the Global Workspace Theory (see VanRullen & Kanai, 2021), which describes how a shared workspace integrates contributions from specialized “modules” or “experts”. (The approach is thus also related to Mixture of Experts models, where components are selectively activated based on the input.)

For this initial project, we will train a router to select and combine pretrained and frozen layers from a small open-source model (such as Mistral 7B or Llama 2). When trained on the initial LLM pre-training dataset, we expect the router to recover the original model's fixed layer ordering for most inputs; but it might also learn to deploy novel layer arrangements in certain scenarios, thereby revealing potential improvements over the standard Transformer architecture.
Combined with mechanisms for learning when to stop processing (Graves, 2016), this novel router could lead to models that are more adaptive, resource-aware, and interpretable.

References:
Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., … & Olah, C. (2021). A mathematical framework for transformer circuits. Transformer Circuits Thread, 1(1), 12.
Graves, A. (2016). Adaptive computation time for recurrent neural networks. arXiv preprint arXiv:1603.08983.
VanRullen, R., & Kanai, R. (2021). Deep learning and the global workspace theory. Trends in Neurosciences, 44(9), 692-704.
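To make the routed residual-stream idea concrete, here is a minimal NumPy sketch, not the project's implementation: the layer matrices, the router weights, the model width `d`, and the number of routed steps are all hypothetical stand-ins for a frozen pretrained Transformer's layers and a learned router.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical model width (real models use thousands of dimensions)

# "Frozen layers": each applies a fixed transformation to the residual stream.
# Stand-ins for pretrained Transformer blocks.
layers = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]

# Hypothetical router: scores each layer's relevance from the current stream state.
W_router = rng.standard_normal((4, d)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def routed_step(h):
    """One routing step: weight each frozen layer's output by the router's
    gates, then add the combined update back into the residual stream."""
    gates = softmax(W_router @ h)                        # per-layer weights, sum to 1
    update = sum(g * (L @ h) for g, L in zip(gates, layers))
    return h + update                                    # residual update

h = rng.standard_normal(d)        # initial residual stream for one token
for _ in range(3):                # the stream passes through 3 routed steps
    h = routed_step(h)
```

In the actual project the gates would be trained (and could be made sparse or hard, recovering a discrete layer ordering as a special case); the soft mixture above is just the simplest differentiable form of "selecting which layers update the residual stream".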
“A deep learning module for brain decoding and encoding”, Chaire ANITI C3PO (PI: Rufin VANRULLEN, vanrullen@cnrs.fr)

This internship aims to build an interface between brain activity (recorded with fMRI) and an existing multimodal deep-learning representation system. The work is part of the ERC GLoW project, which aims to build multimodal deep-learning models inspired by a neuro-cognitive theory called the “Global Workspace”. In brief, such models are trained to integrate and distribute information among networks of specialized modules (e.g. vision, language, audio…) through a central amodal global latent workspace.

The specific goal of the internship will be to integrate a new module for brain activity data into our current implementation of the Global Workspace (GW) model. This module will be trained on a large-scale multimodal fMRI dataset (recorded by our team) of people watching images of naturalistic scenes and reading short textual descriptions of such scenes. So far, we have successfully trained GW models on large-scale vision and language datasets to perform tasks like image captioning or text-to-image generation. The addition of a brain activity module will allow us to perform numerous additional operations, such as prediction of brain activity for a given stimulus, or reconstruction of image and/or text stimuli from brain activity data.

Our team works with a shared codebase based on PyTorch. We are looking for candidates with Python programming experience and formal training in machine learning. Neuroscience expertise is not a requirement.
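The role of the new brain module can be sketched as an encoder/decoder pair that maps fMRI activity into the shared workspace latent and back. The following NumPy sketch is illustrative only: the linear maps, the dimensions `d_gw` and `d_fmri`, and the cycle loss are hypothetical stand-ins for the trained PyTorch networks and objectives used in the actual GW codebase.

```python
import numpy as np

rng = np.random.default_rng(1)
d_gw = 16      # hypothetical global-workspace latent size
d_fmri = 200   # hypothetical (flattened) fMRI feature size

# Hypothetical linear encoder/decoder for the brain module; in practice
# these would be trained neural networks aligned with the other modules.
E_brain = rng.standard_normal((d_gw, d_fmri)) * 0.05   # fMRI -> workspace
D_brain = rng.standard_normal((d_fmri, d_gw)) * 0.05   # workspace -> fMRI

def encode(x):
    """Project brain activity into the shared workspace latent."""
    return E_brain @ x

def decode(z):
    """Predict brain activity from a workspace state (brain encoding)."""
    return D_brain @ z

# A "cycle": encode an fMRI pattern into the workspace, then reconstruct it.
x = rng.standard_normal(d_fmri)
z = encode(x)
x_hat = decode(z)
cycle_loss = np.mean((x - x_hat) ** 2)  # training would minimize such a term
```

Once the brain module shares the workspace latent with the vision and language modules, decoding a stimulus from brain activity reduces to `encode(fmri)` followed by the existing image or text decoder, and predicting brain activity for a stimulus is the reverse path through `decode`.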