Internship Topics 2022-2023
A limited number of positions for a Master’s internship are available this year. These
internships should last A MINIMUM OF 5-6 MONTHS. Please send your application by
email (CV + cover letter required) to syntheticlearner@gmail.com (unless another email address is specified).
Indicate in the email which topic(s) you are interested in.
Subject 1: Emergent Communication with graphs (Mathieu)
So far, emergent communication (EC) experiments have mainly implemented the task of communicating about very
simple hand-designed objects (sequences of one-hot vectors) or about images. Hand-designed
objects have the advantage of being easily controllable but are unfortunately not very complex;
conversely, images are complex but hard to control.
The aim of the internship would be to study another set of objects - graphs - that could be both
controllable and increasingly complex. The role of the intern will be to:
- Identify different classes of graphs with increasing complexity and create simulated
datasets (see the sketch after this list). The goal would be to find a controllable continuum of objects from sequences
of one-hot vectors to images.
- Benchmark different ways of encoding graphs as input vectors for neural nets (e.g. GNN
embeddings)
- Design a communication game with graphs (starting from an existing framework)
- Identify the theoretical solutions of the graph communication game (i.e. the graph
linearization problem)
- Compare the emergent protocols to the theoretical solutions and to real-life graph
linearization systems (language, DNA, information-theoretic encodings, etc.)
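As a rough illustration of the first two bullet points, here is a minimal sketch (purely hypothetical code, not part of any existing framework): it simulates three graph classes of increasing complexity and encodes each graph as a fixed-size vector via its Laplacian spectrum. The class names, graph size, and choice of encoding are arbitrary assumptions made for the example; in the actual benchmark a GNN embedding could replace the spectral encoding.

# Illustrative sketch only: simulated graph classes of increasing complexity,
# each graph encoded as a fixed-size vector (here, its Laplacian spectrum).
import networkx as nx
import numpy as np

N_NODES = 8  # keep all graphs the same size so encodings are comparable

def random_tree(n: int, rng: np.random.Generator) -> nx.Graph:
    """Random recursive tree: attach each new node to a random earlier node."""
    g = nx.empty_graph(1)
    for i in range(1, n):
        g.add_edge(i, int(rng.integers(i)))
    return g

def sample_graph(graph_class: str, rng: np.random.Generator) -> nx.Graph:
    """Sample one graph from a class, ordered roughly by complexity."""
    if graph_class == "chain":        # closest to a sequence of symbols
        return nx.path_graph(N_NODES)
    if graph_class == "tree":         # intermediate: hierarchical structure
        return random_tree(N_NODES, rng)
    if graph_class == "erdos_renyi":  # most complex: random connectivity
        return nx.gnp_random_graph(N_NODES, p=0.4, seed=int(rng.integers(10**6)))
    raise ValueError(graph_class)

def encode(graph: nx.Graph) -> np.ndarray:
    """Permutation-invariant encoding: sorted Laplacian eigenvalues."""
    lap = nx.laplacian_matrix(graph).toarray().astype(float)
    return np.sort(np.linalg.eigvalsh(lap))

rng = np.random.default_rng(0)
dataset = {
    cls: np.stack([encode(sample_graph(cls, rng)) for _ in range(100)])
    for cls in ("chain", "tree", "erdos_renyi")
}
for cls, vectors in dataset.items():
    print(cls, vectors.shape)  # (100, N_NODES) per class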
The intern should have a good knowledge of reinforcement learning and deep learning (for the simulations), basic knowledge of
information theory and optimization (for the theoretical part), and an interest in cognitive
science (for the lectures and the grounding of the project).
References
- " LazImpa": Lazy and Impatient neural agents learn to communicate efficiently. M Rita, R
Chaabouni, E Dupoux. In Proceedings of CoNLL 2020 - https://arxiv.org/pdf/2010.01878
Subject 2: Supervised learning at the word level (Robin)
The candidate must be doing an end-of-study internship (not a first-year Master's internship) in machine learning and
computer science.
Supervised machine learning requires labelled datasets, which are often the
result of substantial human time and effort. For that reason, unsupervised or self-supervised
methods are becoming increasingly used in several areas of machine learning: vision, text, and,
very recently, speech. For instance, contrastive predictive coding and wav2vec 2.0 have been
used to discover speech representations without supervision [1, 2]. These models can embed a
fixed duration of speech (usually 10 ms) into a vector but cannot represent variable-length
speech sequences. Such variable-length representations can be very useful in a variety of tasks,
ranging from information retrieval to the segmentation of speech into words (i.e. can you find word boundaries
in an audio recording without labels or prior knowledge of the language?).
The aim of this internship is to search for new and more robust methods to build variable-length
speech embeddings. You will implement new loss functions and regularization schemes in a
pre-existing deep learning model to improve its performance. The resulting embeddings will then be
fed into a speech segmentation model, with the hope of improving the current state of the art in
that domain.
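To make the idea concrete, here is a minimal, hypothetical sketch (not the lab's actual model; all module names and hyperparameters are placeholders): frame-level features, such as those produced by CPC or wav2vec 2.0, are pooled into a single segment embedding and trained with a simple contrastive objective.

# Hypothetical sketch: pool frame-level features into one variable-length
# segment embedding, trained with an InfoNCE-style contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentPooler(nn.Module):
    """Map a (n_frames, dim) sequence of frame features to one segment vector."""
    def __init__(self, dim=256, out_dim=128):
        super().__init__()
        self.rnn = nn.GRU(dim, out_dim, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, dim); the last hidden state summarizes the segment
        _, h = self.rnn(frames)
        return F.normalize(h[-1], dim=-1)          # (batch, out_dim), unit norm

def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor, temp=0.1) -> torch.Tensor:
    """Segments at the same index are positives, all others are negatives."""
    logits = z_a @ z_b.t() / temp                  # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, targets)

# Toy usage: random "frame features" stand in for a pretrained encoder's output
pooler = SegmentPooler()
view_a = torch.randn(16, 50, 256)                  # 16 segments, 50 frames each
view_b = view_a + 0.05 * torch.randn_like(view_a)  # augmented view of the same segments
loss = contrastive_loss(pooler(view_a), pooler(view_b))
loss.backward()
print(float(loss))

In a real setting the two views would come from data augmentation or from different occurrences of the same word-like unit, and the GRU input would be packed to handle truly variable-length segments.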
[1] Aaron van den Oord, Yazhe Li, Oriol Vinyals (2019). Representation Learning with Contrastive
Predictive Coding. https://arxiv.org/pdf/1807.03748.pdf
[2] https://arxiv.org/abs/2006.11477
[3] https://arxiv.org/abs/2007.13542
Subject 3: Phylogeny of communication
(with Emmanuel Chemla, ENS LSCP, and Robin Ryder, Paris Dauphine)
There exist large databases of animal calls and shrieks. We have consolidated our own
database of primate calls with their sounds and meanings, and there are many such databases
for birds. With modern tools, we can mine these databases to make inferences about how the
communication systems of these species have evolved: what led to the formation of the first
communicative sounds, what did these sounds sound like and what did they mean (based on
what their modern “descendants” mean), how were they passed or lost from one generation to
the next, etc. Standard historical linguistics can trace and reconstruct the history of words or
language sounds across periods that span a thousand years; the current project aims to do
so across millions of years.
Machine learning tools, whether related to speech (to encode the animal sounds into manipulable
objects) or to the reconstruction inferences themselves, are key to addressing these questions at a
large scale. The goal of this project will be to exhibit the most likely history of animal calls, with
their meanings and their sounds. The tasks will be (i) to improve on our encoding of the animal
sound signals into manipulable objects for inference and reconstruction (low-dimensional
vectors) and/or (ii) to perform inferences on the resulting objects. The project will lead to the
generation of likely animal sounds from millions of years ago.
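As an illustration of task (i), the sketch below shows one simple (and deliberately naive) way to turn recordings into low-dimensional vectors, using averaged MFCCs reduced by PCA. The file paths are hypothetical placeholders, and the real project would use richer encodings.

# Illustrative sketch only for task (i): call recordings -> low-dimensional vectors.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def call_features(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Average MFCCs over time -> one fixed-size vector per recording."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)

# Placeholder list of recordings, one file per call
recordings = ["calls/species_a_01.wav", "calls/species_a_02.wav", "calls/species_b_01.wav"]
features = np.stack([call_features(p) for p in recordings])

# Project to a low-dimensional space that phylogenetic inference can work with
pca = PCA(n_components=2)
low_dim = pca.fit_transform(features)   # (n_recordings, 2)
print(low_dim.shape, pca.explained_variance_ratio_)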
Subject 4: Prospects in NeuroAI, aligning Brains and Nets
Modern artificial neural networks are now very good at human tasks, to the point that some
researchers believe that artificial networks may become good models of how actual brains work.
As an illustration, Eickenberg et al. (2017) show that during an object recognition task, brain
activations collected through fMRI and artificial neural network activations are well aligned. Technically,
this means that brain fMRI data are well predicted from the corresponding artificial neural network
activations. Similar results have been obtained in other domains, such as speech and music
perception (Kell et al., 2018) and semantic and syntactic processing (Caucheteux et al., 2021;
Pasquiou et al., 2022), and this alignment has even been used to improve AI models by bringing
them closer to brain data (Toneva et al., 2019).
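To give a concrete (and heavily simplified) picture of what "well predicted" means here, the sketch below uses synthetic data to mimic the standard encoding-model recipe: cross-validated ridge regression from network activations to voxel responses, scored by per-voxel correlation. It is not the pipeline of any of the cited papers, and all array sizes and names are made up.

# Synthetic-data sketch of the brain/network alignment idea described above.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_stimuli, n_units, n_voxels = 200, 512, 50

# Placeholders: one row per stimulus (image, word, sound...); columns are
# network activations or voxel responses for that stimulus.
net_activations = rng.standard_normal((n_stimuli, n_units))
true_mapping = rng.standard_normal((n_units, n_voxels))
fmri = net_activations @ true_mapping + rng.standard_normal((n_stimuli, n_voxels))

X_tr, X_te, y_tr, y_te = train_test_split(net_activations, fmri, test_size=0.25, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)
pred = model.predict(X_te)

# "Brain score": Pearson correlation between predicted and observed responses, per voxel
scores = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean voxel correlation: {np.mean(scores):.2f}")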
This will be a one-semester internship, hosted at the LSCP lab at École Normale Supérieure.
Contingent on funding, the internship may be followed by a PhD.
Prerequisites. A good understanding of basic concepts in machine learning and deep learning
and a strong mastery of Python are essential for the project and will help you advance fast.
Technical familiarity with cloud computing is recommended, as is the ability to understand
and use existing code and to write new code to design, train, and analyze neural networks
with, e.g., PyTorch. A math/stats background and familiarity with brain imaging data are a bonus. An interest in
neuroscience and artificial intelligence goes without saying!
References:
Caucheteux, C., Gramfort, A., & King, J. R. (2021). Disentangling syntax and semantics in the brain with deep
networks. ICML.
Eickenberg, M., Gramfort, A., Varoquaux, G., & Thirion, B. (2017). Seeing it all: Convolutional network layers map the
function of the human visual system. NeuroImage.
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural
network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy.
Neuron.
Pasquiou, A., Lakretz, Y., Hale, J., Thirion, B., & Pallier, C. (2022). Neural Language Models are not Born Equal to Fit
Brain Data, but Training Helps. ICML 2022.
Toneva, M., & Wehbe, L. (2019). Interpreting and improving natural-language processing (in machines) with natural
language-processing (in the brain). Advances in Neural Information Processing Systems, 32.
Subject 5: NLP, cognitive science, or psychology profile
Internship description:
Most research on early language acquisition has documented input and learning cues in only a
handful of cultures, the assumption being that the mechanisms postulated to explain acquisition
in these cultures are universal. However, many languages remain under-studied.
The team now has data from many languages, including some that are uncommonly studied
(such as Tsimane', Yélî Dnye, and many others). We now want to analyze these corpora:
- systematize and clean the data
- for corpora of texts, generate orthographic and phonological dictionaries
- for corpora of speech and texts, generate structured alignments
- where possible, generate tests at different linguistic levels, as in the ZR Speech Challenge
- create an Android keyboard for low-resource languages
- analyze data from our citizen science project Zooniverse
- create (semi-)supervised classifiers (RNN, CNN, transformer) to describe children's
language development
- apply Whisper and other automatic speech processing tools to child data (see the sketch after this list)
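As a small illustration of the last point, here is a sketch that applies the openai-whisper package to a recording; the audio path is a placeholder, and real daylong child recordings would require chunking, diarization, and more careful handling than shown here.

# Sketch only: transcribing a (placeholder) child-speech recording with Whisper.
import whisper

model = whisper.load_model("small")               # smaller checkpoints trade accuracy for speed
result = model.transcribe("child_recording.wav")  # language is auto-detected by default
print("detected language:", result["language"])
for segment in result["segments"]:
    print(f'{segment["start"]:7.2f}-{segment["end"]:7.2f}  {segment["text"]}')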
Internship’s objectives:
- Develop speech and text tools for low-resource languages
- Join an interdisciplinary team
- Learn about open and cumulative science (a response to the replication crisis)
- Experience life in the Lab
- Exposure to research methods in experimental psychology and language science
Job location: LSCP - 29, rue d’Ulm - 75005 Paris; teleworking possible
Required profile:
We are looking for full-time interns, minimum 2 months, with the following profile:
❏ Organized, autonomous, rigorous;
❏ Knowledge of Python;
❏ Bachelor's or Master's degree in computational linguistics, data science,
cognitive science, psychology, etc.
Application:
Submit a motivation letter and a CV to laac.lscp@gmail.com.
Subject 6: Spoken Language Modeling with Soft Speech Units
(TuAnh)
Pre-training language models (BERT, GPT) on large-scale text data has achieved tremendous
success and has become a standard in Natural Language Processing (NLP). Lately, language
models have also been successfully applied to other modalities such as music (Jukebox) or
images (Parti). For speech, several works have introduced the task of spoken language
modeling: learning a language from raw audio alone, without any text labels
(gSLM [1, 2]). These works rely on transforming the audio into a sequence of discrete speech
units and training a language model on these speech units.
Recently, [3] investigated the importance of using these discrete speech units when training spoken
language models and found that discrete units are indeed important for spoken language
modeling, as they disentangle linguistic information from speaker information in the audio.
In a similar vein, [4] analyzed the use of soft speech units for the speech synthesis task and
found that soft speech units contain more information than “one-hot” speech units and therefore
yield better speech synthesis. The objective of this internship is thus to study the effect of soft
speech units on the spoken language modeling task.
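To make the discrete vs. soft distinction concrete, here is a toy sketch (placeholder tensors and sizes, not the actual gSLM pipeline): frame features are compared to a codebook of units; discrete units keep only the nearest unit's id, whereas soft units keep a full distribution over the codebook.

# Toy sketch: discrete ("one-hot") vs. soft speech units from the same frames.
import torch
import torch.nn.functional as F

n_frames, dim, n_units = 100, 256, 50
frames = torch.randn(n_frames, dim)      # placeholder frame features (e.g. HuBERT/CPC outputs)
codebook = torch.randn(n_units, dim)     # placeholder unit codebook (e.g. k-means centroids)

dists = torch.cdist(frames, codebook)    # (n_frames, n_units) distances to each unit

discrete_units = dists.argmin(dim=-1)           # discrete units: one integer id per frame
soft_units = F.softmax(-dists / 0.1, dim=-1)    # soft units: a distribution per frame

print(discrete_units.shape, soft_units.shape)   # torch.Size([100]) torch.Size([100, 50])
# A language model over discrete units sees token ids; a model over soft units
# consumes the distributions, e.g. as weighted sums of unit embeddings.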
The intern should have good deep learning skills, and an interest in NLP.
References
[1] Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei
Baevski, Ewan Dunbar, Emmanuel Dupoux. The Zero Resource Speech Benchmark 2021: Metrics and
baselines for unsupervised spoken language modeling.
[2] Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu Anh
Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux. On Generative
Spoken Language Modeling from Raw Audio.
[3] Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux. Are discrete units necessary for Spoken Language
Modeling?
[4] Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Mathew Baas, Hugo Seuté, Herman
Kamper. A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion.
Subject 7: Multilevel U-statistics with applications to the evaluation
of representation learning algorithms (Thomas)
Level: M1 or M2 internship
Duration: 3 to 6 months
Topic: