Projects

MapAIE/CHAI
Since 2023
Mapping and exploring ethics of AI
Tiphaine Viard, Mélanie Gornet, Simon Delarue, Maria Boritchev, Matthieu Labeau, Aina Garì Soler
We collect and analyze charters, documents and manifestos about "ethical AI", and use them as a defining corpus for AI ethics topics of interest. More information here: https://mapaie.telecom-paris.fr/.
Relevant publications:
ANAGRAM
Since 2024
AMR annotation of the DinG corpus
Jeongwoo Kang, Maria Boritchev, Maximin Coavoux
We are building a semantically annotated corpus of spoken French. We annotate the DinG corpus, a set of transcriptions of spontaneous dialogues between players of Catan, in Abstract Meaning Representation (AMR), one of the most popular meaning representation frameworks. Since AMR does not cover some features of dialogue dynamics, we extend the framework to better represent spoken language as well as sentence structures specific to French. In addition, we provide an annotation guide detailing these extensions. Finally, we publish our corpus under a Creative Commons license (CC BY-SA). Our work contributes to the development of semantic resources for French dialogue.
Relevant publications:
  • To be presented at TALN 2025: ding-01 :ARG0 -- un corpus AMR pour le français parlé spontané, Jeongwoo Kang, Maria Boritchev and Maximin Coavoux.
DinG
Since 2019
Dialogues in Games corpus
Maria Boritchev, Maxime Amblard
We collected and transcribed 10 recordings of people playing the board game Catan in French. We share the transcribed and anonymised data, and use it to study questions and answers in spontaneous spoken French multilogues. More information here: https://team.inria.fr/semagramme/projects/corpus-ding/
Relevant publications:
AMR w/ Orange
Since 2023
Error analysis, questions, and AMR
Maria Boritchev, Johannes Heinecke, Frédéric Herlédan
We conduct an error analysis of AMR parsers' outputs in order to characterise their errors and design pre- and post-processing heuristics to patch them. We work on data in English, French, German, Spanish, Italian, and Polish.
We also analyse how adding a few sentence-type-specific annotations can steer the model towards better parsing of questions in English.
Relevant publications: