mar
English language and linguistics research seminar: Desmond Elliott, University of Copenhagen: Multimodal Machine Learning across Languages and Cultures
Abstract: Evaluations are crucial for measuring progress in machine learning, but datasets typically reflect Western-centric perspectives. I will argue in this talk that we need datasets that better reflect a variety of global perspectives, especially given the rise and widespread use of multimodal models in frontier generative AI systems. I will present three projects on collecting in-language multimodal data. The first part of the talk will describe the collection of culturally relevant multimodal data across five geographically and linguistically diverse languages. The second part of the talk concerns the collection of fine-grained private multimodal data for Chinese cuisine. In the final part of the talk, I will discuss late-breaking results of a project to collect In-language Exams for Massively Multilingual Vision Evaluation as part of a large-scale open science network.
Bio: Desmond is an Associate Professor and a Villum Young Investigator at the University of Copenhagen. His group currently focuses on tokenization-free language modelling, and multilingual and multimodal processing. His work received the Best Long Paper Award at EMNLP 2021 and an Area Chair Favourite paper at COLING 2018. His research is funded by the Velux Foundations, the Innovation Foundation Denmark, the Novo Nordisk Foundation, Meta, Google, and the Danish Ministry for Digitization.