Nov
Joint seminar in General Linguistics and English Linguistics: Tom Södahl, University of Gothenburg
Don't Mention the Norm - On Reporting Bias and Social Bias in Humans and Language Models
Human language is underspecified. In our utterances, we leave out information that we consider unimportant, inferable from context, or simply too obvious to warrant mentioning. For example, while most people agree that bananas are typically yellow, the bigram "green banana" tends to be a lot more frequent than "yellow banana" in text. This leads to systematic discrepancies between description and reality known as reporting bias. In the context of language modeling, reporting bias in the training data has been shown to affect model predictions regarding common-sense knowledge (such as the expected color of bananas). However, reporting bias as a phenomenon is not limited to descriptions of common objects; it also applies to descriptions of people. In this talk, I will discuss how reporting bias relates to and interacts with social biases in descriptions of individuals, and how studying this relationship can illuminate implicit biases and social norms both within speaker communities and in language models. I will present findings on reporting bias with regard to marginalized group attributes in corpora as well as in large pretrained language models. I will talk about the challenges of studying the relationship between language and the world, and suggest some avenues for future research.
About the event:
Location: H402, virtually: https://lu-se.zoom.us/j/63263453894
Contact: Sandra.Debreslioska@ling.lu.se