Belinda Zou Li
MIT
bzl@mit.edu
Bio
Belinda is a PhD candidate at MIT CSAIL, affiliated with the Language & Intelligence (LINGO) lab. Her work focuses on improving the human-interpretability, reliability, and usability of language models by examining and using the models of the world and of users that language models contain. Belinda is funded by an NDSEG Fellowship and a Clare Boothe Luce Graduate Fellowship.
Areas of Research
- Natural Language and Speech Processing
Eliciting Human Preferences with Language Models
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging, especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. I will begin by discussing a set of experiments on how well LMs themselves can ask questions that elicit useful information in dialogue with humans. Next, I will introduce an approach that uses Bayesian optimal experimental design to address areas of preference elicitation that LMs struggle with: quantifying uncertainty, modeling human mental states, and computing formal notions of information gain. I will conclude by discussing some high-level future directions towards building LMs that are compatible with humans, including ongoing work aimed at formalizing information seeking in the presence of uncertainty and characterizing LM representations of beliefs.