Kaitlyn Zhou

Stanford University

Position: Ph.D. Candidate
Rising Stars year of participation: 2024
Bio

Kaitlyn Zhou is a final-year PhD candidate in computer science at Stanford University, advised by Dan Jurafsky. Her research investigates the unintended consequences of the appropriation of natural language by language models. Her work delves into the fairness implications of evaluating natural language generation, the linguistic miscalibration displayed by language models, and the misplaced overconfidence of publicly deployed chatbots. Kaitlyn has previously spent summers at Microsoft Research and the Allen Institute for Artificial Intelligence. She has secured funding from IBM and Microsoft and is supported by the Stanford Graduate Fellowship. Her research has been featured in prominent publications such as The New York Times and The Wall Street Journal. In 2018, Kaitlyn was appointed by Washington State Governor Jay Inslee to the University of Washington Board of Regents.

Areas of Research
  • Natural Language and Speech Processing

Augmenting Human Intelligence via Natural Language Interfaces

The overarching goal of my research is to augment human intelligence via natural language interfaces. I aim to bring us closer to a world where users across settings and technical literacies can access artificial intelligence to safely accomplish complex tasks. Natural language as an interface makes this possible, and my work aligns natural language interfaces with human expectations to ensure safe and effective real-world interactions. Situated at the intersection of natural language processing and human-computer interaction, my work appraises the interaction safety risks of state-of-the-art language models, pinpoints the origins of language misalignment, and presents interaction-based evaluations of language models as natural language interfaces.

The greatest safety risk of current state-of-the-art models is not that they hallucinate or produce false information, but that they do so without appropriately signaling the risks and limitations to downstream users. My research reveals that language models, including those fine-tuned with reinforcement learning from human feedback (RLHF), often provide inaccurate information with undue confidence, creating the potential for devastating consequences from human overreliance. My work traces the origins of this overconfidence, finding that it typically arises during the RLHF alignment stage due to human biases against uncertainty, which points to a need for systematic evaluation and better annotation practices.

To address these challenges, I developed an interaction-based evaluation framework that assesses not just the quality of generated language but also how language triggers user reliance behaviors in decision-making. Using this framework, I highlight the limitations of current calibration evaluations and show how interactional contexts such as model presentation, subject matter, and interaction history can significantly affect human reliance behaviors. In the long term, I hope to apply these findings, together with human-centered research methods, to the design of LLMs and to reimagine safe and effective human-LM interactions for a broad and diverse audience of users.