Yating Wu

The University of Texas at Austin

Position: PhD Candidate
Rising Stars year of participation: 2025
Bio

Yating Wu is a Ph.D. student in the Department of Electrical and Computer Engineering at the University of Texas at Austin, co-advised by Prof. Jessy Li and Prof. Alex Dimakis. Her research lies in natural language processing and computational linguistics, with a focus on using questions to help language models better evaluate, organize, and generate text. She studies discourse relationships such as Questions Under Discussion and their applications to improving text accessibility through elaboration. Yating's recent research experience includes continual pre-training, fine-tuning, and evaluation of large language models in scientific domains, as well as improving the memory and abilities of LLM agents. She received an Outstanding Paper Award at EMNLP 2024. Prior to UT Austin, she obtained a B.E. in Computer Science and a B.A. in Japanese from Dalian University of Technology.

Areas of Research
  • Artificial Intelligence
Question-Based Representations for Evaluation, Data Selection, and Supervision in Language Models

Large language models (LLMs) have changed how we generate and interact with text, yet evaluating, adapting, and controlling them remains challenging. My research develops question-based representations as a way to guide evaluation, data selection, and supervision in language models.

One part of my work focuses on evaluation. Standard metrics often capture local accuracy but miss higher-level qualities such as coherence and factual consistency. I design methods that use questions to represent how ideas are structured, making it possible to measure discourse-level organization and text quality in a more interpretable way; a rough sketch of this idea appears after this statement.

I also work on improving data quality. Training on massive raw corpora means models often learn from noisy or irrelevant signals. Question-based representations help highlight the most informative parts of a text and organize document-level signals, improving training, generalization, and robustness.

Finally, I study how to make supervision more flexible. Instead of relying on rigid discourse trees, I build parsers that represent dependencies through questions, offering a more scalable and adaptable way to guide model behavior for tasks such as generation and data construction.

Looking forward, I aim to extend these ideas to long-context models for compression, memory, and reasoning. My long-term goal is to make LLMs more interpretable, efficient, and adaptable, and to apply these methods in domains such as education, accessibility, and scientific communication.
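To make the evaluation idea concrete, the sketch below shows one way a question-based discourse score could be computed. It is illustrative only and assumes generic off-the-shelf Hugging Face pipelines (the model names and prompt format are my own placeholders, not the author's actual system): for each sentence, it generates the implicit question that sentence answers given the preceding context, then uses a QA model's confidence as a crude proxy for how well the sentence fits the discourse.

# A minimal sketch of QUD-style, question-based discourse evaluation.
# Model choices and prompting are assumptions for illustration; any
# seq2seq question generator and extractive QA checkpoint would do.
from transformers import pipeline

question_gen = pipeline("text2text-generation", model="google/flan-t5-base")
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def qud_coherence(sentences):
    """Score each sentence by whether the question it answers is
    recoverable from the preceding context (a rough QUD-style proxy)."""
    scores = []
    for i in range(1, len(sentences)):
        context = " ".join(sentences[:i])
        answer = sentences[i]
        # Generate the implicit question the current sentence answers.
        prompt = (f"Context: {context}\nAnswer: {answer}\n"
                  "What question does the answer address?")
        question = question_gen(prompt, max_new_tokens=40)[0]["generated_text"]
        # A sentence that addresses a question raised by the context should
        # let a QA model find support there; use its confidence as a crude
        # discourse-coherence signal for this sentence.
        result = qa(question=question, context=context + " " + answer)
        scores.append((question, result["score"]))
    return scores

# Example usage on a short, made-up passage:
passage = ["The model was trained on raw web text.",
           "Much of this data is noisy or irrelevant.",
           "Question-based filtering keeps the informative parts."]
for q, s in qud_coherence(passage):
    print(f"{s:.2f}  {q}")

Aggregating these per-sentence scores (for example, averaging them over a document) would give a single interpretable coherence estimate, which is the general shape of the evaluation described above.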