Simran Khanuja

Carnegie Mellon University

Position: PhD Candidate
Rising Stars year of participation: 2025
Bio

Simran Khanuja has been a PhD student at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University since August 2022. Her research focuses on expanding the capabilities of multimodal systems to serve a wide range of users across languages and cultures, with applications in localization, information access, conversational AI, education, and assistive technologies. Previously, she was a Pre-Doctoral Researcher at Google Research and worked at Microsoft Research. She has contributed to advancing under-represented languages in NLP, and her work has been published at top NLP conferences such as ACL and EMNLP, including best paper awards at EMNLP 2024, IEEE BigData 2024, and SLT 2022. She is also a recipient of the Waibel Presidential Fellowship for 2024-25 and has been recognized as a Rising Star in AI by the University of Michigan.

Areas of Research
  • Natural Language and Speech Processing

Towards Culturally Inclusive Multimodal Systems

Current machine translation systems translate words, yet fail to truly bridge cultures. In today’s digital ecosystem, content is multimodal, demanding not just linguistic adaptation but also visual and cultural alignment. My research extends translation beyond language to address all modalities in content, starting with the visual modality.

I first developed test sets and robust evaluation tools to measure how well models localize images for diverse cultural contexts. Recognizing both the scarcity of data and the limitations of current models in cultural localization, I then designed platforms that empower translators to adapt multimodal content. These platforms harness state-of-the-art LLMs and diffusion models and are accessible through simple text prompts. In parallel, I have worked on advancing end-to-end multimodal models, enhancing their cultural understanding and reasoning capabilities.

My overarching goal is to build systems that can both generate and interpret culturally aligned multimodal content while working in tandem with human experts, whose contextual knowledge remains essential to making these systems work in the real world across varied domains and tasks.