MIT Rising Stars

Divya Shanmugam

Cornell Tech

Position: Postdoctoral Research Associate

Rising Stars year of participation: 2025

Contact:

divyas@mit.edu

Bio

Divya Shanmugam is a postdoctoral researcher at Cornell Tech. Her research builds tools for model development and deployment in the presence of imperfect data, motivated by challenges in healthcare. Her work has been published at leading ML venues including NeurIPS, CVPR, and CHI; has been recognized by awards from machine learning for health venues; and has been featured in both policy guidelines (e.g., Norway’s National AI Guidelines) and general-interest media (e.g., The New York Times). She earned her Ph.D. and B.S. in Computer Science from MIT.

Areas of Research

AI for Healthcare and Life Sciences

Learning reliable models from unreliable data

Machine learning promises to transform decision-making in healthcare, yet real-world datasets rarely meet the assumptions necessary to achieve that promise. Motivated by challenges in healthcare, my work develops tools for model development and deployment in the presence of imperfect data, including noisy supervision, biased historical decisions, and incomplete ground truth. My research agenda thus centers on resolving the gap between the reality of health data and the reliability required of decision-making in high-stakes settings.

To bridge this gap, I study how data constraints shape model reliability and equity, and develop methods to audit and mitigate these effects. I do so in three ways. First, I develop methods to audit and improve data quality by explicitly modeling the human and systemic factors embedded in healthcare datasets, including underreporting, differential access to care, and financial incentives. These behavioral models identify subtle ways that disparities propagate through machine learning systems. Second, I develop methods to improve robustness post hoc by augmenting the data used in evaluation and deployment. I challenge the field's overreliance on curated benchmarks by developing alternative techniques that more faithfully capture real-world performance. Finally, my work targets applications constrained by imperfect data, including those in women's health. By advancing methods for learning from imperfect data in these contexts, my work aims to serve the patients and conditions historically overlooked in both medicine and machine learning.
