Divya Shanmugam
Cornell Tech
divyas@mit.edu
Bio
Divya Shanmugam is a postdoctoral researcher at Cornell Tech. Her research builds tools for model development and deployment in the presence of imperfect data, motivated by challenges in healthcare. Her work has been published at leading ML venues including NeurIPS, CVPR, and CHI; has been recognized by awards from machine learning for health venues; and has been featured in both policy guidelines (e.g., Norway’s National AI Guidelines) and general-interest media (e.g., The New York Times). She earned her Ph.D. and B.S. in Computer Science from MIT.
Areas of Research
- AI for Healthcare and Life Sciences
Learning reliable models from unreliable data
Machine learning promises to transform decision-making in healthcare, yet real-world datasets rarely meet the assumptions necessary to achieve that promise. My work develops tools for model development and deployment in the presence of imperfect data (including noisy supervision, biased historical decisions, and incomplete ground truth), motivated by challenges in healthcare. My research agenda thus centers on closing the gap between the reality of health data and the reliability required of decision-making in high-stakes settings.
To bridge this gap, I study how data constraints shape model reliability and equity, and I develop methods to audit and mitigate these effects. I do so in three ways. First, I develop methods to audit and improve data quality by explicitly modeling the human and systemic factors embedded in healthcare datasets, including underreporting, differential access to care, and financial incentives. These behavioral models identify subtle ways that disparities propagate through machine learning systems. Second, I develop methods to improve robustness post hoc by augmenting the data used in evaluation and deployment. I challenge the field's overreliance on curated benchmarks by developing alternative techniques that more faithfully capture real-world performance. Finally, my work targets applications constrained by imperfect data, including those in women's health. By advancing methods for learning from imperfect data in these contexts, my work aims to serve the patients and conditions historically overlooked in both medicine and machine learning.
