Laixi Shi
California Institute of Technology
shilaixi@gmail.com
Bio
I am currently a postdoctoral fellow in the Dept. of Computing + Mathematical Sciences (CMS) at California Institute of Technology, hosted by Prof. Adam Wierman and Prof. Eric Mazumdar. My research interests focus on exploring new frontiers of data-driven solutions in the real world, establishing their theoretical foundations and developing faithful practical algorithms. Previously, I obtained my Ph.D. in Electrical and Computer Engineering at Carnegie Mellon University in 2023, supervised by Prof. Yuejie Chi. I received my bachelor's degree in Electronic Engineering from Tsinghua University in 2018. I have been fortunate to receive four Rising Star awards in machine learning, (computational) data science, and signal processing, and two Ph.D. Presidential Fellowships. My Ph.D. thesis won the CMU ECE A.G. Milnes Award (2024).
Areas of Research
- Machine Learning
Data-efficient and robust sequential decision making
Reinforcement learning (RL), which strives to learn desirable sequential decisions through trial-and-error interactions with an unknown environment, has recently achieved remarkable success in a variety of domains, including games and large language model alignment. Yet while standard RL has been heavily investigated, a policy learned in an ideal, nominal environment can fail catastrophically when the deployed environment is subject to small changes in task objectives or adversarial perturbations, especially in high-stakes applications such as robotics and clinical trials.

This line of research concerns the central issues of sample efficiency and model robustness in RL, with the goal of reducing the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), which aims to learn a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP. Despite recent efforts, the sample complexity of RMDPs remained largely unsettled regardless of the uncertainty set in use, and it was unclear whether distributional robustness bears any statistical consequences when benchmarked against standard RL. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs; the statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set. In addition, we break down the sample barrier of robust RL in the offline setting by providing the first provably near-optimal algorithm for offline robust RL that can learn under simultaneous model uncertainty and limited historical data.
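To make the worst-case objective concrete, a standard way to write the RMDP problem described above is the max-min formulation below; the notation (nominal kernel P^0, uncertainty radius σ, discount factor γ) is generic textbook notation assumed for illustration, not a verbatim formulation from any specific paper.

% Generic RMDP objective (illustrative notation; not quoted from the papers above)
\[
\max_{\pi} \; \min_{P \,\in\, \mathcal{U}^{\sigma}(P^{0})} \; \mathbb{E}_{\pi,\, P}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\]

Here \(P^{0}\) is the nominal transition kernel (e.g., a simulator), \(\mathcal{U}^{\sigma}(P^{0})\) is the uncertainty set of transition kernels within radius \(\sigma\) of \(P^{0}\) (measured, for example, by total variation or chi-squared divergence), and the inner minimization evaluates a policy under the worst environment in that set; the choice of divergence and radius is what "size and shape of the uncertainty set" refers to above.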