Sanae Lotfi

New York University

Position: Ph.D. Candidate
Rising Stars year of participation: 2024
Bio

Sanae Lotfi is a PhD candidate at NYU, advised by Andrew Gordon Wilson. She works on the science of deep learning, focusing on understanding the generalization properties of deep neural networks through related notions such as model compression and loss-surface analysis. Inspired by these findings about generalization, she builds improved and more robust deep learning models. Her PhD research has been recognized with numerous accolades, including an ICML Outstanding Paper Award, and is generously supported by the Microsoft and Google DeepMind Fellowships.

Areas of Research
  • Machine Learning
Understanding Generalization in Large Language Models through the Lens of Compression

Modern language models can contain billions of parameters, raising the question of whether they genuinely generalize beyond their training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound for LLMs that bounds their performance at the sequence level, and we extend it to handle subsampling, accelerating bound computation by orders of magnitude on massive datasets. To achieve the extreme level of compression required for non-vacuous bounds, we devise SubLoRA, a simple low-dimensional nonlinear parameterization that yields non-vacuous generalization bounds for models with nearly a billion parameters. We then use our bounds to study LLM generalization and find that larger models have better generalization bounds and are more compressible than smaller models. When we extend our bounds to quantify generalization at the token level rather than the sequence level, we find that token-level bounds not only tolerate but actually benefit from far less restrictive compression schemes. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs as large as LLaMA2-70B. Our bounds teach us many lessons about the generalization properties of LLMs, highlighting the remarkable ability of transformer-based models to capture longer-range correlations and offering insights into memorization versus reasoning in language models.
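The intuition linking compression to generalization can be sketched with a classical finite-hypothesis (Occam) bound: a model describable in K bits lives in a hypothesis class of at most 2^K elements, so a union bound converts compressed size into a generalization guarantee. This is a deliberate simplification, not the paper's sequence-level bound or its subsampling extension, and all function names and numbers below are illustrative:

```python
import math

def compression_bound(empirical_loss, compressed_bits, n_samples,
                      max_loss, delta=0.05):
    """Illustrative finite-hypothesis (Occam) generalization bound.

    A model compressible to `compressed_bits` bits belongs to a class of
    at most 2**compressed_bits hypotheses, so with probability >= 1 - delta
    a union bound gives, for a loss bounded by `max_loss`:

        risk <= empirical_loss
                + max_loss * sqrt((K * ln 2 + ln(1/delta)) / (2 * n))
    """
    complexity = compressed_bits * math.log(2) + math.log(1.0 / delta)
    return empirical_loss + max_loss * math.sqrt(complexity / (2 * n_samples))

# Shrinking the compressed description tightens the bound: with the
# (made-up) numbers below, the lightly compressed model's bound exceeds
# the worst-case loss (vacuous), while the heavily compressed one's
# does not, despite its slightly higher empirical loss.
loose = compression_bound(3.0, compressed_bits=8e9, n_samples=1e9, max_loss=16.0)
tight = compression_bound(3.2, compressed_bits=8e7, n_samples=1e9, max_loss=16.0)
```

The trade-off the sketch exposes is the one the abstract describes: aggressive compression raises empirical loss slightly but shrinks the complexity term enough to make the overall bound non-vacuous.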