### Neha Gupta

Stanford University

nehagupta@cs.stanford.edu

###### Bio

Neha Gupta is a Ph.D. student in Computer Science at Stanford University, co-advised by Gregory Valiant and Moses Charikar. She is broadly interested in problems at the intersection of theoretical computer science and machine learning, and is a recipient of the JP Morgan AI Fellowship. Before her Ph.D., she obtained her Master's degree from Stanford University and her Bachelor's from the Indian Institute of Technology Delhi.

###### Theoretical approaches to machine learning challenges

I am interested in the theoretical study of machine learning algorithms. Despite the empirical successes of machine learning, we have little understanding of how these models work and how they fail. My research focuses on discovering the underlying structure in data and algorithms that facilitates learning, and on using those insights to develop new methods that are provably sample- and computation-efficient.

Some of my work has focused on understanding why deep neural networks generalize to unseen data despite having more parameters than training data points. We characterized the simple behavior of the functions learned by neural networks in certain settings, which in turn can be shown to generalize well to unseen data. In another recent work, we designed algorithms for provably learning decision trees, inspired by the top-down heuristics commonly used in practice. The key idea is to use a small subset of features, rather than the single feature used by practitioners, to determine the splitting criterion at each node of the tree.

Additionally, learning methods, especially deep neural networks, require large amounts of labelled training data to learn meaningful predictive models. In many domains, such labelled data is the result of extensive experiments, which limits how much data one can obtain. Given limited labelled data, one often wants to know what useful conclusions can reliably be drawn. In recent work, we show that, for certain classes of functions, making a good prediction at a particular test point, given a batch of unlabelled data, requires far fewer label queries (independent of the complexity of the function class) than learning the optimal function.
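The multi-feature splitting idea can be illustrated with a toy sketch. This is not the paper's actual algorithm: it assumes binary features and scores each small feature subset by a parity-based split, where no single feature alone would give a useful split.

```python
# Toy sketch: score small feature subsets instead of single features
# when choosing a split. Illustrative assumptions: binary features,
# parity (XOR) splits, Gini impurity as the splitting criterion.
import itertools
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def best_subset_split(X, y, k=2):
    """Score every size-k feature subset by the weighted Gini impurity
    of splitting on the parity of those features; return the best
    subset together with its impurity."""
    n, d = X.shape
    best_subset, best_score = None, float("inf")
    for subset in itertools.combinations(range(d), k):
        mask = X[:, list(subset)].sum(axis=1) % 2 == 1  # parity of tuple
        frac = mask.mean()
        score = frac * gini(y[mask]) + (1.0 - frac) * gini(y[~mask])
        if score < best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Labels are the XOR of features 0 and 2: no single-feature split is
# informative, but the pair (0, 2) separates the data perfectly.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))
y = X[:, 0] ^ X[:, 2]
subset, score = best_subset_split(X, y, k=2)
```

On this toy data the search recovers the pair `(0, 2)` with zero impurity, which is exactly the kind of structure single-feature top-down heuristics miss.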
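The contrast between predicting at one test point and learning the whole function can be made concrete with a toy example. The nearest-neighbor rule and the `oracle` interface below are illustrative assumptions, not the method from the paper; the point is only that a local prediction can consume far fewer label queries than fitting a global model.

```python
# Toy illustration: predict at a single test point using only a
# handful of label queries from a large unlabelled batch, instead of
# labelling enough data to learn a good global model. Hypothetical
# nearest-neighbor predictor, not the paper's algorithm.
import numpy as np

def predict_at_point(x_test, X_unlabelled, oracle, n_queries=5):
    """Query labels only for the n_queries unlabelled points nearest
    to x_test and predict by majority vote; each call to `oracle`
    counts as one label query."""
    dists = np.linalg.norm(X_unlabelled - x_test, axis=1)
    nearest = np.argsort(dists)[:n_queries]
    labels = [oracle(i) for i in nearest]
    return int(np.mean(labels) > 0.5), len(labels)

# Toy data: the true label is 1 when the coordinates sum to a positive
# number. Predicting at a point deep inside the positive region needs
# only 5 label queries out of a batch of 500 unlabelled points.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y_true = (X.sum(axis=1) > 0).astype(int)
x_test = np.array([2.0, 2.0, 2.0])
pred, used = predict_at_point(x_test, X, lambda i: y_true[i])
```

Here five queries suffice for a confident prediction at `x_test`, while learning the separating function over all of `X` would require many more labels.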