Veronika Strnadova-Neeley
UC Santa Barbara
veronika@cs.ucsb.edu
Bio
I am a Ph.D. candidate with a Computational Science and Engineering emphasis at UC Santa Barbara, working with advisor John R. Gilbert in the Combinatorial Scientific Computing Lab. For the past few years I have been collaborating with researchers at Lawrence Berkeley National Lab, UC Berkeley, and the Joint Genome Institute to design scalable algorithms for genetic mapping. Broadly, my research interests include scalable clustering algorithms, bioinformatics, graph algorithms, linear algebra, and scientific computing. I completed my BS in applied mathematics at the University of New Mexico.
Efficient Clustering and Data Reduction Methods for Large-Scale Structured Data
The necessity of efficient algorithms for large-scale data analysis has become clear in recent years, as unprecedented volumes of information have become available in a variety of domains, from bioinformatics to social networks to signal processing. In many cases it is no longer feasible to run even quadratic-time algorithms on such data, and much recent computer science research has focused on developing efficient methods for analyzing vast amounts of information. My contribution to this line of research is a set of new algorithms for large-scale clustering and data reduction that exploit inherent low-dimensional structure to cope with large numbers of missing and erroneous entries. In particular, over the past few years, together with collaborators from Lawrence Berkeley National Lab, UC Santa Barbara, UC Berkeley, and the Joint Genome Institute, I have developed a fast algorithm for the linkage-group finding phase of genetic mapping, as well as a novel data reduction method for analyzing genetic mapping data. By relying on assumptions about the underlying ordered structure of the data, these algorithms have helped produce accurate maps for large, complicated genomes such as wheat. Their efficiency and accuracy suggest that, to further advance state-of-the-art clustering and data reduction methods, we should look more closely at the structure of the data in a given application of interest: assumptions about this structure may lead to much faster algorithms with little loss in solution quality, even when many entries are missing or erroneous. In ongoing and future research, I will explore algorithmic techniques that exploit inherent data structure to speed up dimensionality reduction and identify important, meaningful features of the data.
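To make the problem setting concrete, the Python sketch below shows a minimal, naive version of the linkage-group finding step: markers are compared only over individuals observed in both, pairs whose similarity exceeds a threshold are linked, and candidate linkage groups are read off as connected components. The marker encoding, the agreement-based similarity measure, the threshold, and the quadratic pairwise loop are illustrative assumptions for this sketch; they are not the scalable algorithm or the genetic similarity scores described above.

# Illustrative sketch only: group markers into candidate linkage groups by
# thresholding pairwise similarities and taking connected components.
# Encoding, similarity, and threshold are assumptions, not the published method.
import numpy as np


def pairwise_similarity(a, b):
    """Fraction of agreeing calls over positions observed in both markers.

    Markers are vectors over individuals with values in {0, 1} and np.nan
    marking missing calls (an assumed encoding).
    """
    observed = ~np.isnan(a) & ~np.isnan(b)
    if observed.sum() == 0:
        return 0.0
    return float(np.mean(a[observed] == b[observed]))


def linkage_groups(markers, threshold=0.9):
    """Connected components of the graph with edges where similarity >= threshold."""
    n = len(markers)
    parent = list(range(n))  # union-find forest over markers

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):                     # naive O(n^2) comparison, for illustration
        for j in range(i + 1, n):
            if pairwise_similarity(markers[i], markers[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    nan = np.nan
    markers = [
        np.array([0, 0, 1, 1, nan, 0]),
        np.array([0, nan, 1, 1, 0, 0]),  # agrees with marker 0 wherever both observed
        np.array([1, 1, 0, nan, 1, 1]),  # disagrees with both, so its own group
    ]
    print(linkage_groups(markers))       # prints [[0, 1], [2]]

The point of the sketch is only that restricting comparisons to co-observed entries lets the grouping tolerate missing data; the work summarized above replaces the brute-force pairwise loop with a far more efficient procedure that exploits the data's ordered structure.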