Sara Alspaugh

UC Berkeley

Position: PhD Candidate
Rising Stars year of participation: 2015
Bio

Sara Alspaugh is a computer scientist and PhD candidate at UC Berkeley. In her research, she mines user interaction records logged from data analysis tools to better characterize data exploration behavior, identify challenges and opportunities for automation, and improve system and interface design. She also conducts qualitative research through interview studies with expert analysts and usability evaluations of data exploration tools, and she has prototyped new interfaces to help users get an overview of their data. More broadly, her research interests include data science, data mining, visualization, and user interaction with data analysis tools. She is a member of the AMPLab and is advised by Randy Katz and Marti Hearst. She received her MS in Computer Science from UC Berkeley in 2012 and her BA in Computer Science from the University of Virginia in 2009. She is the recipient of an NSF Graduate Fellowship, a Tableau Fellowship, and a Department Chair scholarship.

Characterizing Data Exploration Behavior to Identify Opportunities for Automation

Exploratory analysis is undertaken to familiarize oneself with a dataset. Despite being a necessary part of any analysis, it remains a nebulous art defined by an attitude and a collection of techniques, rather than a systematic methodology. It typically involves manually making hundreds to thousands of individual function calls or small interactions with a GUI in order to obtain different views of the data. It is not always clear which views will be effective for a given dataset or question, how to be systematic about which views to examine, or how to map a high-level question onto a series of low-level actions that answer it. The result is unnecessary repetition, disrupted mental flow, ad hoc and hard-to-repeat workflows, and inconsistent exploratory coverage. Identifying useful, repeatable exploration workflows, opportunities for automating tedious tasks, and intelligent interfaces better suited to expressing exploratory questions requires a better understanding of data exploration behavior. We seek this understanding through three means:

We analyze interaction records logged from data analysis tools to identify behavioral patterns and to assess the utility of log data for building intelligent assistance and recommendation algorithms that learn from user behavior. Preliminary results reveal that while logs can say which functions are used in which contexts, more comprehensive instrumentation and collection are likely needed to train intelligent exploration assistants.

We interview experts about their data exploration habits and frustrations to identify good exploratory workflows and ascertain important features not provided by existing tools. Preliminary results reveal opportunities to make data exploration more thorough and efficient.

We design and evaluate a prototype for obtaining quick data overviews to assess new interface elements designed to better match data exploration needs. Preliminary results suggest that small, simple automations in existing tools would decrease user effort, increase exploratory coverage, and help users identify erroneous assumptions more readily.