Varada Kolhatkar
Privacy Analytics Inc
varada.kolhatkar@gmail.com
Bio
Varada Kolhatkar’s broad research area for the past eight years has been natural language processing and computational linguistics. She recently completed her Ph.D. in computational linguistics at the University of Toronto, where her advisor was Dr. Graeme Hirst. Prior to that, she did her Master’s with Dr. Ted Pedersen at the University of Minnesota Duluth. During her Ph.D. she focused primarily on the problem of anaphora resolution. Her Master’s thesis explores all-words sense disambiguation, showing the effect of polysemy, context window size, and sense frequency on disambiguation. At the end of her Ph.D., Varada spent four months at the University of Hamburg, Germany, where she worked with Dr. Heike Zinsmeister on non-nominal anaphora resolution. Varada currently works as a research analyst at Privacy Analytics Inc, where she focuses on the problem of text de-identification, i.e., the process used to protect against inappropriate disclosure of personal information in unstructured data.
Resolving Shell Nouns
Shell nouns are abstract nouns, such as ‘fact’, ‘issue’, ‘idea’, and ‘problem’, which, among other functions, facilitate efficiency by avoiding repetition of long stretches of text. Shell nouns encapsulate propositional content, and the process of identifying this content is referred to as shell noun resolution.
My research presents the first computational work on resolving shell nouns. The research is guided by three primary questions: first, how an automated process can determine the interpretation of shell nouns; second, the extent to which knowledge derived from the linguistics literature can help in this process; and third, the extent to which speakers of English are able to interpret shell nouns.
I start with a pilot study to annotate and resolve occurrences of ‘this issue’ in Medline abstracts. The results illustrate the feasibility of annotating and resolving shell nouns, at least in this closed domain. Next, I move to developing general algorithms to resolve a variety of shell nouns in the newswire domain. The primary challenges here are that each shell noun has its own idiosyncrasies and that no annotated data was available. I therefore develop a number of computational methods for resolving shell nouns that do not rely on manually annotated data. For evaluation, I build annotated corpora for shell nouns and their content using crowdsourcing. The annotation results show that the annotators agree to a large extent on the shell content. The evaluation of the resolution methods shows that knowledge derived from the linguistics literature helps in the process of shell noun resolution, at least for shell nouns with strict semantic and syntactic expectations.