Aishwarya Agrawal

Georgia Institute of Technology

Position: PhD Candidate
Rising Stars year of participation: 2018
Bio

Aishwarya Agrawal is a fifth-year PhD candidate in the School of Interactive Computing at Georgia Tech, advised by Dhruv Batra. She received a bachelor’s degree in electrical engineering with a minor in computer science and engineering from the Indian Institute of Technology (IIT) Gandhinagar in 2014. Her research interests lie at the intersection of computer vision, machine learning, and natural language processing, with a focus on developing artificial intelligence (AI) systems that can “see” (i.e., understand the contents of an image: who, what, where, doing what?) and “talk” (i.e., communicate that understanding to humans in free-form natural language). She is a recipient of the NVIDIA Graduate Fellowship for 2018-2019.

Towards Intelligent Vision and Language Systems

My research goal is to develop artificial intelligence (AI) systems that can “see” (i.e., understand the contents of an image: who, what, where, doing what?) and “talk” (i.e., communicate that understanding to humans in free-form natural language). Applications of such vision and language systems include:

1) Aiding visually impaired users in understanding their surroundings (Human: “What is on the shelf above the microwave?” AI: “Canned containers.”)
2) Aiding analysts in making decisions based on large quantities of surveillance data (Human: “What kind of car did the man in the red shirt leave in?” AI: “Blue Toyota Prius.”)
3) Teaching children through interactive demos (Kid: “What animal is that?” AI: “That is a Dall sheep. You can find those in Alaska.”)
4) Interacting with personal AI assistants such as Alexa or Siri (Human: “Is my laptop in my bedroom upstairs?” AI: “Yes.” Human: “Is the charger plugged in?”)
5) Making visual social media content more accessible (AI: “Your friend Bob just uploaded a picture from his Hawaii trip.” Human: “Great, is he at the beach?” AI: “No, on a mountain.”)

As a first step towards building such intelligent vision and language systems, my PhD research so far has focused on datasets, models, and evaluation protocols for answering free-form and open-ended natural language questions about images. I will talk about the Visual Question Answering (VQA) task and dataset, baseline neural models (Antol et al. ICCV15), the VQA challenges we have organized so far and what we learned from them, our findings from analyzing the behavior of VQA models (Agrawal et al. EMNLP16), and how we propose to build VQA models that are more visually grounded and can better handle changing answer priors (Agrawal et al. CVPR18).
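
To make the kind of baseline model the abstract refers to concrete, below is a minimal sketch in the spirit of the Antol et al. ICCV15 two-branch VQA baseline: precomputed CNN image features are projected and fused with an LSTM encoding of the question by pointwise multiplication, and the fused vector is classified over a fixed set of frequent answers. The sketch uses PyTorch; all names (VQABaseline, img_feat_dim, and so on) and the dimensions are illustrative assumptions, not taken from the original paper.

import torch
import torch.nn as nn

class VQABaseline(nn.Module):
    # Illustrative two-branch VQA baseline (not the authors' exact code):
    # encode the image and the question separately, fuse, and classify the answer.
    def __init__(self, vocab_size, num_answers,
                 embed_dim=300, hidden_dim=1024, img_feat_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)           # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)        # project CNN features
        self.classifier = nn.Linear(hidden_dim, num_answers)       # scores over top-K answers

    def forward(self, img_feats, question_tokens):
        # question_tokens: (batch, seq_len) word indices; img_feats: (batch, img_feat_dim)
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                 # final LSTM state encodes the question
        v = torch.tanh(self.img_proj(img_feats))  # image embedding
        return self.classifier(q * v)             # pointwise-multiplication fusion

# Example usage with random inputs (shapes only; real features would come from a CNN):
model = VQABaseline(vocab_size=10000, num_answers=1000)
img = torch.randn(2, 4096)                        # e.g., fc7-style image features
q = torch.randint(0, 10000, (2, 8))               # two tokenized questions of length 8
scores = model(img, q)                            # (2, 1000) answer scores

Treating VQA as classification over the most frequent answers, rather than free-form generation, is the simplification this family of baselines makes; it works because a small set of answers covers most questions in the dataset.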
My research goal is to develop artificial intelligence (AI) systems that can “see” (i.e., understand the contents of an image: who, what, where, doing what?) and “talk” (i.e., communicate the understanding to humans in free-form natural language). Applications of such vision and language systems include: 1) Aiding visually impaired users in understanding their surroundings (Human: “What is on the shelf above the microwave?” AI: “Canned containers.”), 2) Aiding analysts in making decisions based on large quantities of surveillance data (Human: “What kind of car did the man in red shirt leave in?” AI: “Blue Toyota Prius.”), 3) Teaching children through interactive demos. (Kid: “What animal is that?” AI: “That is a Dall sheep. You can find those in Alaska.”), 4) Interacting with personal AI assistants (such as Alexa or Siri) (Human: “Is my laptop in my bedroom upstairs?” AI: “Yes. Human: “Is the charger plugged in?”), and 5) Making visual social media content more accessible (AI: “Your friend Bob just uploaded a picture from his Hawaii trip.” Human: “Great, is he at the beach?” AI: “No, on a mountain”). As a first step towards making intelligent vision and language systems in my PhD research so far, I have worked on building datasets, models, and evaluation protocols for answering free-form and open-ended natural language questions about images. I will talk about the Visual Question Answering (VQA) task dataset, baseline neural models (Antol et al. ICCV15), various VQA challenges we have organized till now, and findings from the challenges, our findings from analyzing the behavior of VQA models (Agrawal et al. EMNLP16), and how we propose to build VQA models that are more visually grounded and can better deal with VQA under changing priors (Agrawal et al. CVPR18).