Zhou Yu

Carnegie Mellon University

Position: PhD student
Rising Stars year of participation: 2015
Bio

Zhou is a fifth-year Ph.D. student in the Language Technologies Institute, School of Computer Science, Carnegie Mellon University, where she works with Prof. Alan Black and Prof. Alex Rudnicky. Zhou creates end-to-end interactive conversational systems that are aware of their physical situation and their human partners via real-time multimodal sensing and machine learning techniques. Zhou received a B.S. in computer science and a B.A. in English language with a linguistics focus from Zhejiang University in 2011. Zhou has also interned at Microsoft Research with Eric Horvitz and Dan Bohus, at Educational Testing Service with David Suendermann-Oeft, and at the Institute for Creative Technologies at USC with Louis-Philippe Morency. Zhou is a recipient of the Quality of Life Fellowship.

Engagement in Multimodal Interactive Conversational Systems

Autonomous conversational systems, such as Apple Siri, Google Now, and Microsoft Cortana, act as personal assistants that set alarms, mark events on calendars, and provide restaurant or transportation information to users. Despite being able to complete these simple tasks through conversation, they still act according to pre-defined task structures and do not sense or react to their human interlocutors' nonverbal behaviors or internal states, such as their level of engagement. The same limitation is found in other interactive systems.

Drawing knowledge from human-human communication dynamics, I use multimodal sensors and computational methods to understand and model user behaviors during interaction with systems that have conversational abilities (e.g., spoken dialog systems, virtual avatars, humanoid robots). By modeling verbal and nonverbal behaviors, such as smiles, we infer high-level psychological states of the user, such as attention and engagement. I focus on maintaining engaging conversations by modeling users' engagement states in real time and making conversational systems adapt to their users via techniques such as adaptive conversational strategies and incremental speech production. I apply my multimodal engagement model in both a non-task-oriented social dialog framework and a task-oriented dialog framework that I designed.

I developed an end-to-end, non-task-oriented multimodal virtual chatbot, TickTock, which serves as a framework for controlled multimodal conversation analysis. TickTock can carry on free-form everyday chatting conversations with users in both English and Chinese. Together with the ETS Speech and Dialog team, I developed a task-oriented system, HALEF, a distributed web-based system with both visual and audio sensing capabilities for human behavior understanding. Users access the system via a web browser, which reduces the cost and effort of data collection, and HALEF can easily be adapted to different tasks. We implemented an application in which the system acts as an interviewer to help users prepare for job interviews.
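
As a purely illustrative sketch of the engagement-adaptive loop described above, the Python snippet below maps a few multimodal cues to an engagement score and switches the dialog strategy when that score drops below a threshold. All feature names, weights, thresholds, and strategy labels here are hypothetical assumptions for illustration and do not reflect the actual TickTock or HALEF implementations.

# Hypothetical sketch of an engagement-adaptive dialog loop.
# Feature names, weights, and strategies are illustrative only;
# a real system would learn them from annotated interaction data.

from dataclasses import dataclass

@dataclass
class MultimodalFeatures:
    smile_intensity: float   # 0..1, from visual sensing
    gaze_on_agent: float     # fraction of the turn the user looks at the agent
    speech_volume: float     # normalized acoustic energy
    response_delay: float    # seconds before the user replies

def estimate_engagement(f: MultimodalFeatures) -> float:
    """Map verbal and nonverbal cues to a 0..1 engagement score."""
    score = (0.3 * f.smile_intensity
             + 0.3 * f.gaze_on_agent
             + 0.2 * f.speech_volume
             + 0.2 * max(0.0, 1.0 - f.response_delay / 5.0))
    return min(1.0, max(0.0, score))

def choose_strategy(engagement: float, threshold: float = 0.4) -> str:
    """Switch to an active re-engagement strategy when engagement drops."""
    if engagement < threshold:
        return "re_engage"        # e.g., ask an open question or change topic
    return "continue_topic"       # keep the current conversational thread

if __name__ == "__main__":
    frame = MultimodalFeatures(smile_intensity=0.1, gaze_on_agent=0.3,
                               speech_volume=0.4, response_delay=4.0)
    e = estimate_engagement(frame)
    print(f"engagement={e:.2f}, strategy={choose_strategy(e)}")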

For demos, please visit my webpage: http://www.cs.cmu.edu/~zhouyu/