Shanghang Zhang
Carnegie Mellon University
shzhang.pku@gmail.com
Bio
Shanghang Zhang received a PhD degree in electrical engineering and computer science (EECS) from Carnegie Mellon University. Previously, she received a master’s degree from Peking University and a bachelor’s degree from Southeast University, both in EECS. Her research interests include deep learning and computer vision. She has been working on large-scale traffic video analysis, vehicle detection and counting, salient object segmentation, domain adaptation, and image synthesis with GANs. She is the recipient of Adobe Academic Collaboration Funding, Qualcomm Innovation Fellowship (QInF), Competition Finalist Award, and Chiang Chen Overseas Graduate Fellowship. She serves as a reviewer for PLOS One, CVPR, ICCV, ECML-PKDD, and CIS-RAM, among others. She has also interned at Adobe Research.
Deep Understanding of Urban Mobility from Cityscape Webcams
Deep understanding of urban mobility is of great significance for many real-world applications such as urban traffic management and autonomous driving. We develop deep learning methodologies to extract vehicle counts from streaming real-time video captured by multiple low-resolution web cameras and to construct maps of traffic density in a city environment; in particular, we focus on cameras installed in the Manhattan borough of NYC. The large-scale videos from these web cameras have low spatial and temporal resolution, high occlusion, large perspective, and variable environmental conditions, which cause most existing methods to lose their efficacy. To overcome these challenges, the thesis develops several techniques: a block-level regression model with a rank constraint to map dense image features into vehicle densities; a deep multi-task learning framework based on fully convolutional neural networks to jointly learn vehicle density and vehicle count; deep spatio-temporal networks for vehicle counting that incorporate temporal information of the traffic flow; and multi-source domain adaptation mechanisms with adversarial learning to adapt the deep counting model to multiple cameras. To train and validate the proposed system, we have collected a large-scale webcam traffic dataset, CityCam, which contains 60 million frames from 212 webcams installed at key intersections of NYC. Of these, 60,000 frames have been annotated with rich information, leading to about 900,000 annotated objects. To the best of our knowledge, it is the first and largest webcam traffic dataset with such a large number of elaborate annotations. The proposed methods are integrated into the CityScapeEye system, which has been extensively evaluated and compared to existing techniques on different counting tasks and datasets. The experimental results demonstrate the effectiveness and robustness of CityScapeEye.