Liyue Fan
University of Southern California
liyuefan@usc.edu
Bio
Liyue Fan is a postdoctoral research associate at the Integrated Media Systems Center at USC. She holds a PhD in Computer Science and Informatics from Emory University and a BSc in Mathematics from Zhejiang University in China. Her PhD dissertation research centers on the development of data publication algorithms that provide rigorous guarantees for individual privacy without compromising output utility. Since joining USC, she has also worked on spatial crowd-sourcing, transportation, and healthcare informatics.
Preserving Individual Privacy in Big Data Analytics
We live in the age of big data. With a growing number of people, devices, and sensors connected to digital networks, individual data can now be collected at scale and analyzed by data mining applications for social good as well as for commercial interests. However, the data generated by individual users exhibit unique behavioral patterns and contain sensitive information, and therefore must be transformed prior to release for analysis. The AOL search log release in 2006 is an example of a privacy catastrophe, in which the searches of an innocent citizen were quickly re-identified by a newspaper journalist. In this talk, I present a novel framework for releasing continual aggregates of private data for an important class of real-time data mining tasks, such as disease outbreak detection and web mining. The key innovation is that the proposed framework captures the underlying dynamics of the continual aggregate statistics with time series state-space models and simultaneously adopts filtering techniques to correct the observed, noisy data. It can be shown that the new framework provides a rigorous, provable privacy guarantee to individual data contributors without compromising the output analysis results. I will also talk about my current research, including the extension of the framework to spatial crowd-sourcing and privacy-preserving machine learning in a distributed research network.
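
To make the idea of correcting noisy aggregates with a state-space model concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the framework presented in the talk: the abstract does not name the privacy mechanism or the filter, so this sketch assumes Laplace perturbation of each count (as commonly used for differential privacy), a random-walk state model, and a scalar Kalman filter. The names `laplace_noise`, `release_filtered`, `epsilon_per_step`, and `process_var` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_filtered(true_counts, epsilon_per_step, process_var, seed=0):
    """Perturb each count with Laplace noise, then correct it with a scalar Kalman filter.

    Assumed model (not taken from the abstract):
      state:       x_k = x_{k-1} + w_k,  w_k ~ N(0, process_var)
      observation: z_k = x_k + v_k,      v_k ~ Laplace(0, 1 / epsilon_per_step)
    The filter treats v_k as Gaussian with the same variance, which is an approximation.
    """
    rng = random.Random(seed)
    scale = 1.0 / epsilon_per_step   # assumes each individual changes a count by at most 1
    meas_var = 2.0 * scale ** 2      # variance of Laplace(0, scale)

    noisy, filtered = [], []
    estimate, est_var = None, None
    for count in true_counts:
        z = count + laplace_noise(scale, rng)  # the filter only ever sees z, never count
        noisy.append(z)
        if estimate is None:
            # Initialize the filter with the first noisy observation.
            estimate, est_var = z, meas_var
        else:
            # Prediction step: random-walk model, so the prior mean is the last estimate.
            prior, prior_var = estimate, est_var + process_var
            # Correction step: blend the prior with the new noisy observation.
            gain = prior_var / (prior_var + meas_var)
            estimate = prior + gain * (z - prior)
            est_var = (1.0 - gain) * prior_var
        filtered.append(estimate)
    return noisy, filtered


if __name__ == "__main__":
    counts = [100.0] * 50
    noisy, filtered = release_filtered(counts, epsilon_per_step=0.5, process_var=0.01)
    print(noisy[-1], filtered[-1])
```

Because the filter operates only on the already perturbed values, it is post-processing and does not consume additional privacy budget under the differential privacy assumption made here. Note that this sketch spends `epsilon_per_step` at every time step, so the total budget grows with the length of the series.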