MIT Rising Stars

Daphne Ippolito

University of Pennsylvania

Position: PhD Candidate

Rising Stars year of participation: 2021

Contact:

daphnei@seas.upenn.edu

Bio

Daphne Ippolito is a final-year PhD student at the University of Pennsylvania, co-advised by Chris Callison-Burch at UPenn and Douglas Eck at Google. Her research focuses on understanding the properties of large neural language models, the text they generate, and their possible applications to creative domains. Daphne recently started part-time as a research scientist at Google Brain. She was a co-organizer of the Workshop on Enormous Language Models at ICLR 2021 and the Workshop on Creativity at NeurIPS 2020, and she will serve as a chair for the NAACL 2022 Student Research Workshop. In her spare time, she likes to garden, read fantasy and science fiction, and learn new musical instruments.

Deduplicating Training Data Makes Language Models Better

We find that existing language modeling datasets contain many near-duplicate examples and long repetitive substrings. As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data. We develop two tools that allow us to deduplicate training datasets, for example removing from C4 a single 61-word English sentence that is repeated over 60,000 times. Deduplication allows us to train models that emit memorized text ten times less frequently and require fewer training steps to achieve the same or better accuracy. We can also reduce train-test overlap, which affects over 4% of the validation set of standard datasets, thus allowing for more accurate evaluation.

