Assistant Professor David Bamman has been awarded the prestigious National Science Foundation (NSF) CAREER award for his research designing computational methods to improve natural language processing for fiction. The NSF funds research and education in science and engineering through grants, contracts, and cooperative agreements.
The NSF Faculty Early Career Development (CAREER) Program supports early-career faculty who have the potential to serve as academic role models in research and education, as well as lead advances in the mission of their organization. The award assists early-career faculty in building a strong foundation for a future of leadership by integrating research and education.
Natural Language Processing and Machine Learning for Literature
Bamman anticipates that the grant will allow him to expand the scope of his current research, which focuses on improving natural language processing (NLP) and machine learning for literature. While NLP has been optimized for domains like news and Wikipedia, literature and fiction remain neglected. Fiction is a valuable, untapped data source, Bamman says, because it contains common-sense reasoning and knowledge about actions that take place in the real world, actions that aren’t described in news or other textual data sets.
“A lot of the methods that work well for news don’t work well when you’re talking about a book that’s 200,000 words long and has a lot of figurative language, a lot of metaphors, which break a lot of the current systems,” Bamman explained.
By focusing exclusively on works of fiction, Bamman’s research seeks to understand common-sense reasoning in literature and evaluate how it can be used to improve real-world information systems. Many real-world information systems presume common-sense knowledge about the world; for example, in order to eat breakfast you have to be awake. “We never see those kinds of statements in news because they’re just assumed that people know them, but what we see in literature are depictions of people’s entire lives,” Bamman explains. “So, we have a lot of information there we could mine about what common-sense patterns look like that could improve the performance of systems that operate in the real world — including Siri on your phone.”
Bamman also hopes that his work will give him insights into our culture. His past published work looked at how descriptions of characters in novels vary as a function of those characters’ gender and the author’s gender; he noted disparities in the attention given to female characters by female and male authors. As he continues to develop methods to quantify and measure authors’ language, Bamman hopes to learn more about representation in works of fiction, and how those representations are tied to social and cultural standards.
Teaching Natural Language Processing
Bamman also plans to use the grant to support his teaching, making data science, machine-learning, and natural language processing methods accessible to students who don’t come from technical backgrounds. When he was a college undergraduate, Bamman studied the humanities, and he wants to bring researchers and students from the humanities and social sciences into this technical field.
“When I’m doing this research now, it’s very much focused on this area of literature that I’ve always been interested in for a long time,” Bamman stated. “So it’s great to be [at] an information school where I can help improve the performance of these core algorithms but also apply them to empirical questions for this domain that I’ve always really enjoyed researching.”