UC Berkeley Campus between 1898 and 1905 (photo courtesy of the Library of Congress)
Special Event

105th Birthday Celebration

Friday, October 20, 2023
2:30 pm - 4:00 pm

In 1918, UC Berkeley began a full-time program in library science. Join us for this year’s celebration of the founding and history of the School of Information and its predecessors: the School of Information Management and Systems, the School of Library and Information Studies, and the School of Librarianship.


Program

Welcoming Remarks
Marti Hearst
Interim Dean, UC Berkeley School of Information

Achievements of Patrick Wilson
Howard D. White, Ph.D. ’74
Professor Emeritus, Drexel University

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
David Bamman
Associate Professor, UC Berkeley School of Information

An Interdisciplinary Framework for Evaluating Deep Facial Recognition Technologies for Forensic Applications
Justin Norman
Ph.D. Student, UC Berkeley School of Information

Reception to follow.


Presentations

Achievements of Patrick Wilson

Howard D. White, Ph.D.

This talk will feature some remarks on the contemporary bearing of Wilson’s most highly cited paper, “Situational Relevance,” published exactly half a century ago. It will also touch on the tradition of “foundational studies” in Berkeley’s School of Information that starts with him, and on his paradigmatic influence on the field of knowledge organization.

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4

David Bamman, Associate Professor
UC Berkeley School of Information

As ChatGPT and other large language models transform research and a variety of industries, understanding the data on which those models have been trained provides an important lens on their behavior and on the risks to validity they pose for downstream tasks. In this talk, I'll describe recent research by my group carrying out a data archaeology to infer which books are known to ChatGPT and GPT-4. We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; we show that models perform much better on memorized books than on non-memorized books in downstream tasks. We argue that this supports a case for open models whose training data is known.
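As a rough illustration of what probing a model for memorized books can look like (a minimal sketch under assumed inputs, not the method presented in this talk), one can mask a distinctive word in a passage and check whether the model restores it. Here `query_model` is a hypothetical stand-in for whatever LLM API is used, and the passages and scoring are purely illustrative.

```python
# Minimal sketch of a cloze-style memorization probe.
# `query_model` is a hypothetical stand-in for an LLM API call; the passages,
# masking strategy, and scoring here are illustrative, not the talk's method.

import re
from typing import Callable

def make_cloze(passage: str, target: str, mask: str = "[MASK]") -> str:
    """Replace one distinctive word (e.g., a character name) with a mask token."""
    return re.sub(rf"\b{re.escape(target)}\b", mask, passage, count=1)

def probe_book(passages: list[tuple[str, str]],
               query_model: Callable[[str], str]) -> float:
    """Return the fraction of masked passages the model restores exactly.

    Each item pairs a passage from the book with the distinctive word that was
    masked out of it. A high score suggests the text was seen during training.
    """
    hits = 0
    for passage, target in passages:
        prompt = (
            "Fill in the single missing word marked [MASK] in this passage. "
            "Reply with the word only.\n\n" + make_cloze(passage, target)
        )
        guess = query_model(prompt).strip().strip(".\"'")
        hits += int(guess.lower() == target.lower())
    return hits / len(passages)
```

A score well above chance on passages from one book, but near zero on passages from books published after a model's training cutoff, would be the kind of signal consistent with the memorization findings described above.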

An Interdisciplinary Framework for Evaluating Deep Facial Recognition Technologies for Forensic Applications

Justin Norman

Much has been written about flaws in facial recognition, particularly in terms of gender and racial bias. With facial recognition systems seeing widespread use in law enforcement, it is also critical that we understand their accuracy, particularly in high-stakes forensic settings.

While precision and recall can be a reasonable way to assess the overall accuracy of a recognition system, an often overlooked aspect of these measurements is the composition of the comparison group. For example, high precision may be relatively easy to achieve if person X has highly distinct characteristics (age, race, gender, etc.) relative to the other people in the dataset against which they are being compared. On the other hand, the same underlying recognition system may struggle if person X shares many characteristics with the comparison group. In addition, most facial recognition systems make strong assumptions about the image quality, pose, obfuscations, and size of both the source images and the images in the comparison dataset, whereas real-world images often vary dramatically along all of these dimensions. In the classic eyewitness setting, a witness is asked to identify a suspect in a six-person lineup consisting of the suspect and five decoys with the same general characteristics and distinguishing features (facial hair, glasses, etc.) as the suspect.

We propose that a similar approach be employed to assess the accuracy of a facial recognition system deployed in a forensic setting. This approach ensures that the underlying recognition task is comparable regardless of the differences or similarities between the probe and comparison faces, allowing a sounder determination of the model's accuracy (and thus its feasibility or suitability) for real-world, high-stakes applications.
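To make the lineup analogy concrete, here is a minimal sketch (with an assumed gallery format and a hypothetical `similarity` scoring function, not the framework proposed in the talk) that builds an attribute-matched, six-face lineup for each probe and scores whether the recognition model ranks the true match above the decoys.

```python
# Illustrative lineup-style evaluation of a face matcher. The gallery format,
# attribute fields, and `similarity` function are assumptions for this sketch,
# not the framework described in the talk.

import random
from typing import Callable, Sequence

Face = dict  # e.g. {"id": "123", "embedding": [...], "age_bin": "30s",
             #       "gender": "F", "skin_tone": "IV"}

def matched_decoys(probe: Face, gallery: Sequence[Face], k: int = 5) -> list[Face]:
    """Sample k decoys sharing the probe's coarse attributes, as in a police lineup.

    Assumes the gallery is large enough to supply k attribute-matched decoys.
    """
    keys = ("age_bin", "gender", "skin_tone")
    pool = [f for f in gallery
            if f["id"] != probe["id"] and all(f[a] == probe[a] for a in keys)]
    return random.sample(pool, k)

def lineup_accuracy(pairs: Sequence[tuple[Face, Face]],
                    gallery: Sequence[Face],
                    similarity: Callable[[Face, Face], float]) -> float:
    """Fraction of probes whose true mate outranks all attribute-matched decoys.

    `pairs` holds (probe, true_mate) faces of the same person; `similarity`
    is the recognition model's comparison score (higher = more similar).
    """
    correct = 0
    for probe, mate in pairs:
        lineup = matched_decoys(probe, gallery) + [mate]
        best = max(lineup, key=lambda f: similarity(probe, f))
        correct += int(best["id"] == mate["id"])
    return correct / len(pairs)
```

Because every probe is compared against decoys that share its coarse attributes, the difficulty of the task is held roughly constant across demographic groups, which is the kind of control the abstract argues a forensic evaluation should provide.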


Speakers

Howard D. White, Ph.D.

After taking his Ph.D. in librarianship at the University of California, Berkeley, in 1974, Howard D. White joined Drexel University's College of Computing & Informatics, where he is now professor emeritus. He co-authored For Information Specialists: Interpretations of Reference and Bibliographic Work with Marcia Bates and Patrick Wilson (Ablex, 1992). A later book is Brief Tests of Collection Strength (Greenwood, 1995). He originated author co-citation analysis in 1981, pennant diagrams in 2007, and libcitations in 2009. His topics have also included bibliometrics, visualization of literatures, evaluation of reference services, expert systems for reference work, innovative online searching, social science data archives, library publicity, American attitudes toward library censorship, and literature retrieval for meta-analysis and interdisciplinary studies. In 1993 he won the Research Award of the Association for Information Science and Technology (ASIS&T) for distinguished contributions. In 1998 he and Katherine McCain won the best JASIS paper award for Visualizing a Discipline: An Author Co-Citation Analysis of Information Science, 1972-1995. He was a Drexel Distinguished Professor during 1998-2002, using the associated grant to develop AuthorMap, a system for visualizing co-cited author links in the humanities. In 2004 he won ASIS&T’s highest honor for career achievement, the Award of Merit. In 2005 the International Society for Scientometrics and Informetrics honored him with the biennial Derek de Solla Price Memorial Medal for contributions to the quantitative study of science.

David Bamman

David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman's work is supported by the National Endowment for the Humanities, National Science Foundation, an Amazon Research Award, and an NSF CAREER award.

Justin Norman

Justin is a Ph.D. student at UC Berkeley, where he is advised by Dr. Hany Farid. Justin’s research is centered on generative computer vision, computational photography, AI systems validation and forensics. He is a recipient of the Marcus Foster Fellowship. Alongside his research, Justin serves as the Technical Director for AI & ML at the Defense Innovation Unit (DIU).

Previously, Justin was VP, Data Science, Analytics and Data Product at Yelp. Before Yelp, Justin was the director of research and data science at Cloudera Fast Forward Labs, head of applied machine learning at Fitbit, the global head of Cisco’s Enterprise Data Science Office, and a big data systems engineer with Booz Allen Hamilton. Prior to his work in industry, Justin served as a Marine Corps officer with a focus on systems analytics and device intelligence. He is a graduate of the US Naval Academy with a degree in computer science and of the University of Southern California with a master’s degree in business administration and business analytics.

Last updated: July 31, 2024