Information Access Seminar

Evaluation of a Data Science Environment

Friday, April 4, 2014
3:10 pm - 5:00 pm
Sebastian Benthall

Existing institutionally recognized practices for transmitting and archiving scholarly communication are not well suited to empirical science that depends on rapid innovation in computational methods (“data science”). This is a problem because the increased availability of observational data from sensors and the Internet makes data science a powerful way of gaining new scientific understanding. If academic institutions don't accommodate this kind of work, data scientists will work elsewhere, and academic science may have trouble keeping pace with industry in the areas where the two are comparable, which is unfortunate for students. In this talk, I propose dissertation work that addresses this problem. Taking existing open source communities to be a promising model for how academic data science could work, I will study the Scientific Python (SciPy) communities through their ~15 years of openly available historical data and through their present role, via IPython, in the Berkeley Institute for Data Science (BIDS). I will combine data science and participant observation to make recommendations about how open source practices should be maintained, changed, or integrated alongside other emerging practices in data management, publication, and program evaluation.

Sebastian Benthall is a third-year PhD student in the I School and an operations consultant at the D-Lab.

Last updated: March 26, 2015