Behind your favorite online applications are megabytes, gigabytes, or even terabytes of data about your location, your favorite topics, and who you are as a person. Things you search for on the internet often end up advertised to you on another platform, such as Instagram. TikTok’s algorithm recommends videos based on the videos you have already watched.
Many academics and government officials have raised concerns about companies’ ability to collect, store, and use personal information, especially with regard to its use in espionage and the creation of deepfakes. Microsoft recently spoke out about fears involving generative AI and discussed steps not only to build AI products responsibly but also to make them safe for consumers.
At the UC Berkeley School of Information, two educators have taken the initiative to incorporate data ethics considerations into capstone, the final course of the Master of Information and Data Science (MIDS) program. Assistant adjunct professor Morgan Ames, who teaches Data Science 231: Behind the Data: Humans and Values, and MIDS continuing lecturer Joyce Shen, who is the capstone course lead, have created a partnership that allows students to regularly discuss and address issues of data privacy and customer information protection while they build their capstone projects. The initiative began in Fall 2022 and is now incorporated into all ten capstone courses.
“I think this is an innovative approach to providing peer feedback; students in both courses benefit,” said interim dean Marti Hearst, who helped spearhead the collaboration. “Students in the ethics course benefit from having a real project to grapple with and comment on, and students in the capstone course get high-quality detailed feedback from their peers about how well they are addressing ethical considerations.”
Over fourteen weeks, Capstone students are expected to ideate and build a minimum viable product (MVP) using data sources available online. In week five, students present their project ideas, plans, data, and research, then submit their presentations to the Data Science 231 instructional team. Over four weeks, the students in 231 review the material in depth and write an assessment report that offers suggestions and highlights ethical and data privacy issues to address. That report then goes back to the Capstone students, who consider how to incorporate the recommendations into their projects.
“In the industry, there are regulators, compliance enforcement professionals, model governance professionals, and risk management as well as audit teams. Data scientists have to work with these other colleagues to be able to explain their projects clearly, the machine learning models, where the data come from, and how they’re using the data,” explained Shen.
“In Capstone, we are essentially replicating the process. The 231 students are colleagues in those functional areas, and the Capstone students are data scientists building these data and AI-enabled products. [The 231 students] work through the frameworks that they learn in the class with respect to ethics and data privacy to apply to the capstone projects, which represent real world problems using real world data.”
Samantha Williams, a MIDS ’23 graduate, was among the students who received feedback from a Data Science 231 student during Capstone. Her team project, Sibyl, aimed to leverage generative AI to create an intelligence compliance product that helps cybersecurity professionals navigate and implement complex cybersecurity regulations. “You’re so in the thick of making the model work for Capstone that just having the opinion of that one outside person focused on ethics and data privacy forced us to think about the long term effects and impacts of our MVP,” Williams stated. “My experience with my classmates [has] always been like ‘we can do that,’ but there’s little pushback on ‘should we?’ The integration of 231 makes you think critically about what information you’re collecting, storing, and sharing, which isn’t something you necessarily learn in the real world.”
This partnership between the two courses ultimately aims to make conversations around data privacy and protection a standard part of the data science curriculum and to embed ethics and responsible AI in the training of data scientists. “These are considerations and knowledge that the industries are expecting from data scientists. These are things that are being actively discussed in every industry as [small or large enterprises] look to incorporate more AI and data science. With this, our graduate students are much better prepared and have a much broader aperture and deeper understanding,” said Shen.
“Data science projects never happen in a vacuum: they inevitably reflect and affect the world and the people around them,” added Ames. “It has also been a great opportunity to orient class discussions around the topics most exciting to students semester-to-semester, as those trends are reflected in capstone project choices. I hope to continue these collaborations with other MIDS course leads as well.”
As for next steps, Ames is looking into ways to integrate 231 content into more of the MIDS curriculum. She has been talking with Assistant Professor Michael Rivera about possibly sharing curriculum with his course, Data Science 201: Research Design and Applications for Data.
The integration of data ethics frameworks into capstone courses marks a significant step toward fostering ethical practices in data science. By encouraging critical thinking about the ethical implications of data science, this collaboration better equips students to handle ever-changing challenges in data privacy and customer information protection. As technology continues to evolve, this partnership aims to ensure that the next generation of data scientists ask the right questions and consider the long-term impacts of their actions.