Data science: More than a buzzword
When Foundations of Data Science, or Data 8, was first launched in 2015, there were fewer than 100 students in the class. Today, more than 1,600 students are enrolled in what has become a staple of many UC Berkeley undergraduates’ must-take class lists.
Although introduced as a stand-alone course, Data 8 was always part of a much larger plan developed by staff and faculty across the entire campus. According to John DeNero, a co-creator of the course and an associate teaching professor who currently teaches Data 8, the hope was that once students took Data 8, they would be left asking, “What do we take next?”
That is exactly what happened. Students, DeNero said, were enthusiastic about the course.
“It was clear to a lot of us that this content was fundamental, something that any student should know,” DeNero said. “It made sense that a lot of people should take it.”
According to DeNero, data science degrees had previously been offered only at the graduate level. UC Berkeley developed a bachelor’s program when campus members realized just how applicable the content is to any undergraduate’s education.
Located close to Silicon Valley, UC Berkeley has a tradition of producing leaders in the field of technology. However, despite the hype that surrounds Data 8 and the data science major, faculty and students alike say there is more to data science than the buzz.
It is about drawing conclusions from data, Data 8 graduate student instructor, or GSI, Kevin Miao said. It is about “making sense of the world,” said campus senior and Data 8 undergraduate GSI, or uGSI, Margaret Misyutina. To campus senior and Data 8 uGSI Varun Jadia, data science is about learning how to think and ask the right questions.
“Data science, to me, means using what we know around us to make predictions about the future,” said campus junior Will Furtado, who is the pedagogy lead for Data 8.
Accessibility and representation: No outliers in data science
For many, Data 8 is the catalyst that sparks a discovery of the world through data.
“Before I enrolled in Data 8, I had no idea what data science was,” Misyutina said. “Data 8 was my first introduction to the field and I fell in love with it.”
Misyutina has now been on Data 8 course staff for eight semesters.
Her experience mirrors the journeys of hundreds of students who take the class as their first exposure to computational and statistical methods.
Jadia took the course during his very first semester at Berkeley; it was his first coding class ever.
“That is what kicked it off for me,” Jadia said.
He has been involved in Data 8 as a mentor ever since.
Aarushi Karandikar, a campus data science senior and Data 8 uGSI, came to UC Berkeley intending to pursue a pre-med track. However, after she took Data 8 in her first semester, the “trajectory of (her) college career” changed. This is now Karandikar’s sixth semester on course staff.
As a class that has no prerequisites and makes no assumptions about prior knowledge, Data 8 serves as an accessible introduction to technical concepts that can be applied to any domain of a student’s interest. According to the Data 8 website, the course is designed specifically for students with no prior statistics or computer science knowledge.
While the course is designed for students to bring their own interests and domain knowledge, learning how to apply new computational lenses can be the scariest part, added Miao, who is also an instructor in the Data Scholars program.
Sometimes, it’s hard for students from a non-traditional background who are new to the field to “imagine themselves being data scientists,” Miao noted.
This hesitation is one of the reasons the Data Scholars program was created. The program aims to provide mentorship and community to students from historically underrepresented groups.
Campus senior Carlos Ortiz, who helped develop Data Scholars, said he pursued a teaching position in data science to inspire minority students in the field.
“I want you to feel welcome and I want you to feel supported,” Ortiz said. “After I took Data 8, I was like ‘I’ve never had any Latino TAs; I really want to be one.’ ”
A unique aspect of Data 8 — and the data science program as a whole — is the pipeline from student to mentor, DeNero said.
Furtado came into Berkeley not knowing how to code. Now, he says, his favorite kinds of students to mentor in Data 8 are those from non-technical backgrounds.
Being a mentor for underrepresented students new to the field has been instrumental in shaping Ortiz’s Berkeley experience — and a motivator to help students in these groups find success in data science.
“Mentoring these students is the best part of my week,” Ortiz said.
Ortiz said seeing Data Scholars grow from 30 students to more than 100 in the span of one year makes him feel like he is “on top of the world.”
Ortiz noted his goal with the program is for every underrepresented student to see themself represented in the course staff.
“I want everyone to see someone on staff that they can relate to, whether that’s cultural, race, ethnicity or transfer status,” Ortiz said. “All of that matters.”
Finding correlations: Data science as a bridge between majors
UC Berkeley undergraduates often categorize majors into two types: STEM and humanities or social sciences. One side includes computer science, math and biology. Leap across to find psychology, art and history.
The promise of data science: bridging the gap between humanities and STEM.
“Data science is a tool that can be leveraged with anything else. For me, that’s econ,” said Alice Chen, a data science and economics senior and Data 8 uGSI. “Some people are intent on becoming data science majors, and others are looking at it as a toolkit for their other disciplines.”
Data science can also be used for more fun pursuits. For example, the summer after he took Data 8, Furtado used what he learned in the course to glean insights from his Spotify listening history data spanning three years.
With the influx of students from all majors enrolling in the class, especially over the past few semesters, Furtado said course faculty and staff have focused both on making the course accessible to all and on showcasing the many applications of its content.
One project in the course helps students understand the impacts of population growth. Another project investigates climate change by analyzing historical temperature and precipitation data. Students also learn to use data from screenplays to predict movie genres.
Incorporating projects from vastly different disciplines requires a body of students with vastly different backgrounds and interests, united by the common thread of data, according to Ortiz. He is currently applying data science techniques in his thesis about understanding the relationship between race and ethnicity, population characteristics and pollution.
“When you talk about data, the same people having the same perspective is not going to get you anywhere,” Ortiz said. “But a diverse group of perspectives — that’s where you make a difference.”
Karandikar, who is a human rights minor, is working on a project that aims to identify potential biases in judges’ decisions when granting asylum using federal immigration data, a project she describes as “the intersection between data science and human rights.”
UC Berkeley also offers a host of “connector courses,” which are designed to aid students who have taken Data 8 in linking the techniques they learned to other concepts, according to the campus data science website. These include everything from “Race, Policing, and Data Science” to “Data Science for Genetics and Genomics.”
Linear growth: Berkeley as a blueprint
In 2019, just three years ago, the first-ever cohort of undergraduates to complete the data science degree numbered 82 students. The following year, 438 majors and 89 minors completed the program. Last year, these numbers jumped to 668 majors and 369 minors.
And it’s spreading. The “wild popularity” at Berkeley has inspired numerous colleges and universities across the country to adopt some of campus’s data science curriculum into their own undergraduate programs, DeNero said.
One of the aspects that makes data science at UC Berkeley so special, Miao added, is that it was one of the first universities in the country to create an undergraduate data science major. Campus, he said, has the power to actively be “setting the tone” for what should be included in data science curricula across the board.
“Berkeley faculty have a long history of writing textbooks that get used all over the world — the modern variant of that is to build out a whole course,” DeNero said. “There are universities that want to incorporate it and they are more than happy to use our video lectures as the core for their course and adapt it based on their curriculum.”
DeNero noted all UC Berkeley-created Data 8 materials, including slides, the online textbook, lecture videos, homework assignments and projects, are available for free online.
Beyond Data 8, the larger data science program is also continuing to develop new paths for students, DeNero said, introducing courses such as Data 101, “Data Engineering,” and Data 102, “Data, Inference, and Decisions,” whose concepts were previously taught only at the graduate level.
Data 8 is also constantly evolving, Furtado said. Professors and course staff work together to improve and diversify materials from semester to semester, in an effort to draw examples from “as many disciplines that we can,” he added.
Rita Wang, a fourth-year data science and computer science major who has been on Data 8 course staff every semester since she took it as a freshman, said she learns something new in each iteration of the course.
As the program continues to expand, what excites DeNero is the “reach and breadth” involved in creating such courses. It is not the work of a few faculty members; rather, it is a campuswide collaborative effort, he said.
“Data science is the place to be,” Ortiz said.
Lydia Sidhom is the lead academics and administration reporter. Follow her on Twitter at @SidhomLydia.