Inspiration
The following are some general concepts and motivations behind the choices that were made in teaching Data 8. They explain why particular decisions were made about the topics covered and technologies used.
Major goals of Data 8
Accessibility and Equity: Students from all backgrounds should be able to take Data 8. As such, no prerequisites in statistics or programming are required for the course; only basic high-school algebra is necessary.
Diversity: Data 8 can be taken by students from any major across campus, and should be acceptable as a potential prerequisite for many statistics, math, or computing majors. Currently, students from more than 50 different majors enroll each semester, with no single major making up more than 20% of the class.
Pedagogical Clarity: Data 8 is designed to teach introductory programming first, then statistics through a computational lens, and finally basic methods of inference.
Scalability: The course must meet the growing demand from students in order to be widely accessible at Berkeley. Data 8 has grown to be taken by more than 1200 students each semester.
Core concepts / inspirations
Leverage the combination of Computer Science and Statistics
Come away with practical data science skills applicable to any domain
Be able to conduct robust inference from limited data
Be able to run experiments and test hypotheses
Know which statistical tools to use for different tasks
Quantify and understand uncertainty in data
Harness the power of computation and simulation in conducting data science
Illustrate the above concepts with real-world data from a variety of domains
Large decisions made in teaching Data 8
Shield the students from the topics that take away from the core concepts noted above
Aim the course at anybody, not just statistics or CS majors. Thus, Data 8 does not have a statistics or programming prerequisite.
The course begins with teaching basic programming in Python.
Use the datascience module instead of learning (complex) package-specific APIs
Use a JupyterHub so that students do not have to set up their own environments, which also creates an equitable computing environment for all students.
Provide pre-collected/cleaned data, allowing students to avoid data-cleaning
Teach complex probabilistic and statistical concepts through simulation
Concepts such as industry-adopted packages or data-cleaning are covered in subsequent courses, and a more advanced formalization of the CS and statistical concepts will occur in later classes like Data 100 or Data 102.
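As a rough illustration of what teaching statistical concepts through simulation can look like, the sketch below estimates a p-value by repeatedly simulating coin tosses instead of using a closed-form formula. The coin scenario, the function names, and the numbers (60 heads in 100 tosses, 10,000 simulations) are assumptions chosen for this example and are not taken from the Data 8 materials. It uses only the Python standard library rather than the datascience module.

```python
import random


def simulate_heads(num_tosses, rng):
    """Count the heads in num_tosses tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(num_tosses))


def empirical_p_value(observed_heads, num_tosses, num_simulations, seed=0):
    """Estimate the chance that a fair coin gives at least observed_heads heads.

    The null hypothesis (a fair coin) is simulated num_simulations times, and
    the result is the fraction of simulations at least as extreme as the
    observed count. The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(num_simulations):
        if simulate_heads(num_tosses, rng) >= observed_heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_simulations


# Hypothetical observation: 60 heads in 100 tosses.
p_value = empirical_p_value(observed_heads=60, num_tosses=100, num_simulations=10_000)
print(f"Empirical p-value: {p_value:.3f}")
```

A small empirical p-value suggests that the observed count would be unusual for a fair coin. The same pattern (simulate under an assumption, then compare with what was observed) extends to other test statistics without requiring any distribution theory.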