lect01, Tue 03/31
Introductions and Course Overview
Lecture 1
Starts about 10 min into the video
Course Info
 Instructors
 Alex Franks (Prof. Franks)
 Yekaterina Kharitonova (Prof. K)
 TAs:
 Franky (Meng)  STAT
 Brian (Lim)  CS
 Tutors:
 Anya
 Arthur
 Natalie
 Noemi
 Sergio
Course websites
 Piazza for all course-related announcements and questions
 Course website: https://ucsbds.github.io/s20
 Jupyterhub (for labs and homework): ds100.lsit.ucsb.edu
 (+ Gradescope for submissions)
Textbook / references
 Reference text: Python Data Science Handbook
 Lecture notes
Grades
No exams.
 50%  approximately 5 homeworks
 25%  Labs on Wednesdays (not graded on attendance, submitted online, shorter than HW)
 20%  Final project (due Wed, June 10)
 like a “double homework”
 more info later in the quarter
 5%  Participation
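As a rough illustration of how the weights above combine, here is a minimal sketch. The function name, the dictionary keys, and the 0-100 score scale are assumptions made for this example and are not part of the official grading scheme.

```python
# Category weights taken from the grade breakdown above.
WEIGHTS = {
    "homework": 0.50,
    "labs": 0.25,
    "final_project": 0.20,
    "participation": 0.05,
}

def course_grade(scores):
    """Weighted average of per-category scores (each assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Hypothetical scores, for illustration only.
example_scores = {
    "homework": 90,
    "labs": 100,
    "final_project": 80,
    "participation": 100,
}
example_grade = course_grade(example_scores)
```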
Homework
 usually 1-2 weeks
 will be accepted up to 2 days late
 less than 24 hours late: 80% credit
 24-48 hours late: 60% credit
 after 48 hours: no credit (0%)
 contact the instructors in case of an emergency
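The late policy above can be sketched as a small function. The function name is hypothetical, and the treatment of the exact boundaries (24 and 48 hours both count toward the 60% tier) is an assumption read from the wording of the list, not a stated rule.

```python
def late_credit(hours_late):
    """Fraction of credit for a homework submitted `hours_late` hours after the deadline."""
    if hours_late <= 0:
        return 1.0   # on time
    if hours_late < 24:
        return 0.8   # less than 24 hours late
    if hours_late <= 48:
        return 0.6   # 24-48 hours late (boundary handling is an assumption)
    return 0.0       # after 48 hours: no credit
```

For example, `late_credit(30)` gives the 60% tier. Emergencies are handled by contacting the instructors, which this sketch does not model.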
Labs
 Labs are required; need to be turned in at the end of the lab as a record of your attendance.
 Section attendance: try to stay with the time you are assigned on GOLD.
 Submit it by Friday at 5pm
Participation
 Lecture attendance is strongly encouraged but not the only way to participate.
 join the lecture and ask questions (raise your hand on Zoom to be unmuted to ask your question)
 join the lab and ask questions
 participate on Piazza (asking/answering questions)
 office hours
In-class group work
Students will be split into groups of about 3 to introduce themselves and discuss the question that was posed.
Timestamp in the video 32:36
Programming Languages for Data Science
 R (PSTAT 10), created specifically for statistical computing and graphics
 Python (CS 8), “general purpose” programming language + important packages
Who is this class for?
 advanced beginners
 not for those who have taken advanced statistics or CS courses
Potential List of Topics
*
Taking a computational / conceptual approach instead of a rigorous mathematical treatment
What is a data scientist?
T vs shaped
What is data science?
Domain expertise helps to know which tools to use and when.
Pokemon-card-style collection of data science explanation visualizations
Learning from Data

The way to learn about the world is to take a claim and prove that it’s false.

Induction. Can we generalize from specific instances?

How is knowledge created from a sociological perspective?
 science is a communal activity
 e.g., threshold for p-values
Falsification
H0 (null hypothesis): “All swans are white.”
vs.
H1 (alternative hypothesis): “Not all swans are white.”
H0 (null hypothesis): “The ivory-billed woodpecker is extinct.”
Can I be sure? Maybe I have been missing it.
Induction and Evidence
 data provides evidence for the truth of a conclusion
 argue about what is probable rather than what is impossible
 often about inferring general principles from specific observations
 Bayes’ Theorem for describing probabilities based on evidence
Inference: “Black swans are rare.” Fraction of black swans in the world?
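To make the Bayes' Theorem bullet concrete, here is a minimal sketch that updates the probability of "all swans are white" after a series of sightings. The two-hypothesis setup, the 0.5 prior, and the 10% black-swan fraction under the alternative are all assumptions chosen for illustration; they do not come from the lecture.

```python
def posterior_all_white(n_white, n_black, prior_all_white=0.5, black_fraction=0.1):
    """Posterior probability of "all swans are white" via Bayes' theorem.

    Compares two hypotheses (an illustrative assumption):
      - all swans are white
      - a fixed fraction `black_fraction` of swans is black
    Sightings are treated as independent.
    """
    # If all swans are white, white sightings are certain and a black one is impossible.
    like_all_white = 1.0 if n_black == 0 else 0.0
    # If some swans are black, each sighting is black with probability `black_fraction`.
    like_some_black = (1 - black_fraction) ** n_white * black_fraction ** n_black
    # Total probability of the observations under both hypotheses.
    evidence = (prior_all_white * like_all_white
                + (1 - prior_all_white) * like_some_black)
    return prior_all_white * like_all_white / evidence

after_ten_white = posterior_all_white(n_white=10, n_black=0)
after_one_black = posterior_all_white(n_white=10, n_black=1)
```

Under these assumptions, ten white swans raise the probability of the claim without making it certain, while a single black swan drives it to zero. This mirrors the contrast between induction (arguing about what is probable) and falsification.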
The role of models
 Make assumptions about how the data is generated.
 Models can still be used to develop statistical tests
 Can also be used to make predictions / forecasts and describe sources of variability
 Can (and should) be continuously refined
Statistical Inference
 probability (sampling distribution) ==> data ==> population
Given facts about the world, what might I see?
Learn the facts about the world
“Inverse probability”
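The two directions above can be sketched with a small simulation. The function names, the 5% black-swan fraction, and the sample size are hypothetical values chosen for illustration.

```python
import random

def sample_swans(black_fraction, n, rng):
    """Probability direction: given a fact about the world, simulate what we might see."""
    return ["black" if rng.random() < black_fraction else "white" for _ in range(n)]

def estimate_black_fraction(sample):
    """Inference direction: use the observed data to learn a fact about the world."""
    return sample.count("black") / len(sample)

rng = random.Random(0)       # fixed seed so the sketch is reproducible
true_fraction = 0.05         # assumed "fact about the world" (hypothetical)
sample = sample_swans(true_fraction, n=10_000, rng=rng)
estimate = estimate_black_fraction(sample)
```

The estimate will generally be close to, but not exactly equal to, the assumed fraction; that gap is the sampling variability that statistical inference has to account for.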
DS 100 philosophy
4 steps:
 Raw data is not information (pandas)
 Information is not knowledge (EDA)
 …
 visualization (Altair)
 Knowledge is not understanding (domain expertise)
 Understanding is not wisdom (Ethical data science, consequences, data privacy)
Wisdom and Data Science
Questions to ask yourself
Mark Twain’s quote about statistics.
Discussion
UC Berkeley Gender Bias case (1973)
 graduate school admissions
What possible truths are consistent with this information? (Timestamp in the video: around 1:10:11)
“Simpson’s paradox”
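A minimal sketch of how Simpson's paradox can arise in admissions data follows. The counts are made up for illustration and are NOT the real 1973 Berkeley figures; the department labels and variable names are also hypothetical.

```python
# Hypothetical counts (not the real Berkeley data): (applicants, admitted)
applications = {
    "A": {"men": (80, 48), "women": (20, 14)},
    "B": {"men": (20, 4), "women": (80, 24)},
}

# Admission rate for each group within each department.
per_department = {
    dept: {group: admitted / applied for group, (applied, admitted) in groups.items()}
    for dept, groups in applications.items()
}

def overall_rate(group):
    """Admission rate for a group after pooling all departments together."""
    applied = sum(applications[dept][group][0] for dept in applications)
    admitted = sum(applications[dept][group][1] for dept in applications)
    return admitted / applied

overall = {group: overall_rate(group) for group in ("men", "women")}

for dept, rates in per_department.items():
    print(f"Department {dept}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"Overall: men {overall['men']:.0%}, women {overall['women']:.0%}")
```

In this made-up example women have the higher admission rate in each department, yet the lower rate overall, because most women applied to the department that admits fewer applicants. The aggregate numbers alone are therefore consistent with more than one underlying story.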