Data Science Applications and Analysis (UCSB, S20)
Important course links:
- Sync content to ds100.lsit.ucsb.edu
- Piazza
- Zoom link for lecture
- Zoom link for lab and office hours
- Office hours schedule
- Lecture slides
- Lecture recordings
- Gradescope
Link to Lecture Slides
Link to Lecture Recording
Lecture Notes:
lecture date | notes | ready? | description | reading |
---|---|---|---|---|
2020-03-31 | lect01 | true | Introductions and Course Overview | |
2020-04-02 | lect02 | true | Data Life Cycle | |
2020-04-07 | lect03 | true | The Data Science Lifecycle and Sampling | |
2020-04-09 | lect04 | false | Pandas | |
2020-04-14 | lect05 | true | Pandas and Question Formulation | |
num | ready? | description | assigned | due |
---|---|---|---|---|
num | ready? | description | assigned | due |
---|---|---|---|---|
In this course, we will explore the data science lifecycle: question formulation, data collection & cleaning, exploratory data analysis & visualization, statistical inference and prediction, and decision-making.
Instructors: Professors Kate Kharitonova (CS) and Alex Franks (PSTAT)
Prerequisites: PSTAT 120A, Math 4A, and knowledge of Python (at a minimum, the equivalent of CS 8, INT 5, or PSTAT 10).
Catalog description: Overview and use of data science tools in Python for data retrieval, analysis, visualization, reproducible research and automated report generation. Case studies will illustrate practical use of these tools. This new course will focus on concepts that are relevant for data science by using some of the popular software tools in this area. Doing data science is more than using isolated methods. The course emphasizes creatively combining a collection of concepts and domain knowledge to clean, transform, analyze, and present data. Concepts in data ethics and privacy will also be discussed. Case studies will illustrate real usage scenarios.
Programming experience: This course is designed for students who have a solid conceptual understanding of programming primitives (e.g., flow control, functions, 1D and 2D arrays, data types) and who are comfortable in Python or in at least one other programming or scripting language (C/C++, R, etc.).
Software tools: Many software tools are used for data science. Tools we will use in this course include (but are not limited to):
- Python
- Statistical/machine learning libraries
- Jupyter Notebook software for reproducibility
- Interactive visualization in the web browser
Learning by doing will require reading software documentation, experimenting by trial and error, and lots of debugging. We are looking for self-motivated students with diverse interests in data science.
Where does this data science course fit in with the existing courses?
- Unlike INT 5 (which is open to all and is an intro course), this course is considered an “intermediate” data science course. As such, some experience with Python and some introductory probability are required.
- Unlike PSTAT 10, this course will be taught using Python, but it is not a programming course. There will be a significant focus on statistical and data science concepts, including exploratory analysis, uncertainty, and ethics.
- This course is largely meant as a data-science focused complement to PSTAT 120B.
- The material in this course will be significantly less advanced than the content in PSTAT 134 and INT 15. If you have already taken these courses or courses beyond PSTAT 126, this course is not for you.
- An emphasis will be placed on learning from each other. You will be expected to pick up the material quickly and do a fair bit of self-learning. As part of our experimental approach, the course will be co-taught by professors from CS and PSTAT.
Link to this page: https://ucsb-ds.github.io/s20