Data Science Applications and Analysis (UCSB, S20)

Important course links:

Lecture Notes:

lecture date	notes	ready?	description
2020-03-31	lect01	true	Introductions and Course Overview
2020-04-02	lect02	true	Data Life Cycle
2020-04-07	lect03	true	The Data Science Lifecycle and Sampling
2020-04-09	lect04	false	Pandas
2020-04-14	lect05	true	Pandas and Question Formulation

Homework

num	ready?	description	assigned	due

Lab

num	ready?	description	assigned	due

In this course, we will explore the data science lifecycle: question formulation, data collection & cleaning, exploratory data analysis & visualization, statistical inference and prediction, and decision-making.

Instructors: Professors Kate Kharitonova (CS) and Alex Franks (PSTAT)

Prerequisites: PSTAT 120A, Math 4A, and knowledge of Python (at a minimum equivalent of CS 8, INT 5, PSTAT 10).

Catalog description: Overview and use of data science tools in Python for data retrieval, analysis, visualization, reproducible research and automated report generation. Case studies will illustrate practical use of these tools. This new course will focus on concepts that are relevant for data science by using some of the popular software tools in this area. Doing data science is more than using isolated methods. Creatively using a collection of concepts and domain knowledge is emphasized to clean, transform, analyze, and present data. Concepts in data ethics and privacy will also be discussed. Case studies will illustrate real usage scenarios.

Programming experience: This course is designed for students who have a solid conceptual understanding of programming primitives (e.g., flow control, functions, 1D and 2D arrays, data types) and are comfortable in Python or in at least one other programming or scripting language (C/C++, R, etc.).

Software tools: Many software tools are used for data science. Tools we will use for this course include (but are not limited to):

Python
Statistical/machine learning libraries
Jupyter Notebook software for reproducibility
Interactive visualization in the web browser

Learning by doing will require reading software documentation, experimenting by trial and error, and lots of debugging. We are looking for self-motivated students with diverse interests in data science.

Where does this data science course fit in with the existing courses?

Unlike INT 5 (which is open to all and is an intro course), this course is considered an “intermediate” data science course. As such, some experience with Python and some introductory probability are required.
Unlike PSTAT 10, this course will be taught using Python, but it is not a programming course. There will be a significant focus on statistical and data science concepts, including exploratory analysis, uncertainty, and ethics.
This course is largely meant as a data-science-focused complement to PSTAT 120B.
The material in this course will be significantly less advanced than the content in PSTAT 134 and INT 15. If you have already taken these courses or courses beyond PSTAT 126, this course is not for you.
An emphasis will be placed on learning from each other. You will be expected to pick up the material quickly and do a fair bit of self-learning. As part of our experimental approach, the course will be co-taught by professors from CS and PSTAT.

Link to this page: https://ucsb-ds.github.io/s20
