
lect05, Tue 04/14

Pandas and Question formulation

Question formulation

Data Acquisition / Cleaning

Related to metrics for success.

Population, frame, sample.

Exploratory Data Analysis

Inference and Prediction

Can we come up with a robust answer despite the uncertainty?

Pandas

Pandas Data Structures

Goals for today

Discuss aggregation

Method chaining

Also sometimes called “piping”: making multiple method calls in sequence, where each call operates on the object returned by the previous one.
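A minimal sketch of method chaining in pandas, assuming a small made-up table (the data values are hypothetical; the column names follow the “Major” and “Random Number” columns used in the groupby example). It compares a step-by-step version with the equivalent chain.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration.
df = pd.DataFrame({
    "Major": ["CS", "Math", "CS", "Stats", "Math"],
    "Random Number": [0.5, -1.2, 1.5, 0.3, 0.2],
})

# Step by step, with an intermediate variable at each stage.
positive = df[df["Random Number"] > 0]
sorted_positive = positive.sort_values("Random Number", ascending=False)
top_two_steps = sorted_positive.head(2)

# The same computation as one chain: each method is called on
# the object returned by the previous call.
top_two_chained = (
    df.loc[lambda d: d["Random Number"] > 0]
      .sort_values("Random Number", ascending=False)
      .head(2)
)

print(top_two_chained)
```

Wrapping the chain in parentheses lets each call sit on its own line, which keeps long chains readable.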

Groupby

  1. Group by Major
  2. Mean of “Random Number”
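The two steps above can be sketched in pandas as follows. The data values are hypothetical; only the column names “Major” and “Random Number” come from the notes.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration.
df = pd.DataFrame({
    "Major": ["CS", "Math", "CS", "Stats", "Math"],
    "Random Number": [0.5, -1.2, 1.5, 0.3, 0.2],
})

# 1. Group by Major, 2. take the mean of "Random Number" in each group.
mean_by_major = df.groupby("Major")["Random Number"].mean()

print(mean_by_major)
```

The result is a Series indexed by the group key, with one row per major.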

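x_i is drawn from a standard Normal distribution (mean 0, std 1)

E[x_i] = 0, Var(x_i) = 1

Y = 1/N_g * sum(x_i), summing over the i in group g, where N_g is the number of rows in group g

E[Y] = 1/N_g * sum(E[x_i]) = 0

Var(Y) = 1/(N_g**2) * N_g = 1/N_g (assuming the x_i are independent)

SD(Y) = sqrt(Var(Y)) = 1/sqrt(N_g)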
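A quick simulation can be used to check the result that the variance of a group mean shrinks like one over the group size. This is only a sketch: the group size, number of trials, and seed are arbitrary choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
group_size = 25                  # hypothetical number of rows in one group
num_trials = 20_000              # number of simulated groups

# Each row is one simulated group of independent standard Normal draws.
samples = rng.standard_normal((num_trials, group_size))

# Y for each simulated group: the mean of its draws.
group_means = samples.mean(axis=1)

empirical_mean = group_means.mean()
empirical_var = group_means.var()
empirical_sd = group_means.std()

print("mean of Y:", empirical_mean)
print("var of Y: ", empirical_var, "theory:", 1 / group_size)
print("sd of Y:  ", empirical_sd, "theory:", 1 / np.sqrt(group_size))
```

The empirical values should land close to the theoretical ones, and the match tightens as the number of trials grows.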

STSDS =

Multi-index

groupby
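One common way a multi-index arises is grouping by more than one column. A minimal sketch, assuming a hypothetical “Year” column alongside “Major” (the “Year” column and all data values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; "Year" is an assumed extra column.
df = pd.DataFrame({
    "Major": ["CS", "CS", "Math", "Math", "CS"],
    "Year": [1, 2, 1, 1, 1],
    "Random Number": [0.5, 1.5, -1.2, 0.2, 0.1],
})

# Grouping by two keys gives a result indexed by a MultiIndex
# with levels (Major, Year).
mean_by_major_year = df.groupby(["Major", "Year"])["Random Number"].mean()

# Select one entry with a tuple of index values.
cs_year_1 = mean_by_major_year.loc[("CS", 1)]

# reset_index() turns the index levels back into ordinary columns.
flat = mean_by_major_year.reset_index()

print(mean_by_major_year)
print(flat)
```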