A Tour of Python libraries: Scikit-learn and statsmodels in action

Website for the UCSB Data Science Capstone Preparation Workshop


Welcome!

Prerequisites

NumPy was introduced to alleviate the above problem and is the basic building block for scientific computing in Python. NumPy offers its own data types (numpy.int16, numpy.int32, numpy.float64, etc.) that supplement Python’s native types with fixed-width alternatives. NumPy’s n-dimensional array is a better fit than Python’s lists (or nested lists) for scientific computing. A NumPy array holds elements of a single data type, while a Python list can hold values of different types at once; this restriction is what makes NumPy arrays efficient at storing and manipulating matrices. Note that NumPy’s core is implemented in C, so its array operations are quite fast.
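The difference between a list and an array can be seen in a minimal sketch like the one below. The variable names are made up for illustration; the calls are standard NumPy.

```python
import numpy as np

# A Python list may mix types; converting it to an array forces one common dtype.
mixed_list = [1, 2.5, 3]
arr = np.array(mixed_list)        # the integers are upcast to floating point
print(arr.dtype)                  # float64

# A fixed-width type can be requested explicitly.
small = np.array([1, 2, 3], dtype=np.int16)
print(small.itemsize)             # 2 (bytes per element)

# A matrix is a 2-D array; arithmetic is applied to whole arrays at once.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, 1.0])
product = A @ b                   # matrix-vector product
doubled = A * 2                   # element-wise multiplication
print(product)
print(doubled)
```

Because every element shares one type, NumPy can store the data in a contiguous block and run the loop over it in compiled code rather than in the Python interpreter.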

You can read more here

From now on, you will see that nearly every scientific library in Python is either built upon NumPy or integrates with it.

SciPy

SciPy (Science + Python) is a library for scientific computing in Python. SciPy is built on NumPy and takes advantage of NumPy’s efficient data structures and computations. The whole SciPy library consists of a large number of modules, each of which corresponds to a particular scientific topic. Some useful modules that we may use in data science:
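As a minimal sketch of how SciPy modules operate on NumPy data, the example below uses scipy.stats and scipy.optimize. These two modules are chosen here only for illustration, and the inputs are made up.

```python
import numpy as np
from scipy import optimize, stats

# scipy.stats: evaluate the standard normal CDF on a NumPy array.
z = np.array([-1.96, 0.0, 1.96])
probs = stats.norm.cdf(z)         # roughly [0.025, 0.5, 0.975]
print(probs)

# scipy.optimize: find the minimum of a simple one-dimensional function.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(result.x)                   # close to 2.0
```

Both calls accept and return NumPy arrays or plain floats, which is what makes SciPy easy to combine with the rest of the ecosystem.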

Pandas

Pandas is a library for handling tabular data in Python. It is also built on top of NumPy. Pandas’ main data structure is the pandas.DataFrame, which is similar to the data.frame in R. Pandas integrates well with other scientific libraries such as SciPy, Matplotlib, and Scikit-learn.
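A small sketch of a DataFrame and its link to NumPy is shown below. The column names and values are hypothetical and serve only as an illustration.

```python
import numpy as np
import pandas as pd

# Build a small table; each column has its own dtype (hypothetical data).
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height_cm": [170.0, 165.5, 180.2],
    "group": ["x", "y", "x"],
})
print(df.dtypes)

# A numeric column can be handed to NumPy (and hence SciPy or Scikit-learn).
values = df["height_cm"].to_numpy()
print(type(values))

# Typical tabular operation: the mean height within each group.
means = df.groupby("group")["height_cm"].mean()
print(means)
```

The to_numpy() call is the usual bridge when another library expects a plain array rather than a DataFrame column.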

Matplotlib

Matplotlib is a Python plotting library whose interface was originally modeled on the plotting commands of Matlab, which used to be the go-to language for scientific computing before Python’s data science ecosystem became popular. As people moved to Python, they wanted a familiar way to make plots in the new language, so Matplotlib mimicked Matlab’s plotting style. As a result, programming with Matplotlib does not always feel very Pythonic.
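The Matlab influence is easiest to see by comparing Matplotlib’s two interfaces. The sketch below draws one plot in the Matlab-like pyplot style and one in the object-oriented style; the Agg backend is selected only so that the code runs without a display, and the plotted data is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Matlab-like style: functions act on a hidden "current figure".
plt.figure()
plt.plot(x, np.sin(x))
plt.title("sin(x)")

# Object-oriented style: the figure and axes are explicit objects.
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_title("cos(x)")
ax.set_xlabel("x")
ax.legend()
```

The object-oriented style is usually easier to follow once a figure has several panels, since every call names the axes it modifies.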

Basic syntax and functions

We have put together a comprehensive set of slides that summarizes the basic syntax and functions of the above Python libraries.

You can access the slides here.

Check your understanding