Installation: In this class, we will use Python for all assignments. Please refer to this url for simple instructions on installing Python and the Scientific Python stack. I strongly suggest using the Anaconda distribution. If you want to install the packages manually, use the instructions here. Remember to also install the additional packages BeautifulSoup, Pattern, Seaborn and MrJob. Instructions for these are in the first url itself.
IPython Notebook Basics: Please spend some time exploring IPython Notebook.
This is a fast, 10 minute video introducing the basics.
For now, it is perfectly fine if you do not understand what the Python code being executed means.
Once you are done, please spend some time reading this tutorial, which should take around 30 minutes.
We will be spending a lot of time in IPython Notebooks, so an initial investment now will save you a lot of time later.
[Optional] This is a great post from Philip Guo about his impressions of IPython Notebook.
This is an article from Nature (a prestigious scientific journal) about how IPython Notebook is becoming popular in non-CS domains.
Python Basics:
Here is a GREAT IPython Notebook that covers the most important ideas in Python. For most of you, it should not take more than 1-2 hours.
[Optional]
This IPython Notebook provides a non-technical introduction to the entire Scientific Python stack.
Verification: It is important to have a common baseline so that the code we use produces the same output for everyone. If you have installed all packages correctly, each version number printed below should be at least as high as the minimum shown next to it. If not, install or upgrade the necessary package and restart the IPython Notebook/console. (Courtesy: Harvard, CS 109)
# IPython is what you are using now to run the notebook
import IPython
print("IPython version: %6.6s (need at least 1.0)" % IPython.__version__)
# Numpy is a library for working with arrays
import numpy as np
print("Numpy version: %6.6s (need at least 1.7.1)" % np.__version__)
# SciPy implements many different numerical algorithms
import scipy as sp
print("SciPy version: %6.6s (need at least 0.12.0)" % sp.__version__)
# Pandas makes working with data tables easier
import pandas as pd
print("Pandas version: %6.6s (need at least 0.11.0)" % pd.__version__)
# Matplotlib is the module for plotting
import matplotlib
print("Matplotlib version: %6.6s (need at least 1.2.1)" % matplotlib.__version__)
# Scikit-Learn implements several machine learning algorithms
import sklearn
print("Scikit-Learn version: %6.6s (need at least 0.13.1)" % sklearn.__version__)
# Requests is a library for getting data from the Web
import requests
print("requests version: %6.6s (need at least 1.2.3)" % requests.__version__)
# BeautifulSoup is a library to parse HTML and XML documents
import bs4
print("BeautifulSoup version: %6.6s (need at least 4.0)" % bs4.__version__)
# MrJob is a library to run map reduce jobs on Amazon's computers
import mrjob
print("Mr Job version: %6.6s (need at least 0.4)" % mrjob.__version__)
# Pattern has lots of tools for working with data from the internet
import pattern
print("Pattern version: %6.6s (need at least 2.6)" % pattern.__version__)
# Seaborn is a nice library for visualizations
import seaborn
print("Seaborn version: %6.6s (need at least 0.3.1)" % seaborn.__version__)
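The cell above leaves the comparison against each minimum to the reader. As an optional extra (not part of the original Harvard CS 109 check), the comparison can be automated. The sketch below is a minimal illustration: the helper names `version_tuple`, `meets_minimum`, `check` and `check_all` are made up for this example, and it assumes each package exposes a `__version__` string of dot-separated numbers. Versions are compared as tuples of integers because plain string comparison gives the wrong answer for cases like "1.10.0" versus "1.7.1".

```python
import importlib
import re

# Minimum versions copied from the verification cell above.
MINIMUMS = {
    "IPython": "1.0", "numpy": "1.7.1", "scipy": "0.12.0",
    "pandas": "0.11.0", "matplotlib": "1.2.1", "sklearn": "0.13.1",
    "requests": "1.2.3", "bs4": "4.0", "mrjob": "0.4",
    "pattern": "2.6", "seaborn": "0.3.1",
}


def version_tuple(version):
    """Turn '1.7.1' into (1, 7, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare numerically, padding with zeros so '1.0' equals '1.0.0'."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(module_name, required):
    """Return 'ok', 'too old', 'unknown' or 'missing' for one package."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # ImportError, or any error raised by a broken install
        return "missing"
    installed = getattr(module, "__version__", None)
    if installed is None:
        return "unknown"
    return "ok" if meets_minimum(installed, required) else "too old"


def check_all(minimums):
    """Check every package and return a {name: status} dictionary."""
    return dict((name, check(name, required))
                for name, required in minimums.items())


if __name__ == "__main__":
    for name, status in sorted(check_all(MINIMUMS).items()):
        print("%-12s %s" % (name, status))
```

Any package reported as "missing" or "too old" needs to be installed or upgraded before restarting the notebook. A status of "unknown" only means the package does not expose `__version__`, so check it by hand.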