Installation: In this class, we will use Python as the programming language for all assignments. Please refer to this url for simple instructions on installing Python and Scientific Python. I strongly suggest using the Anaconda distribution. If you want to install the packages manually, use the instructions here. Remember to install the other packages as well, such as BeautifulSoup, Pattern, Seaborn and MrJob. Instructions are in the first url itself.
IPython Notebook Basics: Please spend some time exploring IPython Notebook.
This is a fast, 10-minute video introducing the basics.
For now, it is perfectly fine if you do not understand what the Python code being executed means.
Once you are done, please spend some time reading this tutorial
which might require around 30 minutes of your time.
We will be spending a lot of time in IPython Notebooks, so an initial investment of time will save you a lot of time later.
[Optional] This is a great post from Philip Guo on his impressions of IPython Notebook. This is an article from Nature (a prestigious scientific journal) about how IPython Notebook is becoming popular in non-CS domains.
Here is a GREAT IPython Notebook that covers the most important ideas in Python. For most of you, it should not take more than 1-2 hours.
[Optional] This IPython Notebook provides a non-technical introduction to the entire Scientific Python stack.
Verification: It is important to have a common baseline so that the code we will be using produces the same output for everyone. If you have installed all packages correctly, your version numbers should be at least as high as those given below. If not, install or upgrade the necessary package and restart IPython Notebook/console. (Courtesy: Harvard, CS 109)
# IPython is what you are using now to run the notebook
import IPython
print("IPython version: %6.6s (need at least 1.0)" % IPython.__version__)

# Numpy is a library for working with Arrays
import numpy as np
print("Numpy version: %6.6s (need at least 1.7.1)" % np.__version__)

# SciPy implements many different numerical algorithms
import scipy as sp
print("SciPy version: %6.6s (need at least 0.12.0)" % sp.__version__)

# Pandas makes working with data tables easier
import pandas as pd
print("Pandas version: %6.6s (need at least 0.11.0)" % pd.__version__)

# Module for plotting
import matplotlib
print("Matplotlib version: %6.6s (need at least 1.2.1)" % matplotlib.__version__)

# SciKit Learn implements several Machine Learning algorithms
import sklearn
print("Scikit-Learn version: %6.6s (need at least 0.13.1)" % sklearn.__version__)

# Requests is a library for getting data from the Web
import requests
print("requests version: %6.6s (need at least 1.2.3)" % requests.__version__)

# BeautifulSoup is a library to parse HTML and XML documents
import bs4
print("BeautifulSoup version:%6.6s (need at least 4.0)" % bs4.__version__)

# MrJob is a library to run map reduce jobs on Amazon's computers
import mrjob
print("Mr Job version: %6.6s (need at least 0.4)" % mrjob.__version__)

# Pattern has lots of tools for working with data from the internet
import pattern
print("Pattern version: %6.6s (need at least 2.6)" % pattern.__version__)

# Seaborn is a nice library for visualizations
import seaborn
print("Seaborn version: %6.6s (need at least 0.3.1)" % seaborn.__version__)
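
The check above prints each version so that you can compare it by eye. If you would rather have the comparison done for you, the following is a minimal sketch and not part of the original course material. The helper names (version_tuple, meets_minimum, check_package) are made up for illustration, and it assumes each package exposes a __version__ attribute, as the packages listed above do. Versions are compared as tuples of integers because comparing them as plain strings would wrongly rank "1.10" below "1.7".

```python
import importlib
import re


def version_tuple(version):
    """Turn a version string such as '1.7.1' or '0.13.1rc2' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits, e.g. 'dev'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a = version_tuple(installed)
    b = version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '4' and '4.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(module_name, minimum):
    """Import a module by name and describe whether its version is high enough."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "%s: not installed (need at least %s)" % (module_name, minimum)
    installed = getattr(module, "__version__", "0")
    status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
    return "%s: %s (need at least %s) %s" % (module_name, installed, minimum, status)


# A few of the minimum versions listed above; extend the dict as needed.
minimums = {"numpy": "1.7.1", "scipy": "0.12.0", "pandas": "0.11.0"}
for name in sorted(minimums):
    print(check_package(name, minimums[name]))
```

Any package reported as TOO OLD or not installed should be installed or upgraded before you restart the notebook.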