Getting started with Python
Some of the quantitative parts of the materials on this site include example code in the Python programming language. Why Python? For the same reasons that explain its widespread use in industry and academia:
Python is designed to be easy to read and to minimize the time needed to write programs (another way of thinking about this is that many programming languages provide structure and formalisms aiming to prevent you from writing very poor programs, whereas Python aims to get out of your way and let you express your intentions).
It’s a real, general-purpose programming language, unlike alternative software packages that are also used for data analysis, such as R and Matlab.
You can install Python for free on your own computer, or run it “in the cloud” (see below). You don’t need a licence for proprietary and expensive software packages.
Most of our examples are distributed as Python/Jupyter notebooks, which are an increasingly popular method of publishing demonstrations and executable examples that you can try out on your own machine. They are live computational documents which allow the embedding of rich media (anything that a web browser can display). You can modify the examples and check the impact of your changes (the famous philosopher and educational reformer John Dewey wrote of “experimental doing for the sake of knowing”).
Alongside the Python programming language, our examples use a number of very useful libraries that implement lots of functionality for data analysis and statistical processing:
statsmodels with various statistical models
Pandas for data processing
SymPy for symbolic processing
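Before opening a notebook, it can help to confirm that these libraries are visible to your Python interpreter. The following is a minimal sketch that uses only the standard library; the helper name find_missing is hypothetical, and the lowercase import names (statsmodels, pandas, sympy) are assumed to match the libraries listed above.

```python
import importlib.util

# Import names of the libraries listed above (assumed: lowercase, as used in `import`).
LIBRARIES = ["statsmodels", "pandas", "sympy"]


def find_missing(names):
    """Return the names that cannot be found in the current Python environment."""
    # find_spec() locates a top-level module without importing it,
    # and returns None when the module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(LIBRARIES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

If a library is reported as not installed, install it using whichever mechanism you used to install Python itself.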
Installing Python on your computer
If your computer runs Microsoft Windows, install one of
If your computer runs macOS, install one of
If your computer runs Linux (great choice!), the Python packages available via your distribution should work fine. You’ll want the packages named
numpy (or perhaps something like
Python 2 or Python 3? Python version 2 reached end-of-life in January 2020. You should only use Python 3 now.
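If you are unsure which version an interpreter provides, you can ask it directly. This is a small sketch using the standard sys module; the helper name is_python3 is hypothetical and is only there to make the check reusable.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version number is 3."""
    return version_info[0] == 3


# sys.version starts with the version number, e.g. "3.x.y ...".
print("Running Python", sys.version.split()[0])
if not is_python3():
    print("Please switch to a Python 3 interpreter.")
```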
You can run Python via various graphical user interfaces, the shell/command line, or a notebook interface (for this, you need to start Jupyter using a command such as jupyter notebook in your shell, or some menu entry in the graphical user interface).
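If the jupyter command is not recognized by your shell, Jupyter may not be installed or may not be on your PATH. The sketch below checks for it from Python using the standard shutil module; the helper name locate is hypothetical.

```python
import shutil


def locate(command):
    """Return the full path of `command` if it is on the PATH, else None."""
    return shutil.which(command)


path = locate("jupyter")
if path is None:
    print("The 'jupyter' command was not found; Jupyter may not be installed.")
else:
    print("Found jupyter at", path)
    print("Start the notebook server with: jupyter notebook")
```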
Running Python notebooks in the cloud
You can also run Python notebooks on computers “in the cloud”, with some services kindly offered for free by a number of organizations. In this way, you don’t need to install any software on your computer (or your tablet); you access the notebooks through a web browser instead. Notebooks are live computational documents, great for “experimenting” with your analysis and models to test your understanding.
Some services that are available for free in September 2020 (most of these will require you to create an account):
JupyterHub/MyBinder, run by the non-profit Jupyter project with computing resources provided by sponsors
Google CoLaboratory, run by Google, which allows you to save notebooks and analysis results to your Google Drive. It also provides access to dedicated GPU/TPU hardware for accelerated machine learning.
CoCalc, which also offers an interface to the free tools Sage and R for statistical analysis
GitHub Codespaces (currently in early access beta), which allows you to run Python notebooks
There are many online courses on data analysis using Python, such as the edX introductory course Analyzing data with Python, run by IBM.