
Getting started with Python

Some of the quantitative parts of the materials on this site include example code in the Python programming language. Why Python? For the same reasons that explain its widespread use in industry and academia:

  • Python is designed to be easy to read and to minimize the time needed to write programs. Another way of thinking about this is that many programming languages provide structure and formalisms aiming to prevent you from writing very poor programs, whereas Python aims to get out of your way and let you express your intentions.

  • It’s a real, general-purpose programming language, unlike alternative software packages that are also used for data analysis, such as R and MATLAB.

  • You can install Python for free on your own computer, or run it “in the cloud” (see below). You don’t need a licence for proprietary and expensive software packages.

Most of our examples are distributed as Python/Jupyter notebooks, an increasingly popular way of publishing demonstrations and executable examples that you can try out on your own machine. They are live computational documents that can embed rich media (anything that a web browser can display). You can modify the examples and check the impact of your changes (the philosopher and educational reformer John Dewey wrote of “experimental doing for the sake of knowing”).

Alongside the Python programming language, our examples use a number of libraries that implement functionality for data analysis and statistical processing, including NumPy, SciPy, matplotlib, statsmodels and SymPy.

Installing Python on your computer

If your computer runs Microsoft Windows, install one of

If your computer runs MacOS, install one of

If your computer runs Linux (great choice!), the Python packages available via your distribution should work fine. You’ll want the packages named python3, numpy (or perhaps something like python3-numpy), matplotlib, scipy, statsmodels and sympy.
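Once the packages are installed, one way to confirm that Python can find them is sketched below. It uses only the standard library. The helper name missing_packages is our own, and the list assumes that the import names match the package names given above (a distribution package such as python3-numpy is still imported as numpy).

```python
import importlib.util

# Import names of the libraries mentioned above. This is an assumption:
# distribution package names may differ from the import names.
REQUIRED = ["numpy", "matplotlib", "scipy", "statsmodels", "sympy"]


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only looks the modules up without importing them, so it runs quickly even when the larger libraries are installed.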

Python 2 or Python 3? Python version 2 reached end-of-life in January 2020. You should only use Python 3 now.
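If you are unsure which version an interpreter is, a minimal check along these lines can help. The function name is_python3 is hypothetical, chosen for this sketch.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3."""
    return version_info[0] == 3


# Report the version of the interpreter running this code.
print("Running Python", sys.version.split()[0])
if not is_python3():
    print("This interpreter is not Python 3; the examples may not run.")
```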

You can run Python via various graphical user interfaces, from the shell (command line), or via a notebook interface. For the notebook interface, start Jupyter with a command such as jupyter notebook in your shell, or from a menu entry in your graphical user interface.

Running Python notebooks in the cloud

You can also run Python notebooks on computers “in the cloud”, with some services kindly offered for free by a number of organizations. This way, you don’t need to install any software on your computer (or your tablet); you access the notebooks via a web browser instead. Notebooks are live computational documents, great for “experimenting” with your analysis and models to test your understanding.

Some services that are available for free as of July 2022 (most of these will require you to create an account):

  • JupyterHub/MyBinder, run by the non-profit Jupyter project with computing resources provided by sponsors.

  • Google CoLaboratory, run by Google, which allows you to save notebooks and analysis results to your Google Drive. It also provides access to dedicated GPU/TPU hardware for accelerated machine learning.

  • Deepnote, which allows you to run Jupyter Python and R notebooks in the cloud and includes some collaboration features for working as a team.

  • CoCalc, which also offers an interface to the free tools Sage and R for statistical analysis.

Running Python notebooks directly in your web browser

There is experimental support for running Python and most of the scientific libraries we are using (NumPy, SciPy, matplotlib) directly in your web browser, thanks to the Pyodide project. Try it out in your browser with JupyterLite!

Not all of our notebooks will work with Pyodide as of July 2022 (for example, the seaborn package is not available in the standard build, and network access to CSV data won’t work), but quite a bit will work fine. This will run a little more slowly than native Python installed on your computer, but it’s a promising way to run Python notebooks without any local installation needed, and without depending on cloud services.
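If a notebook needs to behave differently in the browser (for example, to skip a step that downloads CSV data), it can check where it is running. The sketch below relies on the fact that Pyodide is built with Emscripten and reports "emscripten" as its platform. The function name running_in_pyodide is our own, and other Emscripten-based Python builds would also match this check.

```python
import sys


def running_in_pyodide():
    """Return True when the interpreter appears to be Pyodide.

    Pyodide is built with Emscripten, so sys.platform is "emscripten".
    """
    return sys.platform == "emscripten"


if running_in_pyodide():
    print("Running in the browser: some packages and network access may be unavailable.")
else:
    print("Running a native Python installation.")
```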

More information

There are many online courses on data analysis using Python, such as the edX introductory course Analyzing data with Python, run by IBM. There are also numerous online resources on scientific computing using Python and the NumPy ecosystem. For scientific visualization using the matplotlib library, don’t miss the amazing free book Scientific Visualization: Python + Matplotlib by Nicolas Rougier.