Getting started with Python
Some of the quantitative parts of the materials on this site include example code in the Python programming language. Why Python? For the same reasons that explain its widespread use in industry and academia:
Python is designed to be easy to read and to minimize the time needed to write programs. Another way of thinking about this is that many programming languages provide structure and formalisms aiming to prevent you from writing very poor programs, whereas Python aims to get out of your way and let you express your intentions.
It’s a real, general-purpose programming language, unlike alternative software packages that are also used for data analysis, such as R and Matlab.
You can install Python for free on your own computer, or run it “in the cloud” (see below). You don’t need a licence for proprietary and expensive software packages.
Most of our examples are distributed as Python/Jupyter notebooks, which are an increasingly popular method of publishing demonstrations and executable examples that you can try out on your own machine. They are live computational documents which allow the embedding of rich media (anything that a web browser can display). You can modify the examples and check the impact of your changes (the famous philosopher and educational reformer John Dewey wrote of “experimental doing for the sake of knowing”).
Alongside the Python programming language, our examples use a number of libraries that implement a wide range of functionality for data analysis and statistical processing:
the NumPy and SciPy libraries for numerical and scientific computing
statsmodels with various statistical models
Pandas for data processing
matplotlib and seaborn for plotting and graphical displays
SymPy for symbolic processing
Installing Python on your computer
If your computer runs Microsoft Windows, install one of
If your computer runs MacOS, install one of
If your computer runs Linux (great choice!), the Python packages available via your distribution should work fine. You’ll want the packages named python3, numpy (or perhaps something like python3-numpy), matplotlib, scipy, statsmodels and sympy.
You can run Python via various graphical user interfaces, from the shell/command line, or via a notebook interface (for this, you need to start Jupyter using a command such as jupyter notebook in your shell, or some menu entry in the graphical user interface).
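Once Python is installed, a quick way to confirm that the libraries are available is to ask Python whether it can locate each one. The following is a minimal sketch using only the standard library; the helper name find_missing and the PACKAGES list are illustrative choices, and the list assumes that each library's import name matches the name used on this page.

```python
import importlib.util

# Import names of the libraries mentioned on this page (assumed to match
# the names used in `import` statements).
PACKAGES = ["numpy", "scipy", "matplotlib", "statsmodels", "sympy", "pandas", "seaborn"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located by the import system."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages found.")
```

You can save this to a file and run it with python3, or paste it into a notebook cell. Any name it reports as missing still needs to be installed with your usual package manager.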
Running Python notebooks in the cloud
You can also run Python notebooks on computers “in the cloud”, with some services kindly offered for free by a number of organizations. In this way, you don’t need to install any software on your computer (or your tablet); you simply access the notebooks via a web browser. Notebooks are live computational documents, great for “experimenting” with your analysis and models to test your understanding.
Some services that are available for free as of October 2024 (most of these will require you to create an account):
JupyterHub/MyBinder, run by the non-profit Jupyter project with computing resources provided by sponsors.
Google Colaboratory (Colab), run by Google, which allows you to save notebooks and analysis results to your Google Drive. It also provides access to dedicated GPU/TPU hardware for accelerated machine learning.
Deepnote, which allows you to run Jupyter Python and R notebooks in the cloud and includes some collaboration features for working as a team.
Running Python notebooks directly in your web browser
There is experimental support for running Python and most of the scientific libraries we are using (NumPy, SciPy, matplotlib) directly in your web browser, thanks to the Pyodide project. Try it out in your browser with Jupyterlite!
Not all of our notebooks will work with Pyodide as of October 2024 (for example, the seaborn package is not available in the standard build, and network access to CSV data won’t work), but quite a bit will work fine. This will run a little more slowly than native Python installed on your computer, but it’s a promising way to run Python notebooks without any local installation needed, and without depending on cloud services.
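If you want a notebook to behave sensibly in both settings, it can check at run time whether it is executing in the browser. The sketch below relies on the fact that Pyodide is built with Emscripten, so sys.platform reports "emscripten" there; the function name running_in_pyodide is a hypothetical helper, not part of any library.

```python
import sys


def running_in_pyodide():
    """Return True when Python is running as a browser/WebAssembly build.

    Pyodide is compiled with Emscripten, so sys.platform is "emscripten" there.
    """
    return sys.platform == "emscripten"


if running_in_pyodide():
    print("Running in the browser: some packages and network access may be unavailable.")
else:
    print("Running a native Python installation.")
```

A notebook could use such a check to skip cells that download CSV data, or to fall back to plain matplotlib when seaborn is not available.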
More information
There are many online courses on data analysis using Python, such as the edX Analyzing data with Python introductory course run by IBM. There are also numerous online resources on scientific computing using Python and the NumPy ecosystem. Concerning scientific visualization using the matplotlib library, don’t miss the amazing free book Scientific Visualization: Python + Matplotlib by Nicolas Rougier.
To learn Python syntax, check out the book called Think Python by Allen B. Downey, which can be purchased from O’Reilly, or viewed online for free.