Linear regression analysis
Predicting future performance and explaining observations
Linear regression analysis means “fitting a straight line to data”. It is a widely used technique for modelling and understanding real-world phenomena, and it is easy to apply and to understand intuitively. It also allows you to predict future outputs of the phenomenon you are modelling. Learn how to use plots for exploratory data analysis to determine whether a linear model might be suitable for your data, and how to build univariate and multivariate linear models using the Python statsmodels library.
We also present a number of possible pitfalls when using linear regression, including sample size issues, treatment of outliers and order of effect problems.
This submodule is a part of the risk analysis module.
Regression analysis using Python
Python notebook on regression analysis of the health impact of smoking
Python notebook on regression analysis of a combined cycle power plant
Python notebook with bootstrap regression of helmet performance data
In these course materials, applications are presented using the NumPy, SciPy and statsmodels libraries for the Python programming language. We have some material on getting started with Python that explains how to install Python on your computer or try out our computational notebooks using free online services.
We recommend the following sources of further information on this topic:
Application of multivariate linear regression to predict esophagus cancer, using Python
The Stanford Online (via edX) class on Statistical Learning, which introduces supervised learning with a focus on regression and classification methods
Khan Academy material on regression
edX course The Analytics Edge from MIT
The textbook Regression and Other Stories by Andrew Gelman, Jennifer Hill and Aki Vehtari (Cambridge University Press, 2020), which goes into considerable detail on the difficulties of building regression models on real-world data, including examples dealing with risk. A PDF version of the book can be downloaded for free.
The online, open-access textbook Forecasting: principles and practice (uses R rather than Python)