Regression to the mean (RTM) is a statistical phenomenon wherein, if a sample of a random variable is extreme, the next sample is expected to be closer to the mean. This project aimed to analyse the impacts of RTM to better understand the bias it creates.
The dataset used was collected for a study exploring childhood obesity and heart disease. The selection criteria meant the initial weights of the participants were extreme. By mathematically describing the effect of RTM, evidence was found suggesting that RTM was acting on the predictor independently of any external factor.
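As an illustration of what such a mathematical description can look like, the standard textbook result is given here under assumptions that are not stated in the source: the baseline measurement $X_1$ and follow-up measurement $X_2$ are taken to be bivariate normal with common mean $\mu$, common standard deviation $\sigma$ and correlation $\rho$, and participants are selected when $X_1 > c$. The symbols $\mu$, $\sigma$, $\rho$ and $c$ are introduced for this sketch only and may differ from the formulation used in the project.

```latex
% Expected follow-up value given an observed baseline value x
\mathbb{E}[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu)

% Expected drop between baseline and follow-up in a sample truncated at c,
% with z = (c - \mu)/\sigma, and \phi, \Phi the standard normal pdf and cdf
\mathbb{E}[X_1 - X_2 \mid X_1 > c] = (1 - \rho)\,\sigma\,\frac{\phi(z)}{1 - \Phi(z)}
```

Under these assumptions the expected drop is zero only when $\rho = 1$, and it grows as the cutoff $c$ moves further into the tail, which is why a sample selected for extreme initial weights is expected to show RTM even without any intervention.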
To analyse the effects of RTM on model outputs, new samples were simulated from the parameters of the original data, and varying levels of RTM were then artificially induced in these samples.
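The simulation idea can be sketched roughly as follows. This is a minimal illustration and not the project's actual procedure: it assumes a bivariate normal model for baseline and follow-up measurements and induces RTM by truncating on the baseline value. The function names `simulate_truncated_sample` and `mean_drop`, and all parameter values, are hypothetical and are not taken from the study data.

```python
import numpy as np


def simulate_truncated_sample(n, mean, sd, rho, cutoff, seed=0):
    """Draw baseline/follow-up pairs and keep only extreme baselines.

    Assumption (not from the source): both measurements are bivariate
    normal with the same mean and sd, and correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = sd**2 * np.array([[1.0, rho], [rho, 1.0]])
    draws = rng.multivariate_normal([mean, mean], cov, size=n)
    baseline, follow_up = draws[:, 0], draws[:, 1]
    # Truncation on the baseline is what induces RTM in the kept sample.
    keep = baseline > cutoff
    return baseline[keep], follow_up[keep]


def mean_drop(baseline, follow_up):
    """Average fall from baseline to follow-up; positive values indicate RTM."""
    return float(np.mean(baseline - follow_up))


if __name__ == "__main__":
    # Hypothetical parameters: a higher cutoff means a more extreme sample
    # and therefore a stronger RTM effect.
    for cutoff in (0.0, 1.0, 2.0):
        b, f = simulate_truncated_sample(
            n=200_000, mean=0.0, sd=1.0, rho=0.5, cutoff=cutoff
        )
        print(f"cutoff={cutoff:.1f}  kept={b.size}  mean drop={mean_drop(b, f):.3f}")
```

In this sketch the strength of RTM is controlled by two knobs, the truncation cutoff and the correlation between the two measurements. A model could then be refitted on each truncated sample to compare how its coefficients are distributed across levels of RTM.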
It was found that, when a sample was more heavily affected by RTM, the distribution of its model coefficients was more spread out. This effect was stronger in models that contained interaction terms and varied with the choice of variables. Thus, considering model complexity and variable choice is essential when attempting to minimise the effects of RTM.
These findings have implications for any study that involves truncated samples. An important area for future research is to use these results in tandem with existing literature to determine methods of minimising the effects of RTM when analysing data from such studies.