How to check for a good regression model by R-squared?

Sanchita Paul
Nov 11, 2020

The smaller the regression error, the better the regression.

Mathematically speaking,

R2 = SSR / SST = 1 − SSE / SST

where SST = total variability (total sum of squares), SSR = explained variability (regression sum of squares) and SSE = unexplained variability (error sum of squares).
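To make the three quantities concrete, here is a minimal sketch that computes them for a straight-line fit. The data values and the helper name `sums_of_squares` are made up for illustration, and the fit uses NumPy's `polyfit`; it assumes an ordinary least squares line with an intercept, which is the setting where SST splits exactly into SSR and SSE.

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_mean = y.mean()
    sst = np.sum((y - y_mean) ** 2)      # total variability
    ssr = np.sum((y_hat - y_mean) ** 2)  # explained variability
    sse = np.sum((y - y_hat) ** 2)       # unexplained variability
    return sst, ssr, sse

# Hypothetical toy data, invented for this example.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares straight line: y_hat = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

sst, ssr, sse = sums_of_squares(y, y_hat)
r2 = 1.0 - sse / sst
print(f"SST={sst:.3f}  SSR={ssr:.3f}  SSE={sse:.3f}  R2={r2:.4f}")
```

For this nearly linear data almost all of SST ends up in SSR, so R2 comes out close to 1.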

The relationship between SST, SSR and SSE is SST = SSR + SSE: the total variability splits into an explained part and an unexplained part.

If SSR = SST, the regression model captures all the observed variability and is perfect.

For a given total variability, the lower the error, the greater the explanatory power of the regression; the higher the error, the less powerful the regression.

R2 is an intuitive and practical tool that helps statisticians understand the variability of data.

R2 describes the proportion of variance of the dependent variable explained by the regression model. If the regression model is “perfect”, SSE is zero, and R2 is 1. If the regression model is a total failure, SSE is equal to SST, no variance is explained by regression, and R2 is zero.
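The two extreme cases can be checked with a few lines of code. This is only a sketch: the observations are invented, `r_squared` is a hypothetical helper, and the "total failure" model is assumed to be one that always predicts the mean of the data.

```python
import numpy as np

def r_squared(y, y_hat):
    """R2 = 1 - SSE / SST for observed y and predictions y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)     # unexplained variability
    sst = np.sum((y - y.mean()) ** 2)  # total variability
    return 1.0 - sse / sst

y = np.array([3.0, 5.0, 7.0, 9.0])  # hypothetical observations

# "Perfect" model: predictions equal the observations, so SSE = 0.
perfect = r_squared(y, y)

# "Total failure": always predict the mean, so SSE = SST.
failure = r_squared(y, np.full_like(y, y.mean()))

print(perfect, failure)
```

The first value is 1 and the second is 0, matching the two cases described above.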

Logically, an R2 of 0 means the model explains none of the variability, and an R2 of 1 means it explains all of it. Both extremes are rare; in practice R2 usually falls between 0.2 and 0.9.

What is a good R2? Well, there is no rule of thumb to explain that.

In physics and chemistry, scientists consider 0.7–0.99 a good R2 value, while in the social sciences even 0.2 can be considered fantastic, given the numerous factors that influence the outcome.

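To build a good model, we should include more relevant factors, which increases R-squared and reduces the error. Keep in mind that in ordinary least squares R-squared never decreases when a factor is added, even an irrelevant one, so the added factors should be ones that genuinely help explain the outcome.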
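The effect of adding factors can be illustrated on synthetic data. The sketch below is built on assumptions: the data are randomly generated with a fixed seed, the outcome depends on two factors by construction, and `ols_r_squared` is a hypothetical helper that fits ordinary least squares with NumPy's `lstsq`. It compares R2 with one factor, with both relevant factors, and with an extra column that is unrelated to the outcome.

```python
import numpy as np

def ols_r_squared(X, y):
    """In-sample R2 of an ordinary least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares coefficients
    y_hat = A @ coef
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
unrelated = rng.normal(size=n)  # plays no part in generating y
y = 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

r2_one = ols_r_squared(x1.reshape(-1, 1), y)
r2_two = ols_r_squared(np.column_stack([x1, x2]), y)
r2_three = ols_r_squared(np.column_stack([x1, x2, unrelated]), y)

print(f"x1 only:           {r2_one:.4f}")
print(f"x1 and x2:         {r2_two:.4f}")
print(f"x1, x2, unrelated: {r2_three:.4f}")
```

Adding the second relevant factor raises R2 noticeably, while the unrelated column can only nudge it up by a small amount. This is why a higher R2 on its own does not show that an added factor is useful.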

Sanchita Paul

Hi, I am Sanchita: an engineer, a math enthusiast, an AlmaBetter Data Science trainee and a writer at Analytics Vidhya.