Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. The conditional mean of the response given the values of the explanatory variables is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors.
Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine. Linear regression has many practical uses. If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of response and explanatory variable values. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to predict the response. If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.
Common types of regression analysis include the following:
Simple linear regression: 1 dependent variable (interval or ratio), 1 independent variable (interval or ratio or dichotomous)
Multiple linear regression: 1 dependent variable (interval or ratio), 2+ independent variables (interval or ratio or dichotomous)
Logistic regression: 1 dependent variable (dichotomous), 2+ independent variables (interval or ratio or dichotomous)
Ordinal regression: 1 dependent variable (ordinal), 1+ independent variable(s) (nominal or dichotomous)
Multinomial regression: 1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio or dichotomous)
Discriminant analysis: 1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)
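To make the fitting-and-prediction workflow described above concrete, here is a minimal sketch in Python. It assumes NumPy and scikit-learn are available; the synthetic data, coefficient values, and new observations are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: the response depends linearly on two explanatory variables plus noise.
n = 100
X = rng.normal(size=(n, 2))                      # two independent variables
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit a multiple linear regression model (ordinary least squares under the hood).
model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)

# Predict the response for new values of the explanatory variables
# collected without an accompanying response value.
X_new = np.array([[0.2, -1.0],
                  [1.0,  0.5]])
print("predictions:", model.predict(X_new))
```

With a single column in X the same code performs simple linear regression; with two or more columns it performs multiple linear regression, matching the first two entries in the list above.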
When selecting the model for the analysis, an important consideration is model fit. Adding independent variables to a linear regression model will never decrease, and will almost always increase, the explained variance (R-squared) of the model. However, adding too many variables can cause overfitting, which reduces the model's ability to generalize beyond the data it was fit on. Statistically, if a model includes a large number of variables, some of them will appear statistically significant due to chance alone.
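The following sketch illustrates this point; it again assumes scikit-learn is available and uses synthetic data in which only one predictor is actually related to the response. In-sample R-squared never decreases as predictors are added, but performance on held-out data eventually deteriorates, which is the signature of overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: only the first predictor actually matters.
n = 60
X_relevant = rng.normal(size=(n, 1))
X_noise = rng.normal(size=(n, 20))               # irrelevant predictors
y = 2.0 * X_relevant[:, 0] + rng.normal(size=n)

X_all = np.hstack([X_relevant, X_noise])
X_train, X_test = X_all[:40], X_all[40:]
y_train, y_test = y[:40], y[40:]

# In-sample R^2 never decreases as we add predictors,
# but out-of-sample R^2 eventually drops: that gap is overfitting.
for k in (1, 5, 10, 21):
    model = LinearRegression().fit(X_train[:, :k], y_train)
    print(f"{k:2d} predictors | train R^2 = {model.score(X_train[:, :k], y_train):.3f}"
          f" | test R^2 = {model.score(X_test[:, :k], y_test):.3f}")
```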
Various estimation methods can be used to determine the parameters of a linear regression model; one of the most commonly used is ordinary least squares. In statistics, ordinary least squares is a type of linear least squares method for estimating the unknown parameters in a linear regression model. It estimates the parameters by minimizing the sum of the squared residuals, that is, the sum of the squared differences between the observed values and the corresponding fitted values. Geometrically, with one explanatory variable this amounts to drawing the line through the data points for which that sum is smallest; with several explanatory variables, the fitted object is a hyperplane rather than a line.
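The sketch below, using only NumPy and synthetic data, estimates a simple linear regression by solving the normal equations, which yield the coefficient vector that minimizes the sum of squared residuals. The perturbed fit at the end is just a numerical check that any other line produces a larger sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(2)

# One explanatory variable; fit y = b0 + b1 * x by ordinary least squares.
n = 50
x = rng.uniform(0, 10, size=n)
y = 4.0 + 0.8 * x + rng.normal(scale=1.0, size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])

# OLS estimate: the coefficient vector minimizing sum((y - X @ b) ** 2).
# Solving the normal equations (X'X) b = X'y gives that minimizer.
beta = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta
print("estimated intercept and slope:", beta)
print("sum of squared residuals:", np.sum(residuals ** 2))

# Any other candidate line has a larger sum of squared residuals.
beta_other = beta + np.array([0.0, 0.1])
print("SSR for a perturbed slope:", np.sum((y - X @ beta_other) ** 2))
```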
Like many statistical analyses, ordinary least squares (OLS) regression has underlying assumptions. When these classical assumptions for linear regression hold, ordinary least squares produces the best estimates, in the sense that the coefficient estimates are unbiased and have the smallest variance among linear unbiased estimators (the Gauss-Markov theorem). However, if some of these assumptions are not true, you might need to employ remedial measures or use other estimation methods to improve the results. There are six such assumptions for linear regression.
OLS Assumption 1: The regression model is linear in the coefficients and the error term.
This assumption addresses the functional form of the model. In statistics, a regression model is linear when all terms in the model are either the constant or a parameter multiplied by an independent variable. You build the model equation only by adding the terms together.
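A brief illustrative sketch (NumPy only, synthetic data): a model with a squared term is still linear in the coefficients, because each term is a parameter multiplied by a known quantity and the terms are only added together, so OLS still applies even though the fitted curve is not a straight line in x.

```python
import numpy as np

rng = np.random.default_rng(3)

# A curved relationship can still satisfy Assumption 1:
# y = b0 + b1*x + b2*x**2 is linear in the coefficients b0, b1, b2,
# even though it is not a straight line in x.
x = rng.uniform(-3, 3, size=80)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

X = np.column_stack([np.ones_like(x), x, x**2])   # terms are added together
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients:", beta)

# By contrast, a model such as y = b0 + exp(b1 * x) is not linear in b1,
# so it falls outside this assumption and OLS does not apply directly.
```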
OLS Assumption 2: The error term has a population mean of zero.
The error term accounts for the variation in the dependent variable that the independent variables do not explain. Random chance should determine the values of the error term. For your model to be unbiased, the average value of the error term must equal zero.
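As a quick numerical check (NumPy only, synthetic data): when the fitted model includes a constant term, the OLS residuals average essentially zero in the sample, mirroring this assumption about the error term.

```python
import numpy as np

rng = np.random.default_rng(4)

# When the model includes a constant term, OLS forces the sample residuals
# to average (essentially) zero, mirroring the assumption that the error term
# has a population mean of zero.
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

X = np.column_stack([np.ones_like(x), x])        # constant term included
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("mean of residuals:", residuals.mean())    # ~0 up to floating-point error
```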
OLS Assumption 3: All independent variables are uncorrelated with the error term.