Linear regression is one of the most basic methods used to make predictions. It is widely used to model the relationship between variables that can be described by a linear equation. For a single predictor, the equation is as follows,
y = β1*x + β0
In the above equation, ‘y’ is the dependent variable or the value of the outcome, ‘x’ is the independent variable (explanatory variable) or the predictor of the analysis, β0 is the intercept, and β1 is the slope of the line. If there is a single explanatory variable, the model is called simple linear regression; if there is more than one explanatory variable, it is called multiple linear regression. Regression models that depend linearly on the equation’s parameters are simpler to plot and understand than non-linear models.
With more than one explanatory variable, the linear regression equation is represented by y = β0 + β1*x1 + β2*x2 + … + βn*xn,
where ‘y’ is the dependent variable, x1, …, xn are the independent variables, β0 is the intercept, and β1, …, βn are the regression coefficients.
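As a minimal sketch of how the equation with several independent variables is evaluated, the function below adds an intercept β0 (as in the single-variable form) to the weighted sum of the inputs. The function name predict and all numeric values are hypothetical and chosen only to show the arithmetic.

```python
from typing import Sequence


def predict(intercept: float,
            coefficients: Sequence[float],
            features: Sequence[float]) -> float:
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per independent variable")
    # Weighted sum of the independent variables, shifted by the intercept.
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical coefficients and inputs: 1 + 2*4 + 3*5
y_hat = predict(1.0, [2.0, 3.0], [4.0, 5.0])
```

With a single coefficient and a single feature, the same function reduces to the simple linear regression equation y = β1*x + β0.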
The following plot of skin cancer mortality against state latitude shows that the mortality (dependent variable) depends on the state latitude (independent variable) according to the equation y = 389.2 – 5.98 * x. Each data point represents one observed pair of values of the independent and dependent variables. The independent variable is plotted on the x-axis and the dependent variable is plotted on the y-axis. The plot depicts a negative linear relationship between the two variables: at higher latitudes, mortality due to skin cancer tends to be lower than at lower latitudes. The red line is the line that best fits the scatter plot and is known as the regression line.
For the above relationship, we can use the simple linear regression equation to estimate the equation of the regression line. The regression equation describes the dependency of the mortality rate on the state latitude within the data set’s range of values.
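To make the fitted equation concrete, here is a small sketch that evaluates y = 389.2 – 5.98 * x at two latitudes. The coefficients come from the equation quoted above; the function name predicted_mortality and the input latitudes (30 and 40 degrees) are assumptions made for illustration, and the units of mortality are not stated in the text. The results are only meaningful for latitudes inside the range covered by the data set.

```python
INTERCEPT = 389.2  # b0, taken from the regression equation in the text
SLOPE = -5.98      # b1, taken from the regression equation in the text


def predicted_mortality(latitude: float) -> float:
    """Evaluate the fitted regression line at a given state latitude."""
    return INTERCEPT + SLOPE * latitude


# Hypothetical latitudes, used only to show the negative relationship.
lower_latitude = predicted_mortality(30.0)   # about 209.8
higher_latitude = predicted_mortality(40.0)  # about 150.0

# The slope says each extra degree of latitude lowers the prediction by 5.98.
change_per_degree = predicted_mortality(41.0) - predicted_mortality(40.0)
```

The prediction at the higher latitude is the smaller of the two, which matches the negative slope of the regression line.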
Linear regression models are commonly fitted using the Least Squares Method, which can also be used to fit non-linear models. The method finds the best-fit line for a data set by minimizing the sum of the squared errors, that is, the squared differences between the observed values and the values predicted by the equation.
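For simple linear regression, the least squares solution has a well-known closed form: the slope is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and the intercept follows from the two means. The sketch below implements that formula with the standard library only. The function names fit_line and sum_squared_errors and the sample data are hypothetical, not taken from the skin cancer data set.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least squares line y = b0 + b1*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sum_squared_errors(xs: Sequence[float], ys: Sequence[float],
                       intercept: float, slope: float) -> float:
    """Sum of squared differences between observed and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical, slightly noisy data used only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 8.0]
b0, b1 = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, b0, b1)
```

Any other choice of intercept and slope gives a sum of squared errors at least as large as best_sse, which is what "best fit" means in the least squares sense.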