- Sheikh Aman

# Top 7 regression techniques in machine learning you should know.

Updated: Aug 10, 2020

**Topics you will learn.**

**What is regression analysis?**

**Why do we use regression analysis?**

**How many types of regression techniques do we have?**

Linear Regression

Logistic Regression

Polynomial Regression

Stepwise Regression

Ridge Regression

Lasso Regression

ElasticNet Regression

**What is Regression Analysis?**

Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. This technique is used for forecasting, time series modelling and finding the cause-and-effect relationship between the variables. For example, the relationship between rash driving and the number of road accidents by a driver is best studied through regression.

Regression analysis is an important tool for modelling and analyzing data. Here, we fit a curve or line to the data points in such a way that the distances of the data points from the curve or line are minimized. I’ll explain this in more detail in the coming sections.

**Why do we use Regression Analysis?**

As mentioned above, it estimates the relationship between two or more variables. Let’s understand this with a simple example:

Let’s say you want to estimate the growth in sales of a company based on current economic conditions. You have recent company data which indicates that the growth in sales is around two and a half times the growth in the economy. Using this insight, we can predict the future sales of the company based on current and past information.

There are multiple benefits of using regression analysis. They are as follows:

It indicates the significant relationships between the dependent variable and the independent variables.

It indicates the strength of the impact of multiple independent variables on a dependent variable.

It also helps us compare the effects of variables measured on different scales, such as the effect of price changes and the number of promotional activities. These benefits help market researchers, data analysts and data scientists to eliminate and evaluate the best set of variables to be used for building predictive models.

**How many types of regression techniques do we have?**

There are various kinds of regression techniques available to make predictions. These techniques are mostly driven by three metrics (number of independent variables, type of dependent variable and shape of the regression line). We’ll discuss them in detail in the following sections.

**Regression types in machine learning.**

For the creative ones, you can even cook up new regressions if you feel the need to use a combination of the parameters above which people haven’t used before. But before you start that, let us understand the most commonly used regressions:

## 1. Linear Regression

It is one of the most widely known modelling techniques. Linear regression is usually among the first few topics that people pick while learning predictive modelling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.

Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line).

It is represented by the equation Y=a+b*X + e, where a is the intercept, b is the slope of the line and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s).

**How to obtain the best-fit line (values of a and b)?**

This task can be easily accomplished by the Least Squares Method. It is the most common method used for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are first squared, when added, there is no cancelling out between positive and negative values.
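As a minimal sketch of the least squares idea for a single predictor, the slope and intercept can be computed in closed form: b is the covariance of X and Y divided by the variance of X, and a follows from the means. The function name `fit_line` and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up data that roughly follows y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"sum of squared residuals = {sse:.4f}")
```

The residuals are the vertical deviations mentioned above. Any other choice of a and b would give a larger sum of squared residuals on this data.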

**Important Points:**

There must be a linear relationship between independent and dependent variables.

Multiple regression can suffer from multicollinearity, autocorrelation and heteroskedasticity.

Linear regression is very sensitive to outliers. They can badly distort the regression line and eventually the forecasted values.

Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable.

In the case of multiple independent variables, we can go with forward selection, backward elimination or the stepwise approach for selecting the most significant independent variables.

## 2. Logistic Regression

Logistic regression is used to find the probability of event=Success and event=Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the value of Y ranges from 0 to 1 and it can be represented by the following equation.

```
odds = p / (1-p) = probability of event occurring / probability of event not occurring
ln(odds) = ln(p/(1-p))
logit(p) = ln(p/(1-p)) = b0+b1X1+b2X2+b3X3....+bkXk
```

Above, p is the probability of the presence of the characteristic of interest. A question that you should ask here is “why have we used log in the equation?”.

Since we are working here with a Bernoulli distribution (dependent variable), we need to choose a link function which is best suited to this distribution. And that is the logit function. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values rather than to minimize the sum of squared errors (as in ordinary regression).
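The sketch below shows the logit link and a likelihood-maximizing fit for one predictor. It is an illustration under stated assumptions rather than how any particular library does it: the optimizer is plain gradient ascent, and the function names, learning rate, step count and data are all invented for the example.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps any real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of labels ys under logit(p) = b0 + b1*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Maximize the log-likelihood with plain gradient ascent (an assumed optimizer)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Made-up data: hours studied vs. pass (1) / fail (0); the classes overlap.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"log-likelihood at start: {log_likelihood(0.0, 0.0, xs, ys):.3f}")
print(f"log-likelihood after fit: {log_likelihood(b0, b1, xs, ys):.3f}")
print(f"predicted P(pass | 3 hours) = {sigmoid(b0 + b1 * 3.0):.3f}")
```

Because the model is linear in the log-odds, `sigmoid` is needed to turn the linear score back into a probability between 0 and 1.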

**Important Points:**

Logistic regression is widely used for classification problems.

Logistic regression doesn’t require a linear relationship between dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds ratio.

To avoid overfitting and underfitting, we should include all significant variables. A good approach to ensure this is to use a stepwise method to estimate the logistic regression.

The independent variables should not be correlated with each other, i.e. there should be no multicollinearity. However, we have the option to include interaction effects of categorical variables in the analysis and in the model.

If the values of the dependent variable are ordinal, then it is called ordinal logistic regression.

If the dependent variable is multi-class, then it is known as multinomial logistic regression.

## 3. Polynomial Regression

A regression equation is a polynomial regression equation if the power of the independent variable is more than 1. The equation below represents a polynomial equation:

`y=a+b*x^2`

In this regression technique, the best-fit line is not a straight line. It is rather a curve that fits the data points.

**Important Points:**

While there might be a temptation to fit a higher-degree polynomial to get a lower error, this can result in over-fitting. Always plot the relationships to see the fit and focus on making sure that the curve fits the nature of the problem.

Especially look out for the curve towards the ends and see whether those shapes and trends make sense. Higher-degree polynomials can end up producing weird results on extrapolation.
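To see the over-fitting warning in numbers, the sketch below fits polynomials of degree 1, 2 and 5 to synthetic data generated from a quadratic like the one above, then compares the training error with a prediction outside the training range. The data, noise level and the point `x_outside` are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 8)
# Synthetic data: a true quadratic y = 1 + 0.5*x^2 plus random noise.
y = 1.0 + 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

x_outside = 7.0  # a point beyond the range the models were fitted on
true_value = 1.0 + 0.5 * x_outside**2

train_sse = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    fitted = np.polyval(coeffs, x)
    train_sse[degree] = float(np.sum((y - fitted) ** 2))
    prediction = np.polyval(coeffs, x_outside)
    print(f"degree {degree}: training SSE = {train_sse[degree]:.3f}, "
          f"prediction at x = {x_outside} is {prediction:.2f} "
          f"(noise-free value {true_value:.2f})")
```

The training error can only go down as the degree rises, because each higher-degree model contains the lower-degree ones. That is why a low training error alone says little about how the curve behaves away from the data.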

## 4. Stepwise Regression

This form of regression is used when we deal with multiple independent variables. In this technique, the selection of independent variables is done with the help of an automatic process, which involves no human intervention.

This feat is achieved by observing statistical values like AIC, t-stats and the R-square metric to discern significant variables. Stepwise regression basically fits the regression model by adding or dropping covariates one at a time based on a specified criterion. Some of the most commonly used stepwise regression methods are listed below:

Standard stepwise regression does two things. It adds and removes predictors as needed for each step.

Forward selection starts with the most significant predictor in the model and adds a variable for each step.

Backward elimination starts with all predictors in the model and removes the least significant variable for each step.

The aim of this modelling technique is to maximize the prediction power with a minimum number of predictor variables. It is one of the methods for handling higher dimensionality of a data set.
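Forward selection can be sketched in a few lines. The version below is a simplified illustration, not a standard implementation: instead of AIC or t-stats it uses an assumed stopping rule (stop when the best candidate lowers the residual sum of squares by less than 10%), and the helper names and synthetic data are hypothetical.

```python
import numpy as np


def rss(X, y, columns):
    """Residual sum of squares of an OLS fit on the given columns plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ beta) ** 2))


def forward_selection(X, y, min_improvement=0.1):
    """Greedily add the column that lowers RSS most; stop when the relative gain is small."""
    selected = []
    remaining = list(range(X.shape[1]))
    current = rss(X, y, selected)
    while remaining:
        scores = {j: rss(X, y, selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        # Assumed criterion: require at least a 10% drop in RSS to keep going.
        if current - scores[best] < min_improvement * current:
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


# Synthetic data: only columns 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

print("selected columns:", forward_selection(X, y))
```

Backward elimination would run the same loop in reverse, starting from all columns and dropping the one whose removal hurts the fit least.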

## 5. Ridge Regression

Ridge regression is a technique used when the data suffers from multicollinearity (independent variables are highly correlated). In multicollinearity, even though the least squares estimates (OLS) are unbiased, their variances are large, which can push the estimated values far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.

Above, we saw the equation for linear regression. Remember? It can be represented as:

`Y=a+ b*X`

This equation also has an error term. The complete equation becomes:

```
Y = a + b*X + e (error term), [the error term is the value needed to correct for a prediction error between the observed and predicted value]
=> Y = a + b1*X1 + b2*X2 + .... + e, for multiple independent variables.
```

In a linear equation, prediction errors can be decomposed into two sub-components. The first is due to bias and the second is due to variance. Prediction error can occur due to either one of these two components or both. Here, we’ll discuss the error caused due to variance.

Ridge regression solves the multicollinearity problem through a shrinkage parameter called lambda (λ). Look at the equation below.

`minimize over β: sum (y - X*β)^2 + λ * sum β^2`

In this equation, we have two components. The first one is the least-squares term and the other one is lambda times the summation of β² (beta squared), where β is the coefficient. This is added to the least-squares term in order to shrink the parameters so that they have a very low variance.
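The effect of λ can be checked numerically. The sketch below minimizes the two-term objective described above (least-squares term plus λ times the sum of squared coefficients) using its closed-form solution. It assumes centered data with no intercept, and the function name `ridge_fit` and the synthetic, nearly collinear data are invented for the example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X'X + lam*I)^-1 X'y for centered data, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)   # almost a copy of x1: multicollinearity
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)
y = x1 + x2 + rng.normal(scale=0.5, size=100)
y = y - y.mean()

for lam in (0.0, 1.0, 10.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda = {lam:>4}: coefficients = {np.round(beta, 3)}, "
          f"norm = {np.linalg.norm(beta):.3f}")
```

With λ = 0 this is ordinary least squares, and the two nearly identical predictors can receive large, offsetting coefficients. As λ grows, the overall size of the coefficient vector shrinks, which is the bias-for-variance trade described above.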

**Important Points:**

The assumptions of this regression are the same as those of least squares regression, except that normality is not to be assumed.

Ridge regression shrinks the value of the coefficients but they never reach zero, which means it does no feature selection.

This method uses __L2 regularization__.

## 6. Lasso Regression

Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models. Look at the equation below:

`minimize over β: sum (y - X*β)^2 + λ * sum |β|`

Lasso regression differs from ridge regression in that it uses absolute values in the penalty function instead of squares. This penalty causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards absolute zero. This results in variable selection out of the given n variables.
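The difference between the two penalties is easiest to see in a simplified special case. Assuming orthonormal predictors, each coefficient can be shrunk on its own: for a least-squares coefficient b, ridge minimizes (β - b)² + λβ² and lasso minimizes (β - b)² + λ|β|. The function names and the list of starting coefficients below are hypothetical.

```python
def ridge_shrink(b, lam):
    """Minimizer of (beta - b)**2 + lam * beta**2: proportional shrinkage."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Minimizer of (beta - b)**2 + lam * abs(beta): soft-thresholding at lam / 2."""
    threshold = lam / 2.0
    if b > threshold:
        return b - threshold
    if b < -threshold:
        return b + threshold
    return 0.0


ols = [3.0, -1.5, 0.4, -0.2]   # hypothetical least-squares coefficients
lam = 1.0

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print("ridge:", ridge)   # ridge: [1.5, -0.75, 0.2, -0.1]
print("lasso:", lasso)   # lasso: [2.5, -1.0, 0.0, 0.0]
```

Ridge scales every coefficient down but leaves all of them non-zero, while lasso subtracts a fixed amount and sets the small ones to exactly zero. That is where the variable selection comes from.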

**Important Points:**

The assumptions of lasso regression are the same as those of least squares regression, except that normality is not to be assumed.

This regression shrinks coefficients to zero (exactly zero), which certainly helps in feature selection.

Lasso is a regularization method and uses __L1 regularization__.

If a group of predictors is highly correlated, lasso picks just one of them and shrinks the others to zero.

## 7. ElasticNet Regression

This regression technique is a hybrid of the Lasso and Ridge regression techniques. It is trained with both L1 and L2 priors as regularizers. Elastic-net is useful when there are multiple features which are correlated. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.

A practical advantage of trading off between Lasso and Ridge regression is that it allows Elastic-Net to inherit some of Ridge’s stability under rotation.
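The "pick one versus pick both" behaviour can be reproduced on a toy problem with two identical features. The sketch below is a bare-bones coordinate descent, written only for illustration. The scaling of the objective (given in the docstring), the parameter names `lam` and `l1_ratio`, and the data are all assumptions for this example.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning 0 inside [-threshold, threshold]."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net(X, y, lam, l1_ratio, sweeps=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(l1_ratio*||b||_1 + 0.5*(1 - l1_ratio)*||b||_2^2).
    l1_ratio = 1 gives the lasso; columns of X are assumed centered."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(sweeps):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            scale = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, lam * l1_ratio) / (scale + lam * (1.0 - l1_ratio))
    return beta


rng = np.random.default_rng(0)
x = rng.normal(size=200)
x = (x - x.mean()) / x.std()
X = np.column_stack([x, x])      # two perfectly correlated features
y = 2.0 * x

lasso_beta = elastic_net(X, y, lam=0.5, l1_ratio=1.0)
enet_beta = elastic_net(X, y, lam=0.5, l1_ratio=0.5)
print("lasso      :", np.round(lasso_beta, 3))
print("elastic net:", np.round(enet_beta, 3))
```

In this setup the pure L1 penalty keeps the first copy of the feature and zeroes the second. Which copy survives depends on the update order. Adding the L2 term makes the two copies share the weight equally, which is the group effect.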

**Important Points:**

It encourages a group effect in the case of highly correlated variables.

There are no limitations on the number of selected variables.

It can suffer from double shrinkage.

Beyond these 7 most commonly used regression techniques, you can also look at other models like Bayesian, Ecological and Robust regression.
