Logistic Regression in Machine Learning | Easy explanation with code
Updated: Aug 10, 2020
What is Logistic Regression in Machine Learning?
Logistic regression is a technique that machine learning borrowed from statistics. It is mainly used for binary classification, that is, problems with two class values.
Despite its name, it is a classification algorithm, not a regression algorithm.
It predicts discrete values from two classes, such as 0/1, true/false, or yes/no.
It also estimates the probability that an event occurs by fitting the data to the logit function, which is why it is also called logit regression. Because it predicts a probability, its output lies between 0 and 1.
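To make the link between probability and the logit function concrete, here is a minimal sketch in plain Python. The function names `sigmoid` and `logit` and the value 0.8 are chosen only for illustration; they do not come from the article's dataset.

```python
import math

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Logistic function: maps any real number z back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8                      # illustrative probability of the event
odds = p / (1.0 - p)         # odds in favour of the event
log_odds = logit(p)          # the quantity a logistic regression models linearly
recovered = sigmoid(log_odds)  # applying the sigmoid undoes the logit

print("odds:", odds)
print("log-odds:", log_odds)
print("probability recovered from log-odds:", recovered)
```

The model fits a straight line in log-odds space, and the sigmoid converts that line back into a probability between 0 and 1.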
Logistic Regression with example
Suppose there is a system that detects spam email. There are two possible outcomes: YES (the email is spam) or NO (it is not spam).
Here the logistic regression algorithm estimates the probability that an email is spam and then predicts one of the two outcomes: YES if that probability is high, NO otherwise.
There are many other examples out there; this is just a simple one.
Important terms for Logistic Regression
The output is predicted using a non-linear function called Logistic Function.
This function looks like a big "S" and maps every input value into the range between 0 and 1. The figure above shows this shape.
The predictions made by this algorithm can also be read as a probability for the given data, because the raw output lies between 0 and 1 before it is converted into a class label of 0 or 1.
It works better when attributes that are unrelated to the output, and attributes that are very similar to each other, are removed. (This is reflected in the Python code.)
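The S-shaped function and the step from probability to class label can be sketched as follows. This is a small illustration using NumPy; the input scores and the 0.5 threshold are assumptions made for the example, not values taken from the article.

```python
import numpy as np

def logistic(z):
    """Vectorized logistic function, applied element-wise to an array."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative raw scores (any real numbers are allowed as input).
scores = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

probabilities = logistic(scores)               # values squeezed into (0, 1)
labels = (probabilities >= 0.5).astype(int)    # assumed threshold of 0.5

for s, p, label in zip(scores, probabilities, labels):
    print(f"score {s:+.1f} -> probability {p:.3f} -> class {label}")
```

Large negative scores land near 0, large positive scores land near 1, and a score of 0 sits exactly at 0.5, which is where the curve is steepest.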
Logistic Regression cost function
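The cost function usually used for logistic regression is the log loss (binary cross-entropy). The notation below is an assumption, since the article does not define its own symbols: m is the number of training examples, y^(i) is the true label (0 or 1) of example i, and h_θ(x) is the predicted probability.

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
       + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
```

The cost is small when the predicted probability is close to the true label and grows quickly when the model is confident and wrong.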
Logistic Regression in Python
The dataset contains information such as EstimatedSalary, Purchased, UserID, Gender, and Age. We will use it to predict whether a user will buy the company's newly launched product or not.
Logistic Regression Data set
Here I have used this dataset. If you want to try it yourself, the dataset is available for free.
Logistic Regression Python code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Loading dataset – User_Data
dataset = pd.read_csv("../.../User_Data.csv")
Extracting dependent and independent variables
# input
x = dataset.iloc[:, [2, 3]].values
# output
y = dataset.iloc[:, 4].values
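To check which columns those positional indices pick, here is a small sketch on a made-up DataFrame. The rows are invented, and the column order (User ID, Gender, Age, EstimatedSalary, Purchased) is an assumption inferred from the indices used in the code; verify it against your own copy of the file.

```python
import pandas as pd

# Made-up rows; the column order is an assumption, not taken from the real file.
sample = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 1],
})

x = sample.iloc[:, [2, 3]].values   # columns at positions 2 and 3
y = sample.iloc[:, 4].values        # column at position 4

selected = list(sample.columns[[2, 3]])
print("input columns:", selected)
print("output column:", sample.columns[4])
print("x shape:", x.shape, "y shape:", y.shape)
```

Under that assumed order, the inputs are Age and EstimatedSalary and the output is Purchased, while User ID and Gender are left out.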
Splitting Dataset into Train and Test
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
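As a quick illustration of what `test_size=0.25` does, the sketch below splits a tiny made-up array of eight rows. The data is invented for the example; only the `train_test_split` call mirrors the one above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Eight made-up samples with two features each.
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# A quarter of the rows go to the test set; random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

print("train rows:", X_tr.shape[0])
print("test rows:", X_te.shape[0])
```

With eight rows, six end up in the training set and two in the test set.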
Scaling the features
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
# Fit the scaler on the training set only, then reuse it on the test set
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
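The reason for calling `fit_transform` on the training set but only `transform` on the test set can be seen in a small sketch. The age and salary values below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up age / salary pairs, not taken from the real dataset.
train = np.array([[19, 19000], [35, 20000], [26, 43000], [47, 25000]], dtype=float)
test = np.array([[30, 87000]], dtype=float)

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from training data
test_scaled = scaler.transform(test)        # applies the same mean and std

print("train mean per column:", train_scaled.mean(axis=0))
print("train std per column:", train_scaled.std(axis=0))
print("scaled test row:", test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, so age and salary contribute on a comparable scale. The test row is shifted and scaled with the training statistics, which keeps information about the test set out of the fitting step.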
Fitting Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
Predicting the test set
y_pred = classifier.predict(x_test)
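Besides class labels, a fitted scikit-learn `LogisticRegression` can also return the underlying probabilities through `predict_proba`. The sketch below uses a tiny synthetic one-feature dataset made up for illustration; the variable names are not from the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: larger feature values belong to class 1.
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression(random_state=0)
model.fit(X, y)

X_new = np.array([[-1.8], [0.1], [1.7]])
proba = model.predict_proba(X_new)[:, 1]   # estimated probability of class 1
labels = model.predict(X_new)              # class label for each new sample

for value, p, label in zip(X_new.ravel(), proba, labels):
    print(f"x = {value:+.1f} -> P(class 1) = {p:.3f} -> predicted class {label}")
```

For a binary problem, `predict` returns class 1 whenever the estimated probability of class 1 is above 0.5.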
Creating Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix : \n", cm)
from sklearn.metrics import accuracy_score
print("Accuracy : ", accuracy_score(y_test, y_pred))
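The accuracy score can be recovered directly from the confusion matrix, which helps when reading the two outputs side by side. The labels below are made up for illustration and are not results from the article's model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_hat)
# For binary labels 0/1, scikit-learn orders the cells as:
# [[true negatives, false positives],
#  [false negatives, true positives]]
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions divided by all predictions

print("Confusion Matrix : \n", cm)
print("Accuracy from matrix : ", accuracy)
print("Accuracy from sklearn: ", accuracy_score(y_true, y_hat))
```

The diagonal of the matrix counts the correct predictions, so accuracy is the diagonal sum divided by the total number of samples.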
Visualizing the performance of the model.
from matplotlib.colors import ListedColormap
X_set, y_set = x_test, y_test
# Build a fine grid that covers the range of both (scaled) features
X1, X2 = np.meshgrid(
    np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
    np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Color every grid point by the class the classifier predicts for it
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot the test points on top, one color per true class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Classifier (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
You have spent some quality time here. You now have basic knowledge of logistic regression, along with examples, equations, and code in Python. Have a great time, and thank you for your valuable time.