Logistic Regression is one of the most popular techniques borrowed by machine learning from statistics. It models the probabilities for binary classification problems, i.e. problems with two possible outcomes, and is used to predict the probability of a categorical dependent variable. In simple words, logistic regression predicts the probability of Y being 1 as a function of X. It can be seen as an extension of the linear regression model: linear regression on its own does not perform well for classification, but it can be turned into a classifier by passing the output of the linear equation through a logistic function, which squeezes it into a probability between 0 and 1. The outcome is dichotomous (binary) in nature. The logistic function, also known as the sigmoid function, looks like this:
sigmoid(x) = 1 / (1 + e^(-x))
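To see the squashing behaviour in action, here is a minimal sketch of the sigmoid function in Python (the sample inputs are arbitrary values chosen only for illustration):

import numpy as np

def sigmoid(x):
    # maps any real-valued input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))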

Linear regression can be turned into a classifier by applying the sigmoid function to its output. Here is how the sigmoid function is applied to the linear regression equation:

p = sigmoid(b0 + b1*x) = 1 / (1 + e^(-(b0 + b1*x)))

The function was originally developed by statisticians to describe population growth in ecology. The key point to notice in the above equation is that whatever input value is given, the output will always lie between 0 and 1. It is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1. The curve looks like this:
(Figure: the S-shaped curve of the sigmoid function)
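To reproduce the curve yourself, here is a short self-contained sketch (the input range of -10 to 10 is an arbitrary choice made only for display):

import numpy as np
import matplotlib.pyplot as plt

# evaluate the sigmoid over a range of inputs
x = np.linspace(-10, 10, 200)
y = 1.0 / (1.0 + np.exp(-x))

# the result is the S-shaped curve described above
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sigmoid(x)')
plt.show()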


There are mainly three types of logistic regression. They are:

  • Binary Logistic Regression

There can only be two possible outcomes. For example: it will rain or not rain, you will win or lose, etc.

  • Multinomial Logistic Regression

It generalizes logistic regression to multiclass problems, i.e. problems with more than two possible outcomes. For example, predicting which bike is best among several models (see the short sketch after this list).

  • Ordinal Logistic Regression

Used when there are multiple possible outcomes that follow an ordered format. For example, finding which route is better to travel.
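As a quick illustration of the binary and multinomial cases, here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data (scikit-learn and the synthetic dataset are assumptions made for the example; the article's own implementation below works directly with NumPy and pandas):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# binary case: a synthetic two-class problem
X_bin, y_bin = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=0)
binary_model = LogisticRegression(max_iter=1000).fit(X_bin, y_bin)
print(binary_model.predict_proba(X_bin[:3]))  # probability of each of the two classes

# multinomial case: a synthetic three-class problem
X_multi, y_multi = make_classification(n_samples=200, n_features=6, n_informative=3,
                                        n_classes=3, random_state=0)
multi_model = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)
print(multi_model.predict_proba(X_multi[:3]))  # probability of each of the three classes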

Some assumptions of Logistic Regression are:
  • The dependent variable must be either binary or ordinal.
  • Observations should be independent of each other.
  • There should be little or no multicollinearity between the independent variables (a quick check is sketched after this list).
  • A reasonably large sample size is required for stable estimates.
  • The independent variables should be linearly related to the log odds of the outcome.
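One common way to check the multicollinearity assumption is the variance inflation factor (VIF); a value well above 5-10 is usually taken as a warning sign. Here is a hedged sketch using statsmodels (statsmodels is an assumption and is not part of the article's code):

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(features):
    # compute one VIF value per independent variable
    values = features.values
    return pd.DataFrame({
        "feature": features.columns,
        "VIF": [variance_inflation_factor(values, i) for i in range(values.shape[1])],
    })

# example usage with the feature matrix X built in the code below:
# print(vif_table(X))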
Below we will implement Logistic Regression in Python.
For the implementation, we will be working on the dataset used by Andrew Ng in his Machine Learning course on Coursera. You can download the data from here.

# importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


def load_data(path, header):
    # read the comma-separated marks file into a data frame
    marks_df = pd.read_csv(path, header=header)
    return marks_df


if __name__ == "__main__":
    # load the data from the file
    data = load_data("data/marks.txt", None)

    # X = feature values, all the columns except the last column
    X = data.iloc[:, :-1]

    # y = target values, last column of the data frame
    y = data.iloc[:, -1]

    # filter out the applicants that got admitted
    admitted = data.loc[y == 1]

    # filter out the applicants that didn't get admitted
    not_admitted = data.loc[y == 0]

    # plot both groups of applicants
    plt.scatter(admitted.iloc[:, 0], admitted.iloc[:, 1], s=10, label='Admitted')
    plt.scatter(not_admitted.iloc[:, 0], not_admitted.iloc[:, 1], s=10, label='Not Admitted')
    plt.legend()
    plt.show()
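The snippet above only loads and visualizes the data; it does not yet fit a model. As a hedged continuation (a sketch using scikit-learn, not the hand-rolled cost-function-and-gradient-descent implementation from Andrew Ng's course), the same marks data could be classified like this:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# reload the same dataset: two exam-score columns plus an admitted/not-admitted label
data = pd.read_csv("data/marks.txt", header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# fit the classifier and inspect the predicted admission probabilities
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
probabilities = model.predict_proba(X)[:, 1]
print("Training accuracy:", accuracy_score(y, model.predict(X)))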




