In statistics and machine learning, linear regression is one of the most famous and surely understood algorithms. Most data science lovers and machine learning enthusiasts start their ML tour with linear regression algorithms. In this article, we will investigate how linear regression functions and how it very well may be effectively utilized in your machine learning ventures to construct better models.

### Hypothesis for linear regression:

Here,
• X = input data
• Y = labels to data
• θ1 = intercept
• θ2 = coefficient of X

Linear Regression is one of the machine learning algorithms where the outcome is predicted by the utilization of known parameters that are correlated with the output. It is utilized to predict values inside a continuous range instead of attempting to arrange them into classes. The realized parameters are utilized to make a continuous and constant slope that is utilized to predict the obscure or the outcome.
Let’s say we have a dataset that contains information about the relationship between ‘the number of hours studied’ and ‘marks obtained’. A number of students have been observed and their hours of study along with their grades are recorded. This will be our training data. Our goal is to design a model that can predict the marks if the number of hours studied is provided. Using the training data, a regression line is obtained which will give the minimum error. This linear equation is then used to apply for new data. That is, if we give the number of hours studied by a student as an input, our model should be able to predict their mark with minimum error.
The majority of the machine learning algorithms fall under the supervised learning category. It is the process where an algorithm is used to predict a result based on the previously entered values and the results generated from them. Suppose we have an input variable ‘x’ and an output variable ‘y’ where y is a function of x (y=f{x}). Supervised learning reads the value of entered variable ‘x’ and the resulting variable ‘y’ so that it can use those results to later predict a highly accurate output data of ‘y’ from the entered value of ‘x’. A regression problem is when the resulting variable contains a real or a continuous value. It tries to draw the line of best fit from the data gathered from a number of points.
Before Implementing linear regression you should first have knowledge of

#### Cost Function

By accomplishing the best-fit regression line, the model plans to predict y value to such an extent that the error distinction between the predicted value and real value is minimum. Along these lines, it is essential to update the θ1 and θ2 values, to arrive at the best value that limits the error between predicted y value (pred) and real y value (y).

Cost function(J) of Linear Regression is the Root Mean Squared Error (RMSE) between predicted y value (pred) and real y value (y).

To update θ1 and θ2 values in order to reduce Cost function (minimizing RMSE value) and achieving the best fit line the model uses Gradient Descent. The purpose is, to begin with, random θ1 and θ2 values and then iteratively recondition the values, reaching minimum cost.

### How Linear Regression Works?

How about we take a gander at a situation where linear regression may be helpful: getting more fit. Let us think about that there's an association between what number of calories you take in and the amount you gauge; regression analysis can assist you with understanding that association. Regression analysis will furnish you with a connection that can be imagined into a diagram so as to make predictions about your data. For instance, in the event that you've been gaining weight in the course of the most recent couple of years, it can predict the amount you'll say something the following ten years on the off chance that you keep on expending a similar measure of calories and consume them at a similar rate.
The goal of regression analysis is to make a pattern line dependent on the data you have accumulated. This at that point permits you to determine whether different factors separated from the measure of calories expended influence your weight, for example, the number of hours you sleep, work pressure, level of pressure, kind of activities you do and so forth. Prior to considering, we have to take a gander at these elements and attributes and determine whether there is a correlation between them. Linear Regression would then be able to be utilized to draw a pattern line which would then be able to be utilized to verify or refute the relationship between attributes. In the event that the test is done over quite a while length, broad data can be gathered and the outcome can be assessed all the more precisely. Before the finish of this article, we will construct a model that resembles the underneath picture i.e, determine a line that best fits the data.

### When to use Linear Regression?

Linear Regression’s power lies in its simplicity, which means that it can be used to solve problems across various fields. At first, the data collected from the observations need to be collected and plotted along a line. If the difference between the predicted value and the result is almost the same, we can use linear regression for the problem.

### Assumptions in Linear Regression:

You have to make the following assumptions before implementing linear regression:
• The connection between the dependent and independent variables ought to be practically linear.
• The data is homoscedastic, which means the difference between the results should not be excessive.
• The results got from a perception should not be impacted by the results got from past perceptions.
• The residuals ought to be normally distributed. This supposition implies that the probability distribution function of the residual values is normally distributed at every independent value.
You can determine whether your data meets these conditions by plotting it and afterward doing a touch of delving into its structure.

### Implementing simple linear regression:

For the implementation, I used this dataset from Kaggle. Here is the simplest method to implement linear regression on those data sets.

```
import pandas as pd
import numpy as np

x_test = df_test['x']
x_train = df_train['x']
y_train = df_train['y']

y_train = np.array(y_train)
y_test = df_test['y']

x_train = np.array(x_train)

x_train = x_train.reshape(-1,1)
x_test = np.array(x_test)
y_test = np.array(y_test)

x_test = x_test.reshape(-1,1)
```