Simple Linear Regression: Machine Learning

Joseph James (JJ)
3 min read · Jan 24, 2019

Once we have data on two variables, an important question is how those variables are related. For example, we could ask about the relationship between people's weights and heights, study time and test scores, or years of experience and salary.

Regression is a set of techniques for estimating such relationships. In this article, we'll focus on one of the simplest types of relationship: "linear". Unsurprisingly, this process is called linear regression, and it has many applications. For example, in the case of sales, it can be used:

  • To find the relationship between the advertising budget and sales.
  • To find how strong the relationship is between the advertising budget and sales.
  • To find which media contribute to sales.
  • To find how accurately we can predict future sales.
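One common way to put a number on "how strong the relationship is" is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is only an illustration: the budget and sales figures are made up, and pearson_r is a hypothetical helper name, not something from the article's dataset or code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative numbers (an assumption, not real sales data)
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 41.0, 62.0, 79.0, 103.0]

r = pearson_r(budget, sales)
print(round(r, 3))  # close to 1 means a strong positive linear relationship
```

A value near 0 would suggest little or no linear relationship, and a value near -1 a strong negative one.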

Now let's take a quick look at an example that demonstrates simple linear regression.

Here we have employee data from a company. Our aim is to predict an employee's salary given their years of experience.

Salary is the dependent variable, "y".

Years of experience is the independent variable, "X". From here on, we will speak in terms of X and y.

As mentioned earlier, we want to find the relationship between X and y. A scatter plot of the data gives a first look at it.

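In simple linear regression, we draw a line through the scatter plot so that the total distance between the points and the line is as small as possible. In the usual least-squares approach, which is what sklearn's LinearRegression uses, "distance" means the squared vertical gap between each point and the line. This process is called fitting.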

The above-mentioned fitting line will be in the form:

y = C + wX

C => constant (bias, also called the intercept)

w => coefficient of the independent variable (the slope)

In other words, we adjust C and w until the line y = C + wX lies as close as possible to the data points in the scatter plot. Put simply, C is the predicted salary with no experience, and w is the increase in salary for each additional year of experience. Together, C and w determine the relationship between X and y.
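To make "adjusting C and w" concrete, here is a minimal sketch of the standard least-squares solution for a single variable: w is the covariance of X and y divided by the variance of X, and C is chosen so the line passes through the point of means. The function name fit_line and the five data points are made up for illustration and are not the article's dataset.

```python
def fit_line(xs, ys):
    """Return (C, w) for the least-squares line y = C + w * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    c = mean_y - w * mean_x
    return c, w

# Made-up example: salary rises by exactly 5000 per year of experience
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000.0, 45000.0, 50000.0, 55000.0, 60000.0]

C, w = fit_line(years, salary)
print(C, w)  # 35000.0 5000.0
```

Here C = 35000 is the predicted starting salary with no experience and w = 5000 is the raise per additional year.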

One way of representing the fitted line is

Now let's look at the code for simple linear regression in Python (using sklearn).

Prerequisites:

  • Python 3.6+ (Download Python)
  • pandas library (pip3 install pandas)
  • matplotlib library (pip3 install matplotlib)
  • scikit-learn library (pip3 install scikit-learn)
  • dataset: GitHub
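
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the data: the last column is the salary (y), the rest is X
data_set = pd.read_csv("Salary_Data.csv")
X = data_set.iloc[:, :-1].values
y = data_set.iloc[:, -1:].values

# Hold out part of the data for testing (the 80/20 split and seed are assumptions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the line y = C + wX to the training data
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Plot the training points and the fitted line
plt.scatter(X_train, y_train, color="green")
plt.plot(X_train, regressor.predict(X_train), color="blue")
plt.show()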
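
Once a model is fitted, you would typically read off C and w, check how well it predicts held-out data, and predict a salary for a new employee. The sketch below shows one way to do that with sklearn. To stay self-contained it uses made-up, exactly linear data (salary = 30000 + 9000 * years) instead of Salary_Data.csv, and the split ratio and seed are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data (an assumption, not the article's dataset)
X = np.arange(1, 11, dtype=float).reshape(-1, 1)  # years of experience, shape (10, 1)
y = 30000.0 + 9000.0 * X[:, 0]                    # salary

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

intercept = float(regressor.intercept_)     # C in y = C + wX
slope = float(regressor.coef_[0])           # w in y = C + wX
r2_test = regressor.score(X_test, y_test)   # R^2 on the held-out data
predicted = float(regressor.predict(np.array([[5.0]]))[0])  # salary at 5 years

print(intercept, slope, r2_test, predicted)
```

Because this toy data lies exactly on a line, the fit recovers C and w almost perfectly and R^2 is essentially 1. On real data such as the salary dataset, expect R^2 below 1 and some gap between predicted and actual salaries.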
