# Linear Regression Explained

In this article we will go over what linear regression is, how it works, and how you can implement it using Python. First we will take a look at simple linear regression, and then we will look at multivariate linear regression.

If you would rather watch a video tutorial you can watch my explanation of Linear Regression here.

## What is Linear Regression?

In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable (y) and one or more independent variables (X). The relationships are modeled using linear predictor functions whose unknown parameters are estimated from the data. Linear regression is one of the most popular algorithms in machine learning, thanks to its relative simplicity and well-known properties.

## Simple Linear Regression

Linear Regression is called simple if you are only working with one independent variable.

$$f(x)=mx+b$$

#### Cost Function

We can measure the accuracy of our linear regression algorithm using the mean squared error (MSE) cost function. MSE measures the average squared difference between the predicted output and the actual output (label).

$$Error(m, b)=\frac{1}{N}\sum_{i=1}^{N}(actual\:output - predicted\:output)^{2}$$

The implementation of MSE is pretty straightforward, and we can easily code it up using only Python.

    def cost_function(m, b, x, y):
        totalError = 0
        for i in range(0, len(x)):
            # squared difference between the label and the prediction m*x + b
            totalError += (y[i]-(m*x[i]+b))**2
        return totalError/float(len(x))
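
As a quick sanity check of the MSE definition, here is a minimal vectorized sketch evaluated on toy data. The names `mse_cost`, `x_demo`, and `y_demo` are illustrative and not part of the article's code; the toy points are assumed to lie exactly on y = 2x.

```python
import numpy as np

def mse_cost(m, b, x, y):
    """Mean squared error of the line y = m*x + b (vectorized)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (m * x + b)
    return float(np.mean(residuals ** 2))

# Toy data lying exactly on y = 2x (an assumption made for this example)
x_demo = [1.0, 2.0, 3.0]
y_demo = [2.0, 4.0, 6.0]

perfect_fit = mse_cost(2.0, 0.0, x_demo, y_demo)  # residuals are all zero
poor_fit = mse_cost(1.0, 0.0, x_demo, y_demo)     # residuals are 1, 2, 3
print(perfect_fit, poor_fit)
```

The perfect line gives an error of 0, while the line with slope 1 gives (1 + 4 + 9) / 3, which shows how larger residuals are penalized quadratically.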

#### Optimization

To find the coefficients that minimize our error function we will use gradient descent. Gradient descent is an optimization algorithm that iteratively takes steps toward a local minimum of the cost function. For MSE with a linear model the cost is convex, so that local minimum is also the global one.

To find the direction toward the minimum, we take the derivative of the error function with respect to our slope m and our y-intercept b. Then we take a step in the negative direction of the derivative, scaled by the learning rate α.

$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)$$
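
To make the update rule concrete, here is a small sketch that applies it to a one-parameter toy cost. The function `minimize_1d` and the cost J(θ) = (θ - 3)² are hypothetical choices for illustration only, not part of the article's model.

```python
def minimize_1d(grad, theta, alpha, num_steps):
    """Repeatedly apply theta := theta - alpha * dJ/dtheta."""
    for _ in range(num_steps):
        theta = theta - alpha * grad(theta)
    return theta

# Hypothetical toy cost J(theta) = (theta - 3)^2 with derivative 2 * (theta - 3)
theta_min = minimize_1d(lambda t: 2 * (t - 3), theta=0.0, alpha=0.1, num_steps=50)
print(theta_min)  # close to the minimizer theta = 3
```

With α = 0.1 each step shrinks the distance to the minimum by a factor of 0.8, so the iterate settles near 3. A much larger α would overshoot and can diverge.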

Gradient descent formulas for simple linear regression:

$$\frac{\partial}{\partial m}Error(m, b)=\frac{2}{N} \sum_{i=1}^{N}-x_i(y_i-(mx_i+b))$$
$$\frac{\partial}{\partial b}Error(m, b)=\frac{2}{N} \sum_{i=1}^{N}-(y_i-(mx_i+b))$$
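
One way to gain confidence in these derivatives is to compare them against finite differences. The sketch below does that on a tiny hand-made dataset; the helper names (`mse`, `analytic_gradients`, `numeric_gradients`) and the data are assumptions introduced for this check.

```python
import numpy as np

def mse(m, b, x, y):
    return np.mean((y - (m * x + b)) ** 2)

def analytic_gradients(m, b, x, y):
    """Partial derivatives of the MSE, matching the formulas above."""
    n = len(x)
    residuals = y - (m * x + b)
    grad_m = (2.0 / n) * np.sum(-x * residuals)
    grad_b = (2.0 / n) * np.sum(-residuals)
    return grad_m, grad_b

def numeric_gradients(m, b, x, y, eps=1e-6):
    """Central finite differences as an independent check."""
    grad_m = (mse(m + eps, b, x, y) - mse(m - eps, b, x, y)) / (2 * eps)
    grad_b = (mse(m, b + eps, x, y) - mse(m, b - eps, x, y)) / (2 * eps)
    return grad_m, grad_b

x_check = np.array([0.0, 1.0, 2.0, 3.0])
y_check = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2x + 1

analytic = analytic_gradients(0.5, 0.0, x_check, y_check)
numeric = numeric_gradients(0.5, 0.0, x_check, y_check)
print(analytic, numeric)
```

Both approaches should agree up to rounding, and at the true parameters (m = 2, b = 1) both gradients vanish.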

The implementation of gradient descent is a little bit more involved but it’s also easily doable in pure Python.

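    def gradient_descent(b, m, x, y, learning_rate, num_iterations):
        N = float(len(x))
        for j in range(num_iterations): # repeat for num_iterations
            # reset the accumulated gradients at the start of every iteration
            b_gradient = 0
            m_gradient = 0
            for i in range(0, len(x)):
                b_gradient += -(2/N) * (y[i] - ((m * x[i]) + b))
                m_gradient += -(2/N) * x[i] * (y[i] - ((m * x[i]) + b))
            # step against the gradient, scaled by the learning rate
            b -= learning_rate * b_gradient
            m -= learning_rate * m_gradient
            if j%50==0:
                print('error:', cost_function(m, b, x, y))
        return [b, m]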

#### Running Linear Regression

In order to run our linear regression model we need to create a dataset and define a few initial variables. We will use numpy for creating simple random data and matplotlib to visualize it.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 100, 50)
    delta = np.random.uniform(-10, 10, x.size)
    y = 0.5*x + 3 + delta

    plt.scatter(x, y)

Now that we have our data we are ready to train our model using the functions we defined above:

    # defining some variables
    learning_rate = 0.0001
    initial_b = 0
    initial_m = 0
    num_iterations = 100

    print('Initial error:', cost_function(initial_m, initial_b, x, y))
    [b, m] = gradient_descent(initial_b, initial_m, x, y, learning_rate, num_iterations)
    print('b:', b)
    print('m:', m)
    print('error:', cost_function(m, b, x, y))

Now that we have trained our model we can use it to make predictions. In our simple example we will just predict on the training data and use the results to plot a best fit line using matplotlib.

    predictions = [(m * x[i]) + b for i in range(len(x))]
    plt.scatter(x, y)
    plt.plot(x, predictions, color='r')

## Multivariate Linear Regression

Linear regression is called multivariate if you are working with at least two independent variables. Each of the independent variables, also called features, gets multiplied by a weight that is learned by our linear regression algorithm.

$$f(x)=b+w_1x_1+w_2x_2+\dots+w_nx_n=b+\sum_{i=1}^{n}w_ix_i$$
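
In NumPy, the weighted sum in this formula is a dot product between the feature matrix and the weight vector. The following is a minimal sketch; `predict` and the demo values are hypothetical names and numbers chosen for illustration.

```python
import numpy as np

def predict(X, w, b):
    """Evaluate f(x) = b + sum_i w_i * x_i for every row of X."""
    return b + np.dot(X, w)

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
w_demo = np.array([0.5, -1.0])
b_demo = 2.0

y_demo_pred = predict(X_demo, w_demo, b_demo)
print(y_demo_pred)  # first row: 2 + 0.5*1 - 1*2 = 0.5
```

Each row of `X_demo` is one sample, so the result has one prediction per row.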

#### Cost Function

As a loss function we will use mean squared error just like we did for simple linear regression. The only difference is that now we are getting our predicted output from a different function.

Because we are now working with multiple features and weights it’s easier to code our cost function using numpy:

    def cost_function(x, y, w):
        dif = np.dot(x,w)-y # difference between f(x) and y output
        # halved MSE: divide by 2 * number of samples (np.shape(x) is a tuple)
        cost = np.sum(dif**2) / (2*np.shape(x)[0])
        return dif, cost

#### Optimization

For optimization we will still use gradient descent. The difference is that instead of updating only m and b, as we did for simple linear regression, we now need to update every weight.

As a reminder, here is the gradient descent formula again:

$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)$$

We will implement gradient descent using NumPy:

    def multivariate_gradient_descent(x, y, w, learning_rate, num_iterations):
        for i in range(num_iterations):
            dif, cost = cost_function(x, y, w)
            # average gradient over all samples (divide by the row count, not the shape tuple)
            gradient = np.dot(x.transpose(), dif) / np.shape(x)[0]
            w = w - learning_rate * gradient
            if i%500==0:
                print('error:', cost)
        return w
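
Note that the prediction `np.dot(x, w)` has no separate intercept, so the b from the formula is only learned if the feature matrix carries a constant column. One common way to handle this, sketched here under that assumption, is to prepend a column of ones. The helpers `add_bias_column` and `batch_gradient_descent` and the toy data are hypothetical, not the article's code.

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones so the first weight acts as the intercept b."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def batch_gradient_descent(X, y, w, learning_rate, num_iterations):
    n_samples = X.shape[0]
    for _ in range(num_iterations):
        dif = X @ w - y                   # prediction error per sample
        gradient = X.T @ dif / n_samples  # gradient of the halved MSE
        w = w - learning_rate * gradient
    return w

# Noise-free toy data generated from y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = 1.0 + 2.0 * X_toy[:, 0] - 3.0 * X_toy[:, 1]

X_aug = add_bias_column(X_toy)
w_fit = batch_gradient_descent(X_aug, y_toy, np.zeros(3), 0.1, 2000)
print(w_fit)  # approaches [1, 2, -3]
```

The first entry of `w_fit` plays the role of b, and the remaining entries are the feature weights.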

#### Running Multivariate Linear Regression

As a dataset we will use the popular Iris dataset from the UCI Machine Learning Repository. Normally this is a classification problem, but we will treat it as a regression problem by using the petal width as our label.

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    iris = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
        names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label'])
    # encode the species names as integers so they can be used as a feature
    le = LabelEncoder()
    iris['label'] = le.fit_transform(iris['label'])
    X = np.array(iris.drop(['petal_width'], axis=1))
    y = np.array(iris['petal_width'])
    iris.head()

Now that we have our dataset we are ready to train our model:

    learning_rate = 0.0001
    num_iterations = 10000
    _, num_features = np.shape(X)
    initial_weights = np.zeros(num_features) # initialize all weights as 0
    weights = multivariate_gradient_descent(X, y, initial_weights, learning_rate, num_iterations)
    print(weights)
    dif, cost = cost_function(X, y, weights)
    print('error: ', cost)

## Regularization

Regularization refers to techniques used to reduce overfitting. This is really important for creating models that generalize well to new data.

Mathematically speaking, it adds a penalty term to the cost function that discourages large coefficients, so the model cannot fit the training data so closely that it overfits. For linear regression we can choose between two techniques: L1 and L2 regularization.

For more information on the difference between L1 and L2 Regularization check out the following article:

You can add regularization to linear regression by adding a regularization term either to the loss function or to the weight update.

L1 regularization:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\left|\theta_j\right|\right]$$

L2 regularization:

$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$
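
The two cost functions can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions: `regularized_cost` and `l2_gradient` are hypothetical names, and `theta[0]` is assumed to be the intercept (with a leading column of ones in `X`), so it is excluded from the penalty in line with the sums starting at j = 1.

```python
import numpy as np

def regularized_cost(X, y, theta, lam, penalty="l2"):
    """Cost J(theta) with an L1 or L2 penalty, following the formulas above.

    Assumes theta[0] is the intercept (X has a leading column of ones),
    so the penalty sums over j = 1..n only.
    """
    m = X.shape[0]
    squared_error = np.sum((X @ theta - y) ** 2)
    if penalty == "l1":
        reg = lam * np.sum(np.abs(theta[1:]))
    elif penalty == "l2":
        reg = lam * np.sum(theta[1:] ** 2)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return (squared_error + reg) / (2 * m)

def l2_gradient(X, y, theta, lam):
    """Gradient of the L2-regularized cost with respect to theta."""
    m = X.shape[0]
    grad = X.T @ (X @ theta - y) / m
    grad[1:] += (lam / m) * theta[1:]  # derivative of lam * theta_j^2 / (2m)
    return grad

# Tiny hand-checkable example (values chosen for illustration)
X_demo = np.array([[1.0, 1.0], [1.0, 2.0]])
y_demo = np.array([1.0, 2.0])
theta_demo = np.array([0.0, 2.0])

cost_l1 = regularized_cost(X_demo, y_demo, theta_demo, lam=1.0, penalty="l1")
cost_l2 = regularized_cost(X_demo, y_demo, theta_demo, lam=1.0, penalty="l2")
grad_l2 = l2_gradient(X_demo, y_demo, theta_demo, lam=1.0)
print(cost_l1, cost_l2, grad_l2)
```

Here the squared error is 5, so the L1 cost is (5 + 2) / 4 and the L2 cost is (5 + 4) / 4. Using `l2_gradient` in place of the plain gradient in the weight update corresponds to adding the regularization term to the update step.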

## Resources

Lastly, I would like to share a few great resources that you can use to learn more about linear regression.

## Conclusion

In this article, we went over what linear regression is, how it works, and how we can implement it using Python and NumPy.

If you liked this article, consider subscribing to my YouTube channel and following me on social media.