If you missed part one in the series, you can start here (Machine learning basics part 1: An overview).
Linear Regression is a straightforward way to find the linear relationship between one or more variables and a predicted target using a supervised learning algorithm. In simple linear regression, the model predicts the relationship between two variables. In multiple linear regression, additional variables that influence the relationship can be included. Output for both types of linear regression is a value within a continuous range.
Simple Linear Regression

Linear regression works by finding the best-fit line for a set of data points.
For example, a plot of the linear relationship between study time and test scores makes it possible to predict a test score from the number of hours studied.
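To make this concrete, here is a minimal sketch in Python that fits a line to a small study-time data set using the closed-form least-squares solution. The hours and scores are invented for illustration, chosen so the points fall exactly on a line:

```python
# Hypothetical data (invented for illustration): hours studied vs. test score,
# chosen so the points fall exactly on the line score = 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit for one feature: returns (theta0, theta1)."""
    m = len(x)
    x_mean = sum(x) / m
    y_mean = sum(y) / m
    # slope = covariance(x, y) / variance(x)
    num = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    den = sum((xi - x_mean) ** 2 for xi in x)
    theta1 = num / den                 # feature weight (slope)
    theta0 = y_mean - theta1 * x_mean  # intercept (bias term)
    return theta0, theta1

theta0, theta1 = fit_simple_linear_regression(hours, scores)
predicted = theta0 + theta1 * 6   # predicted score after 6 hours of study
print(predicted)  # 62.0
```

With real, noisy data the points would not sit exactly on the line, but the same formulas still return the best-fit slope and intercept.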
To calculate this linear relationship, use the following equation:

ŷ = θ0 + θ1x
In this example, ŷ is the predicted value, x is a given data point, θ1 is the feature weight, and θ0 is the intercept, also known as the bias term. The best-fit line is determined by using gradient descent to minimize the cost function. This is a complex way of saying the best line is the one whose predictions are closest to the actual values. In linear regression, the cost function is calculated using mean squared error (MSE):
MSE(θ) = (1/m) Σᵢ₌₁ᵐ (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)²

Mean Squared Error for Linear Regression¹
In the equation above, m represents the number of data points, θᵀ is the transpose of the model parameter vector theta, x is the feature value, and y is the actual value. Essentially, the line is evaluated by the distance between the predicted values and the actual values. Any difference between a predicted value and the actual value is an error. Minimizing mean squared error increases the accuracy of the model by selecting the line where the predictions and actual values are closest together.
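As a sketch, the MSE calculation can be written directly in Python. The data points here are invented so the error values are easy to check by hand:

```python
def mse(theta, X, y):
    """Mean of (theta^T x - y)^2 over all m data points.
    Each row of X carries a leading 1 so theta[0] acts as the bias term."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(t * f for t, f in zip(theta, xi))  # theta^T x
        total += (prediction - yi) ** 2                     # squared error
    return total / m

# Toy data (invented): the points lie exactly on y = 1 + 2x
X = [[1, 1], [1, 2], [1, 3]]  # leading 1, then the feature value
y = [3, 5, 7]
print(mse([1.0, 2.0], X, y))  # 0.0 -- the exact line has no error
print(mse([0.0, 2.0], X, y))  # 1.0 -- every prediction is off by 1
```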
Gradient descent is the method of iteratively adjusting the parameter theta (θ) to find the lowest possible MSE. A random parameter value is used initially, and each iteration of the algorithm takes a small step, whose size is determined by the learning rate, to gradually change the value of the parameter until the MSE reaches its minimum. Once this minimum is reached, the algorithm is said to have converged.
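A minimal sketch of batch gradient descent for linear regression, under the simplifying assumptions of a fixed iteration count, a zero starting point instead of a random one, and invented toy data that lie exactly on a line:

```python
def gradient_descent(X, y, learning_rate=0.1, n_iterations=1000):
    """Batch gradient descent on the MSE cost for linear regression.
    Each row of X carries a leading 1 so theta[0] is the bias term."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n  # arbitrary starting point
    for _ in range(n_iterations):
        # prediction error theta^T x - y for every data point
        errors = [sum(t * f for t, f in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        # partial derivative of MSE w.r.t. theta_j is (2/m) * sum_i error_i * x_ij;
        # step against the gradient, scaled by the learning rate
        theta = [theta[j] - learning_rate * (2.0 / m) *
                 sum(e * xi[j] for e, xi in zip(errors, X))
                 for j in range(n)]
    return theta

# Toy data (invented): points lying exactly on y = 3 + 2x
theta = gradient_descent([[1, 0], [1, 1], [1, 2]], [3, 5, 7])
print(theta)  # close to [3.0, 2.0]
```

In practice a stopping criterion (e.g. a tolerance on the change in MSE) usually replaces the fixed iteration count.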
Be aware that choosing a learning rate that is smaller than ideal results in an algorithm that converges extremely slowly, because the step it takes on each iteration is too small. Choosing a learning rate that is too large can result in a model that never converges, because each step is big enough to overshoot the minimum.
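Both failure modes can be seen in a tiny one-parameter sketch; the data and learning rates here are invented for illustration, with the optimum at θ = 2:

```python
def descend(learning_rate, n_steps):
    """One-parameter gradient descent on the MSE of the model y = theta * x.
    Toy data (invented): x = [1, 2], y = [2, 4], so the optimum is theta = 2."""
    x, y = [1.0, 2.0], [2.0, 4.0]
    theta = 0.0
    for _ in range(n_steps):
        # derivative of MSE w.r.t. theta: (2/m) * sum_i x_i * (theta * x_i - y_i)
        grad = (2.0 / len(x)) * sum(xi * (theta * xi - yi)
                                    for xi, yi in zip(x, y))
        theta -= learning_rate * grad
    return theta

print(descend(0.1, 100))    # well chosen: converges to ~2.0
print(descend(0.001, 100))  # too small: still far from 2 after 100 steps
print(descend(0.5, 100))    # too large: overshoots and diverges
```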
Learning rate set too small¹
Learning rate set too large¹
Multiple Linear Regression

In multiple linear regression, the same approach extends to several features: ŷ = θ0 + θ1x1 + θ2x2 + … + θnxn, where each feature xj has its own weight θj. The model is fit exactly as before, by using gradient descent to minimize the MSE.
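A sketch of a gradient-descent fit with two features; the data are hypothetical, generated exactly from score = 1 + 2·x1 + 3·x2, so the recovered weights are easy to verify:

```python
# Hypothetical data (invented): test score as a function of hours studied (x1)
# and hours slept (x2), generated exactly from score = 1 + 2*x1 + 3*x2.
# Each row starts with a 1 so theta[0] serves as the bias term.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1], [1, 1, 2]]
y = [1, 3, 4, 6, 8, 9]

def gradient_descent(X, y, learning_rate=0.1, n_iterations=5000):
    """Batch gradient descent on the MSE cost; theta holds the bias plus
    one weight per feature, so extra features only lengthen the vector."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iterations):
        errors = [sum(t * f for t, f in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        theta = [theta[j] - learning_rate * (2.0 / m) *
                 sum(e * xi[j] for e, xi in zip(errors, X))
                 for j in range(n)]
    return theta

theta = gradient_descent(X, y)
print(theta)  # close to [1.0, 2.0, 3.0]
```

Nothing in the update rule changes when features are added; only the length of the parameter vector grows.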
Thanks for reading our machine learning series, and keep an eye out for our next blog post!
1. Géron, Aurélien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O'Reilly.