Linear Regression using Automobile mpg Data

Spang · Published in Dev Genius · 4 min read · Oct 28, 2022

In this article, we will use a linear regression model to discover the relationship between two variables in the automobile mpg dataset. The following topics will be covered:

  • Linear Regression
  • Least Squares Method
  • Automobile mpg Dataset
  • Implementing Simple Linear Regression
  • Implementing Piecewise Linear Regression

Linear Regression

Linear regression models the relationship between two variables by fitting a linear equation to observed data.

Or, as Josh Starmer explains, the concept behind linear regression is simply fitting a line to data with least squares and R-squared.

Least-Squares Method

Credit: https://www.jmp.com/en_in/statistics-knowledge-portal/what-is-regression/the-method-of-least-squares.html

The least-squares method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.
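In symbols, the least-squares line is the one whose intercept and slope minimize the sum of squared residuals. The standard objective and its closed-form solution for a simple linear fit (with x̄ and ȳ denoting the sample means) are:

```latex
\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2,
\qquad
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
             = \frac{s_{xy}}{s_{xx}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}
```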

We will use these methods to find a relationship between two variables in the automobile mpg data.

Automobile mpg Dataset

The Automobile mpg dataset is from the UC Irvine Machine Learning Repository, which maintains 622 data sets as a service to the machine learning community.

https://archive.ics.uci.edu/ml/datasets/auto+mpg

As shown in the detailed description of the dataset on the website, it has 398 rows, 9 columns, and some missing values. The 9 attributes are described below.

1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes (Quinlan, 1993).

Implementing Simple Linear Regression

After importing the Matplotlib, pandas, and NumPy packages, we load the dataset.
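The post shows this step as a screenshot; a minimal sketch of the loading code, assuming the raw auto-mpg.data file from the UCI page above (whitespace-delimited, no header row, with "?" marking missing horsepower values), might look like this:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Column names taken from the attribute list above; the raw file has no header row.
columns = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model year", "origin", "car name"]

# The URL is an assumption; the file can also be downloaded and read locally.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
df = pd.read_csv(url, sep=r"\s+", names=columns, na_values="?")
```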

Using the head() method, we can look at the first few rows of the dataset and check whether the dataframe was loaded correctly.

Using the columns, shape, and index attributes, we can check the column names and the number of rows (indices) and columns.
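A sketch of those checks, assuming the dataframe is named df as above:

```python
print(df.head())     # first few rows, to confirm the load worked
print(df.columns)    # attribute (column) names
print(df.shape)      # (number of rows, number of columns) -> (398, 9)
print(df.index)      # row index
```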

After converting the dataframe columns to NumPy arrays, we can visualize the relationship between the cars' weight and mpg.
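For example (a sketch; the exact plot styling in the post may differ):

```python
weight = df["weight"].to_numpy()
mpg = df["mpg"].to_numpy()

plt.scatter(weight, mpg, s=10)
plt.xlabel("weight")
plt.ylabel("mpg")
plt.title("Weight vs. mpg")
plt.show()
```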

The average weight and the average mpg can also be explored using the following code.
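For example, reusing the arrays from the sketch above:

```python
print("average weight:", np.mean(weight))
print("average mpg:", np.mean(mpg))
```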

Plotting a Linear Fit

Next, we'll explore the relationship between a different feature, horsepower, and mpg.

Since the horsepower column has some missing values, we'll use the dropna() method to remove those rows.
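One way to do this (the variable names x and y here are my own, not necessarily the author's):

```python
# Keep only rows where horsepower is present, then pull out the two columns of interest.
hp_mpg = df[["horsepower", "mpg"]].dropna()
x = hp_mpg["horsepower"].to_numpy(dtype=float)
y = hp_mpg["mpg"].to_numpy(dtype=float)
```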

Here, x̄ and ȳ are the sample means of horsepower and mpg, and s_xy and s_xx are the sample covariance and the variance of x. The fitted slope is s_xy / s_xx, and the intercept is ȳ minus the slope times x̄.
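A sketch of the closed-form fit; plain sums are used because the 1/(n-1) factors cancel in the ratio:

```python
xbar, ybar = x.mean(), y.mean()
s_xy = np.sum((x - xbar) * (y - ybar))   # (unnormalized) covariance of x and y
s_xx = np.sum((x - xbar) ** 2)           # (unnormalized) variance of x

beta1 = s_xy / s_xx            # slope
beta0 = ybar - beta1 * xbar    # intercept
ypred = beta0 + beta1 * x      # fitted values
```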

We plot x against y (actual) and x against ypred (predicted) to see how well the fitted line matches the actual data.
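For example:

```python
order = np.argsort(x)                    # sort so the fitted line draws cleanly
plt.scatter(x, y, s=10, label="actual")
plt.plot(x[order], ypred[order], color="red", label="predicted")
plt.xlabel("horsepower")
plt.ylabel("mpg")
plt.legend()
plt.show()
```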

The squared loss (residual sum of squares, RSS, or L2 loss) is one of the most common ways to measure the difference between the predicted (ypred) and actual (y) values.
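For example:

```python
loss_linear = np.sum((y - ypred) ** 2)   # residual sum of squares (L2 loss)
print("linear fit loss:", loss_linear)
```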

Implementing Piecewise Linear Regression

Since mpg is noticeably higher when horsepower is between 50 and 100, we will see how we can improve the fit by using piecewise linear regression.

We will fit a piecewise linear function with a single breakpoint, λ.

Since cars with horsepower between 50 and 100 have higher mpg, we will set the breakpoint λ to 100.

The new X matrix (the design matrix) will look as shown below.
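The post shows this matrix as an image; a common construction for a single-breakpoint piecewise (linear spline) fit uses an intercept column, the raw feature, and a hinge term max(0, x - λ). The exact basis is an assumption on my part:

```python
lam = 100.0   # breakpoint (lambda) chosen above

# Columns: intercept, horsepower, and a hinge term that "switches on" above the breakpoint.
X = np.column_stack([
    np.ones_like(x),
    x,
    np.maximum(0.0, x - lam),
])
```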

We then find the optimal beta using the normal equation, β = (XᵀX)⁻¹Xᵀy.
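A sketch of the solve; np.linalg.solve is used here instead of forming the inverse explicitly, but it computes the same normal-equation solution:

```python
# beta = (X^T X)^{-1} X^T y, solved as a linear system for numerical stability.
beta = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ beta    # piecewise-linear predictions
```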

Then, we can find yhat (the predicted values) from the optimal beta: yhat = Xβ.

By plotting x against y (actual) and x against yhat (predicted) again, we can visualize how well the piecewise fit matches the actual data.
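For example, reusing loss_linear from the simple fit above:

```python
order = np.argsort(x)
plt.scatter(x, y, s=10, label="actual")
plt.plot(x[order], yhat[order], color="red", label="piecewise fit")
plt.xlabel("horsepower")
plt.ylabel("mpg")
plt.legend()
plt.show()

loss_piecewise = np.sum((y - yhat) ** 2)
print("piecewise fit loss:", loss_piecewise)
print("improvement over linear fit:", loss_linear - loss_piecewise)
```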

The loss is reduced by close to 2000, a clear improvement over the simple linear fit.

Thank you for taking the time to read this post!
