Linear regression
Linear regression is a fundamental statistical technique used to model the relationship between a dependent variable (also called the outcome or target variable) and one or more independent variables (also called features).
- Linear regression assumes a linear relationship between the independent variables and the dependent variable
- The goal of linear regression is to find the best-fitting linear equation that predicts the dependent variable based on the independent variables
Variables
- Dependent Variable (y): The variable that you want to predict or explain. It’s the outcome or response variable.
- Independent Variable(s) (X): The variable(s) used to predict the dependent variable. In simple linear regression, there’s one independent variable. In multiple linear regression, there are multiple independent variables.
Linear equation (one feature)
The linear regression equation with one independent variable can be represented as: $$y = \theta_0 + \theta_1 x$$
- $y$ is the dependent variable (prediction)
- $x$ is the independent variable (feature)
- $\theta_0$ is called the intercept or bias, i.e. the value of $y$ when $x = 0$
- $\theta_1$ is the slope coefficient, indicating the change of $y$ for a unit change of $x$
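A minimal NumPy sketch of this prediction, with made-up parameter values:

```python
import numpy as np

# Hypothetical parameters: intercept theta_0 = 2.0, slope theta_1 = 0.5
theta_0, theta_1 = 2.0, 0.5

x = np.array([0.0, 1.0, 2.0, 3.0])  # feature values
y = theta_0 + theta_1 * x           # predictions: [2.0, 2.5, 3.0, 3.5]
print(y)
```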
Linear equation (multiple features)
The linear regression equation with multiple features can be represented as: $$y = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$ where $n$ is the number of features and, by convention, $x_0 = 1$ so that $\theta_0$ remains the bias term.
Vectorised equation
Vectorised form: $y = h_{\theta}(\mathbf{x}) = \theta \cdot \mathbf{x}$
- $\theta \cdot \mathbf{x}$ is the dot product of the two vectors $\theta$ and $\mathbf{x}$
- $h_{\theta}(\mathbf{x})$ is the hypothesis function, using the model parameters $\theta$
Matrix multiplication form: $y = h_{\theta}(\mathbf{x}) = \theta^{\top} \mathbf{x}$
In machine learning, vectors are often represented as column vectors
- $\theta^{\top} \mathbf{x}$ is the matrix multiplication of $\theta^{\top}$ and $\mathbf{x}$. The result is the same, except, formally, it is a one-cell matrix instead of a scalar value
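A short NumPy sketch of both forms, with made-up parameter values and $x_0 = 1$ prepended to the feature vector:

```python
import numpy as np

theta = np.array([2.0, 0.5, -1.0])  # [theta_0, theta_1, theta_2], made-up values
x = np.array([1.0, 3.0, 2.0])       # [x_0 = 1, x_1, x_2]

y_dot = np.dot(theta, x)                         # dot product form: a scalar
y_mat = theta.reshape(1, -1) @ x.reshape(-1, 1)  # row vector times column vector: a 1x1 matrix

print(y_dot)  # 1.5
print(y_mat)  # [[1.5]]
```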
Cost function
Training a linear regression model means finding the vector of parameters $\theta$ so that the model best fits the training data.
To understand how well (or poorly) a linear regression model fits the training data, we use the Mean Squared Error (MSE), a common metric for regression problems.
$$MSE(\mathbf{X}, h_{\theta}) = \frac{1}{m} \sum_{i=1}^{m} (\theta^{\top} \mathbf{x}^{(i)} - y^{(i)})^2$$
where $m$ is the number of training examples and $y^{(i)}$ is the label (desired output) of the training data point $\mathbf{x}^{(i)}$.
Training the model means finding the vector $\theta$ that minimises the MSE on the training data.
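As a sketch, the MSE can be computed in a vectorised way, assuming `X` is the design matrix with a leading column of ones ($x_0 = 1$) and `y` the label vector:

```python
import numpy as np

def mse(X, y, theta):
    """Mean squared error of the model theta on the data (X, y)."""
    errors = X @ theta - y        # theta^T x^(i) - y^(i) for every example i
    return np.mean(errors ** 2)   # average of the squared errors

# Toy data: m = 3 examples, one feature plus the bias column x_0 = 1
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.5, 3.0, 3.5])
theta = np.array([2.0, 0.5])      # a perfect fit here, so MSE = 0
print(mse(X, y, theta))
```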
Polynomial regression
Polynomial regression is a type of regression analysis where the relationship between the independent variable $x$ and the dependent variable $y$ is modelled as a $d$-degree polynomial.
It extends linear regression by allowing for curves in the data, making it useful for capturing non-linear relationships between the two variables.
The polynomial regression equation of degree $d$ can be expressed as follows:
$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d$$
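As a sketch, a single-feature polynomial model can be fitted with `np.polyfit` (degree $d = 2$ here, on synthetic data generated from a known quadratic):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
# Synthetic data from a known quadratic, plus a little noise
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.1, size=x.shape)

# np.polyfit returns coefficients from the highest degree down: [theta_2, theta_1, theta_0]
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)  # approximately [0.5, 2.0, 1.0]
```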
Properties of polynomial regression
- Polynomial terms: Polynomial regression includes higher-order terms ($x^2$, $x^3$, etc.) in addition to the linear term ($x$). These higher-order terms allow the model to capture curvature and nonlinear patterns in the data
- Overfitting: While polynomial regression can fit a wide range of curves, using a high-degree polynomial (large d) can lead to overfitting. Overfitting occurs when the model fits the noise in the data rather than the underlying relationship, resulting in poor performance on new, unseen data.
Polynomial regression with multiple features
With multiple features, the equation contains the high-order terms of each variable, plus all the combinations between them. For instance, with two features $x_1$ and $x_2$ and degree $d = 3$, the set of high-order terms includes $x_1^2$, $x_2^2$, $x_1^3$, and $x_2^3$, as well as the combinations $x_1 x_2$, $x_1^2 x_2$, and $x_1 x_2^2$.
Note: Given $n$ features, the resulting number of polynomial features with degree $d$ is $\frac{(n + d)!}{d!\,n!}$
Beware the combinatorial explosion of the number of features!
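As a sanity check of the formula, here is a sketch using scikit-learn's `PolynomialFeatures` (assuming scikit-learn is available; note that it also emits the bias column, which the formula counts):

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 2, 3
X = np.ones((1, n))                  # a single dummy sample with n features
poly = PolynomialFeatures(degree=d)  # include_bias=True by default
n_poly_features = poly.fit_transform(X).shape[1]

print(n_poly_features)  # 10
print(comb(n + d, d))   # (n + d)! / (d! n!) = 10 as well
```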
Overfitting the training data