Multiple Linear Regression (MLR) allows us to find a linear relationship between multiple input variables and a single dependent output variable. This tutorial is given in matrix notation. For more on matrix notation, including the rules of matrix multiplication, I suggest visiting the Wikipedia page on the subject. We are again going to have to define some additional notation.
- $n$ is the number of observations in the data set.
- $p$ is the number of predictor variables used in the regression.
- $y$ is the vector of dependent variable observations. It is $n$ observations long, 1 variable wide.
- $X$ is the matrix of independent (predictor) variables. It includes a vector of 1’s as the first column to represent the intercept. Its dimensions are $n$ observations long, and $p+1$ variables wide. (Note: If we are purposely not including the intercept in the model, then we omit the leading vector of 1’s, and the dimensions of $X$ are $n \times p$.)
- $\beta$ is the vector of coefficients. Its dimensions are $p+1$ long, 1 wide. (Also note: If not including the intercept, $\beta_0$ is dropped, and the dimensions of $\beta$ are $p \times 1$.)
- $\varepsilon$ is the vector of errors. It is $n$ long, 1 wide.
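I will not be keeping up the subscript notation all the way through this tutorial. Just remember that it is there. The model in multiple linear regression in matrix form is as follows:

$$y = X\beta + \varepsilon$$

Expanded, it appears as

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$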
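To make the layout of the predictor matrix concrete, here is a minimal NumPy sketch that builds it from raw predictor columns. The helper name `design_matrix` and the toy numbers are my own, chosen only for illustration.

```python
import numpy as np

def design_matrix(predictors, intercept=True):
    """Build the matrix of predictors from an n x p array.

    With intercept=True a leading column of 1's is prepended, giving
    an n x (p + 1) matrix; otherwise the n x p array is returned as is.
    """
    predictors = np.asarray(predictors, dtype=float)
    if predictors.ndim == 1:
        # Treat a single predictor as one column
        predictors = predictors[:, None]
    if intercept:
        ones = np.ones((predictors.shape[0], 1))
        return np.hstack([ones, predictors])
    return predictors

# Hypothetical toy data: n = 5 observations, p = 2 predictors
raw = np.array([[1.0, 2.0],
                [2.0, 1.0],
                [3.0, 4.0],
                [4.0, 3.0],
                [5.0, 6.0]])

X = design_matrix(raw)
print(X.shape)  # (5, 3): n rows, p + 1 columns
```

Passing `intercept=False` gives the no-intercept case, where the matrix keeps only the p predictor columns.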
Deriving The $\hat{\beta}$ Vector
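We are still using Ordinary Least Squares (OLS) regression, so we choose the coefficient estimates that minimize the sum of squared errors. Because we don’t know what the actual errors are, the observed errors (the residuals) will be denoted by an $e$, where $e = y - X\hat{\beta}$. So we want to minimize $e^T e$.

$$e^T e = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$$

Take the derivative with respect to $\hat{\beta}$ and set it to zero to yield

$$\frac{\partial\, e^T e}{\partial \hat{\beta}} = -2 X^T y + 2 X^T X \hat{\beta} = 0$$

This gives us the ‘Normal Equations’.

$$X^T X \hat{\beta} = X^T y$$

If the inverse of $X^T X$ exists, then we can solve the normal equations, yielding our estimator for $\beta$, $\hat{\beta}$.

$$\hat{\beta} = (X^T X)^{-1} X^T y$$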
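As a quick numerical illustration of solving the normal equations, the sketch below uses NumPy on made-up data where the response is an exact linear function of the predictors, so the recovered coefficients are easy to check. The data and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy data: 5 observations, 2 predictors
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])

# Prepend the column of 1's for the intercept
X = np.column_stack([np.ones(len(x)), x])

# Response built exactly as 1 + 2*x1 + 3*x2 (no noise)
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1]

# Normal equations: (X^T X) beta_hat = X^T y
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(beta_lstsq)
```

Using `np.linalg.solve` on the normal equations avoids forming the inverse explicitly. In practice a least-squares routine such as `np.linalg.lstsq` is usually the more numerically stable choice when the predictor columns are nearly collinear.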
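This gives us the estimator for $\beta$. We need to verify that it is unbiased, and develop the variance of this estimator. Treating $X$ as fixed and using the OLS assumption $E[\varepsilon] = 0$,

$$E[\hat{\beta}] = E\left[(X^T X)^{-1} X^T y\right] = (X^T X)^{-1} X^T E[X\beta + \varepsilon] = (X^T X)^{-1} X^T X \beta = \beta$$

So the estimator is unbiased. The variance can be calculated as follows, using the OLS assumption $\mathrm{Var}(\varepsilon) = \sigma^2 I$, so that $\mathrm{Var}(y) = \sigma^2 I$.

$$\mathrm{Var}(\hat{\beta}) = (X^T X)^{-1} X^T \,\mathrm{Var}(y)\, X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$$

So, $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$. This will be necessary for estimating the confidence interval.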
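The unbiasedness and variance results can be checked empirically with a small simulation. The sketch below is only an illustration under assumed values: the true coefficients, the error standard deviation, the sample size, and the number of replications are all made up, and the errors are drawn as independent normals with constant variance.

```python
import numpy as np

rng = np.random.default_rng(0)

n, reps, sigma = 50, 20000, 1.0
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients

# One fixed predictor matrix: intercept column plus two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
XtX = X.T @ X

# Each column of Y is one simulated data set y = X beta + error
noise = rng.normal(scale=sigma, size=(n, reps))
Y = (X @ beta)[:, None] + noise

# Solve the normal equations for all data sets at once; B has shape (3, reps)
B = np.linalg.solve(XtX, X.T @ Y)

mean_beta_hat = B.mean(axis=1)                   # should be close to beta
empirical_cov = np.cov(B)                        # rows of B are the variables
theoretical_cov = sigma**2 * np.linalg.inv(XtX)  # sigma^2 (X^T X)^{-1}

print(mean_beta_hat)
print(np.abs(empirical_cov - theoretical_cov).max())
```

The average of the simulated estimates should land near the true coefficients, and the empirical covariance of the estimates should be close to the theoretical covariance matrix, up to simulation noise.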
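Confidence Interval for $\beta$
The confidence interval for $\beta$ is derived from the relationship between the normal distribution and the Student’s t distribution. The $100(1-\alpha)\%$ confidence interval for any element of the $\beta$ vector, $\beta_j$, can be defined as

$$\hat{\beta}_j \pm t_{\alpha/2,\; n-p-1}\; s \sqrt{C_{jj}}$$

where $s = \sqrt{e^T e / (n-p-1)}$ is the sample standard deviation of the residuals (for a model with an intercept), and $C_{jj}$ is the $jj$’th element of $C = (X^T X)^{-1}$.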
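The interval can be sketched in NumPy as below. This is a minimal illustration, not a full implementation: the function name `ols_conf_int` and the data are made up, and the t critical value is passed in by hand (2.776 is the usual table value for a 95% interval with 4 degrees of freedom) rather than computed, to keep the example free of extra dependencies.

```python
import numpy as np

def ols_conf_int(X, y, t_crit):
    """Confidence intervals for each coefficient.

    X is assumed to already contain the intercept column, so it has
    k = p + 1 columns and the residual degrees of freedom are n - k.
    t_crit is the t critical value for the chosen confidence level.
    """
    n, k = X.shape
    C = np.linalg.inv(X.T @ X)
    beta_hat = C @ X.T @ y
    resid = y - X @ beta_hat
    s = np.sqrt(resid @ resid / (n - k))       # residual standard deviation
    half_width = t_crit * s * np.sqrt(np.diag(C))
    return beta_hat, beta_hat - half_width, beta_hat + half_width

# Hypothetical data: one predictor, roughly y = 2x with small noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])
X = np.column_stack([np.ones(len(x)), x])

# n = 6, p = 1, so n - p - 1 = 4 degrees of freedom
beta_hat, lower, upper = ols_conf_int(X, y, t_crit=2.776)

for j in range(len(beta_hat)):
    print(f"beta_{j}: {beta_hat[j]:.3f}  [{lower[j]:.3f}, {upper[j]:.3f}]")
```

If SciPy is available, the critical value could instead be obtained from `scipy.stats.t.ppf(1 - alpha / 2, df)`.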
The Hat (or Projection) Matrix
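The projection matrix will be important later on. For now, let us derive it. The fitted values are obtained by substituting the estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ into the model:

$$\hat{y} = X\hat{\beta} = X (X^T X)^{-1} X^T y = H y$$

So the hat matrix is $H = X (X^T X)^{-1} X^T$. It is called the hat matrix because it "puts the hat on" $y$.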
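A short NumPy sketch can confirm the standard properties of the hat matrix on a small example: it is symmetric, it is idempotent (applying it twice changes nothing), and its trace equals the number of columns of the predictor matrix. The data below are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: 5 observations, 2 predictors plus intercept
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
X = np.column_stack([np.ones(len(x)), x])
y = np.array([9.2, 6.8, 19.1, 17.9, 29.0])

# Hat matrix H = X (X^T X)^{-1} X^T, an n x n matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Fitted values two ways: via H, and via the coefficient estimates
y_hat_from_H = H @ y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat_from_beta = X @ beta_hat

print(np.allclose(H, H.T))                           # symmetric
print(np.allclose(H @ H, H))                         # idempotent
print(np.isclose(np.trace(H), X.shape[1]))           # trace = p + 1
print(np.allclose(y_hat_from_H, y_hat_from_beta))    # same fitted values
```

Forming the full n x n matrix is fine for a small illustration, but for large data sets it is usually cheaper to compute the fitted values directly from the coefficient estimates.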