Multiple Linear Regression

Multiple Linear Regression (MLR) allows us to find a linear relationship between multiple input variables and a single dependent output variable. This tutorial is presented in matrix notation. For more on matrix notation, including the rules of matrix multiplication, I suggest visiting the Wikipedia page on the subject here. We again need to define some additional notation.

  • n is the number of observations in the data set.
  • p is the number of predictor variables used in the regression.
  • Y_{n\times 1} is the vector of dependent variable observations. It is n observations long, 1 variable wide.
  • X_{n\times p'} is the matrix of independent (predictor) variables. It includes a vector of 1’s as the first column to represent the intercept. Its dimensions are n observations long, and p' = p+1 variables wide. (Note: If we are purposely not including the intercept in the model, then we omit the leading vector of 1’s, and the dimensions of X are n \times p.)
  • \beta_{p'\times 1} is the vector of coefficients. Its dimensions are p'=p+1 long, 1 wide. (Also note: If not including the intercept, \beta_0 is dropped, and the dimensions of \beta are p\times 1.)
  • \err_{n\times 1} is the vector of errors.

I will not keep up the subscript notation all the way through this tutorial; just remember that it is there. The multiple linear regression model in matrix form is as follows:

    \begin{equation*} Y = X\beta + \err \end{equation*}

Expanded, it appears as

    \begin{equation*} \begin{bmatrix} y_1 \\ y_2\\ \vdots \\ y_n\\ \end{bmatrix}  =  \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots &\vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \\ \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \\ \end{bmatrix}  + \begin{bmatrix} \err_{1}\\ \err_{2}\\ \vdots\\ \err_{n}\\ \end{bmatrix} \end{equation*}
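
To make the pieces concrete, here is a minimal NumPy sketch of assembling Y and X for this model. The data values and variable names are made up purely for illustration; the only real point is prepending the column of 1's for the intercept.

    import numpy as np

    # Made-up data, purely for illustration: n = 5 observations, p = 2 predictors.
    y = np.array([3.0, 5.1, 6.9, 9.2, 11.0])             # Y: the n x 1 vector of responses
    predictors = np.array([[1.0, 2.0],
                           [2.0, 1.0],
                           [3.0, 4.0],
                           [4.0, 3.0],
                           [5.0, 6.0]])                   # the n x p matrix of predictor values

    n, p = predictors.shape
    X = np.column_stack([np.ones(n), predictors])         # prepend the column of 1's for the intercept
    # X is now n x p', where p' = p + 1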

Deriving The \beta Vector

We are still using Ordinary Least Squares (OLS) regression, so we will use its assumptions to generate the estimates of the coefficients. Because we don’t know what the actual errors \err are, the observed errors (residuals) will be denoted by e. So we want to minimize \sum e_i^2 = e'e.

    \begin{align*} e'e &= (Y-X\bhat)'(Y-X\bhat)\\ &= Y'Y - \bhat'X'Y - Y'X\bhat + \bhat'X'X\bhat\\ &= Y'Y - 2\bhat'X'Y + \bhat'X'X\bhat \end{align*}

The middle two terms combine because Y'X\bhat is a scalar, and therefore equals its own transpose \bhat'X'Y.

Take the derivative with respect to \bhat and set it equal to zero to yield

    \begin{equation*} \frac{\partial e'e}{\partial \bhat} = -2X'Y + 2X'X\bhat = 0 \end{equation*}

This gives us the ‘Normal Equations’.

    \begin{equation*} X'X\bhat = X'Y \end{equation*}

If the inverse of X'X exists, then we can solve the normal equations, yielding our estimator for \beta, \bhat.

    \begin{align*} (X'X)^{-1}(X'X)\bhat &= (X'X)^{-1}X'Y\\ \bhat &= (X'X)^{-1}X'Y \end{align*}
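
Continuing the NumPy sketch above (reusing X and y), \bhat can be computed by solving the normal equations directly. Using np.linalg.solve avoids forming the explicit inverse, which is numerically preferable, although the explicit formula mirrors the algebra.

    # Continuing from the setup above: solve X'X bhat = X'Y for bhat.
    XtX = X.T @ X
    XtY = X.T @ y
    beta_hat = np.linalg.solve(XtX, XtY)                  # bhat = (X'X)^{-1} X'Y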

This gives us the estimator for \beta. We need to verify that it is unbiased and derive the variance of this estimator. Recall that under the OLS assumptions, E(Y) = X\beta and Var(Y) = I\sigma^2.

    \begin{align*} \bhat &= [(X'X)^{-1}X']Y\\ E(\bhat) &= [(X'X)^{-1}X']E(Y)\\ &=(X'X)^{-1}(X'X)\beta\\ &=\beta \end{align*}

So the estimator is unbiased. The variance can be calculated as follows.

    \begin{align*} Var(\bhat) &= [(X'X)^{-1}X']Var(Y)[(X'X)^{-1}X']'\\ &= [(X'X)^{-1}X']I\sigma^2[(X'X)^{-1}X']'\\ &= (X'X)^{-1}X'X(X'X)^{-1}\sigma^2\\ &= (X'X)^{-1}\sigma^2 \end{align*}

So, assuming normally distributed errors, \bhat \sim N(\beta,(X'X)^{-1}\sigma^2). This will be necessary for constructing the confidence interval.
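
Continuing the sketch, the unknown \sigma^2 is estimated from the residuals, which gives an estimated covariance matrix for \bhat.

    # Continuing from above: estimate sigma^2 from the residuals e = Y - X bhat.
    residuals = y - X @ beta_hat
    p_prime = X.shape[1]                                  # p' = p + 1, since the intercept is included
    s_squared = residuals @ residuals / (n - p_prime)     # S^2 = e'e / (n - p')
    var_beta_hat = s_squared * np.linalg.inv(XtX)         # estimated (X'X)^{-1} sigma^2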

Confidence Interval for \bhat
The confidence interval for \bhat is derived from the relationship between the normal distribution and the Student’s t distribution. The confidence interval for any element of the \bhat vector, \bhat_j, can be defined as

    \begin{equation*} 100(1-\alpha)\%\text{ CI }\bhat_j = \bhat_j \pm t_{\frac{\alpha}{2},n-p'}S\sqrt{C_{jj}} \end{equation*}

where S is the sample standard deviation of the residuals, and C_{jj} is the j’th diagonal element of (X'X)^{-1}.
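
Continuing the sketch, the interval for each coefficient uses the t quantile with n - p' degrees of freedom and the square roots of the diagonal of the estimated covariance matrix (which are exactly S\sqrt{C_{jj}}).

    from scipy import stats

    # Continuing from above: 100(1 - alpha)% confidence intervals for each bhat_j.
    alpha = 0.05
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p_prime)   # t_{alpha/2, n - p'}
    se = np.sqrt(np.diag(var_beta_hat))                   # S * sqrt(C_jj) for each coefficient
    ci_lower = beta_hat - t_crit * se
    ci_upper = beta_hat + t_crit * se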

The Hat (or Projection) Matrix
The projection matrix will be important later on. For now, let us derive it.

    \begin{align*} \yhat &= X\bhat\\ &= X[(X'X)^{-1}X'Y]\\ &= [X(X'X)^{-1}X']Y\\ &= PY\\ P &= [X(X'X)^{-1}X'] \end{align*}
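
Continuing the sketch, the hat matrix can be formed directly, and two of its basic properties checked: it reproduces the fitted values and is idempotent.

    # Continuing from above: form P = X (X'X)^{-1} X' and check its properties.
    P = X @ np.linalg.inv(XtX) @ X.T
    y_hat = P @ y                                         # fitted values
    print(np.allclose(y_hat, X @ beta_hat))               # True: P Y = X bhat
    print(np.allclose(P @ P, P))                          # True: P is idempotent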