Nested Data

Nested data is data for which a variable (or set of variables) signifies that an observation belongs to a group. We might refer to simple nesting (with one layer of groups) as categorical. Use of categorical data with linear models is called ANOVA (Analysis of Variance).

To illustrate, let’s start with the linear model in matrix notation.

    \begin{equation*} Y = X\beta + \varepsilon \end{equation*}

    \begin{equation*} \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n\\ \end{pmatrix} = \begin{pmatrix} 1 & x_{1,1} & x_{1,2} & \ldots & x_{1,p}\\ 1 & x_{2,1} & x_{2,2} & \ldots & x_{2,p}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \ldots & x_{n,p}\\ \end{pmatrix} \begin{pmatrix} \beta_0\\ \beta_1\\ \beta_2\\ \vdots\\ \beta_p\\ \end{pmatrix} + \begin{pmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n\\ \end{pmatrix} \end{equation*}

  • Y is the vector of dependent observations.
  • X is the design matrix, detailing the observed values of the independent variables. The first column of ones corresponds to the intercept.
  • \beta is the vector of model parameters (coefficients); multiplying X by \beta gives the vector of predicted values, \hat{Y}.
  • \varepsilon is the vector of random errors. We assume that \varepsilon \sim N(0, I\sigma^2).
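
To make the notation concrete, here is a minimal sketch in Python (using NumPy) that simulates data from this model and recovers \beta by least squares. The sample size, true coefficients, and noise level are arbitrary choices made for illustration.

    import numpy as np

    rng = np.random.default_rng(42)

    n, p = 100, 2
    X = np.column_stack([np.ones(n),                  # intercept column of ones
                         rng.normal(size=(n, p))])    # p independent variables
    beta = np.array([1.0, 2.0, -0.5])                 # true parameters (invented)
    eps = rng.normal(scale=1.0, size=n)               # epsilon ~ N(0, I sigma^2)
    Y = X @ beta + eps

    # Solve the least-squares problem (equivalent to the normal equations).
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    print(beta_hat)   # should be close to [1.0, 2.0, -0.5]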

ANOVA
Classical Effects Model with One Way ANOVA

I said before that the use of categorical variables within the linear regression framework can be termed ANOVA. Let’s look at how that works. Suppose we have a categorical variable that defines which group an observation is in, and suppose that variable can take one of 4 possible levels, signifying 4 possible groups. Within each group, we have 2 observations.

Membership in each group is signified by a dummy variable, taking the value 0 or 1. Because we treat the model parameters as fixed but unknown, we call this Classical Effects Model a Fixed Effects Model.

    \begin{equation*} Y_{i,j} = \mu + \alpha_i + \varepsilon_{i,j},\hspace{1cm}i = 1,\ldots,4,\hspace{.5cm}j = 1,2 \end{equation*}

    \begin{equation*} \begin{pmatrix} y_{1,1}\\ y_{1,2}\\ y_{2,1}\\ \vdots\\ y_{4,2}\\ \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ \end{pmatrix} \begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4\\ \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,1}\\ \varepsilon_{1,2}\\ \varepsilon_{2,1}\\ \vdots\\ \varepsilon_{4,2}\\ \end{pmatrix} \end{equation*}

  • \alpha_i is the coefficient associated with group i.
  • \varepsilon_{i,j} is the per-observation error associated with group i, individual j.

This method of defining the variables presents a problem, though. To solve the standard regression problem via the normal equations, we must invert the matrix X'X. Using the above X matrix, the four dummy columns sum to the intercept column, so the columns of X are linearly dependent; X'X is therefore singular, and thus not invertible.
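
We can verify the rank deficiency numerically. Below is a quick NumPy check, with the 8×5 design matrix hard-coded to match the example above.

    import numpy as np

    # Build the 8x5 design matrix: an intercept column plus one dummy
    # column per group (4 groups, 2 observations each).
    dummies = np.kron(np.eye(4), np.ones((2, 1)))   # 8x4 block of group dummies
    X = np.column_stack([np.ones(8), dummies])      # prepend the intercept

    XtX = X.T @ X
    print(np.linalg.matrix_rank(XtX))   # 4, not 5: the dummies sum to the intercept
    # np.linalg.inv(XtX) would fail here with a "Singular matrix" error.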

The solution lies in realizing that to know which of the 4 groups an observation belongs to, we only need 3 dummy variables. If we treat one group as the control, and all other groups as deviations from that control, we can accomplish the task.

    \begin{equation*} \begin{pmatrix} y_{1,1}\\ y_{1,2}\\ y_{2,1}\\ \vdots\\ y_{4,2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \mu\\ \alpha_2\\ \alpha_3\\ \alpha_4 \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,1}\\ \varepsilon_{1,2}\\ \varepsilon_{2,1}\\ \vdots\\ \varepsilon_{4,2} \end{pmatrix} \end{equation*}

Thus, we interpret \alpha_i as the deviation of group i from the control group. The matrix X'X is now invertible, and we can solve for \alpha_2, \alpha_3, \alpha_4 as deviations from \mu.
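
As a sketch of this reference-coded fit (the group means below are invented for illustration), dropping the group-1 dummy makes the least-squares problem well posed:

    import numpy as np

    rng = np.random.default_rng(0)

    # Invented group means; group 1 acts as the control.
    group_means = np.array([10.0, 12.0, 9.0, 15.0])
    y = np.repeat(group_means, 2) + rng.normal(scale=0.5, size=8)

    dummies = np.kron(np.eye(4), np.ones((2, 1)))
    X = np.column_stack([np.ones(8), dummies[:, 1:]])   # drop the group-1 dummy

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)   # [mu, alpha_2, alpha_3, alpha_4], roughly [10, 2, -1, 5]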

An alternative, though less used, parameterization is to remove the intercept \mu and give each group its own intercept.

    \begin{equation*} \begin{pmatrix} y_{1,1}\\ y_{1,2}\\ y_{2,1}\\ \vdots\\ y_{4,2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1\\ \alpha_2\\ \alpha_3\\ \alpha_4 \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,1}\\ \varepsilon_{1,2}\\ \varepsilon_{2,1}\\ \vdots\\ \varepsilon_{4,2} \end{pmatrix} \end{equation*}

This parameterization is called the Means Model. Its weakness is that it gives no direct information about the significance of differences between any particular group and a control group, though it does allow simple interpretation of the group means: each \alpha_i is simply the mean of group i.
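
A companion sketch of the Means Model fit, reusing the simulated data from above: with one dummy per group and no shared intercept, each estimated coefficient is simply that group’s sample mean.

    import numpy as np

    rng = np.random.default_rng(0)

    group_means = np.array([10.0, 12.0, 9.0, 15.0])   # invented for illustration
    y = np.repeat(group_means, 2) + rng.normal(scale=0.5, size=8)

    X = np.kron(np.eye(4), np.ones((2, 1)))   # one dummy per group, no intercept

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coef)   # [alpha_1, ..., alpha_4], each the sample mean of its group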

Two Way ANOVA

Dimensions of ANOVA are defined by the way in which the categories are stacked. One Way ANOVA refers to a single layer of groups. Two Way ANOVA refers to two layers, nesting sets of groups within groups.

As an example, let’s think of students in a classroom. Each individual student is an observation. These observations are correlated by the fact that they belong to the same class; at this level, they are uncorrelated with students in other classes. We identify individual observations with the subscript notation i,j, denoting the i’th group and the j’th observation within that group.

    \begin{equation*} \text{Classroom}_i\hspace{.5cm}\ni\hspace{.5cm}\text{Student}_j \end{equation*}

Let’s say a school has 5 classrooms, with 30 kids per class. In this case, we can make comparative statements between classrooms using One Way ANOVA. If we decided to include other schools, and wanted to make comparative statements between schools, then we would need to nest the classrooms within each school. This adds another layer, and we now call this Two Way ANOVA. We identify individual observations using the subscript i,j,k, denoting the i’th top-level group, the j’th nested group, and the k’th observation within that group.

    \begin{equation*} \text{School}_i\hspace{.5cm}\ni\hspace{.5cm}\text{Classroom}_j\hspace{.5cm}\ni\hspace{.5cm}\text{Student}_k \end{equation*}
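
As a sketch of fitting this nested structure (assuming the pandas and statsmodels libraries are available; the group sizes and effect scales below are invented), the patsy nesting operator / expands C(school) / C(classroom) into a school effect plus a classroom-within-school effect:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)

    # Simulate 2 schools x 5 classrooms x 30 students (sizes are illustrative).
    rows = []
    for i in range(2):                          # school i
        school_effect = rng.normal(scale=2.0)
        for j in range(5):                      # classroom j within school i
            class_effect = rng.normal(scale=1.0)
            for k in range(30):                 # student k within classroom j
                rows.append({"school": i, "classroom": j,
                             "score": 70 + school_effect + class_effect
                                      + rng.normal(scale=5.0)})
    df = pd.DataFrame(rows)

    # Fit classrooms nested within schools.
    fit = smf.ols("score ~ C(school) / C(classroom)", data=df).fit()
    print(fit.params)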

This idea of nesting can be extended ad infinitum.