Category Archives: Theory

The theoretical background for the applied techniques.

Nested Data

Nested data is data in which a variable (or set of variables) marks each observation as belonging to a group. We might refer to simple nesting (with one layer of groups) as categorical. The use of categorical data with linear models is called ANOVA (Analysis of Variance). To illustrate, let’s start with the linear model in […]

Power and Sample Size for Proportion Data

A commonly used method in applied statistics is determining the sample size necessary to detect statistically significant results. Given the intended power, we can calculate the required sample size. Given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few things. […]
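As a rough illustration of the first direction (power in, sample size out), the sketch below uses the usual normal-approximation formula for a two-sided test comparing two proportions. The function name, the default significance level of 0.05, and the default power of 0.80 are assumptions made for this example and are not taken from the post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided z-test
    comparing two proportions (normal approximation, equal group sizes)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the test
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)                        # round up to a whole observation


# Hypothetical example: detect a change from 50% to 60%.
print(sample_size_two_proportions(0.5, 0.6))
```

Raising the target power or shrinking the gap between the two proportions increases the required sample size.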

Power and Sample Size for Normally Distributed Data

A commonly used method in applied statistics is determining the sample size necessary to detect statistically significant results. Given the intended power, we can calculate the required sample size. Given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few things. […]
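Both directions of the calculation can be sketched for the simplest normal case: a two-sided one-sample z-test with known standard deviation. The function names, the parameter names `delta` (difference to detect) and `sigma`, and the defaults are assumptions for illustration. The power formula ignores the negligible probability of rejecting in the wrong tail.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n(delta, sigma, alpha=0.05, power=0.80):
    """Sample size needed to detect a mean shift of `delta`."""
    z = NormalDist()
    total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((total * sigma / delta) ** 2)


def achieved_power(n, delta, sigma, alpha=0.05):
    """Approximate power obtained with a sample of size `n`."""
    z = NormalDist()
    shift = abs(delta) * sqrt(n) / sigma  # standardized effect at this n
    return z.cdf(shift - z.inv_cdf(1 - alpha / 2))


# Hypothetical example: detect a shift of half a standard deviation.
n = required_n(delta=0.5, sigma=1.0)
print(n, achieved_power(n, delta=0.5, sigma=1.0))
```

Feeding the sample size from `required_n` back into `achieved_power` should return a value at or just above the target power, which is a quick consistency check on the two formulas.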

Logistic Regression

Logistic regression is a method of classification using the regression framework. In logistic regression, the output (or target, or dependent) variable is a binary variable, taking one of two values. The predictor (or input, or independent) variables are not limited in this way, and can take any value. Logistic regression is based on the logistic […]
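Assuming the truncated sentence refers to the standard logistic function, the sketch below shows how it turns an unbounded linear combination of predictors into a value between 0 and 1. The function names and the coefficient values are hypothetical.

```python
from math import exp


def logistic(t):
    """Standard logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def predict_probability(x, intercept, coefficients):
    """Probability of the positive class for one observation `x`."""
    linear = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return logistic(linear)


# Hypothetical coefficients, for illustration only.
print(predict_probability([2.0, -1.0], intercept=0.5, coefficients=[1.2, 0.7]))
```

A classification is then obtained by comparing the probability to a cutoff, commonly 0.5.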

Bagging

Ensemble methods combine multiple classifiers into a single output. Some ensemble methods may combine different types of classifiers, but the ones we will focus on here combine multiple iterations of the same type of classifier. These methods belong to a family of ensemble methods called “Perturb and Combine”. Some methods of classification […]
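A minimal sketch of the perturb-and-combine idea as it applies to bagging: each model is fit on a bootstrap resample of the training data (the perturbation), and the predictions are merged by majority vote (the combination). The nearest-mean base classifier, the function names, and the toy data are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def fit_nearest_mean(sample):
    """Hypothetical base classifier: predict the label whose mean is closest."""
    by_label = {}
    for value, label in sample:
        by_label.setdefault(label, []).append(value)
    means = {label: sum(v) / len(v) for label, v in by_label.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagged_predict(train, x, fit, n_models=25, seed=0):
    """Fit `n_models` classifiers on bootstrap samples and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(fit(sample)(x))
    return Counter(votes).most_common(1)[0][0]


# Toy one-dimensional data, for illustration only.
train = [(0.0, "a"), (0.2, "a"), (0.1, "a"), (1.0, "b"), (1.2, "b"), (0.9, "b")]
print(bagged_predict(train, 0.05, fit_nearest_mean))
```

Any classifier can be passed as `fit`, provided it takes a training sample and returns a prediction function.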

Decision Trees

Contents: Introduction to Tree Methods; Terminology; CART Methodology; Grow a Large Initial Tree; Binary Questions; Goodness of Split Criterion; Goodness of Split Measure; Pruning the Tree; Cost Complexity Measure; Tree Size Selection; Test Sample Method; Cross-Validation Method; v-Fold Cross-Validation.

Introduction to Tree Methods: Tree methods are supervised learning methods. This means that there is a […]

Artificial Neural Network

Artificial Neural Networks are methods of classification and/or regression meant to emulate our belief about how a brain or nervous system functions. There exists a network of nodes, or neurons, each of which performs a calculation on its input values. If the resulting value satisfies some condition, the neuron fires. Network topology refers to the structure of […]
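One common way to make the firing rule concrete is a threshold unit: the neuron computes a weighted sum of its inputs and fires when that sum exceeds a threshold. The weighted sum, the threshold condition, and all names below are assumptions for illustration; the excerpt does not specify the calculation or the condition.

```python
def neuron_fires(inputs, weights, bias=0.0, threshold=0.0):
    """Return True if the weighted sum of inputs exceeds the threshold."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation > threshold


# Hypothetical weights that make the neuron behave like a logical AND.
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, neuron_fires(pair, weights=[1.0, 1.0], threshold=1.5))
```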

Model Selection Schema

There are various model selection criteria in use for picking variables in linear regression. Some are applicable to other models outside of linear regression. Akaike’s Information Criterion – A useful criterion for indicating the amount of information contained within variables, and deciding whether to omit certain variables. AIC draws its justification from Information Theory. Coefficient […]
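For reference, the usual form of Akaike’s Information Criterion is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values are preferred. The sketch below applies it to two candidate models. The log-likelihood values are made up for illustration and do not come from any fitted model.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: the larger model improves the likelihood only slightly.
small = aic(log_likelihood=-100.0, n_params=3)
large = aic(log_likelihood=-99.5, n_params=5)
print(small, large)
```

Here the extra two parameters are not justified by the small gain in likelihood, so the smaller model has the lower AIC and would be preferred under this criterion.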

Hypothesis Testing

Hypothesis testing allows us to evaluate a hypothesis, or compare two hypotheses. I include it here as part of the theoretical framework necessary for validation of the statistical models offered. In hypothesis testing, there are two hypotheses. The first is the null hypothesis. This is generally the hypothesis we are trying to disprove. We attempt to mount […]