Tag Archives: Theory

Nested Data

Nested data is data in which a variable (or set of variables) identifies each observation as belonging to a group. We might refer to simple nesting (with one layer of groups) as categorical. The use of categorical data with linear models is called ANOVA (Analysis of Variance). To illustrate, let's start with the linear model in […]
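As a rough illustration of the simplest case, one layer of groups (one-way ANOVA), the sketch below computes the F statistic as the ratio of between-group to within-group mean squares. The function name `one_way_anova` and the sample data are hypothetical, and the sketch assumes every group is non-empty. It is not taken from the full post.

```python
def one_way_anova(groups):
    """Return (F statistic, df_between, df_within) for a list of groups,
    where each group is a non-empty list of numeric observations."""
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups (levels of one categorical variable).
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
print(one_way_anova(groups))  # F is about 19 with (2, 6) degrees of freedom
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom suggests the group means are not all equal.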

Power and Sample Size for Proportion Data

A common task in applied statistics is determining the sample size needed to detect a statistically significant effect. Given the desired power, we can calculate the required sample size; given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few terms. […]
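For orientation, here is a minimal sketch of both directions (power to sample size, and sample size to power) for one simple design: a two-sided, one-sample test of a single proportion p0 against a true proportion p1, using the normal approximation. The design, the function names, and the default alpha = 0.05 and power = 0.80 are assumptions for illustration; the full post may treat other designs, such as two-sample comparisons.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1

def sample_size_one_proportion(p0, p1, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample test of H0: p = p0 when the
    true proportion is p1 (normal approximation), rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)

def power_one_proportion(p0, p1, n, alpha=0.05):
    """Approximate power of the same test for a given n. The tiny chance of
    rejecting in the opposite tail is ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error if H0 is true
    se1 = math.sqrt(p1 * (1 - p1) / n)  # standard error under the alternative
    return STD_NORMAL.cdf((abs(p1 - p0) - z_alpha * se0) / se1)

# Hypothetical example: detect a shift from 50% to 60%.
n = sample_size_one_proportion(0.5, 0.6)
print(n, power_one_proportion(0.5, 0.6, n))
```

The two functions invert each other: plugging the returned n back into the power function gives at least the requested power, while n - 1 falls just short.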

Power and Sample Size for Normally Distributed Data

A common task in applied statistics is determining the sample size needed to detect a statistically significant effect. Given the desired power, we can calculate the required sample size; given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few terms. […]
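For the normal case, a minimal sketch under simplifying assumptions: a two-sided, one-sample z-test with a known standard deviation sigma, where delta is the true difference from the hypothesized mean. Then n = ((z_{1-α/2} + z_{1-β}) σ / δ)² and power = Φ(|δ|√n / σ − z_{1-α/2}). The function names are hypothetical, and the full post may use a t-based or two-sample version instead.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1

def sample_size_normal(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least the requested power for a two-sided
    one-sample z-test (known sigma), opposite-tail rejection ignored."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def power_normal(delta, sigma, n, alpha=0.05):
    """Approximate power of the same test for a given sample size n."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(delta) * math.sqrt(n) / sigma - z_alpha)

# Hypothetical example: detect a half-standard-deviation shift.
n = sample_size_normal(delta=0.5, sigma=1.0)
print(n, power_normal(0.5, 1.0, n))
```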

Logistic Regression

Logistic regression is a method of classification that uses the regression framework. In logistic regression, the output (or target, or dependent) variable is binary, taking a value of either 0 or 1. The predictor (or input, or independent) variables are not limited in this way and can take any value. Logistic regression is based on the logistic […]
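To make the idea concrete, the sketch below shows how a fitted model with one predictor turns any real-valued linear predictor into a probability via the logistic function, then into a 0/1 class with a threshold. The coefficients (beta0 = -3, beta1 = 1.5) and the 0.5 threshold are hypothetical. Estimating the coefficients (usually by maximum likelihood) is not shown.

```python
import math

def logistic(t):
    """Logistic (sigmoid) function 1 / (1 + e^-t), which maps any real t
    into (0, 1). Written in two branches so math.exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict_proba(x, beta0, beta1):
    """P(Y = 1 | x) under a one-predictor logistic regression model."""
    return logistic(beta0 + beta1 * x)

def classify(x, beta0, beta1, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return 1 if predict_proba(x, beta0, beta1) >= threshold else 0

# Hypothetical coefficients: the predicted probability crosses 0.5 at x = 2.
for x in (0, 1, 2, 3, 4):
    print(x, round(predict_proba(x, -3.0, 1.5), 3), classify(x, -3.0, 1.5))
```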

Decision Trees

Contents: Introduction to Tree Methods; Terminology; CART Methodology; Grow a Large Initial Tree; Binary Questions; Goodness of Split Criterion; Goodness of Split Measure; Pruning the Tree; Cost Complexity Measure; Tree Size Selection; Test Sample Method; Cross-Validation Method; v-Fold Cross-Validation

Introduction to Tree Methods

Tree methods are supervised learning methods. This means that there is a […]
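As a rough sketch of the "binary questions" and "goodness of split" steps in growing a CART tree, the code below asks every question of the form "is x <= c?" for a single numeric feature. It scores each question by the decrease in node impurity, i(t) − p_L·i(t_L) − p_R·i(t_R). The Gini index is used here as the impurity measure. That is one common choice in CART, but it is an assumption; the full post may define impurity differently, for example with entropy. All function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity i(t) = 1 - sum_j p(j|t)^2 of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def goodness_of_split(labels, left, right):
    """Impurity decrease of splitting a node into left and right children."""
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

def best_binary_split(xs, ys):
    """Try every question 'is x <= c?' for one numeric feature and return
    (threshold, impurity decrease) of the best one; (None, 0.0) if no split helps."""
    best = (None, 0.0)
    for c in sorted(set(xs))[:-1]:  # the largest value would send everything left
        left = [y for x, y in zip(xs, ys) if x <= c]
        right = [y for x, y in zip(xs, ys) if x > c]
        gain = goodness_of_split(ys, left, right)
        if gain > best[1]:
            best = (c, gain)
    return best

# Hypothetical data: the classes separate cleanly between x = 3 and x = 4.
print(best_binary_split([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"]))
```

Growing a full tree repeats this search over every feature at every node. Pruning and tree-size selection then cut the large tree back.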

Artificial Neural Network

Artificial neural networks are methods of classification and/or regression meant to emulate our understanding of how a brain or nervous system functions. They consist of a network of nodes, or neurons, each of which performs a calculation on its input values. If the resulting value meets some condition, the neuron fires. Network topology refers to the structure of […]
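The "fires if the value meets some condition" idea can be shown with a single neuron. It takes a weighted sum of its inputs plus a bias and applies a step activation. The weights below are hand-picked (hypothetical) so that the neuron computes logical AND. Practical networks usually use smooth activation functions and learn their weights from data, which this sketch does not show.

```python
def neuron(inputs, weights, bias, threshold=0.0):
    """A single artificial neuron with a step activation: compute the weighted
    sum of the inputs plus a bias, and fire (return 1) if it reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= threshold else 0

# Hypothetical hand-picked weights that make the neuron compute logical AND.
AND_WEIGHTS, AND_BIAS = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], AND_WEIGHTS, AND_BIAS))
```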

Hypothesis Testing

Hypothesis testing allows us to evaluate a hypothesis or compare two hypotheses. I include it here as part of the theoretical framework needed to validate the statistical models presented. In hypothesis testing, there are two hypotheses. H₀ is the null hypothesis; this is generally the hypothesis we are trying to disprove. We attempt to mount […]
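One concrete instance is a two-sided one-sample z-test of H₀: μ = μ0. It assumes the population standard deviation sigma is known. When sigma must be estimated from the sample, a t-test is the usual choice instead. The function name and the data are hypothetical. H₀ is rejected at level alpha when the p-value falls below alpha.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0,
    assuming a known population standard deviation sigma."""
    n = len(sample)
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # standardized test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # probability of |Z| >= |z| under H0
    return z, p_value

# Hypothetical data: sample mean 10.5, testing mu0 = 10 with sigma = 0.5.
z, p = one_sample_z_test([10.2, 10.8, 10.4, 10.6], mu0=10.0, sigma=0.5)
alpha = 0.05
print(round(z, 3), round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```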

Classification Systems

Statistical classification involves using various methods and metrics to sort observations into their correct outcome groups based on input variables. The algorithms used to do this are called classification systems, or classifiers. We use various metrics to gauge the performance of our classification systems. If we are referring exclusively to binary outputs, […]
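For binary outputs, several widely used performance metrics come from the four cells of the confusion matrix. The sketch below uses the standard definitions of accuracy, sensitivity, specificity, and precision. Because the excerpt is truncated, it is an assumption that these are the metrics the full post covers. The helper names and the data are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def binary_metrics(actual, predicted, positive=1):
    """Common performance metrics for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),  # true positive rate (recall)
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "precision": safe_div(tp, tp + fp),    # positive predictive value
    }

# Hypothetical labels: 2 TP, 2 FN, 1 FP, 3 TN.
actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
print(binary_metrics(actual, predicted))
```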

Multiple Linear Regression

Multiple linear regression (MLR) allows us to find a linear relationship between multiple input variables and a single dependent output variable. This tutorial uses matrix notation. For more on matrix notation, including the rules of matrix multiplication, I suggest visiting the Wikipedia page on the subject. We are again […]
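In matrix notation, the least-squares estimate is β̂ = (XᵀX)⁻¹Xᵀy, where X includes a leading column of 1s for the intercept. The pure-Python sketch below avoids external libraries and solves the equivalent normal equations (XᵀX)β = Xᵀy by Gaussian elimination rather than forming the inverse explicitly. The helper names and data are hypothetical. For real work, a numerical library's least-squares routine (for example NumPy's `numpy.linalg.lstsq`) is more numerically stable than forming XᵀX.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product of a (r x k) and b (k x c), both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a x = b for square, nonsingular a by Gaussian elimination
    with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x

def fit_mlr(X, y):
    """Least-squares coefficients beta_hat solving (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yi for x, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
print(fit_mlr(X, y))  # approximately [1.0, 2.0, 3.0]
```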