Category Archives: Tutorials

Tutorials and guides on how some statistical techniques work and how to apply them.

Decision Trees

Contents: Introduction to Tree Methods; Terminology; CART Methodology; Grow a Large Initial Tree; Binary Questions; Goodness of Split Criterion; Goodness of Split Measure; Pruning the Tree; Cost Complexity Measure; Tree Size Selection; Test Sample Method; Cross-Validation Method; v-Fold Cross-Validation

Introduction to Tree Methods: Tree methods are a supervised learning method. This means that there is a […]

Artificial Neural Network

Artificial Neural Networks are methods of classification and/or regression meant to emulate our belief about how a brain or nervous system functions. The network consists of nodes, or neurons, each of which performs a calculation on various input values. If the end value matches some condition, the neuron fires. Network topology refers to the structure of […]
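The firing rule described above can be sketched for a single neuron. This is only a minimal illustration: the function name `neuron_fires`, the weighted-sum calculation, and the threshold condition are assumptions made for the example, not details taken from the tutorial itself.

```python
def neuron_fires(inputs, weights, bias=0.0, threshold=0.0):
    """Return True if the weighted sum of the inputs exceeds the threshold.

    The weighted sum and the strict threshold are assumed here as one
    common choice of calculation and firing condition.
    """
    # Combine the input values into a single end value.
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The neuron fires only when the end value meets the condition.
    return activation > threshold


# Hypothetical inputs and weights, chosen only for illustration.
print(neuron_fires([1.0, 0.5], [0.6, 0.4], threshold=0.5))  # weighted sum is about 0.8
print(neuron_fires([1.0, 0.5], [0.2, 0.2], threshold=0.5))  # weighted sum is about 0.3
```

The first call fires because the weighted sum is above the threshold; the second does not.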

Multiple Linear Regression (R)

In this tutorial, we are going to walk through multiple linear regression, and we will see that it is not much different from simple linear regression. The model in multiple linear regression looks similar to that in simple regression. We’re adding more coefficients to modify the additional independent variables. We […]
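For comparison, the two models can be written side by side in standard notation. The symbols used here (response $y_i$, predictors $x_{ij}$, coefficients $\beta_j$, error $\varepsilon_i$, and $p$ predictors) are assumed for illustration and may differ from those used in the full tutorial.

```latex
% Simple linear regression: one independent variable
y_i = \beta_0 + \beta_1 x_{i} + \varepsilon_i

% Multiple linear regression: one coefficient per additional independent variable
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i
```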

Model Selection Schema

There are various model selection criteria in use for picking variables in linear regression. Some are applicable to other models outside of linear regression. Akaike’s Information Criterion – A useful criterion for indicating the amount of information contained within variables, and deciding whether to omit certain variables. AIC draws its justification from Information Theory. Coefficient […]
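As a rough sketch of how AIC is used to compare models, the snippet below applies the usual definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The function name `aic` and the log-likelihood values are hypothetical and serve only to illustrate the comparison.

```python
def aic(log_likelihood, num_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical maximized log-likelihoods for a full and a reduced model.
full = aic(log_likelihood=-120.0, num_params=5)
reduced = aic(log_likelihood=-121.5, num_params=3)

# The model with the lower AIC is preferred.
print("full:", full, "reduced:", reduced)
print("prefer reduced model:", reduced < full)
```

With these made-up numbers, dropping two variables costs little likelihood, so the reduced model ends up with the lower AIC.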

Simple Linear Regression (R)

In this tutorial we will conduct simple linear regression on an example dataset. In the dataset there are 62 individuals, and we will be regressing brain weight on body weight. As stated before, in simple linear regression we are trying to find a linear relationship between the dependent and independent variable, that […]

Hypothesis Testing

Hypothesis testing allows us to evaluate a hypothesis, or compare two hypotheses. I include it here as part of the theoretical framework necessary for validation of the statistical models offered. In hypothesis testing, there are two hypotheses. $H_0$ is the null hypothesis. This is generally the hypothesis we are trying to disprove. We attempt to mount […]

Classification Systems

Statistical classification involves the use of various methods and metrics to discriminate outcome variables into their correct groups using input variables. The algorithms used to do this are called classification systems, or classifiers. There are various metrics we use to gauge the performance of our classification systems. If we are referring exclusively to binary outputs, […]
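For binary outputs, a few widely used performance metrics can be computed from counts of correct and incorrect predictions. The excerpt is cut off before it names specific metrics, so the choice of accuracy, sensitivity, and specificity here is an assumption, as are the function name `binary_metrics` and the example labels.

```python
def binary_metrics(actual, predicted):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Assumes both classes appear in `actual`; otherwise a division by
    zero would occur.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
    }


# Hypothetical true labels and classifier output.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
print(binary_metrics(actual, predicted))
```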

RStudio, Sweave, and LaTeX

LaTeX (pronounced Lay-tech; the X is supposed to be a capital chi, Χ) is a typesetting environment used to generate professional-looking output and to automate some of the more tedious tasks of writing a paper. It is the de facto standard for writing scientific papers, and it is a good idea to learn it if you intend […]

R Environment

R is one of the more commonly used statistical analysis software packages. The ease with which methods can be prototyped and brought to production makes it very popular for research. The fact that it is open source and free to use also contributes to its appeal. The official manual for R is available here. I recommend […]