Category Archives: Tutorials

Tutorials and guides on how some statistical techniques work and how to apply them.

Nested Data

Nested data is data in which a variable (or set of variables) marks each observation as belonging to a group. We might refer to simple nesting (with one layer of groups) as categorical. Using categorical data with linear models is called ANOVA (Analysis of Variance). To illustrate, let’s start with the linear model in […]

Linear Regression in R: Abalone Dataset

This tutorial will perform linear regression on a deceptively simple dataset. The abalone dataset from the UCI Machine Learning Repository comes with the goal of predicting abalone age (through the number of rings on the shell) from various descriptive attributes of the abalone (shell sizes, weights of the whole abalone and of parts of the shucked abalone). […]

Power and Sample Size for Proportion Data

A commonly used method in applied statistics is determining the sample size needed to detect statistically significant results. Given the intended power, we can calculate the required sample size. Given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few things. […]

Power and Sample Size for Normally Distributed Data

A commonly used method in applied statistics is determining the sample size needed to detect statistically significant results. Given the intended power, we can calculate the required sample size. Given the intended sample size, we can calculate the resulting power. Before we go into how this works, we need to define a few things. […]

Pretty Graphs with ggplot2 (R)

The native graphics options in R are very powerful and useful for generating output. However, the packages available for R extend your capability far beyond what is natively available in R. The most commonly used package for non-native graphics is ggplot2. Getting Started with qplot – A brief introduction to qplot(), the training wheels plotting […]

Proc SQL – SAS

SQL stands for “Structured Query Language”. It’s a language created for interacting with relational databases. SAS implements a version of this inside proc sql. This allows us to leverage the power of SQL to solve our problems. proc sql; create table out.projdata2 as select a,b,c,d, /* vars from source */ a^2 as aa, /* New […]

Data Processing in SAS

Libraries – Libraries are directories where your datasets are stored. It is a good idea to declare a local library where you want to store the dataset after you’re done processing it. Libraries can be declared with the libname statement. Here I declare the library out. Any datasets stored in this library can be accessed by preceding […]

Multiple Linear Regression (SAS)

In this tutorial, we will perform linear regression and variable selection using the cirrhosis dataset. We attempt to predict the incidence of cirrhosis in a population using a few descriptor variables from that population. The variables we have are: urbanpop – the size of the urban population; lowbirth – the reciprocal of the number of […]

Simple Linear Regression (SAS)

In this tutorial we will conduct simple linear regression on an example dataset. The dataset contains 62 individuals, and we will be regressing brain weight on body weight. As stated before, in simple linear regression we are trying to find a linear relationship between the dependent and independent variable, that […]