# Category Archives: Tutorials

Tutorials and guides on how some statistical techniques work and how to apply them.

# Tutorial Papers Library (SAS)

- Quick Results with PROC SQL (WUSS 2013)
- What’s Hot, What’s Not – Skills for SAS® Professionals (SAS Talks 2013)
- SAS® Programming Tips, Tricks and Techniques for Programmers (V1, 30-minutes)
- Point and Click Programming Using SAS® Enterprise Guide® (NESUG 2013)
- Hands-On SAS® Macro Programming Tips and Techniques (SCSUG 2013)
- Google® Search Tips and Techniques for SAS® […]

# Logistic Regression

Logistic regression is a classification method built on the regression framework. In logistic regression, the output (or target, or dependent) variable is binary, taking a value of either 0 or 1. The predictor (or input, or independent) variables are not limited in this way, and can take any value. Logistic regression is based on the logistic […]
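For reference, the standard form of the logistic function and the model built on it can be written as below. The symbols ($\sigma$, $\beta_j$, $x_j$, $p$) and the coding of the two outcomes as 0 and 1 are notational assumptions made here, not taken from the post itself.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
P(y = 1 \mid x) = \sigma\left(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\right)
$$

Because $0 < \sigma(z) < 1$ for every real $z$, the right-hand side can be read as a probability no matter what values the predictors take.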

# Logistic Regression (R)

Logistic Regression is a type of classification model. In classification models, we attempt to predict the outcome of categorical dependent variables, using one or more independent variables. The independent variables can be either categorical or numerical. Logistic regression is based on the logistic function, which always takes values between 0 and 1. Replacing the dependent […]
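As a minimal sketch of the idea, a logistic regression can be fitted in base R with `glm()` and `family = binomial`. The built-in `mtcars` data and the choice of `am` (a 0/1 variable) as the outcome and `wt` as the predictor are stand-ins chosen for illustration; they are not the data used in the post.

```r
# Illustrative data: mtcars ships with R; am is coded 0/1
data(mtcars)

# Fit a logistic regression of the binary outcome on one numeric predictor
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Fitted probabilities come from the logistic function, so they lie in (0, 1)
p_hat <- predict(fit, type = "response")
range(p_hat)

# Turn probabilities into class labels with a 0.5 cut-off (an arbitrary choice)
pred_class <- ifelse(p_hat > 0.5, 1, 0)
table(predicted = pred_class, actual = mtcars$am)
```

The 0.5 cut-off is only a convention; a different threshold may suit problems where the two kinds of error have different costs.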

# Bagging

Ensemble methods combine multiple classifiers into a single output. Some ensemble methods may combine different types of classifiers, but the ones we will focus on here combine multiple iterations of the same type of classifier. These methods belong to a family of ensemble methods called “Perturb and Combine”.

Perturb and Combine

Some methods of classification […]
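One way to picture bagging is the rough sketch below: the same type of classifier is fitted to many bootstrap samples (the "perturb" step) and the predictions are merged by majority vote (the "combine" step). The `iris` data, the use of `rpart` trees as the base classifier, and the value of `n_models` are all assumptions made for illustration.

```r
library(rpart)  # recommended package included with most R installations

set.seed(42)    # make the bootstrap samples reproducible
data(iris)      # stand-in dataset, not the one used in the post
n_models <- 25  # hypothetical number of bootstrap replicates

# Perturb: fit one tree per bootstrap sample (rows drawn with replacement)
models <- lapply(seq_len(n_models), function(i) {
  idx <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})

# Combine: one column of predicted labels per tree, then a majority vote per row
votes <- sapply(models, function(m) {
  as.character(predict(m, newdata = iris, type = "class"))
})
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Training-set accuracy of the ensemble (optimistic; held-out data is preferable)
mean(bagged_pred == iris$Species)
```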

# Random Forests (R)

We will apply the random forest method to the Adult dataset here. We will begin by importing the data, doing some pre-filtering and combining into classes, and generating two subsets of the data: the training set, which we will use to train the random forest model, and the evaluation set, which we will use […]
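The split into a training set and an evaluation set can be sketched in base R as follows. The `iris` data stands in for the Adult dataset, and the 70/30 proportion and the names `train_set` and `eval_set` are hypothetical choices, not the ones used in the post.

```r
set.seed(1)  # reproducible split
data(iris)   # stand-in for the Adult dataset

n <- nrow(iris)

# Randomly pick roughly 70% of the row indices for training (assumed proportion)
train_idx <- sample(n, size = round(0.7 * n))

train_set <- iris[train_idx, ]   # used to fit the model
eval_set  <- iris[-train_idx, ]  # held out to evaluate the fitted model

c(train = nrow(train_set), evaluation = nrow(eval_set))
```

Keeping the evaluation rows out of model fitting is what lets them give an honest estimate of how the model behaves on unseen data.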

# Classification Trees (R)

Classification trees are non-parametric methods to recursively partition the data into more “pure” nodes, based on splitting rules. See the guide on classification trees in the theory section for more information. Here, we’ll be using the rpart package in R to accomplish the classification task on the Adult dataset. We’ll begin by loading the dataset […]
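A minimal sketch of fitting a classification tree with `rpart` is shown below. It uses the built-in `iris` data as a stand-in, because the loading and preparation steps for the Adult dataset are not shown in this excerpt.

```r
library(rpart)  # recommended package included with most R installations

data(iris)      # stand-in dataset for illustration

# method = "class" requests a classification (rather than regression) tree
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Print the splitting rules chosen by the recursive partitioning
print(tree_fit)

# Predict class labels and compare them with the observed classes
tree_pred <- predict(tree_fit, newdata = iris, type = "class")
table(predicted = tree_pred, actual = iris$Species)
```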

# R – Functions

Functions are a way of isolating tasks so that they can be applied repeatedly. They follow a basic structure. The Name of the function is the unique identifier of that function. Within R, we can create functions named whatever we want. Inputs are variables passed into a function. We can pass any number of variables we […]
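The name and inputs described above can be seen in a small example. The function `rescale` and its arguments `lower` and `upper` are made up here for illustration.

```r
# Name: rescale
# Inputs: a numeric vector x, plus optional bounds with default values
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x)                              # smallest and largest value of x
  scaled <- (x - rng[1]) / (rng[2] - rng[1])   # map x onto [0, 1]
  lower + scaled * (upper - lower)             # last expression is the return value
}

rescale(c(2, 4, 6))          # 0.0 0.5 1.0
rescale(c(2, 4, 6), 10, 20)  # 10 15 20
```

Once defined, the same task can be repeated on any numeric vector without rewriting the arithmetic.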

# R – Loops

A loop is a method of repeating the same action over and over. We will separate loops into 2 types: for loops and while loops. For loops are used to iterate between bounds. We declare them with an iterator variable, similar to mathematical notation (sums, products, etc.). In fact, we can set the bounds of […]
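The two loop types can be sketched as follows. The variable names and bounds are arbitrary examples.

```r
# For loop: the iterator i runs between the bounds 1 and 10,
# much like the index of a mathematical sum
total <- 0
for (i in 1:10) {
  total <- total + i
}
total  # 55

# While loop: repeat for as long as the condition holds
n <- 1
while (n < 100) {
  n <- n * 2
}
n  # 128
```

A for loop suits a known number of iterations, whereas a while loop suits cases where the stopping point depends on a condition evaluated along the way.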

# R – Variables

R is a powerful and free statistical programming language. It runs on a wide variety of operating systems and architectures, and has a wealth of packages made possible by its free nature and simple language structure. New ideas can be prototyped and pushed out very quickly, resulting in R always being at the forefront […]

# Neural Networks (R)

In this R tutorial, we are going to be training a neural network on the “adult” dataset hosted on UCI’s machine learning repository. In this dataset, there are 15 variables and 32,561 observations. I have prepared a tutorial on how I cleaned and blocked the data to prepare it for model building. I will start […]