# Category Archives: R

Applied Statistical Techniques in R

# Linear Regression in R: Abalone Dataset

This tutorial performs linear regression on a deceptively simple dataset. The abalone dataset from the UCI Machine Learning Repository poses the task of predicting abalone age (through the number of rings on the shell) from various descriptive attributes of the abalone (shell sizes, weights of the whole abalone and of parts of the shucked abalone). […]

# Pretty Graphs with ggplot2 (R)

R's native graphics options are powerful and useful for generating output. However, add-on packages extend your capabilities far beyond what base R offers. The most commonly used package for non-native graphics is ggplot2. Getting Started with qplot – A brief introduction to qplot(), the training wheels plotting […]

# Logistic Regression (R)

Logistic Regression is a type of classification model. In classification models, we attempt to predict the outcome of categorical dependent variables, using one or more independent variables. The independent variables can be either categorical or numerical. Logistic regression is based on the logistic function, which always takes values between 0 and 1. Replacing the dependent […]
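As a minimal sketch of the logistic function mentioned above, the snippet below defines it by hand and compares it with base R's `plogis()`. The helper name `logistic` and the sample inputs are illustrative assumptions, not taken from the full post.

```r
# Logistic (sigmoid) function: maps any real number into the interval (0, 1).
# `logistic` is a hypothetical helper name chosen for this sketch.
logistic <- function(x) 1 / (1 + exp(-x))

# A few sample inputs (illustrative values only)
x <- c(-10, -1, 0, 1, 10)
p <- logistic(x)
print(p)

# Base R's plogis() computes the same function
print(all.equal(p, plogis(x)))
```

Every value in `p` lies strictly between 0 and 1, and an input of 0 maps to exactly 0.5, which is why the function is a natural fit for modelling probabilities.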

# Random Forests (R)

We will apply the random forest method to the Adult dataset here. We will begin by importing the data, doing some pre-filtering and combining into classes, and generating two subsets of the data: the training set, which we will use to train the random forest model, and the evaluation set, which we will use […]

# Classification Trees (R)

Classification trees are non-parametric methods to recursively partition the data into more “pure” nodes, based on splitting rules. See the guide on classification trees in the theory section for more information. Here, we’ll be using the rpart package in R to accomplish the classification task on the Adult dataset. We’ll begin by loading the dataset […]

# R – Functions

Functions are a way of isolating tasks so that they can be applied repeatedly. They follow a basic structure. The name of the function is the unique identifier of that function. Within R, we can name functions whatever we want. Inputs are variables passed into a function. We can pass any number of variables we […]
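The name and inputs described above can be sketched as follows. The function `rectangle_area` and its arguments are hypothetical examples chosen for illustration, not code from the full post.

```r
# Name: rectangle_area (hypothetical). Inputs: width and height.
# height defaults to width, so a single argument describes a square.
rectangle_area <- function(width, height = width) {
  area <- width * height
  return(area)
}

print(rectangle_area(3, 4))  # both inputs supplied
print(rectangle_area(5))     # height falls back to its default
```

Once defined, the function can be called as many times as needed with different inputs, which is the point of isolating the task.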

# R – Loops

A loop is a method of repeating the same action over and over. We will separate loops into two types: for loops and while loops. For loops are used to iterate between bounds. We declare them with an iterator variable, similar to mathematical notation (sums, products, etc.). In fact, we can set the bounds of […]
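A minimal sketch of the two loop types, assuming simple illustrative bounds and variable names (`total`, `n`, `steps`) that are not from the full post. The for loop mirrors summation notation, with the iterator `i` running from 1 to 10.

```r
# For loop: iterate i between the bounds 1 and 10, like a sum over i = 1..10
total <- 0
for (i in 1:10) {
  total <- total + i
}
print(total)

# While loop: keep repeating as long as the condition holds
n <- 1
steps <- 0
while (n < 100) {
  n <- n * 2          # double n on every pass
  steps <- steps + 1  # count how many passes were needed
}
print(c(n, steps))
```

The for loop suits a known range of iterations, while the while loop suits cases where the stopping point depends on a condition evaluated during the run.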

# R – Variables

R is a powerful and free statistical programming language. It runs on a wide variety of operating systems and architectures, and has a huge wealth of packages made possible by its free nature and simple language structure. New ideas can be prototyped and pushed out very quickly, resulting in R always being at the forefront […]

# Neural Networks (R)

In this R tutorial, we will train a neural network on the “adult” dataset hosted on UCI’s Machine Learning Repository. In this dataset, there are 15 variables and 32,561 observations. I have prepared a tutorial on how I cleaned and blocked the data to prepare it for model building. I will start […]