Tag Archives: Data Mining

Bagging

Ensemble methods combine the outputs of multiple classifiers into a single output. Some ensemble methods combine different types of classifiers, but the ones we will focus on here combine multiple iterations of the same type of classifier. These methods belong to a family of ensemble methods called “Perturb and Combine”. Perturb and Combine: Some methods of classification […]
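As a rough illustration of combining multiple iterations of the same type of classifier, here is a minimal Python sketch of bagging (bootstrap aggregating). It is not taken from the post itself: the one-feature threshold "stump" classifier, the toy data, and the names `fit_stump`, `bagged_stumps`, and `predict` are all assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-feature threshold rule: predict 1 when x > threshold.

    `points` is a list of (x, label) pairs with labels 0 or 1.
    The stump is a stand-in base classifier chosen for brevity.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in points}):
        # Count the points this threshold would misclassify.
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models=25, seed=0):
    """Perturb: fit one stump per bootstrap sample of the training data."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        thresholds.append(fit_stump(sample))
    return thresholds


def predict(thresholds, x):
    """Combine: take a majority vote over the individual stumps."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): small x values are class 0, large are class 1.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
models = bagged_stumps(data)
```

An odd number of models avoids tied votes in the two-class case. Calling `predict(models, 5.0)` returns the class chosen by the majority of the stumps.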

Random Forests (R)

We will apply the random forest method to the Adult dataset here. We will begin by importing the data, doing some pre-filtering and combining into classes, and generating two subsets of the data: the training set, which we will use to train the random forest model, and the evaluation set, which we will use […]

Classification Trees (R)

Classification trees are non-parametric methods that recursively partition the data into increasingly “pure” nodes, based on splitting rules. See the guide on classification trees in the theory section for more information. Here, we’ll be using the rpart package in R to accomplish the classification task on the Adult dataset. We’ll begin by loading the dataset […]

Decision Trees

Contents: Introduction to Tree Methods; Terminology; CART Methodology; Grow a Large Initial Tree; Binary Questions; Goodness of Split Criterion; Goodness of Split Measure; Pruning the Tree; Cost Complexity Measure; Tree Size Selection; Test Sample Method; Cross-Validation Method; v-Fold Cross-Validation.

Introduction to Tree Methods: Tree methods are supervised learning methods. This means that there is a […]

Artificial Neural Network

Artificial Neural Networks are methods of classification and/or regression meant to emulate our belief about how a brain or nervous system functions. There exists a network of nodes, or neurons, each of which performs a calculation on its input values. If the resulting value meets some condition, the neuron fires. Network topology refers to the structure of […]
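To make the idea of a neuron "firing" concrete, here is a minimal Python sketch of a single neuron. The excerpt does not say which calculation or firing condition is used, so the weighted sum, the threshold test, and the name `neuron_fires` are assumptions made for illustration.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold.

    The weighted sum and the ">= threshold" condition are illustrative
    assumptions; other calculations and firing conditions are possible.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold


# Hypothetical weights and threshold that make the neuron act like a logical AND:
# it fires only when both inputs are 1.
and_weights = [1.0, 1.0]
and_threshold = 1.5
outputs = [neuron_fires([a, b], and_weights, and_threshold)
           for a in (0, 1) for b in (0, 1)]
```

Here `outputs` holds the result for the input pairs (0, 0), (0, 1), (1, 0), and (1, 1), in that order. Only the last pair makes the neuron fire.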

Classification Systems

Statistical classification involves the use of various methods and metrics to assign observations to their correct outcome groups using input variables. The algorithms used to do this are called classification systems, or classifiers. There are various metrics we use to gauge the performance of our classification systems. If we are referring exclusively to binary outputs, […]