Theory

Theoretical Basics of Statistics
This section covers some fundamental statistical concepts that will be necessary for inference.

  • Hypothesis Testing – Forms the basis of evaluating statistical significance. We use hypothesis testing in almost every method explored later on
  • Power and Sample Size – Calculating the sample size required to achieve a given statistical power
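The sample-size calculation mentioned above can be sketched for the simplest case, a two-sided one-sample z-test with known standard deviation, using the standard normal-approximation formula n = ((z₁₋α/₂ + z₁₋β)·σ/δ)². This is a minimal illustration, not the general procedure; the function name and defaults are my own choices.

```python
import math
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, power=0.8):
    """Approximate n needed to detect a mean shift of `delta` with a
    two-sided one-sample z-test (normal-approximation formula)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

print(sample_size(delta=0.5, sigma=1.0))  # -> 32
```

For a shift of half a standard deviation at α = 0.05 and 80% power, this gives the textbook answer of about 32 observations.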

Linear Regression
Linear regression is one of the most fundamental applied statistical techniques. In linear regression, we assume a linear relationship between the input variables and one or more output variables. Here we will explore the theory behind it.
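As a concrete sketch of the idea, we can fit a line to synthetic data with NumPy's least-squares solver. The data here (slope 2, intercept 1, Gaussian noise) is invented purely for illustration.

```python
import numpy as np

# toy data: y = 2x + 1 plus noise (illustrative values only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=x.size)

# design matrix with an intercept column; solve min ||X b - y||^2
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # intercept near 1, slope near 2
```

The recovered coefficients approximate the true intercept and slope, which is the basic promise of ordinary least squares under the linearity assumption.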

Data Mining and Machine Learning Methods
Data Mining is a field of applied statistics in which we extract relationships from data that may not have been apparent before. Machine Learning is a cross-disciplinary field involving applied statistics and many other disciplines. In Machine Learning, we also attempt to extract useful inferences from the data. The difference between the two fields is subtle and subject to disagreement.

  • Decision Trees – Trees are non-parametric methods of classification; they can also be used for regression
  • Artificial Neural Networks – Neural Networks are methods of classification and/or regression loosely modeled on how neurons in the brain are believed to work
  • Ensemble Methods – Ensemble Methods combine the output from a group of models into a single output, with the goal that the ensemble is less wrong than any individual model
    • Bagging – Bagging is a method of creating an ensemble in which each model is trained on a bootstrapped sample of the data
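The bagging idea above can be sketched in a few lines of plain Python. The base learner here, a one-split regression "stump", is a deliberately crude stand-in for the trees one would normally use; everything about it (the median split, the names) is my own illustrative choice. What matters is the pattern: fit many models on bootstrap resamples, then average their predictions.

```python
import random
import statistics

def bootstrap_sample(data):
    # sample len(data) points with replacement: the "bootstrap" in bagging
    return [random.choice(data) for _ in data]

def fit_stump(data):
    # hypothetical base learner: one split at the median x,
    # predicting the mean y on each side
    data = sorted(data)
    split = data[len(data) // 2][0]
    left = [y for x, y in data if x <= split]
    right = [y for x, y in data if x > split]
    lmean = statistics.mean(left) if left else 0.0
    rmean = statistics.mean(right) if right else lmean
    return lambda x: lmean if x <= split else rmean

def bagged_predict(models, x):
    # aggregate the ensemble by averaging its predictions
    return statistics.mean(m(x) for m in models)

random.seed(0)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(20)]
models = [fit_stump(bootstrap_sample(data)) for _ in range(25)]
```

Averaging over bootstrap-trained models smooths out the variance of any single stump, which is the main benefit bagging offers high-variance learners such as deep decision trees.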

Evaluating Model Performance
Depending on the method in question, there are several criteria by which we can evaluate model performance. Here I group them by the method used.
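As a minimal illustration of this grouping, here are two of the most common criteria, one for classification and one for regression, implemented from their definitions. The toy inputs are invented for the example.

```python
def accuracy(y_true, y_pred):
    # classification: fraction of predictions that match the true labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # regression: average squared residual between truth and prediction
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))        # -> 0.75
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # -> 2.0
```

Which criterion applies depends on the method's output type, which is why the sections below are organized by method.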