Logistic regression is a method of classification using the regression framework. In logistic regression, the output (or target, or dependent) variable is a binary variable, taking values of either $0$ or $1$. The predictor (or input, or independent) variables are not limited in this way, and can take any value.
Logistic regression is based on the logistic function,

$$\sigma(t) = \frac{1}{1 + e^{-t}},$$

which always takes values between 0 and 1. Replacing the argument $t$ of the logistic function with a linear combination of the independent variables we intend to use for regression, we arrive at the formula for logistic regression:

$$p(x_i) = \frac{1}{1 + e^{-\beta^T x_i}}$$
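To see the boundedness property concretely, here is a minimal sketch of the logistic function in plain Python (the function name `logistic` is mine):

```python
import math

def logistic(t):
    """Logistic (sigmoid) function: maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# The output is always strictly between 0 and 1, with logistic(0) = 0.5.
print(logistic(0))    # 0.5
print(logistic(5))    # close to 1
print(logistic(-5))   # close to 0
```

No matter how large or small $t$ gets, the output stays in $(0, 1)$, which is exactly what lets us read it as a probability.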
where $x_i$ is the vector of independent variable observations for subject $i$, with a preceding 1 to match the intercept term in $\beta$, and $\beta$ is the vector of coefficients. We can interpret $p(x_i)$ as the probability $P(y_i = 1 \mid x_i)$, i.e., the probability that the dependent variable is of class 1, given the independent variables.
To get an idea of what’s happening here, we’re going to introduce an idea called “odds”. Odds are a ratio of probabilities. Let’s say $p$ is the probability of an event occurring; then $1 - p$ is the probability of that event not occurring. Odds are defined as

$$\text{odds} = \frac{p}{1 - p},$$

i.e., the ratio of the probability of the event occurring to the probability of it not occurring.
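The definition above is a one-liner in code; a quick sketch (the function name `odds` is mine):

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

print(odds(0.5))   # 1.0  -> "even odds"
print(odds(0.75))  # 3.0  -> "3 to 1"
```

A probability of 0.5 gives odds of 1 (the event is as likely to happen as not), while probabilities near 1 send the odds toward infinity.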
The logit function is given by

$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right),$$

or the log of the odds. The logit function has the interesting property that it is the inverse of the logistic function: $\operatorname{logit}(\sigma(t)) = t$. So now, we set the logit of $p(x_i)$ equal to the linear combination of covariates:

$$\operatorname{logit}(p(x_i)) = \beta^T x_i$$
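The inverse relationship is easy to check numerically; a small sketch (function names are mine):

```python
import math

def logistic(t):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    """Log-odds: the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# Applying logit after logistic recovers t, up to floating-point error.
for t in (-2.0, 0.0, 1.5):
    print(t, logit(logistic(t)))
```

This inverse property is what justifies the step in the text: because the logistic transform of $\beta^T x_i$ gives $p(x_i)$, taking the logit of $p(x_i)$ recovers the linear predictor $\beta^T x_i$.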
But now we’re solving for the coefficients $\beta$. If we have observed log-odds $z_i$ on the left-hand side, we can solve this using the least-squared-error solution from linear regression:

$$\hat{\beta} = (X^T X)^{-1} X^T z$$
This solves for the $\beta$ that minimizes the squared error. We should note, though, that in logistic regression the errors are assumed to be binomial rather than normal, so in practice the coefficients are usually estimated by maximum likelihood rather than ordinary least squares.
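The least-squares idea above can be sketched end to end for a single predictor, assuming grouped data where an empirical proportion $\hat{p}_i$ is observed at each value of $x$ (the function name and the toy data are mine; this is an approximation, not how production libraries fit logistic regression):

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

def fit_logodds_least_squares(xs, phats):
    """Fit z ~ b0 + b1 * x by ordinary least squares, where
    z_i = logit(phat_i) are the empirical log-odds.
    Solves the 2x2 normal equations in closed form."""
    zs = [logit(p) for p in phats]
    n = len(xs)
    sx = sum(xs)
    sz = sum(zs)
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    b1 = (n * sxz - sx * sz) / (n * sxx - sx * sx)  # slope
    b0 = (sz - b1 * sx) / n                         # intercept
    return b0, b1

# Toy grouped data: at each x we observed a proportion of "successes".
xs = [0.0, 1.0, 2.0, 3.0]
phats = [0.12, 0.25, 0.5, 0.73]
b0, b1 = fit_logodds_least_squares(xs, phats)
print(b0, b1)
```

Because the log-odds are (by assumption) linear in $x$, an ordinary least-squares fit on the transformed proportions recovers a sensible slope and intercept here. This only works when empirical proportions are available and strictly between 0 and 1; with raw 0/1 outcomes the logit is undefined, which is one reason maximum likelihood is the standard approach.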