What is Logistic Regression?
In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event occurring, such as pass/fail, win/lose, alive/dead, or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each candidate class would be assigned a probability between 0 and 1, with the probabilities summing to one.
Despite its name containing “regression,” logistic regression is primarily a classification algorithm, not a regression algorithm in the traditional sense.
The Logistic Function (Sigmoid Function)
At the heart of logistic regression is the sigmoid function, which maps any real-valued number to a value between 0 and 1:
$$ \sigma(z) = \frac{1}{1 + e^{-z}} $$
Where $z = w^T x + b$ is a linear combination of input features $x$, weights $w$, and bias $b$.
Visualization
The sigmoid function has an S-shaped curve:
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-10, 10, 100)
plt.plot(z, sigmoid(z))
plt.xlabel('z')
plt.ylabel('σ(z)')
plt.title('Sigmoid Function')
plt.grid(True)
plt.show()
```
- When $z \to \infty$, $\sigma(z) \to 1$
- When $z \to -\infty$, $\sigma(z) \to 0$
- When $z = 0$, $\sigma(z) = 0.5$
Binary Logistic Regression
For binary classification (two classes: 0 and 1), logistic regression predicts the probability that an input belongs to class 1:
$$ P(y=1 | x) = \sigma(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}} $$
Decision Boundary
We classify based on a threshold (typically 0.5):
- If $P(y=1 | x) \geq 0.5$, predict class 1
- If $P(y=1 | x) < 0.5$, predict class 0
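To make this concrete, here is a minimal sketch with made-up numbers; the weights `w`, bias `b`, and input `x` below are arbitrary illustrative values, not fitted parameters:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters and a single input (illustrative values only)
w = np.array([0.8, -0.4])
b = -0.2
x = np.array([1.5, 2.0])

p = sigmoid(np.dot(w, x) + b)   # P(y=1 | x) = sigmoid(0.2) ≈ 0.55
label = 1 if p >= 0.5 else 0    # apply the 0.5 threshold
print(p, label)                 # ≈ 0.55 -> class 1
```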
Cost Function
Logistic regression uses the log loss (binary cross-entropy) as its cost function:
$$ J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] $$
Where:
- $m$ is the number of training examples
- $y^{(i)}$ is the true label (0 or 1)
- $\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)$ is the predicted probability
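As a quick numeric illustration of this cost (the labels and predicted probabilities below are made up for the example):

```python
import numpy as np

y = np.array([1, 0, 1, 1])              # true labels (illustrative)
y_hat = np.array([0.9, 0.2, 0.7, 0.6])  # predicted probabilities (illustrative)

# Binary cross-entropy averaged over the m = 4 examples
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(loss)  # ≈ 0.30
```

Note how confident, correct predictions (0.9 for a true 1) contribute little to the loss, while a confident wrong prediction would contribute a large value.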
Training with Gradient Descent
We minimize the cost function using gradient descent:
$$ w := w - \alpha \frac{\partial J}{\partial w} $$
$$ b := b - \alpha \frac{\partial J}{\partial b} $$
Where $\alpha$ is the learning rate.
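For the log loss above, these partial derivatives work out to a simple form; they are exactly what the from-scratch implementation later in this post computes as `dw` and `db`:

$$ \frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) $$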
Multi-Class Logistic Regression (Softmax Regression)
For multi-class classification (more than 2 classes), we use the softmax function:
$$ P(y=k | x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} $$
Where $K$ is the number of classes, and $z_k = w_k^T x + b_k$ for class $k$.
The softmax ensures all probabilities sum to 1:
$$ \sum_{k=1}^{K} P(y=k | x) = 1 $$
Example: Image Classification
For classifying images into cat, dog, or lion:
```python
import numpy as np

# Example logits (raw model outputs)
logits = np.array([2.0, 1.0, 0.1])  # Cat, Dog, Lion

# Softmax function
def softmax(z):
    exp_z = np.exp(z - np.max(z))  # For numerical stability
    return exp_z / exp_z.sum()

probabilities = softmax(logits)
print("Cat:", probabilities[0])   # ~0.659
print("Dog:", probabilities[1])   # ~0.242
print("Lion:", probabilities[2])  # ~0.099
```
Implementation Example
Here’s a simple binary logistic regression from scratch:
```python
import numpy as np

class LogisticRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.lr = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = None

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.iterations):
            # Forward pass
            linear_pred = np.dot(X, self.weights) + self.bias
            predictions = self.sigmoid(linear_pred)

            # Compute gradients of the log loss
            dw = (1 / n_samples) * np.dot(X.T, (predictions - y))
            db = (1 / n_samples) * np.sum(predictions - y)

            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        linear_pred = np.dot(X, self.weights) + self.bias
        y_pred = self.sigmoid(linear_pred)
        return [1 if p >= 0.5 else 0 for p in y_pred]

# Example usage on a tiny, linearly separable toy set
X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
y_train = np.array([0, 0, 1, 1])

# A larger learning rate helps this tiny dataset converge within 1000 iterations
model = LogisticRegression(learning_rate=0.1, iterations=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_train)
print(predictions)  # [0, 0, 1, 1]
```
Using Scikit-Learn
For practical applications, use scikit-learn:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Evaluate
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
```
Advantages and Disadvantages
Advantages
- Simple and interpretable
- Works well for linearly separable data
- Outputs probabilities, not just class labels
- Less prone to overfitting with regularization
Disadvantages
- Assumes a linear relationship between the features and the log-odds
- Doesn't work well with non-linear decision boundaries without feature engineering (see the sketch after this list)
- Sensitive to outliers
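One common workaround for the non-linear-boundary limitation is to engineer non-linear features and keep the linear classifier. A minimal sketch, assuming scikit-learn is available and using its `make_moons` toy dataset purely for illustration:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Non-linearly separable toy data
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Plain logistic regression vs. logistic regression on polynomial features
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poly = make_pipeline(
    PolynomialFeatures(degree=3),
    LogisticRegression(max_iter=1000),
).fit(X_train, y_train)

print("Linear features:    ", linear.score(X_test, y_test))
print("Polynomial features:", poly.score(X_test, y_test))
```

The model itself is still linear; it is linear in the expanded (polynomial) feature space, which lets it draw a curved boundary in the original space.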
When to Use Logistic Regression
- Binary or multi-class classification tasks
- When you need probability estimates
- As a baseline model before trying more complex algorithms
- When interpretability is important
Key Takeaways
- Logistic regression models probabilities using the sigmoid (binary) or softmax (multi-class) function.
- It’s a linear classifier optimized using gradient descent.
- Despite the name, it’s used for classification, not regression.
- It forms the foundation for neural networks (a single-layer neural network with sigmoid activation is logistic regression).
For non-linear problems, consider using kernel methods, decision trees, or neural networks.