Thursday, January 2, 2025

Multiclass Classification with Logistic Regression

Imagine you are in a candy store. There are three types of candy: chocolate, gummy bears, and lollipops. You want to teach a robot to figure out what kind of candy it is looking at just from its shape and color. This kind of problem is called multiclass classification.
Now, let's explain how we can use something called logistic regression to help the robot decide. Don't worry about the fancy name - it's just a way of making choices based on some numbers!

What is Logistic Regression?

Think of logistic regression as a way of answering yes-or-no questions. For example, if the robot asks:
  • Is this candy chocolate? - it gets an answer that's a number between 0 and 1, like 0.8 (which means it is 80% sure it's chocolate).
  • If the robot asks about gummy bears, it might get 0.1, which means it is only 10% sure.
But wait - we're dealing with three types of candy. So how can we handle more than one question at a time? That's where multiclass logistic regression comes in. For more information about logistic regression and how it works, please check Logistic Regression for Binary Classification.

How Does Multiclass Logistic Regression Work?

Instead of asking just one question, the robot asks three:
  • Is this candy chocolate?
  • Is this candy a gummy bear?
  • Is this candy a lollipop?
The robot looks at the answers (let's call them probabilities) and picks the candy type with the highest probability. For example:
  • Chocolate - 0.7 (70%)
  • Gummy Bears - 0.2 (20%)
  • Lollipops - 0.1 (10%)
Since 0.7 is the biggest number, the robot decides it is chocolate.
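The "pick the biggest number" rule can be sketched in a few lines of Python. This is only an illustration: the candy_types list and the probabilities array simply restate the example values above, and np.argmax returns the position of the largest value.

```python
import numpy as np

# Candy types and the example probabilities from the text above
candy_types = ["Chocolate", "Gummy Bears", "Lollipops"]
probabilities = np.array([0.7, 0.2, 0.1])

# argmax gives the index of the highest probability
best_index = int(np.argmax(probabilities))
chosen = candy_types[best_index]

print("The robot decides:", chosen)
```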

The Math Behind Multiclass Logistic Regression

Step 1: Features and weights
The robot uses the following quantities to calculate the probabilities:
  • \(x\) - Features of the candy (shape, color)
  • \(w\) - Weights corresponding to each feature, which determine their importance
  • \(b\) - a bias term to adjust the results.
  • \(e\) - Euler's number, a mathematical constant often used in probability and exponential calculations
  • \(K\) - the total number of candy types (e.g. 3)
The formula to calculate the score for each candy type \(j\) is: \begin{equation} z_j = \sum_i w_{j,i}x_i + b_j \end{equation} where:
  • \(z_j\) - the score for candy \(j\)
  • \(w_{j,i}\) - weight for feature \(i\) of candy type \(j\).
  • \(x_i\) - value of feature \(i\)
  • \(b_j\) - bias for candy type \(j\)
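To make the score formula concrete, here is a minimal NumPy sketch. The numbers in W, b, and x are made up purely for illustration (they are not learned from any data); the only point is that each candy type \(j\) gets its own row of weights and its own bias, so all three scores can be computed at once as a matrix-vector product.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
x = np.array([1.0, 2.0])        # two feature values (x_1, x_2)
W = np.array([
    [0.5, -1.0],                # weights for candy type 0
    [1.0,  0.2],                # weights for candy type 1
    [-0.3, 0.8],                # weights for candy type 2
])
b = np.array([0.1, 0.0, -0.2])  # one bias per candy type

# z_j = sum_i w_{j,i} * x_i + b_j, computed for all j at once
z = W @ x + b

print("Scores:", z)
```

Row \(j\) of W holds the weights \(w_{j,i}\), so W has one row per candy type and one column per feature.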
Once the score is calculated, it is passed to the softmax function to turn it into a probability. The softmax function can be written as: \begin{equation} P(y = j) = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \end{equation} where:
  • \(P(y=j)\) - the probability of the candy being type \(j\)
  • \(z_j\) - the score for candy \(j\).
  • \(\sum_{k=1}^K e^{z_k}\) - the sum of the exponentiated scores for all \(K\) candy types, ensuring the probabilities sum to 1.
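The softmax formula above can be sketched as a small Python function. The function name softmax and the example scores are assumptions made for this illustration. Subtracting the largest score before exponentiating is a common numerical-stability trick that is not part of the formula itself; it does not change the result because the same factor cancels in the numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Turn a vector of scores z into probabilities that sum to 1."""
    shifted = z - np.max(z)     # stability trick; the result is unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical scores for the three candy types
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print("Probabilities:", np.round(probs, 3))
print("Sum:", probs.sum())
```

The candy type with the largest score always receives the largest probability, so picking the highest probability is the same as picking the highest score.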

Example of multiclass logistic regression in Python

In this example we will train logistic regression on a small multiclass dataset, the Candy dataset, which we will create ourselves.
The first step is to import the required libraries. We will need NumPy (to create the dataset), the LogisticRegression class from the sklearn.linear_model module, and the train_test_split function from the sklearn.model_selection module. Finally, we will use the classification_report function from the sklearn.metrics module.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
The second step is to create the candy data. The X variable will contain the candy features (input variables), which are shape and color. A shape value of 0 indicates a round candy, while a value of 1 indicates a square candy. The color has three values, where 0 indicates a red candy, 1 a brown candy, and 2 a yellow candy. So the first column in the dataset is shape and the second is color.
# Features: shape and color
X = np.array([
[0, 0], # Red round candy
[1, 1], # Brown square candy
[0, 2], # Yellow round candy
[1, 0], # Red square candy
[0, 1], # Brown round candy
[1, 2] # Yellow square candy
])
The labels y (target variable) contain three values, where 0 is for chocolate, 1 for gummy bears, and 2 for lollipops.
y = np.array([0, 0, 2, 1, 0, 2])
Now that the dataset is defined, you can split it into train and test datasets using the train_test_split function. The dataset (X, y) will be divided into train and test datasets in a 70:30 ratio; to do that we set the test_size parameter of train_test_split to 0.3. With six samples this leaves four for training and two for testing. We also set random_state=42 so that the shuffle performed before splitting is reproducible.
After splitting, the train data (X_train, y_train) will be used to train the LogisticRegression model. The test dataset (X_test, y_test) will be used to test the trained model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the model
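# multi_class is omitted: with the lbfgs solver scikit-learn already fits a
# multinomial (softmax) model for 3+ classes, and the parameter is deprecated in recent releases
model = LogisticRegression(solver='lbfgs')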
model.fit(X_train, y_train)
If you execute the code written so far, nothing will be printed. To see some results we need to test the model. To do that we will use the model's built-in predict() method, which returns the predicted class for each provided input. The output will be stored in the variable y_pred. This variable is passed to the classification_report function alongside the y_test values to measure the performance of the trained model on unseen data.
y_pred = model.predict(X_test)
# Print results
print("Classification Report:")
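# labels lists all three classes so target_names lines up even if a class is missing from y_test and y_pred
print(classification_report(y_test, y_pred, labels=[0, 1, 2], target_names=["Chocolate", "Gummy Bears", "Lollipops"], zero_division=0))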
The classification output is given below.
Classification Report:
            precision    recall  f1-score   support

 Chocolate       0.00      0.00      0.00       2.0
Gummy Bears       0.00      0.00      0.00       0.0
 Lollipops       0.00      0.00      0.00       0.0

  accuracy                           0.00       2.0
 macro avg       0.00      0.00      0.00       2.0
weighted avg       0.00      0.00      0.00       2.0
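The results are all 0 because the test dataset contains only two samples, both belonging to class 0 (chocolate), and the model misclassified both of them. The gummy bear and lollipop classes have no test samples at all (support 0), so their scores are 0 as well. With only four training samples this is not surprising: the model has very little data to learn from.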
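To see where the zeros come from, it can help to print which samples ended up in each split and what the model predicted for the test samples. The sketch below rebuilds the same candy dataset and split so that it runs on its own; it uses LogisticRegression with default settings, which is an assumption made here to keep the snippet independent of the scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same candy data as in the example: columns are shape and color
X = np.array([[0, 0], [1, 1], [0, 2], [1, 0], [0, 1], [1, 2]])
y = np.array([0, 0, 2, 1, 0, 2])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compare what the model saw during training with what it is tested on
print("Training labels:", y_train)
print("Test labels:    ", y_test)
print("Predictions:    ", y_pred)
```

If the test labels and the predictions never match, every precision and recall value in the report is 0.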
Finally, we will give the robot a new candy to classify. In this case we will define a new candy sample with a round shape and brown color (0, 1).
new_candy = np.array([[0, 1]])  # Brown round candy
prediction = model.predict(new_candy)
print("Predicted Candy:", prediction)
The output of the previous code block is given below.
Predicted Candy: [2]
So the logistic regression model predicted that the brown round candy belongs to class 2 (lollipops). However, the true label should be 0 (chocolate), since the same sample appears in the initial dataset with label 0.
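The predict() method only returns the winning class. To see how confident the robot is about each candy type, scikit-learn's predict_proba() method returns the softmax probabilities. The sketch below is self-contained and repeats the dataset and split from the example; the candy_names list and the use of LogisticRegression with default settings are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

candy_names = ["Chocolate", "Gummy Bears", "Lollipops"]

# Same candy data as in the example: columns are shape and color
X = np.array([[0, 0], [1, 1], [0, 2], [1, 0], [0, 1], [1, 2]])
y = np.array([0, 0, 2, 1, 0, 2])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

new_candy = np.array([[0, 1]])  # Brown round candy
prediction = model.predict(new_candy)
probabilities = model.predict_proba(new_candy)[0]

# model.classes_ holds the labels in the same order as the probability columns
for label, p in zip(model.classes_, probabilities):
    print(f"{candy_names[label]}: {p:.3f}")
print("Predicted candy:", candy_names[prediction[0]])
```

Probabilities that are close to each other are a sign that the model is not very sure, which is expected with so few training samples.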
